Why Code Knowledge Graphs Are Taking Over in 2025
If you have been following GitHub trending repositories lately, you probably noticed something: code knowledge graphs are having a moment. Projects like Understand-Anything (27,000+ stars) and codegraph (22,000+ stars) are dominating the charts. They turn raw source code into interactive, queryable knowledge graphs that help developers understand complex codebases faster than ever.
But you do not need a fancy startup to build one. In this tutorial, you will learn how to build a basic code knowledge graph from scratch using Python. By the end, you will have a working tool that maps relationships between classes, functions, and imports in any Python project.
What Is a Code Knowledge Graph?
A code knowledge graph is a directed graph where:
- Nodes represent code entities — classes, functions, modules, variables.
- Edges represent relationships — calls, imports, inherits, references.
Think of it as a map of your codebase. Instead of reading files one by one, you can see how everything connects at a glance. This is especially powerful for onboarding to new projects, debugging cross-module issues, or understanding architecture at scale.
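To make the definition concrete, here is a minimal sketch of such a graph built from plain Python data structures. The module and function names (app, utils, app.main, utils.load) are hypothetical and only serve as illustration; the rest of the tutorial uses networkx instead of raw lists.

```python
# Hypothetical entities: node id -> attributes.
nodes = {
    "app": {"type": "module"},
    "utils": {"type": "module"},
    "app.main": {"type": "function"},
    "utils.load": {"type": "function"},
}

# Directed edges stored as (source, target, relation) triples.
edges = [
    ("app", "app.main", "contains"),
    ("utils", "utils.load", "contains"),
    ("app", "utils", "imports"),
    ("app.main", "utils.load", "calls"),
]


def outgoing(node, relation):
    """Return the targets reachable from node over edges of one relation."""
    return [dst for src, dst, rel in edges if src == node and rel == relation]


print(outgoing("app", "imports"))     # ['utils']
print(outgoing("app.main", "calls"))  # ['utils.load']
```

Because every edge carries a relation label, the same structure can answer both "what does this module import?" and "what does this function call?".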
Prerequisites
- Python 3.9 or higher installed
- Basic understanding of Python syntax and imports
- A terminal or code editor you are comfortable with
Install the required packages:
pip install networkx matplotlib
We use networkx for graph operations, matplotlib for visualization, and the built-in ast module for parsing Python source code. The ast module ships with Python, so it needs no installation.
Step 1: Parse Python Files with the AST Module
Python ships with the ast (Abstract Syntax Tree) module, which parses syntactically valid Python source into a tree structure you can walk programmatically.
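import ast
import os


def get_python_files(directory):
    """Recursively find .py files, skipping dunder files such as __init__.py."""
    py_files = []
    for root, _, files in os.walk(directory):
        for f in files:
            if f.endswith(".py") and not f.startswith("__"):
                py_files.append(os.path.join(root, f))
    return py_files


def parse_file(filepath):
    """Parse a Python file and return its AST, or None if it cannot be parsed."""
    try:
        # Read as UTF-8 explicitly instead of relying on the platform default
        with open(filepath, "r", encoding="utf-8") as f:
            source = f.read()
        return ast.parse(source, filename=filepath)
    except (SyntaxError, ValueError):
        # ValueError also covers UnicodeDecodeError for non-UTF-8 files
        return None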
The parse_file function reads a Python file and converts it into an AST. If the file has syntax errors, it returns None so we can skip it gracefully.
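To get a feel for what the parser returns, the following sketch parses a small snippet held in a string rather than a file. The snippet and the Greeter class are hypothetical; only ast.parse and ast.walk from the standard library are used.

```python
import ast

# A small, hypothetical source snippet held in a string instead of a file.
source = '''
import os

class Greeter:
    def greet(self, name):
        return "Hello, " + name
'''

tree = ast.parse(source)

# Top-level statements appear in tree.body in source order.
top_level = [type(node).__name__ for node in tree.body]
print(top_level)  # ['Import', 'ClassDef']

# ast.walk visits every node, so nested definitions are found as well.
defined = [n.name for n in ast.walk(tree)
           if isinstance(n, (ast.ClassDef, ast.FunctionDef))]
print(sorted(defined))  # ['Greeter', 'greet']
```

Note that the method greet does not appear in tree.body: it lives inside the ClassDef node, which is why the next step walks the tree instead of reading only the top level.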
Step 2: Extract Code Entities from the AST
Now we walk the AST and extract the important pieces: class names, function names, and import statements.
class CodeVisitor(ast.NodeVisitor):
    def __init__(self, filepath):
        self.filepath = filepath
        # Module name is the file name without its .py extension
        self.module_name = os.path.splitext(
            os.path.basename(filepath)
        )[0]
        self.classes = []
        self.functions = []
        self.imports = []

    def visit_ClassDef(self, node):
        self.classes.append(node.name)
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        # Methods are recorded here too, alongside module-level functions
        self.functions.append(node.name)
        self.generic_visit(node)

    # Record "async def" functions the same way as "def"
    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.append(alias.name)
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        # node.module is None for "from . import x", which is skipped
        if node.module:
            self.imports.append(node.module)
        self.generic_visit(node)


def extract_entities(filepath):
    """Extract classes, functions, and imports from a Python file."""
    tree = parse_file(filepath)
    if tree is None:
        return None
    visitor = CodeVisitor(filepath)
    visitor.visit(tree)
    return visitor
This visitor pattern walks through every node in the AST. When it encounters a class definition, function definition, or import statement, it records it. The generic_visit call ensures we keep walking deeper into the tree.
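One limitation of the visitor above is that a method and a module-level function with the same name are recorded identically. The sketch below shows one possible way to keep them apart by tracking the enclosing scope while walking. ScopedVisitor and its helper methods are hypothetical names, not part of the ast module.

```python
import ast


class ScopedVisitor(ast.NodeVisitor):
    """Hypothetical variant of CodeVisitor that records qualified names."""

    def __init__(self):
        self.scope = []      # names of the enclosing classes and functions
        self.classes = []
        self.functions = []

    def _qualified(self, name):
        return ".".join(self.scope + [name])

    def _visit_scoped(self, node, bucket):
        bucket.append(self._qualified(node.name))
        self.scope.append(node.name)
        self.generic_visit(node)  # descend with the new scope in place
        self.scope.pop()

    def visit_ClassDef(self, node):
        self._visit_scoped(node, self.classes)

    def visit_FunctionDef(self, node):
        self._visit_scoped(node, self.functions)

    # Treat "async def" the same way as "def".
    visit_AsyncFunctionDef = visit_FunctionDef


source = '''
class Repo:
    def save(self):
        pass

def save():
    pass
'''

visitor = ScopedVisitor()
visitor.visit(ast.parse(source))
print(visitor.classes)    # ['Repo']
print(visitor.functions)  # ['Repo.save', 'save']
```

With qualified names, Repo.save and save can become two distinct nodes, and the method can be attached to its class rather than directly to the module.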
Step 3: Build the Knowledge Graph
With entities extracted, we now build the graph using networkx. Each entity becomes a node, and relationships become edges.
import networkx as nx


def build_graph(directory):
    """Build a code knowledge graph from a directory of Python files."""
    G = nx.DiGraph()
    files = get_python_files(directory)
    for filepath in files:
        entities = extract_entities(filepath)
        if entities is None:
            continue
        # Add module node (this also upgrades an earlier import placeholder)
        G.add_node(
            entities.module_name,
            type="module",
            file=filepath
        )
        # Add class nodes and connect to module
        for cls in entities.classes:
            node_id = f"{entities.module_name}.{cls}"
            G.add_node(node_id, type="class", file=filepath)
            G.add_edge(entities.module_name, node_id, relation="contains")
        # Add function nodes and connect to module
        for func in entities.functions:
            node_id = f"{entities.module_name}.{func}"
            G.add_node(node_id, type="function", file=filepath)
            G.add_edge(entities.module_name, node_id, relation="contains")
        # Add import edges
        for imp in entities.imports:
            # Create a placeholder only for unseen names, so an existing
            # project module keeps its type="module" attributes
            if imp not in G:
                G.add_node(imp, type="import", file=None)
            G.add_edge(
                entities.module_name,
                imp,
                relation="imports"
            )
    return G
Here is what happens:
- Each Python module becomes a node with type="module".
- Classes and functions become child nodes, connected with "contains" edges.
- Import statements become edges with "imports" relationships.
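Because module nodes are named after the file's base name, two files called utils.py in different directories end up sharing one node. A possible refinement, sketched below, derives a dotted module name from the path relative to the project root. The helper module_name_from_path and the proj/ paths are hypothetical, and the sketch assumes POSIX-style separators.

```python
from pathlib import PurePosixPath


def module_name_from_path(filepath, project_root):
    """Turn a file path into a dotted module name relative to project_root."""
    relative = PurePosixPath(filepath).relative_to(project_root)
    parts = list(relative.with_suffix("").parts)
    # A package's __init__.py stands for the package itself.
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


print(module_name_from_path("proj/api/utils.py", "proj"))     # api.utils
print(module_name_from_path("proj/db/utils.py", "proj"))      # db.utils
print(module_name_from_path("proj/api/__init__.py", "proj"))  # api
```

Dotted names also line up with what import statements contain (for example "api.utils"), which makes it more likely that an import edge points at the node of the module that was actually imported.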
Step 4: Analyze the Graph
Once the graph is built, you can run various analyses. Here are some useful queries:
def analyze_graph(G):
    """Print useful statistics about the code knowledge graph."""
    print(f"Total nodes: {G.number_of_nodes()}")
    print(f"Total edges: {G.number_of_edges()}")
    print()
    # Count by type (explicit labels avoid printing "Classs")
    types = nx.get_node_attributes(G, "type")
    labels = {
        "module": "Modules",
        "class": "Classes",
        "function": "Functions",
        "import": "Imports"
    }
    for t, label in labels.items():
        count = sum(1 for v in types.values() if v == t)
        print(f"  {label}: {count}")
    print()
    # Find most connected modules (highest out-degree)
    print("Most connected modules:")
    module_nodes = [n for n, d in G.nodes(data=True)
                    if d.get("type") == "module"]
    degrees = sorted(
        [(n, G.out_degree(n)) for n in module_nodes],
        key=lambda x: x[1], reverse=True
    )
    for name, degree in degrees[:5]:
        print(f"  {name}: {degree} connections")


# Usage
G = build_graph("./my_project")
analyze_graph(G)
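Two further queries are sketched below: which modules import a given name, and what the longest chain of imports from a starting module looks like. To keep the sketch runnable on its own, it works on plain (source, target, relation) triples, which is the shape G.edges(data="relation") yields in networkx. The edge data and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges in the (source, target, relation) shape that
# G.edges(data="relation") yields for the graph built in Step 3.
edges = [
    ("app", "services", "imports"),
    ("app", "requests", "imports"),
    ("services", "models", "imports"),
    ("services", "requests", "imports"),
    ("app", "app.main", "contains"),
]


def importers_of(edges, target):
    """Return the modules that import target, sorted by name."""
    return sorted(src for src, dst, rel in edges
                  if rel == "imports" and dst == target)


def longest_import_chain(edges, start):
    """Return the longest chain of imports beginning at start.

    Modules already on the current path are skipped, so import cycles
    cannot cause infinite recursion.
    """
    graph = defaultdict(list)
    for src, dst, rel in edges:
        if rel == "imports":
            graph[src].append(dst)

    def walk(node, path):
        best = path
        for nxt in graph.get(node, []):
            if nxt not in path:
                candidate = walk(nxt, path + [nxt])
                if len(candidate) > len(best):
                    best = candidate
        return best

    return walk(start, [start])


print(importers_of(edges, "requests"))     # ['app', 'services']
print(longest_import_chain(edges, "app"))  # ['app', 'services', 'models']
```

The chain search explores every path, so it is only suitable for small to medium graphs; for large projects a dedicated graph algorithm would likely be a better fit.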
Step 5: Visualize the Graph
A knowledge graph is most useful when you can see it. Here is how to export it as an image:
import matplotlib.pyplot as plt


def visualize_graph(G, output="code_graph.png"):
    """Generate a visual representation of the code graph."""
    plt.figure(figsize=(16, 12))
    # Color nodes by type
    types = nx.get_node_attributes(G, "type")
    color_map = {
        "module": "#3498db",
        "class": "#2ecc71",
        "function": "#e74c3c",
        "import": "#95a5a6"
    }
    colors = [color_map.get(types.get(n), "#999999")
              for n in G.nodes()]
    # Layout and draw (a fixed seed makes the layout reproducible)
    pos = nx.spring_layout(G, k=0.5, iterations=50, seed=42)
    nx.draw(
        G, pos,
        node_color=colors,
        node_size=800,
        with_labels=True,
        font_size=8,
        font_weight="bold",
        arrowsize=15
    )
    plt.title("Code Knowledge Graph", fontsize=14)
    plt.savefig(output, dpi=150, bbox_inches="tight")
    plt.close()  # Release the figure once it has been written
    print(f"Graph saved to {output}")


visualize_graph(G)
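Static images become hard to read once a graph has more than a few dozen nodes. As a rough sketch of an alternative, the edges can be written out as Graphviz DOT text and rendered with an external tool. The to_dot helper below is hypothetical and hand-rolled for illustration; networkx also provides DOT writers of its own, which rely on pygraphviz or pydot being installed.

```python
def to_dot(edges, graph_name="code_graph"):
    """Render (source, target, relation) triples as Graphviz DOT text."""
    def quote(text):
        # DOT string literals use double quotes; escape embedded ones.
        return '"' + text.replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{"]
    for src, dst, rel in edges:
        lines.append(f"    {quote(src)} -> {quote(dst)} [label={quote(rel)}];")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges; with the Step 3 graph, G.edges(data="relation")
# yields triples of the same shape.
sample_edges = [
    ("app", "utils", "imports"),
    ("app", "app.main", "contains"),
]
dot_text = to_dot(sample_edges)
print(dot_text)
```

If Graphviz is installed, saving the text to code_graph.dot and running dot -Tsvg code_graph.dot -o code_graph.svg produces a scalable image.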
Comparison: What Each Visualization Tool Offers
| Tool | Best For | Difficulty | Output |
|---|---|---|---|
| Matplotlib + NetworkX | Quick static images | Easy | PNG/PDF |
| PyGraphviz | Professional layouts | Medium | DOT/PNG/SVG |
| Pyvis | Interactive HTML | Easy | HTML |
| Gephi (external) | Large-scale analysis | Hard | Interactive GUI |
Next Steps and Extensions
This is a foundation. Here are ways to make it more powerful:
- Cross-file analysis: Track function calls between modules by parsing ast.Call nodes and matching them to definitions in other files.
- Inheritance tracking: Use ast.ClassDef.bases to map class hierarchies across the entire project.
- Dependency depth: Calculate the longest import chain to identify overly coupled modules.
- Query interface: Add a simple REPL or web UI to ask questions like "Which modules import requests?" or "What calls this function?"
- Integration with AI: Feed the graph structure to an LLM to generate architectural documentation or suggest refactoring opportunities.
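As a starting point for the first two extensions, the sketch below collects base classes and called names from a single file. RelationVisitor and the sample source are hypothetical. Matching the collected names to definitions in other files is left open, since Python's dynamic dispatch means a name alone does not always identify the target.

```python
import ast


class RelationVisitor(ast.NodeVisitor):
    """Hypothetical visitor collecting base classes and called names."""

    def __init__(self):
        self.inherits = []  # (class name, base name) pairs
        self.calls = []     # names of called functions or methods

    @staticmethod
    def _name_of(expr):
        """Return a dotted name for Name/Attribute nodes, else None."""
        if isinstance(expr, ast.Name):
            return expr.id
        if isinstance(expr, ast.Attribute):
            prefix = RelationVisitor._name_of(expr.value)
            return f"{prefix}.{expr.attr}" if prefix else None
        return None

    def visit_ClassDef(self, node):
        for base in node.bases:
            base_name = self._name_of(base)
            if base_name:
                self.inherits.append((node.name, base_name))
        self.generic_visit(node)

    def visit_Call(self, node):
        called = self._name_of(node.func)
        if called:
            self.calls.append(called)
        self.generic_visit(node)


source = '''
import models

class User(models.Base):
    def save(self):
        validate(self)
        models.store(self)
'''

visitor = RelationVisitor()
visitor.visit(ast.parse(source))
print(visitor.inherits)  # [('User', 'models.Base')]
print(visitor.calls)     # ['validate', 'models.store']
```

Each pair in inherits could become an "inherits" edge and each entry in calls a candidate "calls" edge, once the names are resolved against the nodes already in the graph.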
Conclusion
Code knowledge graphs are not just a trending topic on GitHub; they are a genuinely useful way to understand and navigate complex codebases. With Python's built-in ast module and networkx, you can build a working prototype in roughly 150 lines of code.
Whether you are onboarding to a new codebase, auditing a legacy system, or just curious about how your project is structured, a knowledge graph gives you answers that grep and text search cannot.
Start small. Pick a Python project, run the script, and see what connections you discover.