Why Code Knowledge Graphs Are Taking Over in 2025

If you have been following GitHub trending repositories lately, you have probably noticed something: code knowledge graphs are having a moment. Projects like Understand-Anything (27,000+ stars) and codegraph (22,000+ stars) are dominating the charts. They turn raw source code into interactive, queryable knowledge graphs that help developers find their way around complex codebases.

But you do not need a fancy startup to build one. In this tutorial, you will learn how to build a basic code knowledge graph from scratch using Python. By the end, you will have a working tool that maps relationships between classes, functions, and imports in any Python project.

What Is a Code Knowledge Graph?

A code knowledge graph is a directed graph where:

  • Nodes represent code entities — classes, functions, modules, variables.
  • Edges represent relationships — calls, imports, inherits, references.

Think of it as a map of your codebase. Instead of reading files one by one, you can see how everything connects at a glance. This is especially powerful for onboarding to new projects, debugging cross-module issues, or understanding architecture at scale.
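
To make the node and edge vocabulary concrete, here is a minimal sketch of such a graph built by hand with networkx. The entity names (app, app.Server, and so on) and the related helper are invented for illustration; they do not come from a real project.

```python
import networkx as nx

# A tiny hand-built graph; every entity name here is hypothetical.
G = nx.DiGraph()
G.add_node("app", type="module")
G.add_node("app.Server", type="class")
G.add_node("app.main", type="function")
G.add_node("http", type="import")

# Edges carry the relationship kind as an attribute.
G.add_edge("app", "app.Server", relation="contains")
G.add_edge("app", "app.main", relation="contains")
G.add_edge("app", "http", relation="imports")
G.add_edge("app.main", "app.Server", relation="references")

def related(graph, node, relation):
    """Return the targets of edges leaving `node` with the given relation."""
    return sorted(
        target
        for _, target, data in graph.out_edges(node, data=True)
        if data.get("relation") == relation
    )

print(related(G, "app", "contains"))
print(related(G, "app", "imports"))
```

Storing the relationship as an edge attribute keeps everything in one graph, so a query is just a filter over edges.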

Prerequisites

  • Python 3.9 or higher installed
  • Basic understanding of Python syntax and imports
  • A terminal or code editor you are comfortable with

Install the required packages:

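pip install networkx matplotlib

We use networkx for graph operations, matplotlib for drawing the graph in Step 5, and the built-in ast module (no install needed) for parsing Python source code. pygraphviz is only needed if you want Graphviz layouts (see the comparison table near the end), and it requires the Graphviz system libraries to be installed first.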

Step 1: Parse Python Files with the AST Module

Python ships with a powerful ast (Abstract Syntax Tree) module. It can parse any Python file into a tree structure you can programmatically walk.

import ast
import os

def get_python_files(directory):
    """Recursively find .py files, skipping dunder files such as __init__.py."""
    py_files = []
    for root, _, files in os.walk(directory):
        for f in files:
            if f.endswith(".py") and not f.startswith("__"):
                py_files.append(os.path.join(root, f))
    return py_files

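def parse_file(filepath):
    """Parse a Python file and return its AST, or None if it cannot be parsed."""
    try:
        # Read as UTF-8 explicitly so the result does not depend on the platform default.
        with open(filepath, "r", encoding="utf-8") as f:
            source = f.read()
        return ast.parse(source, filename=filepath)
    except (SyntaxError, ValueError):
        # SyntaxError: invalid Python. ValueError: undecodable bytes
        # (UnicodeDecodeError is a subclass) or null bytes in the source.
        return None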

The parse_file function reads a Python file and converts it into an AST. If the file has syntax errors, it returns None so we can skip it gracefully.
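
If you have not worked with the ast module before, it helps to look at what a parsed tree contains. The sketch below parses a small source string (the Greeter class is a made-up example) and lists the definitions and imports it finds, with their line numbers.

```python
import ast

# A made-up snippet to parse; the backslash avoids a leading blank line.
source = """\
import os

class Greeter:
    def greet(self, name):
        return "Hello, " + name
"""

tree = ast.parse(source)

def describe(node):
    """Return (line number, node kind, name) for a definition or import."""
    if isinstance(node, ast.Import):
        return (node.lineno, "Import", node.names[0].name)
    return (node.lineno, type(node).__name__, node.name)

# ast.walk visits every node in the tree; we keep only three node kinds
# and sort by line number so the output follows the source order.
found = sorted(
    describe(node)
    for node in ast.walk(tree)
    if isinstance(node, (ast.Import, ast.ClassDef, ast.FunctionDef))
)

for line, kind, name in found:
    print(f"line {line}: {kind} {name}")
```

ast.walk is handy for quick inspection like this. The next step uses ast.NodeVisitor instead, which makes it easier to attach different behaviour to each node type.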

Step 2: Extract Code Entities from the AST

Now we walk the AST and extract the important pieces: class names, function names, and import statements.

class CodeVisitor(ast.NodeVisitor):
    def __init__(self, filepath):
        self.filepath = filepath
        self.module_name = os.path.splitext(
            os.path.basename(filepath)
        )[0]
        self.classes = []
        self.functions = []
        self.imports = []

    def visit_ClassDef(self, node):
        self.classes.append(node.name)
        self.generic_visit(node)

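    def visit_FunctionDef(self, node):
        # Methods and nested functions land here too, recorded by bare name.
        self.functions.append(node.name)
        self.generic_visit(node)

    # Treat "async def" the same way as "def".
    visit_AsyncFunctionDef = visit_FunctionDef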

    def visit_Import(self, node):
        for alias in node.names:
            self.imports.append(alias.name)
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        # node.module is None for "from . import x"; such imports are skipped.
        if node.module:
            self.imports.append(node.module)
        self.generic_visit(node)

def extract_entities(filepath):
    """Extract classes, functions, and imports from a Python file."""
    tree = parse_file(filepath)
    if tree is None:
        return None
    visitor = CodeVisitor(filepath)
    visitor.visit(tree)
    return visitor

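This visitor walks every node in the AST. When it encounters a class definition, function definition, or import statement, it records it, and the generic_visit call keeps the walk going into the node's children. One consequence is that methods and nested functions end up in the same functions list as top-level functions, under their bare names. That is good enough for a first prototype, but worth keeping in mind when you read the graph.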
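
The visitor above does not record where a function was defined. If you want to tell methods apart from plain functions, one option is to keep a stack of enclosing scopes while walking. The sketch below is one possible way to do that; ScopedVisitor and its attribute names are hypothetical and not part of the tutorial's CodeVisitor.

```python
import ast

class ScopedVisitor(ast.NodeVisitor):
    """Record qualified names and separate methods from other functions."""

    def __init__(self):
        self.scope = []      # stack of (name, kind) for enclosing definitions
        self.functions = []  # top-level and nested functions
        self.methods = []    # functions defined directly inside a class body

    def _qualified(self, name):
        return ".".join([entry[0] for entry in self.scope] + [name])

    def _descend(self, node, kind):
        # Push this definition, visit its children, then pop it again.
        self.scope.append((node.name, kind))
        self.generic_visit(node)
        self.scope.pop()

    def visit_ClassDef(self, node):
        self._descend(node, "class")

    def visit_FunctionDef(self, node):
        inside_class = bool(self.scope) and self.scope[-1][1] == "class"
        target = self.methods if inside_class else self.functions
        target.append(self._qualified(node.name))
        self._descend(node, "function")

    # Handle "async def" exactly like "def".
    visit_AsyncFunctionDef = visit_FunctionDef

# Made-up sample source for demonstration.
sample = """\
def helper():
    pass

class Parser:
    def parse(self):
        def inner():
            pass

    async def fetch(self):
        pass
"""

visitor = ScopedVisitor()
visitor.visit(ast.parse(sample))
print("functions:", visitor.functions)
print("methods:", visitor.methods)
```

Qualified names such as Parser.parse also avoid collisions when two classes in the same file define a method with the same name.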

Step 3: Build the Knowledge Graph

With entities extracted, we now build the graph using networkx. Each entity becomes a node, and relationships become edges.

import networkx as nx

def build_graph(directory):
    """Build a code knowledge graph from a directory of Python files."""
    G = nx.DiGraph()
    files = get_python_files(directory)

    for filepath in files:
        entities = extract_entities(filepath)
        if not entities:
            continue

        # Add module node
        G.add_node(
            entities.module_name,
            type="module",
            file=filepath
        )

        # Add class nodes and connect to module
        for cls in entities.classes:
            node_id = f"{entities.module_name}.{cls}"
            G.add_node(node_id, type="class", file=filepath)
            G.add_edge(entities.module_name, node_id, relation="contains")

        # Add function nodes and connect to module
        for func in entities.functions:
            node_id = f"{entities.module_name}.{func}"
            G.add_node(node_id, type="function", file=filepath)
            G.add_edge(entities.module_name, node_id, relation="contains")

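        # Add import edges
        for imp in entities.imports:
            # Calling add_node on an existing node updates its attributes,
            # which would relabel a local module as an "import". Only create
            # the node if nothing with this name is in the graph yet.
            if imp not in G:
                G.add_node(imp, type="import", file=None)
            G.add_edge(
                entities.module_name,
                imp,
                relation="imports"
            )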

    return G

Here is what happens:

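  1. Each Python module becomes a node with type="module", named after its file. Two files with the same name in different packages therefore share one node.
  2. Classes and functions (including methods) become child nodes of the module, connected with "contains" edges.
  3. Each imported module becomes a node, linked from the importing module by an "imports" edge.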
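
Because imports are stored as edges pointing at the imported name, reverse lookups come almost for free. The sketch below uses a small hand-built graph with the same node and edge attributes as build_graph; the module names and the two helper functions are hypothetical.

```python
import networkx as nx

# Hand-built stand-in for a graph produced by a builder like the one above.
G = nx.DiGraph()
for module in ("api", "cli", "storage"):
    G.add_node(module, type="module")
G.add_node("requests", type="import")
G.add_node("api.fetch", type="function")
G.add_edge("api", "api.fetch", relation="contains")
G.add_edge("api", "requests", relation="imports")
G.add_edge("cli", "requests", relation="imports")
G.add_edge("cli", "api", relation="imports")

def importers_of(graph, name):
    """Modules that have an 'imports' edge pointing at `name`."""
    return sorted(
        source
        for source, _, data in graph.in_edges(name, data=True)
        if data.get("relation") == "imports"
    )

def internal_imports(graph):
    """Import edges whose target is itself a module in the graph."""
    return sorted(
        (source, target)
        for source, target, data in graph.edges(data=True)
        if data.get("relation") == "imports"
        and graph.nodes[target].get("type") == "module"
    )

print(importers_of(G, "requests"))
print(internal_imports(G))
```

Separating internal imports from third-party ones is often the first filter you want, since internal edges describe your own architecture.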

Step 4: Analyze the Graph

Once the graph is built, you can run various analyses. Here are some useful queries:

def analyze_graph(G):
    """Print useful statistics about the code knowledge graph."""
    print(f"Total nodes: {G.number_of_nodes()}")
    print(f"Total edges: {G.number_of_edges()}")
    print()

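    # Count by type (explicit labels avoid printing "Classs")
    types = nx.get_node_attributes(G, "type")
    labels = {"module": "Modules", "class": "Classes",
              "function": "Functions", "import": "Imports"}
    for t, label in labels.items():
        count = sum(1 for v in types.values() if v == t)
        print(f"  {label}: {count}")
    print()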

    # Find most connected modules (highest out-degree)
    print("Most connected modules:")
    module_nodes = [n for n, d in G.nodes(data=True)
                    if d.get("type") == "module"]
    degrees = sorted(
        [(n, G.out_degree(n)) for n in module_nodes],
        key=lambda x: x[1], reverse=True
    )
    for name, degree in degrees[:5]:
        print(f"  {name}: {degree} connections")

# Usage
G = build_graph("./my_project")
analyze_graph(G)
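
Two further questions are easy to ask once the import edges are isolated: are there circular imports, and how long is the longest import chain? The following is a rough sketch under the assumption that edges carry a relation attribute as in build_graph; import_subgraph and import_report are hypothetical helper names, and the sample graphs are made up.

```python
import networkx as nx

def import_subgraph(graph):
    """Return a copy of the graph that keeps only 'imports' edges."""
    edges = [
        (u, v)
        for u, v, data in graph.edges(data=True)
        if data.get("relation") == "imports"
    ]
    return graph.edge_subgraph(edges).copy()

def import_report(graph):
    """Report import cycles, or the longest import chain if there are none."""
    imports = import_subgraph(graph)
    if nx.is_directed_acyclic_graph(imports):
        # The longest path is only well defined when there are no cycles.
        return {"cycles": [], "longest_chain": nx.dag_longest_path(imports)}
    cycles = sorted(sorted(cycle) for cycle in nx.simple_cycles(imports))
    return {"cycles": cycles, "longest_chain": None}

# Made-up example 1: a -> b -> c, plus a shortcut a -> c.
acyclic = nx.DiGraph()
acyclic.add_edge("a", "b", relation="imports")
acyclic.add_edge("b", "c", relation="imports")
acyclic.add_edge("a", "c", relation="imports")
acyclic.add_edge("a", "a.run", relation="contains")  # ignored by the report

# Made-up example 2: x and y import each other.
cyclic = nx.DiGraph()
cyclic.add_edge("x", "y", relation="imports")
cyclic.add_edge("y", "x", relation="imports")

print(import_report(acyclic))
print(import_report(cyclic))
```

On a real project, simple_cycles can return many overlapping cycles, so you may want to cap or group the output.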

Step 5: Visualize the Graph

A knowledge graph is most useful when you can see it. Here is how to export it as an image:

import matplotlib.pyplot as plt

def visualize_graph(G, output="code_graph.png"):
    """Generate a visual representation of the code graph."""
    plt.figure(figsize=(16, 12))

    # Color nodes by type
    types = nx.get_node_attributes(G, "type")
    color_map = {
        "module": "#3498db",
        "class": "#2ecc71",
        "function": "#e74c3c",
        "import": "#95a5a6"
    }
    colors = [color_map.get(types.get(n), "#999999")
              for n in G.nodes()]

    # Layout and draw
    # A fixed seed makes the layout reproducible between runs.
    pos = nx.spring_layout(G, k=0.5, iterations=50, seed=42)
    nx.draw(
        G, pos,
        node_color=colors,
        node_size=800,
        with_labels=True,
        font_size=8,
        font_weight="bold",
        arrowsize=15
    )

    plt.title("Code Knowledge Graph", fontsize=14)
    plt.savefig(output, dpi=150, bbox_inches="tight")
    plt.close()  # release the figure so repeated calls do not accumulate memory
    print(f"Graph saved to {output}")

visualize_graph(G)

Comparison: What Each Visualization Tool Offers

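Tool                  | Best For             | Difficulty | Output
----------------------|----------------------|------------|----------------
Matplotlib + NetworkX | Quick static images  | Easy       | PNG/PDF
PyGraphviz            | Professional layouts | Medium     | DOT/PNG/SVG
Pyvis                 | Interactive HTML     | Easy       | HTML
Gephi (external)      | Large-scale analysis | Hard       | Interactive GUI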
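
For the Gephi route, networkx can write GEXF files, a format Gephi opens directly. The sketch below is one possible export helper (export_for_gephi is a hypothetical name). The GEXF writer only accepts a limited set of attribute value types, and as far as we know None is not among them, so the helper swaps None for an empty string on a copy of the graph. Treat that substitution as an assumption you may want to change.

```python
import os
import tempfile

import networkx as nx

def export_for_gephi(graph, path):
    """Write `graph` to a GEXF file, replacing None attribute values."""
    cleaned = graph.copy()  # copy so the caller's graph is left untouched
    for _, data in cleaned.nodes(data=True):
        for key, value in list(data.items()):
            if value is None:
                data[key] = ""  # assumption: empty string stands in for "no value"
    nx.write_gexf(cleaned, path)
    return path

# Made-up two-node graph using the same attributes as the tutorial's graph.
G = nx.DiGraph()
G.add_node("app", type="module", file="app.py")
G.add_node("json", type="import", file=None)
G.add_edge("app", "json", relation="imports")

with tempfile.TemporaryDirectory() as tmp:
    path = export_for_gephi(G, os.path.join(tmp, "code_graph.gexf"))
    file_written = os.path.exists(path)
    reloaded = nx.read_gexf(path)  # read it back as a sanity check

print("written:", file_written)
print("nodes after round trip:", sorted(reloaded.nodes))
```

In practice you would write to a permanent path instead of a temporary directory and open the file from Gephi's File menu.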

Next Steps and Extensions

This is a foundation. Here are ways to make it more powerful:

  • Cross-file analysis: Track function calls between modules by parsing ast.Call nodes and matching them to definitions in other files.
  • Inheritance tracking: Use ast.ClassDef.bases to map class hierarchies across the entire project.
  • Dependency depth: Calculate the longest import chain to identify overly coupled modules.
  • Query interface: Add a simple REPL or web UI to ask questions like "Which modules import requests?" or "What calls this function?"
  • Integration with AI: Feed the graph structure to an LLM to generate architectural documentation or suggest refactoring opportunities.
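
As a starting point for the first two extensions, the sketch below collects inheritance and call information from a single source string. It only records what the source text says (for example "self.decode"); resolving those names to definitions in other files is the hard part and is not attempted here. RelationVisitor and the sample code are hypothetical, and ast.unparse requires Python 3.9 or newer.

```python
import ast

class RelationVisitor(ast.NodeVisitor):
    """Collect (class, base) and (caller, callee) pairs as source text."""

    def __init__(self):
        self.inherits = []           # (class name, base expression)
        self.calls = []              # (enclosing function, called expression)
        self._function = "<module>"  # name of the function being visited

    def visit_ClassDef(self, node):
        for base in node.bases:
            self.inherits.append((node.name, ast.unparse(base)))
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        # Remember the enclosing function while visiting its body.
        previous, self._function = self._function, node.name
        self.generic_visit(node)
        self._function = previous

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Call(self, node):
        self.calls.append((self._function, ast.unparse(node.func)))
        self.generic_visit(node)  # also catch calls nested in arguments

# Made-up sample source for demonstration.
sample = """\
import json

class Base:
    pass

class Loader(Base, json.JSONDecoder):
    def load(self, text):
        return self.decode(text.strip())

def main():
    Loader().load("{}")
"""

visitor = RelationVisitor()
visitor.visit(ast.parse(sample))
print(visitor.inherits)
print(visitor.calls)
```

Each pair maps naturally onto an "inherits" or "calls" edge once you decide how to match the recorded text to node ids in your graph.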

Conclusion

Code knowledge graphs are not just a trending topic on GitHub; they are a genuinely useful way to understand and navigate complex codebases. With Python's built-in ast module and networkx, you can build a working prototype in under 200 lines of code.

Whether you are onboarding to a new codebase, auditing a legacy system, or just curious about how your project is structured, a knowledge graph gives you answers that grep and text search cannot.

Start small. Pick a Python project, run the script, and see what connections you discover.