AI Engineering Academy · Lesson

Indexing: Embedding and Storing Chunks

Embed each chunk using the OpenAI embeddings API and upsert the resulting vectors with metadata into a vector store, building a searchable index of your documents.

The Indexing Stage: Overview

After loading and chunking your documents, you reach the indexing stage: converting text chunks into vector embeddings and storing them in a searchable vector database. This is the final offline step before queries can be answered. Embedding quality largely determines how accurate retrieval is, while your storage and indexing strategy largely determines how fast your RAG system responds at query time.
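To make the "store and search" half of this stage concrete, here is a minimal sketch of what a vector store does. It is a toy in-memory stand-in, not a real vector database: the `InMemoryIndex` class, the chunk ids, and the three-dimensional vectors are all hypothetical, chosen only for illustration. Real stores add persistence and approximate nearest-neighbor indexes, but the upsert-then-query contract is similar.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryIndex:
    """Toy vector store (assumption: illustration only, not a real database)."""

    def __init__(self):
        self.records = {}  # chunk_id -> (vector, metadata)

    def upsert(self, chunk_id, vector, metadata):
        # Insert a new record, or overwrite the record with the same id.
        self.records[chunk_id] = (vector, metadata)

    def query(self, vector, top_k=3):
        # Brute-force search: score every stored vector, keep the best top_k.
        scored = [
            (cosine_similarity(vector, stored), chunk_id, metadata)
            for chunk_id, (stored, metadata) in self.records.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


# Hypothetical 3-dimensional vectors standing in for real embeddings
index = InMemoryIndex()
index.upsert('doc1-0', [1.0, 0.0, 0.0], {'source': 'doc1.txt', 'text': 'alpha'})
index.upsert('doc1-1', [0.0, 1.0, 0.0], {'source': 'doc1.txt', 'text': 'beta'})
index.upsert('doc2-0', [0.9, 0.1, 0.0], {'source': 'doc2.txt', 'text': 'gamma'})

results = index.query([1.0, 0.0, 0.0], top_k=2)
for score, chunk_id, metadata in results:
    print(f'{chunk_id}: {score:.3f} ({metadata["source"]})')
```

Storing the chunk text and source in the metadata means a query can return the passage itself, not just an id, which is what the generation step needs later.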

Generating Embeddings via OpenAI API

A common approach is to call OpenAI's embeddings API with your chunk text. The text-embedding-3-small model produces 1536-dimensional vectors by default and costs $0.02 per million tokens, which is cheap for most workloads: embedding 10 million tokens costs about $0.20. Send multiple texts in a single API call (up to 2048 inputs) to maximize throughput. The response contains one embedding vector per input text, in the same order as the inputs.

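from openai import OpenAI

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

def embed_batch(texts, model='text-embedding-3-small'):
    """Return one embedding vector per input text, in input order."""
    response = client.embeddings.create(
        model=model,
        input=texts  # list of strings, up to 2048 per call
    )
    return [item.embedding for item in response.data]

# Embed one batch of 100 chunk texts
# (chunks is the list of dicts with a 'text' key from the chunking step)
texts = [chunk['text'] for chunk in chunks[:100]]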
vectors = embed_batch(texts)
print(f'Embedding dimension: {len(vectors[0])}')
print(f'Embedded {len(vectors)} chunks')
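The example above embeds only the first 100 chunks. A rough sketch of extending it to a whole corpus follows: split the texts into batches no larger than the 2048-input limit, embed each batch, and concatenate the results in order. The names `batched`, `embed_all`, `estimate_cost_usd`, and the `embed_fn` parameter are hypothetical helpers introduced here, and `fake_embed` is a stand-in so the sketch runs without network access. In practice you would pass the lesson's `embed_batch` as `embed_fn`. The API also enforces token limits, which this sketch does not check.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(texts, embed_fn, batch_size=2048):
    """Embed every text, calling embed_fn once per batch and keeping order."""
    if not 1 <= batch_size <= 2048:
        raise ValueError('batch_size must be between 1 and 2048')
    vectors = []
    for batch in batched(texts, batch_size):
        batch_vectors = embed_fn(batch)
        # Guard against a response that does not line up with its inputs.
        if len(batch_vectors) != len(batch):
            raise RuntimeError('embedding count does not match input count')
        vectors.extend(batch_vectors)
    return vectors


def estimate_cost_usd(total_tokens, usd_per_million_tokens=0.02):
    """Estimate embedding cost from a token count and a per-million price."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Stand-in embedder (assumption): one fake 1-dimensional vector per text.
calls = []

def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


all_vectors = embed_all(['a', 'bb', 'ccc'], fake_embed, batch_size=2)
print(f'Embedded {len(all_vectors)} texts in {len(calls)} API-style calls')
print(f'Estimated cost for 5M tokens: ${estimate_cost_usd(5_000_000):.2f}')
```

Passing the embedding function as a parameter keeps the batching logic testable offline and makes it easy to add retries or rate limiting around the real API call later.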
