
Loading Diverse Document Types

Explore LangChain's document loaders for PDFs, web pages, databases, and more, extracting content for your RAG system.

Document Loaders: The First Step

Welcome to the world of LangChain! Before an LLM can answer questions about your data, it needs to 'read' it. This is where Document Loaders come in.

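Document loaders ingest data from sources such as text files, PDFs, web pages, and databases. Whatever the source, they convert the raw data into a standardized format: Document objects, each holding the extracted text (page_content) together with descriptive metadata such as where the text came from.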
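
To make the idea of a standardized format concrete, here is a minimal sketch in plain Python. SketchDocument is a hypothetical stand-in written for illustration, not LangChain's own class; it only mirrors the two attributes used later in this lesson, page_content and metadata. The helpers from_text and from_db_row are likewise assumptions, meant to show how two very different sources can end up in the same shape.

```python
from dataclasses import dataclass, field


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a Document: text plus descriptive metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def from_text(text: str, source: str) -> SketchDocument:
    # Plain text maps directly onto page_content.
    return SketchDocument(page_content=text, metadata={"source": source})


def from_db_row(row: dict, table: str) -> SketchDocument:
    # A database row is flattened into "key: value" lines of text.
    text = "\n".join(f"{key}: {value}" for key, value in row.items())
    return SketchDocument(page_content=text, metadata={"source": table})


docs = [
    from_text("Hello from a text file.", source="notes.txt"),
    from_db_row({"id": 1, "title": "Intro"}, table="lessons"),
]

# Both documents expose the same two attributes, regardless of origin.
for doc in docs:
    print(doc.metadata["source"], "->", repr(doc.page_content))
```

Because every source is reduced to the same two fields, later steps in a pipeline can treat all documents alike.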

Loading Simple Text Files

The simplest source to load is a plain text file, and LangChain's TextLoader handles it. It reads the file's content and wraps it in a Document object, recording the file path in the metadata under the 'source' key.

Try running this example to see how it works:

import os
from langchain_community.document_loaders import TextLoader

# Create a dummy text file for demonstration
file_content = "Hello CoddyKit learners!\nThis is a sample text file."
file_path = "sample.txt"
with open(file_path, "w", encoding="utf-8") as f:
    f.write(file_content)

# Initialize the TextLoader with the file path and an explicit encoding
loader = TextLoader(file_path, encoding="utf-8")

# Load the documents
documents = loader.load()

# Print the content of the first document
if documents:
    print(f"Loaded content:\n{documents[0].page_content}")
    print(f"Source: {documents[0].metadata.get('source')}")

# Clean up the dummy file
os.remove(file_path)
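
Conceptually, loading a text file comes down to two steps: read the text, then record where it came from. The sketch below imitates that using only the standard library. load_text_file is a hypothetical helper, and it returns plain dictionaries rather than real Document objects, so treat it as an illustration of the idea and not as how TextLoader is implemented.

```python
import os
import tempfile


def load_text_file(path: str, encoding: str = "utf-8") -> list[dict]:
    """Hypothetical helper: read one file and wrap it in a document-like dict."""
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
    # One file becomes one document; the path is recorded as its source.
    return [{"page_content": text, "metadata": {"source": path}}]


# Write a temporary file, load it, then let the directory clean itself up.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("Hello CoddyKit learners!\nThis is a sample text file.")

    loaded = load_text_file(sample_path)

print(f"Documents loaded: {len(loaded)}")
print(f"Content:\n{loaded[0]['page_content']}")
```

Returning a list even for a single file keeps the sketch's return type uniform, which matches how the example above indexes into documents[0].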
