Part 2: Embeddings and Vector Storage with LangChain#
Welcome to the second article in our 4-part series on LangChain Document Intelligence.
In Part 1, we explored how to load and split documents into smaller, manageable chunks. Now that we have those chunks, the next step is to convert them into embeddings — numeric vectors that encode meaning — and store them in a database built for fast retrieval.
What Are Embeddings?#
Embeddings give AI models a way to understand the meaning behind our words. Instead of working with plain text, we convert each chunk into a vector of numbers that captures its semantic meaning.
Two chunks with similar ideas — even if the words are different — will produce similar embeddings. This allows us to find relevant information based on meaning, not just matching keywords.
Real-World Analogy#
Imagine you ask:
“What AWS services has Khalid used?”
Your resume might say:
- “Built pipelines using Lambda, S3, and DynamoDB.”
- “Designed cloud apps with AWS services.”
If the system only matched keywords, it might miss the second example. But embeddings help connect both answers by understanding the context — even if the wording is different.
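To make that concrete, here is a minimal sketch of the idea, jumping ahead to the OpenAI embedding model we set up in Step 2. It embeds the question and the two hypothetical resume lines from the analogy, then compares them with cosine similarity (it assumes the langchain-openai package is installed and an OpenAI API key is configured):
import numpy as np
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

query = "What AWS services has Khalid used?"
sentences = [
    "Built pipelines using Lambda, S3, and DynamoDB.",
    "Designed cloud apps with AWS services.",
]

# Embed the question and the candidate sentences
query_vec = np.array(embedding_model.embed_query(query))
sentence_vecs = [np.array(v) for v in embedding_model.embed_documents(sentences)]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Both sentences should score reasonably close to the question,
# even though only the first shares keywords with it
for sentence, vec in zip(sentences, sentence_vecs):
    print(f"{cosine_similarity(query_vec, vec):.3f}  {sentence}")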
Overview of the Process#
Here’s what we’ll cover:
- Generate embeddings from document chunks using OpenAI
- Store those embeddings in a vector database (FAISS)
- Perform similarity search to find relevant chunks for a given question
- Format queries with templates and send them to a chat model
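If you're coding along, the examples in this part assume roughly these packages are installed (exact names and versions may differ in your environment):
pip install langchain langchain-community langchain-openai faiss-cpu pypdf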
Step 1: Load and Split the Document#
This part should look familiar from Part 1:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF, then split it into overlapping chunks of roughly 1,000 characters
loader = PyPDFLoader("data/the_adventure_of_the_blue_carbuncle.pdf")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_docs = splitter.split_documents(docs)
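Before embedding anything, it can help to sanity-check the split. A quick, optional look at the chunks (the output depends on your PDF):
# How many chunks did the splitter produce, and what does the first one contain?
print(f"{len(split_docs)} chunks")
print(split_docs[0].page_content[:200])
print(split_docs[0].metadata)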
Step 2: Initialize the OpenAI Embedding Model#
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings()
This uses your existing OpenAI API key and gives you access to models like text-embedding-3-small.
You can customize the model like this:
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small",  # which embedding model to use
    dimensions=1536,                 # size of the output vectors
    chunk_size=1000                  # how many texts to send per embedding request
)
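By default, the client reads your key from the OPENAI_API_KEY environment variable. One way to supply it if it isn't already set (a minimal sketch):
import getpass
import os

# OpenAIEmbeddings (and ChatOpenAI later on) look for OPENAI_API_KEY by default
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")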
Step 3: Generate Embeddings#
# Embed the first chunk and peek at the first few values of its vector
text = split_docs[0].page_content
embedding_vector = embedding_model.embed_query(text)
print(embedding_vector[:5])
This returns a list of numbers like:
[0.0105, -0.0001, 0.0052, -0.0246, -0.0126]
That’s your document chunk in vector form — ready for storage and comparison.
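embed_query embeds a single string. For a batch of chunks, embed_documents returns one vector per input, which is also what the vector store relies on under the hood in the next step:
# Embed several chunks at once; the result is one vector per input string
texts = [doc.page_content for doc in split_docs[:5]]
vectors = embedding_model.embed_documents(texts)

print(len(vectors))     # 5 vectors
print(len(vectors[0]))  # vector dimensionality, e.g. 1536 for text-embedding-3-small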
Step 4: Store Embeddings in a Vector Database (FAISS)#
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(split_docs, embedding_model)
FAISS organizes all your vectors into a searchable index. Behind the scenes, it:
- Converts all text chunks into embeddings
- Builds a structure to retrieve the closest matches fast
- Keeps the original chunk and metadata tied to the vector
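The index lives in memory by default. If you want to reuse it across runs, the LangChain FAISS wrapper can save and reload it from disk (a sketch; recent versions require the allow_dangerous_deserialization flag because loading relies on pickle):
# Persist the index locally, then load it back with the same embedding model
vectorstore.save_local("faiss_index")

restored = FAISS.load_local(
    "faiss_index",
    embedding_model,
    allow_dangerous_deserialization=True,  # loading uses pickle; only load indexes you created
)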
Step 5: Search for Relevant Chunks (Similarity Search)#
Let’s say a user asks:
query = "What was the main clue?"
results = vectorstore.similarity_search(query, k=3)
for doc in results:
    print(doc.page_content[:300], "\n")
FAISS will:
- Convert the query into an embedding
- Search for the most similar document vectors
- Return the top k matching chunks
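If you also want to see how close each match is, similarity_search_with_score returns a distance alongside each document (with FAISS's default L2 index, lower means closer):
# Same search, but each result comes back with a distance score
results_with_scores = vectorstore.similarity_search_with_score(query, k=3)

for doc, score in results_with_scores:
    print(f"score={score:.4f}  {doc.page_content[:100]}")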
Step 6: Add Context with Prompt Templates#
Now we wrap the retrieved context and the question into a formatted prompt:
from langchain.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(
    "Answer the following question based on the provided context.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
# Join the retrieved chunks into one context block, then fill in the template
context = "\n\n".join([doc.page_content for doc in results])
query = "What was the main clue?"
prompt = prompt_template.format(context=context, question=query)
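It's worth printing the result once to confirm the retrieved chunks landed where you expect. If you prefer structured chat messages over a single string, ChatPromptTemplate also offers format_messages, and either form can be passed to the chat model in the next step:
# Inspect the final prompt, or build it as a list of chat messages instead
print(prompt[:500])

messages = prompt_template.format_messages(context=context, question=query)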
Step 7: Ask the Chat Model#
from langchain_openai import ChatOpenAI
chat = ChatOpenAI()  # uses the default chat model; pass model="..." to pick a specific one
response = chat.invoke(prompt)
print(f"Q: {query}")
print(f"A: {response.content}")
Now you’re using retrieved context to answer a user’s question — the foundation of Retrieval-Augmented Generation (RAG).
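A common refinement is to wrap the vector store as a retriever, so the search-then-format pattern above becomes a single reusable component (a minimal sketch; retriever.invoke works in recent LangChain versions):
# A retriever runs the same similarity search behind a uniform interface
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

retrieved_docs = retriever.invoke("What was the main clue?")
context = "\n\n".join(doc.page_content for doc in retrieved_docs)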
What’s Next?#
You’ve now:
- Converted documents into semantic vectors
- Stored them in a vector DB
- Queried those vectors based on meaning
- Connected the result to a chat model
In Part 3, we’ll explore different vector databases (like Chroma, pgvector, Redis, and Pinecone) and compare them on performance, scalability, and production-readiness.