Part 2: Embeddings and Vector Storage with LangChain#
Welcome to the second article in our 4-part series on LangChain Document Intelligence.
In Part 1, we explored how to load and split documents into smaller, manageable chunks. Now that we have those chunks, the next step is to convert them into embeddings — numeric vectors that encode meaning — and store them in a database built for fast retrieval.
What Are Embeddings?#
Embeddings give AI models a way to understand the meaning behind our words. Instead of working with plain text, we convert each chunk into a vector of numbers that captures its semantic meaning.
Two chunks with similar ideas — even if the words are different — will produce similar embeddings. This allows us to find relevant information based on meaning, not just matching keywords.
Real-World Analogy#
Imagine you ask:
“What AWS services has Khalid used?”
Your resume might say:
- “Built pipelines using Lambda, S3, and DynamoDB.”
- “Designed cloud apps with AWS services.”
If the system only matched keywords, it might miss the second example. But embeddings help connect both answers by understanding the context — even if the wording is different.
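To make that concrete, here is a minimal sketch of the idea, jumping ahead to the OpenAI embedding model we set up in Step 2. It embeds the question and the two hypothetical resume lines from the analogy, then compares them with cosine similarity (it assumes the langchain-openai package is installed and an OpenAI API key is configured):
import numpy as np
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")

query = "What AWS services has Khalid used?"
sentences = [
    "Built pipelines using Lambda, S3, and DynamoDB.",
    "Designed cloud apps with AWS services.",
]

# Embed the question and the candidate sentences
query_vec = np.array(embedding_model.embed_query(query))
sentence_vecs = [np.array(v) for v in embedding_model.embed_documents(sentences)]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Both sentences should score reasonably close to the question,
# even though only the first shares keywords with it
for sentence, vec in zip(sentences, sentence_vecs):
    print(f"{cosine_similarity(query_vec, vec):.3f}  {sentence}")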
Overview of the Process#
Here’s what we’ll cover:
- Generate embeddings from document chunks using OpenAI
- Store those embeddings in a vector database (FAISS)
- Perform similarity search to find relevant chunks for a given question
- Format queries with templates and send them to a chat model
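If you're coding along, the examples in this part assume roughly these packages are installed (exact names and versions may differ in your environment):
pip install langchain langchain-community langchain-openai faiss-cpu pypdf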
Step 1: Load and Split the Document#
This part should look familiar from Part 1:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF, then split it into overlapping chunks of roughly 1,000 characters
loader = PyPDFLoader("data/the_adventure_of_the_blue_carbuncle.pdf")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_docs = splitter.split_documents(docs)
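Before embedding anything, it can help to sanity-check the split. A quick, optional look at the chunks (the output depends on your PDF):
# How many chunks did the splitter produce, and what does the first one contain?
print(f"{len(split_docs)} chunks")
print(split_docs[0].page_content[:200])
print(split_docs[0].metadata)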
Step 2: Initialize the OpenAI Embedding Model#
from langchain_openai import OpenAIEmbeddings
embedding_model = OpenAIEmbeddings()
This uses your existing OpenAI API key and gives you access to models like text-embedding-3-small.
You can customize the model like this:
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small",  # which embedding model to use
    dimensions=1536,                 # size of the output vectors
    chunk_size=1000                  # how many texts to send per embedding request
)
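By default, the client reads your key from the OPENAI_API_KEY environment variable. One way to supply it if it isn't already set (a minimal sketch):
import getpass
import os

# OpenAIEmbeddings (and ChatOpenAI later on) look for OPENAI_API_KEY by default
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")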
Step 3: Generate Embeddings#
# Embed the first chunk and peek at the first few values of its vector
text = split_docs[0].page_content
embedding_vector = embedding_model.embed_query(text)
print(embedding_vector[:5])
This returns a list of numbers like:
[0.0105, -0.0001, 0.0052, -0.0246, -0.0126]
That’s your document chunk in vector form — ready for storage and comparison.
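embed_query embeds a single string. For a batch of chunks, embed_documents returns one vector per input, which is also what the vector store relies on under the hood in the next step:
# Embed several chunks at once; the result is one vector per input string
texts = [doc.page_content for doc in split_docs[:5]]
vectors = embedding_model.embed_documents(texts)

print(len(vectors))     # 5 vectors
print(len(vectors[0]))  # vector dimensionality, e.g. 1536 for text-embedding-3-small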
Step 4: Store Embeddings in a Vector Database (FAISS)#
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(split_docs, embedding_model)
FAISS organizes all your vectors into a searchable index. Behind the scenes, it:
- Converts all text chunks into embeddings
- Builds a structure to retrieve the closest matches fast
- Keeps the original chunk and metadata tied to the vector
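The index lives in memory by default. If you want to reuse it across runs, the LangChain FAISS wrapper can save and reload it from disk (a sketch; recent versions require the allow_dangerous_deserialization flag because loading relies on pickle):
# Persist the index locally, then load it back with the same embedding model
vectorstore.save_local("faiss_index")

restored = FAISS.load_local(
    "faiss_index",
    embedding_model,
    allow_dangerous_deserialization=True,  # loading uses pickle; only load indexes you created
)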
Step 5: Search for Relevant Chunks (Similarity Search)#
Let’s say a user asks:
query = "What was the main clue?"
results = vectorstore.similarity_search(query, k=3)
for doc in results:
    print(doc.page_content[:300], "\n")
FAISS will:
- Convert the query into an embedding
- Search for the most similar document vectors
- Return the top k matching chunks
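If you also want to see how close each match is, similarity_search_with_score returns a distance alongside each document (with FAISS's default L2 index, lower means closer):
# Same search, but each result comes back with a distance score
results_with_scores = vectorstore.similarity_search_with_score(query, k=3)

for doc, score in results_with_scores:
    print(f"score={score:.4f}  {doc.page_content[:100]}")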
Step 6: Add Context with Prompt Templates#
Now we wrap the retrieved context and the question into a formatted prompt:
from langchain.prompts import ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(
    "Answer the following question based on the provided context.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
# Join the retrieved chunks into one context block, then fill in the template
context = "\n\n".join([doc.page_content for doc in results])
query = "What was the main clue?"
prompt = prompt_template.format(context=context, question=query)
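It's worth printing the result once to confirm the retrieved chunks landed where you expect. If you prefer structured chat messages over a single string, ChatPromptTemplate also offers format_messages, and either form can be passed to the chat model in the next step:
# Inspect the final prompt, or build it as a list of chat messages instead
print(prompt[:500])

messages = prompt_template.format_messages(context=context, question=query)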
Step 7: Ask the Chat Model#
from langchain_openai import ChatOpenAI
chat = ChatOpenAI()  # uses the default chat model; pass model="..." to pick a specific one
response = chat.invoke(prompt)
print(f"Q: {query}")
print(f"A: {response.content}")
Now you’re using retrieved context to answer a user’s question — the foundation of Retrieval-Augmented Generation (RAG).
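A common refinement is to wrap the vector store as a retriever, so the search-then-format pattern above becomes a single reusable component (a minimal sketch; retriever.invoke works in recent LangChain versions):
# A retriever runs the same similarity search behind a uniform interface
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

retrieved_docs = retriever.invoke("What was the main clue?")
context = "\n\n".join(doc.page_content for doc in retrieved_docs)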
What’s Next?#
You’ve now:
- Converted documents into semantic vectors
- Stored them in a vector DB
- Queried those vectors based on meaning
- Connected the result to a chat model
In Part 3, we’ll explore different vector databases (like Chroma, pgvector, Redis, and Pinecone) and compare them on performance, scalability, and production-readiness.