
Embeddings and Vector Storage with LangChain (Part 2)


Part 2: Embeddings and Vector Storage with LangChain

Welcome to the second article in our 4-part series on LangChain Document Intelligence.

In Part 1, we explored how to load and split documents into smaller, manageable chunks. Now that we have those chunks, the next step is to convert them into embeddings — numeric vectors that encode meaning — and store them in a database built for fast retrieval.


What Are Embeddings?

Embeddings are how we give AI models a way to understand the meaning behind our words. Instead of using plain text, we convert each chunk into a vector of numbers that captures its semantic meaning.

Two chunks with similar ideas — even if the words are different — will produce similar embeddings. This allows us to find relevant information based on meaning, not just matching keywords.


Real-World Analogy

Imagine you ask:

“What AWS services has Khalid used?”

Your resume might say:

  • “Built pipelines using Lambda, S3, and DynamoDB.”
  • “Designed cloud apps with AWS services.”

If the system only matched keywords, it might miss the first example, which never mentions the word “AWS.” Embeddings help connect both answers by understanding the context — even if the wording is different.
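
To make that concrete, here is a small sketch (not part of the article's walkthrough) that embeds the query and both resume lines with the OpenAI embedding model used later in this article, then compares them with cosine similarity:

from langchain_openai import OpenAIEmbeddings
import numpy as np

embedding_model = OpenAIEmbeddings()

query = "What AWS services has Khalid used?"
sentences = [
    "Built pipelines using Lambda, S3, and DynamoDB.",
    "Designed cloud apps with AWS services.",
]

# Embed the query and both resume lines, then score each line against the query.
query_vec = np.array(embedding_model.embed_query(query))
sentence_vecs = [np.array(v) for v in embedding_model.embed_documents(sentences)]

for sentence, vec in zip(sentences, sentence_vecs):
    score = np.dot(query_vec, vec) / (np.linalg.norm(query_vec) * np.linalg.norm(vec))
    print(f"{score:.3f}  {sentence}")

# Both lines score well against the query, even though only one contains "AWS".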


Overview of the Process

Here’s what we’ll cover:

  1. Generate embeddings from document chunks using OpenAI
  2. Store those embeddings in a vector database (FAISS)
  3. Perform similarity search to find relevant chunks for a given question
  4. Format queries with templates and send them to a chat model

Step 1: Load and Split the Document

This part should look familiar from Part 1:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("data/the_adventure_of_the_blue_carbuncle.pdf")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_docs = splitter.split_documents(docs)
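
As a quick sanity check before moving on (a convenience, not part of the original walkthrough), you can inspect how many chunks the splitter produced:

print(len(split_docs), "chunks")          # total number of chunks
print(split_docs[0].page_content[:200])   # preview of the first chunk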

Step 2: Initialize the OpenAI Embedding Model

from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings()

This uses your existing OpenAI API key and gives you access to models like text-embedding-3-small.
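
OpenAIEmbeddings reads the key from the OPENAI_API_KEY environment variable, so nothing sensitive needs to appear in code. If you are experimenting locally, a minimal setup sketch looks like this (the key value is a placeholder):

import os

# Prefer exporting the key in your shell; this is only for quick local experiments.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # placeholder, not a real key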

You can customize the model like this:

embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-small",  # which OpenAI embedding model to use
    dimensions=1536,                 # length of the returned vectors
    chunk_size=1000                  # batch size per API request (not the text-splitter chunk size)
)

Step 3: Generate Embeddings

text = split_docs[0].page_content
embedding_vector = embedding_model.embed_query(text)
print(embedding_vector[:5])

This returns a list of numbers like:

[0.0105, -0.0001, 0.0052, -0.0246, -0.0126]

That’s your document chunk in vector form — ready for storage and comparison.
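
The example above embeds a single string with embed_query. When you want vectors for many chunks at once, the embedding model also provides embed_documents, which returns one vector per input string. A short sketch:

# Embed several chunks in one call; one vector comes back per input text.
chunk_texts = [doc.page_content for doc in split_docs[:3]]
vectors = embedding_model.embed_documents(chunk_texts)
print(len(vectors), len(vectors[0]))  # number of vectors, dimensions per vector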


Step 4: Store Embeddings in a Vector Database (FAISS)

from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(split_docs, embedding_model)

FAISS organizes all your vectors into a searchable index. Behind the scenes, it:

  • Converts all text chunks into embeddings
  • Builds a structure to retrieve the closest matches fast
  • Keeps the original chunk and metadata tied to the vector
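
A practical note not covered above: the FAISS store can be saved to disk and reloaded, so you do not have to re-embed the document on every run. A sketch, assuming the langchain_community FAISS wrapper and an arbitrary folder name:

# Persist the index (and its chunks and metadata) to a local folder.
vectorstore.save_local("faiss_index")

# Reload it later with the same embedding model. Recent langchain_community
# versions require opting in to pickle deserialization for your own indexes.
vectorstore = FAISS.load_local(
    "faiss_index",
    embedding_model,
    allow_dangerous_deserialization=True,
)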

Step 5: Search for Relevant Chunks (Similarity Search)

Let’s say a user asks:

query = "What was the main clue?"
results = vectorstore.similarity_search(query, k=3)
for doc in results:
    print(doc.page_content[:300], "\n")

FAISS will:

  • Convert the query into an embedding
  • Search for the most similar document vectors
  • Return the top k matching chunks
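
If you also want to see how close each match is, the FAISS store exposes similarity_search_with_score. With the default index, lower scores mean closer matches:

# Return (document, distance) pairs instead of documents alone.
results_with_scores = vectorstore.similarity_search_with_score(query, k=3)
for doc, score in results_with_scores:
    print(round(score, 4), doc.page_content[:100])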

Step 6: Add Context with Prompt Templates

Now we wrap the retrieved context and the question into a formatted prompt:

from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(
    "Answer the following question based on the provided context.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
context = "\n\n".join([doc.page_content for doc in results])
query = "What was the main clue?"
prompt = prompt_template.format(context=context, question=query)
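
One variation worth knowing: ChatPromptTemplate can also produce a list of chat messages instead of a single string via format_messages, and chat models accept that list directly. A brief sketch:

# Build message objects instead of one formatted string.
messages = prompt_template.format_messages(context=context, question=query)
# chat.invoke(messages) works the same way as chat.invoke(prompt) in the next step.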

Step 7: Ask the Chat Model

from langchain_openai import ChatOpenAI

chat = ChatOpenAI()
response = chat.invoke(prompt)

print(f"Q: {query}")
print(f"A: {response.content}")

Now you’re using retrieved context to answer a user’s question — the foundation of Retrieval-Augmented Generation (RAG).
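
As a preview of how these steps are usually wired together, the vector store can be wrapped as a retriever so the search happens automatically for each question. This is a minimal sketch under the same setup as above, not a pattern the article has introduced yet:

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

def answer(question: str) -> str:
    # Retrieve relevant chunks, build the prompt, and ask the chat model.
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = prompt_template.format(context=context, question=question)
    return chat.invoke(prompt).content

print(answer("What was the main clue?"))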


What’s Next?

You’ve now:

  • Converted documents into semantic vectors
  • Stored them in a vector DB
  • Queried those vectors based on meaning
  • Connected the result to a chat model

In Part 3, we’ll explore different vector databases (like Chroma, pgvector, Redis, and Pinecone) and compare them on performance, scalability, and production-readiness.
