RAG Pipeline

You can use a retriever as part of a RAG (Retrieval-Augmented Generation) pipeline to provide context to an LLM.

In this lesson, you will use the vector retriever you created to pass additional context to an LLM, allowing it to generate more accurate and relevant responses.

Open the genai-fundamentals/vector_rag.py file and review the program:

vector_rag.py

```python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorRetriever

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create embedder
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")

# Create retriever
retriever = VectorRetriever(
    driver,
    index_name="moviePlots",
    embedder=embedder,
    return_properties=["title", "plot"],
)

# Create the LLM

# Create GraphRAG pipeline

# Search

# Close the database connection
driver.close()
```

The program includes the code to connect to Neo4j and create the vector retriever.

You will add the code to:

  1. Create and configure the LLM

  2. Create the GraphRAG pipeline to use the vector retriever

  3. Submit a query to the RAG pipeline

  4. Parse the results

LLM

You will need an LLM to generate the response based on the user's query and the context provided by the vector retriever.

Create the LLM using the OpenAILLM class from the neo4j_graphrag package:

```python
from neo4j_graphrag.llm import OpenAILLM

# Create the LLM
llm = OpenAILLM(model_name="gpt-4o")
```

The LLM is configured to use gpt-4o. You can change the configuration to another OpenAI model by changing the model_name parameter.

You can also control the randomness of the generated response by providing a temperature value in model_params. A value of 0 will make responses more deterministic, while a value closer to 1 will make responses more random.

```python
# Modify the LLM configuration if needed
llm = OpenAILLM(
    model_name="gpt-3.5-turbo", 
    model_params={"temperature": 1}
)
```

The neo4j-graphrag package supports multiple LLM providers and also lets you implement your own LLM interface.
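
For example, to wrap a different provider you could subclass the LLMInterface base class from neo4j_graphrag.llm. The sketch below is a minimal illustration; the MyLLM class and its echoed response are hypothetical placeholders, not part of the library:

```python
from neo4j_graphrag.llm import LLMInterface, LLMResponse

# Hypothetical custom LLM - MyLLM and its echoed response are placeholders;
# a real implementation would call your model of choice here
class MyLLM(LLMInterface):
    def invoke(self, input: str) -> LLMResponse:
        # Call your own model and wrap its text in an LLMResponse
        return LLMResponse(content=f"Echo: {input}")

    async def ainvoke(self, input: str) -> LLMResponse:
        return self.invoke(input)
```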

GraphRAG pipeline

The GraphRAG class allows you to create a RAG pipeline that combines a retriever and an LLM.

Create the GraphRAG pipeline using the retriever and the llm you created:

```python
from neo4j_graphrag.generation import GraphRAG

# Create GraphRAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)
```

The GraphRAG pipeline will:

  1. Use the retriever to find relevant context based on the user’s query.

  2. Pass the user’s query and the retrieved context to the LLM, as sketched below.
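
Conceptually, the pipeline is doing something like the following. This is an illustrative sketch of the two steps, not the library's internal code; in particular, the prompt wording here is an assumption, not the library's actual template:

```python
# Illustrative sketch of the two pipeline steps (not the library's actual code)

# 1. Retrieve relevant context for the query
retriever_result = retriever.search(
    query_text="Find me movies about toys coming alive",
    top_k=5
)
context = "\n".join(item.content for item in retriever_result.items)

# 2. Pass the query and the retrieved context to the LLM
# (this prompt wording is an assumption, not the library's actual template)
prompt = (
    f"Answer the question using only this context:\n{context}\n\n"
    "Question: Find me movies about toys coming alive"
)
print(llm.invoke(prompt).content)
```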

You can use the search method to submit a query.

```python
# Search
query_text = "Find me movies about toys coming alive"

response = rag.search(
    query_text=query_text, 
    retriever_config={"top_k": 5}
)

print(response.answer)
```

The search method takes the user’s query and returns the generated response from the LLM. You can also pass additional configuration to the retriever using retriever_config, such as the number of results to return (top_k).

The complete code:

```python
import os
from dotenv import load_dotenv
load_dotenv()

from neo4j import GraphDatabase
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorRetriever
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.generation import GraphRAG

# Connect to Neo4j database
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"), 
    auth=(
        os.getenv("NEO4J_USERNAME"), 
        os.getenv("NEO4J_PASSWORD")
    )
)

# Create embedder
embedder = OpenAIEmbeddings(model="text-embedding-ada-002")

# Create retriever
retriever = VectorRetriever(
    driver,
    index_name="moviePlots",
    embedder=embedder,
    return_properties=["title", "plot"],
)

# Create the LLM
llm = OpenAILLM(model_name="gpt-4o")


# Create GraphRAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

# Search
query_text = "Find me movies about toys coming alive"

response = rag.search(
    query_text=query_text, 
    retriever_config={"top_k": 5}
)

print(response.answer)


# Close the database connection
driver.close()
```

Run the program to search for similar movies based on a plot.

Return context

You can also return the context that was used to generate the response. This can be useful in understanding how the LLM generated the response.

Add the return_context=True parameter to the search method:

```python
# Search
query_text = "Find me movies about toys coming alive"

response = rag.search(
    query_text=query_text, 
    retriever_config={"top_k": 5},
    return_context=True
)

print(response.answer)
print("CONTEXT:", response.retriever_result.items)

The retriever_result is returned along with the generated response.
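
For example, you can print each retrieved context item on its own line. This assumes each item in retriever_result.items exposes a content attribute, as the vector retriever's results do:

```python
# Print each context item the LLM received
for item in response.retriever_result.items:
    print(item.content)
```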

Experiment with different queries and see how the LLM generates responses based on the context provided by the vector retriever.

Randomness

Responses from the LLM will not be consistent and may vary each time you run the program. Experiment with the temperature parameter in the LLM configuration to see how it affects the randomness of the generated response.
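
For example, setting the temperature to 0 should make the answers more consistent between runs:

```python
# A temperature of 0 makes responses more deterministic
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={"temperature": 0}
)
```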

Check Your Understanding

Context

True or False: Using the GraphRAG pipeline, the LLM generates a response without using any context from the retriever.

  • ❏ True

  • ✓ False

Hint

The GraphRAG pipeline relies on the retriever to provide relevant context, which is then used by the LLM to generate a more accurate response.

Solution

The statement is False. The GraphRAG pipeline always passes the retrieved context to the LLM to generate a response.

Lesson Summary

In this lesson, you learned how to create a GraphRAG pipeline using a vector retriever and an LLM.

In the next lesson, you will build a more advanced retriever that combines vector search with graph traversal and relationships to enhance information retrieval.