Neo4jGraph

You can query Neo4j from a LangChain application using the Neo4jGraph class. The Neo4jGraph class acts as the connection to the database when using other LangChain components, such as retrievers and agents.

In this lesson, you will modify the simple LangChain agent so that it can answer questions about a graph database schema.

The database contains information about movies, actors, and user ratings.

Query Neo4j

To query Neo4j you need to:

  1. Create a Neo4jGraph instance and connect to a database

  2. Run a Cypher statement to get data from the database

Open the genai-integration-langchain/neo4j_query.py file:

python
neo4j_query.py
import os
from dotenv import load_dotenv
load_dotenv()

# Create Neo4jGraph instance

# Run a query and print the result

Update the code to:

  1. Connect to the Neo4j database using your connection details:

    python
    from langchain_neo4j import Neo4jGraph
    
    # Create Neo4jGraph instance
    graph = Neo4jGraph(
        url=os.getenv("NEO4J_URI"),
        username=os.getenv("NEO4J_USERNAME"), 
        password=os.getenv("NEO4J_PASSWORD"),
    )

  2. Run a Cypher query to retrieve data from the database:

    python
    # Run a query and print the result
    result = graph.query("""
    MATCH (m:Movie {title: "Mission: Impossible"})<-[a:ACTED_IN]-(p:Person)
    RETURN p.name AS actor, a.role AS role
    """)
    
    print(result)

The results are returned as a list of dictionaries, where each dictionary represents a row in the results.

[
    {'actor': 'Henry Czerny', 'role': 'Eugene Kittridge'},
    {'actor': 'Emmanuelle Béart', 'role': 'Claire Phelps'},
    {'actor': 'Tom Cruise', 'role': 'Ethan Hunt'},
    {'actor': 'Jon Voight', 'role': 'Jim Phelps'}
]
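
If the movie title comes from user input, it may be safer to pass it as a query parameter than to write it into the Cypher string. The sketch below assumes that `graph.query` accepts a `params` dictionary. The `find_actors` helper and the `StubGraph` class are hypothetical names used only for illustration, so the example runs without a database.

```python
# Cypher with a $title parameter instead of a hard-coded title
ACTORS_QUERY = """
MATCH (m:Movie {title: $title})<-[a:ACTED_IN]-(p:Person)
RETURN p.name AS actor, a.role AS role
"""

def find_actors(graph, title):
    """Return the actor/role rows for a movie title (hypothetical helper)."""
    return graph.query(ACTORS_QUERY, params={"title": title})

class StubGraph:
    """Stand-in for Neo4jGraph so the sketch runs without a database."""
    def query(self, query, params=None):
        # Remember the parameters and return a canned row
        self.last_params = params
        return [{"actor": "Tom Cruise", "role": "Ethan Hunt"}]

stub = StubGraph()
rows = find_actors(stub, "Mission: Impossible")
print(rows)
```

In the lesson code, you would pass the real `graph` instance instead of `stub`.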

You can use data returned from queries as context for an agent.
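
As a minimal sketch of that idea, the rows can be flattened into plain text before they are placed in a prompt. The `rows_to_context` function is a hypothetical helper, not part of LangChain, and the output format is just one possible choice.

```python
def rows_to_context(rows):
    """Turn a list of row dictionaries into one line of text per row."""
    lines = []
    for row in rows:
        # Join each key/value pair, e.g. "actor: Tom Cruise, role: Ethan Hunt"
        lines.append(", ".join(f"{key}: {value}" for key, value in row.items()))
    return "\n".join(lines)

# Sample rows copied from the query result above
rows = [
    {"actor": "Henry Czerny", "role": "Eugene Kittridge"},
    {"actor": "Tom Cruise", "role": "Ethan Hunt"},
]

context_text = rows_to_context(rows)
print(context_text)
```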

Schema

You are going to modify the agent to retrieve the database schema and add it to the context.

You can view the database schema using the Cypher query:

cypher
CALL db.schema.visualization()

Open the genai-integration-langchain/schema_agent.py file.

python
schema_agent.py
import os
from dotenv import load_dotenv
load_dotenv()

from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict

# Connect to Neo4j
# graph = 

# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")

# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Define state for application
class State(TypedDict):
    question: str
    context: List[dict]
    answer: str

# Define functions for each step in the application

# Retrieve context 
def retrieve(state: State):
    context = [
        {"data": "None"}
    ]
    return {"context": context}

# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}

# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()

# Run the application
question = "What data is in the context?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])

Add the code to create a connection to the Neo4j database using the Neo4jGraph class:

python
from langchain_neo4j import Neo4jGraph

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"), 
    password=os.getenv("NEO4J_PASSWORD"),
)
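
The connection relies on three environment variables being set. If one is missing, `os.getenv` returns `None` and the connection is likely to fail with a less obvious error. The following sketch checks the settings up front. The `missing_settings` function is a hypothetical helper, and the URI in the example is a placeholder value, not the course database.

```python
import os

REQUIRED_SETTINGS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")

def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]

# Example with an explicit dictionary instead of the real environment
example_env = {
    "NEO4J_URI": "neo4j://localhost:7687",  # placeholder value
    "NEO4J_USERNAME": "neo4j",              # placeholder value
}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no arguments after `load_dotenv()` would check the real environment.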

Modify the retrieve function to use the Neo4jGraph instance to query the database and retrieve the schema information:

python
# Retrieve context 
def retrieve(state: State):
    context = graph.query("CALL db.schema.visualization()")
    return {"context": context}

The agent will add the database schema to the context and use it to answer questions about the database.
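
The raw schema output can be verbose, so you may want to condense it before adding it to the context. The sketch below is optional and rests on an assumption: that each returned row has a `nodes` list of dictionaries with a `name` key and a `relationships` list of `(start, type, end)` tuples. Print the real output of `graph.query` to confirm its shape before relying on this. The `summarize_schema` function and the sample data are hypothetical.

```python
def summarize_schema(rows):
    """Build a short text summary from schema rows (assumed shape)."""
    labels = set()
    patterns = set()
    for row in rows:
        for node in row.get("nodes", []):
            labels.add(node["name"])
        for start, rel_type, end in row.get("relationships", []):
            patterns.add(f"(:{start['name']})-[:{rel_type}]->(:{end['name']})")
    return (
        "Node labels: " + ", ".join(sorted(labels)) + "\n"
        + "Relationships:\n" + "\n".join(sorted(patterns))
    )

# Hand-written sample in the assumed shape, not real query output
sample_rows = [{
    "nodes": [{"name": "Movie"}, {"name": "Person"}, {"name": "User"}],
    "relationships": [
        ({"name": "Person"}, "ACTED_IN", {"name": "Movie"}),
        ({"name": "User"}, "RATED", {"name": "Movie"}),
    ],
}]

summary = summarize_schema(sample_rows)
print(summary)
```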

Update the question and run the application:

python
question = "How is the graph structured?"

Complete code:

python
import os
from dotenv import load_dotenv
load_dotenv()

from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict
from langchain_neo4j import Neo4jGraph

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"), 
    password=os.getenv("NEO4J_PASSWORD"),
)

# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")

# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Define state for application
class State(TypedDict):
    question: str
    context: List[dict]
    answer: str

# Define functions for each step in the application

# Retrieve context 
def retrieve(state: State):
    context = graph.query("CALL db.schema.visualization()")
    return {"context": context}

# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}

# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()

# Run the application
question = "How is the graph structured?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])

When asked "How is the graph structured?", the agent will respond with a summary of the graph, typically including details about nodes, relationships, and their properties.

Example response:

The graph is structured using nodes and relationships.

  • Nodes:

    • Movie: Has indexes on year, imdbRating, released, imdbId, title, and tagline. It has unique constraints on movieId and tmdbId.

    • User: Has an index on name and a unique constraint on userId.

    • Actor, Director, and Genre: Do not have indexes listed. The Genre node has a unique constraint on name.

    • Person: Has an index on name and a unique constraint on tmdbId.

  • Relationships:

    • ACTED_IN: Connects Actor, Director, or Person nodes with Movie nodes.

    • RATED: Connects User nodes to Movie nodes.

    • IN_GENRE: Connects Movie nodes to Genre nodes.

    • DIRECTED: Connects Person, Actor, or Director nodes to Movie nodes.

In summary, the graph represents a network where movies are connected to various entities like users, genres, actors, directors, and general persons through specific relationships indicating roles like acting, directing, rating, and categorization into genres.

Experiment by asking the agent other questions about the database schema. Here are some examples:

  • How is the graph structured?

  • How are Movie nodes connected to Person nodes?

  • What relationships are in the graph?

  • What properties do Movie nodes have?

  • How would I find user ratings for a movie?

When you are ready, continue to the next module.

Lesson Summary

In this lesson, you learned how to query Neo4j from LangChain and updated the agent to retrieve the database schema.

In the next module, you will learn how to use vectors to create RAG and GraphRAG retrievers.
