Entity Resolution

When entities are extracted from the text and created in the knowledge graph, the same real-world entity may end up as more than one node. For example, the text may refer to Neo4j in some places and Neo4j Graph Database in others.

The default entity resolution strategy in the SimpleKGPipeline is to merge entities that have the same label and an identical name property.
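To make the rule above concrete, here is a minimal plain-Python sketch of exact-match merging. It is not the library's implementation: the `merge_exact` function and the `id`, `label`, and `name` keys are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def merge_exact(entities):
    """Group entities that share the same label and an identical name.

    `entities` is a list of dicts with hypothetical keys "id", "label", "name".
    Returns one record per (label, name) pair, listing the ids it absorbed.
    """
    groups = defaultdict(list)
    for entity in entities:
        groups[(entity["label"], entity["name"])].append(entity["id"])
    return [
        {"label": label, "name": name, "merged_ids": ids}
        for (label, name), ids in groups.items()
    ]


entities = [
    {"id": 1, "label": "Technology", "name": "Neo4j"},
    {"id": 2, "label": "Technology", "name": "Neo4j"},
    {"id": 3, "label": "Technology", "name": "Neo4j Graph Database"},
]

merged = merge_exact(entities)
for record in merged:
    print(record)
```

In this sketch the two entities named Neo4j collapse into one record, while Neo4j Graph Database stays separate because its name is not identical. That remaining duplicate is the kind of case exact matching cannot catch.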

No Entity Resolution

You can disable entity resolution by setting the perform_entity_resolution parameter to False when creating the SimpleKGPipeline instance:

python
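import os

from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# llm, neo4j_driver, and embedder are assumed to have been created beforehand.
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    embedder=embedder,
    from_pdf=True,
    # Skip the default merge step so every extracted entity becomes its own node.
    perform_entity_resolution=False,
)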

Disabling entity resolution means every identified entity is created as a new node.

This may lead to multiple nodes representing the same real-world entity.

Post Processing Entity Resolution

The neo4j_graphrag library includes additional entity resolver components. The entity resolvers are used after the creation of the knowledge graph to identify and merge duplicate entities.

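For example, a resolver might merge the Neo4j and Neo4j Graph Database nodes described above into a single node.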

Post-processing can produce a more concise knowledge graph with fewer duplicate entities, but it carries the risk of incorrectly merging distinct entities.
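The trade-off can be illustrated with a small, self-contained sketch. It does not use the library's resolvers; it substitutes simple string similarity from Python's standard difflib module, and the `similarity` and `find_merge_candidates` helpers, the sample names, and the threshold values are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_merge_candidates(names, threshold):
    """Return every pair of names whose similarity meets the threshold."""
    candidates = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if similarity(first, second) >= threshold:
                candidates.append((first, second))
    return candidates


names = ["Neo4j", "Neo4j Graph Database", "Java", "JavaScript"]

# A strict threshold proposes few merges; a loose one proposes more.
strict = find_merge_candidates(names, threshold=0.9)
loose = find_merge_candidates(names, threshold=0.35)

print("strict:", strict)
print("loose:", loose)
```

With these sample names, the strict threshold proposes no merges, so the Neo4j duplicate survives. The loose threshold pairs Neo4j with Neo4j Graph Database, but it also pairs Java with JavaScript, which are distinct entities. Any similarity-based resolver has to balance these two failure modes.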

Refer to the Entity Resolver documentation for more information about the resolvers and how to use them.

When you’re ready, you can continue.

Lesson Summary

In this lesson, you learned about entity resolution strategies.

In the next lesson, you will learn how to use and configure different LLMs.
