When the entities are identified in the text and subsequently created in the knowledge graph, they may not be unique. For example, the text may refer to Neo4j in some places and Neo4j Graph Database in others.
The default entity resolution strategy in the SimpleKGPipeline is to merge entities that have the same label and an identical name property.
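The effect of this default strategy can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the merge_exact function and the dictionary shape used for entities are hypothetical, chosen only to show that entities sharing both label and name collapse into one node, while a different name or a different label keeps them apart.

```python
def merge_exact(entities):
    """Merge entities that share the same label and the same name (illustrative only)."""
    merged = {}
    for entity in entities:
        key = (entity["label"], entity["name"])
        # The first entity seen for a key creates the node;
        # later duplicates only contribute properties the node lacks.
        node = merged.setdefault(key, {})
        for prop, value in entity.items():
            node.setdefault(prop, value)
    return list(merged.values())


entities = [
    {"label": "Technology", "name": "Neo4j"},
    {"label": "Technology", "name": "Neo4j", "category": "database"},
    {"label": "Technology", "name": "Neo4j Graph Database"},
    {"label": "Company", "name": "Neo4j"},
]

resolved = merge_exact(entities)
for node in resolved:
    print(node)
```

In this sketch the two Technology entities named Neo4j become a single node, but Neo4j Graph Database stays separate because its name is not identical. This is the kind of duplicate that exact matching cannot catch.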
No Entity Resolution
You can disable entity resolution by setting the perform_entity_resolution parameter to False when creating the SimpleKGPipeline instance:
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=neo4j_driver,
    neo4j_database=os.getenv("NEO4J_DATABASE"),
    embedder=embedder,
    from_pdf=True,
    perform_entity_resolution=False,
)

Disabling entity resolution will result in all identified entities being created as new nodes.
Post Processing Entity Resolution
The neo4j_graphrag library includes additional entity resolver components. The entity resolvers are used after the creation of the knowledge graph to identify and merge duplicate entities.
For example:
- The SpaCySemanticMatchResolver uses the spaCy library to find and resolve entities with the same label and a similar set of textual properties.
- The FuzzyMatchResolver finds and resolves entities with the same label and a similar set of textual properties, using RapidFuzz for fuzzy matching.
Post processing of entities can result in a more concise knowledge graph with fewer duplicate entities but with the risk of incorrectly merging distinct entities.
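The trade-off between fewer duplicates and incorrect merges can be sketched with a similarity threshold. The example below is an assumption-laden illustration, not the resolvers' actual code: it uses difflib.SequenceMatcher from the Python standard library as a stand-in for RapidFuzz, and the group_similar helper and sample names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_similar(names, threshold):
    """Greedily group names whose similarity to a group's first member meets the threshold."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


# All names are assumed to belong to entities with the same label.
names = ["Neo4j Graph Database", "Neo4j Graph Databases", "Neo4j 4", "Neo4j 5"]

loose = group_similar(names, threshold=0.8)
strict = group_similar(names, threshold=0.9)
print(loose)
print(strict)
```

With the looser threshold, the sketch merges the two spellings of the database name, which is desirable, but it also merges Neo4j 4 and Neo4j 5, which are distinct versions. The stricter threshold keeps the versions apart while still merging the near-identical spellings.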
Refer to the Entity Resolver documentation for more information about the resolvers and how to use them.
Lesson Summary
In this lesson, you learned about entity resolution strategies.
In the next lesson, you will learn how to use and configure different LLMs.