Full-Text Search

Although semantic search very commonly used for matching user queries to text in a database it has its limitations.

As a reminder, in semantic search we match pieces of text based on their semantic similarity i.e. how similar they are in meaning. This means two pieces of text like "The child is playing with a toy" and "The kid is having fun with a plaything" would be considered semantically similar because they both represent the same thing even though they have different wordings.

Semantic search works by converting search queries into embeddings using an embedding model, then finding the most similar text in a database by comparing these query vector representations to those of all texts in the database.

Semantic search is very useful when matching user queries to text in a database because it allows users to ask their questions in a conversational style and still get good results, for example "What is the name of the actor in the movie 'The Matrix'?", "Who starred in the movie 'The Matrix'?", etc. will be matched to similar pieces of text in the database. However, semantic search has its limitations.

In particular when searching for domain-specific terms that lack broad semantic meaning or have different meanings in a wider context, semantic search may fail to retrieve relevant information or may return irrelevant information. This occurs because these terms might not be well-represented in the training data of the embedding model used for semantic search. For example semantically search may fail to return anything useful if you ask for an unusual name of a character or place in a movie, as it would have no way of knowing what this term means outside of its context. "Which movie is set on Tatooine?" only has meaning within the Star Wars universe and won’t return anything by semantic search if "Tatooine" wasn’t present in the training data of the embedding model.

Additionally, semantic search is also not reliable when a user query includes specific strings, such as names or dates, that need to be matched exactly for accurate results. As an example let’s use the query "Oscar winners 2019". In this case, you’re looking for specific information: the exact list of films or individuals who won Oscars in 2019. Semantic search might interpret the query broadly and return conceptually related results, such as notable films or discussions about the Oscars in general, but potentially from different years or focusing on nominations instead of winners. This would make the results less accurate for such a fact-specific query.

Full-text is an alternative method of searching text which can overcome these issues. Rather than matching pieces of text by their semantic meaning full-text search will match them based on exact wording, including variations in spelling. For example, with the query "Oscar winners 2019", a full-text search would return results containing "2019", thus avoiding returning irrelevant information from other years. Unlike semantic search embeddings models are not necessary for full-text search.

In many circumstances it can be useful to combine semantic and full-text searches. Hybrid search refers to a range of search methods that combine semantic search and full-text search. Typically these methods involve running a semantic search and a full-text search in parallel, then combining the results from both. These results are then ranked in some way and the best ones are returned. We will see in later lessons how we can make use of hybrid search in our RAG applications.

Continue

When you are ready, you can move on to the next task.

Summary

You have learned about full-text search in Neo4j.

Next, you will learn how to create a full-text index on a node property.