Introduction to Neo4j & GenAI

This course will get you started learning about using Neo4j with Generative AI.

You will learn about:

  • Large Language Models (LLMs)

  • Knowledge Graphs

  • How to use Neo4j for Retrieval Augmented Generation (RAG)

  • Integrating Neo4j and LLMs using Python and LangChain

Throughout this course, you will explore how to leverage the capabilities of Neo4j and Generative AI to build intelligent, context-aware systems.

In this lesson, you will learn about Generative AI and Knowledge Graphs.

Generative AI and Knowledge Graphs

Knowledge graphs are a specific implementation of a Graph Database, where information is captured and integrated from many different sources, representing the inherent knowledge of a particular domain.

They provide a structured way to represent entities, their attributes, and their relationships, allowing for a comprehensive and interconnected understanding of the information within that domain.

Knowledge graphs break down sources of information and integrate them, allowing you to see the relationships between the data.

a diagram of an abstract knowledge graph showing how sources contain chunks of data about topics which can be related to other topics

You can tailor knowledge graphs for semantic search, data retrieval, and reasoning.

You may not be familiar with the term knowledge graph, but you have probably used one. Search engines typically use knowledge graphs to provide information about people, places, and things.

The following knowledge graph could represent Neo4j:

An example of a knowledge graph of Neo4j showing the relationships between people

This integration from diverse sources gives knowledge graphs a more holistic view and facilitates complex queries, analytics, and insights.

Knowledge Graphs and Ontologies

For more on Knowledge Graphs and Ontologies, we recommend watching the Going Meta – A Series on Graphs, Semantics and Knowledge series on YouTube.

Knowledge graphs can readily adapt and evolve as they grow, taking on new information and structure changes.

Neo4j is well-suited for representing and querying complex, interconnected data in Knowledge Graphs. Unlike traditional relational databases, which use tables and rows, Neo4j uses a graph-based model with nodes and relationships.
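To make the idea of nodes and relationships concrete, here is a minimal sketch in plain Python. It is not Neo4j itself and does not use the Neo4j driver. The `Node` and `Graph` classes, the relationship types, and the example data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # the kind of thing, for example "Person" or "Company"
    name: str


@dataclass
class Graph:
    # Each relationship is stored as a (start node, type, end node) triple.
    relationships: list = field(default_factory=list)

    def relate(self, start, rel_type, end):
        self.relationships.append((start, rel_type, end))

    def neighbours(self, node, rel_type):
        """Return the nodes reached from `node` by following `rel_type`."""
        return [
            end
            for start, rtype, end in self.relationships
            if start == node and rtype == rel_type
        ]


# Hypothetical example data
alice = Node("Person", "Alice")
acme = Node("Company", "Acme")
topic = Node("Topic", "Graph Databases")

kg = Graph()
kg.relate(alice, "WORKS_AT", acme)
kg.relate(acme, "SPECIALISES_IN", topic)

employers = kg.neighbours(alice, "WORKS_AT")
print([n.name for n in employers])
```

In a table-based model, answering "where does Alice work?" typically means joining rows across tables. In this sketch, the relationship is stored directly and you follow it from one node to the next.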

Generative AI & Large Language Models

Generative AI is a class of algorithms and models that can generate new content, such as images, text, or even music. New content is generated based on user prompting, existing patterns, and examples from existing data.

Large Language Models, referred to as LLMs, learn the underlying structure and distribution of the data and can then generate new samples that resemble the original data.

LLMs are trained on vast amounts of text data to understand and generate human-like text. LLMs can answer questions, create content, and assist with various linguistic tasks by leveraging patterns learned from the data.

Instructing an LLM

The response generated by an LLM is a probabilistic continuation of the instructions it receives. The LLM provides the most likely response based on the patterns it has learned from its training data.

In simple terms, if presented with the prompt "Continue this sequence - A B C", an LLM could respond "D E F".
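The sequence example can be imitated with a toy model. The sketch below is a simple next-token counter, not how a real LLM is built: it only counts which token follows which in a tiny, made-up training text and then picks the most frequent follower. The function names `train` and `continue_sequence` and the training text are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Count which token follows each token in the training sequence."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def continue_sequence(follows, prompt, length):
    """Repeatedly append the most likely next token after the prompt."""
    output = []
    current = prompt[-1]
    for _ in range(length):
        if current not in follows:
            break  # the model has never seen this token, so it cannot continue
        current = follows[current].most_common(1)[0][0]
        output.append(current)
    return output


# Made-up training data: the alphabet fragment repeated twice
training_text = "A B C D E F G A B C D E F G".split()
model = train(training_text)

continuation = continue_sequence(model, "A B C".split(), 3)
print(" ".join(continuation))  # D E F
```

The toy model has no idea what an alphabet is. It continues the sequence only because "D" followed "C" in the text it was trained on, which is the same principle, at a vastly smaller scale, as the probabilistic continuation described above.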

To get an LLM to perform a task, you provide a prompt, a piece of text that should specify your requirements and provide clear instructions on how to respond.

A user asks an LLM the question 'What is an LLM? Give the response using simple language avoiding jargon.'

Precision in the task description, potentially combined with examples or context, ensures that the model understands the intent and produces relevant and accurate outputs.

An example prompt may be a simple question.

What is the capital of Japan?

Or, it could be more descriptive. For example:

You are a friendly travel agent
helping a customer to choose
a holiday destination. Your readers
may have English as a second
language, so use simple terms
and avoid colloquialisms.
Avoid jargon at all costs.
Tell me about the capital of Japan.

The LLM will interpret these instructions and return a response based on the patterns it has learned from its training data.
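In code, a descriptive prompt like the one above is often assembled from separate parts. The following is a minimal sketch of that idea using plain string handling. The `build_prompt` function and its parameters are hypothetical and do not belong to any particular library.

```python
def build_prompt(role, instructions, question):
    """Combine a role, a list of instructions, and a question into one prompt."""
    lines = [f"You are {role}."]
    lines.extend(instructions)
    lines.append(question)
    return "\n".join(lines)


travel_prompt = build_prompt(
    role="a friendly travel agent helping a customer to choose a holiday destination",
    instructions=[
        "Your readers may have English as a second language, "
        "so use simple terms and avoid colloquialisms.",
        "Avoid jargon at all costs.",
    ],
    question="Tell me about the capital of Japan.",
)
print(travel_prompt)
```

Keeping the role, the instructions, and the question separate makes it easier to reuse the same instructions with different questions.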

Potential Problems

While LLMs provide a lot of potential, you should also be cautious.

At their core, LLMs are highly complex predictive text machines. LLMs don’t know or understand the information they output; they simply predict the next word in a sequence.

The words are based on the patterns and relationships from other text in the training data. The sources for this training data are often the internet, books, and other publicly available text. This data could be of questionable quality and may be incorrect. Because training happens at a point in time, the model may not reflect the current state of the world, and it would not include any private information.

LLMs are fine-tuned to be as helpful as possible, even if that means occasionally generating misleading or baseless content, a phenomenon known as hallucination.

For example, when asked to "Describe the moon", an LLM may respond with "The moon is made of cheese". While this is a common saying, it is not true.

A diagram of a confused LLM with a question mark thinking about the moon and cheese.

While LLMs can represent the essence of words and phrases, they don’t possess a genuine understanding or ethical judgment of the content.

LLMs are often considered "black boxes" because their decision-making processes are difficult to decipher. An LLM is also typically unable to provide the sources for its output or explain its reasoning.

An LLM as a black box

These factors can lead to outputs that are biased, devoid of context, or lacking in logical coherence.

Fixing Hallucinations

Providing additional contextual data helps to ground the LLM’s responses and make them more accurate.

A knowledge graph is a mechanism for providing additional data to an LLM. Data within the knowledge graph can guide the LLM to provide more relevant, accurate, and reliable responses.

While the LLM uses its language skills to interpret and respond to the contextual data, it will not disregard the original training data.

You can think of the original training data as the base knowledge and linguistic capabilities, while the contextual information guides the response in specific situations.

The combination of both approaches enables the LLM to generate more meaningful responses.
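As a rough illustration of grounding, the sketch below looks up facts about an entity in a tiny, hand-written set of triples and places them in the prompt as context. It is a simplified stand-in for querying a real knowledge graph. The `FACTS` data, the `retrieve_facts` and `grounded_prompt` functions, and the prompt wording are all assumptions for illustration, and no LLM is called.

```python
# Hypothetical facts, stored as (subject, relationship, object) triples.
FACTS = [
    ("The Moon", "ORBITS", "Earth"),
    ("The Moon", "IS_A", "natural satellite"),
    ("Earth", "ORBITS", "The Sun"),
]


def retrieve_facts(facts, entity):
    """Return the triples that mention the entity as subject or object."""
    return [fact for fact in facts if entity in (fact[0], fact[2])]


def grounded_prompt(question, facts):
    """Build a prompt that asks the LLM to answer from the supplied context."""
    context = "\n".join(
        f"- {s} {r.replace('_', ' ').lower()} {o}" for s, r, o in facts
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


moon_facts = retrieve_facts(FACTS, "The Moon")
moon_prompt = grounded_prompt("Describe the moon.", moon_facts)
print(moon_prompt)
```

The resulting prompt would then be sent to an LLM in place of the bare question. The model still relies on its training for language, but the supplied facts steer it towards an answer that the data supports.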

Check Your Understanding

1. False Negatives

What is the name given to a confident, but incorrect answer provided by an LLM?

  • ❏ Day Dream

  • ✓ Hallucination

  • ❏ Illusion

  • ❏ Ungrounding

Hint

This phenomenon can occur when the LLM is unaware of the concept, either through bad data or a cut-off date in the training data.

Solution

The answer is Hallucination.

2. LLM Hallucinations

Why might an LLM generate untrue facts or nonsensical explanations?

  • ❏ LLMs are not designed to generate human-like text.

  • ✓ LLMs rely on patterns based on their training data, which may not always be accurate.

  • ❏ LLMs always provide accurate and factual information.

  • ❏ LLMs are designed to be unpredictable.

Hint

Remember that LLMs rely heavily on patterns from their training data.

Solution

LLMs rely on patterns and sometimes overfit to the data they’ve been trained on.

Lesson Summary

In this lesson, you learned about LLMs, their benefits, and challenges.

In the next lesson, you will learn about hallucination and the strategies for avoiding it.
