In this lesson, you will set up the Northwind dataset with embeddings so your agent can run both Cypher queries and semantic similarity searches. The dataset and embeddings are used in later lessons to build and test an agent, so setting them up now ensures you can run both kinds of tools.
Overview
You will complete three steps:
Create an AuraDB instance with vector-optimized configuration
Load the Northwind dataset with embeddings
Enable tool authentication so agents can query the instance
Before diving into the steps, the following sections explain what embeddings are and why they matter for agents.
What Are Embeddings?
An embedding is a numerical vector that captures the semantic meaning of text. Machine learning models like OpenAI’s text-embedding-ada-002 convert text into high-dimensional vectors, typically 1536 dimensions, where similar meanings produce vectors that are close together.
For example, "spicy sauce" and "hot condiment" are different strings, but their embeddings are similar because they share semantic meaning. This lets you search by concept rather than exact keywords.
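You can get a feel for "close together" with Neo4j's `vector.similarity.cosine` function (available in recent Neo4j 5 releases). The tiny 3-dimensional vectors below are invented purely for illustration; real embeddings have 1536 dimensions:

```cypher
// Toy 3-dimensional vectors, invented for illustration only.
// Vectors pointing in similar directions score close to 1.0;
// unrelated vectors score noticeably lower.
RETURN
  vector.similarity.cosine([0.9, 0.1, 0.2], [0.85, 0.15, 0.25]) AS similarPair,
  vector.similarity.cosine([0.9, 0.1, 0.2], [0.1, 0.9, 0.8])    AS dissimilarPair
```

A cosine similarity of 1.0 means the vectors point in exactly the same direction; values near 0 mean the meanings are unrelated.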
When Agents Can Use Embeddings
Agents do not require embeddings, but when embeddings are available, the Similarity Search tool can use them.
With embeddings in your knowledge graph, the Similarity Search tool can answer questions like "Find products similar to hot sauce" by comparing vector distances — even when no product contains those exact words.
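As a sketch of what happens under the hood, a query along these lines encodes the question with the same model used for the stored embeddings and asks the vector index for the nearest products. It assumes the `product_text_embeddings` index from this lesson and an OpenAI key stored in the `$openAIKey` parameter:

```cypher
// Encode the search phrase, then ask the vector index for the 5 nearest products.
WITH genai.vector.encode('hot sauce', 'OpenAI', {token: $openAIKey}) AS queryEmbedding
CALL db.index.vector.queryNodes('product_text_embeddings', 5, queryEmbedding)
YIELD node, score
RETURN node.productName AS product, score
ORDER BY score DESC
```

The `score` column reflects vector similarity, so products like chili sauces can rank highly even when their names never contain the words "hot sauce".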
Without embeddings, the Similarity Search tool is unavailable, but agents still work with Cypher Template and Text2Cypher tools. To learn how to create embeddings for your own data, see the Vector Indexes and Unstructured Data course or the New Cypher AI Procedures blog post.
How the Vector Index Works
A vector index organizes embeddings for fast nearest-neighbor lookup. Without an index, the database compares every embedding on every query, which does not scale.
The Northwind script creates an index called product_text_embeddings that enables sub-second similarity search.
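A vector index like the one the script creates can be defined with Cypher along these lines. The label and property names follow this lesson; the exact options used by the script may differ:

```cypher
// 1536 dimensions matches OpenAI's text-embedding-ada-002;
// cosine is the usual similarity function for text embeddings.
CREATE VECTOR INDEX product_text_embeddings IF NOT EXISTS
FOR (p:Product) ON (p.textEmbedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}
```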
Step 1: Create an AuraDB Instance
Go to the Aura Console and create a new AuraDB instance
Name it something descriptive like "Northwind with embeddings"
Select your cloud provider and region
Under Additional settings, enable Vector-optimized configuration
The vector-optimized setting ensures your instance is configured for embedding storage and similarity search.
Step 2: Load the Northwind Dataset
Run the Northwind load script against your new instance. The script:
Loads Northwind data: Product, Category, Supplier, Customer, Order, and Address nodes with their relationships
Creates Product.text and Product.textEmbedding properties using genai.vector.encode with OpenAI
Creates the product_text_embeddings vector index
Before running, replace the :param openAIKey placeholder with your OpenAI API key.
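The encoding step of the script looks roughly like the following sketch, assuming the key has been set with `:param openAIKey => '...'`. The text assembled for each product is an assumption here; the real script may combine different properties:

```cypher
// Sketch only: store a text property per product and encode it with OpenAI.
// The real script may build a richer text from several properties.
MATCH (p:Product)
WITH p, p.productName AS text
SET p.text = text,
    p.textEmbedding = genai.vector.encode(text, 'OpenAI', {token: $openAIKey})
```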
Step 3: Enable Tool Authentication
Agents need permission to query your instance. Without this step, your instance will appear unavailable when you create an agent.
Go to Aura Console → Organization → Security Settings
Enable Allow tools to connect with permissions from the user’s project role
Select your Northwind instance from the list
Check Your Understanding
Northwind Embeddings
Why create embeddings on Northwind text properties?
❏ To reduce storage size
✓ To enable semantic similarity search, for example "find products similar to spicy condiments"
❏ To encrypt sensitive data
❏ To speed up exact match queries only
Hint
Embeddings encode semantic meaning. They enable finding similar items by vector distance, not just exact text match.
Solution
To enable semantic similarity search.
Embeddings convert text into vectors that capture meaning. A vector index lets you find products, categories, or customers similar to a query, useful for an agent that answers natural language questions.
Summary
In this lesson, you learned how to create an AuraDB instance, load the Northwind dataset with embeddings, and enable tool authentication.