Import data

Importing data for your recommendation engine

In the previous module, you learned how to create and connect to your Aura database instance.

In this lesson, you will learn how to:

Use the Import tool to load movie data into your Aura instance
Create a data model that supports recommendation queries (nodes and relationships)
Run an import and verify your movie dataset is ready for recommendations

Using the Import tool

The Import tool provides a visual interface for loading CSV data into your Neo4j instance.

Instead of writing complex import scripts, you use the Import tool to visually map your CSV data to graph nodes (Movie, Person) and relationships (ACTED_IN). Because you design the model before loading anything, the graph is shaped for recommendation queries from the start.

Behind the scenes, the Import tool takes your CSV and turns it into a graph: it inspects the file structure, then you decide how columns map to node properties (e.g. movieId and title → Movie) and how rows link up as relationships (e.g. Person ACTED_IN Movie). From that, it generates and runs the Cypher that loads the data and checks that nodes and relationships were created correctly.

The diagram shows the complete import process from source files to your Neo4j database.

[Figure: Import process diagram showing the steps from CSV to Neo4j]

Step 1: Prepare your movie dataset

Your dataset needs to describe movies, actors, and the relationships between them. The CSV file stores this information in tabular form, which you'll transform into a graph. Proper data preparation ensures a smooth import and an effective graph model.

How to prepare your data before importing:

Download the sample movie data: movies.csv
Save it to your local machine (e.g., Downloads folder)
Open the file to preview its structure—you’ll see columns like movieId, title, personId, name, and characters

Data preparation checklist:

Before importing, verify your CSV data:

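Unique identifiers exist: Ensure each movie has a unique movieId and each person has a unique personId. Because each row is one role, the same ID can appear on many rows, but it must always refer to the same movie or person. IDs that are missing or reused for different entities lead to merged or duplicate nodes, or to import errors.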
Data types are consistent: Check that movieId and personId are consistently formatted (all numbers or all strings). Mixed types can cause mapping issues.
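Missing values are handled: Identify any empty cells. Decide whether to skip rows with missing data or use default values. For recommendations, a missing personId or movieId means the ACTED_IN relationship for that row cannot be created, and a missing name leaves a Person node you cannot find by name.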
Special characters are properly encoded: Ensure that values containing quotes, commas, or newlines are enclosed in double quotes with embedded quotes escaped, or use a different delimiter.
Column headers are clear: Verify column names are descriptive and don’t contain spaces or special characters (use movieId not Movie ID).
Relationships are identifiable: Confirm which columns connect entities (e.g., personId and movieId together indicate an ACTED_IN relationship).
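
The checklist above can be partly automated before you upload anything. The following is a minimal sketch using only Python's standard library. It assumes the column names from this lesson and all-digit IDs; the function name check_csv and the inline sample are illustrative and are not part of the Import tool.

```python
import csv
import io
from collections import defaultdict

# Inline sample copied from the lesson's example rows; the real movies.csv may differ.
SAMPLE_CSV = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""


def check_csv(text, id_columns=("movieId", "personId")):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))

    # Column headers are clear: no spaces or special characters.
    for header in reader.fieldnames or []:
        if not header.isidentifier():
            problems.append(f"header {header!r} has spaces or special characters")

    titles_by_movie = defaultdict(set)
    for row in reader:
        line_no = reader.line_num  # physical line of the row just read
        for column, value in row.items():
            if column is None:
                problems.append(f"line {line_no}: more values than headers")
            elif value is None or not value.strip():
                # Missing values are handled: report every empty cell.
                problems.append(f"line {line_no}: empty value in {column!r}")
        for column in id_columns:
            value = row.get(column) or ""
            # Data types are consistent: this sketch expects all-digit IDs.
            if value and not value.isdigit():
                problems.append(f"line {line_no}: {column} {value!r} is not numeric")
        if row.get("movieId"):
            titles_by_movie[row["movieId"]].add(row.get("title") or "")

    # Unique identifiers: one movieId must not point at two different titles.
    for movie_id, titles in titles_by_movie.items():
        if len(titles) > 1:
            problems.append(f"movieId {movie_id} has several titles: {sorted(titles)}")
    return problems


print(check_csv(SAMPLE_CSV))  # prints [] when no problems are found
```

To check your own file, read it with open("movies.csv", newline="", encoding="utf-8") and pass its contents to check_csv. Adjust the all-digit rule if your IDs are strings.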

What’s in the dataset: This CSV contains information about movies and the actors who appeared in them. Each row represents an actor’s role in a movie, which we’ll model as a relationship in the graph. This structure enables recommendation queries like "Find all movies with Tom Hanks" or "Find actors who worked together."

Example data structure:

movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"

This structure shows that both Keanu Reeves and Laurence Fishburne acted in The Matrix, creating two ACTED_IN relationships in your graph.
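
To make the row-to-graph idea concrete, here is a small sketch that reads the two example rows and separates them into unique movies, unique people, and one ACTED_IN entry per row. It only illustrates the modeling idea with plain Python dictionaries; it is not how the Import tool is implemented, and rows_to_graph is a hypothetical helper name.

```python
import csv
import io

# The example rows shown above.
SAMPLE_CSV = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""


def rows_to_graph(text):
    """Split role rows into movie nodes, person nodes, and ACTED_IN relationships."""
    movies = {}    # movieId -> node properties (repeated IDs overwrite, so no duplicates)
    people = {}    # personId -> node properties
    acted_in = []  # one (personId, movieId, characters) entry per row
    for row in csv.DictReader(io.StringIO(text)):
        movies[row["movieId"]] = {"title": row["title"]}
        people[row["personId"]] = {"name": row["name"]}
        acted_in.append((row["personId"], row["movieId"], row["characters"]))
    return movies, people, acted_in


movies, people, acted_in = rows_to_graph(SAMPLE_CSV)
print(len(movies), len(people), len(acted_in))  # 1 2 2
```

Keying the dictionaries by ID is what collapses the two Matrix rows into a single movie while keeping both relationships.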

Step 2: Add your data source to Aura

Follow these steps to add your data source to Aura:

In the Aura Console, navigate to your instance
Click on Import in the left sidebar
Click the New data source button
Select CSV as the data source type (since your movie data is in CSV format)
Click Upload CSV to open the file dialog
Select the movies.csv file from your local machine

CSV files are easy to work with and commonly used for data imports. The Import tool reads the CSV structure and helps you map it to graph nodes and relationships.

Step 3: Review your data structure

Once the file is uploaded, you’ll see the Import tool interface showing your CSV structure.

The Import tool displays your CSV columns (movieId, title, personId, name, characters) and sample data rows. This preview helps you understand what data you’re working with before creating your graph model.
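
If you want the same kind of preview outside the Import tool, for example before uploading a large file, a few lines of Python are enough. This sketch assumes a comma-delimited file with a header row; preview is a hypothetical helper name.

```python
import csv
import io
from itertools import islice

# The lesson's example rows; substitute the contents of your own file.
SAMPLE_CSV = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""


def preview(text, sample_size=5):
    """Return the column names and up to sample_size rows as dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(islice(reader, sample_size))  # stop early instead of reading everything
    return reader.fieldnames, rows


columns, rows = preview(SAMPLE_CSV)
print(columns)
for row in rows:
    print(row)
```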

Step 4: Create your data model

The data model defines how your CSV data becomes a graph. You need:

Movie nodes - Each movie becomes a node to query
Person nodes - Each actor becomes a node to traverse from
ACTED_IN relationships - These connections enable recommendation queries like "Find movies with the same actors"

Click Create model manually to start building your graph structure.

Step 5: Define Movie nodes

Movies are central to your data model. Each Movie node has properties (title, movieId) to use in queries like "Find movies similar to The Matrix."

Follow these steps to add the Movie node label:

Click the Add node label button (or the + icon)
In the details panel on the right, set the label to Movie
Click Map from table to connect CSV columns to node properties
Map movieId → This becomes the unique identifier for each Movie node
Map title → This becomes a property to search and display

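When you map movieId and title, the Import tool generates Cypher that behaves like the following. Because movieId is the unique identifier, repeated rows for the same movie match the existing node instead of creating a duplicate:

MERGE (m:Movie {movieId: '123'})
SET m.title = 'The Matrix'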

This creates Movie nodes that your recommendation queries can traverse.
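
If you ever need to load the same data from a script rather than the Import tool, one common pattern is a single parameterized statement fed with a list of rows. The sketch below only builds the query text and its parameters; it does not connect to a database. The statement is an assumption about a reasonable hand-written equivalent, not the exact Cypher the Import tool produces, and movie_parameters is a hypothetical helper.

```python
import csv
import io

# The lesson's example rows.
SAMPLE_CSV = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""

# MERGE on the key property, then SET the remaining properties.
MOVIE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (m:Movie {movieId: row.movieId}) "
    "SET m.title = row.title"
)


def movie_parameters(text):
    """Build the $rows parameter: one entry per distinct movieId."""
    unique = {}
    for row in csv.DictReader(io.StringIO(text)):
        unique[row["movieId"]] = {"movieId": row["movieId"], "title": row["title"]}
    return {"rows": list(unique.values())}


params = movie_parameters(SAMPLE_CSV)
print(MOVIE_QUERY)
print(params)
```

Deduplicating in Python is optional, since MERGE would also match the existing node, but it keeps the batch small when a movie appears on many rows.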

After adding the label, you can edit the model structure to refine how your CSV data maps to graph elements.

Step 6: Define Person nodes

Actors are the connections between movies in your data model. When you query "Find movies with Tom Hanks," you’re traversing from a Person node through ACTED_IN relationships to Movie nodes.

Follow these steps to add the Person node label:

Click the Add node label button again to create a second node type
Set the label to Person
Click Map from table
Map personId → Unique identifier for each Person node
Map name → Property to search (e.g., "Tom Hanks")

Example for recommendations:

Once imported, you’ll be able to query for all movies Tom Hanks acted in:

MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
RETURN m.title

This finds all movies Tom Hanks acted in—the foundation of actor-based recommendations.
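
To see what that traversal does step by step, the sketch below performs the same lookup over the raw CSV rows in plain Python. It uses the lesson's example rows, so it searches for Keanu Reeves rather than Tom Hanks; movies_for_actor is a hypothetical helper, and a real application would run the Cypher query against the database instead.

```python
import csv
import io

# The lesson's example rows.
SAMPLE_CSV = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""


def movies_for_actor(text, actor_name):
    """Titles of movies the named actor appears in, sorted alphabetically."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Step 1: find the people with this name, like MATCH (p:Person {name: ...}).
    person_ids = {row["personId"] for row in rows if row["name"] == actor_name}
    # Step 2: follow each ACTED_IN row from those people to a movie title.
    return sorted({row["title"] for row in rows if row["personId"] in person_ids})


print(movies_for_actor(SAMPLE_CSV, "Keanu Reeves"))  # ['The Matrix']
```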

Optional: Edit property types by clicking the pencil icon next to each property. For example, you might want to ensure personId is stored as an integer for better query performance.

Step 7: Define ACTED_IN relationships

Relationships are the core of your graph. The ACTED_IN relationship connects Person nodes to Movie nodes, enabling queries like:

"Find all movies with the same actors" (traverse from Movie through ACTED_IN to Person, then back to other Movies)
"Find actors who worked together" (find two Person nodes connected to the same Movie)

Follow these steps to create the ACTED_IN relationship:

Hover over the edge of the Person node—you’ll see a plus-sign (+)
Click and drag from the Person node to the Movie node
Name the relationship type ACTED_IN
The Import tool automatically maps personId and movieId to connect the right nodes
Click Map from table and select characters—this stores the character name as a property on the relationship

The Import tool creates Cypher statements like:

MATCH (p:Person {personId: '456'}), (m:Movie {movieId: '123'})
CREATE (p)-[:ACTED_IN {characters: ['Neo']}]->(m)

This creates the connections your recommendation engine needs to traverse.
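
The two traversals mentioned at the start of this step ("movies with the same actors" and "actors who worked together") can be sketched in memory to show why the relationship direction matters less than the shared endpoints. The IDs below (p1, m1, and so on) are made-up placeholders, not values from movies.csv, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (personId, movieId) pairs standing in for ACTED_IN relationships.
ACTED_IN = [("p1", "m1"), ("p2", "m1"), ("p1", "m2"), ("p3", "m3")]


def build_indexes(pairs):
    """Index the relationships in both directions for quick traversal."""
    movies_by_person = defaultdict(set)
    people_by_movie = defaultdict(set)
    for person_id, movie_id in pairs:
        movies_by_person[person_id].add(movie_id)
        people_by_movie[movie_id].add(person_id)
    return movies_by_person, people_by_movie


def movies_sharing_actors(movie_id, pairs):
    """Movie -> its actors -> their other movies."""
    movies_by_person, people_by_movie = build_indexes(pairs)
    related = set()
    for person_id in people_by_movie.get(movie_id, set()):
        related |= movies_by_person[person_id]
    related.discard(movie_id)  # a movie is not a recommendation for itself
    return related


def co_actors(person_id, pairs):
    """Person -> their movies -> the other people in those movies."""
    movies_by_person, people_by_movie = build_indexes(pairs)
    others = set()
    for movie_id in movies_by_person.get(person_id, set()):
        others |= people_by_movie[movie_id]
    others.discard(person_id)
    return others


print(movies_sharing_actors("m1", ACTED_IN))  # {'m2'}
print(co_actors("p1", ACTED_IN))              # {'p2'}
```

Both functions take two hops across the same relationship type, which is the pattern the recommendation queries in the next lesson rely on.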

Verification: The green checkmark indicates that the relationship mapping is correct. Your model now shows Person nodes connected to Movie nodes via ACTED_IN relationships—exactly what you need for recommendation queries.

Step 8: Review and confirm your model

Before importing, verify that your model correctly maps CSV data to graph structure. Incorrect mappings mean your queries won’t work.

Follow these steps to review and verify the model:

Review the model diagram—you should see Person and Movie nodes connected by ACTED_IN relationships
Click on each node to verify property mappings (movieId, title for Movie; personId, name for Person)
Verify the ACTED_IN relationship maps personId and movieId correctly
Confirm primary keys: The Import tool uses movieId and personId as unique identifiers to avoid creating duplicate nodes

The Import tool analyzes your CSV to ensure:

No duplicate nodes (uses movieId/personId as unique keys)
All relationships can be created (both Person and Movie nodes exist)
Data types are correct (strings, numbers, etc.)

Step 9: Run the import

Follow these steps to run the import:

Click the Run import button
You’ll be prompted to connect to your database
Enter your Aura instance credentials:
- URI: Your instance connection string (e.g., neo4j+s://xxxxx.databases.neo4j.io)
- Username: Usually neo4j (or your instance ID for AuraDB Free)
- Password: The password you saved when creating the instance
Click Connect
Wait for the import to complete—this may take a minute depending on your dataset size

Behind the scenes, the Import tool does the following:

Generates optimized Cypher statements from your model
Connects to your Aura instance
Executes batched inserts (creates nodes first, then relationships)
Verifies all data was imported correctly
Reports any errors or warnings

The Import tool processes your CSV data and creates nodes and relationships in your Neo4j instance. After the import completes, you’ll see a summary of what was created.

Step 10: Verify your import results

What to check: The import summary shows how many nodes and relationships were created. For your recommendation engine, you should see:

Multiple Movie nodes (one for each unique movie)
Multiple Person nodes (one for each unique actor)
ACTED_IN relationships connecting them

If the counts look correct, your data was imported successfully. If something seems off (e.g., zero relationships), your model mapping might need adjustment.

Example: A successful import might show:

100 Movie nodes
50 Person nodes
200 ACTED_IN relationships

This means you have 200 actor-movie connections to traverse for recommendations.
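
One way to decide whether the counts "look correct" is to compute the expected numbers from the CSV yourself and compare them with the import summary. The sketch below assumes one Movie per distinct movieId, one Person per distinct personId, and one ACTED_IN relationship per distinct (personId, movieId) pair. If your file repeats a pair, for example one row per character, the relationship count reported by the tool may differ. The function names are hypothetical.

```python
import csv
import io

# The lesson's example rows; substitute the contents of your own file.
SAMPLE_CSV = """movieId,title,personId,name,characters
123,The Matrix,456,Keanu Reeves,"Neo"
123,The Matrix,789,Laurence Fishburne,"Morpheus"
"""


def expected_counts(text):
    """Count distinct movies, people, and person-movie pairs in the CSV."""
    movie_ids, person_ids, pairs = set(), set(), set()
    for row in csv.DictReader(io.StringIO(text)):
        movie_ids.add(row["movieId"])
        person_ids.add(row["personId"])
        pairs.add((row["personId"], row["movieId"]))
    return {"Movie": len(movie_ids), "Person": len(person_ids), "ACTED_IN": len(pairs)}


def differences(expected, reported):
    """Return {kind: (expected, reported)} for every count that does not match."""
    return {
        kind: (count, reported.get(kind, 0))
        for kind, count in expected.items()
        if count != reported.get(kind, 0)
    }


expected = expected_counts(SAMPLE_CSV)
print(expected)  # {'Movie': 1, 'Person': 2, 'ACTED_IN': 2}
# Example: a summary that reports zero relationships points to a mapping problem.
print(differences(expected, {"Movie": 1, "Person": 2, "ACTED_IN": 0}))  # {'ACTED_IN': (2, 0)}
```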

Step 11: Save your data model

If you need to import more data later or recreate the structure in another instance, the saved model lets you reuse the same mapping without rebuilding it.

Follow these steps to save your data model:

Close the import summary window
You’ll return to the Import tool main screen
Your imported data source appears in the list
Click on the model name field (it may show "Untitled")
Enter a descriptive name like "Movies Model" or "Movie Recommendation Dataset"
Click Save

Reusing the model: Load this model later and apply it to new CSV files with the same structure, making it easy to add more movies to your recommendation engine.
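
"The same structure" in practice means the new file has the columns your model maps. Before reusing a saved model, you could run a quick header check like the sketch below. The column list reflects the mappings made in this lesson; missing_columns is a hypothetical helper and is not a feature of the Import tool.

```python
import csv
import io

# Columns mapped in this lesson's model.
MODEL_COLUMNS = ("movieId", "title", "personId", "name", "characters")


def missing_columns(text, required=MODEL_COLUMNS):
    """Return the required columns that the CSV header row does not contain."""
    headers = next(csv.reader(io.StringIO(text)), [])  # [] for an empty file
    return [column for column in required if column not in headers]


print(missing_columns("movieId,title,personId,name,characters\n"))  # []
print(missing_columns("movieId,title\n"))  # ['personId', 'name', 'characters']
```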

Check your understanding

Data Import workflow

What is the correct order of steps when importing data using the Import tool?

❏ Create model → Run import → Upload CSV → Connect to database
❏ Connect to database → Run import → Upload CSV → Create model
✓ Upload CSV → Create model → Connect to database → Run import
❏ Create model → Connect to database → Upload CSV → Run import

Hint

First you need data to work with, then you define how that data maps to nodes and relationships, then you connect and execute.

Solution

The correct order is Upload CSV → Create model → Connect to database → Run import.

Upload CSV - Add your data source file using "New data source"
Create model - Define nodes (like Person, Movie) and relationships (like ACTED_IN) with their properties
Connect to database - Select which instance to import into
Run import - Execute the import and verify the results

Data model reuse

Where are data models saved in Aura, and what does this mean for reuse?

❏ Models are saved at the instance level and can only be used with that specific instance
✓ Models are saved at the project level and can be reused across different instances within the project
❏ Models are saved at the organization level and can be shared across all projects
❏ Models are saved locally on your computer and must be uploaded each time

Hint

Data models are saved at the project level, which means they can be applied to any instance within that project.

Solution

Models are saved at the project level and can be reused across different instances within the project.

This means you can create a data model once and apply it to multiple instances within the same project, making it easy to maintain consistent graph structures across development, staging, and production environments.

Summary

In this lesson, you imported movie data into your Aura instance to power your recommendation engine. You:

Prepared your dataset: Downloaded and verified the movies.csv file, checking for unique identifiers, consistent data types, and proper formatting
Created a graph model: Defined Movie and Person nodes connected by ACTED_IN relationships—the structure your recommendation queries need
Ran the import: Loaded your data into Aura, creating nodes and relationships that enable recommendation queries
Saved your model: Preserved the mapping for future imports

The graph structure you created, (:Person)-[:ACTED_IN]->(:Movie), enables queries like:

Finding movies with the same actors
Discovering actors who worked together
Identifying similar movies based on shared cast

Data models are saved at the project level and can be reused across different instances.

For more information on the Import tool, including supported file formats and advanced mapping options, see the Neo4j Aura Import documentation.

In the next lesson, you’ll write Cypher queries to find movie recommendations by traversing the relationships you just created.
