Building an Import Process

During this course, you have created queries which complete the following tasks:

Create Person and Movie constraints
Import data from persons.csv and create Person nodes
Import data from movies.csv and create Movie nodes
Create ACTED_IN and DIRECTED relationships between Person and Movie nodes
Create additional ACTOR and DIRECTOR labels on Person nodes

All the queries are independent of each other and do not form a single process.

View all the import queries

cypher

CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (x:Person)
REQUIRE x.tmdbId IS UNIQUE;

CREATE CONSTRAINT Movie_movieId IF NOT EXISTS
FOR (x:Movie)
REQUIRE x.movieId IS UNIQUE;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET
p.imdbId = toInteger(row.person_imdbId),
p.bornIn = row.bornIn,
p.name = row.name,
p.bio = row.bio,
p.poster = row.poster,
p.url = row.url,
p.born = date(row.born),
p.died = date(row.died);

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {movieId: toInteger(row.movieId)})
SET
m.tmdbId = toInteger(row.movie_tmdbId),
m.imdbId = toInteger(row.movie_imdbId),
m.released = date(row.released),
m.title = row.title,
m.year = toInteger(row.year),
m.plot = row.plot,
m.budget = toInteger(row.budget),
m.imdbRating = toFloat(row.imdbRating),
m.poster = row.poster,
m.runtime = toInteger(row.runtime),
m.imdbVotes = toInteger(row.imdbVotes),
m.revenue = toInteger(row.revenue),
m.url = row.url,
m.countries = split(row.countries, '|'),
m.languages = split(row.languages, '|');

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = row.role;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/directed.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:DIRECTED]->(m);

MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor;

MATCH (p:Person)-[:DIRECTED]->()
WITH DISTINCT p SET p:Director;

In this lesson, you will build a single import process to rebuild the graph by:

Deleting any existing data
Dropping any existing constraints
Run the queries to create the constraints, nodes, and relationships

The benefit of this approach is that you can easily re-run and test a single repeatable process.

Multiple Queries

To run multiple queries together, you must put a semi-colon (;) at the end of each query.

For example, this Cypher code contains two separate queries which will run one after the other:

cypher

MATCH (p:Person) RETURN p;
MATCH (m:Movie) RETURN m;

Resetting the data

Before you can re-run the import process, you must delete any existing data and drop any constraints.

The nodes and relationships within the graph hold all of the data. You will need to delete the relationships before deleting the nodes.

The following Cypher will delete the ACTED_IN and DIRECTED relationships:

cypher

MATCH (Person)-[r:ACTED_IN]->(Movie) DELETE r;
MATCH (Person)-[r:DIRECTED]->(Movie) DELETE r;

Once the relationships are deleted, you can delete the Person and Movie nodes:

cypher

MATCH (p:Person) DELETE p;
MATCH (m:Movie) DELETE m;

Alternatively, you can use DETACH DELETE to delete the nodes and relationships at the same time:

cypher

MATCH (p:Person) DETACH DELETE p;
MATCH (m:Movie) DETACH DELETE m;

You could also drop the constraints on the Person and Movie nodes if they exist:

cypher

DROP CONSTRAINT Person_tmdbId IF EXISTS;
DROP CONSTRAINT Movie_movieId IF EXISTS;

These queries reset the database and allow you to re-run the import process.

Importing the data

Combining the queries above with those from the previous lessons will create a single import process.

cypher

MATCH (p:Person) DETACH DELETE p;
MATCH (m:Movie) DETACH DELETE m;

DROP CONSTRAINT Person_tmdbId IF EXISTS;
DROP CONSTRAINT Movie_movieId IF EXISTS;

CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (x:Person)
REQUIRE x.tmdbId IS UNIQUE;

CREATE CONSTRAINT Movie_movieId IF NOT EXISTS
FOR (x:Movie)
REQUIRE x.movieId IS UNIQUE;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET
p.imdbId = toInteger(row.person_imdbId),
p.bornIn = row.bornIn,
p.name = row.name,
p.bio = row.bio,
p.poster = row.poster,
p.url = row.url,
p.born = date(row.born),
p.died = date(row.died);

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {movieId: toInteger(row.movieId)})
SET
m.tmdbId = toInteger(row.movie_tmdbId),
m.imdbId = toInteger(row.movie_imdbId),
m.released = date(row.released),
m.title = row.title,
m.year = toInteger(row.year),
m.plot = row.plot,
m.budget = toInteger(row.budget),
m.imdbRating = toFloat(row.imdbRating),
m.poster = row.poster,
m.runtime = toInteger(row.runtime),
m.imdbVotes = toInteger(row.imdbVotes),
m.revenue = toInteger(row.revenue),
m.url = row.url,
m.countries = split(row.countries, '|'),
m.languages = split(row.languages, '|');

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = row.role;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/directed.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:DIRECTED]->(m);

MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor;

MATCH (p:Person)-[:DIRECTED]->()
WITH DISTINCT p SET p:Director;

You can run this query at any point to refresh the database with the latest data.

A single process to build your graph provides a consistent mechanism to test your import.

Check Your Understanding

Running multiple queries

What character must you put at the end of a query to run multiple queries together?

❏ ,
✓ ;
❏ |
❏ /

Hint

Cypher uses the same pattern as many programming languages to determine the end of an instruction.

Solution

To run multiple queries together, you must put a semi-colon (;) at the end of each query.

Summary

In this lesson, you learned:

How to run multiple queries together.
The benefits of creating a single import process.
How to delete nodes and relationships and drop constraints.

In the next lesson, you will learn to split your import into multiple transactions.

Importing CSV data into Neo4j

Importing Data

Creating Nodes, Properties, and Relationships

Data Types, Lists, and Labels

Importing data considerations

Building an Import Process

Multiple Queries

Resetting the data

Importing the data

Check Your Understanding

Running multiple queries

Summary

Chatbot