Building an Import Process

During this course, you have created queries which complete the following tasks:

  • Create Person and Movie constraints

  • Import data from persons.csv and create Person nodes

  • Import data from movies.csv and create Movie nodes

  • Create ACTED_IN and DIRECTED relationships between Person and Movie nodes

  • Create additional ACTOR and DIRECTOR labels on Person nodes

All the queries are independent of each other and do not form a single process.

View all the import queries
cypher
CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (x:Person)
REQUIRE x.tmdbId IS UNIQUE;

CREATE CONSTRAINT Movie_movieId IF NOT EXISTS
FOR (x:Movie)
REQUIRE x.movieId IS UNIQUE;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET
p.imdbId = toInteger(row.person_imdbId),
p.bornIn = row.bornIn,
p.name = row.name,
p.bio = row.bio,
p.poster = row.poster,
p.url = row.url,
p.born = date(row.born),
p.died = date(row.died);

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {movieId: toInteger(row.movieId)})
SET
m.tmdbId = toInteger(row.movie_tmdbId),
m.imdbId = toInteger(row.movie_imdbId),
m.released = date(row.released),
m.title = row.title,
m.year = toInteger(row.year),
m.plot = row.plot,
m.budget = toInteger(row.budget),
m.imdbRating = toFloat(row.imdbRating),
m.poster = row.poster,
m.runtime = toInteger(row.runtime),
m.imdbVotes = toInteger(row.imdbVotes),
m.revenue = toInteger(row.revenue),
m.url = row.url,
m.countries = split(row.countries, '|'),
m.languages = split(row.languages, '|');

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = row.role;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/directed.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:DIRECTED]->(m);

MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor;

MATCH (p:Person)-[:DIRECTED]->()
WITH DISTINCT p SET p:Director;

In this lesson, you will build a single import process to rebuild the graph by:

  • Deleting any existing data

  • Dropping any existing constraints

  • Run the queries to create the constraints, nodes, and relationships

The benefit of this approach is that you can easily re-run and test a single repeatable process.

Multiple Queries

To run multiple queries together, you must put a semi-colon (;) at the end of each query.

For example, this Cypher code contains two separate queries which will run one after the other:

cypher
MATCH (p:Person) RETURN p;
MATCH (m:Movie) RETURN m;

Resetting the data

Before you can re-run the import process, you must delete any existing data and drop any constraints.

The nodes and relationships within the graph hold all of the data. You will need to delete the relationships before deleting the nodes.

The following Cypher will delete the ACTED_IN and DIRECTED relationships:

cypher
MATCH (Person)-[r:ACTED_IN]->(Movie) DELETE r;
MATCH (Person)-[r:DIRECTED]->(Movie) DELETE r;

Once the relationships are deleted, you can delete the Person and Movie nodes:

cypher
MATCH (p:Person) DELETE p;
MATCH (m:Movie) DELETE m;

Alternatively, you can use DETACH DELETE to delete the nodes and relationships at the same time:

cypher
MATCH (p:Person) DETACH DELETE p;
MATCH (m:Movie) DETACH DELETE m;

You could also drop the constraints on the Person and Movie nodes if they exist:

cypher
DROP CONSTRAINT Person_tmdbId IF EXISTS;
DROP CONSTRAINT Movie_movieId IF EXISTS;

These queries reset the database and allow you to re-run the import process.

Importing the data

Combining the queries above with those from the previous lessons will create a single import process.

cypher
MATCH (p:Person) DETACH DELETE p;
MATCH (m:Movie) DETACH DELETE m;

DROP CONSTRAINT Person_tmdbId IF EXISTS;
DROP CONSTRAINT Movie_movieId IF EXISTS;

CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (x:Person)
REQUIRE x.tmdbId IS UNIQUE;

CREATE CONSTRAINT Movie_movieId IF NOT EXISTS
FOR (x:Movie)
REQUIRE x.movieId IS UNIQUE;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET
p.imdbId = toInteger(row.person_imdbId),
p.bornIn = row.bornIn,
p.name = row.name,
p.bio = row.bio,
p.poster = row.poster,
p.url = row.url,
p.born = date(row.born),
p.died = date(row.died);

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {movieId: toInteger(row.movieId)})
SET
m.tmdbId = toInteger(row.movie_tmdbId),
m.imdbId = toInteger(row.movie_imdbId),
m.released = date(row.released),
m.title = row.title,
m.year = toInteger(row.year),
m.plot = row.plot,
m.budget = toInteger(row.budget),
m.imdbRating = toFloat(row.imdbRating),
m.poster = row.poster,
m.runtime = toInteger(row.runtime),
m.imdbVotes = toInteger(row.imdbVotes),
m.revenue = toInteger(row.revenue),
m.url = row.url,
m.countries = split(row.countries, '|'),
m.languages = split(row.languages, '|');

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = row.role;

LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/importing-cypher/directed.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {movieId: toInteger(row.movieId)})
MERGE (p)-[r:DIRECTED]->(m);

MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p SET p:Actor;

MATCH (p:Person)-[:DIRECTED]->()
WITH DISTINCT p SET p:Director;

You can run this query at any point to refresh the database with the latest data.

A single process to build your graph provides a consistent mechanism to test your import.

Check Your Understanding

Running multiple queries

What character must you put at the end of a query to run multiple queries together?

  • ,

  • ;

  • |

  • /

Hint

Cypher uses the same pattern as many programming languages to determine the end of an instruction.

Solution

To run multiple queries together, you must put a semi-colon (;) at the end of each query.

Summary

In this lesson, you learned:

  • How to run multiple queries together.

  • The benefits of creating a single import process.

  • How to delete nodes and relationships and drop constraints.

In the next lesson, you will learn to split your import into multiple transactions.