Refactoring Properties as Nodes

Final steps toward the data model

In the Graph Data Modeling Fundamentals course, you learned that you can refactor the graph so that the application's key queries perform better. Adding labels, as you did in the previous Challenge, was also a type of graph refactoring.

That course also showed how to turn a list property into a set of nodes with relationships.

In this lesson, we review that code. In the next Challenge, you will complete the refactoring of the graph so that it matches our target data model.

Our target data model contains Genre nodes where each Movie node has an IN_GENRE relationship to one or more Genre nodes.

Adding a uniqueness constraint for Genre nodes

When you used the Data Importer, it automatically created the uniqueness constraints in the graph for the unique IDs you specified when you imported the data.

You can view the constraints defined in the graph with the SHOW CONSTRAINTS command in Neo4j Browser. For this graph, the output lists three constraints, all created for us by the Data Importer.
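For reference, the command takes no arguments. The sketch below shows the plain form, plus an optional YIELD variant in a comment that narrows the output to a few columns (assuming a Neo4j version that supports YIELD on SHOW commands):

```cypher
// List every constraint defined in the current database
SHOW CONSTRAINTS;

// Optional variant (assumption: Neo4j 4.3 or later), restricted to the
// columns most useful for checking uniqueness constraints:
// SHOW CONSTRAINTS YIELD name, type, labelsOrTypes, properties;
```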

A best practice is to have a unique ID for every type of node in the graph, so we also want a uniqueness constraint for the Genre nodes we will be creating. A uniqueness constraint helps performance both when creating nodes and when querying them, because it is backed by an index on the constrained property. The MERGE clause looks up a node by that property value; with the constraint in place the lookup is fast, and if the node already exists, MERGE does not create it again.

Here is the code we use to create this uniqueness constraint for the name property of Genre nodes:

[Cypher listing not included: create-genre-constraint.cypher, from https://raw.githubusercontent.com/neo4j-graphacademy/llm-vectors-unstructured/main/modules/3-refactoring-imported-data/lessons/7-nodes-from-properties/create-genre-constraint.cypher]
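A minimal sketch of such a statement, assuming Neo4j 5 syntax and a hypothetical constraint name, Genre_name. It is not necessarily the lesson's exact listing:

```cypher
// Assumption: the constraint name Genre_name is illustrative only.
// IF NOT EXISTS makes the statement safe to run more than once.
CREATE CONSTRAINT Genre_name IF NOT EXISTS
FOR (g:Genre)
REQUIRE g.name IS UNIQUE
```

After running a statement like this, SHOW CONSTRAINTS should list one more constraint than before.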

You will be adding this constraint in the next Challenge.

Creating the Genre nodes from the genres property of Movie nodes

The next step is to retrieve all Movie nodes and, for each value in a movie's genres property, create the corresponding Genre node if it does not already exist, then connect the Movie node to it with an IN_GENRE relationship.

Here is the code you will be executing in the next Challenge to do this:

[Cypher listing not included: merge-genre-nodes.cypher, from https://raw.githubusercontent.com/neo4j-graphacademy/llm-vectors-unstructured/main/modules/3-refactoring-imported-data/lessons/7-nodes-from-properties/merge-genre-nodes.cypher]
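One way this step might be written, assuming the genres property on each Movie node is already a list of strings. This is a sketch of the technique described here and may differ from the lesson's own listing:

```cypher
// Assumption: m.genres is a list of strings on every Movie node
MATCH (m:Movie)
// One row per (movie, genre name) pair
UNWIND m.genres AS genre
// Find the Genre node with this name, or create it if it is missing
MERGE (g:Genre {name: genre})
// Connect the movie to the genre, without duplicating the relationship
MERGE (m)-[:IN_GENRE]->(g)
```

Using MERGE for the relationship as well as the node means the query can be re-run without creating duplicate IN_GENRE relationships.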

The UNWIND clause expands the elements of each Movie node's genres list into rows, one row per genre name. For each row, MERGE creates the Genre node only if a node with that name does not already exist. The query then creates the IN_GENRE relationship between the Movie node and the Genre node.
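To see what UNWIND does in isolation, it can be applied to a literal list. The genre names below are made-up sample values, not data from the graph:

```cypher
// Expands the two-element list into two rows,
// one with genre = 'Adventure' and one with genre = 'Comedy'
UNWIND ['Adventure', 'Comedy'] AS genre
RETURN genre
```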

Removing the genres property

After you have created the Genre nodes and their relationships to the Movie nodes, the genres property is redundant, so you remove it from the Movie nodes.

[Cypher listing not included: remove-genres-property.cypher, from https://raw.githubusercontent.com/neo4j-graphacademy/llm-vectors-unstructured/main/modules/3-refactoring-imported-data/lessons/7-nodes-from-properties/remove-genres-property.cypher]
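A minimal sketch of this step, which may not match the lesson's listing exactly. Setting the property to null with SET m.genres = null would have the same effect:

```cypher
// Remove the now-redundant genres property from every Movie node
MATCH (m:Movie)
REMOVE m.genres
```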

Again, you will be doing this step in the next Challenge.

Confirming the final schema

Finally, view the visualization of the schema to confirm that it matches our data model: Movie nodes should have IN_GENRE relationships to Genre nodes.
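In Neo4j Browser, the schema visualization can be produced with a built-in procedure. A short example:

```cypher
// Returns the node labels and relationship types in the graph,
// which Neo4j Browser renders as a schema diagram
CALL db.schema.visualization()
```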

Check your understanding

1. Constraints

Why do you add a uniqueness constraint to the graph prior to creating nodes?

✓ A best practice is to have a unique ID for a node of a given type in the graph.
❏ It enables you to generate a unique ID for every node.
✓ It prevents duplicate nodes when you create them in the graph.
✓ It speeds up MERGE performance.

Hint

Three of these options are valid reasons for adding a uniqueness constraint to the graph.

Solution

You add a uniqueness constraint to the graph prior to creating nodes because:

A best practice is to have a unique ID for a node of a given type in the graph.
It prevents duplicate nodes when you create them in the graph.
It speeds up MERGE performance.

2. Processing a list

What Cypher keyword is used to expand the elements of a list as rows during a query?

❏ FOR EACH
✓ UNWIND
❏ EXTRACT
❏ ITERATE

Hint

With this keyword, you are unwinding the elements of the list.

Solution

The correct answer is: UNWIND

Summary

In this lesson, you reviewed how to create nodes from properties in the graph. In the next Challenge, you will complete the refactoring of the graph post-import to create the Genre nodes.

