Eliminating Duplicate Data

Duplicate data

You should take care to avoid duplicating data in your graph. Some databases require a form of denormalization to improve the speed of a set of queries, but this is not always the case with a graph database. De-duplicating data also lets you query through a node: for example, finding other customers who have purchased a particular product, or finding similar movies based on the ratings of other users.

In addition, duplicating data in the graph increases the size of the graph and the amount of data that may need to be retrieved for a query.
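To illustrate what "querying through a node" means, here is a minimal Python sketch rather than actual graph database code. It assumes a tiny, hypothetical set of purchases (the customer and product names, `buyers_of`, and `other_buyers` are invented for this example). Each product is stored once and shared, so other customers can be found by going through it.

```python
from collections import defaultdict

# Hypothetical purchase data: (customer, product) pairs.
purchases = [
    ("alice", "laptop"),
    ("bob", "laptop"),
    ("carol", "phone"),
    ("alice", "phone"),
]

# Each product appears once as a key and is shared by all of its buyers,
# much like a single Product node connected to many Customer nodes.
buyers_of = defaultdict(set)
for customer, product in purchases:
    buyers_of[product].add(customer)


def other_buyers(customer, product):
    """Return the customers other than `customer` who bought `product`."""
    return buyers_of[product] - {customer}
```

If the product name were instead copied onto every customer record, the same question would require comparing every customer against every other customer.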

New use case

We have a new use case that we must account for.

Use case #11: What movies are available in a particular language?

Our current instance model looks like this:

Instance model thus far

As you can see, the data model does not account for languages, so we will have to add this data.

Duplicate data example

Suppose we add a property named languages to each Movie node in the graph, representing the languages in which the movie is available.

Here is what the instance model would look like:

Instance model with languages

Here we see that all Movie nodes have English in their list of languages. This is duplicate data, and in a database at scale it would represent a lot of duplication.
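The cost of this design can be sketched in plain Python. This is an illustration under assumptions, not the course's data or a graph query: the movie titles and language lists are hypothetical, and `movies_in_language` and `stored_copies` are names invented for the example. It shows that answering use case #11 means checking every movie's list, and that each language value is stored once per movie.

```python
from collections import Counter

# Hypothetical Movie nodes, each carrying its own `languages` list property.
movies = [
    {"title": "Movie A", "languages": ["English"]},
    {"title": "Movie B", "languages": ["English", "French"]},
    {"title": "Movie C", "languages": ["English", "German", "French"]},
]


def movies_in_language(language):
    """Answer use case #11 by scanning every movie's languages list."""
    return [m["title"] for m in movies if language in m["languages"]]


# Count how many times each language string is stored across all nodes.
stored_copies = Counter(lang for m in movies for lang in m["languages"])
```

With three movies, "English" is already stored three times. The number of copies grows with the number of movies, and every lookup by language has to visit every movie.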

Check your understanding

Why eliminate duplication?

Why would you refactor a graph to eliminate duplication?

  • ❏ You cannot have duplicates in the primary key values for the nodes.

  • ✓ Improve query performance.

  • ✓ Reduce the amount of storage required for the graph.

Hint

There are two main reasons you refactor a graph to eliminate duplication.

Solution

You should refactor your graph to improve query performance and reduce the amount of storage required for the graph.

Summary

In this lesson, you learned why it is important to eliminate duplication of data in the graph. In the next challenge, you will add more data to your instance model as duplicate data so you can test the new use case.
