Optimizing Query Performance

Introduction

So far in this module, you have learned how to access and interpret query logs to identify slow queries. Query logs become most valuable when you use them to systematically improve your database performance.

In this lesson, you will learn how to use query logs to identify problematic queries, and tips on how to optimize them.

Understanding Query Optimization

Query optimization is the process of improving query performance by identifying slow queries and applying specific techniques to make them faster. Slow queries impact user experience and consume valuable database resources.

The most important metric for prioritization is total time spent, calculated as frequency multiplied by duration. A query running 1,000 times at 500ms has more total impact than a query running once at 60 seconds, despite being individually faster.

How Cypher Queries Work

Cypher queries execute in two phases: finding anchor nodes then expanding relationships.

Anchor nodes are the starting points. Neo4j locates these nodes first, then expands outward by traversing relationships.

Index-free adjacency

Quick tips

  • The fewer anchor nodes you have, the faster your query will run.

  • Filter data as early as possible to reduce work in later stages.

  • The fewer relationships you traverse, the faster your query will run.

  • The more specific the relationship types you use, the faster your query will run.

  • Using indexes helps Neo4j quickly find anchor nodes.

Identifying Queries to Optimize

Query logs help you find queries that need optimization. Use the filter to show queries with a minimum duration above 1000ms to focus on slow queries.

The Summary tab shows aggregated data for each unique query. Sort by total time spent to identify queries with the highest overall impact on your database. Look for queries that appear frequently and calculate their total impact by multiplying frequency by average duration.

User-facing queries should be prioritized over background jobs. A slow query in your application’s main workflow directly hurts user experience. Background processing jobs can often tolerate longer execution times.

Analyzing Query Performance

For each slow query, examine the key metrics in the query logs. The duration tells you how long the query takes and whether it’s consistently slow or variable. The planning time reveals whether the query is parameterized.

High planning time above 100ms often indicates non-parameterized queries that cannot reuse execution plans.

Page faults indicate that data is not in cache and must be read from disk. High page faults suggest missing indexes or insufficient memory. Review the execution pattern to understand when the query runs, how often, and what triggers it.

Optimizing Queries

You can prefix your queries with EXPLAIN and PROFILE to understand query execution. EXPLAIN shows the planned execution steps without running the query. PROFILE executes the query and returns detailed statistics about actual performance.

Explain

EXPLAIN shows the planned execution steps without running the query.

cypher
EXPLAIN
MATCH (u:User)-[r:RATED]->(m:Movie)
WHERE m.title = 'Toy Story'
RETURN u.name, r.rating
Explain Index Range Movie Title

Profile

PROFILE shows the actual number of database hits, rows returned, and time spent in each operation. Look for operations with high database hits or rows to identify bottlenecks.

cypher
PROFILE
MATCH (u:User)-[r:RATED]->(m:Movie)
WHERE m.title = 'Toy Story'
RETURN u.name, r.rating
Profile Index Range Movie Title

Example 1: Missing Index

Consider this query that appears frequently in your query logs with high duration and page faults:

cypher
MATCH (u:User)-[r:RATED]->(m:Movie)
WHERE m.title = 'The Matrix'
RETURN u.name, r.rating

Running PROFILE on this query shows high database hits because Neo4j must scan all Movie nodes and check the title property of each one.

Query starting with a NodeByLabelScan operator

Solution

Create an index on the title property to enable fast lookups:

cypher
CREATE INDEX movie_title FOR (m:Movie) ON (m.title)

Affect

After creating the index, you will now see the NodeByLabelScan operator replaced with the NodeIndexSeek operator.

Profile Index Range Movie Title

Modeling for performance

You can learn more about effective graph modeling in Graph Data Modeling Fundamentals.

Query Optimization Checklist

When analyzing slow queries, check for these common issues:

  • Variable length paths have upper limits: Use [*1..5] instead of [*] to prevent excessive traversals

  • Node labels are specified: Use MATCH (u:User) instead of MATCH (u) to avoid scanning all nodes

  • Properties have indexes: Create indexes on properties used in WHERE clauses and lookups

  • Relationship types are specified: Always provide relationship types to narrow traversals

  • Relationship directions are specified: Always specify the direction of the relationship when it is known

  • Queries use parameters: Replace literal values with $parameter to enable plan caching

  • Return only needed properties: Use RETURN u.name, m.title instead of RETURN u, m to reduce data transfer

  • Result sets are limited: Add LIMIT clauses to prevent returning excessive data

Check Your Understanding

Query Prioritization

Which metric is most important when prioritizing queries for optimization?

  • ❏ The longest single query duration

  • ✓ Total time spent (frequency × duration)

  • ❏ The number of page faults

  • ❏ The planning time

Hint

Consider the overall impact on database resources, not just individual query speed.

Solution

The correct answer is Total time spent (frequency × duration).

This metric captures the overall impact of a query on your database. A frequently-run query with moderate duration often has more total impact than a rarely-run slow query.

The longest single query duration measures individual query speed but ignores how often it runs. The number of page faults indicates caching issues but doesn’t measure total resource consumption. The planning time shows whether queries are parameterized but doesn’t reflect execution impact.

PROFILE vs EXPLAIN

What is the key difference between using PROFILE and EXPLAIN when analyzing query performance?

  • ❏ PROFILE shows the query plan while EXPLAIN shows the results

  • ❏ EXPLAIN is faster because it uses cached data

  • ✓ PROFILE executes the query and returns actual statistics while EXPLAIN only shows the planned execution

  • ❏ PROFILE can only be used on read queries while EXPLAIN works on all queries

Hint

Consider whether the query actually runs and what type of information each command provides.

Solution

The correct answer is PROFILE executes the query and returns actual statistics while EXPLAIN only shows the planned execution.

PROFILE runs the query and provides real performance metrics including actual database hits, rows processed, and time spent in each operation. EXPLAIN only shows the planned execution steps without running the query.

Summary

Query optimization uses query logs to identify and fix performance problems. Prioritize queries by total time spent, which is frequency multiplied by duration, to find queries with the highest overall impact.

Use EXPLAIN and PROFILE to understand query execution before making changes. Common optimizations include adding indexes for property lookups, using list comprehensions instead of multiple MATCH clauses to avoid cartesian products, and modeling frequently filtered relationship properties as specific relationship types. Always measure baseline performance before optimizing and validate improvements in query logs after deployment.

Chatbot

How can I help you today?