Introduction
In this lesson you will learn:
- How Cypher queries are processed
- The Cypher runtimes
- Query execution phases
Understanding Query Optimization
Query optimization is the process of improving query performance by identifying slow queries and applying specific techniques to make them faster. Slow queries impact user experience and consume valuable database resources.
The most important metric for prioritization is total time spent, calculated as frequency multiplied by duration. A query that runs 1,000 times at 500ms each has more total impact than a query that runs once for 60 seconds, even though each individual run is faster.
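The comparison can be checked with simple arithmetic; for example, as a Cypher expression:

```cypher
// frequency * duration = total time spent
RETURN 1000 * 0.5 AS frequentQuerySeconds,  // 500.0 seconds in total
       1 * 60.0 AS slowQuerySeconds         // 60.0 seconds in total
```

The frequent query accounts for more than eight times the total database time, so it is the better optimization target.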
How Cypher Queries Work
Cypher queries execute in two phases: first finding anchor nodes, then expanding relationships from them.
Anchor nodes are the starting points of a query. Neo4j locates these nodes first, then expands outward by traversing relationships.
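As a sketch (the `Person` and `Movie` labels and the `ACTED_IN` relationship type are illustrative, taken from the standard movies example dataset):

```cypher
// Neo4j first finds the anchor node - the Person filtered by name -
// then expands outward along ACTED_IN relationships to Movie nodes.
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
RETURN m.title
```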
Quick tips
- The fewer anchor nodes you have, the faster your query will run.
- Filter data as early as possible to reduce work in later stages.
- The fewer relationships you traverse, the faster your query will run.
- The more specific the relationship types you use, the faster your query will run.
- Using indexes helps Neo4j quickly find anchor nodes.
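The last tip can be sketched as follows; the index name and the `Person.name` property are illustrative:

```cypher
// Create an index so Neo4j can locate anchor nodes directly,
// rather than scanning every Person node.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);
```

With this index in place, a query that filters its anchor on `p.name` (as in the earlier `MATCH` example) can look the node up directly instead of performing a label scan.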
Cypher runtimes
There are three Cypher runtimes that Neo4j uses to execute queries, each with different performance characteristics and use cases:
- Slotted - Community Edition and Aura Free
- Pipelined - default for Enterprise Edition and Aura production
- Parallel - available in Enterprise Edition and Aura for analytical queries
Slotted Runtime
The slotted runtime:

- Interpreted runtime using the pull-based "Volcano" execution model
- Processes queries row by row with single-threaded execution
- Faster planning phase but slower execution
- Best for applications with short, non-cached queries where fast planning matters more than execution speed
Pipelined Runtime
The pipelined runtime:

- Push-based execution model that processes data in batches
- Uses a compiled approach with code generation for better performance
- Transforms logical operators into execution pipelines with improved CPU cache utilization
- Ideal for transactional use cases with high concurrency and most general query workloads
Parallel Runtime
The parallel runtime:

- Multi-threaded runtime that allows a single query to utilize multiple CPU cores simultaneously
- Designed for analytical, graph-global read queries that process large sections of the graph
- Uses partitioned operators to segment and process data in parallel
- Best for long-running analytical queries (>500ms) on systems with many CPUs and low-concurrency workloads
- Does not support write operations or procedures that are not thread-safe
Runtime tips
- The pipelined runtime generally provides the best performance.
- The slotted runtime can be useful when planning time matters more than execution time, such as for short, non-cached queries.
- When multiple CPU cores are available, long-running queries can be expected to run faster on the parallel runtime.
Lesson Summary
In this lesson, you learned about the Cypher query lifecycle, including the parsing, planning, and execution phases that Neo4j goes through when processing your queries.
In the next lesson, you will learn how to use PROFILE and EXPLAIN to analyze query performance.