Checkpoint and Replan Events

Introduction

Neo4j performs internal operations, such as checkpointing and query replanning, that help maintain database health and optimize query performance.

In this lesson, you will learn how to monitor checkpoint and replan events to understand your database activity.

Understanding Checkpoints and Replan Events

A checkpoint flushes pending updates from memory to disk. Checkpoints keep your data safe and durable by creating recovery points that allow Neo4j to restart quickly after an unexpected shutdown. Checkpointing is a normal, essential operation that occurs automatically.

Replan events occur when Neo4j recreates the execution plan for a Cypher query. Neo4j caches query plans for efficiency, rebuilding them when database statistics change significantly—such as when the number of nodes, relationships, or index characteristics evolve—to ensure optimal performance as your data grows.

Monitoring Checkpoints

The Monitoring dashboard shows four checkpoint metrics that help you understand database write activity and health.

Checkpoint Events - Total Count

Total checkpoint events shows the total number of checkpoint events executed since the server started.

The count depends on your write activity—busier databases checkpoint more often. This is a cumulative counter that helps you understand overall checkpoint frequency.

Checkpoint Events - Rate

Checkpoint rate shows the number of checkpoint events per minute.

A consistent, moderate rate is normal and healthy. The rate naturally correlates with write transaction volume.
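To make the relationship between the cumulative count and the per-minute rate concrete, here is a minimal Python sketch that derives a rate from periodic samples of a cumulative counter. The sample format (timestamp in seconds, counter value) and the function name are assumptions for illustration and are not part of any Aura API. Because the counter can drop after background maintenance, the sketch treats a drop as a reset and counts from zero, which is also an assumption.

```python
def per_minute_rates(samples):
    """Derive events-per-minute from (timestamp_seconds, cumulative_count) samples.

    samples must be in time order. The sample format is hypothetical.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed_minutes = (t1 - t0) / 60
        if elapsed_minutes <= 0:
            continue  # skip duplicate or out-of-order samples
        # Assumption: a lower value means the counter was reset, so only
        # the events counted since the reset are attributed to this interval.
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append(delta / elapsed_minutes)
    return rates


# Three samples taken 10 minutes apart; the counter resets before the last one.
example = per_minute_rates([(0, 10), (600, 12), (1200, 1)])
```

The same calculation applies to any cumulative counter sampled at intervals, including the replan count.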

Checkpoint Cumulative Time

Cumulative checkpoint time shows the total time in milliseconds spent in checkpointing since the server started.

This metric helps you understand the overall time investment in checkpointing operations over the lifetime of the instance.
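One way to read this metric is to divide it by the total checkpoint count to get an average time per checkpoint. The Python sketch below shows that arithmetic; the function name is hypothetical, and both inputs are values you would read off the dashboard yourself.

```python
def average_checkpoint_ms(cumulative_ms, total_count):
    """Average time per checkpoint, or None if no checkpoint has run yet."""
    if total_count <= 0:
        return None
    return cumulative_ms / total_count


# 90,000 ms spent across 6 checkpoints is 15 seconds per checkpoint on average.
average = average_checkpoint_ms(90_000, 6)
```

Both inputs should come from the same point in time, since either counter may drop after background maintenance.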

Last Checkpoint Duration

Last checkpoint duration shows the duration of the most recent checkpoint event in milliseconds.

Checkpoints typically run every 15 minutes or every 100,000 transactions, whichever comes first. The duration can range from several seconds to several minutes, which is normal and healthy. As a general guideline, if you see checkpoint duration consistently over 10 minutes in Aura, this suggests an opportunity to review storage performance or optimize your write patterns. (For self-managed Neo4j deployments, the threshold is typically 30 minutes.) These thresholds will vary depending on your specific workload.
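The guideline above can be sketched as a simple check over recent checkpoint durations. This Python example interprets "consistently" as a run of consecutive checkpoints over the threshold; that interpretation, the default of three in a row, and all names are assumptions for illustration rather than an official rule.

```python
AURA_THRESHOLD_MS = 10 * 60 * 1000          # 10 minutes, from the guideline above
SELF_MANAGED_THRESHOLD_MS = 30 * 60 * 1000  # 30 minutes, from the guideline above


def consistently_over(durations_ms, threshold_ms, min_consecutive=3):
    """Return True if min_consecutive checkpoints in a row exceed threshold_ms.

    min_consecutive is a hypothetical tuning knob, not a Neo4j setting.
    """
    run = 0
    for duration in durations_ms:
        run = run + 1 if duration > threshold_ms else 0
        if run >= min_consecutive:
            return True
    return False


recent = [700_000, 650_000, 610_000]  # three checkpoints, each over 10 minutes
needs_review = consistently_over(recent, AURA_THRESHOLD_MS)
```

A single slow checkpoint during a bulk import does not trip this check, which matches the guidance that occasional long checkpoints under heavy load are expected.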

Checkpoint count and cumulative time values may drop if background maintenance is performed by Aura. This is normal and doesn’t indicate a problem.

Monitoring Replan Events

The Monitoring dashboard shows two replan metrics that help you understand query plan caching efficiency.

Replan Events - Total Count

Total replan events shows the total number of times Cypher has replanned a query since the server started.

A low count with occasional spikes is normal and healthy. You’ll naturally see replanning when executing new queries for the first time, after schema changes, or when database statistics change significantly as your data grows—this is Neo4j adapting to your evolving database.

Replan Events - Rate

Replan rate shows the number of replanning events per minute.

As a general guideline, consistently high replan rates suggest an optimization opportunity: your queries may benefit from using parameters instead of literal values. What constitutes "high" will vary depending on your query patterns.
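The difference between literal values and parameters is easiest to see in the query text itself. The following Python sketch builds the same lookup both ways; the Person label and name property are hypothetical, and no database connection is made. With literals, each value produces a distinct query string, while the parameterized form keeps one query string and passes the value separately.

```python
def literal_query(name):
    # Anti-pattern: the value is embedded in the query text, so every
    # distinct name yields a different query string.
    return f"MATCH (p:Person {{name: '{name}'}}) RETURN p"


# Preferred: one fixed query text; the value travels as a parameter.
PARAM_QUERY = "MATCH (p:Person {name: $name}) RETURN p"


def parameterized_query(name):
    return PARAM_QUERY, {"name": name}


names = ["Ann", "Bob", "Cy"]
literal_texts = {literal_query(n) for n in names}           # 3 distinct texts
param_texts = {parameterized_query(n)[0] for n in names}    # 1 shared text
```

With a Neo4j driver, the query text and the parameter map would typically be passed together to the call that runs the query. Keeping values out of the query text also avoids injection problems when the values come from user input.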

Troubleshooting Checkpoint and Replan Issues

Monitor checkpoint and replan metrics to identify database health issues and optimization opportunities.

Long Checkpoint Duration

As a general guideline, if checkpoint duration is consistently over 10 minutes in Aura, investigate potential causes. For self-managed Neo4j, the threshold is typically 30 minutes. Both thresholds vary with your specific write patterns.

Long checkpoints can indicate a heavy write load or a large backlog of changes waiting to be flushed to disk. Start by reviewing your write patterns: are you running large batch imports or bulk updates? Consider batching large updates into smaller transactions using CALL { } IN TRANSACTIONS.
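As a rough illustration of batching with CALL { } IN TRANSACTIONS, the Python sketch below holds an example Cypher statement as a string and computes how many inner transactions a given row count would need. The Person label, the id and name properties, the $rows parameter, and the batch size of 1000 are all hypothetical; the statement is not executed here.

```python
import math

# Cypher text only. $rows is assumed to be a list of maps supplied by the
# caller; Person, id, and name are placeholder names for illustration.
BATCHED_UPSERT = """
UNWIND $rows AS row
CALL {
  WITH row
  MERGE (p:Person {id: row.id})
  SET p.name = row.name
} IN TRANSACTIONS OF 1000 ROWS
"""


def inner_transactions(total_rows, rows_per_tx=1000):
    """Number of inner transactions needed to cover total_rows."""
    if rows_per_tx <= 0:
        raise ValueError("rows_per_tx must be positive")
    return math.ceil(total_rows / rows_per_tx)


# 250,000 rows in batches of 1000 commit as 250 smaller transactions.
batches = inner_transactions(250_000)
```

Note that CALL { } IN TRANSACTIONS needs to run in an implicit (auto-commit) transaction rather than inside an explicit one, so check how your driver or tool submits the statement.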

Check whether long checkpoint durations correlate with specific operations. If checkpoints are only slow during heavy batch processing, this is expected behavior. If they’re consistently slow during normal activity, gather the following information:

The time range when slow checkpoints occur
Your typical write transaction patterns
Any recent changes to your data model or query patterns
Whether the issue correlates with specific application operations

Provide this information to Neo4j support. Support has access to infrastructure-level metrics not available in your dashboard—storage I/O performance, disk throughput, and system-level diagnostics. They can determine whether the issue requires write pattern optimization on your side or infrastructure-level tuning.

High Replan Rates

A low replan rate with occasional spikes is normal—you’ll naturally see replanning when executing new queries, after schema changes, or when database statistics change significantly as data volumes grow.

As a general guideline, if replan rates are consistently high, this suggests an optimization opportunity. High replan rates typically indicate queries using literal values instead of parameters.

Review your query logs to identify frequently executed queries. Look for queries with hardcoded values that could be replaced with parameters. This simple change can significantly improve query performance and reduce planning overhead.
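One way to spot such queries is to replace literals with a placeholder and group queries that then share the same shape. The Python sketch below does this for a list of query strings; it assumes you have already extracted the query text from your logs, and the regular expression is a simplification that covers quoted strings and plain numbers only.

```python
import re

# Matches single-quoted strings, double-quoted strings, or standalone numbers.
_LITERAL = re.compile(r"""'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|\b\d+(?:\.\d+)?\b""")


def normalize(query):
    """Collapse whitespace and replace literal values with '?'."""
    collapsed = " ".join(query.split())
    return _LITERAL.sub("?", collapsed)


def literal_hotspots(queries, min_variants=2):
    """Map each query shape to the number of distinct texts that produced it.

    Shapes with several variants are candidates for parameterization.
    min_variants is a hypothetical cutoff.
    """
    variants = {}
    for query in queries:
        variants.setdefault(normalize(query), set()).add(" ".join(query.split()))
    return {shape: len(texts) for shape, texts in variants.items()
            if len(texts) >= min_variants}


logged = [
    "MATCH (p:Person {name: 'Ann'}) RETURN p",
    "MATCH (p:Person {name: 'Bob'}) RETURN p",
    "MATCH (p:Person {name: $name}) RETURN p",
]
hotspots = literal_hotspots(logged)
```

In this sample, the two literal variants collapse into one shape with a count of 2, while the already parameterized query is not reported.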

Check Your Understanding

Checkpoint Duration Threshold

In Aura, what checkpoint duration value warrants investigation?

❏ Over 1 minute
❏ Over 5 minutes
✓ Over 10 minutes
❏ Over 30 minutes

Hint

Checkpoints typically run every 15 minutes or every 100,000 transactions. Normal checkpoints take several seconds to several minutes. Aura has a lower threshold than self-managed Neo4j.

Solution

Over 10 minutes is correct for Aura.

Checkpoint duration consistently over 10 minutes in Aura indicates potential issues such as heavy write load or storage I/O performance problems. This threshold signals that the checkpoint process is taking significantly longer than the expected several seconds to several minutes.

Over 1 minute and over 5 minutes are within normal ranges for checkpoints. Over 30 minutes is the threshold for self-managed Neo4j deployments, but Aura uses a lower 10-minute threshold due to the managed infrastructure.

Replan Event Causes

What is the primary cause of consistently high replan rates?

❏ Too many schema changes
✓ Queries not using parameters
❏ Insufficient memory
❏ Heavy write load

Hint

Consider how query plan caching works and what prevents plan reuse.

Solution

Queries not using parameters is correct.

When queries use literal values instead of parameters, each unique value triggers a new execution plan. This prevents Neo4j from reusing cached plans and causes unnecessary replanning. Using parameterized queries allows the database to reuse the same plan for all executions.

Too many schema changes cause occasional spikes, not consistent high rates. Insufficient memory affects query execution but not plan caching. Heavy write load may trigger more frequent checkpoints but doesn’t directly cause replan events.

Summary

Checkpoints flush pending updates from memory to disk, creating recovery points that enable faster database restarts. They typically run every 15 minutes or every 100,000 transactions, whichever comes first, and usually take from several seconds to several minutes. As a general guideline, duration consistently over 10 minutes in Aura warrants investigation (30 minutes for self-managed Neo4j).

Replan events occur when Neo4j recreates query execution plans as database statistics change or when new queries are executed. High replan rates often indicate queries using literal values instead of parameters, which can be optimized as covered earlier in the course.

Both checkpoints and replan events are normal operations that help maintain database health and performance.
