Introduction
You have learned how to monitor instance-level metrics like heap memory and garbage collection. Storage metrics are equally critical: they reveal whether your database can handle your data volume and help you manage costs.
In this lesson, you will learn how to monitor database store size and identify when compaction is required.
Understanding store size
Aura tracks two storage metrics that help you monitor database growth.
Allocated space is the total disk space reserved for your database. This includes active data, deleted data not yet reclaimed, and internal database structures. Neo4j allocates space in chunks for efficiency, so this value may be larger than your actual data.
Used space is the portion of allocated space that contains active data. This represents the actual size of your graph data.
The expected pattern is for allocated space to be slightly larger than used space due to how Neo4j manages storage allocation.
Monitoring store size
Monitor the allocated space and used space metrics to understand your storage usage.
A small difference between allocated and used space is normal. When you delete data, space is marked as available but not immediately reclaimed. Neo4j also allocates space in advance for efficiency.
A large difference indicates significant storage waste. This typically occurs after major data deletions where properties, nodes, or relationships have been removed. The space remains allocated but unused.
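The comparison between the two metrics can be expressed as a small calculation. This is a minimal sketch that assumes you have already read the allocated and used values (in bytes) from the Aura metrics. The function names and the 25% threshold are hypothetical illustrations. Aura does not define a cutoff between a "small" and a "large" difference, so choose one that suits your workload.

```python
def storage_gap(allocated_bytes: int, used_bytes: int) -> tuple[int, float]:
    """Return unused allocated space in bytes and as a fraction of allocated space."""
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    if not 0 <= used_bytes <= allocated_bytes:
        raise ValueError("used_bytes must be between 0 and allocated_bytes")
    unused = allocated_bytes - used_bytes
    return unused, unused / allocated_bytes


# Hypothetical threshold, not an Aura setting: treat a gap of 25% or more as "large".
COMPACTION_GAP_THRESHOLD = 0.25


def compaction_worth_considering(allocated_bytes: int, used_bytes: int,
                                 threshold: float = COMPACTION_GAP_THRESHOLD) -> bool:
    """True if the unused share of allocated space meets the chosen threshold."""
    _, ratio = storage_gap(allocated_bytes, used_bytes)
    return ratio >= threshold


GB = 1024 ** 3
print(storage_gap(10 * GB, 9 * GB))                   # small gap: normal allocation slack
print(compaction_worth_considering(20 * GB, 9 * GB))  # large gap after bulk deletions
```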
If allocated space keeps growing while used space stays stable, data is being deleted and replaced, and the freed space is not being reused. As a result, the gap between the two metrics widens. This pattern indicates that compaction would be beneficial.
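To detect this trend, you can compare an early and a late sample of both metrics. This is a minimal sketch that assumes you collect periodic samples yourself. The `StoreSample` type and the 2% tolerance for "stable" used space are hypothetical choices, not part of Aura.

```python
from dataclasses import dataclass


@dataclass
class StoreSample:
    allocated_bytes: int
    used_bytes: int


def allocated_growing_used_flat(samples: list[StoreSample],
                                used_tolerance: float = 0.02) -> bool:
    """True if allocated space grew between the first and last sample
    while used space changed by no more than used_tolerance (relative)."""
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    allocated_grew = last.allocated_bytes > first.allocated_bytes
    # Relative change in used space; max(..., 1) avoids division by zero.
    used_change = abs(last.used_bytes - first.used_bytes) / max(first.used_bytes, 1)
    return allocated_grew and used_change <= used_tolerance


GB = 1024 ** 3
history = [StoreSample(10 * GB, 8 * GB), StoreSample(11 * GB, 8 * GB), StoreSample(12 * GB, 8 * GB)]
print(allocated_growing_used_flat(history))  # allocated grew, used flat
```

Comparing only the first and last samples keeps the sketch simple. A fuller check might fit a trend line across all samples to smooth out noise.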
Database compaction reclaims unused space after deletions. Compaction creates a new copy containing only your active data, which leaves the unused allocated space behind.
Consider compaction when you notice a significant difference between allocated and used space. This commonly happens after large data cleanup operations or when storage costs are a concern.
To compact your Aura database, export a snapshot and restore it to a new instance, or use Neo4j Desktop following the compaction guide.
After compaction, used space remains the same while allocated space shrinks to closely match it. This reclaims wasted storage and may reduce costs.
Compaction requires downtime
Plan compaction during maintenance windows as it requires taking the database offline.
Planning for storage growth
Monitor storage growth over time to plan capacity. To calculate your growth rate, take the difference in store size between two points in time and divide it by the time elapsed between them.
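The growth-rate calculation can be written as a short function. This is a minimal sketch: the function name is hypothetical, and the store sizes and dates are illustrative inputs that you would read from your own monitoring history.

```python
from datetime import datetime


def growth_rate_per_day(size_start: float, time_start: datetime,
                        size_end: float, time_end: datetime) -> float:
    """Average growth in size units per day between two measurements."""
    days = (time_end - time_start).total_seconds() / 86400
    if days <= 0:
        raise ValueError("time_end must be later than time_start")
    return (size_end - size_start) / days


# Example: 40 GB on 1 January, 46 GB thirty days later.
rate = growth_rate_per_day(40, datetime(2024, 1, 1), 46, datetime(2024, 1, 31))
print(f"{rate:.2f} GB/day")
```

Using only two points gives an average rate. If growth is uneven, measuring over a longer window gives a more representative figure.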
Upgrade your instance before reaching storage limits. Leave a buffer of up to 30% for unexpected growth and operational overhead.
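You can combine a growth rate with a buffer to estimate how long you have before an upgrade is due. This is a minimal sketch that assumes you know your instance's storage limit and have a growth rate in the same units. The function name is hypothetical, and the 30% default only mirrors the upper end of the buffer suggested above.

```python
import math


def days_until_upgrade(current: float, limit: float, growth_per_day: float,
                       buffer_fraction: float = 0.30) -> float:
    """Days until store size reaches the limit minus the reserved buffer."""
    if not 0 <= buffer_fraction < 1:
        raise ValueError("buffer_fraction must be in [0, 1)")
    threshold = limit * (1 - buffer_fraction)  # size at which you should upgrade
    if current >= threshold:
        return 0.0  # already inside the buffer: upgrade now
    if growth_per_day <= 0:
        return math.inf  # not growing, so no projected date
    return (threshold - current) / growth_per_day


# Example: 40 GB used, 64 GB limit, growing 0.2 GB/day, 30% buffer.
print(round(days_until_upgrade(40, 64, 0.2), 1))
```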
Consider optimizing your data model before scaling. Review whether you can archive old data, remove unnecessary properties, or use more efficient data types.
Vector embeddings and vector indexes are particularly storage-intensive, so they are worth reviewing when you optimize your data model.
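A rough calculation shows why embeddings add up quickly. This sketch estimates only the raw size of the embedding values (nodes × dimensions × bytes per value). It assumes 4 bytes per value, as for 32-bit floats; 64-bit floats would double the figure. It deliberately ignores property-store overhead and the additional space used by a vector index, so actual usage will be higher. The function name and inputs are hypothetical.

```python
def raw_embedding_bytes(node_count: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the embedding values alone, excluding storage and index overhead."""
    return node_count * dimensions * bytes_per_value


# Example: one million nodes, each with a 1536-dimension embedding.
size = raw_embedding_bytes(1_000_000, 1536)
print(f"{size:,} bytes (~{size / 1024**3:.1f} GiB) before index overhead")
```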
Check Your Understanding
When to compact
What indicates that database compaction would be beneficial?
- ❏ Allocated and used space are equal
- ❏ Allocated space is slightly larger than used space
- ✓ Allocated space is significantly larger than used space after major deletions
- ❏ Used space is growing steadily over time
Hint
Consider when space becomes wasted rather than just allocated.
Solution
The correct answer is: allocated space is significantly larger than used space after major deletions.
A large difference indicates storage waste from deleted data that hasn’t been reclaimed. Compaction would reclaim this unused space and potentially reduce storage costs.
Equal spaces would indicate no waste. Slightly larger allocated space is normal due to Neo4j’s allocation strategy. Growing used space indicates data growth, not waste.
Summary
Aura tracks allocated space and used space to help you monitor storage. A small difference is normal due to how Neo4j allocates space, but a large difference after deletions indicates compaction would reclaim wasted storage. Monitor growth patterns to plan capacity and upgrade before reaching limits.
In the next lesson, you will learn about query rate and latency metrics at the database level.