Monitoring Query Rate and Latency

Introduction

Query performance directly affects user experience. Slow queries frustrate users, while failed queries may indicate application problems or resource constraints.

In this lesson, you will learn how to monitor query rates, failed queries, and query latency percentiles.

Understanding Query Metrics

Aura provides two types of query metrics at the database level to help you monitor performance.

Query rate shows how many queries execute per minute. This includes both successful queries and failed queries. Query rate helps you understand database load and identify unusual patterns like sudden spikes or drops.

Query latency shows how long queries take to execute, measured in percentiles. The 50th percentile, or the median query time shows typical performance, the 75th percentile shows how most queries perform, and the 99th percentile captures your slowest queries.

Monitoring Query Performance

Monitor both query rate and latency together to understand database health.

As a general guideline, failed queries should be very low, typically less than 1% of total queries. High failed query rates indicate application errors, syntax problems, permission issues, or resource constraints causing timeouts. Review query logs to identify failing queries and their root causes.

A stable or gradually increasing query rate with low latency indicates a healthy database. Sudden changes in query rate may indicate new deployments, application issues, or connectivity problems.

For query latency, a low 50th percentile, for example under 100ms, generally indicates typical queries are fast. If the 99th percentile is much higher than the median, specific queries need optimization rather than scaling the instance. A wide spread between percentiles indicates inconsistent query performance.

Interpreting Query Patterns

Different patterns reveal different issues and require different actions.

High query rate with increasing latency indicates resource pressure. The database may be handling more load than it can process efficiently. Review CPU, memory, and page cache metrics to identify bottlenecks.

Low failed queries with high 99th percentile latency indicates specific slow queries need optimization. Most queries work well, but outliers affect user experience. Use query logs to identify and optimize the slowest queries.

Increasing failed queries over time indicates growing problems. Resource pressure may cause query timeouts, or application issues may be spreading. Investigate failed queries immediately to prevent further degradation.

When You Should Optimize or Scale

Use query metrics to decide between optimization and scaling.

If the 99th percentile latency is very high, as a guideline more than 10x the median, you should optimize specific queries first. Scaling won’t fix poorly written queries. Review query logs for duration above your threshold and add indexes or rewrite queries.

If all percentiles are high, you have systemic performance issues. Review all instance metrics including CPU, memory, and page cache before scaling. The instance may be undersized for your workload.

Monitor query metrics before and after making changes to validate improvements. Set latency targets based on your specific use case, for example under 100ms for user-facing queries or under 5 seconds for reports. These thresholds will vary depending on your application requirements and user expectations.

Check Your Understanding

Understanding Latency Percentiles

What does a large gap between the 50th percentile and 99th percentile latency indicate?

  • ❏ The database needs more memory

  • ❏ Network connectivity is unreliable

  • ✓ Specific slow queries need optimization rather than scaling

  • ❏ All queries are performing poorly

hint

Consider what it means when most queries are fast but the slowest ones are much slower.

solution

Specific slow queries need optimization rather than scaling is correct.

A large gap indicates most queries execute quickly (low median) but a small percentage are very slow (high 99th percentile). This pattern shows the instance handles queries well overall, but specific queries are outliers needing optimization. Scaling won’t fix inefficient queries.

More memory would affect all queries uniformly. Network issues would impact all percentiles. All queries performing poorly would show high values across all percentiles.

Summary

Query rate and latency metrics help you monitor database performance. Failed queries should be very low, and the 99th percentile latency reveals slow queries that impact user experience. A large gap between median and 99th percentile indicates specific queries need optimization rather than scaling.

In the next lesson, you will learn about monitoring transactions and identifying transaction issues.

Chatbot

How can I help you today?