Query Profile Structure and Metrics
Overview
A Query Profile is a detailed report that provides insights into the execution of a SQL query within StarRocks. It offers a comprehensive view of the query's performance, including the time spent on each operation, the amount of data processed, and other relevant metrics. This information is invaluable for optimizing query performance, identifying bottlenecks, and troubleshooting issues.
Most real-world slow queries can be diagnosed by spotting one of three red-flag metrics. This cheat sheet gets you there before you drown in numbers.
Quick-Start
Profile a recent query:
1. List recent query IDs
A query ID is needed to analyze a query profile. Run:

```sql
SHOW PROFILELIST;
```

`SHOW PROFILELIST` is detailed in Text-based Query Profile Visualized Analysis. See that page if you are getting started.
2. Open the profile side-by-side with your SQL

Run:

```sql
ANALYZE PROFILE FOR <query_id>\G
```

or click Profile in the CelerData Web UI.
3. Skim the “Execution Overview” banner
Examine key metrics for overall execution performance:
- QueryExecutionWallTime: total wall-clock time of query execution
- QueryPeakMemoryUsagePerNode: peak memory usage per node; values exceeding 80% of BE memory indicate a risk of data spill or Out-of-Memory (OOM) errors
- QueryCumulativeCpuTime / QueryExecutionWallTime < 0.5 × num_cpu_cores means the CPU is waiting (likely on I/O or network)
If none fire, your query is usually fine—stop here.
4. Drill one level deeper
Identify the operators that consume the most time or the most memory, analyze their metrics, and determine the underlying cause to pinpoint performance bottlenecks.
The "Operator Metrics" section offers numerous guidelines to aid in identifying the root cause of performance issues.
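The step-3 red flags can be sketched as a quick programmatic check. This is a minimal illustration: the function, argument names, and thresholds are assumptions of the sketch, not part of the profile format.

```python
def red_flags(wall_time_s, peak_mem_bytes, be_mem_limit_bytes,
              cumulative_cpu_s, num_cpu_cores, spill_bytes):
    """Return which of the quick-start red-flag heuristics fire."""
    flags = []
    # Red flag 1: peak memory per node above 80% of BE memory -> spill/OOM risk.
    if peak_mem_bytes > 0.8 * be_mem_limit_bytes:
        flags.append("memory")
    # Red flag 2: CPU time well below wall time * cores -> CPU is waiting,
    # likely on I/O or network.
    if cumulative_cpu_s / wall_time_s < 0.5 * num_cpu_cores:
        flags.append("cpu-waiting")
    # Red flag 3: noticeable spill to disk (QuerySpillBytes > 1 GB).
    if spill_bytes > 1 << 30:
        flags.append("spill")
    return flags
```

If the returned list is empty, the query is usually fine and you can stop the investigation at the overview level.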
Core Concepts
Query Execution Flow
The comprehensive execution flow of a SQL query involves the following stages:
- Planning: The query undergoes parsing, analysis, and optimization, culminating in the generation of a query plan.
- Scheduling: The scheduler and coordinator work together to distribute the query plan to all participating backend nodes.
- Execution: The query plan is executed using the pipeline execution engine.
Query Plan Structure
The StarRocks execution engine is designed to execute queries in a distributed manner, and the structure of a Query Profile reflects this design. The following components make up the distributed query plan:
- Fragment: The highest level of the execution tree, representing a logical unit of work. A query can be divided into one or more fragments.
- FragmentInstance: Each fragment is instantiated multiple times, with each instance (FragmentInstance) executed on a different computing node. This allows for parallel processing across nodes.
- Pipeline: A FragmentInstance is further divided into multiple pipelines, which are sequences of connected Operator instances. Pipelines define the execution path for a FragmentInstance.
- PipelineDriver: To maximize the utilization of computing resources, each pipeline can have multiple instances, known as PipelineDrivers. These drivers execute the pipeline in parallel, leveraging multiple computing cores.
- Operator: The fundamental execution unit, an Operator instance is a part of a PipelineDriver. Operators implement specific algorithms, such as aggregation, join, or scan, to process data.
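The containment hierarchy above can be sketched with a few plain data types. The class and field names are illustrative only, not the engine's actual types:

```python
from dataclasses import dataclass

@dataclass
class Operator:
    """Fundamental execution unit (e.g., scan, join, aggregation)."""
    name: str

@dataclass
class Pipeline:
    """Sequence of connected operators forming one execution path."""
    operators: list

@dataclass
class PipelineDriver:
    """One parallel execution instance of a pipeline."""
    pipeline: Pipeline

@dataclass
class FragmentInstance:
    """One fragment instantiated on one computing node."""
    pipelines: list

@dataclass
class Fragment:
    """Logical unit of the distributed plan; the top of the execution tree."""
    instances: list

# A query plan is then a list of Fragments fanned out across nodes.
plan = [Fragment([FragmentInstance([Pipeline([Operator("OLAP_SCAN"),
                                              Operator("AGGREGATE")])])])]
```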
Pipeline Execution Engine Concepts
The Pipeline Engine is a key component of the StarRocks execution engine. It is responsible for executing the query plan in a parallel and efficient manner. The Pipeline Engine is designed to handle complex query plans and large volumes of data, ensuring high performance and scalability.
Key concepts in the Pipeline Engine:
- Operator: A fundamental unit of execution responsible for implementing specific algorithms (e.g., aggregation, join, scan)
- Pipeline: A sequence of connected Operator instances representing the execution path
- PipelineDriver: Multiple instances of a pipeline for parallel execution
- Schedule: Non-blocking scheduling of pipelines using user-space time-slicing
Metric Merging Strategy
By default, StarRocks merges the FragmentInstance and PipelineDriver layers to reduce profile volume, resulting in a simplified three-layer structure:
- Fragment
- Pipeline
- Operator
You can control this merging behavior through the session variable `pipeline_profile_level`:

- `1` (default): merged three-layer structure
- `2`: original five-layer structure
- Other values: treated as `1`

When merging metrics, different strategies are used based on metric type:

- Time-related metrics: take the average. For example, `OperatorTotalTime` is the average time consumption across instances, while `__MAX_OF_OperatorTotalTime` and `__MIN_OF_OperatorTotalTime` record the extremes.
- Non-time-related metrics: sum the values. For example, `PullChunkNum` is the sum across all instances, while `__MAX_OF_PullChunkNum` and `__MIN_OF_PullChunkNum` record the extremes.
- Constant metrics: the same value across all instances (e.g., `DegreeOfParallelism`).
Significant differences between MIN and MAX values often indicate data skew, particularly in aggregation and join operations.
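A minimal sketch of these merging rules, assuming per-instance metric values have already been collected; the helper function and the 2x skew cutoff are illustrative, not a StarRocks API:

```python
def merge_metric(values, is_time_metric):
    """Merge one metric across instances the way the profile merger does:
    average time metrics, sum everything else, and keep the extremes
    (reported as __MIN_OF_* / __MAX_OF_* companion metrics)."""
    merged = sum(values) / len(values) if is_time_metric else sum(values)
    return {"value": merged, "min": min(values), "max": max(values)}

# OperatorTotalTime (ms) across 4 instances: one straggler stands out.
t = merge_metric([10, 12, 11, 95], is_time_metric=True)

# A large max/min gap hints at data skew (2x here is an arbitrary cutoff).
skewed = t["max"] > 2 * t["min"]
```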
Query Profile Metrics
Summary Metrics
Basic information about the query execution:
| Metric | Description |
| --- | --- |
| Total | Total time consumed by the query, including the Planning, Executing, and Profiling phase durations. |
| Query State | Query state; possible states are Finished, Error, and Running. |
| Query ID | Unique identifier for the query. |
| Start Time | Timestamp when the query started. |
| End Time | Timestamp when the query ended. |
| Query Type | Type of the query. |
| StarRocks Version | Version of StarRocks used. |
| User | User who executed the query. |
| Default Db | Default database used for the query. |
| Sql Statement | SQL statement executed. |
| Variables | Important variables used for the query. |
| NonDefaultSessionVariables | Non-default session variables used for the query. |
| Collect Profile Time | Time taken to collect the profile. |
| IsProfileAsync | Whether profile collection was asynchronous. |
Planner Metrics
This section provides a comprehensive overview of the planner. Typically, if the total time spent in the planner is less than 10 ms, it is not a cause for concern.
In certain scenarios, the planner may require more time:
- Complex queries may necessitate additional time for parsing and optimization to ensure an optimal execution plan.
- The presence of numerous materialized views can increase the time required for query rewriting.
- When multiple concurrent queries exhaust system resources and the query queue is utilized, the `Pending` time may be prolonged.
- Queries involving external tables may incur additional time for communication with the external metadata server.
Example:

```
- -- Parser[1] 0
- -- Total[1] 3ms
    - -- Analyzer[1] 0
        - -- Lock[1] 0
        - -- AnalyzeDatabase[1] 0
        - -- AnalyzeTemporaryTable[1] 0
        - -- AnalyzeTable[1] 0
    - -- Transformer[1] 0
    - -- Optimizer[1] 1ms
        - -- MVPreprocess[1] 0
        - -- MVTextRewrite[1] 0
        - -- RuleBaseOptimize[1] 0
        - -- CostBaseOptimize[1] 0
        - -- PhysicalRewrite[1] 0
        - -- DynamicRewrite[1] 0
        - -- PlanValidate[1] 0
            - -- InputDependenciesChecker[1] 0
            - -- TypeChecker[1] 0
            - -- CTEUniqueChecker[1] 0
            - -- ColumnReuseChecker[1] 0
    - -- ExecPlanBuild[1] 0
- -- Pending[1] 0
- -- Prepare[1] 0
- -- Deploy[1] 2ms
    - -- DeployLockInternalTime[1] 2ms
        - -- DeploySerializeConcurrencyTime[2] 0
        - -- DeployStageByStageTime[6] 0
        - -- DeployWaitTime[6] 1ms
        - -- DeployAsyncSendTime[2] 0
- DeployDataSize: 10916
  Reason:
```
Execution Overview Metrics
High-level execution statistics:
| Metric | Description | Rule of Thumb |
| --- | --- | --- |
| FrontendProfileMergeTime | FE-side profile processing time | < 10ms is normal |
| QueryAllocatedMemoryUsage | Total allocated memory across nodes | |
| QueryDeallocatedMemoryUsage | Total deallocated memory across nodes | |
| QueryPeakMemoryUsagePerNode | Maximum peak memory per node | < 80% of capacity is normal |
| QuerySumMemoryUsage | Total peak memory across nodes | |
| QueryExecutionWallTime | Wall time of execution | |
| QueryCumulativeCpuTime | Total CPU time across nodes | Compare with walltime * totalCpuCores |
| QueryCumulativeOperatorTime | Total operator execution time | Denominator for operator time percentages |
| QueryCumulativeNetworkTime | Total Exchange node network time | |
| QueryCumulativeScanTime | Total Scan node IO time | |
| QueryPeakScheduleTime | Maximum Pipeline ScheduleTime | < 1s is normal for simple queries |
| QuerySpillBytes | Data spilled to disk | < 1GB is normal |
Fragment Metrics
Fragment-level execution details:
| Metric | Description |
| --- | --- |
| InstanceNum | Number of FragmentInstances |
| InstanceIds | IDs of all FragmentInstances |
| BackendNum | Number of participating BEs |
| BackendAddresses | BE addresses |
| FragmentInstancePrepareTime | Duration of the Fragment Prepare phase |
| InstanceAllocatedMemoryUsage | Total memory allocated by instances |
| InstanceDeallocatedMemoryUsage | Total memory deallocated by instances |
| InstancePeakMemoryUsage | Peak memory usage across instances |
Pipeline Metrics
Pipeline execution details and relationships:
Key relationships:
- DriverTotalTime = ActiveTime + PendingTime + ScheduleTime
- ActiveTime = ∑ OperatorTotalTime + OverheadTime
- PendingTime = InputEmptyTime + OutputFullTime + PreconditionBlockTime + PendingFinishTime
- InputEmptyTime = FirstInputEmptyTime + FollowupInputEmptyTime
| Metric | Description |
| --- | --- |
| DegreeOfParallelism | Degree of pipeline execution parallelism. |
| TotalDegreeOfParallelism | Sum of degrees of parallelism. Because the same Pipeline may execute on multiple machines, this item aggregates all values. |
| DriverPrepareTime | Time taken by the Prepare phase. This metric is not included in DriverTotalTime. |
| DriverTotalTime | Total execution time of the Pipeline, excluding the time spent in the Prepare phase. |
| ActiveTime | Execution time of the Pipeline, including the execution time of each operator and overall framework overhead, such as time spent invoking methods like has_output and need_input. |
| PendingTime | Time the Pipeline is blocked from being scheduled for various reasons. |
| InputEmptyTime | Time the Pipeline is blocked because its input queue is empty. |
| FirstInputEmptyTime | Time the Pipeline is first blocked because its input queue is empty. The first blocking period is calculated separately because it is mainly caused by Pipeline dependencies. |
| FollowupInputEmptyTime | Time the Pipeline is subsequently blocked because its input queue is empty. |
| OutputFullTime | Time the Pipeline is blocked because its output queue is full. |
| PreconditionBlockTime | Time the Pipeline is blocked due to unmet dependencies. |
| PendingFinishTime | Time the Pipeline is blocked waiting for asynchronous tasks to finish. |
| ScheduleTime | Scheduling time of the Pipeline, from entering the ready queue to being scheduled for execution. |
| BlockByInputEmpty | Number of times the pipeline is blocked due to InputEmpty. |
| BlockByOutputFull | Number of times the pipeline is blocked due to OutputFull. |
| BlockByPrecondition | Number of times the pipeline is blocked due to unmet preconditions. |
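The key relationships listed above can be verified term by term. A small sketch with made-up millisecond values (all numbers are illustrative):

```python
# Hypothetical per-pipeline timings in milliseconds.
operator_total_times = [40, 25, 15]   # OperatorTotalTime of each operator
overhead_time = 5                     # framework overhead (has_output, need_input, ...)

# ActiveTime = sum of OperatorTotalTime + OverheadTime
active_time = sum(operator_total_times) + overhead_time

# InputEmptyTime = FirstInputEmptyTime + FollowupInputEmptyTime
first_input_empty_time = 3
followup_input_empty_time = 7
input_empty_time = first_input_empty_time + followup_input_empty_time

# PendingTime = InputEmptyTime + OutputFullTime + PreconditionBlockTime + PendingFinishTime
output_full_time = 4
precondition_block_time = 0
pending_finish_time = 1
pending_time = (input_empty_time + output_full_time
                + precondition_block_time + pending_finish_time)

# DriverTotalTime = ActiveTime + PendingTime + ScheduleTime
schedule_time = 5
driver_total_time = active_time + pending_time + schedule_time
```

Note that DriverPrepareTime sits outside these identities: it is reported separately and is not included in DriverTotalTime.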
Operator Metrics
| Metric | Description |
| --- | --- |
| PrepareTime | Time spent on preparation. |
| OperatorTotalTime | Total time consumed by the Operator. It satisfies the equation: OperatorTotalTime = PullTotalTime + PushTotalTime + SetFinishingTime + SetFinishedTime + CloseTime. It excludes time spent on preparation. |
| PullTotalTime | Total time the Operator spends executing pull_chunk. |
| PushTotalTime | Total time the Operator spends executing push_chunk. |
| SetFinishingTime | Total time the Operator spends executing set_finishing. |
| SetFinishedTime | Total time the Operator spends executing set_finished. |
| PushRowNum | Cumulative number of input rows to the Operator. |
| PullRowNum | Cumulative number of output rows from the Operator. |
| JoinRuntimeFilterEvaluate | Number of times the Join Runtime Filter is evaluated. |
| JoinRuntimeFilterHashTime | Time spent computing hashes for the Join Runtime Filter. |
| JoinRuntimeFilterInputRows | Number of input rows to the Join Runtime Filter. |
| JoinRuntimeFilterOutputRows | Number of output rows from the Join Runtime Filter. |
| JoinRuntimeFilterTime | Time spent on the Join Runtime Filter. |