Functions Reference¶
This page provides a complete reference to all SQL functions available in Micromegas queries, including both standard DataFusion functions and Micromegas-specific extensions.
Micromegas Extensions¶
Table Functions¶
Table functions return tables that can be used in FROM clauses.
view_instance(view_name, identifier)¶
Creates a process or stream-scoped view instance for better performance.
Syntax:
Parameters:
-
view_name(Utf8): Name of the view ('log_entries', 'measures', 'thread_spans', 'async_events') -
identifier(Utf8): Process ID (for most views) or Stream ID (for thread_spans)
Returns: Schema depends on the view type (see Schema Reference)
Examples:
-- Get logs for a specific process
SELECT time, level, msg
FROM view_instance('log_entries', 'my_process_123')
WHERE level <= 3;
-- Get spans for a specific stream
SELECT name, duration
FROM view_instance('thread_spans', 'stream_456')
WHERE duration > 1000000; -- > 1ms
list_partitions() 🔧¶
Administrative Function - Lists available data partitions in the lakehouse with metadata including file paths, sizes, and schema hashes.
See Admin Functions Reference for details.
retire_partitions(view_set_name, view_instance_id, begin_insert_time, end_insert_time) 🔧¶
Administrative Function - Retires data partitions from the lakehouse for a specified time range.
⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.
materialize_partitions(view_name, begin_insert_time, end_insert_time, partition_delta_seconds) 🔧¶
Administrative Function - Materializes data partitions for a view over a specified time range.
See Admin Functions Reference for details.
list_view_sets() 🔧¶
Administrative Function - Lists all available view sets with their current schema information.
See Admin Functions Reference for details.
retire_partition_by_metadata(view_set_name, view_instance_id, begin_insert_time, end_insert_time) 🔧¶
Administrative Function - Retires a single partition by its metadata identifiers.
⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.
retire_partition_by_file(file_path) 🔧¶
Administrative Function - Retires a single partition by file path. Prefer retire_partition_by_metadata() for new code.
⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.
delete_duplicate_processes() 🔧¶
Administrative Function - Deletes duplicate processes within the query time range. Keeps the earliest entry per process_id.
⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.
delete_duplicate_streams() 🔧¶
Administrative Function - Deletes duplicate streams within the query time range. Keeps the earliest entry per stream_id.
⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.
delete_duplicate_blocks() 🔧¶
Administrative Function - Deletes duplicate blocks within the query time range. Keeps the earliest entry per block_id.
⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.
perfetto_trace_chunks(process_id, span_types, start_time, end_time)¶
Generates Perfetto trace chunks from process telemetry data for visualization and performance analysis.
Syntax:
SELECT chunk_id, chunk_data
FROM perfetto_trace_chunks(process_id, span_types, start_time, end_time)
ORDER BY chunk_id
Parameters:
-
process_id(Utf8): Process UUID to generate trace for -
span_types(Utf8): Type of spans to include:'thread','async', or'both' -
start_time(Timestamp): Start time for trace data (UTC timestamp) -
end_time(Timestamp): End time for trace data (UTC timestamp)
Returns:
| Column | Type | Description |
|---|---|---|
| chunk_id | Int32 | Sequential chunk identifier |
| chunk_data | Binary | Binary protobuf TracePacket data |
Examples:
-- Generate trace for thread spans only
SELECT chunk_id, chunk_data
FROM perfetto_trace_chunks(
'process-uuid-123',
'thread',
TIMESTAMP '2024-01-01T00:00:00Z',
TIMESTAMP '2024-01-01T01:00:00Z'
)
ORDER BY chunk_id;
-- Generate trace for both thread and async spans
SELECT chunk_id, chunk_data
FROM perfetto_trace_chunks(
'my-process-id',
'both',
NOW() - INTERVAL '1 hour',
NOW()
)
ORDER BY chunk_id;
Note: The returned binary data is in Perfetto protobuf format and can be loaded directly into the Perfetto UI for visualization and analysis.
process_spans(process_id, types)¶
Returns thread spans, async spans, or both from a process, with stream_id and thread_name columns prepended. For async spans, stream_id is empty and thread_name is 'async'.
Syntax:
-- Thread spans only
SELECT * FROM process_spans('process-uuid', 'thread')
-- Async spans only
SELECT * FROM process_spans('process-uuid', 'async')
-- Both combined
SELECT name, begin, end, depth, thread_name as lane
FROM process_spans('process-uuid', 'both')
ORDER BY lane, begin
Parameters:
process_id(Utf8): Process UUID to querytypes(Utf8):'thread','async', or'both'
Note: The time range is provided out of band via the query's begin/end parameters, not as function arguments.
Returns: Same schema as thread_spans with two additional leading columns:
| Column | Type | Description |
|---|---|---|
| stream_id | Dictionary(Int16, Utf8) | Stream identifier (empty for async) |
| thread_name | Dictionary(Int16, Utf8) | Thread display name ('async' for async spans) |
| id | Int64 | Span identifier |
| parent | Int64 | Parent span identifier |
| depth | UInt32 | Nesting depth |
| hash | UInt32 | Span hash |
| begin | Timestamp(Nanosecond) | Span start time |
| end | Timestamp(Nanosecond) | Span end time |
| duration | Int64 | Duration in nanoseconds |
| name | Dictionary(Int16, Utf8) | Span name (function) |
| target | Dictionary(Int16, Utf8) | Module/target |
| filename | Dictionary(Int16, Utf8) | Source file |
| line | UInt32 | Line number |
Examples:
-- Get all spans across threads for a process
SELECT stream_id, thread_name, name, duration
FROM process_spans('process-uuid-123', 'thread')
ORDER BY begin;
-- Analyze frame time per thread
SELECT thread_name, name, AVG(duration) / 1000000.0 as avg_ms
FROM process_spans('my-process-id', 'both')
WHERE depth = 0
GROUP BY thread_name, name
ORDER BY avg_ms DESC;
parse_block(block_id)¶
Parses transit-serialized objects from a block's payload and returns each object as a row with its type name and full content as JSONB. This provides a generic block inspection tool, independent of any specific view (logs, metrics, spans).
Syntax:
Parameters:
block_id(Utf8): UUID of the block to parse. Block IDs can be found in theblocksview.
Returns:
| Column | Type | Description |
|---|---|---|
| object_index | Int64 | Ordinal position within the block (global, starting from the block's object_offset) |
| type_name | Utf8 | Transit type name (e.g., "LogStringEvent", "BeginThreadSpanEvent") |
| value | Binary | Full object content as JSONB binary data |
Examples:
-- Find a block to inspect
SELECT block_id, nb_objects, "streams.tags"
FROM blocks
LIMIT 5;
-- Parse all objects in a block
SELECT object_index, type_name, jsonb_format_json(value)
FROM parse_block('550e8400-e29b-41d4-a716-446655440000');
-- Filter by object type
SELECT object_index, jsonb_format_json(value)
FROM parse_block('550e8400-e29b-41d4-a716-446655440000')
WHERE type_name LIKE 'Log%';
-- Extract a specific field from objects
SELECT object_index, type_name,
jsonb_as_string(jsonb_get(value, 'msg')) as msg
FROM parse_block('550e8400-e29b-41d4-a716-446655440000')
WHERE type_name = 'LogStringInteropEvent'
LIMIT 10;
Notes:
- The
valuecolumn contains JSONB-encoded objects. Each object includes a__typefield with the transit type name, which is especially useful for inspecting nested objects. - When a
LIMITis used without filters, the function stops parsing early for efficiency. When filters are present, all objects are materialized first so DataFusion can apply the filter. - Use with JSONB functions like
jsonb_get,jsonb_format_json, andjsonb_as_stringto extract and display object contents.
Scalar Functions¶
JSON/JSONB Functions¶
Micromegas provides functions for working with JSON data stored in binary JSONB format for efficient storage and querying.
jsonb_parse(json_string)¶
Parses a JSON string into binary JSONB format.
Syntax:
Parameters:
-
json_string(Multiple formats supported): JSON string to parse: -
Utf8- Plain string Dictionary<Int32, Utf8>- Dictionary-encoded string
Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB data
Example:
-- Parse JSON string into JSONB
SELECT jsonb_parse('{"name": "web_server", "port": 8080}') as parsed_json
FROM processes;
jsonb_path_query_first(jsonb, path)¶
Returns the first match of a JSONPath expression on a JSONB value, or NULL if no match is found.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB value in any of these formats: -
Binary- Plain JSONB binary -
Dictionary<Int32, Binary>- Dictionary-encoded JSONB -
path(Utf8): A JSONPath expression string (e.g.,$.store.book[0].title)
Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB value of the first match, or NULL if no match
Examples:
-- Extract a nested value
SELECT jsonb_path_query_first(jsonb_parse('{"user": {"name": "Alice"}}'), '$.user.name') as name;
-- Returns: "Alice" (as JSONB)
-- Array index access
SELECT jsonb_path_query_first(jsonb_parse('{"items": [10, 20, 30]}'), '$.items[1]') as second;
-- Returns: 20
-- First wildcard match
SELECT jsonb_as_string(jsonb_path_query_first(data, '$.tags[0]')) as first_tag
FROM processes;
jsonb_path_query(jsonb, path)¶
Returns all matches of a JSONPath expression on a JSONB value as a JSONB array.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB value in any of these formats: -
Binary- Plain JSONB binary -
Dictionary<Int32, Binary>- Dictionary-encoded JSONB -
path(Utf8): A JSONPath expression string (e.g.,$.store.book[*].title)
Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB array containing all matched values, or an empty array if no match
Examples:
-- Extract all names from an array of objects
SELECT jsonb_path_query(jsonb_parse('{"users": [{"name": "Alice"}, {"name": "Bob"}]}'), '$.users[*].name') as names;
-- Returns: ["Alice", "Bob"]
-- All array elements
SELECT jsonb_path_query(jsonb_parse('[1, 2, 3]'), '$[*]') as all_items;
-- Returns: [1, 2, 3]
-- No match returns empty array
SELECT jsonb_path_query(jsonb_parse('{"a": 1}'), '$.missing') as result;
-- Returns: []
Filter Predicates in JSONPath¶
jsonb_path_query and jsonb_path_query_first support SQL/JSON path syntax for filter predicates. This differs from the JavaScript-style JSONPath syntax commonly found in online tutorials.
Key difference: Filters use ? () after a wildcard step, not [?()] inside brackets.
| Feature | JavaScript JSONPath (NOT supported) | SQL/JSON path (supported) |
|---|---|---|
| Filter in brackets | $.items[?(@.price < 10)] |
$.items[*] ? (@.price < 10) |
| String equality | [?(@.type=="human")] |
[*] ? (@.type == "human") |
| Logical AND | [?(@.a > 1 && @.b < 5)] |
[*] ? (@.a > 1 && @.b < 5) |
Examples:
-- Filter array elements by field value
SELECT jsonb_path_query(
jsonb_parse('{"items": [{"type": "active", "id": 1}, {"type": "inactive", "id": 2}]}'),
'$.items[*] ? (@.type == "active")'
) as active_items;
-- Returns: [{"id": 1, "type": "active"}]
-- Numeric comparison
SELECT jsonb_path_query(
jsonb_parse('{"scores": [{"name": "Alice", "val": 85}, {"name": "Bob", "val": 42}]}'),
'$.scores[*] ? (@.val > 50)'
) as high_scores;
-- Returns: [{"name": "Alice", "val": 85}]
-- Get first match with filter
SELECT jsonb_path_query_first(
jsonb_parse('{"users": [{"role": "admin", "name": "Alice"}, {"role": "user", "name": "Bob"}]}'),
'$.users[*] ? (@.role == "admin")'
) as first_admin;
-- Returns: {"name": "Alice", "role": "admin"}
-- Combined with jsonb_array_elements for row expansion
SELECT jsonb_as_string(jsonb_get(value, 'name')) as player_name
FROM jsonb_array_elements(
jsonb_path_query(msg_jsonb, '$.teams[*].players[*] ? (@.type == "human")')
)
Common mistake
Using JavaScript-style filter syntax like $.items[?(@.type=="active")] will result in a parse error. Always use the SQL/JSON style: $.items[*] ? (@.type == "active").
jsonb_get(jsonb, key)¶
Extracts a value from a JSONB object by key name.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB object in any of these formats: -
Binary- Plain JSONB binary -
Dictionary<Int32, Binary>- Dictionary-encoded JSONB -
key(Utf8): Key name to extract
Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB value or NULL if key not found
Example:
-- Extract name field from JSON data
SELECT jsonb_get(jsonb_parse('{"name": "web_server", "port": 8080}'), 'name') as name_value
FROM processes;
jsonb_format_json(jsonb)¶
Converts a JSONB value back to a human-readable JSON string.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB value in any of these formats: -
Dictionary<Int32, Binary>- Dictionary-encoded JSONB (default) Binary- Non-dictionary JSONB
Returns: Dictionary<Int32, Utf8> - Dictionary-encoded JSON string representation
Examples:
-- Format JSONB back to JSON string
SELECT jsonb_format_json(jsonb_parse('{"name": "web_server"}')) as json_string
FROM processes;
-- Works directly with dictionary-encoded properties
SELECT jsonb_format_json(properties_to_jsonb(properties)) as json_props
FROM log_entries;
-- Format property values as JSON
SELECT jsonb_format_json(properties) as json_string
FROM processes
WHERE properties IS NOT NULL;
jsonb_as_string(jsonb)¶
Casts a JSONB value to a string.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB value to convert: -
Binary- Plain JSONB binary Dictionary<Int32, Binary>- Dictionary-encoded JSONB
Returns: Dictionary<Int32, Utf8> - Dictionary-encoded string value or NULL if not a string
Example:
-- Extract string value from JSONB
SELECT jsonb_as_string(jsonb_get(jsonb_parse('{"service": "web_server"}'), 'service')) as service_name
FROM processes;
jsonb_as_f64(jsonb)¶
Casts a JSONB value to a 64-bit float.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB value to convert: -
Binary- Plain JSONB binary Dictionary<Int32, Binary>- Dictionary-encoded JSONB
Returns: Float64 - Numeric value or NULL if not a number
Example:
-- Extract numeric value from JSONB
SELECT jsonb_as_f64(jsonb_get(jsonb_parse('{"cpu_usage": 75.5}'), 'cpu_usage')) as cpu_usage
FROM processes;
jsonb_as_i64(jsonb)¶
Casts a JSONB value to a 64-bit integer.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB value to convert: -
Binary- Plain JSONB binary Dictionary<Int32, Binary>- Dictionary-encoded JSONB
Returns: Int64 - Integer value or NULL if not an integer
Example:
-- Extract integer value from JSONB
SELECT jsonb_as_i64(jsonb_get(jsonb_parse('{"port": 8080}'), 'port')) as port_number
FROM processes;
jsonb_object_keys(jsonb)¶
Returns the keys of a JSONB object as an array of strings.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB object: -
Binary- Plain JSONB binary Dictionary<Int32, Binary>- Dictionary-encoded JSONB
Returns: Dictionary<Int32, List<Utf8>> - Dictionary-encoded array of key names for memory efficiency (repeated key lists share the same dictionary entry), or NULL if input is not an object
Examples:
-- Get keys from a JSONB object
SELECT jsonb_object_keys(jsonb_parse('{"name": "server", "port": 8080}')) as keys;
-- Returns: ["name", "port"]
-- Get keys from process properties
SELECT jsonb_object_keys(properties) as prop_keys
FROM processes
LIMIT 5;
jsonb_array_length(jsonb)¶
Returns the number of elements in a JSONB array.
Syntax:
Parameters:
-
jsonb(Multiple formats supported): JSONB value: -
Binary- Plain JSONB binary Dictionary<Int32, Binary>- Dictionary-encoded JSONB
Returns: Int64 - The number of elements in the array, or NULL if the input is not an array
Examples:
-- Count elements in an array
SELECT jsonb_array_length(jsonb_parse('[1, 2, 3]')) as len;
-- Returns: 3
-- Empty array returns 0
SELECT jsonb_array_length(jsonb_parse('[]')) as len;
-- Returns: 0
-- Non-array input returns NULL
SELECT jsonb_array_length(jsonb_parse('{"key": "value"}')) as len;
-- Returns: NULL
-- Filter by array size
SELECT *
FROM events
WHERE jsonb_array_length(jsonb_get(msg_jsonb, 'items')) > 5;
jsonb_each(jsonb_value)¶
Expands a JSONB object or array into rows of key-value pairs. This is a table-returning function (UDTF) that produces one row per entry.
For objects, key is the field name. For arrays, key is the element index as a string ("0", "1", ...).
Syntax:
Parameters:
jsonb_value(Binary/JSONB): A JSONB object or array value, provided as a literal or a subquery returning a single JSONB column. If the subquery returns multiple rows, the entries from all rows are concatenated. Null values are skipped. Returns an error if the input is a scalar (e.g., number or string).
Returns:
| Column | Type | Description |
|---|---|---|
| key | Utf8 | Object field name, or array index as a string |
| value | Binary (JSONB) | Value as JSONB bytes, composable with jsonb_as_string, jsonb_format_json, etc. |
Examples:
-- Expand process properties into rows
SELECT key, jsonb_as_string(value) as value
FROM jsonb_each(
(SELECT properties FROM processes WHERE process_id = 'my_process_123')
)
-- Use with other JSONB functions for nested values
SELECT key, jsonb_format_json(value) as json_value
FROM jsonb_each(
(SELECT jsonb_parse('{"name": "server", "port": 8080, "tags": ["prod", "us-east"]}'))
)
-- Expand a JSONB array into rows
SELECT key as index, jsonb_format_json(value) as element
FROM jsonb_each(
(SELECT jsonb_parse('[10, 20, 30]'))
)
-- Returns: ("0", 10), ("1", 20), ("2", 30)
jsonb_array_elements(jsonb_value)¶
Expands a JSONB array into a set of rows, one per element. This is a table-returning function (UDTF) that produces one row per array element with a single value column.
Unlike jsonb_each, this function only accepts arrays (not objects) and does not produce a key column, making it more natural for array unnesting.
Syntax:
Parameters:
jsonb_value(Binary/JSONB): A JSONB array value, provided as a literal, subquery, or expression (e.g.,jsonb_path_query(...)). If a subquery returns multiple rows, the elements from all arrays are concatenated. Returns an error if the input is not a JSONB array.
Returns:
| Column | Type | Description |
|---|---|---|
| value | Binary (JSONB) | Array element as JSONB bytes, composable with jsonb_as_string, jsonb_get, jsonb_format_json, etc. |
Examples:
-- Unnest a simple array
SELECT jsonb_as_string(value) as val
FROM jsonb_array_elements(jsonb_parse('[1, 2, 3]'))
-- Unnest array of objects and extract a field
SELECT jsonb_as_string(jsonb_get(value, 'name')) as name
FROM jsonb_array_elements(jsonb_parse('[{"name": "Alice"}, {"name": "Bob"}]'))
-- Unnest from a subquery
SELECT jsonb_as_string(jsonb_get(value, 'profile_id')) as profile_id
FROM jsonb_array_elements((SELECT jsonb_path_query(msg_jsonb, '$.teams[*].players[*]') FROM events LIMIT 1))
WHERE jsonb_as_string(jsonb_get(value, 'type')) = 'human'
Data Access Functions¶
get_payload(process_id, stream_id, block_id)¶
Retrieves the raw binary payload of a telemetry block from data lake storage.
Syntax:
Parameters:
-
process_id(Utf8): Process identifier -
stream_id(Utf8): Stream identifier -
block_id(Utf8): Block identifier
Returns: Binary - Raw block payload data
Example:
-- Get raw payload data for specific blocks
SELECT process_id, stream_id, block_id, get_payload(process_id, stream_id, block_id) as payload
FROM blocks
WHERE insert_time >= NOW() - INTERVAL '1 hour'
LIMIT 10;
Note: This is an async function that fetches data from object storage. Use sparingly in queries as it can impact performance.
Property Functions¶
Micromegas provides specialized functions for working with property data, including efficient dictionary encoding for memory optimization.
property_get(properties, key)¶
Extracts a value from a properties map with automatic format detection and optimized performance for JSONB data.
Syntax:
Parameters:
-
properties(Multiple formats supported): Properties data in any of these formats:Dictionary<Int32, Binary>- JSONB format (default, optimized)List<Struct<key, value>>- Legacy format (automatic conversion)Dictionary<Int32, List<Struct>>- Dictionary-encoded legacyBinary- Non-dictionary JSONB
-
key(Utf8): Property key to extract
Returns: Dictionary<Int32, Utf8> - Property value or NULL if not found
Performance: Optimized for the new JSONB format. Legacy formats are automatically converted for backward compatibility.
Examples:
-- Get thread name from process properties (works with all formats)
SELECT time, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') IS NOT NULL;
-- Filter by custom property
SELECT time, name, value
FROM measures
WHERE property_get(properties, 'source') = 'system_monitor';
-- Direct JSONB property access (post-migration default)
SELECT time, msg, property_get(properties, 'service') as service
FROM log_entries
WHERE property_get(properties, 'env') = 'production';
properties_length(properties)¶
Returns the number of properties in a properties map with support for multiple storage formats.
Syntax:
Parameters:
-
properties(Multiple formats supported): Properties data in any of these formats:List<Struct<key, value>>- Legacy formatDictionary<Int32, Binary>- JSONB format (optimized)Dictionary<Int32, List<Struct>>- Dictionary-encoded legacyBinary- Non-dictionary JSONB
Returns: Int32 - Number of properties
Examples:
-- Works with regular properties
SELECT properties_length(properties) as prop_count
FROM measures;
-- Works with dictionary-encoded properties
SELECT properties_length(properties_to_dict(properties)) as prop_count
FROM measures;
-- JSONB property counting
SELECT properties_length(properties_to_jsonb(properties)) as prop_count
FROM measures;
properties_to_dict(properties)¶
Converts a properties list to a dictionary-encoded array for memory efficiency.
Syntax:
Parameters:
properties(List<Struct<key: Utf8, value: Utf8>>): Properties list to encode
Returns: Dictionary<Int32, List<Struct<key: Utf8, value: Utf8>>> - Dictionary-encoded properties
Examples:
-- Convert properties to dictionary encoding for memory efficiency
SELECT properties_to_dict(properties) as dict_props
FROM measures;
-- Use with other functions via properties_to_array
SELECT array_length(properties_to_array(properties_to_dict(properties))) as prop_count
FROM measures;
Note: Dictionary encoding can reduce memory usage by 50-80% for datasets with repeated property patterns.
properties_to_jsonb(properties)¶
Converts a properties list to binary JSONB format with dictionary encoding for efficient storage and querying.
Syntax:
Parameters:
-
properties(Multiple formats supported): Properties in any of these formats:List<Struct<key: Utf8, value: Utf8>>- Regular properties listDictionary<Int32, List<Struct>>- Dictionary-encoded propertiesBinary- Non-dictionary JSONBDictionary<Int32, Binary>- JSONB format
Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB object containing the properties as key-value pairs
Examples:
-- Convert properties to JSONB format
SELECT properties_to_jsonb(properties) as jsonb_props
FROM log_entries;
-- Use with other JSONB functions
SELECT jsonb_get(properties_to_jsonb(properties), 'hostname') as hostname
FROM log_entries;
-- Convert dictionary-encoded properties to JSONB
SELECT properties_to_jsonb(properties_to_dict(properties)) as jsonb_props
FROM measures;
Note: This function returns Dictionary<Int32, Binary> format for optimal memory usage with Arrow's built-in dictionary encoding.
properties_to_array(dict_properties)¶
Converts dictionary-encoded properties back to a regular array for compatibility with standard functions.
Syntax:
Parameters:
dict_properties(Dictionary<Int32, List<Struct>>): Dictionary-encoded properties
Returns: List<Struct<key: Utf8, value: Utf8>> - Regular properties array
Examples:
-- Convert dictionary-encoded properties back to array
SELECT properties_to_array(properties_to_dict(properties)) as props
FROM measures;
-- Use with array functions
SELECT array_length(properties_to_array(properties_to_dict(properties))) as count
FROM measures;
Histogram Functions¶
Micromegas provides a comprehensive set of functions for creating and analyzing histograms, enabling efficient statistical analysis of large datasets.
make_histogram(start, end, bins, values)¶
Creates histogram data from numeric values with specified range and bin count.
Syntax:
Parameters:
-
start(Float64): Histogram minimum value -
end(Float64): Histogram maximum value -
bins(Int64): Number of histogram bins -
values(Float64): Column of numeric values to histogram
Returns: Histogram structure with buckets and counts
Example:
-- Create histogram of response times (0-50ms, 20 bins)
SELECT make_histogram(0.0, 50.0, 20, CAST(duration AS FLOAT64) / 1000000.0) as duration_histogram
FROM view_instance('thread_spans', 'web_server_123')
WHERE name = 'handle_request';
sum_histograms(histogram_column)¶
Aggregates multiple histograms by summing their bins.
Syntax:
Parameters:
histogram_column(Histogram): Column containing histogram values
Returns: Combined histogram with summed bins
Example:
-- Combine histograms across processes
SELECT sum_histograms(duration_histogram) as combined_histogram
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 hour';
expand_histogram(histogram)¶
Expands a histogram struct into rows of (bin_center, count) for visualization as a bar chart.
Syntax:
Parameters:
histogram(Histogram struct): A histogram value frommake_histogram()or a subquery returning one
Returns:
| Column | Type | Description |
|---|---|---|
| bin_center | Float64 | Center value of each bin |
| count | UInt64 | Number of values in the bin |
Examples:
-- Expand a CPU usage histogram into chartable rows
SELECT bin_center, count
FROM expand_histogram(
(SELECT make_histogram(0.0, 100.0, 100, value)
FROM measures
WHERE name = 'cpu_usage')
)
-- Histogram for a specific process
SELECT bin_center, count
FROM expand_histogram(
(SELECT make_histogram(0.0, 50.0, 50, value)
FROM view_instance('measures', 'my_process_123')
WHERE name = 'frame_time')
)
Note: This function is designed for visualization. Use with a bar chart to display distribution data.
quantile_from_histogram(histogram, quantile)¶
Estimates a quantile value from a histogram.
Syntax:
Parameters:
-
histogram(Histogram): Histogram to analyze -
quantile(Float64): Quantile to estimate (0.0 to 1.0)
Returns: Float64 - Estimated quantile value
Examples:
-- Get median (50th percentile) response time
SELECT quantile_from_histogram(duration_histogram, 0.5) as median_duration
FROM performance_histograms;
-- Get 95th percentile response time
SELECT quantile_from_histogram(duration_histogram, 0.95) as p95_duration
FROM performance_histograms;
variance_from_histogram(histogram)¶
Calculates variance from histogram data.
Syntax:
Parameters:
histogram(Histogram): Histogram to analyze
Returns: Float64 - Variance of the histogram data
Example:
-- Calculate response time variance
SELECT variance_from_histogram(duration_histogram) as duration_variance
FROM performance_histograms;
count_from_histogram(histogram)¶
Extracts the total count of values from a histogram.
Syntax:
Parameters:
histogram(Histogram): Histogram to analyze
Returns: UInt64 - Total number of values in the histogram
Example:
-- Get total sample count from histogram
SELECT count_from_histogram(duration_histogram) as total_samples
FROM performance_histograms;
sum_from_histogram(histogram)¶
Extracts the sum of all values from a histogram.
Syntax:
Parameters:
histogram(Histogram): Histogram to analyze
Returns: Float64 - Sum of all values in the histogram
Example:
-- Get total duration from histogram
SELECT sum_from_histogram(duration_histogram) as total_duration
FROM performance_histograms;
Standard SQL Functions¶
Micromegas supports all standard DataFusion SQL functions including math, string, date/time, conditional, and array functions. For a complete list with examples, see the DataFusion Scalar Functions documentation.
Advanced Query Patterns¶
Histogram Analysis¶
-- Create performance histogram (0-100ms, 10 bins)
SELECT make_histogram(0.0, 100.0, 10, duration / 1000000.0) as response_time_ms_histogram
FROM view_instance('thread_spans', 'web_server')
WHERE name = 'handle_request'
AND duration > 1000000; -- > 1ms
-- Analyze histogram statistics
SELECT
quantile_from_histogram(response_time_histogram, 0.5) as median_ms,
quantile_from_histogram(response_time_histogram, 0.95) as p95_ms,
quantile_from_histogram(response_time_histogram, 0.99) as p99_ms,
variance_from_histogram(response_time_histogram) as variance,
count_from_histogram(response_time_histogram) as sample_count,
sum_from_histogram(response_time_histogram) as total_time_ms
FROM performance_histograms
WHERE time_bin >= NOW() - INTERVAL '1 hour';
-- Aggregate histograms across multiple processes
SELECT
time_bin,
sum_histograms(cpu_usage_histo) as combined_cpu_histogram,
quantile_from_histogram(sum_histograms(cpu_usage_histo), 0.95) as p95_cpu
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY time_bin
ORDER BY time_bin;
Property Extraction and Filtering¶
-- Find logs with specific thread names
SELECT time, level, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') LIKE '%worker%'
ORDER BY time DESC;
High-Performance JSONB Property Access¶
-- Convert properties to JSONB for better performance
SELECT
time,
msg,
property_get(properties_to_jsonb(properties), 'service') as service,
property_get(properties_to_jsonb(properties), 'version') as version
FROM log_entries
WHERE property_get(properties_to_jsonb(properties), 'env') = 'production'
AND time >= NOW() - INTERVAL '1 hour'
ORDER BY time DESC;
-- Efficient property filtering with JSONB
WITH jsonb_logs AS (
SELECT
time,
level,
msg,
properties_to_jsonb(properties) as jsonb_props
FROM log_entries
WHERE time >= NOW() - INTERVAL '1 day'
)
SELECT
time,
level,
msg,
property_get(jsonb_props, 'service') as service,
property_get(jsonb_props, 'request_id') as request_id
FROM jsonb_logs
WHERE property_get(jsonb_props, 'error_code') IS NOT NULL
ORDER BY time DESC;
-- Property aggregation with optimal performance
SELECT
property_get(properties_to_jsonb(properties), 'service') as service,
property_get(properties_to_jsonb(properties), 'env') as environment,
COUNT(*) as event_count,
COUNT(CASE WHEN level <= 2 THEN 1 END) as error_count
FROM log_entries
WHERE time >= NOW() - INTERVAL '1 hour'
AND property_get(properties_to_jsonb(properties), 'service') IS NOT NULL
GROUP BY service, environment
ORDER BY error_count DESC;
JSON Data Processing¶
-- Parse and extract configuration from JSON logs
SELECT
time,
msg,
jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service_name,
jsonb_as_i64(jsonb_get(jsonb_parse(msg), 'port')) as port,
jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'cpu_limit')) as cpu_limit
FROM log_entries
WHERE msg LIKE '%{%' -- Contains JSON
AND jsonb_parse(msg) IS NOT NULL
ORDER BY time DESC;
-- Aggregate metrics from JSON payloads
SELECT
jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service,
COUNT(*) as event_count,
AVG(jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'response_time'))) as avg_response_ms
FROM log_entries
WHERE msg LIKE '%response_time%'
AND jsonb_parse(msg) IS NOT NULL
GROUP BY service
ORDER BY avg_response_ms DESC;
Time-based Aggregation¶
-- Hourly error counts
SELECT
date_trunc('hour', time) as hour,
COUNT(*) as error_count
FROM log_entries
WHERE level <= 2 -- Fatal and Error
AND time >= NOW() - INTERVAL '24 hours'
GROUP BY date_trunc('hour', time)
ORDER BY hour;
Performance Trace Analysis¶
-- Top 10 slowest functions with statistics
SELECT
name,
COUNT(*) as call_count,
AVG(duration) / 1000000.0 as avg_ms,
MAX(duration) / 1000000.0 as max_ms,
STDDEV(duration) / 1000000.0 as stddev_ms
FROM view_instance('thread_spans', 'my_process')
WHERE duration > 100000 -- > 0.1ms
GROUP BY name
ORDER BY avg_ms DESC
LIMIT 10;
DataFusion Reference¶
Micromegas supports all standard DataFusion SQL syntax, functions, and operators. For complete documentation including functions, operators, data types, and SQL syntax, see the Apache DataFusion SQL Reference.
Next Steps¶
- Query Patterns - Common observability query patterns
- Performance Guide - Optimize your queries for best performance
- Schema Reference - Complete view and field reference