Skip to content

Functions Reference

This page provides a complete reference to all SQL functions available in Micromegas queries, including both standard DataFusion functions and Micromegas-specific extensions.

Micromegas Extensions

Table Functions

Table functions return tables that can be used in FROM clauses.

view_instance(view_name, identifier)

Creates a process or stream-scoped view instance for better performance.

Syntax:

view_instance(view_name, identifier)

Parameters:

  • view_name (Utf8): Name of the view ('log_entries', 'measures', 'thread_spans', 'async_events')

  • identifier (Utf8): Process ID (for most views) or Stream ID (for thread_spans)

Returns: Schema depends on the view type (see Schema Reference)

Examples:

-- Get logs for a specific process
SELECT time, level, msg
FROM view_instance('log_entries', 'my_process_123')
WHERE level <= 3;

-- Get spans for a specific stream
SELECT name, duration
FROM view_instance('thread_spans', 'stream_456')
WHERE duration > 1000000;  -- > 1ms

list_partitions() 🔧

Administrative Function - Lists available data partitions in the lakehouse with metadata including file paths, sizes, and schema hashes.

See Admin Functions Reference for details.

retire_partitions(view_set_name, view_instance_id, begin_insert_time, end_insert_time) 🔧

Administrative Function - Retires data partitions from the lakehouse for a specified time range.

⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.

materialize_partitions(view_name, begin_insert_time, end_insert_time, partition_delta_seconds) 🔧

Administrative Function - Materializes data partitions for a view over a specified time range.

See Admin Functions Reference for details.

list_view_sets() 🔧

Administrative Function - Lists all available view sets with their current schema information.

See Admin Functions Reference for details.

retire_partition_by_metadata(view_set_name, view_instance_id, begin_insert_time, end_insert_time) 🔧

Administrative Function - Retires a single partition by its metadata identifiers.

⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.

retire_partition_by_file(file_path) 🔧

Administrative Function - Retires a single partition by file path. Prefer retire_partition_by_metadata() for new code.

⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.

delete_duplicate_processes() 🔧

Administrative Function - Deletes duplicate processes within the query time range. Keeps the earliest entry per process_id.

⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.

delete_duplicate_streams() 🔧

Administrative Function - Deletes duplicate streams within the query time range. Keeps the earliest entry per stream_id.

⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.

delete_duplicate_blocks() 🔧

Administrative Function - Deletes duplicate blocks within the query time range. Keeps the earliest entry per block_id.

⚠️ DESTRUCTIVE OPERATION: See Admin Functions Reference for details.

perfetto_trace_chunks(process_id, span_types, start_time, end_time)

Generates Perfetto trace chunks from process telemetry data for visualization and performance analysis.

Syntax:

SELECT chunk_id, chunk_data
FROM perfetto_trace_chunks(process_id, span_types, start_time, end_time)
ORDER BY chunk_id

Parameters:

  • process_id (Utf8): Process UUID to generate trace for

  • span_types (Utf8): Type of spans to include: 'thread', 'async', or 'both'

  • start_time (Timestamp): Start time for trace data (UTC timestamp)

  • end_time (Timestamp): End time for trace data (UTC timestamp)

Returns:

Column Type Description
chunk_id Int32 Sequential chunk identifier
chunk_data Binary Binary protobuf TracePacket data

Examples:

-- Generate trace for thread spans only
SELECT chunk_id, chunk_data
FROM perfetto_trace_chunks(
    'process-uuid-123',
    'thread',
    TIMESTAMP '2024-01-01T00:00:00Z',
    TIMESTAMP '2024-01-01T01:00:00Z'
)
ORDER BY chunk_id;

-- Generate trace for both thread and async spans
SELECT chunk_id, chunk_data
FROM perfetto_trace_chunks(
    'my-process-id',
    'both',
    NOW() - INTERVAL '1 hour',
    NOW()
)
ORDER BY chunk_id;

Note: The returned binary data is in Perfetto protobuf format and can be loaded directly into the Perfetto UI for visualization and analysis.

process_spans(process_id, types)

Returns thread spans, async spans, or both from a process, with stream_id and thread_name columns prepended. For async spans, stream_id is empty and thread_name is 'async'.

Syntax:

-- Thread spans only
SELECT * FROM process_spans('process-uuid', 'thread')

-- Async spans only
SELECT * FROM process_spans('process-uuid', 'async')

-- Both combined
SELECT name, begin, end, depth, thread_name as lane
FROM process_spans('process-uuid', 'both')
ORDER BY lane, begin

Parameters:

  • process_id (Utf8): Process UUID to query
  • types (Utf8): 'thread', 'async', or 'both'

Note: The time range is provided out of band via the query's begin/end parameters, not as function arguments.

Returns: Same schema as thread_spans with two additional leading columns:

Column Type Description
stream_id Dictionary(Int16, Utf8) Stream identifier (empty for async)
thread_name Dictionary(Int16, Utf8) Thread display name ('async' for async spans)
id Int64 Span identifier
parent Int64 Parent span identifier
depth UInt32 Nesting depth
hash UInt32 Span hash
begin Timestamp(Nanosecond) Span start time
end Timestamp(Nanosecond) Span end time
duration Int64 Duration in nanoseconds
name Dictionary(Int16, Utf8) Span name (function)
target Dictionary(Int16, Utf8) Module/target
filename Dictionary(Int16, Utf8) Source file
line UInt32 Line number

Examples:

-- Get all spans across threads for a process
SELECT stream_id, thread_name, name, duration
FROM process_spans('process-uuid-123', 'thread')
ORDER BY begin;

-- Analyze frame time per thread
SELECT thread_name, name, AVG(duration) / 1000000.0 as avg_ms
FROM process_spans('my-process-id', 'both')
WHERE depth = 0
GROUP BY thread_name, name
ORDER BY avg_ms DESC;

parse_block(block_id)

Parses transit-serialized objects from a block's payload and returns each object as a row with its type name and full content as JSONB. This provides a generic block inspection tool, independent of any specific view (logs, metrics, spans).

Syntax:

SELECT object_index, type_name, jsonb_format_json(value)
FROM parse_block(block_id)

Parameters:

  • block_id (Utf8): UUID of the block to parse. Block IDs can be found in the blocks view.

Returns:

Column Type Description
object_index Int64 Ordinal position within the block (global, starting from the block's object_offset)
type_name Utf8 Transit type name (e.g., "LogStringEvent", "BeginThreadSpanEvent")
value Binary Full object content as JSONB binary data

Examples:

-- Find a block to inspect
SELECT block_id, nb_objects, "streams.tags"
FROM blocks
LIMIT 5;

-- Parse all objects in a block
SELECT object_index, type_name, jsonb_format_json(value)
FROM parse_block('550e8400-e29b-41d4-a716-446655440000');

-- Filter by object type
SELECT object_index, jsonb_format_json(value)
FROM parse_block('550e8400-e29b-41d4-a716-446655440000')
WHERE type_name LIKE 'Log%';

-- Extract a specific field from objects
SELECT object_index, type_name,
       jsonb_as_string(jsonb_get(value, 'msg')) as msg
FROM parse_block('550e8400-e29b-41d4-a716-446655440000')
WHERE type_name = 'LogStringInteropEvent'
LIMIT 10;

Notes:

  • The value column contains JSONB-encoded objects. Each object includes a __type field with the transit type name, which is especially useful for inspecting nested objects.
  • When a LIMIT is used without filters, the function stops parsing early for efficiency. When filters are present, all objects are materialized first so DataFusion can apply the filter.
  • Use with JSONB functions like jsonb_get, jsonb_format_json, and jsonb_as_string to extract and display object contents.

Scalar Functions

JSON/JSONB Functions

Micromegas provides functions for working with JSON data stored in binary JSONB format for efficient storage and querying.

jsonb_parse(json_string)

Parses a JSON string into binary JSONB format.

Syntax:

jsonb_parse(json_string)

Parameters:

  • json_string (Multiple formats supported): JSON string to parse:

  • Utf8 - Plain string

  • Dictionary<Int32, Utf8> - Dictionary-encoded string

Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB data

Example:

-- Parse JSON string into JSONB
SELECT jsonb_parse('{"name": "web_server", "port": 8080}') as parsed_json
FROM processes;

jsonb_path_query_first(jsonb, path)

Returns the first match of a JSONPath expression on a JSONB value, or NULL if no match is found.

Syntax:

jsonb_path_query_first(jsonb, path)

Parameters:

  • jsonb (Multiple formats supported): JSONB value in any of these formats:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

  • path (Utf8): A JSONPath expression string (e.g., $.store.book[0].title)

Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB value of the first match, or NULL if no match

Examples:

-- Extract a nested value
SELECT jsonb_path_query_first(jsonb_parse('{"user": {"name": "Alice"}}'), '$.user.name') as name;
-- Returns: "Alice" (as JSONB)

-- Array index access
SELECT jsonb_path_query_first(jsonb_parse('{"items": [10, 20, 30]}'), '$.items[1]') as second;
-- Returns: 20

-- First wildcard match
SELECT jsonb_as_string(jsonb_path_query_first(data, '$.tags[0]')) as first_tag
FROM processes;

jsonb_path_query(jsonb, path)

Returns all matches of a JSONPath expression on a JSONB value as a JSONB array.

Syntax:

jsonb_path_query(jsonb, path)

Parameters:

  • jsonb (Multiple formats supported): JSONB value in any of these formats:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

  • path (Utf8): A JSONPath expression string (e.g., $.store.book[*].title)

Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB array containing all matched values, or an empty array if no match

Examples:

-- Extract all names from an array of objects
SELECT jsonb_path_query(jsonb_parse('{"users": [{"name": "Alice"}, {"name": "Bob"}]}'), '$.users[*].name') as names;
-- Returns: ["Alice", "Bob"]

-- All array elements
SELECT jsonb_path_query(jsonb_parse('[1, 2, 3]'), '$[*]') as all_items;
-- Returns: [1, 2, 3]

-- No match returns empty array
SELECT jsonb_path_query(jsonb_parse('{"a": 1}'), '$.missing') as result;
-- Returns: []

Filter Predicates in JSONPath

jsonb_path_query and jsonb_path_query_first support SQL/JSON path syntax for filter predicates. This differs from the JavaScript-style JSONPath syntax commonly found in online tutorials.

Key difference: Filters use ? () after a wildcard step, not [?()] inside brackets.

Feature JavaScript JSONPath (NOT supported) SQL/JSON path (supported)
Filter in brackets $.items[?(@.price < 10)] $.items[*] ? (@.price < 10)
String equality [?(@.type=="human")] [*] ? (@.type == "human")
Logical AND [?(@.a > 1 && @.b < 5)] [*] ? (@.a > 1 && @.b < 5)

Examples:

-- Filter array elements by field value
SELECT jsonb_path_query(
  jsonb_parse('{"items": [{"type": "active", "id": 1}, {"type": "inactive", "id": 2}]}'),
  '$.items[*] ? (@.type == "active")'
) as active_items;
-- Returns: [{"id": 1, "type": "active"}]

-- Numeric comparison
SELECT jsonb_path_query(
  jsonb_parse('{"scores": [{"name": "Alice", "val": 85}, {"name": "Bob", "val": 42}]}'),
  '$.scores[*] ? (@.val > 50)'
) as high_scores;
-- Returns: [{"name": "Alice", "val": 85}]

-- Get first match with filter
SELECT jsonb_path_query_first(
  jsonb_parse('{"users": [{"role": "admin", "name": "Alice"}, {"role": "user", "name": "Bob"}]}'),
  '$.users[*] ? (@.role == "admin")'
) as first_admin;
-- Returns: {"name": "Alice", "role": "admin"}

-- Combined with jsonb_array_elements for row expansion
SELECT jsonb_as_string(jsonb_get(value, 'name')) as player_name
FROM jsonb_array_elements(
  jsonb_path_query(msg_jsonb, '$.teams[*].players[*] ? (@.type == "human")')
)

Common mistake

Using JavaScript-style filter syntax like $.items[?(@.type=="active")] will result in a parse error. Always use the SQL/JSON style: $.items[*] ? (@.type == "active").

jsonb_get(jsonb, key)

Extracts a value from a JSONB object by key name.

Syntax:

jsonb_get(jsonb, key)

Parameters:

  • jsonb (Multiple formats supported): JSONB object in any of these formats:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

  • key (Utf8): Key name to extract

Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB value or NULL if key not found

Example:

-- Extract name field from JSON data
SELECT jsonb_get(jsonb_parse('{"name": "web_server", "port": 8080}'), 'name') as name_value
FROM processes;

jsonb_format_json(jsonb)

Converts a JSONB value back to a human-readable JSON string.

Syntax:

jsonb_format_json(jsonb)

Parameters:

  • jsonb (Multiple formats supported): JSONB value in any of these formats:

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB (default)

  • Binary - Non-dictionary JSONB

Returns: Dictionary<Int32, Utf8> - Dictionary-encoded JSON string representation

Examples:

-- Format JSONB back to JSON string
SELECT jsonb_format_json(jsonb_parse('{"name": "web_server"}')) as json_string
FROM processes;

-- Works directly with dictionary-encoded properties
SELECT jsonb_format_json(properties_to_jsonb(properties)) as json_props
FROM log_entries;

-- Format property values as JSON
SELECT jsonb_format_json(properties) as json_string
FROM processes
WHERE properties IS NOT NULL;

jsonb_as_string(jsonb)

Casts a JSONB value to a string.

Syntax:

jsonb_as_string(jsonb)

Parameters:

  • jsonb (Multiple formats supported): JSONB value to convert:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

Returns: Dictionary<Int32, Utf8> - Dictionary-encoded string value or NULL if not a string

Example:

-- Extract string value from JSONB
SELECT jsonb_as_string(jsonb_get(jsonb_parse('{"service": "web_server"}'), 'service')) as service_name
FROM processes;

jsonb_as_f64(jsonb)

Casts a JSONB value to a 64-bit float.

Syntax:

jsonb_as_f64(jsonb)

Parameters:

  • jsonb (Multiple formats supported): JSONB value to convert:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

Returns: Float64 - Numeric value or NULL if not a number

Example:

-- Extract numeric value from JSONB
SELECT jsonb_as_f64(jsonb_get(jsonb_parse('{"cpu_usage": 75.5}'), 'cpu_usage')) as cpu_usage
FROM processes;

jsonb_as_i64(jsonb)

Casts a JSONB value to a 64-bit integer.

Syntax:

jsonb_as_i64(jsonb)

Parameters:

  • jsonb (Multiple formats supported): JSONB value to convert:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

Returns: Int64 - Integer value or NULL if not an integer

Example:

-- Extract integer value from JSONB
SELECT jsonb_as_i64(jsonb_get(jsonb_parse('{"port": 8080}'), 'port')) as port_number
FROM processes;

jsonb_object_keys(jsonb)

Returns the keys of a JSONB object as an array of strings.

Syntax:

jsonb_object_keys(jsonb)

Parameters:

  • jsonb (Multiple formats supported): JSONB object:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

Returns: Dictionary<Int32, List<Utf8>> - Dictionary-encoded array of key names for memory efficiency (repeated key lists share the same dictionary entry), or NULL if input is not an object

Examples:

-- Get keys from a JSONB object
SELECT jsonb_object_keys(jsonb_parse('{"name": "server", "port": 8080}')) as keys;
-- Returns: ["name", "port"]

-- Get keys from process properties
SELECT jsonb_object_keys(properties) as prop_keys
FROM processes
LIMIT 5;

jsonb_array_length(jsonb)

Returns the number of elements in a JSONB array.

Syntax:

jsonb_array_length(jsonb)

Parameters:

  • jsonb (Multiple formats supported): JSONB value:

  • Binary - Plain JSONB binary

  • Dictionary<Int32, Binary> - Dictionary-encoded JSONB

Returns: Int64 - The number of elements in the array, or NULL if the input is not an array

Examples:

-- Count elements in an array
SELECT jsonb_array_length(jsonb_parse('[1, 2, 3]')) as len;
-- Returns: 3

-- Empty array returns 0
SELECT jsonb_array_length(jsonb_parse('[]')) as len;
-- Returns: 0

-- Non-array input returns NULL
SELECT jsonb_array_length(jsonb_parse('{"key": "value"}')) as len;
-- Returns: NULL

-- Filter by array size
SELECT *
FROM events
WHERE jsonb_array_length(jsonb_get(msg_jsonb, 'items')) > 5;

jsonb_each(jsonb_value)

Expands a JSONB object or array into rows of key-value pairs. This is a table-returning function (UDTF) that produces one row per entry.

For objects, key is the field name. For arrays, key is the element index as a string ("0", "1", ...).

Syntax:

SELECT key, value
FROM jsonb_each(jsonb_subquery)

Parameters:

  • jsonb_value (Binary/JSONB): A JSONB object or array value, provided as a literal or a subquery returning a single JSONB column. If the subquery returns multiple rows, the entries from all rows are concatenated. Null values are skipped. Returns an error if the input is a scalar (e.g., number or string).

Returns:

Column Type Description
key Utf8 Object field name, or array index as a string
value Binary (JSONB) Value as JSONB bytes, composable with jsonb_as_string, jsonb_format_json, etc.

Examples:

-- Expand process properties into rows
SELECT key, jsonb_as_string(value) as value
FROM jsonb_each(
  (SELECT properties FROM processes WHERE process_id = 'my_process_123')
)

-- Use with other JSONB functions for nested values
SELECT key, jsonb_format_json(value) as json_value
FROM jsonb_each(
  (SELECT jsonb_parse('{"name": "server", "port": 8080, "tags": ["prod", "us-east"]}'))
)

-- Expand a JSONB array into rows
SELECT key as index, jsonb_format_json(value) as element
FROM jsonb_each(
  (SELECT jsonb_parse('[10, 20, 30]'))
)
-- Returns: ("0", 10), ("1", 20), ("2", 30)

jsonb_array_elements(jsonb_value)

Expands a JSONB array into a set of rows, one per element. This is a table-returning function (UDTF) that produces one row per array element with a single value column.

Unlike jsonb_each, this function only accepts arrays (not objects) and does not produce a key column, making it more natural for array unnesting.

Syntax:

SELECT value
FROM jsonb_array_elements(jsonb_array)

Parameters:

  • jsonb_value (Binary/JSONB): A JSONB array value, provided as a literal, subquery, or expression (e.g., jsonb_path_query(...)). If a subquery returns multiple rows, the elements from all arrays are concatenated. Returns an error if the input is not a JSONB array.

Returns:

Column Type Description
value Binary (JSONB) Array element as JSONB bytes, composable with jsonb_as_string, jsonb_get, jsonb_format_json, etc.

Examples:

-- Unnest a simple array
SELECT jsonb_as_string(value) as val
FROM jsonb_array_elements(jsonb_parse('[1, 2, 3]'))

-- Unnest array of objects and extract a field
SELECT jsonb_as_string(jsonb_get(value, 'name')) as name
FROM jsonb_array_elements(jsonb_parse('[{"name": "Alice"}, {"name": "Bob"}]'))

-- Unnest from a subquery
SELECT jsonb_as_string(jsonb_get(value, 'profile_id')) as profile_id
FROM jsonb_array_elements((SELECT jsonb_path_query(msg_jsonb, '$.teams[*].players[*]') FROM events LIMIT 1))
WHERE jsonb_as_string(jsonb_get(value, 'type')) = 'human'

Data Access Functions

get_payload(process_id, stream_id, block_id)

Retrieves the raw binary payload of a telemetry block from data lake storage.

Syntax:

get_payload(process_id, stream_id, block_id)

Parameters:

  • process_id (Utf8): Process identifier

  • stream_id (Utf8): Stream identifier

  • block_id (Utf8): Block identifier

Returns: Binary - Raw block payload data

Example:

-- Get raw payload data for specific blocks
SELECT process_id, stream_id, block_id, get_payload(process_id, stream_id, block_id) as payload
FROM blocks
WHERE insert_time >= NOW() - INTERVAL '1 hour'
LIMIT 10;

Note: This is an async function that fetches data from object storage. Use sparingly in queries as it can impact performance.

Property Functions

Micromegas provides specialized functions for working with property data, including efficient dictionary encoding for memory optimization.

property_get(properties, key)

Extracts a value from a properties map with automatic format detection and optimized performance for JSONB data.

Syntax:

property_get(properties, key)

Parameters:

  • properties (Multiple formats supported): Properties data in any of these formats:

    • Dictionary<Int32, Binary> - JSONB format (default, optimized)
    • List<Struct<key, value>> - Legacy format (automatic conversion)
    • Dictionary<Int32, List<Struct>> - Dictionary-encoded legacy
    • Binary - Non-dictionary JSONB
  • key (Utf8): Property key to extract

Returns: Dictionary<Int32, Utf8> - Property value or NULL if not found

Performance: Optimized for the new JSONB format. Legacy formats are automatically converted for backward compatibility.

Examples:

-- Get thread name from process properties (works with all formats)
SELECT time, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') IS NOT NULL;

-- Filter by custom property
SELECT time, name, value
FROM measures
WHERE property_get(properties, 'source') = 'system_monitor';

-- Direct JSONB property access (post-migration default)
SELECT time, msg, property_get(properties, 'service') as service
FROM log_entries
WHERE property_get(properties, 'env') = 'production';

properties_length(properties)

Returns the number of properties in a properties map with support for multiple storage formats.

Syntax:

properties_length(properties)

Parameters:

  • properties (Multiple formats supported): Properties data in any of these formats:

    • List<Struct<key, value>> - Legacy format
    • Dictionary<Int32, Binary> - JSONB format (optimized)
    • Dictionary<Int32, List<Struct>> - Dictionary-encoded legacy
    • Binary - Non-dictionary JSONB

Returns: Int32 - Number of properties

Examples:

-- Works with regular properties
SELECT properties_length(properties) as prop_count
FROM measures;

-- Works with dictionary-encoded properties
SELECT properties_length(properties_to_dict(properties)) as prop_count
FROM measures;

-- JSONB property counting
SELECT properties_length(properties_to_jsonb(properties)) as prop_count
FROM measures;

properties_to_dict(properties)

Converts a properties list to a dictionary-encoded array for memory efficiency.

Syntax:

properties_to_dict(properties)

Parameters:

  • properties (List<Struct<key: Utf8, value: Utf8>>): Properties list to encode

Returns: Dictionary<Int32, List<Struct<key: Utf8, value: Utf8>>> - Dictionary-encoded properties

Examples:

-- Convert properties to dictionary encoding for memory efficiency
SELECT properties_to_dict(properties) as dict_props
FROM measures;

-- Use with other functions via properties_to_array
SELECT array_length(properties_to_array(properties_to_dict(properties))) as prop_count
FROM measures;

Note: Dictionary encoding can reduce memory usage by 50-80% for datasets with repeated property patterns.

properties_to_jsonb(properties)

Converts a properties list to binary JSONB format with dictionary encoding for efficient storage and querying.

Syntax:

properties_to_jsonb(properties)

Parameters:

  • properties (Multiple formats supported): Properties in any of these formats:

    • List<Struct<key: Utf8, value: Utf8>> - Regular properties list
    • Dictionary<Int32, List<Struct>> - Dictionary-encoded properties
    • Binary - Non-dictionary JSONB
    • Dictionary<Int32, Binary> - JSONB format

Returns: Dictionary<Int32, Binary> - Dictionary-encoded JSONB object containing the properties as key-value pairs

Examples:

-- Convert properties to JSONB format
SELECT properties_to_jsonb(properties) as jsonb_props
FROM log_entries;

-- Use with other JSONB functions
SELECT jsonb_get(properties_to_jsonb(properties), 'hostname') as hostname
FROM log_entries;

-- Convert dictionary-encoded properties to JSONB
SELECT properties_to_jsonb(properties_to_dict(properties)) as jsonb_props
FROM measures;

Note: This function returns Dictionary<Int32, Binary> format for optimal memory usage with Arrow's built-in dictionary encoding.

properties_to_array(dict_properties)

Converts dictionary-encoded properties back to a regular array for compatibility with standard functions.

Syntax:

properties_to_array(dict_properties)

Parameters:

  • dict_properties (Dictionary<Int32, List<Struct>>): Dictionary-encoded properties

Returns: List<Struct<key: Utf8, value: Utf8>> - Regular properties array

Examples:

-- Convert dictionary-encoded properties back to array
SELECT properties_to_array(properties_to_dict(properties)) as props
FROM measures;

-- Use with array functions
SELECT array_length(properties_to_array(properties_to_dict(properties))) as count
FROM measures;

Histogram Functions

Micromegas provides a comprehensive set of functions for creating and analyzing histograms, enabling efficient statistical analysis of large datasets.

make_histogram(start, end, bins, values)

Creates histogram data from numeric values with specified range and bin count.

Syntax:

make_histogram(start, end, bins, values)

Parameters:

  • start (Float64): Histogram minimum value

  • end (Float64): Histogram maximum value

  • bins (Int64): Number of histogram bins

  • values (Float64): Column of numeric values to histogram

Returns: Histogram structure with buckets and counts

Example:

-- Create histogram of response times (0-50ms, 20 bins)
SELECT make_histogram(0.0, 50.0, 20, CAST(duration AS FLOAT64) / 1000000.0) as duration_histogram
FROM view_instance('thread_spans', 'web_server_123')
WHERE name = 'handle_request';

sum_histograms(histogram_column)

Aggregates multiple histograms by summing their bins.

Syntax:

sum_histograms(histogram_column)

Parameters:

  • histogram_column (Histogram): Column containing histogram values

Returns: Combined histogram with summed bins

Example:

-- Combine histograms across processes
SELECT sum_histograms(duration_histogram) as combined_histogram
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 hour';

expand_histogram(histogram)

Expands a histogram struct into rows of (bin_center, count) for visualization as a bar chart.

Syntax:

SELECT bin_center, count
FROM expand_histogram(histogram_subquery)

Parameters:

  • histogram (Histogram struct): A histogram value from make_histogram() or a subquery returning one

Returns:

Column Type Description
bin_center Float64 Center value of each bin
count UInt64 Number of values in the bin

Examples:

-- Expand a CPU usage histogram into chartable rows
SELECT bin_center, count
FROM expand_histogram(
  (SELECT make_histogram(0.0, 100.0, 100, value)
   FROM measures
   WHERE name = 'cpu_usage')
)

-- Histogram for a specific process
SELECT bin_center, count
FROM expand_histogram(
  (SELECT make_histogram(0.0, 50.0, 50, value)
   FROM view_instance('measures', 'my_process_123')
   WHERE name = 'frame_time')
)

Note: This function is designed for visualization. Use with a bar chart to display distribution data.

quantile_from_histogram(histogram, quantile)

Estimates a quantile value from a histogram.

Syntax:

quantile_from_histogram(histogram, quantile)

Parameters:

  • histogram (Histogram): Histogram to analyze

  • quantile (Float64): Quantile to estimate (0.0 to 1.0)

Returns: Float64 - Estimated quantile value

Examples:

-- Get median (50th percentile) response time
SELECT quantile_from_histogram(duration_histogram, 0.5) as median_duration
FROM performance_histograms;

-- Get 95th percentile response time
SELECT quantile_from_histogram(duration_histogram, 0.95) as p95_duration
FROM performance_histograms;

variance_from_histogram(histogram)

Calculates variance from histogram data.

Syntax:

variance_from_histogram(histogram)

Parameters:

  • histogram (Histogram): Histogram to analyze

Returns: Float64 - Variance of the histogram data

Example:

-- Calculate response time variance
SELECT variance_from_histogram(duration_histogram) as duration_variance
FROM performance_histograms;

count_from_histogram(histogram)

Extracts the total count of values from a histogram.

Syntax:

count_from_histogram(histogram)

Parameters:

  • histogram (Histogram): Histogram to analyze

Returns: UInt64 - Total number of values in the histogram

Example:

-- Get total sample count from histogram
SELECT count_from_histogram(duration_histogram) as total_samples
FROM performance_histograms;

sum_from_histogram(histogram)

Extracts the sum of all values from a histogram.

Syntax:

sum_from_histogram(histogram)

Parameters:

  • histogram (Histogram): Histogram to analyze

Returns: Float64 - Sum of all values in the histogram

Example:

-- Get total duration from histogram
SELECT sum_from_histogram(duration_histogram) as total_duration
FROM performance_histograms;

Standard SQL Functions

Micromegas supports all standard DataFusion SQL functions including math, string, date/time, conditional, and array functions. For a complete list with examples, see the DataFusion Scalar Functions documentation.

Advanced Query Patterns

Histogram Analysis

-- Create performance histogram (0-100ms, 10 bins)
SELECT make_histogram(0.0, 100.0, 10, duration / 1000000.0) as response_time_ms_histogram
FROM view_instance('thread_spans', 'web_server')
WHERE name = 'handle_request'
  AND duration > 1000000;  -- > 1ms
-- Analyze histogram statistics
SELECT 
    quantile_from_histogram(response_time_histogram, 0.5) as median_ms,
    quantile_from_histogram(response_time_histogram, 0.95) as p95_ms,
    quantile_from_histogram(response_time_histogram, 0.99) as p99_ms,
    variance_from_histogram(response_time_histogram) as variance,
    count_from_histogram(response_time_histogram) as sample_count,
    sum_from_histogram(response_time_histogram) as total_time_ms
FROM performance_histograms
WHERE time_bin >= NOW() - INTERVAL '1 hour';
-- Aggregate histograms across multiple processes
SELECT 
    time_bin,
    sum_histograms(cpu_usage_histo) as combined_cpu_histogram,
    quantile_from_histogram(sum_histograms(cpu_usage_histo), 0.95) as p95_cpu
FROM cpu_usage_per_process_per_minute
WHERE time_bin >= NOW() - INTERVAL '1 day'
GROUP BY time_bin
ORDER BY time_bin;

Property Extraction and Filtering

-- Find logs with specific thread names
SELECT time, level, msg, property_get(process_properties, 'thread-name') as thread
FROM log_entries
WHERE property_get(process_properties, 'thread-name') LIKE '%worker%'
ORDER BY time DESC;

High-Performance JSONB Property Access

-- Convert properties to JSONB for better performance
SELECT
    time,
    msg,
    property_get(properties_to_jsonb(properties), 'service') as service,
    property_get(properties_to_jsonb(properties), 'version') as version
FROM log_entries
WHERE property_get(properties_to_jsonb(properties), 'env') = 'production'
  AND time >= NOW() - INTERVAL '1 hour'
ORDER BY time DESC;
-- Efficient property filtering with JSONB
WITH jsonb_logs AS (
    SELECT
        time,
        level,
        msg,
        properties_to_jsonb(properties) as jsonb_props
    FROM log_entries
    WHERE time >= NOW() - INTERVAL '1 day'
)
SELECT
    time,
    level,
    msg,
    property_get(jsonb_props, 'service') as service,
    property_get(jsonb_props, 'request_id') as request_id
FROM jsonb_logs
WHERE property_get(jsonb_props, 'error_code') IS NOT NULL
ORDER BY time DESC;
-- Property aggregation with optimal performance
SELECT
    property_get(properties_to_jsonb(properties), 'service') as service,
    property_get(properties_to_jsonb(properties), 'env') as environment,
    COUNT(*) as event_count,
    COUNT(CASE WHEN level <= 2 THEN 1 END) as error_count
FROM log_entries
WHERE time >= NOW() - INTERVAL '1 hour'
  AND property_get(properties_to_jsonb(properties), 'service') IS NOT NULL
GROUP BY service, environment
ORDER BY error_count DESC;

JSON Data Processing

-- Parse and extract configuration from JSON logs
SELECT 
    time,
    msg,
    jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service_name,
    jsonb_as_i64(jsonb_get(jsonb_parse(msg), 'port')) as port,
    jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'cpu_limit')) as cpu_limit
FROM log_entries
WHERE msg LIKE '%{%'  -- Contains JSON
  AND jsonb_parse(msg) IS NOT NULL
ORDER BY time DESC;
-- Aggregate metrics from JSON payloads
SELECT 
    jsonb_as_string(jsonb_get(jsonb_parse(msg), 'service')) as service,
    COUNT(*) as event_count,
    AVG(jsonb_as_f64(jsonb_get(jsonb_parse(msg), 'response_time'))) as avg_response_ms
FROM log_entries
WHERE msg LIKE '%response_time%'
  AND jsonb_parse(msg) IS NOT NULL
GROUP BY service
ORDER BY avg_response_ms DESC;

Time-based Aggregation

-- Hourly error counts
SELECT 
    date_trunc('hour', time) as hour,
    COUNT(*) as error_count
FROM log_entries
WHERE level <= 2  -- Fatal and Error
  AND time >= NOW() - INTERVAL '24 hours'
GROUP BY date_trunc('hour', time)
ORDER BY hour;

Performance Trace Analysis

-- Top 10 slowest functions with statistics
SELECT 
    name,
    COUNT(*) as call_count,
    AVG(duration) / 1000000.0 as avg_ms,
    MAX(duration) / 1000000.0 as max_ms,
    STDDEV(duration) / 1000000.0 as stddev_ms
FROM view_instance('thread_spans', 'my_process')
WHERE duration > 100000  -- > 0.1ms
GROUP BY name
ORDER BY avg_ms DESC
LIMIT 10;

DataFusion Reference

Micromegas supports all standard DataFusion SQL syntax, functions, and operators. For complete documentation including functions, operators, data types, and SQL syntax, see the Apache DataFusion SQL Reference.

Next Steps