Module lakehouse

Module dedicated to the maintenance and query of materialized views

Unlike the telemetry data lake, where writes are fast and cheap but reads are costly, the lakehouse partitions are costly to write but allow for cheap, fast queries using datafusion.

Views based on a low frequency of events (< 1k events per second per process) are kept updated regularly. Views based on a high frequency of events (up to 100k events per second per process) are materialized on demand.
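The frequency thresholds above can be sketched as a small decision function. This is an illustrative sketch only: `UpdateStrategy` and `choose_strategy` are hypothetical names, not part of the lakehouse API.

```rust
/// Illustrative only: how a view's per-process event rate maps to
/// the update strategies described above.
#[derive(Debug, PartialEq)]
enum UpdateStrategy {
    /// Low-frequency views (< 1k events/s per process) are refreshed on a schedule.
    Scheduled,
    /// High-frequency views (up to 100k events/s per process) are materialized on demand.
    OnDemand,
}

fn choose_strategy(events_per_second: u64) -> UpdateStrategy {
    if events_per_second < 1_000 {
        UpdateStrategy::Scheduled
    } else {
        UpdateStrategy::OnDemand
    }
}

fn main() {
    // A low-rate log view is kept up to date on a schedule;
    // a high-rate span view is only materialized when queried.
    assert_eq!(choose_strategy(100), UpdateStrategy::Scheduled);
    assert_eq!(choose_strategy(50_000), UpdateStrategy::OnDemand);
}
```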

Modules§

answer
Record batches + schema
async_events_block_processor
Implementation of BlockProcessor for async events
async_events_view
Materializable view of async span events accessible through datafusion
async_parquet_writer
Write parquet in object store
batch_partition_merger
BatchPartitionMerger merges multiple partitions by splitting the work into batches to use less memory. The batches are based on event times.
batch_update
Materialize views on a schedule based on the time data was received from the ingestion service
block_partition_spec
Specification for a view partition backed by a set of telemetry blocks which can be processed out of order
blocks_view
Replicated view of the blocks table of the postgresql metadata database.
caching_reader
Adds file content caching to object store reads
catalog
Catalog utilities for discovering and managing view schemas.
dataframe_time_bounds
export_log_view
Export mechanism that doubles as audit trail
file_cache
Global LRU cache for parquet file contents
get_payload_function
Fetch payload from the object store using SQL
jit_partitions
Management of process-specific partitions built on demand
lakehouse_context
Bundles runtime resources for lakehouse query execution
list_partitions_table_function
Read access to the list of lakehouse partitions
list_view_sets_table_function
Read access to view sets with their schema information
log_block_processor
Implementation of BlockProcessor for log entries
log_stats_view
SQL-based view for log statistics aggregated by process, minute, level, and target
log_view
Materializable view of log entries accessible through datafusion
materialize_partitions_table_function
Exposes materialize_partitions as a table function
materialized_view
TableProvider implementation for the lakehouse
merge
Merge consecutive parquet partitions into a single file
metadata_cache
Global LRU cache for partition metadata
metadata_compat
Compatibility layer for parsing legacy Arrow 56.0 metadata and upgrading to Arrow 57.0
metadata_partition_spec
Specification for a view partition backed by a table in the postgresql metadata database.
metrics_block_processor
Implementation of BlockProcessor for measures
metrics_view
Materializable view of measures accessible through datafusion
migration
Maintenance of the postgresql tables and indices used to track the parquet files that implement the views
parse_block_table_function
Table function to parse all transit objects in a block and return them as JSONB
partition
Write & delete sections of views
partition_cache
In-memory copy of a subset of the list of partitions in the db
partition_metadata
Operations on the dedicated partition_metadata table
partition_source_data
Describes the event blocks backing a partition
partitioned_execution_plan
ExecutionPlan based on a set of parquet files
partitioned_table_provider
TableProvider based on a set of parquet files
perfetto_trace_execution_plan
ExecutionPlan for generating Perfetto trace chunks
perfetto_trace_table_function
Table function for generating Perfetto trace chunks
process_spans_table_function
Table function returning thread and/or async spans from all CPU streams of a process
process_streams
Shared utilities for discovering CPU streams of a process
processes_view
Replicated view of the processes table of the postgresql metadata database.
query
Datafusion integration, including support for the property_get function from SQL
reader_factory
Wrapper around ParquetObjectReader to provide ParquetMetaData without hitting the ObjectStore
retire_partition_by_file_udf
Scalar UDF to retire a single partition by file path
retire_partition_by_metadata_udf
Scalar UDF to retire a single partition by metadata
retire_partitions_table_function
Exposes retire_partitions as a table function
runtime
Runtime resources
session_configurator
SessionConfigurator trait for custom session context configuration
sql_batch_view
SQL-defined view updated in batches
sql_partition_spec
Specification for a view partition backed by a SQL query on the lakehouse.
static_tables_configurator
Auto-discovery configurator for static JSON/CSV tables
streams_view
Replicated view of the streams table of the postgresql metadata database.
table_scan_rewrite
Rewrite table scans to take the query range into account
temp
Tracking of expired partitions
thread_spans_view
JIT view of the call tree built from the thread events of a single stream
view
Basic interface for a set of rows queryable and materializable
view_factory
default_view_factory builds the default ViewFactory, giving users access to view instances grouped in sets.
view_instance_table_function
Table function to query process-specific views
write_partition
Add or remove view partitions