Module lakehouse

Module dedicated to the maintenance and query of materialized views

Unlike the telemetry data lake, where writes are fast and cheap but reads are costly, the lakehouse partitions are costly to write but allow for cheap, fast queries using datafusion.

Views based on a low frequency of events (< 1k events per second per process) are kept updated regularly. Views based on a high frequency of events (up to 100k events per second per process) are materialized on demand.
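The frequency thresholds above can be sketched as a small decision function. This is an illustrative sketch only: `UpdateStrategy` and `choose_strategy` are hypothetical names, not part of the lakehouse API.

```rust
/// Illustrative only: how a view's per-process event rate maps to
/// the update strategies described above.
#[derive(Debug, PartialEq)]
enum UpdateStrategy {
    /// Low-frequency views (< 1k events/s per process) are refreshed on a schedule.
    Scheduled,
    /// High-frequency views (up to 100k events/s per process) are materialized on demand.
    OnDemand,
}

fn choose_strategy(events_per_second: u64) -> UpdateStrategy {
    if events_per_second < 1_000 {
        UpdateStrategy::Scheduled
    } else {
        UpdateStrategy::OnDemand
    }
}

fn main() {
    // A low-rate log view is kept up to date on a schedule;
    // a high-rate span view is only materialized when queried.
    assert_eq!(choose_strategy(100), UpdateStrategy::Scheduled);
    assert_eq!(choose_strategy(50_000), UpdateStrategy::OnDemand);
}
```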

Modules§

answer
Record batches + schema
async_events_block_processor
Implementation of BlockProcessor for async events
async_events_view
Materializable view of async span events accessible through datafusion
async_parquet_writer
Write parquet in object store
batch_partition_merger
BatchPartitionMerger merges multiple partitions by splitting the work into batches to use less memory. The batches are based on event times.
batch_update
Materialize views on a schedule based on the time data was received from the ingestion service
block_partition_spec
Specification for a view partition backed by a set of telemetry blocks which can be processed out of order
blocks_view
Replicated view of the blocks table of the postgresql metadata database.
caching_reader
Adds file content caching to object store reads
catalog
Catalog utilities for discovering and managing view schemas.
dataframe_time_bounds
export_log_view
Export mechanism that doubles as audit trail
file_cache
Global LRU cache for parquet file contents
get_payload_function
Fetch payload from the object store using SQL
jit_partitions
Management of process-specific partitions built on demand
lakehouse_context
Bundles runtime resources for lakehouse query execution
list_partitions_table_function
Read access to the list of lakehouse partitions
list_view_sets_table_function
Read access to view sets with their schema information
log_block_processor
Implementation of BlockProcessor for log entries
log_stats_view
SQL-based view for log statistics aggregated by process, minute, level, and target
log_view
Materializable view of log entries accessible through datafusion
materialize_partitions_table_function
Exposes materialize_partitions as a table function
materialized_view
TableProvider implementation for the lakehouse
merge
Merge consecutive parquet partitions into a single file
metadata_cache
Global LRU cache for partition metadata
metadata_compat
Compatibility layer for parsing legacy Arrow 56.0 metadata and upgrading to Arrow 57.0
metadata_partition_spec
Specification for a view partition backed by a table in the postgresql metadata database.
metrics_block_processor
Implementation of BlockProcessor for measures
metrics_view
Materializable view of measures accessible through datafusion
migration
Maintenance of the postgresql tables and indices used to track the parquet files that implement the views
parse_block_table_function
Table function to parse all transit objects in a block and return them as JSONB
partition
Write & delete sections of views
partition_cache
In-memory copy of a subset of the list of partitions in the db
partition_metadata
Operations on the dedicated partition_metadata table
partition_source_data
Describes the event blocks backing a partition
partitioned_execution_plan
ExecutionPlan based on a set of parquet files
partitioned_table_provider
TableProvider based on a set of parquet files
perfetto_trace_execution_plan
ExecutionPlan for generating Perfetto trace chunks
perfetto_trace_table_function
Table function for generating Perfetto trace chunks
process_spans_table_function
Table function returning thread and/or async spans from all CPU streams of a process
process_streams
Shared utilities for discovering CPU streams of a process
processes_view
Replicated view of the processes table of the postgresql metadata database.
query
Datafusion integration, including support for the property_get function from SQL
reader_factory
Wrapper around ParquetObjectReader to provide ParquetMetaData without hitting the ObjectStore
retire_partition_by_file_udf
Scalar UDF to retire a single partition by file path
retire_partition_by_metadata_udf
Scalar UDF to retire a single partition by metadata
retire_partitions_table_function
Exposes retire_partitions as a table function
runtime
Runtime resources
session_configurator
SessionConfigurator trait for custom session context configuration
sql_batch_view
SQL-defined view updated in batches
sql_partition_spec
Specification for a view partition backed by a SQL query on the lakehouse.
static_tables_configurator
Auto-discovery configurator for static JSON/CSV tables
streams_view
Replicated view of the streams table of the postgresql metadata database.
table_scan_rewrite
Rewrite table scans to take the query range into account
temp
Tracking of expired partitions
thread_spans_view
JIT view of the call tree built from the thread events of a single stream
view
Basic interface for a set of rows queryable and materializable
view_factory
default_view_factory builds the default ViewFactory, giving users access to view instances grouped in sets.
view_instance_table_function
Table function to query process-specific views
write_partition
Add or remove view partitions