Module dedicated to the maintenance and querying of materialized views.
Unlike the telemetry data lake, where writes are fast and cheap but reads are costly, lakehouse partitions are costly to write but allow cheap and fast queries using datafusion.
Views based on a low frequency of events (< 1k events per second per process) are kept updated regularly. Views based on a high frequency of events (up to 100k events per second per process) are materialized on demand.
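The scheduled-vs-on-demand split above can be sketched as a simple rate-based policy. The names below (`UpdateStrategy`, `strategy_for`) are illustrative only and are not part of this crate's API:

```rust
/// Hypothetical sketch of the materialization policy described above.
#[derive(Debug, PartialEq)]
enum UpdateStrategy {
    /// Low-frequency views (< 1k events/s per process) are refreshed on a schedule.
    Scheduled,
    /// High-frequency views (up to 100k events/s per process) are built just in time.
    OnDemand,
}

/// Pick a strategy from an observed per-process event rate.
fn strategy_for(events_per_second: u64) -> UpdateStrategy {
    if events_per_second < 1_000 {
        UpdateStrategy::Scheduled
    } else {
        UpdateStrategy::OnDemand
    }
}

fn main() {
    assert_eq!(strategy_for(500), UpdateStrategy::Scheduled);
    assert_eq!(strategy_for(50_000), UpdateStrategy::OnDemand);
}
```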
Modules
- answer - Record batches + schema
- async_events_block_processor - Implementation of `BlockProcessor` for async events
- async_events_view - Materializable view of async span events accessible through datafusion
- async_parquet_writer - Write parquet in object store
- batch_partition_merger - `BatchPartitionMerger` merges multiple partitions by splitting the work in batches to use less memory. The batches are based on event times.
- batch_update - Materialize views on a schedule based on the time data was received from the ingestion service
- block_partition_spec - Specification for a view partition backed by a set of telemetry blocks which can be processed out of order
- blocks_view - Replicated view of the `blocks` table of the postgresql metadata database.
- caching_reader - Adds file content caching to object store reads
- catalog - Catalog utilities for discovering and managing view schemas.
- dataframe_time_bounds
- export_log_view - Export mechanism that doubles as audit trail
- file_cache - Global LRU cache for parquet file contents
- get_payload_function - Fetch payload from the object store using SQL
- jit_partitions - Management of process-specific partitions built on demand
- lakehouse_context - Bundles runtime resources for lakehouse query execution
- list_partitions_table_function - Read access to the list of lakehouse partitions
- list_view_sets_table_function - Read access to view sets with their schema information
- log_block_processor - Implementation of `BlockProcessor` for log entries
- log_stats_view - SQL-based view for log statistics aggregated by process, minute, level, and target
- log_view - Materializable view of log entries accessible through datafusion
- materialize_partitions_table_function - Exposes materialize_partitions as a table function
- materialized_view - `TableProvider` implementation for the lakehouse
- merge - Merge consecutive parquet partitions into a single file
- metadata_cache - Global LRU cache for partition metadata
- metadata_compat - Compatibility layer for parsing legacy Arrow 56.0 metadata and upgrading to Arrow 57.0
- metadata_partition_spec - Specification for a view partition backed by a table in the postgresql metadata database.
- metrics_block_processor - Implementation of `BlockProcessor` for measures
- metrics_view - Materializable view of measures accessible through datafusion
- migration - Maintenance of the postgresql tables and indices used to track the parquet files that implement the views
- parse_block_table_function - Table function to parse all transit objects in a block and return them as JSONB
- partition - Write & delete sections of views
- partition_cache - In-memory copy of a subset of the list of partitions in the db
- partition_metadata - Operations on the dedicated partition_metadata table
- partition_source_data - Describes the event blocks backing a partition
- partitioned_execution_plan - `ExecutionPlan` based on a set of parquet files
- partitioned_table_provider - `TableProvider` based on a set of parquet files
- perfetto_trace_execution_plan - `ExecutionPlan` for generating Perfetto trace chunks
- perfetto_trace_table_function - Table function for generating Perfetto trace chunks
- process_spans_table_function - Table function returning thread and/or async spans from all CPU streams of a process
- process_streams - Shared utilities for discovering CPU streams of a process
- processes_view - Replicated view of the `processes` table of the postgresql metadata database.
- query - `property_get` function support from SQL (Datafusion integration)
- reader_factory - Wrapper around `ParquetObjectReader` to provide `ParquetMetaData` without hitting the ObjectStore
- retire_partition_by_file_udf - Scalar UDF to retire a single partition by file path
- retire_partition_by_metadata_udf - Scalar UDF to retire a single partition by metadata
- retire_partitions_table_function - Exposes retire_partitions as a table function
- runtime - Runtime resources
- session_configurator - `SessionConfigurator` trait for custom session context configuration
- sql_batch_view - SQL-defined view updated in batch
- sql_partition_spec - Specification for a view partition backed by a SQL query on the lakehouse.
- static_tables_configurator - Auto-discovery configurator for static JSON/CSV tables
- streams_view - Replicated view of the `streams` table of the postgresql metadata database.
- table_scan_rewrite - Rewrite table scans to take the query range into account
- temp - Tracking of expired partitions
- thread_spans_view - JIT view of the call tree built from the thread events of a single stream
- view - Basic interface for a set of rows, queryable and materializable
- view_factory - `default_view_factory` makes the default `ViewFactory`, giving users access to view instances, grouped in sets.
- view_instance_table_function - Table function to query process-specific views
- write_partition - Add or remove view partitions