Why ClickHouse Works at Scale

ClickHouse has become a common choice for analytical workloads that require fast queries over very large datasets. Many systems advertise columnar storage or distributed execution, but long-term performance and operational complexity are shaped by deeper architectural decisions.

ClickHouse is defined by its MergeTree storage engines, its reliance on data layout instead of traditional indexing, insert-time materialized views, vectorized execution combined with compression, and a single-node-first design philosophy. These characteristics reduce operational overhead while supporting large observability, event analytics, and time series workloads.

The MergeTree Engine Family

The MergeTree family forms the foundation of the ClickHouse performance model. Unlike traditional databases where indexing and storage engines are abstracted away, ClickHouse makes the storage engine an explicit architectural choice. MergeTree variants define how data is written, organized, merged, and queried.

Data is ingested as immutable parts sorted by a primary key and merged asynchronously in the background. Background merges reduce fragmentation and maintain query efficiency without blocking writes. Partitioning, sorting, deduplication, aggregation behavior, replication, and retention policies are defined directly in table DDL.
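
As one way to observe this part-based layout (shown here against the events table defined just below), system.parts lists each immutable part and whether it is still active after merges:

-- Each insert creates new immutable parts; background merges combine them
SELECT partition, name, rows, active
FROM system.parts
WHERE table = 'events'
ORDER BY modification_time DESC
LIMIT 10;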

A table storing HTTP access logs might be sorted by (service, timestamp). Queries that filter by service and time range can skip large sections of data outside those bounds. ReplacingMergeTree collapses multiple versions of the same logical row, which supports late-arriving updates. SummingMergeTree and AggregatingMergeTree store partially aggregated data so rollups can be maintained at the storage layer. This design supports high ingestion rates and predictable query performance on historical data.
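
As a concrete sketch of that layout (the table and column names here are illustrative rather than taken from a real schema):

-- Access logs sorted by (service, timestamp)
CREATE TABLE access_logs
(
    timestamp DateTime,
    service LowCardinality(String),
    status UInt16,
    duration_ms UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (service, timestamp);

-- The sort key lets this query skip granules for other services and time ranges
SELECT count()
FROM access_logs
WHERE service = 'checkout'
  AND timestamp >= now() - INTERVAL 1 DAY;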

For example, a ReplacingMergeTree table can support idempotent upserts without row level locking:

CREATE TABLE events
(
    event_id UUID,
    user_id UInt64,
    event_type LowCardinality(String),
    event_time DateTime,
    properties String,
    version UInt32
)
ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
TTL event_time + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;

Partitions prune entire months of data. The ORDER BY clause defines the physical sort key. The version column enables deduplication during merges. The TTL enforces retention automatically. Vacuuming and manual index maintenance are not required.
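
One caveat: deduplication happens only when parts merge, so duplicate rows can remain visible for a while. A query can force deduplicated results with FINAL, at some additional read cost:

-- FINAL applies ReplacingMergeTree deduplication at read time
SELECT count()
FROM events FINAL
WHERE user_id = 42
  AND event_time >= now() - INTERVAL 1 DAY;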

Performance Without Traditional Indexes

ClickHouse does not rely on secondary indexes in the way many relational databases do. It uses sparse primary indexes and data skipping indexes that operate on blocks of rows. These structures align with the physical layout of the data on disk and introduce minimal write overhead.

A metrics table may include a data skipping index on a low cardinality column such as region. When a query filters on a specific region, ClickHouse skips entire blocks that do not contain matching values. This reduces unnecessary scans and lowers storage overhead compared to per row index structures.

Consider a pageview table:

CREATE TABLE pageviews
(
    timestamp DateTime,
    user_id UInt64,
    url String,
    status UInt16
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp);

A query filtering by user and recent time range:

SELECT count()
FROM pageviews
WHERE user_id = 12345
  AND timestamp >= now() - INTERVAL 7 DAY;

Because the data is sorted by (user_id, timestamp), ClickHouse reads only relevant granules. The engine skips entire data ranges instead of scanning the full table.
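
The pruning can be verified with EXPLAIN, which reports how many parts and granules the primary key selects (the exact output format varies by version):

EXPLAIN indexes = 1
SELECT count()
FROM pageviews
WHERE user_id = 12345
  AND timestamp >= now() - INTERVAL 7 DAY;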

Additional pruning can be introduced with data skipping indexes:

ALTER TABLE pageviews
ADD INDEX idx_status status TYPE minmax GRANULARITY 4;

These structures are lightweight metadata rather than full row-level indexes, reinforcing the system’s focus on analytical workloads rather than transactional lookups.
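
One operational detail worth noting: an index added with ALTER applies only to newly written parts, so existing data must be rebuilt explicitly:

-- Build the skip index for parts written before the index existed
ALTER TABLE pageviews MATERIALIZE INDEX idx_status;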

Insert-Time Materialized Views

ClickHouse supports insert-time materialized views. Data transformation and aggregation occur when data is written rather than when it is queried. When a row is inserted into a source table, materialized view logic executes immediately and writes results into a target table.

A common use case is metrics aggregation. Raw metrics are ingested at high resolution into a base table. Materialized views aggregate those metrics into per-minute or per-hour summaries during ingestion. Queries read from pre-aggregated tables, which keeps query latency predictable and removes the need for scheduled aggregation jobs.

Start with a raw events table:

CREATE TABLE raw_events
(
    event_time DateTime,
    user_id UInt64,
    revenue Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time);

Define an aggregate table:

CREATE TABLE daily_revenue
(
    event_date Date,
    total_revenue Float64
)
ENGINE = SummingMergeTree
PARTITION BY event_date
ORDER BY event_date;

Create a materialized view:

CREATE MATERIALIZED VIEW mv_daily_revenue
TO daily_revenue
AS
SELECT
    toDate(event_time) AS event_date,
    sum(revenue) AS total_revenue
FROM raw_events
GROUP BY event_date;

Every insert into raw_events updates daily_revenue automatically. There is no external scheduler, no Spark job, and no periodic aggregation task. This pattern is effective for observability metrics, SaaS analytics dashboards, and financial rollups where near-real-time aggregates are required without additional infrastructure.
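
One detail to keep in mind: SummingMergeTree collapses rows for the same key only during background merges, so reads should still aggregate:

-- Re-aggregate at read time; rows for a date may not be merged yet
SELECT
    event_date,
    sum(total_revenue) AS revenue
FROM daily_revenue
GROUP BY event_date
ORDER BY event_date;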

Vectorized Execution and Compression

ClickHouse processes data in blocks using vectorized execution. Functions and expressions operate on arrays of values rather than individual rows. This improves CPU efficiency and aligns with modern processor architecture.

Compression is integrated with this execution model. Columns are compressed independently using codecs selected for the data type and distribution. Filters and aggregations can be applied directly on compressed data, which reduces memory bandwidth and disk IO.

You can configure compression at the column level:

CREATE TABLE metrics
(
    timestamp DateTime CODEC(Delta, ZSTD),
    host LowCardinality(String) CODEC(ZSTD(3)),
    cpu_usage Float32 CODEC(Gorilla),
    memory_usage Float32 CODEC(Gorilla)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (host, timestamp);

Delta encoding benefits monotonically increasing timestamps. Gorilla is effective for time series floating point values. LowCardinality provides dictionary encoding for string dimensions. Compression affects both storage footprint and query performance.

Storage characteristics can be inspected directly:

SELECT
    table,
    formatReadableSize(sum(bytes)) AS size
FROM system.parts
WHERE table = 'metrics' AND active
GROUP BY table;
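
Per-column compression ratios can be checked the same way through system.columns:

SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'metrics';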

This tight coupling between storage format and execution engine is a defining trait of the ClickHouse performance model.

Single-Node First Architecture

ClickHouse is designed to extract maximum performance from a single node before horizontal scaling is introduced. A single instance on modern hardware can process multi-billion-row datasets and serve low-latency analytical queries.

A typical configuration may look like:

<clickhouse>
    <!-- Server-level caches (config.xml) -->
    <mark_cache_size>5368709120</mark_cache_size>
    <uncompressed_cache_size>8589934592</uncompressed_cache_size>
    <!-- Per-query limits belong in a settings profile (users.xml) -->
    <profiles>
        <default>
            <max_threads>32</max_threads>
            <max_memory_usage>68719476736</max_memory_usage>
        </default>
    </profiles>
</clickhouse>

This configuration enables parallel execution across CPU cores and uses memory for mark and uncompressed caches. Vertical efficiency is prioritized before distributed complexity is added.
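
The same limits can also be adjusted per query rather than globally, for example:

-- Caps parallelism and memory for this query only
SELECT count()
FROM raw_events
SETTINGS max_threads = 16, max_memory_usage = 32000000000;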

When clustering is required, distributed tables can be layered on top:

CREATE TABLE events_local
(
    event_time DateTime,
    user_id UInt64,
    revenue Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_time);

CREATE TABLE events_distributed AS events_local
ENGINE = Distributed(my_cluster, default, events_local, rand());

The distributed table acts as a routing layer, while the underlying storage model remains consistent. Teams can begin with a single-node deployment for logs or metrics, delay the complexity of sharding and replication, and later query across multiple nodes through the same SQL interface. This incremental path to scale reduces early operational overhead and improves capacity-planning flexibility.
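
Queries against the distributed table fan out to each shard's local table and merge the partial results, so application SQL does not change:

-- Executes against events_local on every shard of my_cluster, then merges
SELECT
    toStartOfDay(event_time) AS day,
    count() AS events,
    sum(revenue) AS revenue
FROM events_distributed
GROUP BY day
ORDER BY day;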

Designed for High-Volume Analytical Workloads

ClickHouse targets append heavy analytical workloads. Logs, metrics, traces, and event data share similar characteristics: continuous ingestion, time based queries, and frequent aggregation across large datasets.

An observability platform may ingest application logs, aggregate error counts by service and time window, and support ad hoc queries during incident response. ClickHouse supports high ingestion throughput and parallel aggregation for this pattern.
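
A sketch of that aggregation, assuming a hypothetical logs table with service, level, and timestamp columns:

-- 'logs' is an illustrative table, not defined elsewhere in this post
SELECT
    service,
    toStartOfFiveMinutes(timestamp) AS window,
    countIf(level = 'error') AS errors
FROM logs
WHERE timestamp >= now() - INTERVAL 6 HOUR
GROUP BY service, window
ORDER BY window DESC, errors DESC;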

Data can be ingested using streaming formats:

INSERT INTO raw_events FORMAT JSONEachRow
{"event_time":"2026-02-01 12:00:00","user_id":1,"revenue":9.99}
{"event_time":"2026-02-01 12:00:01","user_id":2,"revenue":19.99}

Or through bulk CSV ingestion:

clickhouse-client --query="INSERT INTO raw_events FORMAT CSV" < events.csv

Large scale aggregations execute using parallel scans:

SELECT
    toStartOfHour(event_time) AS hour,
    count(),
    sum(revenue)
FROM raw_events
WHERE event_time >= now() - INTERVAL 30 DAY
GROUP BY hour
ORDER BY hour;

These queries can scan billions of rows in seconds because ClickHouse is engineered for OLAP patterns: wide scans, heavy aggregation, and high concurrency for analytical queries. Statistical aggregations follow the same pattern, for example a per-user 95th percentile of revenue:

SELECT
    user_id,
    quantileExact(0.95)(revenue) AS p95_revenue
FROM raw_events
GROUP BY user_id
ORDER BY p95_revenue DESC
LIMIT 10;

Percentiles, approximate aggregations, and statistical functions are native capabilities rather than extensions.
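
The same applies to approximate aggregation; for example, uniq() estimates distinct counts without holding every value in memory:

SELECT
    toStartOfDay(event_time) AS day,
    uniq(user_id) AS approx_active_users
FROM raw_events
GROUP BY day
ORDER BY day;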

What Really Matters

Most database discussions focus on benchmarks or feature comparisons, but that is not the most important distinction. ClickHouse changes how engineers approach analytical data.

Traditional databases encourage aggressive pre-aggregation, archiving, and schema design driven by what the query layer can handle. Analytics becomes constrained by system capacity rather than by business questions.

When large scale scans become routine, teams retain raw events and recompute metrics when needed. Less effort is spent building protective data pipelines. More effort is directed toward answering operational and business questions.

The architectural shift is clear. Transactional systems prioritize correctness per row. Analytical systems prioritize understanding across time and volume. ClickHouse aligns with the latter model.

When the cost of asking a question approaches zero, organizations ask better questions. Better questions produce better decisions.
