ClickHouse

Description

For analytical queries, ClickHouse is typically faster than traditional data warehouses and general-purpose databases. It is most often used when real-time queries on large datasets are needed at an affordable cost. Developers also often run ClickHouse on top of their cloud data warehouse (CDWH) or OLTP databases, where it acts as a “speed layer” within the existing infrastructure.

ClickHouse supports connectors to many clients and drivers, including common BI and data analysis tools. Please see this page for a complete list of supported integrations.

Uses and Functionalities

ClickHouse is a high-performance, column-oriented SQL database. It is optimized for fast analytics on large-scale data, making it ideal for real-time data analysis and AI workloads. Key uses and functionalities include:

Columnar storage: Data is stored by columns, which reduces disk I/O and improves query speed by reading only relevant data, enabling fast aggregations and efficient compression.
Vectorized query execution: Uses CPU SIMD instructions to process multiple data rows simultaneously, maximizing query performance for complex analytical and AI tasks.
Real-time analytics: Supports low-latency queries on freshly inserted data, suitable for dashboards, monitoring, and fast decision-making.
Distributed architecture: Scales horizontally across clusters with data sharding, replication, and parallel query execution to handle petabytes of data reliably.
MergeTree storage engine: Optimizes insert-heavy operations with sorting, partitioning, and background merging, maintaining high query throughput.
Advanced SQL support: Supports a SQL dialect that is close to standard SQL in most cases, including joins, subqueries, window functions, and user-defined functions (UDFs) for flexible data querying.
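Efficient data skipping indexes: Instead of traditional per-row indexes, uses a sparse primary index together with data skipping indexes (for example min/max summaries) to skip blocks of rows that cannot match a filter, accelerating range queries.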
Machine learning and AI integration: Serves as a central data platform for ML workflows, powering feature extraction, model training, inference, and vector search within one system.
Cost-effective resource use: Efficient compression and columnar design reduce storage needs and hardware requirements for large data workloads.
Seamless ML tooling integration: Compatible with popular ML frameworks and supports embedding ClickHouse directly in Python for analytics-driven AI.
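
The data skipping idea from the list above can be illustrated with a small, self-contained Python sketch. This is a conceptual toy, not ClickHouse's implementation: the `Granule` class, the function names, and the tiny granule size are all assumptions made for illustration (the MergeTree default `index_granularity` is 8192 rows). It keeps a min/max summary per block of sorted values and skips any block whose range cannot overlap the query range.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy granule size for readability; hypothetical, not a ClickHouse setting value.
GRANULE_SIZE = 4


@dataclass
class Granule:
    """A block of rows plus the min/max summary used to decide whether to skip it."""
    min_value: int
    max_value: int
    rows: List[int]


def build_granules(sorted_values: List[int], granule_size: int = GRANULE_SIZE) -> List[Granule]:
    """Split already-sorted values into fixed-size granules with min/max summaries."""
    granules = []
    for start in range(0, len(sorted_values), granule_size):
        chunk = sorted_values[start:start + granule_size]
        granules.append(Granule(min(chunk), max(chunk), chunk))
    return granules


def range_query(granules: List[Granule], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of granules actually scanned."""
    matches: List[int] = []
    scanned = 0
    for granule in granules:
        # Skip the whole granule if its [min, max] range cannot overlap the query range.
        if granule.max_value < low or granule.min_value > high:
            continue
        scanned += 1
        matches.extend(v for v in granule.rows if low <= v <= high)
    return matches, scanned


# Sorting first mirrors how a sorted table layout keeps similar values together.
values = sorted([3, 18, 7, 42, 25, 11, 30, 1, 14, 36, 21, 9])
granules = build_granules(values)
matches, scanned = range_query(granules, 10, 20)
print(matches, f"scanned {scanned} of {len(granules)} granules")
```

In this toy run only one of the three granules is read, which is the effect that sorting plus min/max summaries aims for on much larger tables.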

CICD integration method

To enable ClickHouse, set the following variables in your main configuration file.

clickhouse_operator = {
    enabled                      = true
    version                      = "0.24.5"
    namespace                    = "clickhouse-operator"
    component                    = "dp-common"
    release_name                 = "clickhouse-operator"
    app_version                  = "0.24.5"
    clickhouse_cluster_namespace = "clickhouse"
}
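
Once the operator and a cluster are deployed, a quick way to confirm that the server responds is to query its HTTP interface. The following Python sketch assumes the server exposes the default HTTP port 8123 and accepts unauthenticated read-only queries, which may not hold in your environment. The host name is hypothetical and must be replaced with the actual service address in the `clickhouse` namespace.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical in-cluster service name; replace with the real service address.
CLICKHOUSE_HOST = "clickhouse.clickhouse.svc.cluster.local"
# Default port of the ClickHouse HTTP interface; adjust if your deployment differs.
HTTP_PORT = 8123


def build_query_url(host: str, query: str, port: int = HTTP_PORT) -> str:
    """Build a GET URL that passes a read-only SQL query in the `query` parameter."""
    return f"http://{host}:{port}/?{urlencode({'query': query})}"


def server_version(host: str = CLICKHOUSE_HOST) -> str:
    """Ask the server for its version string over HTTP (requires network access)."""
    with urlopen(build_query_url(host, "SELECT version()"), timeout=5) as response:
        return response.read().decode("utf-8").strip()
```

Calling `server_version()` from a machine or pod that can reach the service should return the running server version, which can then be compared with the release table on this page. If authentication is enabled, credentials have to be supplied as well, which this sketch does not cover.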

API / Swagger

Releases

Date	Version	Chart version	Description
2025-05-15	0.24.5	0.24.5	Implement ClickHouse Operator 0.24.5 and ClickHouse Server v25.3.2.39

Official documentation

ClickHouse documentation
