ClickHouse
Description
ClickHouse is typically faster than traditional data warehouses and databases for analytical queries. It is most often used when real-time queries on large datasets are needed at an affordable cost. Developers also often deploy ClickHouse alongside an existing cloud data warehouse (CDWH) or OLTP database, where it acts as a "speed layer" within their existing infrastructure.
ClickHouse supports many clients, drivers, and connectors, including common BI and data analysis tools. See this page for a complete list of supported integrations.
Uses and Functionalities
ClickHouse is a high-performance, column-oriented SQL database. It is optimized for fast analytics on large-scale data, which makes it well suited to real-time data analysis and AI workloads. Key uses and functionalities include:
- Columnar storage: Data is stored by column, so a query reads only the columns it needs. This reduces disk I/O, speeds up aggregations, and enables efficient compression.
- Vectorized query execution: Data is processed in blocks of column values, using CPU SIMD instructions to apply one operation to many values at once, which maximizes performance for complex analytical and AI tasks.
- Real-time analytics: Supports low-latency queries on freshly ingested data, suitable for dashboards, monitoring, and fast decision-making.
- Distributed architecture: Scales horizontally across clusters using data sharding, replication, and parallel query execution to handle petabytes of data reliably.
- MergeTree storage engine: Handles insert-heavy workloads by writing each insert as a sorted data part, supporting partitioning, and merging parts in the background, which keeps query throughput high.
- Advanced SQL support: Supports a SQL dialect largely compatible with ANSI SQL, including joins, subqueries, window functions, and user-defined functions (UDFs) for flexible data querying.
- Efficient data skipping indexes: Instead of traditional row-level indexes such as B-trees, ClickHouse uses a sparse primary index plus optional data skipping indexes, which let queries skip blocks of data (granules) that cannot match a filter. This accelerates range and filter queries.
- Machine learning and AI integration: Serves as a central data platform for ML workflows, supporting feature extraction, model training, inference, and vector search within one system.
- Cost-effective resource use: Efficient compression and columnar design reduce storage and hardware requirements for large data workloads.
- Seamless ML tooling integration: Compatible with popular ML frameworks, and ClickHouse can be embedded directly in Python (for example through chDB) for analytics-driven AI.
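The columnar storage and data skipping items above can be illustrated with a toy model. The following is a minimal pure-Python sketch, not ClickHouse code: it stores a small table column by column, splits the filter column into fixed-size granules, keeps a min/max summary per granule, and answers a range filter by reading only the granules whose range can overlap the predicate. The granule size of 4 rows and the helper names (`build_minmax_index`, `sum_where_between`) are hypothetical simplifications; ClickHouse's real default `index_granularity` is 8192 rows.

```python
from dataclasses import dataclass

GRANULE_SIZE = 4  # hypothetical toy value; ClickHouse defaults to 8192 rows


@dataclass
class Granule:
    start: int  # first row offset in this granule
    end: int    # one past the last row offset
    lo: int     # minimum value of the indexed column in this granule
    hi: int     # maximum value of the indexed column in this granule


def build_minmax_index(column, granule_size=GRANULE_SIZE):
    """Summarize a column as per-granule (min, max) ranges."""
    index = []
    for start in range(0, len(column), granule_size):
        chunk = column[start:start + granule_size]
        index.append(Granule(start, start + len(chunk), min(chunk), max(chunk)))
    return index


def sum_where_between(table, filter_col, value_col, low, high, index):
    """Sum value_col over rows where low <= filter_col <= high.

    Only granules whose [lo, hi] range overlaps [low, high] are scanned,
    and only the two columns the query needs are touched.
    Returns (total, granules_read).
    """
    filt, vals = table[filter_col], table[value_col]
    total, granules_read = 0, 0
    for g in index:
        if g.hi < low or g.lo > high:
            continue  # skip: no row in this granule can match
        granules_read += 1
        for i in range(g.start, g.end):
            if low <= filt[i] <= high:
                total += vals[i]
    return total, granules_read


# Column-oriented table: one list per column instead of one tuple per row.
table = {
    "ts":     [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "user":   ["a", "b", "a", "c", "b", "a", "c", "c", "a", "b", "b", "a"],
    "amount": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
}

ts_index = build_minmax_index(table["ts"])
total, read = sum_where_between(table, "ts", "amount", 5, 7, ts_index)
print(total, read)  # 180 1
```

Because `ts` is sorted (as a MergeTree sorting key would make it), only one of the three granules is scanned, and the `user` column is never read.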
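The distributed architecture item can be pictured as "aggregate locally, then combine". This is a toy sketch under stated assumptions, not how ClickHouse is invoked: rows are assigned to a hypothetical cluster of 3 shards by hashing a key (`zlib.crc32` stands in for whatever sharding expression a real deployment uses), each shard computes a partial sum independently, and the node that initiated the query merges the partial results. All function names are hypothetical.

```python
from collections import Counter
from zlib import crc32

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    # Deterministic hash, so the same key always lands on the same shard.
    return crc32(key.encode("utf-8")) % num_shards


def distribute(rows, num_shards=NUM_SHARDS):
    """Split (key, value) rows across shards by hashed key."""
    shards = [[] for _ in range(num_shards)]
    for key, value in rows:
        shards[shard_for(key, num_shards)].append((key, value))
    return shards


def partial_sum(shard_rows):
    # Runs independently on each shard (in parallel on a real cluster).
    totals = Counter()
    for key, value in shard_rows:
        totals[key] += value
    return totals


def distributed_sum(rows, num_shards=NUM_SHARDS):
    partials = [partial_sum(s) for s in distribute(rows, num_shards)]
    # The initiating node combines the per-shard partial aggregates.
    result = Counter()
    for p in partials:
        result.update(p)  # Counter.update adds counts rather than replacing them
    return dict(result)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(sorted(distributed_sum(rows).items()))  # [('a', 4), ('b', 7), ('c', 4)]
```

The result does not depend on the number of shards, which is what allows the aggregation to scale out by adding nodes.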
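The MergeTree item can be approximated with the in-memory sketch below. It assumes simplified semantics: each insert is sorted by the sorting key and kept as an immutable "part", and a merge step later combines the already-sorted parts with a k-way merge (`heapq.merge`). Real MergeTree parts are column files on disk with their own indexes, and merges are scheduled by the server. The `ToyMergeTree` class is hypothetical and is not a ClickHouse API.

```python
import heapq


class ToyMergeTree:
    """Hypothetical in-memory illustration of MergeTree-style parts and merges."""

    def __init__(self, sort_key):
        self.sort_key = sort_key  # function extracting the sorting key from a row
        self.parts = []           # each part is a list of rows sorted by sort_key

    def insert(self, rows):
        # Each insert becomes a new immutable part, sorted once at write time.
        self.parts.append(sorted(rows, key=self.sort_key))

    def merge_all(self):
        # Merge step: combine already-sorted parts with a k-way merge,
        # which avoids re-sorting the whole dataset.
        merged = list(heapq.merge(*self.parts, key=self.sort_key))
        self.parts = [merged] if merged else []


tree = ToyMergeTree(sort_key=lambda row: row[0])
tree.insert([(3, "c"), (1, "a")])
tree.insert([(2, "b"), (5, "e")])
tree.insert([(4, "d")])
print(len(tree.parts))  # 3
tree.merge_all()
print(len(tree.parts), tree.parts[0])  # 1 [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]
```

Inserts stay cheap because they only sort the new batch, while merges keep the number of parts small so that reads touch fewer files.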
CI/CD integration method
To enable ClickHouse, set the following variables in your main configuration file.
```hcl
clickhouse_operator = {
  enabled                      = true
  version                      = "0.24.5"
  namespace                    = "clickhouse-operator"
  component                    = "dp-common"
  release_name                 = "clickhouse-operator"
  app_version                  = "0.24.5"
  clickhouse_cluster_namespace = "clickhouse"
}
```
API / Swagger
Releases
| Date | Version | Chart version | Description |
|---|---|---|---|
| 2025-05-15 | 0.24.5 | 0.24.5 | Deploy ClickHouse Operator 0.24.5 and ClickHouse Server v25.3.2.39 |