ClickHouse

Description

ClickHouse is typically faster than traditional data warehouses and databases for analytical queries, and is most often used when real-time queries on large datasets are needed at an affordable cost. Developers also often run ClickHouse on top of their cloud data warehouse (CDWH) or OLTP databases, where it acts as a “speed layer” within the existing infrastructure.

ClickHouse provides many client libraries and drivers and integrates with common BI and data analysis tools. See the integrations page of the ClickHouse documentation for a complete list of supported integrations.

Uses and Functionalities

ClickHouse is a high-performance, column-oriented SQL database. It is optimized for fast analytics on large-scale data, making it ideal for real-time data analysis and AI workloads. Key uses and functionalities include:

  • Columnar storage: Data is stored by columns, which reduces disk I/O and improves query speed by reading only relevant data, enabling fast aggregations and efficient compression.

  • Vectorized query execution: Uses CPU SIMD instructions to process multiple data rows simultaneously, maximizing query performance for complex analytical and AI tasks.

  • Real-time analytics: Supports low-latency queries that can run instantly on fresh data, suitable for dashboards, monitoring, and fast decision-making.

  • Distributed architecture: Scales horizontally across clusters with data sharding, replication, and parallel query execution to handle petabytes of data reliably.

  • MergeTree storage engine: Optimizes insert-heavy operations with sorting, partitioning, and background merging, maintaining high query throughput.

  • Advanced SQL support: Supports a SQL dialect close to standard SQL, including joins, subqueries, window functions, and user-defined functions (UDFs) for flexible data querying.

  • Efficient data skipping indexes: Instead of traditional row-level secondary indexes, uses a sparse primary index and data skipping indexes to avoid reading blocks of data that cannot match a query, accelerating range queries.

  • Machine learning and AI integration: Serves as a central data platform for ML workflows, powering feature extraction, model training, inference, and vector search within one system.

  • Cost-effective resource use: Efficient compression and columnar design reduce storage needs and hardware requirements for large data workloads.

  • Seamless ML tooling integration: Compatible with popular ML frameworks and supports embedding ClickHouse directly in Python for analytics-driven AI.
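The columnar storage and data skipping ideas in the list above can be illustrated with a small sketch. This is a simplified model in plain Python, not ClickHouse's actual implementation: the `Granule` class, the helper functions, and the granule size of 4 are all hypothetical (ClickHouse's default index granularity is 8192 rows). Each column is stored as its own array, a min/max summary is kept per granule, and a range query only reads granules whose summary overlaps the requested range.

```python
from dataclasses import dataclass
from typing import List, Tuple

GRANULE_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


@dataclass
class Granule:
    start: int      # index of the first row in the granule
    end: int        # index one past the last row in the granule
    min_value: int  # smallest value of the indexed column in this granule
    max_value: int  # largest value of the indexed column in this granule


def build_minmax_index(column: List[int], granule_size: int = GRANULE_SIZE) -> List[Granule]:
    """Summarize a column as one (min, max) pair per granule."""
    index = []
    for start in range(0, len(column), granule_size):
        chunk = column[start:start + granule_size]
        index.append(Granule(start, start + len(chunk), min(chunk), max(chunk)))
    return index


def sum_where_between(
    filter_col: List[int],
    value_col: List[int],
    index: List[Granule],
    low: int,
    high: int,
) -> Tuple[int, int]:
    """Sum value_col where low <= filter_col <= high.

    Returns the sum and the number of granules that had to be read.
    """
    total = 0
    granules_read = 0
    for g in index:
        # Skip granules whose value range cannot overlap [low, high].
        if g.max_value < low or g.min_value > high:
            continue
        granules_read += 1
        for i in range(g.start, g.end):
            if low <= filter_col[i] <= high:
                total += value_col[i]
    return total, granules_read


# Columnar layout: each column is a separate array, so the query below
# only touches the two columns it needs.
timestamps = list(range(100, 116))  # 16 sorted rows: 100..115
amounts = [10] * 16

index = build_minmax_index(timestamps)
total, granules_read = sum_where_between(timestamps, amounts, index, 105, 106)
print(total, granules_read, len(index))  # prints "20 1 4"
```

In this toy dataset the query reads one of the four granules. The benefit depends on the data being sorted or clustered on the filtered column, which is why the sorting done by the MergeTree engine matters for this kind of skipping.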

CI/CD integration method

To enable ClickHouse, set the following variables in your main configuration file:

clickhouse_operator = {
    enabled                      = true
    version                      = "0.24.5"
    namespace                    = "clickhouse-operator"
    component                    = "dp-common"
    release_name                 = "clickhouse-operator"
    app_version                  = "0.24.5"
    clickhouse_cluster_namespace = "clickhouse"
}

API / Swagger

Releases

Date | Version | Chart version | Description
2025-05-15 | 0.24.5 | 0.24.5 | Implement ClickHouse Operator 0.24.5 and ClickHouse Server v25.3.2.39

Official documentation

ClickHouse documentation
