Getting Started
CoreAI components
Airbyte
Description
Airbyte is open source data movement infrastructure for building extract and load (EL) data pipelines. It is designed for versatility, scalability, and ease-of-use.
Uses and Functionalities
Teams and organizations need efficient and timely data access to an ever-growing list of data sources. In-house data pipelines are brittle and costly to build and maintain. Airbyte's unique open source approach enables your data stack to adapt as your data needs evolve.
- Wide connector availability: Airbyte’s connector catalog comes “out-of-the-box” with over 600 pre-built connectors. These connectors can be used to start replicating data from a source to a destination in just a few minutes.
- Long-tail connector coverage: You can easily extend Airbyte’s capability to support your custom use cases through Airbyte's No-Code Connector Builder.
- Robust platform: Provides the horizontal scaling required for large-scale data movement operations, available as Cloud-managed or Self-managed.
- Accessible user interfaces: Work through the UI, PyAirbyte (Python library), API, and Terraform Provider to integrate with your preferred tooling and approach to infrastructure management.
Airbyte is suitable for a wide range of data integration use cases, including AI data infrastructure and EL(T) workloads. Airbyte is also embeddable within your own app or platform to power your product.
Apisix
Description
Apache APISIX is a dynamic, real-time, high-performance API gateway. It provides traffic management features such as load balancing, dynamic upstreams, canary release, circuit breaking, authentication, and observability.
Uses and Functionalities
Argo Events
Description
Argo Events is an event-based dependency manager for Kubernetes. It helps you define multiple dependencies from a variety of event sources (webhooks, S3, schedules, streams, etc.) and trigger Kubernetes objects after successful event dependency resolution.
Uses and Functionalities
- Manage dependencies from a variety of event sources.
- Ability to customize business-level constraint logic for event dependency resolution.
- Manage everything from simple, linear, real-time dependencies to complex, multi-source, batch job dependencies.
- Ability to extend the framework to add your own event source listener.
- Define arbitrary boolean logic to resolve event dependencies.
- CloudEvents compliant.
- Ability to manage event sources at runtime.
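Dependencies start from event sources. As a sketch of the idea, a minimal webhook EventSource could look like the following (the name, port, and endpoint are illustrative):

```yaml
# Illustrative EventSource: listens for HTTP POSTs on port 12000 at /example
apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: webhook
spec:
  webhook:
    example:
      port: "12000"
      endpoint: /example
      method: POST
```

A Sensor would then reference this event source as a dependency and trigger a Kubernetes object when the event fires.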
Argo Workflows
Description
Argo Workflows is a parallel job orchestration tool. It allows jobs to be scheduled using job modeling, enabling complete and sometimes complex orchestration in a straightforward manner.
Uses and Functionalities
Argo CD
Description
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes.
Argo CD follows the GitOps pattern of using Git repositories as the source of truth for defining the desired application state. Kubernetes manifests can be specified in several ways:
- kustomize applications
- helm charts
- jsonnet files
- Plain directory of YAML/JSON manifests
- Any custom config management tool configured as a config management plugin
Argo CD automates the deployment of the desired application states in the specified target environments. Application deployments can track updates to branches, tags, or be pinned to a specific version of manifests at a Git commit.
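For example, a declarative Application resource tracking a Git repository might look like this (the repository URL, path, and namespaces are illustrative):

```yaml
# Illustrative Application: syncs the guestbook path of a Git repo
# into the guestbook namespace of the local cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    targetRevision: HEAD
    path: guestbook
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook
  syncPolicy:
    automated:        # optional: sync without manual approval
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual drift in the cluster
```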
Uses and Functionalities
- Automated deployment of applications to specified target environments
- Support for multiple config management/templating tools (Kustomize, Helm, Jsonnet, plain-YAML)
- Ability to manage and deploy to multiple clusters
- SSO Integration (OIDC, OAuth2, LDAP, SAML 2.0, GitHub, GitLab, Microsoft, LinkedIn)
- Multi-tenancy and RBAC policies for authorization
- Rollback/Roll-anywhere to any application configuration committed in Git repository
- Health status analysis of application resources
- Automated configuration drift detection and visualization
- Automated or manual syncing of applications to their desired state
- Web UI which provides real-time view of application activity
- CLI for automation and CI integration
- Webhook integration (GitHub, BitBucket, GitLab)
- Access tokens for automation
- PreSync, Sync, PostSync hooks to support complex application rollouts (e.g. blue/green and canary upgrades)
- Audit trails for application events and API calls
- Prometheus metrics
- Parameter overrides for overriding helm parameters in Git
Cert Manager
Description
cert-manager is a powerful and extensible X.509 certificate controller for Kubernetes and OpenShift workloads. It will obtain certificates from a variety of Issuers, both popular public Issuers as well as private Issuers, and ensure the certificates are valid and up-to-date, and will attempt to renew certificates at a configured time before expiry.
Uses and Functionalities
- Automated issuance and renewal of certificates to secure Ingress with TLS
- Fully integrated Issuers from recognised public and private Certificate Authorities
- Secure pod-to-pod communication with mTLS using private PKI Issuers
- Supports certificate use cases for web facing and internal workloads
- Open source add-ons for enhanced cloud native service mesh security
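As an illustration, requesting a certificate is declarative; a sketch of a Certificate resource might look like this (the issuer name and DNS name are assumptions — an Issuer or ClusterIssuer must already exist):

```yaml
# Illustrative Certificate: cert-manager obtains and renews the key pair
# and stores it in the Secret named by spec.secretName.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-tls
  namespace: default
spec:
  secretName: example-tls
  dnsNames:
    - app.example.internal
  issuerRef:
    name: my-ca-issuer   # hypothetical ClusterIssuer
    kind: ClusterIssuer
```

The resulting Secret can then be referenced from an Ingress `tls` section to secure it with TLS.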
ClickHouse
Description
ClickHouse is faster than most traditional data warehouses and databases and is most often used when real-time queries on large datasets are necessary at an affordable cost. However, developers also often use ClickHouse on top of their cloud data warehouse (CDWH) or OLTP databases to act as a “speed layer” within their existing infrastructure.
ClickHouse supports connectors to many clients and drivers, including common BI and data analysis tools. See the ClickHouse integrations documentation for a complete list of supported integrations.
Uses and Functionalities
ClickHouse is a high-performance, column-oriented SQL database. It is optimized for fast analytics on large-scale data, making it ideal for real-time data analysis and AI workloads. Key uses and functionalities include:
- Columnar storage: Data is stored by columns, which reduces disk I/O and improves query speed by reading only relevant data, enabling fast aggregations and efficient compression.
- Vectorized query execution: Uses CPU SIMD instructions to process multiple data rows simultaneously, maximizing query performance for complex analytical and AI tasks.
- Real-time analytics: Supports low-latency queries that can run instantly on fresh data, suitable for dashboards, monitoring, and fast decision-making.
- Distributed architecture: Scales horizontally across clusters with data sharding, replication, and parallel query execution to handle petabytes of data reliably.
- MergeTree storage engine: Optimizes insert-heavy operations with sorting, partitioning, and background merging, maintaining high query throughput.
- Advanced SQL support: Fully supports standard SQL, including joins, subqueries, window functions, and user-defined functions (UDFs) for flexible data querying.
- Efficient data skipping indexes: Instead of traditional indexes, uses data skipping to skip irrelevant data parts, accelerating range queries.
- Machine learning and AI integration: Serves as a central data platform for ML workflows, powering feature extraction, model training, inference, and vector search within one system.
- Cost-effective resource use: Efficient compression and columnar design reduce storage needs and hardware requirements for large data workloads.
- Seamless ML tooling integration: Compatible with popular ML frameworks and supports embedding ClickHouse directly in Python for analytics-driven AI.
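The columnar-storage point above can be illustrated with a toy sketch (plain Python, invented data): an aggregation over one column only has to touch that column's contiguous array, while a row store must scan whole records.

```python
# Toy illustration of why columnar layout speeds up analytics.

rows = [  # row-oriented: each record stored together
    {"user_id": 1, "country": "FR", "amount": 12.5},
    {"user_id": 2, "country": "DE", "amount": 7.0},
    {"user_id": 3, "country": "FR", "amount": 3.5},
]

columns = {  # column-oriented: each field stored as its own array
    "user_id": [1, 2, 3],
    "country": ["FR", "DE", "FR"],
    "amount": [12.5, 7.0, 3.5],
}

# Row store: SUM(amount) must walk every full record.
row_total = sum(r["amount"] for r in rows)

# Column store: SUM(amount) reads a single array and skips the rest.
col_total = sum(columns["amount"])

print(col_total)  # → 23.0
```

Compression works the same way: a column of similar values compresses far better than interleaved records.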
CoreAI-doc-pipeline
Description
This component is a part of our Document Processing Pipeline. It is designed to automatically process files uploaded to MinIO by extracting relevant information and saving it into a ClickHouse database for future searches, reporting, and AI processing.
Uses and Functionalities
coreai-llm-backend
Description
coreai-llm-backend is a backend service for managing and serving large language models (LLMs). It provides APIs for model inference, management, and integration with other systems. The backend is designed for scalable, secure, and efficient LLM operations in enterprise environments.
Uses and Functionalities
Harbor
Description
Harbor is an open source registry that secures artifacts with policies and role-based access control, ensures images are scanned and free from vulnerabilities, and signs images as trusted. Harbor, a CNCF Graduated project, delivers compliance, performance, and interoperability to help you consistently and securely manage artifacts across cloud native compute platforms like Kubernetes and Docker.
Uses and Functionalities
Security
- Security and vulnerability analysis
- Content signing and validation
Management
- Multi-tenant
- Extensible API and web UI
- Replication across many registries, including Harbor
- Identity integration and role-based access control
Keycloak
Description
Keycloak is an identity and access management tool. It centralizes user information for all applications that require it. The following chapter explains a simple way to add or remove a user from the entire platform or from a component.
Uses and Functionalities
- Support for industry-standard protocols such as OpenID Connect, OAuth 2.0, and SAML 2.0, allowing seamless integration with many systems.
- User federation capabilities to connect with existing user directories like LDAP and Active Directory, centralizing identity management.
- Identity brokering and social login, enabling users to sign in using external identity providers such as Google, Facebook, and GitHub.
- A web-based admin console for managing users, roles, permissions, and configuring fine-grained authorization policies.
- Account management console that empowers users to manage their profiles, sessions, and two-factor authentication settings.
- Customizable authentication flows, including advanced password policies and optional features like user self-registration and email verification.
- Session management tools to monitor and control user sessions centrally.
- Flexibility to secure web apps, mobile apps, and RESTful APIs using token-based authentication with access tokens issued by Keycloak.
- Extensible architecture supporting integration with external identity stores and the ability to customize login pages and user interfaces.
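Token-based authentication means clients receive JWT access tokens whose claims applications can read. The sketch below (plain Python, hand-built token) shows the mechanics; real tokens must have their signature verified against Keycloak's published keys before any claim is trusted.

```python
import base64
import json

# A JWT is three base64url segments: header.payload.signature.
# This token is fabricated for illustration only.

def b64url(data: dict) -> str:
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

token = ".".join([
    b64url({"alg": "RS256", "typ": "JWT"}),
    b64url({"preferred_username": "alice",
            "realm_access": {"roles": ["admin"]}}),
    "fake-signature",  # placeholder; a real token carries an RS256 signature
])

def decode_claims(jwt: str) -> dict:
    """Read the claims segment (no signature check -- demo only)."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

claims = decode_claims(token)
print(claims["preferred_username"])  # → alice
```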
Langfuse
Description
Langfuse is a powerful, open-source platform that empowers developers and organizations to monitor, evaluate, and optimize their large language model applications. It facilitates collaborative debugging, prompt management, and comprehensive metrics tracking, all while being highly customizable and scalable. With support for self-hosting and enterprise-grade compliance, Langfuse is built to enhance the performance and reliability of AI-driven workflows in production environments.
Uses and Functionalities
- LLM Application Observability: Tracks and logs every LLM and non-LLM call, including retrieval, embedding, and API calls, allowing detailed trace inspection and debugging of complex multi-turn conversations or workflows.
- Prompt Management: Centralizes and version-controls prompts, enabling teams to collaboratively iterate on prompts and directly test different prompt versions and models.
- Evaluation (Evals): Supports automated and manual quality measurements of LLM output, including user feedback collection, custom evaluation pipelines, and comparison against ground truth datasets.
- Usage and Cost Tracking: Monitors token usage, request latency, and associated costs per user or application type, with APIs for downstream analytics, billing, and rate-limiting.
- Error Logging and Monitoring: Pinpoints errors and bottlenecks in LLM workflows and labels outputs such as refusals or hallucinations for improved debugging.
- Customizable and Extensible: Offers an API-first architecture with SDKs for Python and JavaScript, integrations with popular frameworks like LangChain and LlamaIndex, and OpenTelemetry compatibility.
- Data Export and Compliance: Supports data exports to blob storage and complies with standards such as ISO27001, SOC2, GDPR, and HIPAA, making it suitable for enterprise deployments.
Metal LB
Description
This component ensures that the solution is connected to the network layer of the environment. Pods distributed across each of the solution's worker nodes ensure that Kubernetes responds to an address or network range.
Warning: MetalLB and Calico can, over time, overlap in definition and functionality. It is important to choose the implementation based on your use case.
Uses and Functionalities
Often described as an on-prem alternative to cloud providers' load balancers, this component allows a persistent IP address for Kubernetes Ingress load balancers. A network address is carried by a host (often a node); when that host goes down, the IP address is transferred to another host, avoiding a network outage.
In addition, it can be deployed according to tags at the node level, making it possible to define nodes that can communicate externally while others (which do not have the defined tag) remain dedicated to computation or other activities.
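As a sketch, assigning an address range and announcing it only from tagged nodes could look like this (the address range and node label are illustrative):

```yaml
# Illustrative pool of addresses MetalLB may hand to LoadBalancer Services.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.100-192.168.10.120
---
# Announce the pool over layer 2, only from nodes carrying the label.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
  nodeSelectors:
    - matchLabels:
        network-role: edge   # hypothetical label marking externally-facing nodes
```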
Milvus
Description
Milvus is an open-source, high-performance vector database designed specifically for handling and searching large-scale vector data, crucial for AI and data-driven applications. It provides efficient management, similarity search, and retrieval of complex data representations such as embeddings from text, images, videos, and other unstructured data. Milvus's architecture supports scalability and elasticity, enabling it to handle billions or even trillions of vectors across distributed environments.
Uses and Functionalities
- Efficient vector similarity search using state-of-the-art algorithms like IVF and HNSW, including FAISS-based indexes, critical for matching and ranking in AI applications.
- GPU acceleration for accelerated indexing and search operations, enabling real-time processing of massive datasets.
- Support for multi-modal data types including dense, sparse, and binary vectors, along with advanced data types like JSON and arrays.
- Hybrid search capabilities combining semantic and keyword-based queries with metadata filtering for refined results.
- Scalable and distributed architecture allowing horizontal scaling, load balancing, and high availability.
- Integration with AI frameworks and embedding models facilitating seamless embedding generation and reranking.
- Full-text search support alongside vector searches for comprehensive data retrieval.
- Strong data management features including partitioning, clustering keys, and multi-tenancy support for secure resource isolation.
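At its core, vector similarity search ranks stored embeddings by closeness to a query embedding. A toy sketch in plain Python (hand-made three-dimensional vectors; Milvus does this over billions of vectors using ANN indexes such as IVF and HNSW):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Tiny made-up "collection" of document embeddings.
collection = {
    "doc-cat": [0.9, 0.1, 0.0],
    "doc-dog": [0.8, 0.2, 0.1],
    "doc-car": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # embedding of the search text

# Rank documents by similarity to the query (brute force here;
# a vector database uses indexes to avoid scanning everything).
ranked = sorted(collection, key=lambda k: cosine(query, collection[k]),
                reverse=True)
print(ranked[0])  # → doc-cat
```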
Minio
Description
Divided into two modules (Operator and Tenant), MinIO introduces the concept of S3 storage. The Operator allows you to create and manage different MinIO instances. A MinIO Tenant is a server-like instance (storage + API) that allows you to present buckets (using AWS, GCP, and Azure methods). Access policies are available for each user, as well as encryption and replication methods.
Uses and Functionalities
- Cloud-Native and Kubernetes Native: Runs in lightweight containers and integrates seamlessly with Kubernetes for easy deployment, scaling, and multi-tenancy management.
- High Performance and Scalability: Optimized to deliver low-latency and high-throughput data access, capable of scaling from single nodes to multi-petabyte clusters to support heavy AI/ML workloads.
- S3 API Compatibility: Supports the full Amazon S3 API, making it a drop-in replacement for AWS S3, ideal for hybrid, multi-cloud, or on-premise environments.
- Distributed Architecture: Supports distributed deployments that improve fault tolerance, performance, and reliability.
- Data Protection: Features erasure coding, encryption both in-transit and at-rest, and object locking (WORM) to ensure data integrity, security, and compliance.
- Metadata-less Design: Stores metadata with the data objects, eliminating the need for a separate metadata database and enhancing speed and scalability.
- AI/ML Optimized Storage: Provides a data lake foundation for AI projects, supports parallel data access for faster AI model training and inference, and enables seamless integration with AI/ML frameworks like Kubeflow.
- Event Notifications and Integration: Supports event-driven workflows and can integrate with databases and backup systems for AI data pipeline automation.
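The erasure-coding idea behind the data-protection features can be sketched with the simplest possible scheme, single-parity XOR (MinIO actually uses Reed-Solomon coding across many drives; this toy tolerates only one lost shard):

```python
# Toy erasure-coding sketch: two data shards plus one XOR parity shard,
# so any single lost shard can be rebuilt from the other two.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

shard1 = b"hello wo"          # written to disk 1
shard2 = b"rld!!!!!"          # written to disk 2 (same length as shard1)
parity = xor_bytes(shard1, shard2)  # written to disk 3

# Disk 1 fails -- rebuild its shard from the survivors.
recovered = xor_bytes(parity, shard2)
print(recovered.decode())  # → hello wo
```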
Monitoring
Description
Comprising three monitoring modules, Grafana, Prometheus, and Loki, this module allows you to monitor the load and consumption of platform components.
- Loki: log viewing tool
- Prometheus: tool for collecting application consumption metrics
- Grafana: dashboard modeling tool for simplified monitoring
Uses and Functionalities
Loki
- Centralized log collection and storage
- Log querying and visualization in Grafana
- Correlation of logs with metrics and traces
- Lightweight logging for Kubernetes and cloud-native environments
Prometheus
- Monitoring system performance (CPU, memory, disk, etc.)
- Application metrics collection (e.g., request latency, error rates)
- Alerting based on metric thresholds
- Time-series data storage and querying
Grafana
- Real-Time Monitoring Dashboards
- Alerting System
- Custom Reporting
- Kubernetes Monitoring
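Alerting on metric thresholds is configured as Prometheus rules; a sketch of a CPU alert (the metric name and threshold are typical but illustrative):

```yaml
groups:
  - name: platform-alerts
    rules:
      - alert: HighCpuUsage
        # fraction of non-idle CPU time per instance over the last 5 minutes
        expr: avg by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU above 90% on {{ $labels.instance }}"
```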
NGINX Ingress Controller
Description
Ingress NGINX is a Kubernetes Ingress Controller that uses NGINX as a reverse proxy and load balancer to manage external access to services running inside a Kubernetes cluster.
It implements the Ingress API to route HTTP and HTTPS traffic to the appropriate backend services based on rules defined in Ingress resources.
Uses and Functionalities
- Routing external traffic
- SSL/TLS termination
- Load Balancing
- Path-Based and Host-Based routing
- Redirection and traffic rules
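A sketch of host- and path-based routing with TLS termination (hostnames, Secret, and Service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: portal
spec:
  ingressClassName: nginx          # handled by the NGINX Ingress Controller
  tls:
    - hosts:
        - portal.example.internal
      secretName: portal-tls       # TLS key pair, e.g. issued by cert-manager
  rules:
    - host: portal.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: portal-svc   # hypothetical backend Service
                port:
                  number: 80
```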
Cloud Native PG & pgAdmin
Description
Split into two applications (CNPG and pgAdmin), it provides a secure and easy-to-use PostgreSQL environment. Each PostgreSQL cluster is secured by the PostgreSQL clustering application, which is simple and completely transparent. CNPG is developed and maintained by the PostgreSQL community, like some other solutions. We equipped the environment with a replicated 3-node PostgreSQL cluster as an example. pgAdmin is the essential PostgreSQL administration/development tool, allowing you to view and debug SQL queries, stored procedures, or other modeling done within a PostgreSQL instance.
Uses and Functionalities
- Visual database management: Allows users to create, modify, and delete databases, tables, indexes, and other database objects.
- Query tool: Provides an SQL editor with syntax highlighting, autocomplete, and query execution features for running and debugging SQL queries.
- Data browsing and editing: Enables viewing, filtering, and editing table data directly from the interface.
- Backup and restore: Supports database backup and restore operations to safeguard data.
- User and permission management: Facilitates managing database users and roles, including granting and revoking permissions.
- Monitoring: Offers dashboards and statistics to monitor database performance and server activity.
- Support for multiple PostgreSQL versions: Compatible with various PostgreSQL releases, ensuring flexibility.
- Extensible with plugins: Supports extensions to add more features.
- Backup job scheduling: Automates routine backup tasks through job scheduling in pgAgent.
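The replicated 3-node example cluster mentioned above can be expressed declaratively; a minimal CNPG Cluster sketch (the name and storage size are illustrative):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-example
spec:
  instances: 3        # one primary plus two streaming replicas
  storage:
    size: 10Gi
```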
CoreAI Portal
The BSQAI Platform Portal unifies and facilitates configuration of the platform through a simple and intuitive interface.
Description
The CoreAI Portal is a web-based platform that provides access to artificial intelligence capabilities through a user-friendly interface.
The portal serves as a centralized access point for various AI functionalities provided by our AI Platform, designed to be accessible to users regardless of their technical background.
This documentation will guide you through the portal's features and capabilities and how to use them.
Uses and Functionalities
- GenAI Chatbot: Interact with Generative AI models through familiar conversational interface to ask questions, request analysis, or obtain assistance with various tasks
- Library: Upload and manage documents, images, and other files that can be processed and analyzed by the AI Platform components
- Knowledge Base: Organize and maintain the vectorized collections that the AI Platform uses to provide accurate and relevant responses in the Gen AI Chatbot interface
- Models Management: Access different AI models optimized for specific tasks such as text generation, data analysis, or document processing
- Model Repository: Own and control the models you want to use within your organization
- Applications and Services: Monitor the operational status of platform applications and services.
- Audit: Review logs of all actions performed within the portal by its users for accountability and compliance purposes
- Settings: Customize portal settings and preferences to meet organizational requirements and user needs, modify identity and access management settings.
Reflector
Description
Reflector is a custom Kubernetes controller designed to replicate secrets, configmaps, and certificates across namespaces within a Kubernetes cluster. It monitors source resources for changes and automatically reflects those changes to mirror resources in the same or different namespaces. This functionality ensures consistency and simplifies management of configuration data and sensitive information across multiple environments. Reflector supports automatic mirror creation, allows fine-grained control over which namespaces can access mirrored resources, and integrates with cert-manager for seamless certificate mirroring.
Uses and Functionalities
- Replicates secrets and configmaps across namespaces, ensuring synchronization.
- Supports automatic creation and deletion of mirror resources based on source resources.
- Enables fine control through annotations specifying allowed namespaces for reflection and automatic mirroring.
- Integrates with cert-manager, reflecting certificate secrets automatically using annotations.
- Compatible with multiple CPU architectures (amd64, arm, arm64) and deployable via Helm or manual Kubernetes manifests.
- Tracks reflected versions with annotations to manage synchronization state.
- Can be deployed with RBAC, service accounts, and health probes for production readiness.
- Skips conflicting resources in mirror namespaces and logs warnings to prevent disruptions.
- Customizable through Helm values for image versions, resource limits, and node placement.
- Enhances security by allowing namespace restrictions on reflection and auto-mirroring operations.
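Reflection is driven by annotations on the source resource; a sketch (namespaces and names are illustrative, and the annotation keys assume the emberstack Reflector convention):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: shared-tls
  namespace: platform
  annotations:
    reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
    reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "team-a,team-b"
    reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"  # create mirrors automatically
type: kubernetes.io/tls
data:
  tls.crt: ""   # base64-encoded certificate elided
  tls.key: ""   # base64-encoded key elided
```

With these annotations, mirrors of the Secret are created and kept in sync in the allowed namespaces only.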
Sealed Secrets
Description
Sealed Secrets is an open-source Kubernetes tool that encrypts sensitive data for secure storage in Git. It leverages public-key cryptography to ensure that secrets are only decrypted within the target Kubernetes cluster, enhancing security in CI/CD and GitOps pipelines. This functionality allows teams to adopt good secret management practices by keeping sensitive information encrypted while maintaining seamless deployment automation and operational efficiency. Sealed Secrets helps prevent accidental secret disclosure and supports controlled, flexible management across namespaces and clusters.
Uses and Functionalities
- Encrypts Kubernetes Secrets using asymmetric cryptography, ensuring secrets are stored in an encrypted form.
- Allows storing encrypted secrets safely in version control systems like Git, preventing exposure of sensitive information.
- Uses a public key to seal (encrypt) secrets outside the cluster and a private key within the cluster to decrypt them.
- Implements a Kubernetes Custom Resource Definition (CRD) called SealedSecret to hold the encrypted secrets.
- The Sealed Secrets controller runs inside the Kubernetes cluster and automatically decrypts SealedSecrets into standard Kubernetes Secrets.
- Supports different scopes for decryption: strict (name and namespace-bound), namespace-wide (allows renaming in the same namespace), and cluster-wide (decrypted anywhere in the cluster).
- Enables GitOps workflows by allowing encrypted secrets to live alongside other deployment manifests securely.
- Ensures that even if the encrypted secrets are exposed in public or shared repositories, only the designated Kubernetes cluster can decrypt them.
- Provides a CLI tool, kubeseal, for encrypting secrets offline using the cluster's public key.
- Enhances security by eliminating the risk of base64-encoded secrets being easily decoded by unauthorized users.
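A SealedSecret produced by the kubeseal CLI is safe to commit; its shape looks roughly like this (the ciphertext is a placeholder — real values are generated by sealing against the cluster's public key):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: apps
spec:
  encryptedData:
    password: AgBy8hCi...   # placeholder ciphertext; only the in-cluster controller can decrypt it
  template:
    metadata:
      name: db-credentials
      namespace: apps
```

The controller decrypts this into a standard Secret named `db-credentials` in the `apps` namespace.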
Superset
Description
Superset is a modern data exploration and data visualization platform. Superset can replace or augment proprietary business intelligence tools for many teams. Superset integrates well with a variety of data sources.
Uses and Functionalities
- No-code interface for building charts quickly
- Powerful, web-based SQL Editor for advanced querying
- Lightweight semantic layer for quickly defining custom dimensions and metrics
- Out of the box support for nearly any SQL database or data engine
- Wide array of beautiful visualizations to showcase your data, ranging from simple bar charts to geospatial visualizations
- Lightweight, configurable caching layer to help ease database load
- Highly extensible security roles and authentication options
- API for programmatic customization
- Cloud-native architecture designed from the ground up for scale