Node Taints and Affinities

A Kubernetes cluster usually consists of multiple nodes with different hardware capabilities. The most specialized nodes are typically GPU (Graphics Processing Unit) nodes; other possible configurations include nodes with higher storage capacity or more memory.

These hardware-specific nodes should be used only by the components for which they are intended. The GPU nodes mentioned above should run the LLM models they were provisioned for; another example is a node with high storage capacity used by the Minio component.

Workload specification

Nodes should be marked for a specific type of workload. This is done with a label, which components then match as a required node feature: the workload-type label specifies which components may use the node.

To ensure the nodes are used only by the components for which they are intended, Kubernetes provides mechanisms that force pods to be scheduled on the specific hardware nodes, or keep other pods off them. There are two configuration options:

  • Node taints with effects, and matching tolerations
  • Node affinity labels

Taints and affinity labels should be set on the nodes before deployment. Since the deployment schedules all pods at start-up, scheduling is immediately affected by the node taints and affinity labels.

The node label

There can be multiple workload types, depending on the node hardware and on component requirements.

The node label attracts the components with matching requirements to that node.

The main workload types are:

  • gpu
  • storage
  • high-memory

The label is set on a node with the following command:

$ kubectl label node <node-name> workload=gpu

Components with GPU requirements will then be scheduled on nodes carrying this label.
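As a sketch of how a component targets the labelled node, a pod spec can require the `workload=gpu` label via node affinity (the pod and image names below are illustrative, not actual CoreAI component names):

```yaml
# Illustrative pod spec: schedule only on nodes labelled workload=gpu
apiVersion: v1
kind: Pod
metadata:
  name: llm-server          # hypothetical component name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload
                operator: In
                values:
                  - gpu
  containers:
    - name: llm
      image: example/llm-server:latest   # placeholder image
```

The `requiredDuringSchedulingIgnoredDuringExecution` rule is a hard constraint: the pod stays pending if no node with the label exists.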

The node taint with effect

Similar to node labels are node taints, which prevent other components from being scheduled on a node with specific hardware capabilities.

This addresses the problem of regular pods being scheduled on a specialized hardware node: such pods consume shared resources like CPU and memory, with the result that the special hardware resource (like the GPU) is free while CPU and memory are insufficient.

A node taint with an effect prevents pods that do not have a matching toleration from being scheduled on the node. The CoreAI platform uses two taint effects:

  • PreferNoSchedule
  • NoSchedule

The node taint is set for a specific workload type together with its effect. An example command to taint a node:

$ kubectl taint node <node-name> nvidia.com/gpu=present:PreferNoSchedule

The components that require a GPU tolerate both taint effects.
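A minimal sketch of what "tolerating both effects" looks like in a pod spec, matching the `nvidia.com/gpu=present` taint from the command above:

```yaml
# Toleration block for a GPU component: matches the taint
# nvidia.com/gpu=present under either effect
tolerations:
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: PreferNoSchedule
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: NoSchedule
```

Listing both effects lets the same component manifest work whether the cluster was tainted in strict or non-strict mode.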

PreferNoSchedule

This is a soft rule that makes the Kubernetes scheduler place pods on other nodes. The tainted node is used only if there are no other nodes available for scheduling.

NoSchedule

This is a hard rule that prevents the scheduler from placing pods on this node unless they have a toleration for this effect.

Pre-Deployment configuration

The node workload-type labels and the taints with effects must be applied before deploying the CoreAI Platform.

Strict mode

The Kubernetes scheduler's node selection can be configured to be strict, using hard rules, or less strict, using soft rules.
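For affinity, the hard/soft distinction maps to the two standard node-affinity fields. A sketch of both in one pod spec (the `workload=storage` selector is just an example):

```yaml
# Hard vs soft affinity rules in a pod spec
affinity:
  nodeAffinity:
    # Hard rule: the pod will not be scheduled anywhere else
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: workload
              operator: In
              values: ["storage"]
    # Soft rule: the scheduler prefers matching nodes but may fall back
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: workload
              operator: In
              values: ["storage"]
```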

Strict Scheduler

If the Kubernetes scheduler should be strict when selecting the node for a pod, use the taint effect NoSchedule. In addition, the option enforce_affinity = true has to be specified in the tfvars secrets.

Non-Strict Scheduler

If the Kubernetes scheduler should be less strict in node selection, use the taint effect PreferNoSchedule and specify the option enforce_affinity = false in the tfvars secrets file.
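The two modes side by side in the tfvars secrets file (only the enforce_affinity option is named in this guide; pair it with the matching taint effect on the nodes):

```hcl
# Strict mode: pair with NoSchedule taints on the nodes
enforce_affinity = true

# Non-strict mode: pair with PreferNoSchedule taints instead
# enforce_affinity = false
```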

Workload types

The following workload types for hardware-specific nodes are currently supported:

- gpu

This type is for running LLMs (Large Language Models) or indexing large amounts of documents. Use these commands for GPU nodes:

$ kubectl label node <node-name> workload=gpu
$ kubectl taint node <node-name> nvidia.com/gpu=present:PreferNoSchedule

- storage

The storage type is for components with higher storage requirements, such as the object storage.

$ kubectl label node <node-name> workload=storage
$ kubectl taint node <node-name> dedicated=storage:PreferNoSchedule

- high-memory

The high-memory type is for components with higher memory requirements, such as the CNPG database.

$ kubectl label node <node-name> workload=high-memory
$ kubectl taint node <node-name> dedicated=high-memory:PreferNoSchedule

- TBD

Further workload types, such as high-CPU-performance nodes, are to be defined later.

$ kubectl label node <node-name> workload=other
$ kubectl taint node <node-name> other-hw=present:PreferNoSchedule
