CLI Tools
Install these on the machine you will run Helm from:

| Tool | Minimum version | Install |
|---|---|---|
| kubectl | 1.28+ | kubernetes.io |
| helm | 3.12+ | helm.sh |
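
As a quick sanity check, the minimum versions above can be compared against whatever the installed tools report. The following is a minimal sketch, not part of the product: the helper names `parse_version` and `meets_minimum` are hypothetical, and the input is assumed to be the text printed by `kubectl version --client` or `helm version --short`.

```python
import re

# Minimum (major, minor) versions taken from the table above.
MINIMUMS = {"kubectl": (1, 28), "helm": (3, 12)}


def parse_version(text):
    """Extract (major, minor) from strings such as 'v1.29.3' or 'v3.14.0+g3fc9f4b'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(tool, reported):
    """Return True if the reported version is at or above the documented minimum."""
    return parse_version(reported) >= MINIMUMS[tool]
```

For example, `meets_minimum("helm", "v3.11.2")` is false because 3.11 is below the 3.12 minimum.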
Cluster Requirements
Kubernetes
| Property | Requirement |
|---|---|
| Kubernetes version | 1.28+ |
| CNI | Any (Calico, Cilium, Flannel, etc.) |
| Ingress controller | ingress-nginx (IngressClass nginx) |
| Storage | A default StorageClass with ReadWriteOnce support |
OpenShift
| Property | Requirement |
|---|---|
| OpenShift version | 4.12+ |
| Ingress | OpenShift Router (built-in) |
| Storage | Default StorageClass with ReadWriteOnce support |
| SCC | anyuid SCC for pods that need it (PostgreSQL, MinIO) |
Node Sizing
Without GPU inference (vLLM disabled)
| Component | CPU request | Memory request | Storage |
|---|---|---|---|
| backend | 250m | 512Mi | — |
| frontend | 100m | 256Mi | — |
| dashboard-connect | 100m | 256Mi | — |
| postgresql | 250m | 512Mi | 10 Gi PVC |
| qdrant | 500m | 1Gi | 10 Gi PVC |
| minio | 250m | 512Mi | 100 Gi PVC |
| otel-lgtm | 500m | 1Gi | 25 Gi PVC total |
| Total | ~2 vCPU | ~5 Gi | ~145 Gi |
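
When planning node capacity, it can help to sum the per-component requests directly. The sketch below copies the figures from the table above and totals them; the helper names are hypothetical, and only the `m`, `Mi`, and `Gi` suffixes used in this table are handled.

```python
# Requests copied from the table above: (CPU, memory, PVC size in Gi).
REQUESTS = {
    "backend": ("250m", "512Mi", 0),
    "frontend": ("100m", "256Mi", 0),
    "dashboard-connect": ("100m", "256Mi", 0),
    "postgresql": ("250m", "512Mi", 10),
    "qdrant": ("500m", "1Gi", 10),
    "minio": ("250m", "512Mi", 100),
    "otel-lgtm": ("500m", "1Gi", 25),
}


def cpu_millicores(quantity):
    """Convert a CPU quantity such as '250m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_mib(quantity):
    """Convert a memory quantity with an Mi or Gi suffix to MiB."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity!r}")


def totals(requests):
    """Return (CPU millicores, memory MiB, PVC Gi) summed over all components."""
    cpu = sum(cpu_millicores(c) for c, _, _ in requests.values())
    mem = sum(memory_mib(m) for _, m, _ in requests.values())
    disk = sum(d for _, _, d in requests.values())
    return cpu, mem, disk
```

For the figures above, `totals(REQUESTS)` returns 1950 millicores, 4096 MiB, and 145 Gi of PVC storage. The table's rounded totals (~2 vCPU, ~5 Gi) sit at or above these summed requests.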
With GPU inference (vLLM enabled)
The vLLM pod must be scheduled on a GPU node. The recommended model (Qwen3.5-9B-AWQ) requires:

| Resource | Minimum | Recommended |
|---|---|---|
| GPU | 1× NVIDIA GPU with 16 Gi VRAM | 1× A10G 24 Gi (e.g. g5.2xlarge or on-prem equivalent) |
| CPU | 4 vCPU | 8 vCPU |
| RAM | 20 Gi | 28 Gi |
| Disk (model weights) | 30 Gi | 80 Gi PVC |
| NVIDIA driver | 525+ | 535+ |
| CUDA | 11.8+ | 12.x |
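
One way to confirm that a node advertises GPUs to the scheduler is to look for the `nvidia.com/gpu` resource under each node's allocatable resources. The sketch below assumes its input is the parsed JSON from `kubectl get nodes -o json`; the function name `gpu_nodes` is hypothetical.

```python
def gpu_nodes(node_list):
    """Map node name -> allocatable nvidia.com/gpu count, for nodes that expose any."""
    found = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports extended resources as strings, e.g. "1".
        count = int(allocatable.get("nvidia.com/gpu", "0"))
        if count > 0:
            found[node["metadata"]["name"]] = count
    return found
```

An empty result suggests that no node currently exposes a GPU, in which case the vLLM pod would stay unscheduled.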
The GPU node must run the NVIDIA device plugin DaemonSet so that nvidia.com/gpu is visible as a schedulable resource. See GPU Setup.
Image Registry Access
All Cobi application images (hellocobi/*) are hosted on Docker Hub as private images. You need:
- Docker Hub credentials with pull access to the hellocobi organization.
- A Kubernetes Secret of type kubernetes.io/dockerconfigjson in the target namespace.
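
For illustration, a Secret of this type can be generated as follows. This is a sketch under stated assumptions: the function name `docker_pull_secret` and any secret name, namespace, and credentials passed to it are placeholders, and the registry key is the Docker Hub address that `kubectl create secret docker-registry` uses by default.

```python
import base64
import json


def docker_pull_secret(name, namespace, username, token,
                       registry="https://index.docker.io/v1/"):
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a dict."""
    # The "auth" field is base64("username:token").
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    config = {"auths": {registry: {"username": username,
                                   "password": token,
                                   "auth": auth}}}
    # Secret data values must themselves be base64-encoded.
    encoded = base64.b64encode(json.dumps(config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }
```

Dumping the result with `json.dumps` gives a manifest that can be passed to `kubectl apply -f`, since kubectl accepts JSON as well as YAML.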
Hugging Face Token
vLLM downloads model weights from huggingface.co at startup. You need a Hugging Face account and an access token with read access to the model repository:
- Create a token with read scope.
- Pass it via vllmstack.servingEngineSpec.modelSpec[0].hf_token in your values file.
- For air-gapped clusters, pre-download the model weights and serve them from a local cache volume.
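
The dotted path above describes a nested structure in which modelSpec is a list and hf_token is a key on its first entry. As a rough sketch, assuming the values file has already been loaded into a Python dict (for example with a YAML parser), a hypothetical helper `with_hf_token` could set the token while leaving the other fields of that entry intact.

```python
import copy


def with_hf_token(values, token):
    """Return a copy of values with modelSpec[0].hf_token set.

    Resulting shape:
      vllmstack -> servingEngineSpec -> modelSpec (list) -> [0] -> hf_token
    """
    updated = copy.deepcopy(values)
    spec = updated.setdefault("vllmstack", {}).setdefault("servingEngineSpec", {})
    models = spec.setdefault("modelSpec", [])
    if not models:
        # No model entry yet: create one so index 0 exists.
        models.append({})
    models[0]["hf_token"] = token
    return updated
```

Keeping the token in the same list entry as the rest of the model definition matters because Helm replaces lists wholesale when it merges values, rather than merging them item by item.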
Persistent Storage
All stateful components use ReadWriteOnce PersistentVolumeClaims. The default StorageClass is used unless you set storageClass in a component’s values.
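
To check which StorageClass a cluster treats as its default, one can look for the standard `storageclass.kubernetes.io/is-default-class` annotation. The sketch below assumes its input is the parsed JSON from `kubectl get storageclass -o json`; the function name `default_storage_classes` is hypothetical.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_classes(sc_list):
    """Return the names of StorageClasses annotated as the cluster default."""
    names = []
    for sc in sc_list.get("items", []):
        annotations = sc.get("metadata", {}).get("annotations") or {}
        if annotations.get(DEFAULT_ANNOTATION) == "true":
            names.append(sc["metadata"]["name"])
    return names
```

Exactly one name is the expected result. An empty list means PVCs without an explicit storageClass would have no default class to bind through.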
For on-premises clusters without a cloud storage provisioner, common options are:
| Provisioner | Notes |
|---|---|
| rancher.io/local-path | Single-node dev/staging; data is local to the node |
| nfs.csi.k8s.io | Multi-node HA; requires an NFS server |
| OpenEBS | Block storage for bare-metal clusters |
| Longhorn | Distributed block storage for bare-metal clusters |