CLI Tools

Install these on the machine you will run Helm from:
| Tool | Minimum version | Install |
| --- | --- | --- |
| kubectl | 1.28+ | kubernetes.io |
| helm | 3.12+ | helm.sh |
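
To confirm the installed tools meet these minimums, a quick check along the following lines can help. It is a sketch, not part of the product: `version_ge` and `check_tool` are hypothetical helper names, the comparison assumes GNU `sort -V`, and the `sed` expression assumes the usual pretty-printed JSON from `kubectl version --client -o json`.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (assumes GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME MINIMUM FOUND: print a one-line verdict, return 1 if too old.
check_tool() {
  if version_ge "$3" "$2"; then
    echo "$1 $3 OK (>= $2)"
  else
    echo "$1 $3 is older than required $2"
    return 1
  fi
}

# Only query tools that are actually installed on this machine.
if command -v kubectl >/dev/null 2>&1; then
  kubectl_v="$(kubectl version --client -o json \
    | sed -n 's/.*"gitVersion": *"v\([^"]*\)".*/\1/p' | head -n 1)"
  check_tool kubectl 1.28 "$kubectl_v"
fi

if command -v helm >/dev/null 2>&1; then
  helm_v="$(helm version --template '{{.Version}}')"
  check_tool helm 3.12 "${helm_v#v}"   # strip the leading "v"
fi
```

Vendor suffixes in the version string (for example a distribution-specific build tag) may need extra handling before comparison.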

Cluster Requirements

Kubernetes

| Property | Requirement |
| --- | --- |
| Kubernetes version | 1.28+ |
| CNI | Any (Calico, Cilium, Flannel, etc.) |
| Ingress controller | ingress-nginx (IngressClass nginx) |
| Storage | A default StorageClass with ReadWriteOnce support |
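
The storage and ingress requirements can be checked before installing. The sketch below assumes `kubectl` already points at the target cluster; `default_storage_classes` is a hypothetical helper that relies on `kubectl get storageclass` marking the default class with `(default)` in its human-readable output.

```shell
#!/usr/bin/env bash
# default_storage_classes: read `kubectl get storageclass` output on stdin
# and print the name of every class flagged "(default)".
default_storage_classes() {
  awk '$2 == "(default)" { print $1 }'
}

if command -v kubectl >/dev/null 2>&1; then
  # Expect exactly one name here; empty output means no default StorageClass.
  kubectl get storageclass | default_storage_classes

  # Fails with a NotFound error if the nginx IngressClass is missing.
  kubectl get ingressclass nginx
fi
```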

OpenShift

| Property | Requirement |
| --- | --- |
| OpenShift version | 4.12+ |
| Ingress | OpenShift Router (built-in) |
| Storage | Default StorageClass with ReadWriteOnce support |
| SCC | anyuid SCC for pods that need it (PostgreSQL, MinIO) |

Node Sizing

Without GPU inference (vLLM disabled)

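| Component | CPU request | Memory request | Storage |
| --- | --- | --- | --- |
| backend | 250m | 512Mi | |
| frontend | 100m | 256Mi | |
| dashboard-connect | 100m | 256Mi | |
| postgresql | 250m | 512Mi | 10 Gi PVC |
| qdrant | 500m | 1Gi | 10 Gi PVC |
| minio | 250m | 512Mi | 100 Gi PVC |
| otel-lgtm | 500m | 1Gi | 25 Gi PVC total |
| Total | ~2 vCPU | ~4 Gi | ~145 Gi |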
A single node with 4 vCPU / 8 Gi RAM and 200 Gi available disk is sufficient for a minimal deployment.
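
For reference, the per-component requests listed above can be summed mechanically. The sketch below copies the rows by hand (1Gi written as 1024 Mi, 0 for components without a PVC); `sum_requests` is a hypothetical helper, not something shipped with the chart.

```shell
#!/usr/bin/env bash
# Columns: component, CPU request (millicores), memory request (Mi), PVC size (Gi).
requests='backend 250 512 0
frontend 100 256 0
dashboard-connect 100 256 0
postgresql 250 512 10
qdrant 500 1024 10
minio 250 512 100
otel-lgtm 500 1024 25'

# sum_requests: add up each numeric column from stdin.
sum_requests() {
  awk '{ cpu += $2; mem += $3; disk += $4 }
       END { printf "cpu=%dm memory=%dMi storage=%dGi\n", cpu, mem, disk }'
}

printf '%s\n' "$requests" | sum_requests
# prints: cpu=1950m memory=4096Mi storage=145Gi
```

These are requests only; the gap between them and the suggested 4 vCPU / 8 Gi node leaves room for system pods and for usage above the requested amounts.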

With GPU inference (vLLM enabled)

The vLLM pod must be scheduled on a GPU node. The recommended model (Qwen3.5-9B-AWQ) requires:
| Resource | Minimum | Recommended |
| --- | --- | --- |
| GPU | 1× NVIDIA GPU with 16 Gi VRAM | 1× A10G 24 Gi (e.g. g5.2xlarge or on-prem equivalent) |
| CPU | 4 vCPU | 8 vCPU |
| RAM | 20 Gi | 28 Gi |
| Disk (model weights) | 30 Gi | 80 Gi PVC |
| NVIDIA driver | 525+ | 535+ |
| CUDA | 11.8+ | 12.x |
The GPU node must run the NVIDIA device plugin DaemonSet so nvidia.com/gpu is visible as a schedulable resource. See GPU Setup.
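
One way to confirm that the device plugin has registered the resource is to list each node's allocatable nvidia.com/gpu count. This is a sketch: `gpu_nodes` is a hypothetical filter, and nodes without the resource are expected to show `<none>` in the custom-columns output.

```shell
#!/usr/bin/env bash
# gpu_nodes: read "NODE GPU" lines on stdin and keep nodes with at least one GPU.
gpu_nodes() {
  awk '$2 + 0 > 0 { print $1 " " $2 }'
}

if command -v kubectl >/dev/null 2>&1; then
  # The dot inside the resource name must be escaped in the column expression.
  kubectl get nodes --no-headers \
    -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
    | gpu_nodes
fi
```

Empty output suggests the device plugin DaemonSet is not running on the GPU node, or that the driver is not loaded there.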

Image Registry Access

All Cobi application images (hellocobi/*) are hosted on Docker Hub as private images. You need:
  1. Docker Hub credentials with pull access to the hellocobi organization.
  2. A Kubernetes Secret of type kubernetes.io/dockerconfigjson in the target namespace.
kubectl create secret docker-registry dockerhub-secret \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<username> \
  --docker-password=<password-or-token> \
  --docker-email=<email> \
  --namespace <your-namespace>
Reference it in your values file:
global:
  imagePullSecrets:
    - dockerhub-secret
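
Before installing, it can be worth confirming that the Secret exists in the target namespace and has the expected type. The helper below is a sketch; `check_pull_secret` is a hypothetical name, and it defaults to the `dockerhub-secret` name used above.

```shell
#!/usr/bin/env bash
# check_pull_secret NAMESPACE [SECRET_NAME]
# Succeeds only if the Secret exists and is of type kubernetes.io/dockerconfigjson.
check_pull_secret() {
  local ns="$1" name="${2:-dockerhub-secret}" type
  type="$(kubectl get secret "$name" --namespace "$ns" -o jsonpath='{.type}')" || return 1
  [ "$type" = "kubernetes.io/dockerconfigjson" ]
}

# Example (replace the placeholder namespace with your own):
#   check_pull_secret <your-namespace> && echo "pull secret looks fine"
```

This only checks the Secret's type, not whether the credentials are valid; an ImagePullBackOff on the application pods is the usual sign of bad credentials.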

Hugging Face Token

vLLM downloads model weights from huggingface.co at startup. You need a Hugging Face account and an access token with read access to the model repository:
  • Create a token with read scope.
  • Pass it via vllmstack.servingEngineSpec.modelSpec[0].hf_token in your values file.
  • For air-gapped clusters, pre-download the model weights and serve them from a local cache volume.

Persistent Storage

All stateful components use ReadWriteOnce PersistentVolumeClaims. Each component uses the cluster's default StorageClass unless you set storageClass in that component's values. For on-premises clusters without a cloud storage provisioner, common options are:
| Provisioner | Notes |
| --- | --- |
| rancher.io/local-path | Single-node dev/staging; data is local to the node |
| nfs.csi.k8s.io | Multi-node HA; requires an NFS server |
| OpenEBS | Block storage for bare-metal clusters |
| Longhorn | Distributed block storage for bare-metal clusters |
Set the storage class for each stateful component:
postgresql:
  primary:
    persistence:
      storageClass: "local-path"

qdrant:
  persistence:
    storageClass: "local-path"

minio:
  persistence:
    storageClass: "local-path"
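
After installing, the claims can be checked for binding. The sketch below assumes the default column order of `kubectl get pvc` (name first, status second); `unbound_pvcs` is a hypothetical filter and the namespace is a placeholder read from a `NAMESPACE` environment variable.

```shell
#!/usr/bin/env bash
# unbound_pvcs: read `kubectl get pvc --no-headers` output on stdin
# and print every claim whose STATUS column is not "Bound".
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 " " $2 }'
}

if command -v kubectl >/dev/null 2>&1 && [ -n "${NAMESPACE:-}" ]; then
  kubectl get pvc --namespace "$NAMESPACE" --no-headers | unbound_pvcs
fi
```

Empty output means every claim is bound. Storage classes that use the WaitForFirstConsumer binding mode keep a claim Pending until its pod is scheduled, so a short-lived Pending status is not necessarily a problem.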