KEDA — Kubernetes Event-Driven Autoscaling

Operational & Developer Reference Guide

What is KEDA?
Architecture & Components
KEDA Metrics Server vs Standard Metrics Server
KEDA vs HPA with Custom Metrics
Enabling KEDA on a Cluster
Customising the KEDA Helm Values
Prometheus Integration
Core CRDs
Scale-to-Zero Configuration
HPA-Backed Scale-Up Configuration
Authentication
Example Configurations
CPU-Based Scaling with KEDA
Time-Based Scaling — Scale to Zero on a Schedule
Cron + CPU: schedule vs load
Vertical Scaling — Pod Resource Requests (VPA + KEDA)
Operational Commands & Debugging
Known Constraints & Gotchas
When to Use KEDA vs Plain HPA

What is KEDA?

KEDA (Kubernetes Event-Driven Autoscaling) extends Kubernetes' native scaling capabilities to external event sources. Out of the box, the standard Horizontal Pod Autoscaler (HPA) scales on internal resource metrics such as CPU and memory; KEDA scales workloads on external signals: queue depth, stream lag, HTTP request rate, Prometheus queries, cloud service triggers, and more.

Key capabilities:

Scale to zero — drain replicas completely when idle, scale back up the moment events arrive
60+ built-in scalers — Kafka, RabbitMQ, Redis, AWS SQS, Azure Service Bus, Prometheus, PostgreSQL, cron, and more
HPA integration — KEDA does not replace HPA; it generates and manages HPA objects under the hood, feeding them event-driven metrics
Job scaling — spawn and clean up Kubernetes Jobs in response to events via ScaledJob
Proactive scaling — acts on queue depth or stream lag before CPU spikes, unlike reactive HPA polling

Note

KEDA is a CNCF graduated project and is production-grade. It must be the only installed external metrics adapter in the cluster.

Architecture & Components

External Event Source (e.g. Kafka, RabbitMQ, SQS)
        │
        ▼
  KEDA Scaler (monitors the event source)
        │
        ▼
  KEDA Metrics Adapter (exposes metrics at external.metrics.k8s.io)
        │
        ▼
  HPA Controller (reads metrics, decides replica count)
        │
        ▼
  Pods (scaled up / down / to zero)

When KEDA is installed by this platform, the following components run in the keda-system namespace:

Component	Role
`keda-operator`	Manages CRDs, controls 0↔1 scaling (activation), connects to event sources
`keda-operator-metrics-apiserver`	Implements the External Metrics API; serves event-source metrics to the HPA
`keda-admission-webhooks`	Validates `ScaledObject` / `ScaledJob` resources at admission time

KEDA dynamically creates and manages an HPA object for each ScaledObject. You do not manage that HPA directly.
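
To make the relationship concrete, the sketch below shows roughly what the generated HPA could look like for a ScaledObject such as the `kafka-consumer-scaler` example later in this guide. The `keda-hpa-` name prefix and the `scaledobject.keda.sh/name` selector follow KEDA's usual conventions; the exact metric name (`s0-kafka-my-topic` here) and the remaining values are assumptions that can differ between KEDA versions, so treat `kubectl get hpa -n <namespace> -o yaml` as the source of truth.

```yaml
# Illustrative only: an HPA as KEDA might generate it. Do not apply this by hand.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keda-hpa-kafka-consumer-scaler    # keda-hpa-<scaledobject-name>
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer
  minReplicas: 1                          # stays at 1 even when minReplicaCount is 0; the operator handles 0↔1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: s0-kafka-my-topic         # assumed name; generated per trigger
          selector:
            matchLabels:
              scaledobject.keda.sh/name: kafka-consumer-scaler
        target:
          type: AverageValue
          averageValue: "50"              # taken from the trigger's lagThreshold
```

Edits made directly to this object are reverted on the next reconcile, which is why tuning belongs on the ScaledObject.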

KEDA Metrics Server vs Standard Metrics Server

These are two distinct components serving different API paths — they coexist without conflict.

	Standard Metrics Server	KEDA Metrics Server
API group	`metrics.k8s.io/v1beta1`	`external.metrics.k8s.io/v1beta1`
Data source	Kubelet (node/pod CPU & memory)	External systems (queues, streams, APIs)
Used by	`kubectl top`, HPA CPU/memory scaling	HPA event-driven scaling via KEDA
Looks	Inward — inside the cluster	Outward — outside the cluster
Installed by	Kubernetes cluster (or `metrics-server` chart)	KEDA Helm install

Important

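KEDA's cpu and memory triggers do not go through KEDA's metrics adapter. They are translated into standard HPA resource metrics, which the HPA reads from the standard Metrics Server (metrics.k8s.io). You therefore still need the standard Metrics Server installed if you use resource-based triggers in KEDA.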

Important

Only one implementor of external.metrics.k8s.io is permitted per cluster. Running another custom adapter alongside KEDA will break metric resolution.
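
The single-implementor rule follows from how the Kubernetes aggregation layer works: each API group/version is registered through one `APIService` object whose name is fixed by the version and group. A sketch of the registration a KEDA install is expected to create is shown below; the Service name and namespace are taken from this guide, and the two priority values are illustrative assumptions.

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io   # <version>.<group>, so only one such object can exist
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  service:
    name: keda-operator-metrics-apiserver
    namespace: keda-system
  groupPriorityMinimum: 100               # assumed value
  versionPriority: 100                    # assumed value
```

A second adapter would have to claim the same object name, so whichever install wrote it last ends up serving every external metric request.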

Query KEDA's external metrics directly:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"

# Query a specific scaler's metric value
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric-name>?labelSelector=scaledobject.keda.sh/name=<scaledobject-name>"

KEDA vs HPA with Custom Metrics

Note

The comparison is not "KEDA vs HPA" — KEDA uses HPA internally. The real question is: KEDA vs manually wiring HPA to a custom metrics adapter.

Capability	HPA + Custom Metrics Adapter	KEDA
Scale to zero	Minimum 1 replica	Minimum 0 replicas
Proactive (queue-depth) scaling	Possible only once an adapter exposes the queue metric; CPU-only HPA reacts after load arrives	Scales on queue length before CPU spikes
External event sources	Requires custom adapter per source	60+ built-in scalers, zero adapter code
Auth to external systems	Ad hoc per adapter implementation	First-class `TriggerAuthentication` CRD
Multi-trigger logic	Multiple metrics per HPA (autoscaling/v2), each wired through an adapter by hand	Multiple triggers in a single ScaledObject
Job / batch scaling	Not supported	`ScaledJob` CRD
Config complexity	High — adapter config + HPA YAML	Low — single ScaledObject YAML
Cluster overhead	Minimal (native)	Low — one operator + metrics server pod
Config validation	Silent failures	Admission webhooks prevent conflicts
Estimated cost saving (batch)	Baseline	~30% reduction via scale-to-zero (indicative; depends on how long the workload is idle)

When plain HPA still wins:

CPU/memory metrics directly correlate with your load
Simple stateless web service with gradual, predictable traffic
You want zero additional cluster components
You need something up quickly — HPA is native and well-documented

Enabling KEDA on a Cluster

KEDA is delivered as a platform Helm addon (addons/helm/oss.yaml, feature: keda). It is off by default and enabled per cluster via a feature label on the cluster definition.

Enable in the cluster definition

In the tenant repository, add the enable_keda label to the cluster definition:

# <tenant_path>/clusters/<cluster_name>.yaml
metadata:
  labels:
    enable_keda: "true"

Argo CD will deploy KEDA as a system Helm application (for example system-keda-<cluster_name>) into the keda-system namespace.

Verify the install

kubectl get pods -n keda-system
# Expected:
# keda-operator-xxxxx                          1/1   Running
# keda-operator-metrics-apiserver-xxxxx        1/1   Running
# keda-admission-webhooks-xxxxx                1/1   Running

# CRDs installed by KEDA
kubectl get crds | grep keda

Prerequisites

The standard Metrics Server is installed (required for the KEDA cpu / memory triggers).
No other implementor of external.metrics.k8s.io is installed in the cluster.

Customising the KEDA Helm Values

The platform ships default Helm values for KEDA under config/keda/ in this repository. Tenants override or extend these values from their workloads repository using the same path layout.

Value file layout

File	Scope
`config/keda/all.yaml`	Defaults applied to every cluster that consumes this path
`config/keda/<cloud_vendor>.yaml`	Per-cloud defaults (for example `aws.yaml`, `azure.yaml`)
`config/keda/<cluster_name>.yaml`	Overrides for a single cluster (matches the cluster's `cluster_name` field)

Resolution order (precedence)

Values are layered; more specific files override the same keys from less specific ones. From highest to lowest precedence:

Cluster-specific (workloads repo): config/keda/<cluster_name>.yaml
Cloud-specific (workloads repo): config/keda/<cloud_vendor>.yaml
Global tenant (workloads repo): config/keda/all.yaml
Cloud-specific (platform repo): config/keda/<cloud_vendor>.yaml
Global platform defaults (platform repo): config/keda/all.yaml

Missing files are ignored (ignoreMissingValueFiles: true). Maps are deep-merged by Helm; lists are replaced.
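
As an illustration of the layering rules, assume the platform defaults and a cluster-specific override contain the hypothetical values below. The replica counts and toleration entries are made up for this example; `operator.replicaCount`, `metricsServer.replicaCount`, and `tolerations` are keys from the upstream chart. The block is written as three YAML documents: two inputs and the effective result.

```yaml
# Input 1: platform repo, config/keda/all.yaml (lowest precedence)
operator:
  replicaCount: 3
metricsServer:
  replicaCount: 3
tolerations:
  - key: dedicated
    operator: Equal
    value: system
    effect: NoSchedule
---
# Input 2: workloads repo, config/keda/<cluster_name>.yaml (highest precedence)
operator:
  replicaCount: 1
tolerations:
  - key: spot
    operator: Exists
    effect: NoSchedule
---
# Effective values passed to the chart
operator:
  replicaCount: 1          # map key overridden by the more specific file
metricsServer:
  replicaCount: 3          # untouched key survives the deep merge
tolerations:               # list replaced wholesale, not appended
  - key: spot
    operator: Exists
    effect: NoSchedule
```

Because lists are replaced, an override file that touches a list has to restate every entry it still needs.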

What the platform defaults set

The platform config/keda/all.yaml ships with the following opinionated defaults:

3 replicas of the operator and the metrics API server, spread across nodes via podAntiAffinity on kubernetes.io/hostname
Hardened pod / container security context (non-root, read-only filesystem, no privilege escalation, all capabilities dropped)

Per-workload scaling behaviour is not configured here. It is set by tenants on each ScaledObject: cooldownPeriod and fallback are top-level spec fields, while restoreToOriginalReplicaCount and the HPA behavior block (horizontalPodAutoscalerConfig) live under spec.advanced. See the KEDA ScaledObject spec.
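
A minimal sketch of where these per-workload fields sit on a ScaledObject. The resource names, thresholds, and replica numbers are placeholders chosen for illustration; check the field placement against the ScaledObject spec for the KEDA version you run.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-scaler            # hypothetical
  namespace: default
spec:
  scaleTargetRef:
    name: example-deployment      # hypothetical
  cooldownPeriod: 300             # top-level spec field
  fallback:                       # top-level spec field
    failureThreshold: 3           # consecutive scaler failures before fallback applies
    replicas: 5                   # replica count to hold while the scaler is failing
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-bootstrap:9092
        consumerGroup: my-consumer-group
        topic: my-topic
        lagThreshold: "50"
```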

Example — override replica count for a single cluster

# <tenant_path>/config/keda/dev.yaml
operator:
  replicaCount: 1
metricsServer:
  replicaCount: 1

Example — set a priority class for the KEDA control plane

# <tenant_path>/config/keda/all.yaml
priorityClassName: system-cluster-critical

Refer to the upstream values.yaml for the complete list of supported keys.

Prometheus Integration

KEDA exposes Prometheus metrics from each of its components (operator, metrics server, admission webhooks). To scrape them via the Prometheus Operator (kube-prometheus-stack), enable the chart's prometheus.* flags through config/keda/....

Metrics endpoints

Each component serves Prometheus metrics on its own port at /metrics:

Component	Port	What you get
`keda-operator`	`8080`	`keda_scaler_active`, `keda_scaler_metrics_value`, `keda_scaled_object_errors_total`, `keda_resource_registered_total`, `keda_trigger_registered_total`, build info, scaling-loop latency
`keda-operator-metrics-apiserver`	`8080`	gRPC client metrics for the internal metrics service, plus the standard `apiserver_*` metrics
`keda-admission-webhooks`	`8080`	`keda_webhook_scaled_object_validation_total`, `keda_webhook_scaled_object_validation_errors`

See the KEDA Prometheus integration docs for the full list.

Enable ServiceMonitors

If the cluster runs kube-prometheus-stack (enable_kube_prometheus_stack: "true"), enable KEDA's ServiceMonitor resources so Prometheus discovers and scrapes each component.

# <tenant_path>/config/keda/all.yaml
prometheus:
  metricServer:
    enabled: true                  # Expose Prometheus metrics on the metrics API server
    serviceMonitor:
      enabled: true                # Create a ServiceMonitor for it
      interval: 30s
      scrapeTimeout: 10s
      additionalLabels:
        # Label your kube-prometheus-stack Prometheus selects on.
        # Default selector for the chart is `release: <helm-release-name>`.
        release: kube-prometheus-stack

  operator:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      additionalLabels:
        release: kube-prometheus-stack

  webhooks:
    enabled: true
    serviceMonitor:
      enabled: true
      interval: 30s
      additionalLabels:
        release: kube-prometheus-stack

Match the Prometheus selector

The additionalLabels value must match the serviceMonitorSelector configured on your Prometheus instance. Inspect it with:

kubectl get prometheus -A -o yaml | yq '.items[].spec.serviceMonitorSelector'

If you do not see your KEDA ServiceMonitor showing up as a target in Prometheus, the selector labels almost certainly do not match.
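
For example, if the Prometheus custom resource carried the selector below, ServiceMonitors labelled `release: kube-prometheus-stack` (as in the values above) would be picked up. The resource name is an assumption; substitute whatever the `kubectl get prometheus` command returns on your cluster.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack-prometheus   # hypothetical name
  namespace: monitoring
spec:
  serviceMonitorSelector:
    matchLabels:
      release: kube-prometheus-stack       # must equal additionalLabels on the KEDA ServiceMonitors
  serviceMonitorNamespaceSelector: {}      # empty selector: discover ServiceMonitors in all namespaces
```

If the namespace selector is restrictive rather than empty, the keda-system namespace also has to match it.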

Alternative: PodMonitors

If you prefer scraping pods directly (or you don't expose Services for the metrics ports), enable podMonitor instead:

# <tenant_path>/config/keda/all.yaml
prometheus:
  operator:
    enabled: true
    podMonitor:
      enabled: true
      interval: 30s
      additionalLabels:
        release: kube-prometheus-stack
  metricServer:
    enabled: true
    podMonitor:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack

Ship Prometheus alerting rules

The KEDA chart can also create a PrometheusRule resource for you. Define alerts under prometheus.operator.prometheusRules:

# <tenant_path>/config/keda/all.yaml
prometheus:
  operator:
    enabled: true
    prometheusRules:
      enabled: true
      additionalLabels:
        release: kube-prometheus-stack
      alerts:
        - alert: KEDAScalerErrors
          annotations:
            summary: "KEDA scaler {{ $labels.scaler }} is erroring"
            description: "ScaledObject {{ $labels.scaledObject }} is hitting errors on scaler {{ $labels.scaler }}"
          expr: sum by (scaledObject, scaler) (rate(keda_scaler_detail_errors_total[5m])) > 0
          for: 5m
          labels:
            severity: warning

        - alert: KEDAOperatorDown
          annotations:
            summary: "KEDA operator is down"
            description: "No KEDA operator instance has been scraped for 5 minutes"
          # Match only the operator job; ".*keda.*operator.*" would also match keda-operator-metrics-apiserver
          expr: absent(up{job=~".*keda-operator"} == 1)
          for: 5m
          labels:
            severity: critical

Verify Prometheus is scraping KEDA

# ServiceMonitors created by the chart
kubectl get servicemonitor -n keda-system

# Confirm Prometheus selected them as scrape targets (port-forward blocks, so run it in a separate terminal)
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
# Browse to http://localhost:9090/targets and search for "keda"

# Sanity-check a metric (quote the URL so the shell does not interpret "?")
curl -s "http://localhost:9090/api/v1/query?query=keda_build_info" | jq .

Pre-built Grafana dashboard

KEDA publishes a community Grafana dashboard alongside the project. Import keda-dashboard.json and point it at your KEDA-scraping Prometheus datasource.

Core CRDs

KEDA's scaling API is built on four Custom Resource Definitions:

CRD	Scope	Purpose
`ScaledObject`	Namespace	Maps event sources to Deployments / StatefulSets
`ScaledJob`	Namespace	Spawns and cleans up Kubernetes Jobs on events
`TriggerAuthentication`	Namespace	Stores credentials for event sources
`ClusterTriggerAuthentication`	Cluster	Same as above, but available cluster-wide

Scale-to-Zero Configuration

KEDA uniquely allows minReplicaCount: 0. When no events are present, KEDA drains the deployment entirely. When an event arrives, KEDA scales from 0→1 (activation phase), then hands over to HPA for 1→N.

Key ScaledObject Fields

Field	Description
`minReplicaCount: 0`	Enables full scale-to-zero
`cooldownPeriod`	Seconds to wait after last event before scaling to zero (default: 300)
`pollingInterval`	How often KEDA checks the event source (default: 30s)
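`activationLagThreshold`	Kafka trigger metadata rather than a top-level field: lag must exceed this value before KEDA leaves zero (other scalers have their own activation parameter, for example `activationValue` for RabbitMQ)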

Example — Kafka consumer with scale-to-zero

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-consumer
    kind: Deployment
  minReplicaCount: 0        # Scale fully to zero when idle
  maxReplicaCount: 20
  cooldownPeriod: 300       # Wait 5 mins of silence before scaling to zero
  pollingInterval: 30
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka-bootstrap:9092
        consumerGroup: my-consumer-group
        topic: my-topic
        lagThreshold: "50"
        activationLagThreshold: "5"  # Need >5 messages to leave zero
      authenticationRef:
        name: kafka-trigger-auth

HPA-Backed Scale-Up Configuration

Once a workload is active (replica count ≥ 1), the HPA controller takes over scaling from 1→N. You can control scale-up and scale-down velocity through the advanced.horizontalPodAutoscalerConfig.behavior block.

Recommended Pattern — Aggressive up, conservative down

spec:
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0       # React immediately
          policies:
            - type: Percent
              value: 100                      # Double replicas every 15s if needed
              periodSeconds: 15
        scaleDown:
          stabilizationWindowSeconds: 300     # Wait 5 mins before reducing
          policies:
            - type: Percent
              value: 50                       # Remove at most 50% per minute
              periodSeconds: 60

Cron + Event Hybrid — Pre-scale for known peaks

triggers:
  - type: cron
    metadata:
      timezone: Europe/London
      start: "0 8 * * 1-5"      # Scale up weekday mornings
      end: "0 20 * * 1-5"
      desiredReplicas: "10"
  - type: kafka                  # Also respond to real queue depth
    metadata:
      bootstrapServers: kafka-bootstrap:9092
      consumerGroup: my-consumer-group
      topic: my-topic
      lagThreshold: "100"

When multiple triggers are defined, the highest desired replica count wins.
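
A worked example of that rule, using the cron + Kafka triggers above with hypothetical runtime numbers: a Tuesday at 10:00 London time and a total consumer lag of 1,500 messages. The arithmetic assumes the usual HPA calculation for average-value targets, ceil(metric / threshold), and ignores the Kafka scaler's partition-count cap; the numbers are illustrative only.

```yaml
triggers:
  - type: cron
    metadata:
      timezone: Europe/London
      start: "0 8 * * 1-5"
      end: "0 20 * * 1-5"
      desiredReplicas: "10"      # window is active at 10:00, so cron asks for 10 replicas
  - type: kafka
    metadata:
      bootstrapServers: kafka-bootstrap:9092
      consumerGroup: my-consumer-group
      topic: my-topic
      lagThreshold: "100"        # lag 1500 -> ceil(1500 / 100) = 15 replicas
# HPA result: max(10, 15) = 15 replicas, still capped by maxReplicaCount.
# If lag fell to 300, kafka would ask for 3 and the cron baseline of 10 would win.
```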

Prometheus-based scale-up (HTTP rate)

triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.monitoring.svc:9090
      metricName: http_requests_per_second
      query: sum(rate(http_requests_total[1m]))
      threshold: "100"

Authentication (TriggerAuthentication)

Use TriggerAuthentication to securely supply credentials to scalers. This is preferred over embedding credentials directly in the ScaledObject.

Secret-based auth

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth
  namespace: default
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secrets
      key: sasl-mechanism
    - parameter: username
      name: kafka-secrets
      key: username
    - parameter: password
      name: kafka-secrets
      key: password

Reference in your ScaledObject:

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka-bootstrap:9092
      consumerGroup: my-consumer-group
      topic: my-topic
      lagThreshold: "100"
    authenticationRef:
      name: kafka-trigger-auth

Cloud pod identity (AWS / Azure / GCP)

KEDA natively supports workload identity federation — no secrets required:

spec:
  podIdentity:
    provider: aws        # or: azure-workload, gcp (the legacy azure provider is deprecated upstream)

For cluster-wide shared credentials use ClusterTriggerAuthentication with kind: ClusterTriggerAuthentication in the authenticationRef.
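
Putting the two fragments in context: the first document below is a sketch of the `aws-pod-identity-auth` object that the SQS example in this guide refers to, and the second shows a cluster-scoped variant together with a trigger that references it. The name `shared-aws-identity` and the account ID are hypothetical, and the sketch assumes the KEDA operator's service account has already been granted access to the queue through your cloud's workload identity mechanism.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-pod-identity-auth
  namespace: default
spec:
  podIdentity:
    provider: aws
---
apiVersion: keda.sh/v1alpha1
kind: ClusterTriggerAuthentication
metadata:
  name: shared-aws-identity        # hypothetical; cluster-scoped, so no namespace
spec:
  podIdentity:
    provider: aws
---
# Fragment of a ScaledObject or ScaledJob in any namespace
triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue
      queueLength: "5"
      awsRegion: eu-west-1
    authenticationRef:
      name: shared-aws-identity
      kind: ClusterTriggerAuthentication   # defaults to TriggerAuthentication when omitted
```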

Example Configurations

RabbitMQ Queue Worker

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-worker
  namespace: default
spec:
  scaleTargetRef:
    name: rabbitmq-worker
  minReplicaCount: 0
  maxReplicaCount: 30
  cooldownPeriod: 300
  pollingInterval: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: task-queue
        mode: QueueLength
        value: "10"
        activationValue: "1"
      authenticationRef:
        name: rabbitmq-auth

AWS SQS Batch Processor

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: sqs-batch-processor
  namespace: default
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    backoffLimit: 3
    template:
      spec:
        containers:
          - name: processor
            image: my-batch-processor:latest
        restartPolicy: Never
  pollingInterval: 30
  minReplicaCount: 0
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue   # placeholder 12-digit account ID
        queueLength: "5"
        awsRegion: eu-west-1
      authenticationRef:
        name: aws-pod-identity-auth

Multi-trigger with ScalingModifiers

spec:
  advanced:
    scalingModifiers:
      formula: "kafka_lag / 100 + prometheus_rps / 50"
      target: "10"
      activationTarget: "1"
      metricType: AverageValue
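  triggers:
    - name: kafka_lag
      type: kafka
      metadata:
        bootstrapServers: kafka-bootstrap:9092   # placeholder, as in the earlier Kafka examples
        consumerGroup: my-consumer-group
        topic: events
        lagThreshold: "1"
    - name: prometheus_rps
      type: prometheus
      metadata:
        serverAddress: http://prometheus-server.monitoring.svc:9090
        query: sum(rate(http_requests_total[1m]))
        threshold: "1"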

CPU-Based Scaling with KEDA

KEDA supports CPU-based scaling through its built-in cpu scaler. The trigger is translated into a standard HPA resource metric, so the values come from the standard Kubernetes Metrics Server (not the KEDA external metrics adapter). This behaves like plain HPA CPU scaling, with the added benefit of being expressed as a ScaledObject, which means you can combine it with other KEDA triggers in a single resource.

Prerequisite

The standard Kubernetes Metrics Server must be installed. CPU metrics for the cpu scaler are served from metrics.k8s.io, not from KEDA's own external metrics endpoint.

Note on scale-to-zero

CPU-based triggers cannot scale a workload to zero because if there are no pods, there is no CPU metric to read. If you need scale-to-zero, combine the CPU trigger with a second trigger (e.g. cron or queue-depth) that can drive replicas to zero.

How it works

The cpu scaler targets an average CPU utilisation percentage across all pods in the deployment. When average utilisation exceeds the threshold, KEDA instructs the HPA to add replicas. When it drops, replicas are reduced — down to minReplicaCount (minimum 1 for CPU-only triggers).

Example — Scale on CPU utilisation

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-api
    kind: Deployment
  minReplicaCount: 2        # Cannot be 0 with CPU-only trigger
  maxReplicaCount: 20
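  pollingInterval: 15       # KEDA's trigger-check interval; CPU itself is evaluated on the HPA's own sync period
  cooldownPeriod: 60        # Only applies when scaling to zero, so it has no effect with minReplicaCount: 2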
  triggers:
    - type: cpu
      metricType: Utilization   # AverageValue or Utilization
      metadata:
        value: "60"             # Target 60% average CPU utilisation

Example — CPU trigger combined with queue depth (recommended pattern)

Combining CPU with a queue trigger gives you reactive scale-up on CPU pressure and proactive scale-up on queue depth. The highest desired replica count from any trigger wins.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-worker-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: api-worker
    kind: Deployment
  minReplicaCount: 1
  maxReplicaCount: 30
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 25
              periodSeconds: 60
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"             # Scale up if avg CPU exceeds 70%
    - type: rabbitmq
      metadata:
        queueName: work-queue
        mode: QueueLength
        value: "20"             # Also scale up if queue exceeds 20 messages
      authenticationRef:
        name: rabbitmq-auth

If you combine cpu with cron (for example scale to zero on weekends while scaling on CPU during the week), both triggers still feed one KEDA-managed HPA; Kubernetes takes the maximum of the replica counts each metric implies. There is no separate CPU autoscaler “arguing” with cron. For behaviour across inactive windows, minReplicaCount: 0, and strict off-hours policies, see Cron + CPU: schedule vs load.

`metricType` options

metricType	Behaviour
`Utilization`	Target average CPU as a percentage of the pod's CPU request (e.g. `"60"` = 60%)
`AverageValue`	Target an absolute average CPU value in millicores (e.g. `"500m"`)
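
The examples in this section use `Utilization`. For comparison, a sketch of the `AverageValue` form with an illustrative 500m target:

```yaml
triggers:
  - type: cpu
    metricType: AverageValue
    metadata:
      value: "500m"           # Add replicas when average usage per pod exceeds 0.5 CPU core
```

Unlike `Utilization`, this form is not expressed relative to the pod's CPU request, so the target keeps its meaning when requests are later resized.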

Memory scaling

The memory scaler works identically to cpu, substituting memory utilisation:

triggers:
  - type: memory
    metricType: Utilization
    metadata:
      value: "75"             # Scale if average memory utilisation exceeds 75%

Time-Based Scaling — Scale to Zero on a Schedule

The KEDA cron trigger scales workloads based on a time schedule using standard cron expressions. This is the correct approach for:

Scaling dev/staging environments to zero overnight and at weekends
Pre-scaling production services ahead of known peak hours
Hard-stopping batch processors outside of business hours

The cron trigger works by expressing a desired replica count for a given time window. When the window opens, KEDA scales to desiredReplicas. When it closes, KEDA reverts to minReplicaCount — which can be 0.

Cron expression format

"minute hour day-of-month month day-of-week"

Examples:
"0 18 * * *"      → 6:00 PM every day
"0 8 * * 1-5"     → 8:00 AM Monday–Friday
"0 0 * * 6,0"     → Midnight Saturday and Sunday
"30 7 * * 1-5"    → 7:30 AM weekdays

Example — Scale to zero between 6PM and 8AM every day

This is the most common overnight cost-saving pattern. The workload runs during business hours and is fully drained outside of them.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: overnight-scale-to-zero
  namespace: default
spec:
  scaleTargetRef:
    name: my-service
    kind: Deployment
  minReplicaCount: 0        # Allow full scale-to-zero
  maxReplicaCount: 10
  triggers:
    # Window 1: business hours — scale UP to 3 replicas
    - type: cron
      metadata:
        timezone: Europe/London   # Required; use an IANA zone name so DST is handled
        start: "0 8 * * *"        # 8:00 AM every day
        end: "0 18 * * *"         # 6:00 PM every day
        desiredReplicas: "3"

    # Window 2: outside business hours — scale to zero
    # (implicit — minReplicaCount: 0 applies when no cron window is active)

How the off-window works

You only need to define the active window. When no cron trigger is firing and there are no other active triggers, KEDA scales down to minReplicaCount. Setting minReplicaCount: 0 means the workload reaches zero automatically outside the defined window. You do not need a second cron entry for the off period.

Example — Weekdays only, with weekend scale-to-zero

triggers:
  - type: cron
    metadata:
      timezone: Europe/London
      start: "0 8 * * 1-5"     # 8 AM Monday–Friday
      end: "0 18 * * 1-5"      # 6 PM Monday–Friday
      desiredReplicas: "5"

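At 6 PM Friday the cron window closes, and KEDA scales to zero once cooldownPeriod (300 seconds by default) has elapsed. At 8 AM Monday, it scales back to 5. The weekend is effectively zero-cost for this workload.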

Example — Staged scaling across the day (peak hours)

Use multiple cron triggers to express different replica targets throughout the day. KEDA takes the highest desired count from any currently-active trigger.

triggers:
  # Off-peak morning ramp
  - type: cron
    metadata:
      timezone: Europe/London
      start: "0 7 * * 1-5"
      end: "0 9 * * 1-5"
      desiredReplicas: "3"

  # Peak hours
  - type: cron
    metadata:
      timezone: Europe/London
      start: "0 9 * * 1-5"
      end: "0 17 * * 1-5"
      desiredReplicas: "10"

  # Evening wind-down
  - type: cron
    metadata:
      timezone: Europe/London
      start: "0 17 * * 1-5"
      end: "0 20 * * 1-5"
      desiredReplicas: "3"

  # Overnight and weekends → minReplicaCount: 0 applies (scale to zero)

Example — Cron + event trigger hybrid (production-safe pattern)

For production services, you typically want a guaranteed minimum during business hours and the ability to scale beyond that on real traffic. Combining cron with a metric trigger achieves this:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: production-hybrid-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-service
    kind: Deployment
  minReplicaCount: 0          # Allow zero outside business hours
  maxReplicaCount: 50
  triggers:
    # Guarantee a baseline during business hours
    - type: cron
      metadata:
        timezone: Europe/London
        start: "0 8 * * 1-5"
        end: "0 18 * * 1-5"
        desiredReplicas: "5"   # Minimum 5 replicas during the day

    # Scale further based on actual queue depth
    - type: rabbitmq
      metadata:
        queueName: api-requests
        mode: QueueLength
        value: "10"            # 1 replica per 10 queued requests
      authenticationRef:
        name: rabbitmq-auth

    # Scale further based on CPU if traffic spikes
    - type: cpu
      metricType: Utilization
      metadata:
        value: "65"

Result: Outside business hours with an empty queue and low CPU, the service scales to zero. During business hours, it holds at least 5 replicas. If traffic spikes beyond what 5 replicas can handle, queue depth and CPU triggers push it higher — up to 50.

Cron + CPU: schedule vs load

A common question is whether a weekend / overnight “off” schedule (cron + minReplicaCount: 0) will fight CPU-based scaling.

They do not run as two separate autoscalers. Cron and CPU are triggers on the same ScaledObject, and KEDA still creates one managed HPA for that object. Each trigger contributes a metric; the HPA evaluates the replica count implied by each metric and adopts the maximum — the same “highest desired replica count wins” rule described for other multi-trigger setups (HPA-backed scale-up, CPU + queue example). That is coordination, not two controllers overwriting each other. The case that does cause fights is a second, manually created HPA on the same workload (Known constraints).

How to read it operationally

Situation	What usually happens
Cron window active (`desiredReplicas` set), traffic is low	Cron sets a floor at least that high; CPU is satisfied or also drives replicas — you get `max(cron, cpu)` capped by `maxReplicaCount`.
Cron window active, traffic is high	CPU can push replicas above the cron baseline up to `maxReplicaCount`.
Cron window inactive, `minReplicaCount: 0`, no pods	There is no CPU utilisation to measure; the workload can stay at zero unless another trigger can activate scale-from-zero (for example queue depth).
Cron window inactive, but pods are running	CPU (and any other triggers) can still recommend replicas. Autoscaling alone cannot enforce a “hard” blackout if something keeps pods alive—use pausing (`autoscaling.keda.sh/paused`), ingress or policy controls, or remove other scale-from-zero triggers if you need a strict off window.

Practical pattern: Use cron for a time-based baseline during known hours (for example weekdays 08:00–18:00) and cpu for burst scaling on top. Use minReplicaCount and any non-CPU triggers to define behaviour outside those windows (for example full scale-to-zero on nights and weekends).
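
Where a strict off window is required, the pause annotations mentioned in the table can be set on the ScaledObject. A minimal sketch, reusing the `production-hybrid-scaler` example from the previous section; how and when the annotation is applied (for instance by a scheduled pipeline) is left open here and is not something this guide prescribes.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: production-hybrid-scaler
  namespace: production
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"   # hold the workload at 0 replicas and stop autoscaling
    # autoscaling.keda.sh/paused: "true"       # alternative: freeze at the current replica count
spec:
  scaleTargetRef:
    name: api-service
    kind: Deployment
  minReplicaCount: 0
  maxReplicaCount: 50
  triggers:
    - type: cron
      metadata:
        timezone: Europe/London
        start: "0 8 * * 1-5"
        end: "0 18 * * 1-5"
        desiredReplicas: "5"
```

Removing the annotation hands control back to the triggers.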

Cron timezone reference

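Always set timezone explicitly to the zone the schedule is written for. Upstream KEDA treats timezone as a required cron parameter and rejects triggers that omit it; setting it to UTC when the schedule is meant in local time will misfire by the UTC offset and shift again at daylight-saving changes.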

Region	Timezone string
UK / Ireland	`Europe/London`
Central Europe	`Europe/Berlin` / `Europe/Paris`
US Eastern	`America/New_York`
US Pacific	`America/Los_Angeles`
India	`Asia/Kolkata`
Singapore / Hong Kong	`Asia/Singapore` / `Asia/Hong_Kong`

Valid timezone strings follow the IANA tz database format. Full list: List of tz database time zones.

Vertical Scaling — Pod Resource Requests (VPA + KEDA)

Important distinction

KEDA is a horizontal scaler — it controls the number of pods, not their size. Changing pod CPU/memory requests is vertical scaling, which is the domain of the Vertical Pod Autoscaler (VPA). These are complementary tools, not alternatives.

Tool	What it changes
KEDA / HPA	Number of replicas
VPA	CPU and memory requests per pod
Both together	Right-sized pods at the right replica count

Why you might want both

Without VPA, pod resource requests are static — set at deploy time and never adjusted. If you over-provision requests (common, to avoid OOMKills), you pay for headroom on every replica that KEDA spins up. VPA continuously analyses actual usage and recommends (or applies) tighter requests, meaning each KEDA-spawned replica costs less.

How to run VPA alongside KEDA safely

The key risk is that VPA in Auto mode restarts pods to apply new resource sizes. This can conflict with KEDA's scaling decisions, causing unexpected pod churn. The safe pattern is to run VPA in Off mode (recommendations only) and apply changes during planned maintenance windows or through a GitOps pipeline.

KEDA  → controls replica count (horizontal)
VPA   → recommends resource sizes (vertical, apply manually or via pipeline)

Installing VPA

# Install VPA from the Kubernetes autoscaler repo (upstream install script)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

# Verify
kubectl get pods -n kube-system | grep vpa

VPA in recommendation-only mode (safe with KEDA)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service          # Must match the deployment KEDA is scaling
  updatePolicy:
    updateMode: "Off"         # Recommendations only — no automatic restarts
  resourcePolicy:
    containerPolicies:
      - containerName: my-service
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 4Gi
        controlledResources: [cpu, memory]

Reading VPA recommendations

kubectl describe vpa my-service-vpa -n default

Output excerpt:

Recommendation:
  Container Recommendations:
    Container Name: my-service
    Lower Bound:
      cpu:    180m
      memory: 210Mi
    Target:                     ← apply this to your deployment
      cpu:    350m
      memory: 410Mi
    Upper Bound:
      cpu:    1200m
      memory: 1500Mi

Apply the Target values to your Deployment's resource requests. KEDA will then scale the right-sized pods horizontally.
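
One way to apply the Target values through a pipeline is to generate a patch for the Deployment rather than editing it by hand. The sketch below is a minimal example under assumptions: the helper name `requests_patch` is hypothetical, and the values are copied from the output excerpt above rather than read from a live cluster.

```python
import json


def requests_patch(container_name: str, target: dict) -> dict:
    """Build a strategic-merge patch that sets one container's resource requests."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Strategic merge matches containers by name.
                            "name": container_name,
                            "resources": {"requests": dict(target)},
                        }
                    ]
                }
            }
        }
    }


# Target values copied from the output excerpt above.
patch = requests_patch("my-service", {"cpu": "350m", "memory": "410Mi"})
print(json.dumps(patch, indent=2))
```

The printed JSON could be saved to a file and passed to `kubectl patch deployment my-service -n default --patch-file <file>`, or the same values could be committed to the manifest in Git. Limits are left untouched here; whether to adjust them as well is a separate decision.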

Complete KEDA + VPA pattern

# 1. Deployment — resource requests informed by VPA recommendations
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  namespace: default
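spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service       # Required; must match the pod template labels
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: my-service:latest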
          resources:
            requests:
              cpu: 350m       # From VPA Target recommendation
              memory: 410Mi
            limits:
              cpu: 1000m
              memory: 1Gi
---
# 2. KEDA ScaledObject — controls replica count
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-service-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-service
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: cron
      metadata:
        timezone: Europe/London
        start: "0 8 * * 1-5"
        end: "0 18 * * 1-5"
        desiredReplicas: "3"
    - type: rabbitmq
      metadata:
        queueName: work-queue
        mode: QueueLength
        value: "10"
      authenticationRef:
        name: rabbitmq-auth
---
# 3. VPA — monitors and recommends right-sized requests
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Off"         # Safe — recommendations only, no restarts

VPA + KEDA constraints

Constraint	Detail
Don't use VPA `Auto` mode with KEDA	VPA `Auto` restarts pods to resize them, which disrupts KEDA-managed scaling and can cause replica count oscillation
VPA and HPA cannot both control CPU/memory on the same deployment	If VPA manages CPU requests, do not use a KEDA CPU trigger on the same deployment — they will conflict
VPA needs history to be accurate	VPA recommendations improve over time; give it at least a few days of traffic data before applying changes
`Off` mode requires manual application	You must read the VPA recommendation and update the Deployment manifest yourself (or via pipeline)
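
The first two constraints in the table above lend themselves to a pre-deployment check, for example in a CI step. The sketch below is one possible interpretation, not a tool that ships with KEDA or VPA: the function name and the warning wording are hypothetical, and it assumes a VPA only competes with a CPU or memory trigger when it is actually applying changes (any mode other than `Off`).

```python
def vpa_keda_conflicts(update_mode: str, controlled_resources: list[str],
                       trigger_types: list[str]) -> list[str]:
    """Return warnings for a VPA and a ScaledObject that target the same Deployment."""
    warnings = []
    if update_mode != "Off":
        warnings.append(
            f"VPA updateMode {update_mode!r} can evict pods that KEDA is scaling"
        )
        # Resources that the VPA resizes and that KEDA also scales on.
        overlap = sorted(set(controlled_resources) & set(trigger_types) & {"cpu", "memory"})
        for resource in overlap:
            warnings.append(f"VPA and a KEDA {resource} trigger both act on {resource}")
    return warnings


# The pattern shown earlier: VPA in Off mode with cron and rabbitmq triggers.
print(vpa_keda_conflicts("Off", ["cpu", "memory"], ["cron", "rabbitmq"]))

# A risky combination: VPA applying CPU changes while KEDA scales on CPU.
for warning in vpa_keda_conflicts("Auto", ["cpu", "memory"], ["cpu"]):
    print(warning)
```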

Operational Commands & Debugging

Status checks

# KEDA component health
kubectl get pods -n keda-system

# All ScaledObjects across the cluster
kubectl get scaledobject -A

# Detailed state of a specific ScaledObject
kubectl describe scaledobject <name> -n <namespace>

# HPA objects created by KEDA
kubectl get hpa -A

# All ScaledJobs
kubectl get scaledjob -A

# TriggerAuthentication resources
kubectl get triggerauthentication -A

Logs

# KEDA operator (scaling decisions, activation events)
kubectl logs -n keda-system -l app=keda-operator -f

# KEDA metrics server (metric fetch errors)
kubectl logs -n keda-system -l app=keda-operator-metrics-apiserver -f

Metrics inspection

# List all external metrics exposed by KEDA
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"

# Query a specific metric value
kubectl get --raw \
  "/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric-name>?labelSelector=scaledobject.keda.sh/name=<scaledobject-name>"

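The upstream KEDA documentation shows the label selector in this path percent-encoded (`%2F` for `/`, `%3D` for `=`). When building the path in a script, a small helper avoids encoding mistakes. This is a minimal sketch: the function name is hypothetical, and the metric name in the example is assumed for illustration. The real metric names are listed in the ScaledObject's `status.externalMetricNames` field.

```python
from urllib.parse import quote


def external_metric_path(namespace: str, metric_name: str, scaled_object: str) -> str:
    """Build the raw API path for one KEDA external metric."""
    # safe="" forces "/" and "=" inside the selector to be percent-encoded.
    selector = quote(f"scaledobject.keda.sh/name={scaled_object}", safe="")
    return (
        "/apis/external.metrics.k8s.io/v1beta1"
        f"/namespaces/{namespace}/{metric_name}?labelSelector={selector}"
    )


# Metric name below is an assumed example, not taken from a real cluster.
path = external_metric_path("default", "s0-rabbitmq-work-queue", "my-service-scaler")
print(path)
```

The resulting string can be passed directly to `kubectl get --raw "<path>"`.
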
Pausing autoscaling (maintenance)

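# Pause a ScaledObject at its current replica count
kubectl annotate scaledobject <name> -n <namespace> autoscaling.keda.sh/paused=true --overwrite

# Resume by removing the annotation (the trailing "-" deletes it)
kubectl annotate scaledobject <name> -n <namespace> autoscaling.keda.sh/paused-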

ScaledObject status fields to watch

READY    — KEDA is successfully reading the event source
ACTIVE   — at least one trigger is above its activation threshold
FALLBACK — KEDA cannot reach the event source; using fallback replica count
PAUSED   — autoscaling is suspended
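
These columns are backed by entries in the ScaledObject's `status.conditions` list, each with a `type` and a string `status`. The sketch below shows one way to turn that list into a single diagnostic line. The function names, the order of the checks, and the sample data are assumptions for illustration; the sample is hand-written in the shape of `kubectl get scaledobject <name> -o json` output, not captured from a cluster.

```python
def summarise_conditions(conditions: list[dict]) -> dict[str, bool]:
    """Map each condition type to True/False based on its status string."""
    return {c["type"]: c.get("status") == "True" for c in conditions}


def diagnose(conditions: list[dict]) -> str:
    """Pick the most relevant state, checking the blocking ones first."""
    state = summarise_conditions(conditions)
    if state.get("Paused"):
        return "paused: autoscaling is suspended"
    if not state.get("Ready"):
        return "not ready: check trigger configuration and authentication"
    if state.get("Fallback"):
        return "fallback: event source unreachable, fallback replica count in use"
    if state.get("Active"):
        return "active: at least one trigger is above its activation threshold"
    return "idle: ready, no trigger active"


# Hand-written sample in the shape of .status.conditions.
sample = [
    {"type": "Ready", "status": "True"},
    {"type": "Active", "status": "False"},
    {"type": "Fallback", "status": "False"},
    {"type": "Paused", "status": "False"},
]
print(diagnose(sample))
```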

Known Constraints & Gotchas

Constraint	Detail
One external metrics adapter only	KEDA must be the sole implementor of `external.metrics.k8s.io`. Running another adapter alongside it will break metric resolution.
Don't mix KEDA + manual HPA	Never create a separate HPA targeting the same Deployment as a ScaledObject. KEDA manages the HPA internally — a second HPA will conflict and cause erratic scaling.
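CPU/memory scalers still need standard Metrics Server	KEDA's CPU and memory triggers are handed to the generated HPA as resource metrics, which the HPA reads from `metrics.k8s.io`, not `external.metrics.k8s.io`. Ensure the standard Metrics Server is installed and that the target containers declare the corresponding resource requests.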
Cold-start latency	Scaling from 0→1 incurs pod scheduling and startup time. For latency-sensitive workloads consider `minReplicaCount: 1`.
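`cooldownPeriod` only applies to scaling to zero	The cooldown period only governs the final transition to zero replicas. Scale-down between `n` and `m` (where both ≥ 1) is controlled by the HPA stabilisation window (`behavior.scaleDown.stabilizationWindowSeconds`, 300 seconds by default), which can be set on the ScaledObject under `advanced.horizontalPodAutoscalerConfig`.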
Resource quotas	KEDA can scale rapidly. Ensure namespace resource quotas are defined to prevent unexpected overconsumption.
`ScaledJob` has no HPA	Unlike `ScaledObject`, `ScaledJob` does not create an HPA. KEDA's controller manages job parallelism directly.
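
The `cooldownPeriod` row is a frequent source of confusion, so the sketch below restates it as a lookup: given a current and a desired replica count, which setting delays the scale-down? This is a deliberately simplified reading of the table row, and the function name is hypothetical.

```python
def governing_setting(current: int, desired: int) -> str:
    """Name the setting that delays a move from `current` to `desired` replicas."""
    if desired >= current:
        return "none (not a scale-down)"
    if desired == 0:
        # Only the transition to zero is gated by KEDA itself.
        return "KEDA cooldownPeriod"
    return "HPA scale-down stabilisation window"


print(governing_setting(8, 3))  # both counts are >= 1
print(governing_setting(1, 0))  # transition to zero
print(governing_setting(2, 5))  # scale-up
```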

When to Use KEDA vs Plain HPA

Is your scaling signal external to the cluster?
(queue depth, stream lag, cloud service metrics)
        │
        ├── YES ──► Use KEDA
        │
        └── NO
              │
              ▼
        Do you need scale-to-zero?
              │
              ├── YES ──► Use KEDA
              │
              └── NO
                    │
                    ▼
              Is CPU/memory a reliable proxy for your load?
                    │
                    ├── YES ──► Plain HPA is sufficient
                    │
                    └── NO ──► Use KEDA with Prometheus or custom metric trigger
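
The same decision tree can be written as a small function, which is convenient for a platform checklist or onboarding script. The function and parameter names are hypothetical; the branches follow the diagram above one for one.

```python
def choose_autoscaler(external_signal: bool, needs_scale_to_zero: bool,
                      cpu_memory_tracks_load: bool) -> str:
    """Walk the decision tree above from top to bottom."""
    if external_signal:
        return "KEDA"
    if needs_scale_to_zero:
        return "KEDA"
    if cpu_memory_tracks_load:
        return "Plain HPA"
    return "KEDA with Prometheus or custom metric trigger"


# A queue worker driven by RabbitMQ depth.
print(choose_autoscaler(external_signal=True, needs_scale_to_zero=False,
                        cpu_memory_tracks_load=False))

# A stateless HTTP API where CPU follows request volume.
print(choose_autoscaler(external_signal=False, needs_scale_to_zero=False,
                        cpu_memory_tracks_load=True))
```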

Use KEDA for:

Queue-based workers (RabbitMQ, SQS, Kafka)
Bursty or intermittent batch jobs
Event-driven microservices
Dev/staging environments (scale to zero saves cost)
Any workload where CPU is a lagging or irrelevant indicator

Stick with plain HPA for:

Stateless HTTP APIs where CPU tracks load well
Gradual, predictable traffic growth
Teams wanting minimal cluster complexity

Based on KEDA 2.19 — see the upstream KEDA documentation for the authoritative reference.