How Functions and Container App Jobs scale

Yes, both auto-scale

The short answer to "do they scale automatically?" is yes, both of them. You don't run a daemon that watches a queue and starts more instances. You don't define a CPU threshold and an autoscaler. You declare a trigger, set a couple of bounds, and the platform handles the rest - bringing capacity up as work arrives, taking it back down when the work fades.

What's worth understanding is how they auto-scale - because the unit of scaling, the signal that drives it, and the speed of the response are different on each side. Two services that both "scale automatically" can produce wildly different runtime behaviour for the same workload, and the difference is almost entirely in the machinery below.

The scale signal

Every autoscaler needs a signal that says "there is more work than there is capacity". On Functions and Jobs the signal comes from the same family of triggers - HTTP, queues, schedules, event hubs - but the way the platform reads it is different.

For Azure Functions, the signal is read by the scale controller, a managed component that watches your function's bindings on your behalf. It samples the trigger source (queue depth, event count, request rate) at a regular cadence and decides whether to add or remove function instances. You don't see it, you don't configure it; it is part of the runtime.

For Container App Jobs, the signal is read by KEDA - the same scaler framework Container Apps uses across the platform. You declare a scale rule (which KEDA scaler to use, what its metadata is, how often to poll) and the platform asks KEDA "how much pending work do you see?" on every poll interval. The answer becomes the number of executions the platform wants running.

Either way, you describe what counts as work and the platform figures out how much capacity to bring. That is the part that earns the word "automatic".

How Functions scales

Functions scales by adding function instances. Each instance is a host process that loads your function code and is ready to handle invocations.

On the Consumption plan, the unit is most visible. Instances are created on demand and disappear when idle; you pay per execution and per GB-second. On the Premium and Dedicated plans, you can keep instances always-ready (so cold starts disappear), and on Premium you can also pre-warm extra ones in front of the live demand. The decision of when to add an instance is still the scale controller's; the plan changes what "always there" looks like.

There is one more knob worth knowing about: target-based scaling. For supported triggers (queues, event hubs, blob) the runtime lets the trigger declare a "messages per worker" target - instead of stepping up one instance at a time, the controller can do the math and ask for the right number of instances in one go. For a queue that suddenly has 5,000 messages and a target of 32 per worker, that means asking for ~150 workers in a single scale event instead of climbing there over several ticks.
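The arithmetic behind that single scale event is small enough to sketch. The function below is a hypothetical illustration, not the scale controller's actual code: it divides the pending message count by the per-worker target, rounds up, and applies an optional cap. The `max_instances` parameter is an assumption standing in for whatever instance limit the plan or app imposes.

```python
import math
from typing import Optional


def desired_instances(pending_messages: int,
                      target_per_instance: int,
                      max_instances: Optional[int] = None) -> int:
    """Hypothetical sketch of target-based scaling math (not the real controller)."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    # Round up: a partial batch still needs a worker to handle it.
    wanted = math.ceil(pending_messages / target_per_instance)
    if max_instances is not None:
        wanted = min(wanted, max_instances)
    return wanted


if __name__ == "__main__":
    print(desired_instances(5_000, 32))                     # 157
    print(desired_instances(5_000, 32, max_instances=100))  # 100
```

The exact quotient for 5,000 messages at 32 per worker is 156.25, which rounds up to 157; the "~150" in the text is the same figure at a coarser precision.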

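JSON
host.json · scaling shape (illustrative)

{
  "version": "2.0",
  "functionTimeout": "00:05:00",
  "concurrency": {
    "dynamicConcurrencyEnabled": true
  },
  "extensions": {
    "queues": {
      "batchSize": 32,
      "maxPollingInterval": "00:00:02"
    }
  }
}

Here functionTimeout caps each invocation, dynamicConcurrencyEnabled lets the host pick the per-instance concurrency, and batchSize is the number of queue messages the host fetches at once and processes in parallel on one instance. JSON has no comment syntax, so the notes live here rather than inline.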

The runtime knobs are conservative on purpose - the defaults are usually right. What you mostly control is the plan (Consumption / Premium / Dedicated), the per-trigger batch size, and a maximum instance count if you want to cap fan-out.

How Container App Jobs scale

Container App Jobs scale by adding executions - or, inside one execution, replicas. The terms matter because they bill and behave differently.

An execution is one run of the job from "platform decided to start it" to "the work is done". For an event-driven job, the platform's scaler polls your KEDA rule on its pollingInterval, looks at the answer, and starts additional executions up to maxExecutions. When the queue drains, no more executions are started; the running ones finish their work and exit naturally.
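One way to picture a single poll is as a small calculation. The sketch below is a simplified model under stated assumptions, not the platform's or KEDA's actual algorithm: it assumes one execution is wanted per `work_per_execution` pending items (the role `messageCount` plays in a Service Bus rule) and that executions already running count against `max_executions`. All function and parameter names are hypothetical.

```python
import math


def executions_to_start(pending_work: int,
                        work_per_execution: int,
                        running: int,
                        max_executions: int) -> int:
    """Assumed model of one poll: how many new executions to start."""
    if work_per_execution <= 0:
        raise ValueError("work_per_execution must be positive")
    wanted = math.ceil(pending_work / work_per_execution)
    headroom = max(max_executions - running, 0)   # room left under the cap
    return max(min(wanted - running, headroom), 0)


if __name__ == "__main__":
    print(executions_to_start(25, 1, running=0, max_executions=10))   # 10
    print(executions_to_start(3, 1, running=2, max_executions=10))    # 1
    print(executions_to_start(0, 1, running=4, max_executions=10))    # 0
```

In this model a drained queue yields zero new executions, while the four still running are left to finish on their own, which matches the behaviour described above.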

Parallelism is the second layer. Each execution can be configured to run multiple replicas of the container in parallel (parallelism), all sharing the same input source. This is how a job fans out a single execution across many workers. The execution as a whole completes when replicaCompletionCount replicas have exited successfully.

YAML
queue-worker.yaml · event-trigger config

configuration:
  triggerType: Event
  replicaTimeout: 1800              # max seconds per replica
  eventTriggerConfig:
    parallelism: 4                  # replicas per execution
    replicaCompletionCount: 1       # first replica wins
    scale:
      pollingInterval: 30           # how often KEDA polls (seconds)
      minExecutions: 0              # scale to zero when idle
      maxExecutions: 10             # cap concurrent executions
      rules:
        - name: sb-rule
          type: azure-servicebus
          metadata:
            queueName: work-items
            messageCount: "1"       # 1 msg = 1 unit of work

The mental model: one job description, two levers. maxExecutions controls how many container runs can be in flight at once; parallelism controls how big each one is. For a queue-driven worker, you mostly tune maxExecutions and let parallelism: 1 keep things simple - one execution per message, scaled by the KEDA poll. For a fan-out batch job you crank parallelism and leave maxExecutions: 1.
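Because the two levers multiply, it can help to compute the ceiling they imply together. This is a minimal sketch; the `JobShape` class is a hypothetical helper rather than part of any Azure SDK, and the fan-out value of 20 is an invented example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobShape:
    """Hypothetical holder for the two scaling levers of a job."""
    max_executions: int
    parallelism: int

    def peak_replicas(self) -> int:
        # Worst case: every allowed execution is running at full parallelism.
        return self.max_executions * self.parallelism


queue_worker = JobShape(max_executions=10, parallelism=1)   # one replica per execution
fan_out_batch = JobShape(max_executions=1, parallelism=20)  # one wide execution (assumed size)
yaml_example = JobShape(max_executions=10, parallelism=4)   # values from the config above

if __name__ == "__main__":
    for name, shape in [("queue worker", queue_worker),
                        ("fan-out batch", fan_out_batch),
                        ("yaml example", yaml_example)]:
        print(f"{name}: up to {shape.peak_replicas()} replicas")
```

The product is the number to check against downstream limits: the example config, with 10 executions of 4 replicas each, can put 40 containers against the same queue at once.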

Scaling down to zero

Both services can scale to zero when nothing is happening - no replicas, no charge. What "back from zero" feels like is different.

Functions on Consumption wakes up in hundreds of milliseconds to a couple of seconds. The host is already managed elsewhere; what's cold is a worker process for your code. This is fast enough for queue-driven and HTTP work in most cases, and noticeable for user-facing sub-second requests. Premium plans avoid this entirely by keeping always-ready instances warm.

Container App Jobs start a fresh container per execution. From zero that means provisioning a replica, pulling the image (if it isn't cached on the node), and waiting for the container to boot - seconds to tens of seconds. For work that runs for minutes or hours, that's a rounding error. For sub-second work it is unacceptable, which is why Jobs is the wrong tool for latency-sensitive requests.
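The "rounding error" claim is easy to check with a back-of-the-envelope calculation. The sketch below uses assumed numbers (a 20-second cold start, a 30-minute run, a half-second request); they are illustrative figures, not measurements.

```python
def cold_start_share(cold_start_s: float, work_s: float) -> float:
    """Fraction of wall-clock time spent starting up rather than working."""
    total = cold_start_s + work_s
    if total <= 0:
        raise ValueError("total duration must be positive")
    return cold_start_s / total


if __name__ == "__main__":
    # Assumed figures: 20 s cold start in both cases.
    print(f"{cold_start_share(20, 30 * 60):.1%}")  # 1.1%  (30-minute batch run)
    print(f"{cold_start_share(20, 0.5):.1%}")      # 97.6% (half-second request)
```

With the same startup cost, the long run loses about one percent of its time to cold start, while the short request spends almost all of its time waiting for a container.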

If cold start matters for your workload, the answer is different on each side: keep one Premium instance warm on Functions, or run the always-on part as a regular Container App (with minReplicas: 1) and use Jobs only where seconds of cold start are fine.

Concurrency inside one worker

"How many things does one worker do at once" is the other half of the scaling story. Adding workers is one knob; how much each worker handles in parallel is the other.

A Function instance can run many invocations concurrently inside the same host. The maximum is per-language and per-trigger (and is often configurable - maxConcurrentCalls, batchSize, dynamic concurrency). For an in-process .NET Function on a queue trigger, the default is in the dozens; for Python, it's lower out of the box. Tuning this changes whether the scale controller needs to add another instance or whether your existing instance can soak up the load.

A Container App Job replica does whatever the container's main process does. If your container processes one message at a time, that is the concurrency - one. If it pulls a batch of N and processes them in goroutines or async tasks, the concurrency is N. The platform doesn't peek inside the container; you decide what one execution does, and the platform decides how many executions to run.
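As an illustration of concurrency that lives inside the container, here is a minimal Python sketch using asyncio. The `handle` coroutine is a hypothetical stand-in for real per-message work, and the batch is an in-memory list rather than a real queue client, so nothing here depends on an Azure SDK.

```python
import asyncio
from typing import List


async def handle(message: str) -> str:
    """Stand-in for real per-message work (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate an I/O-bound call
    return message.upper()


async def process_batch(messages: List[str], concurrency: int) -> List[str]:
    """Process a batch with at most `concurrency` messages in flight."""
    gate = asyncio.Semaphore(concurrency)

    async def guarded(message: str) -> str:
        async with gate:  # blocks while `concurrency` handlers are already running
            return await handle(message)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(m) for m in messages))


if __name__ == "__main__":
    batch = [f"msg-{i}" for i in range(8)]
    print(asyncio.run(process_batch(batch, concurrency=4)))
```

Here the `concurrency` argument is the N from the paragraph above: the platform sees one replica either way, and the code alone decides whether that replica handles one message at a time or several.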

The practical effect: on Functions, fewer instances each doing more work is often cheaper than many instances each doing a little. On Jobs, that lever lives inside your code - the platform just gives you executions, and what each one does is up to you.

Limits worth knowing

Both services have ceilings. You usually don't hit them, but knowing where they are saves a surprise.

| Axis | Azure Functions | Container App Job |
| --- | --- | --- |
| Max workers | Plan-bound: Consumption caps at up to 200 instances per app (100 on Linux); Premium and Dedicated have their own plan-specific caps. | Set per-job via maxExecutions; bounded by environment quotas. |
| Max time per run | 5 min default on Consumption (configurable to 10); 30 min default on Premium / Dedicated, configurable higher. | Set per-replica via replicaTimeout (seconds); hours-scale is normal. |
| Poll interval | Implicit, managed by the scale controller per trigger. | Explicit: pollingInterval in seconds on the event trigger. |
| Where to look | Azure Monitor logs + Application Insights for the host and your code. | Log Analytics per execution + the execution record's exit code. |

None of these are hard product limits in a "you'll hit them on day one" sense - they're the levers most teams end up adjusting once. The two worth setting deliberately are max workers (so a runaway scaler doesn't drain a downstream system) and max time per run (so a hung process doesn't bill forever).

A worked burst

Imagine 10,000 messages land in a Service Bus queue over the next minute. Both services would scale up; here is what each would do, step by step.

FUNCTIONS

Scale controller samples the queue, sees ~10,000 messages pending. With target-based scaling and a target of 32 messages per worker, it asks the platform for ~310 worker instances (10,000 / 32) in one go, capped at the app's maximum instance count. Instances start cold; the first ones available begin pulling batches.

JOB

KEDA polls the rule on its pollingInterval (say, every 30s), sees pending work, asks for executions. The platform starts up to maxExecutions container runs - say 10 - and lets them pull messages off the queue themselves.

FUNCTIONS

Each worker invokes the function once per batch of 32. With functionTimeout at a few minutes and per-message work in the tens of milliseconds, the queue drains in single-digit minutes. The scale controller sees the queue depth fall and stops asking for more workers.

JOB

Each execution's container processes messages until its inner batch is done or its replicaTimeout approaches, then exits cleanly. KEDA polls again, sees less pending work, requests fewer executions. The queue drains over the next few minutes.

FUNCTIONS

Workers go idle. The host keeps them around briefly in case more work arrives, then lets them shut down. Bill: many short invocations, each billed in GB-seconds against a free grant.

JOB

All executions finish and the platform returns to minExecutions: 0. Bill: a handful of container runs, each lasting up to a few minutes, charged in vCPU-seconds and GiB-seconds.

Same workload, two different shapes of response. Functions reaches more workers faster and bills per invocation; Jobs reaches fewer workers, each doing more, and bills per second of container runtime. Neither one is wrong - they are the natural shapes for sub-minute and multi-minute work respectively.

AUTO

Really is

You declare a trigger and a couple of bounds. Both services handle the "when to add capacity" decision on your behalf.

UNIT

Instance or execution

Functions scales by adding host instances. Jobs scales by adding executions, with parallelism as a second lever inside each one.

ZERO

Costs nothing

Both can sit at zero replicas when idle. Cold start is the price - hundreds of milliseconds to a couple of seconds for Functions, seconds to tens of seconds for Jobs.

SHAPE

Matches the work

Sub-minute work scales beautifully on Functions. Multi-minute work scales beautifully on Jobs. The unit of scaling is the unit of work.
