Distributed Execution

Run large or long-running jobs reliably with batches and loops

Distributed Execution Guide

Most action scripts you write run a small, predictable amount of work and finish in a fraction of a second. But sometimes you need to do something bigger:

  • Send a personalised welcome email to every user who signed up in the last week.
  • Sync the full product catalogue from a partner's API into your database.
  • Generate a PDF for each of 50,000 customers when the monthly invoicing job runs.

These jobs share a property that breaks the simple action model: they take long enough that you can't reasonably wait inside a single script. They also tend to be brittle — if a script crashes halfway through, you lose your place. If the server restarts, you lose everything.

Distributed Execution is Appivo's solution for these cases. It lets you express jobs like the ones above in a few lines of JavaScript, and it takes care of the awkward parts:

  • splitting work into many small action invocations,
  • running them in parallel across all your worker instances,
  • tracking which ones finished and which ones didn't,
  • tolerating worker crashes, server restarts, and message redelivery,
  • running a wrap-up action when everything is done.

You don't need to understand any distributed-systems machinery to use it. This guide walks through everything you need.

When To Use This

Reach for distributed execution whenever you have a task that:

  • Operates on a large set of records (hundreds, thousands, or millions).
  • Takes longer than a few seconds in total.
  • Can be broken into smaller, independent units of work.
  • Calls slow external services (APIs, mail servers, payment gateways).
  • Should keep going even if a worker crashes or the server restarts.

If your script handles a single record, or finishes in well under a second, a regular SCRIPT action is the right choice. Distributed execution is for the heavy stuff.

Two Patterns: Batches and Loops

There are two shapes of "big job" you can express. They solve different problems.

  • Batch (createBatch): you have a known list of items to process the same way, in parallel. Example: "Send an email to each user in this list."
  • Loop (createLoop): you have a sequential job whose endpoint you don't know up-front. Example: "Keep paginating through the partner's API until they say there's no more data."

If you can't decide: try a batch first. They're more common and easier to reason about. Loops are the right answer when "the next thing depends on what happened in the last thing."

A Non-Negotiable Rule Before You Start

Before showing the first example, there is one rule that applies to every action script in the platform:

Always Close Query Results

context.query() returns a cursor that holds an open database transaction. It must be closed, or you risk leaking transactions and exhausting the connection pool. The standard idiom is:

let result = context.query("select u from user u where u.active = true");
try {
    while (result.hasNext()) {
        let user = result.next();
        // ... use it
    }
} finally {
    result.close();
}

When you use a batch in the same script, the finally block also seals the batch:

let users = context.query("select u from user u where u.invited = false");
let batch = context.createBatch({ /* ... */ });
try {
    while (users.hasNext()) {
        batch.add("sendOneInvitation", users.next(), null);
    }
} finally {
    users.close();
    batch.seal();
}

Both calls in the finally block are unconditional and idempotent — closing a closed cursor and sealing a sealed batch are both safe no-ops. So even on errors, both resources are cleaned up properly.

Get into this habit early. It's the same pattern as try-finally around any resource handle.


Your First Batch

Here is the simplest possible example. We'll send an invitation email to every user who hasn't been invited yet.

// Action: "sendInvitations"

let users = context.query("select u from user u where u.invited = false");
let batch = context.createBatch({
    onSuccess: "invitationsCompleted",
    onError:   "invitationsFailed"
});
try {
    while (users.hasNext()) {
        batch.add("sendOneInvitation", users.next(), null);
    }
} finally {
    users.close();
    batch.seal();
}

That's it. Three things happen here:

  1. createBatch(...) registers a new batch with the platform. You give it the names of two follow-up actions: one to run when everything finishes successfully, and one to run if things go wrong.
  2. batch.add(...) schedules a single child action to run. Each call queues one invocation of sendOneInvitation with one user record bound to it. The children run on whichever worker instances are free — you don't have to pick.
  3. batch.seal() tells the platform "I'm done adding work — start watching for completion."

After seal() returns, this script is finished. The 8,000 (or however many) child invocations run in the background. The platform tracks how many succeed and how many fail. When the last one finishes, the platform automatically schedules invitationsCompleted and runs it.

The child action looks like any other action you'd write:

// Action: "sendOneInvitation"

let user = context.getRecord();   // the user record passed via batch.add
context.sendMail({
    to:      user.get("email"),
    subject: "You're invited!",
    body:    context.renderTemplate("invitation", user)
});
user.set("invited", true);
context.update(user);

You don't write any plumbing. The child doesn't even know it's part of a batch — it's just an action being run.

The completion action receives a summary in its input:

// Action: "invitationsCompleted"

context.log("Invitations sent: succeeded={}, failed={}, total={}",
    context.getInput("succeededJobs"),
    context.getInput("failedJobs"),
    context.getInput("totalJobs"));

That's the complete picture. Now let's look at what's actually in that summary, what happens when something goes wrong, and the more advanced options.


What the Finalizer Receives

The action you configure as onSuccess, onError, or onPartial is called the finalizer. It receives information from two distinct channels — knowing which channel a value comes from determines how you read it.

Channel 1: Platform-Supplied Input — context.getInput()

The platform always provides the following fields:

  • executionId: a unique id for this batch; useful for logging or correlation.
  • state: one of COMPLETED, FAILED, PARTIAL, TIMED_OUT, CANCELLED.
  • totalJobs: how many children were added to the batch.
  • succeededJobs: how many returned successfully.
  • failedJobs: how many failed (after exhausting retries).
  • durationMs: time from seal() to completion.
  • reason: a short reason string (COMPLETED, FAILED, TIMED_OUT, etc.).
  • resultsDatasetId: where per-child results are stored, if you asked for them.

Read these with context.getInput("name"). They tell the finalizer what happened — the outcome and the counters.

Channel 2: Caller-Supplied Parameters — context.getParameter()

This is the channel you control. Anything you put in the context: {...} option when creating the batch is forwarded to the finalizer as parameters and read with context.getParameter("name"). This is how you tell the finalizer why it was running, what it should do next, and anything else not already in the platform-supplied input.

let batch = context.createBatch({
    onSuccess: "invitationsCompleted",
    onError:   "invitationsFailed",
    context: {
        triggeredBy: context.user.username,
        campaignId:  campaign.get("id")
    }
});
// Action: "invitationsCompleted"
let succeeded   = context.getInput("succeededJobs");      // platform — what happened
let triggeredBy = context.getParameter("triggeredBy");    // caller — who triggered it
let campaignId  = context.getParameter("campaignId");

context.log("Campaign {} completed for {}: {} sent",
    campaignId, triggeredBy, succeeded);

Tip: Be generous with the context: map. It's small, it's free, and if the finalizer needs more state next month you don't have to plumb it through any other layer.

Common things worth forwarding:

  • The user / role that triggered the batch (triggeredBy)
  • A parent record id or job id for traceability
  • A label describing why this run started ("nightly-cron", "manual-button-click", "webhook-from-stripe")
  • Any timestamps, cutoffs, or bounds you used when selecting the children

What Is Not in the Finalizer's Input

The other batch options — failureThreshold, maxConcurrent, deadlineMs, results, etc. — are not automatically forwarded. This is deliberate: it keeps the finalizer's contract small and explicit. If a finalizer needs one of these values, copy it into the context: map yourself.


What "Succeeded" and "Failed" Mean

A child invocation succeeds if the action script finishes normally — it didn't throw, didn't call context.error(), didn't return an error result.

A child fails if either:

  • the action script throws an exception, or
  • the action explicitly returns a failure result.

Important: Transient failures are retried automatically. The platform retries failing children on a backoff schedule (around 5, 15, and 30 minutes). A child only counts as "failed" after all retries are exhausted. This means temporary problems — a flaky external API, a brief database hiccup — heal themselves without you doing anything.

This also means you generally shouldn't add your own retry logic inside a child action. Let the platform retry.
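The schedule described above can be sketched as a small lookup. This is an illustration of the behavior as documented, not platform code; the function name is hypothetical:

```javascript
// Retry schedule described above: roughly 5, 15, and 30 minutes of backoff.
const RETRY_DELAYS_MS = [5 * 60 * 1000, 15 * 60 * 1000, 30 * 60 * 1000];

// Returns the delay before the given retry attempt (1-based), or null
// once all retries are exhausted and the child counts as "failed".
function nextRetryDelayMs(attempt) {
    return attempt <= RETRY_DELAYS_MS.length ? RETRY_DELAYS_MS[attempt - 1] : null;
}
```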


The Five Terminal States

A batch always ends in exactly one of five states. Most jobs only ever see COMPLETED, but understanding the others is useful for designing robust finalizers.

  • COMPLETED: every child succeeded. Finalizer: onSuccess.
  • PARTIAL: some succeeded, some failed, but the failure rate stayed below your threshold. Finalizer: onPartial (falls back to onSuccess).
  • FAILED: every child failed, or the failure threshold was exceeded. Finalizer: onError.
  • TIMED_OUT: the batch's deadline passed before everything finished. Finalizer: onError, with reason "TIMED_OUT".
  • CANCELLED: somebody (an admin, or your code) explicitly cancelled the batch. Finalizer: onError, with reason "CANCELLED".

The finalizer always runs exactly once. Even if the worker that's supposed to run it crashes, the platform's safety nets republish the request until it succeeds.


Failure Thresholds: Circuit Breakers

By default a batch is only FAILED if every single child failed. That's usually too lenient. If you're sending 100,000 emails and 99,000 succeeded, calling that "completed" is fine — but if 60,000 failed, something is seriously wrong and you don't want to keep going.

Use failureThreshold to set a circuit-breaker:

let batch = context.createBatch({
    onSuccess:         "done",
    onError:           "somethingBroke",
    failureThreshold:  0.10,      // abort if more than 10% fail
    failureMinSamples: 50         // don't evaluate the threshold until 50 children have completed
});

The failureMinSamples floor exists so the very first failure doesn't trip a low threshold. Without it, a 1% threshold would trip on the first failure (1/1 = 100%).

When the threshold trips, the batch flips to FAILED immediately and onError runs. Children already running in the background will finish, but no new ones are picked up.
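The trip decision described above amounts to a two-part check. A minimal sketch, assuming failureMinSamples counts completed children (which is what the 1/1 = 100% example implies); the function name is illustrative:

```javascript
// Circuit-breaker decision: never trip before the sample floor is reached,
// then trip once the observed failure ratio exceeds the threshold.
function shouldTrip(failed, completed, failureThreshold, failureMinSamples) {
    if (completed < failureMinSamples) return false;   // not enough data yet
    return failed / completed > failureThreshold;
}
```

With a floor of 50, the first failure (1 of 1 completed) cannot trip a 1% threshold, but 6 failures out of 100 completed would trip a 5% one.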


Deadlines: Safety Nets

Every batch has a deadline. By default it's 24 hours from when you create the batch. If the deadline passes and the batch is still running, the platform forcibly transitions it to TIMED_OUT and runs onError.

This is the safety net for the rare case where something goes really wrong — a worker silently crashes mid-job, a message gets lost, etc. You don't have to do anything to make it work. It just runs in the background.

You can shorten or extend the deadline:

let batch = context.createBatch({
    onSuccess: "done",
    onError:   "timedOut",
    deadlineMs: Date.now() + 30 * 60 * 1000   // 30 minutes from now
});

Tip: Pick a deadline that's generous compared to how long the batch should take. If you expect 5,000 children at ~2s each with ~50 in flight at once, that's roughly 200 seconds in the perfect case — give it 30–60 minutes to be safe. The deadline is a backstop, not a target.
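The arithmetic in the tip above is worth making explicit. A back-of-the-envelope sketch (the helper name and safety factor are illustrative, not a platform API):

```javascript
// Ideal-case runtime: total work divided by the number of slots in flight.
function estimateBatchSeconds(children, secondsPerChild, maxConcurrent) {
    return (children * secondsPerChild) / maxConcurrent;
}

// 5,000 children at ~2s each with ~50 in flight: ~200 seconds ideal.
const idealSeconds = estimateBatchSeconds(5000, 2, 50);

// A generous deadline applies a large safety factor to the ideal figure.
const deadlineMs = Date.now() + idealSeconds * 10 * 1000;
```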


Concurrency Control: maxConcurrent

If you add() 50,000 children, the platform will happily try to run all 50,000 in parallel on whatever workers are available. That's usually fine — but sometimes you need to be gentler:

  • You're calling a third-party API that limits you to 10 requests per second.
  • The job involves expensive operations and you don't want to crowd out other work.
  • A downstream system can only handle so much load.

Use maxConcurrent to cap the number of children that run simultaneously:

let products = context.query("select p from product p");
let batch = context.createBatch({
    onSuccess:     "done",
    onError:       "failed",
    maxConcurrent: 10              // never more than 10 in flight at once
});
try {
    while (products.hasNext()) {
        batch.add("syncProductWithStripe", products.next(), null);
    }
} finally {
    products.close();
    batch.seal();
}

What happens here is subtle but important: each batch.add() call blocks until a slot is free. If you've already got 10 in flight, the 11th call to add() waits until one of the running children finishes. That's how the cap is enforced.

This means a producer that's much faster than the workers will naturally slow down to match. You don't need any flow-control logic of your own.
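The blocking behavior of add() is essentially an async semaphore. A minimal sketch of the idea, not the platform's implementation (the class name is hypothetical):

```javascript
// A gate with a fixed number of slots. acquire() resolves immediately if a
// slot is free, otherwise it parks the caller until release() hands one over.
class ConcurrencyGate {
    constructor(maxConcurrent) {
        this.free = maxConcurrent;
        this.waiters = [];
    }
    async acquire() {
        if (this.free > 0) { this.free--; return; }
        await new Promise(resolve => this.waiters.push(resolve));
    }
    release() {
        const next = this.waiters.shift();
        if (next) next();          // hand the slot straight to a waiter
        else this.free++;
    }
}
```

A producer that calls acquire() before each unit of work naturally slows to the pace of the consumers, which is exactly the back-pressure batch.add() gives you for free.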


Per-Child Results: The Four Modes

A batch's counters are great, but sometimes you want to know what each child returned. For example: "show me which 47 invitations failed and why." There are four storage modes for child results, ranging from "free" to "expensive."

Pick deliberately — the cheapest and most expensive differ by orders of magnitude in storage cost.

COUNT (default)

Just the counters. The finalizer sees how many succeeded and how many failed, but no details. Use this for the typical case where you don't need to drill in.

context.createBatch({ onSuccess: "done" });
// results mode is COUNT by default

FAILURES_ONLY — Best for Debugging

The platform stores a record for each failed child — successes are not stored. Storage is bounded by your failure rate, which is usually small.

let batch = context.createBatch({
    onSuccess:       "invitationsCompleted",
    onError:         "somethingBroke",
    results:         "FAILURES_ONLY",
    resultsTtlHours: 48        // keep failure records for 48h, default 24h
});

In the finalizer, the failed records are available through the scratchpad:

// Action: "invitationsCompleted"

if (context.getInput("failedJobs") > 0) {
    let datasetId = context.getInput("resultsDatasetId");
    let failures = context.scratchpad.getAllRecords(datasetId);
    failures.forEach(function(record) {
        let data = record.getData();
        context.log("Invitation failed: outcome={}, attempt={}",
            data.get("outcome"), data.get("attempt"));
    });
}

The TTL is automatic — after resultsTtlHours hours the records vanish, so failure records don't pile up. Default is 24h; you can set anything from 1 to 168 hours.

PERSIST — Store Every Result

Same as FAILURES_ONLY but stores every child, succeeded or failed. Use this when you want to do something with the actual return values, not just count them.

let batch = context.createBatch({
    onSuccess:       "done",
    results:         "PERSIST",
    resultsTtlHours: 24
});

// Each child's action returns something useful:
// In "computeStats": context.return({ rowsProcessed: ..., bytesScanned: ... });

In the finalizer, prefer a cursor-style read so a million children don't try to load into memory at once:

let datasetId = context.getInput("resultsDatasetId");
let cursor = context.scratchpad.getRecordsCursor(datasetId);
try {
    while (cursor.hasNext()) {
        let record = cursor.next();
        let result = record.get("data").get("result");
        // ... process each child's return value
    }
} finally {
    cursor.close();
}

The scratchpad cursor is closed in a try/finally for the same reason database results are: it holds an open transaction.

REDUCE — Aggregate as You Go

For when you want to aggregate child results without storing them all. Each child's result is folded into a running accumulator by a reducer action you write. Memory cost is constant regardless of how many children you have.

This is the right choice for things like: "sum the bytes scanned across all 100,000 children" or "build a count-by-category histogram."

let batch = context.createBatch({
    onSuccess:      "done",
    results:        "REDUCE",
    reducer:        "sumBytes",
    reducerInitial: { totalBytes: 0, totalRows: 0 }
});

The reducer action receives the current accumulator and one child's result, and returns the new accumulator:

// Action: "sumBytes"
let accumulator = context.getInput("accumulator");
let childResult = context.getInput("result");

context.return({
    totalBytes: accumulator.get("totalBytes") + childResult.get("bytes"),
    totalRows:  accumulator.get("totalRows")  + childResult.get("rows")
});

Critical: The reducer must be pure. If two children report at the same time and the platform retries the reducer to resolve the race, it must be safe to run twice. Don't write to the database, don't call external APIs — just compute the new accumulator from the inputs. Anything else risks duplicate side effects.
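Purity is what makes replays harmless: a pure reducer is just a fold, so running it again over the same results produces the same accumulator. A plain-JS illustration of the contract (the shapes mirror the sumBytes example above):

```javascript
// Pure reducer: computes the new accumulator from its inputs, nothing else.
function sumBytes(accumulator, childResult) {
    return {
        totalBytes: accumulator.totalBytes + childResult.bytes,
        totalRows:  accumulator.totalRows  + childResult.rows
    };
}

const results = [
    { bytes: 100, rows: 10 },
    { bytes: 250, rows: 25 }
];
const initial = { totalBytes: 0, totalRows: 0 };

// Folding once or folding again yields identical output, so a platform
// retry of the reducer cannot corrupt the running total.
const once  = results.reduce(sumBytes, initial);
const again = results.reduce(sumBytes, initial);
```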

The final accumulator is stored on the execution document; it's available in the finalizer as context.getInput("reducerState").

Choosing a Result Mode

  • COUNT: no storage cost. The default; use when you only need totals.
  • FAILURES_ONLY: storage proportional to the failure rate. Use when you need to debug or retry failures.
  • PERSIST: storage proportional to the total number of children. Use when you need every return value.
  • REDUCE: constant storage. Use when you need an aggregate (sum, count, histogram).

Inspecting and Cancelling a Running Batch

The handle returned by createBatch lets you check on a running batch or cancel it:

let snapshot = batch.snapshot();
context.log("Batch progress: {}/{} done, {} failed",
    snapshot.get("succeededJobs") + snapshot.get("failedJobs"),
    snapshot.get("totalJobs"),
    snapshot.get("failedJobs"));

// Or to abort:
batch.cancel();

Cancelling causes the batch to immediately transition to CANCELLED and run onError with reason: "CANCELLED". Children already running will finish (they can't be unsent from the queue), but their results are ignored.

snapshot() reads from the database, so it's a real-time view across all worker instances — not just what this script knows.


Real-World Use Cases

This is where it all comes together. Each example below is a fully-worked scenario showing how to combine the features above for a common business problem.

Use Case 1: Bulk Email Campaign

The job: A marketing team clicks "Send" on a campaign that targets 80,000 customers. The button click should return immediately, the emails should send in the background, and the team should be notified when it's done.

// Action: "sendCampaign" — triggered by the marketing UI

let campaign   = context.getRecord();
let campaignId = campaign.get("id");

let recipients = context.query(
    "select c from customer c " +
    "where c.optedInToMarketing = true " +
    "and c.country in ${countries}",
    { countries: campaign.get("targetCountries") });

let batch = context.createBatch({
    onSuccess: "campaignFinishedNotification",
    onPartial: "campaignFinishedNotification",
    onError:   "campaignFailedNotification",
    results:           "FAILURES_ONLY",     // we want to see the bounces
    failureThreshold:  0.05,                // 5% bounce budget
    failureMinSamples: 100,
    maxConcurrent:     50,                  // don't overwhelm the SMTP server
    deadlineMs:        Date.now() + 4 * 60 * 60 * 1000,
    context: {
        campaignId:  campaignId,
        triggeredBy: context.user.username
    }
});
try {
    while (recipients.hasNext()) {
        batch.add("sendOneCampaignEmail", recipients.next(), {
            campaignId: campaignId
        });
    }
} finally {
    recipients.close();
    batch.seal();
}

context.return({
    message: "Campaign queued for "
             + batch.snapshot().get("totalJobs") + " recipients",
    batchId: batch.getId()
});
// Action: "sendOneCampaignEmail"

let customer   = context.getRecord();
let campaignId = context.getParameter("campaignId");
let campaign   = context.getRecord("campaign", campaignId);

// Idempotency: if we've already sent this one, skip it.
// Re-runs (caused by retries or message redelivery) won't double-send.
let alreadySent = context.query(
    "select s from sentEmail s " +
    "where s.campaignId = ${cid} and s.customerId = ${uid}",
    { cid: campaignId, uid: customer.get("id") });
try {
    if (alreadySent.hasNext()) {
        return;   // already sent, nothing to do
    }
} finally {
    alreadySent.close();
}

context.sendMail({
    to:      customer.get("email"),
    subject: campaign.get("subject"),
    body:    context.renderTemplate(campaign.get("template"), customer)
});

context.create("sentEmail", {
    campaignId: campaignId,
    customerId: customer.get("id"),
    sentAt:     Date.now()
});
// Action: "campaignFinishedNotification"

let campaignId  = context.getParameter("campaignId");
let triggeredBy = context.getParameter("triggeredBy");
let succeeded   = context.getInput("succeededJobs");
let failed      = context.getInput("failedJobs");
let total       = context.getInput("totalJobs");

context.sendMail({
    to:      triggeredBy + "@example.com",
    subject: "Campaign " + campaignId + " complete",
    body:    "Sent: " + succeeded + " / " + total
             + ", Bounced: " + failed
});

What this gives you:

  • Instant UI response. The user sees "Campaign queued for 80,000 recipients" within milliseconds.
  • Throttled SMTP load. No more than 50 emails in flight at once.
  • Bounce protection. If more than 5% of emails fail (e.g. SMTP went down), the batch aborts and an alert fires.
  • No duplicate sends. The alreadySent check makes the child idempotent — perfect for the platform's at-least-once delivery model.

Use Case 2: Nightly Product Sync With Rate-Limited API

The job: Every night at 2am, sync 5,000 products with Stripe. Stripe's rate limit is 100 requests/second, so we need to throttle.

// Action: "nightlyStripeSync" — triggered by SCHEDULED rule

let products = context.query("select p from product p where p.active = true");

let batch = context.createBatch({
    onSuccess:        "stripeSyncCompleted",
    onError:          "stripeSyncFailed",
    results:          "FAILURES_ONLY",
    maxConcurrent:    20,                    // ~20 req/s — well under Stripe's limit
    failureThreshold: 0.02,                  // 2% — Stripe is reliable
    deadlineMs:       Date.now() + 2 * 60 * 60 * 1000,
    context: {
        runDate: new Date().toISOString().slice(0, 10),
        cron:    "nightly-stripe-sync"
    }
});
try {
    while (products.hasNext()) {
        batch.add("syncOneProductToStripe", products.next(), null);
    }
} finally {
    products.close();
    batch.seal();
}
// Action: "syncOneProductToStripe"

let product = context.getRecord();
let stripeApiKey = context.getConfig("stripe_api_key");

let response = context.httpRequest({
    method: "POST",
    url:    "https://api.stripe.com/v1/products/" + product.get("stripeId"),
    headers: { "Authorization": "Bearer " + stripeApiKey },
    body: {
        name:        product.get("name"),
        description: product.get("description"),
        metadata: { internalId: product.get("id") }
    }
});

if (response.statusCode === 429) {
    // Stripe is telling us to slow down. Throw to trigger retry.
    throw new Error("Stripe rate limit hit, will retry");
}

if (response.statusCode >= 400) {
    throw new Error("Stripe error " + response.statusCode + ": " + response.body);
}

product.set("lastStripeSync", Date.now());
context.update(product);

The platform's automatic retry with backoff (5/15/30 minutes) handles the rare 429 perfectly without you writing any retry code.


Use Case 3: Order Processing with Three Outcomes

The job: Process all pending orders. Most should succeed cleanly. A small number might fail due to validation issues — those need manual review. If the failure rate spikes, page ops immediately.

// Action: "processNewOrders"

let cycle  = context.getRecord();   // assumed: the processing-cycle record that triggered this run
let orders = context.query(
    "select o from order o where o.status = 'pending'");

let batch = context.createBatch({
    onSuccess: "orderBatchOk",
    onPartial: "orderBatchPartial",
    onError:   "orderBatchFailed",
    results:           "FAILURES_ONLY",
    failureThreshold:  0.05,                // 5% failure budget
    failureMinSamples: 20,
    maxConcurrent:     50,
    deadlineMs:        Date.now() + 60 * 60 * 1000,   // 1 hour
    context: {
        triggeredAt: Date.now(),
        triggeredBy: context.user.username,
        cycleId:     cycle.get("id")
    }
});
try {
    while (orders.hasNext()) {
        batch.add("processOneOrder", orders.next(), null);
    }
} finally {
    orders.close();
    batch.seal();
}

Three finalizers, each handling its own outcome:

// Action: "orderBatchOk" — happy path
context.log("Order cycle {} processed {} orders cleanly in {}s",
    context.getParameter("cycleId"),
    context.getInput("totalJobs"),
    Math.round(context.getInput("durationMs") / 1000));
// Action: "orderBatchPartial" — most worked, some didn't
let cycleId   = context.getParameter("cycleId");
let datasetId = context.getInput("resultsDatasetId");
let failures  = context.scratchpad.getAllRecords(datasetId);

context.sendMail({
    to:      "ops@example.com",
    subject: "Order cycle " + cycleId + " had partial failures",
    body:    context.getInput("failedJobs") + " of " + context.getInput("totalJobs")
             + " orders failed; see scratchpad dataset " + datasetId
});

// Queue the failed orders for human review
failures.forEach(function(rec) {
    let data = rec.getData();
    context.create("manualReviewQueue", {
        cycleId:       cycleId,
        orderId:       rec.getRecordId(),
        failureReason: data.get("result")
    });
});
// Action: "orderBatchFailed" — something is seriously wrong
let cycleId = context.getParameter("cycleId");

context.sendMail({
    to:      "ops@example.com",
    subject: "URGENT: Order cycle " + cycleId + " FAILED",
    body:    "State: "  + context.getInput("state")
           + ", reason: " + context.getInput("reason")
           + ".  " + context.getInput("failedJobs") + "/"
           + context.getInput("totalJobs")
           + " failed.  Investigate before retrying."
});

// Optionally page on-call via your incident management webhook
context.httpRequest({
    method: "POST",
    url:    context.getConfig("pagerduty_webhook"),
    body: {
        severity: "critical",
        summary:  "Order processing failed: cycle " + cycleId
    }
});

The three finalizers give each outcome distinct, useful behavior. Happy path just logs. Partial logs and queues for review. Total failure pages someone. This is the entire point of having three hooks.

Pattern: If your post-processing logic is largely shared and only the messaging differs, point all three hooks at the same action and switch on context.getInput("state") inside it. Use three separate finalizers when each branch does substantially different work.
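The single-finalizer variant of that pattern boils down to a switch on the terminal state. An illustrative sketch; the routing function and the action names it returns are hypothetical:

```javascript
// One shared finalizer can dispatch on the platform-supplied state field.
function routeByState(state) {
    switch (state) {
        case "COMPLETED": return "logSummary";
        case "PARTIAL":   return "queueFailuresForReview";
        default:          return "alertOnCall";   // FAILED, TIMED_OUT, CANCELLED
    }
}
```

Inside the real finalizer you would read the state with context.getInput("state") and branch the same way.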


Use Case 4: Aggregation with REDUCE — Storage Audit

The job: Once a week, audit storage usage across all customers. Sum up bytes used, count files, and find the top consumers — without storing 2 million per-customer result records.

// Action: "weeklyStorageAudit"

let customers = context.query("select c from customer c where c.active = true");

let batch = context.createBatch({
    onSuccess: "storageAuditDone",
    results:   "REDUCE",
    reducer:   "accumulateStorageStats",
    reducerInitial: {
        totalBytes:    0,
        totalFiles:    0,
        totalCustomers: 0,
        topConsumers:  []           // top 10 by bytes
    }
});
try {
    while (customers.hasNext()) {
        batch.add("countCustomerStorage", customers.next(), null);
    }
} finally {
    customers.close();
    batch.seal();
}
// Action: "countCustomerStorage" — runs once per customer
let customer = context.getRecord();
let files = context.query(
    "select f from file f where f.customerId = ${cid}",
    { cid: customer.get("id") });
let bytes = 0;
let fileCount = 0;
try {
    while (files.hasNext()) {
        let file = files.next();
        bytes += file.get("size");
        fileCount++;
    }
} finally {
    files.close();
}

context.return({
    customerId: customer.get("id"),
    bytes:      bytes,
    files:      fileCount
});
// Action: "accumulateStorageStats" — REDUCER (must be pure!)
let acc    = context.getInput("accumulator");
let result = context.getInput("result");

// Maintain top-10 list
let top = acc.get("topConsumers");
top.push({ customerId: result.get("customerId"), bytes: result.get("bytes") });
top.sort(function(a, b) { return b.bytes - a.bytes; });
if (top.length > 10) top = top.slice(0, 10);

context.return({
    totalBytes:     acc.get("totalBytes") + result.get("bytes"),
    totalFiles:     acc.get("totalFiles") + result.get("files"),
    totalCustomers: acc.get("totalCustomers") + 1,
    topConsumers:   top
});
// Action: "storageAuditDone"
let stats = context.getInput("reducerState");

context.create("storageReport", {
    runDate:        new Date(),
    totalBytes:     stats.get("totalBytes"),
    totalFiles:     stats.get("totalFiles"),
    totalCustomers: stats.get("totalCustomers"),
    topConsumers:   JSON.stringify(stats.get("topConsumers"))
});

Memory cost is constant regardless of whether you have 1,000 or 10,000,000 customers. The reducer is invoked once per child, folding the result into the running accumulator.


Use Case 5: Multi-Tenant Fan-Out (Nested Batches)

The job: Nightly sync for every active tenant. Each tenant has its own per-customer fan-out. We want both levels visible in the dashboards and trackable as a single hierarchy.

A batch can launch another batch from inside one of its children. The platform automatically wires the inner batch as a child of the outer. From the outer batch's point of view, the tenant's child action is "done" only when the inner batch reaches a terminal state.

// Action: "nightlySyncAllTenants" — triggered by SCHEDULED rule

let tenants = context.query("select t from tenant t where t.active = true");
let outer = context.createBatch({
    onSuccess: "allTenantsSynced",
    onError:   "syncFailed"
});
try {
    while (tenants.hasNext()) {
        outer.add("syncOneTenant", tenants.next(), null);
    }
} finally {
    tenants.close();
    outer.seal();
}
// Action: "syncOneTenant" — runs once per tenant

let tenant = context.getRecord();
let customers = context.query(
    "select c from customer c where c.tenantId = ${tenantId}",
    { tenantId: tenant.get("id") });

// This inner batch is automatically wired as a child of the outer batch.
// You don't have to do anything special.
let inner = context.createBatch({
    onSuccess: "tenantSyncDone",
    onError:   "tenantSyncFailed",
    context: { tenantId: tenant.get("id") }
});
try {
    while (customers.hasNext()) {
        inner.add("syncOneCustomer", customers.next(), null);
    }
} finally {
    customers.close();
    inner.seal();
}
// Note: this action returns NOW, but the outer batch is told to wait
// until the inner batch completes. Reporting back is automatic.

This pattern composes to arbitrary depth.


Use Case 6: Replay Just the Failures

The job: Last night's batch finished with 47 failures. The cause has been fixed. We don't want to re-run the 50,000 successful ones — just the failures.

If you used FAILURES_ONLY mode, the failures are recorded in the scratchpad. Iterate them in a follow-up:

// Action: "retryFailedOrders" — manually triggered

let originalDatasetId = context.getParameter("datasetId");   // passed from admin UI
let failures = context.scratchpad.getAllRecords(originalDatasetId);

let retry = context.createBatch({
    onSuccess: "retryDone",
    onError:   "retryFailed",
    context:   { reason: "manual-retry-of-" + originalDatasetId }
});
try {
    failures.forEach(function(rec) {
        let originalId = rec.getRecordId();
        let lookup = context.query(
            "select o from order o where o.id = ${id}",
            { id: originalId });
        try {
            if (lookup.hasNext()) {
                retry.add("processOneOrder", lookup.next(), null);
            }
        } finally {
            lookup.close();
        }
    });
} finally {
    retry.seal();
}

Use Case 7: Loop — Paginating an External API

The job: Sync orders from a partner's API. We don't know how many pages there are; the partner will tell us when there's no more data.

This is the canonical case for a loop — sequential work where each step decides whether to keep going.

// Action: kicking off the loop
context.createLoop({
    workItemType:  "syncOrdersFromPartner",
    initialState:  { offset: 0, batchSize: 200 },
    onComplete:    "partnerSyncDone",
    onError:       "partnerSyncFailed",
    maxIterations: 5000,
    deadlineMs:    Date.now() + 4 * 60 * 60 * 1000,   // 4 hours
    context:       { partnerName: "Acme Co" }
});

Unlike createBatch, the work itself is a registered Java component (a LoopWorker) — not a JavaScript action. This is because loop workers need to be carefully written for idempotency, which is easier to enforce in compiled Java code. The workItemType is just a string that names a worker that's already been registered with the platform.

If you're a tenant developer, ask a platform engineer to register the worker for you.
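The heart of any loop worker is a step function: take the current state, do one unit of work, and decide whether to continue. The real workers are compiled Java components, but the decision logic can be sketched in plain JavaScript. Everything here is illustrative — `runIteration` and `fetchPage` are hypothetical names, not platform APIs:

```javascript
// Sketch of one loop iteration's decision logic (illustrative only —
// real workers are registered Java components, not scripts).
// `fetchPage` is a hypothetical stand-in for the partner API call.
function runIteration(state, fetchPage) {
    let page = fetchPage(state.offset, state.batchSize);
    // ...process page.items here, idempotently...
    if (page.items.length < state.batchSize) {
        // Short page: the partner has no more data. Stop and pass output
        // back so the finalizer can log it.
        return { decision: "STOP", state: state, output: { lastOffset: state.offset } };
    }
    // Full page: advance the cursor and ask to run again.
    return {
        decision: "CONTINUE",
        state: { offset: state.offset + page.items.length, batchSize: state.batchSize }
    };
}
```

Because each iteration's state is persisted between steps, a worker written this way can be resumed from its last recorded state after a crash.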

The loop's finalizer receives:

| Field | Description |
| --- | --- |
| executionId, state, reason | Same as batches |
| workItemType | The worker name |
| iteration | How many iterations actually ran |
| output | Optional per-completion output the worker passes back |
| stateDatasetId | Where the final state record lives |

// Action: "partnerSyncDone"
context.log("Synced {} pages from {} over {}s",
    context.getInput("iteration"),
    context.getParameter("partnerName"),
    Math.round(context.getInput("durationMs") / 1000));

Note: Loops are a less common pattern. Most "big jobs" in application development are batches, not loops. The loop primitive is here when you genuinely need it — drain a queue, paginate an API, walk a long sequence — but try a batch first.


What the Platform Handles Automatically

Here's a partial list of things you don't need to worry about:

  • Worker crashes mid-child. The message comes back to the queue; another worker picks it up. The action runs again — so make children idempotent.
  • Server restarts mid-batch. All batch state lives in MongoDB. When the server comes back up, sweeps pick up where things left off.
  • Lost messages. The deadline sweeper is the safety net. A batch that's run longer than its deadline is forcibly transitioned to TIMED_OUT and its onError runs.
  • Duplicate child reports. The platform deduplicates internally, so a child running twice (e.g. due to message redelivery) only counts once toward the batch's totals.
  • Finalizer crashes. The finalizer is just another action. If it fails, it's retried like any other action job. If it fails repeatedly, an operator can replay it with one click.

Things You Should Think About

| Concern | What to Do |
| --- | --- |
| Always close query results. | Use try/finally consistently. Without result.close() you leak transactions and exhaust the connection pool. |
| Make child actions idempotent. | The platform delivers messages "at least once," not "exactly once." If a child sends an email or charges a card, write your code so a re-run doesn't double-charge (e.g. set a flag on the record before doing the side effect, and check it on entry). |
| Pick a reasonable deadline. | The default of 24h is fine for most batches but can be too long for jobs that should obviously be done in minutes. A short deadline trips alarms sooner. |
| Use failureThreshold deliberately. | "Send 100,000 marketing emails" can survive 0.1% bouncing; "process 100,000 customer payments" cannot. |
| Prefer FAILURES_ONLY over PERSIST for very large batches. | Persisting every result for a million children is wasteful if you only ever look at the failures. |
| Reducers must be pure. | No database writes, no API calls — just compute the new accumulator from the inputs. |
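The flag-before-side-effect pattern from the table above can be sketched in a few lines. Here `record` and `sendEmail` are hypothetical stand-ins for a platform record and the real side effect — the point is the shape of the check, not the API:

```javascript
// Idempotency sketch: mark the record *before* the side effect, and bail
// out on entry if the mark is already there. A redelivered message then
// becomes a harmless no-op instead of a duplicate email.
// `record` and `sendEmail` are hypothetical stand-ins, not platform APIs.
function sendWelcomeOnce(record, sendEmail) {
    if (record.welcomeSentAt) {
        return "skipped";               // already handled by a previous run
    }
    record.welcomeSentAt = Date.now();  // persist the flag first
    sendEmail(record.email);            // then do the side effect
    return "sent";
}
```

Calling this twice for the same record sends exactly one email — which is exactly the property you want when messages can be delivered more than once.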

API Reference

context.createBatch(options)

Returns a BatchExecution handle.

context.createBatch({
    // --- Lifecycle hooks ---
    onSuccess:         "actionName",   // when state == COMPLETED
    onError:           "actionName",   // FAILED, TIMED_OUT, CANCELLED
    onPartial:         "actionName",   // PARTIAL (falls back to onSuccess)

    // --- Result persistence ---
    results:           "COUNT",        // or PERSIST | FAILURES_ONLY | REDUCE
    resultsTtlHours:   24,             // 1..168
    reducer:           "reducer",      // REDUCE only
    reducerInitial:    { },            // REDUCE only
    reducerFailure:    "SKIP",         // SKIP | ABORT, default SKIP

    // --- Safety nets ---
    failureThreshold:  0.10,           // null = no threshold
    failureMinSamples: 10,
    deadlineMs:        Date.now() + 3600000,   // default: createdAt + 24h

    // --- Concurrency ---
    maxConcurrent:     50,             // null = unlimited

    // --- Caller state ---
    context:           { }             // forwarded to the finalizer
});

BatchExecution Methods

| Method | Description |
| --- | --- |
| add(actionRef, record, params) | Schedule one child invocation. Blocks if maxConcurrent is set and the cap is reached. |
| addAll(actionRef, records, params) | Convenience: same action against many records. |
| seal() | Mark the batch fully populated. Idempotent. Required before completion can fire. |
| cancel() | Force the batch to CANCELLED and run onError. |
| snapshot() | Read-only progress view. Reads from the database. |
| getId() | The batch's id; useful for logging or admin tooling. |

context.createLoop(options)

Returns a LoopExecution handle.

context.createLoop({
    workItemType:   "registeredWorkerName",   // required
    initialState:   { },
    onComplete:     "actionName",
    onError:        "actionName",
    maxIterations:  1000,                     // null = no cap
    deadlineMs:     Date.now() + 3600000,
    context:        { }
});

Finalizer Input Fields

The finalizer reads its input via context.getInput("name") (or the whole map via context.getInput()). Available fields:

| Field | Applies To | Description |
| --- | --- | --- |
| executionId | both | The batch / loop's id |
| state | both | Terminal state |
| reason | both | Human-readable reason string |
| totalJobs | batch | Total children added |
| succeededJobs | batch | Children that succeeded |
| failedJobs | batch | Children that failed (after retries) |
| iteration | loop | How many iterations ran |
| createdAt | both | Epoch ms when created |
| sealedAt | batch | When seal() was called |
| completedAt | both | When the terminal state was reached |
| durationMs | both | Elapsed time: completedAt minus sealedAt (batches) or createdAt (loops) |
| resultsDatasetId | batch (when results != COUNT) | Scratchpad dataset id |
| stateDatasetId | loop | Scratchpad dataset id of the last state |
| output | loop (on STOPPED) | Optional output from the worker |
| reducerState | batch (when results == REDUCE) | The final accumulator |

Whatever you passed as context: {...} when creating the batch / loop arrives as parameters, fetched with context.getParameter("name").


Common Patterns

One Finalizer for Every Outcome

If your post-processing logic is largely shared, point all three hooks at the same action and switch on state:

let batch = context.createBatch({
    onSuccess: "handleBatchOutcome",
    onPartial: "handleBatchOutcome",
    onError:   "handleBatchOutcome",
    context:   { campaignId: campaign.get("id") }
});
// Action: "handleBatchOutcome"
let state      = context.getInput("state");
let succeeded  = context.getInput("succeededJobs");
let failed     = context.getInput("failedJobs");
let total      = context.getInput("totalJobs");
let campaignId = context.getParameter("campaignId");

switch (state) {
    case "COMPLETED":
        context.log("Campaign {} completed cleanly: {} sent",
            campaignId, succeeded);
        break;
    case "PARTIAL":
        context.log("Campaign {} mostly succeeded: {}/{} sent, {} failed",
            campaignId, succeeded, total, failed);
        break;
    case "FAILED":
    case "TIMED_OUT":
    case "CANCELLED":
        context.sendMail({
            to:      "ops@example.com",
            subject: "Campaign " + campaignId + " ended in " + state,
            body:    failed + "/" + total + " failed"
        });
        break;
}

Returning a Batch ID for a "Job Status" UI

When a UI button kicks off a batch, returning the batch id lets you build an admin view that shows running jobs and their progress:

// At the end of "sendCampaign":
context.return({
    message: "Campaign queued for "
             + batch.snapshot().get("totalJobs") + " recipients",
    batchId: batch.getId()
});

The UI can store batchId and poll a server action that calls batch.snapshot() to render a live progress bar.
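The polling action only needs a little arithmetic on the snapshot's counters to drive a progress bar. A sketch, using the field names from the finalizer table (treating the snapshot as a plain object here is an assumption for illustration — in a real action you'd read them via snapshot().get(...)):

```javascript
// Turn a snapshot's counters into a progress percentage for the UI.
// Field names follow the finalizer input table; the plain-object shape
// is assumed for illustration.
function progressPercent(snap) {
    if (!snap.totalJobs) {
        return 0;   // nothing added yet (or the batch isn't populated)
    }
    let done = snap.succeededJobs + snap.failedJobs;
    return Math.round(100 * done / snap.totalJobs);
}
```

Note that until seal() is called, totalJobs is still growing, so the percentage can appear to move backwards — a "queued N so far" label is more honest before the batch is sealed.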

Periodic Cleanup with a Loop

Loops are a natural fit for "keep doing X until there's no more X":

context.createLoop({
    workItemType: "pruneOldThumbnails",
    initialState: { cursor: null },
    onComplete:   "cleanupDone",
    deadlineMs:   Date.now() + 6 * 60 * 60 * 1000
});

The Java-side pruneOldThumbnails worker deletes thumbnails older than 90 days in pages of 1000, and reports STOP when the page is empty.


What Admins See

Operators have visibility into running and recent batches via JMX (jconsole or any Java monitoring tool):

  • Counters showing how many batches are starting, completing, failing, timing out across the JVM's lifetime — useful for trending.
  • Latency: average and max time from seal() to completion.
  • Live operations they can invoke: force-terminate a runaway batch, replay a finalizer that didn't run cleanly.

You don't usually need to think about this — it's there for the rare day when something genuinely goes wrong and an operator needs to investigate. But it's worth knowing the operations are available; if you're debugging a stuck batch with the ops team, they have tools.


Three Habits That Pay Off

The system rewards three habits:

  1. Trust the platform's safety nets. Don't write your own retry loops; don't write your own deadline tracking; don't write your own deduplication. The platform does these correctly. Your code should focus on the business logic of one child invocation.
  2. Be deliberate about result modes. COUNT is free and almost always enough. Move up the cost ladder only when the finalizer genuinely needs the data.
  3. Write small, idempotent child actions. A child action that's safe to run twice is a child action you'll never have to debug at 3am.

If you find yourself fighting the system — writing complicated state-tracking code, ad-hoc retry loops, manual progress queues — there's almost certainly a simpler way using the primitives in this guide. Reach out before you build something custom.


Next Steps