The moderation gate
This is a real chain that ships in examples/simple-chat-with-moderation.json. It's also a small lesson in what authoring buys you.
The chain
{
  "id": "simple-chat",
  "tasks": [
    {
      "id": "moderate",
      "handler": "prompt_to_int",
      "system_instruction": "Classify the input as safe (0) or unsafe (10) — respond with a single integer 0 to 10.",
      "execute_config": {
        "model": "gemini-3.1-flash-lite-preview",
        "provider": "gemini"
      },
      "transition": {
        "branches": [
          { "operator": ">", "when": "6", "goto": "reject_request" },
          { "operator": "default", "goto": "simple-chat" }
        ]
      }
    },
    {
      "id": "simple-chat",
      "handler": "chat_completion",
      "system_instruction": "You're a helpful assistant talking to an expert.",
      "execute_config": {
        "model": "gemini-3.1-flash-lite-preview",
        "provider": "gemini"
      },
      "transition": { "branches": [{ "operator": "default", "goto": "end" }] }
    },
    {
      "id": "reject_request",
      "handler": "prompt_to_string",
      "system_instruction": "You're a helpful assistant.",
      "prompt_template": "Inform the user that their message was rejected because it was flagged as unsafe.",
      "transition": { "branches": [{ "operator": "default", "goto": "end" }] }
    }
  ]
}
Why a separate task instead of a system-prompt rule
I could have written "if the message is unsafe, refuse" into the chat task's system prompt. People do that. It works most of the time. But it puts the safety decision and the answer-the-user decision into the same model call, which means one drift in the model's behavior changes both. And it gives me no place to put the threshold.
Two tasks separate the questions. The classifier has one job — return an integer — and is judged on that job. The responder is judged on whether it answered well. When something goes wrong, I know which one to look at.
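For contrast, here is roughly what the collapsed version looks like. This is a sketch I wrote for the comparison, not a task from the shipped example:

{
  "id": "simple-chat",
  "handler": "chat_completion",
  "system_instruction": "You're a helpful assistant talking to an expert. If the message is unsafe, refuse to answer.",
  "execute_config": {
    "model": "gemini-3.1-flash-lite-preview",
    "provider": "gemini"
  },
  "transition": { "branches": [{ "operator": "default", "goto": "end" }] }
}

There is nothing to tune and nothing to measure: "unsafe" lives entirely inside one prompt, and the only evidence of the safety decision is whatever the model chose to say.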
Why prompt_to_int, not a yes/no string
The handler for the gate is prompt_to_int, not prompt_to_string. The model is pushed to commit to a number, and the number is what the branch reads. No regex parsing of "I think this is unsafe but…", no dependency on whether the model said "yes" or "Yes" or "yes, definitely." The integer is the contract.
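To see what the alternative costs, imagine the gate returned a string. The == operator below is purely illustrative; I don't know whether the engine even ships a string-equality branch, which is rather the point:

"transition": {
  "branches": [
    { "operator": "==", "when": "unsafe", "goto": "reject_request" },
    { "operator": "default", "goto": "simple-chat" }
  ]
}

The moment the model answers "Unsafe", "unsafe.", or "yes, this is unsafe", every one of those falls through to the default branch and the gate silently stops gating. A > over an integer has no such failure mode.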
Why > 6, not > 5
This one is mine. A > 5 threshold fired too often on borderline-but-fine inputs in my testing. A > 7 threshold let through cases I didn't want through. The threshold is empirical, and it's mine to tune. Every time I re-read this chain I get to ask: is 6 still the right number? It's a JSON key. I can move it.
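If the traffic shifts and 6 stops being the right number, the entire policy change is one value in the gate's branch. Raising it to 7, say, so only scores of 8 and up get rejected:

{ "operator": ">", "when": "7", "goto": "reject_request" }

Same tasks, same models, same chain; only the number moves.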
Why a rejection task instead of an exception
I could have raised an error. A lot of chains do, and that's fine when the caller is a script that wants to know it failed. But this chain is the chat backend for a person, and I'd rather the rejection be a sentence the user can read than an HTTP error code. So reject_request is itself a small chat call — a different model could write nicer rejection text — and the user gets a graceful response.
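And because the rejection is its own task, upgrading the rejection text is local to it. A sketch, assuming reject_request takes an execute_config like the other tasks do (the shipped version omits one, so presumably a default applies), with a placeholder model name:

{
  "id": "reject_request",
  "handler": "prompt_to_string",
  "system_instruction": "You're a helpful assistant.",
  "prompt_template": "Inform the user that their message was rejected because it was flagged as unsafe.",
  "execute_config": {
    "model": "your-friendlier-model",
    "provider": "gemini"
  },
  "transition": { "branches": [{ "operator": "default", "goto": "end" }] }
}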
Why a small model on the gate
Both tasks here use gemini-3.1-flash-lite-preview to keep the example portable, but in production I run the gate on the cheapest, fastest classifier I can find and the responder on a stronger model. The gate decides whether the expensive call runs at all. Authoring lets me make that economic decision per task; if the model selection were buried in the engine I'd have one knob, not two.
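The production split touches nothing but the two execute_config blocks. The model names below are placeholders for whatever is cheap and strong in your stack, not recommendations:

On "moderate":

  "execute_config": { "model": "your-cheap-classifier", "provider": "gemini" }

On "simple-chat":

  "execute_config": { "model": "your-strong-responder", "provider": "gemini" }

Two knobs, one per task, both living in the chain file rather than the engine.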
What you took home
The chain is the artifact. Every decision in it is a JSON key:
- The choice to gate at all (a separate task, not a prompt rule)
- The shape of the gate's output (prompt_to_int)
- The threshold (> 6)
- The failure mode (a rejection task, not an exception)
- The model on each side (cheap classifier, stronger responder)
Every one of those is yours to author. None of them is a vendor flag.