# Running Commands

A submitted ops command moves through a state machine on its way from
"you want to run this" to "you have the released output." This page
walks your side of that flow: how to submit, how to watch progress,
what your customer sees in parallel on the `/support/<token>` link
they receive, and how to verify the audit chain after the fact.

<img src="https://mintcdn.com/tensor9/Q7wEl9vOj9lICJg4/images/diagrams/lifecycle-howitworks-dark.svg?fit=max&auto=format&n=Q7wEl9vOj9lICJg4&q=85&s=a51233c2cbad4ea1b1b68c91bfd5c366" className="block dark:hidden" alt="How a command moves: you submit, your customer approves, the output is released to you." width="820" height="200" data-path="images/diagrams/lifecycle-howitworks-dark.svg" />

<img src="https://mintcdn.com/tensor9/Q7wEl9vOj9lICJg4/images/diagrams/lifecycle-howitworks-light.svg?fit=max&auto=format&n=Q7wEl9vOj9lICJg4&q=85&s=d8bdad5a81c18c051216483fc263395d" className="hidden dark:block" alt="How a command moves: you submit, your customer approves, the output is released to you." width="820" height="200" data-path="images/diagrams/lifecycle-howitworks-light.svg" />

## Submitting a command

```bash theme={null}
tensor9 ops command create \
  --appName my-app \
  --customerName acme-corp \
  --template linux-disk-usage \
  --vars MOUNT_PREFIX=/var/lib/myapp \
  --commandName check-myapp-disk \
  --reason "investigating disk pressure on tenant alerts"
```

Required flags:

| Flag             | Purpose                                                                                             |
| ---------------- | --------------------------------------------------------------------------------------------------- |
| `--appName`      | The Tensor9 app the command targets.                                                                |
| `--customerName` | Which of your customers' appliances will execute the command.                                       |
| `--commandName`  | A memorable identifier (3-64 chars, lowercase + hyphens). You'll use this to retrieve, cancel, etc. |
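
To catch a malformed name before it reaches the CLI, a script can check it locally. The sketch below is a hypothetical helper, not part of the `tensor9` CLI, and it assumes "lowercase + hyphens" means only the characters `a`-`z` and `-`; the table does not say whether digits are accepted, so widen the pattern if they are.

```bash
# Hypothetical helper: succeeds when the name is 3-64 characters long and
# contains only lowercase letters and hyphens (an assumed reading of the rule).
valid_command_name() {
  local name="$1" len
  len=${#name}
  [ "$len" -ge 3 ] && [ "$len" -le 64 ] || return 1
  case $name in
    *[!abcdefghijklmnopqrstuvwxyz-]*) return 1 ;;  # any other character
  esac
  return 0
}
```

A wrapper can then bail out early, for example `valid_command_name "$name" || exit 1`, before calling `tensor9 ops command create`.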

One of the following picks the body of the command:

| Flag         | What it does                                                                               |
| ------------ | ------------------------------------------------------------------------------------------ |
| `--template` | Reference an already-imported template by id. Most common.                                 |
| `--command`  | Inline ad-hoc command body. Useful for one-off shell snippets that don't merit a template. |

Common modifier flags:

| Flag            | Purpose                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--vars`        | Comma-separated `KEY=VALUE` pairs for template variables (`--vars NAMESPACE=prod,DEPLOYMENT=api`). See "A note on `--vars` escaping" below.                                                                                                                                                                                                                                                                                                                                                         |
| `--permissions` | `ReadOnly` (default), `ReadWrite`, or `Admin`. Drives the role minted on the appliance for Kubectl-tier commands.                                                                                                                                                                                                                                                                                                                                                                                   |
| `--reason`      | Free-text justification. Shown to your customer at approval time and recorded in the audit trail.                                                                                                                                                                                                                                                                                                                                                                                                   |
| `--timeout`     | How long to wait for customer approval before the command times out. Default `7d`.                                                                                                                                                                                                                                                                                                                                                                                                                  |
| `--originRsxId` | Required only for Kubectl ad-hoc and Kubectl templates: the Terraform resource address of the target cluster (e.g. `aws_eks_cluster.production`). Ignored for Tf and Script paths. The id is resolved against your **latest published release** of this app, not against the version currently running on your customer's appliance; if those differ and the resource was renamed across versions, the cluster lookup may fail. There is no flag today to pin the resolution to a specific release. |

The `--commandType` flag selects between command shapes. It's almost
always inferred from `--template` or falls back to a default; set it
explicitly only if you're authoring tooling that needs to be specific:

| `--commandType` value | When the system uses it                                           |
| --------------------- | ----------------------------------------------------------------- |
| `Kubectl`             | Ad-hoc kubectl invocation (no template). Default for ad-hoc.      |
| `KubectlFromTmpl`     | Auto-selected when `--template` resolves to a Kubectl template.   |
| `ScriptFromTmpl`      | Auto-selected when `--template` resolves to a Script template.    |
| `TfFromTmpl`          | Auto-selected when `--template` resolves to a Terraform template. |

The action prints the assigned `commandName` and the initial state,
then returns. The command is now `Submitted`; your customer's review
experience begins next.

#### A note on `--vars` escaping

Today `--vars` is a single comma-separated string, which means values
cannot contain commas or `=` characters. This is a known limitation;
support for repeated `--var KEY=VALUE` flags is planned. Until then,
work around it by designing templates whose variable values are
constrained to simple alphanumerics and path characters.
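
Because `--vars` cannot carry commas or `=` inside a value, a wrapper script can refuse such values up front. A minimal sketch: `build_vars` is a hypothetical helper name, and the check is purely client-side (how the CLI itself treats such values is not described here).

```bash
# Hypothetical helper: joins KEY=VALUE arguments into one --vars string and
# refuses any pair the comma-separated format cannot represent.
build_vars() {
  local out="" pair key value
  for pair in "$@"; do
    case $pair in
      *=*) ;;
      *) echo "missing '=' in: $pair" >&2; return 1 ;;
    esac
    key=${pair%%=*}     # text before the first '='
    value=${pair#*=}    # text after the first '='
    case $key in
      ''|*,*) echo "bad key in: $pair" >&2; return 1 ;;
    esac
    case $value in
      *,*|*=*) echo "value contains ',' or '=': $pair" >&2; return 1 ;;
    esac
    out=${out:+$out,}$pair
  done
  printf '%s\n' "$out"
}
```

The result can feed the flag directly, for example `--vars "$(build_vars NAMESPACE=prod DEPLOYMENT=api)"`.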

## Lifecycle at a glance

<img src="https://mintcdn.com/tensor9/m_BTtWnjOEN3oQjN/images/diagrams/ops-command-lifecycle-dark.svg?fit=max&auto=format&n=m_BTtWnjOEN3oQjN&q=85&s=bd255f4fc8d4b608c1ab4f38bc2165f8" className="block dark:hidden" alt="Ops command lifecycle: three lanes (Command Approval, Execution, Output Release) with happy-path states across the top, terminal unhappy states across the bottom, and an intermediate Cancelling state reachable from Submitted, CmdApproving, or Executing." width="920" height="420" data-path="images/diagrams/ops-command-lifecycle-dark.svg" />

<img src="https://mintcdn.com/tensor9/m_BTtWnjOEN3oQjN/images/diagrams/ops-command-lifecycle-light.svg?fit=max&auto=format&n=m_BTtWnjOEN3oQjN&q=85&s=fe11227981a2b06d5d80d8104c24fd09" className="hidden dark:block" alt="Ops command lifecycle: three lanes (Command Approval, Execution, Output Release) with happy-path states across the top, terminal unhappy states across the bottom, and an intermediate Cancelling state reachable from Submitted, CmdApproving, or Executing." width="920" height="420" data-path="images/diagrams/ops-command-lifecycle-light.svg" />

The lifecycle has six in-flight states, two terminal happy states,
and five terminal unhappy states. One intermediate state
(`Cancelling`) covers the brief window where a cancel request has
landed but the appliance is still tearing down.

### Happy path in words

<Steps>
  <Step title="Submitted">
    The command exists; your customer's appliance has not picked it up yet.
    You see this immediately after `tensor9 ops command create`.
  </Step>

  <Step title="CmdApproving">
    Your customer's appliance has the command in its inbox and is waiting
    on a human review decision. Your customer sees the
    `/support/<token>` link and walks the four-step approval UI.
  </Step>

  <Step title="CmdApproved">
    Your customer approved execution and the appliance is preparing to
    run the command.
  </Step>

  <Step title="Executing">
    The command body is running inside the appliance's sandboxed
    working directory. The appliance captures stdout / stderr / exit code,
    uploads each output stream to your blob store (S3 in your customer's
    account) and stores a small `[blob: ...]\n<presigned-url>` payload
    on the command record. That payload is then encrypted with a
    per-command key.
  </Step>

  <Step title="Executed">
    Execution finished. The appliance is now waiting for your customer to
    review the output and decide whether to release it to you.
  </Step>

  <Step title="OutputApproved">
    Your customer signed an Ed25519 release manifest. The control plane
    surfaces the decrypted blob payload to you; calling
    `tensor9 ops command retrieve` returns the payload, and you fetch the
    actual bytes by curling the presigned URL it carries. State advances
    to `Completed` next.
  </Step>
</Steps>

### Terminal unhappy states

| State             | What happened                                                                                                                                                                                                                                         |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `CmdRejected`     | Your customer rejected the command at review (Step 1 of their approval UI).                                                                                                                                                                           |
| `OutputRejected`  | Your customer approved execution, but rejected releasing the output. You never see stdout / stderr.                                                                                                                                                   |
| `ExecutionFailed` | The command ran but the appliance reported a non-zero exit code or an internal error (or the staleness recovery fired). Terminal: no transition out, no output release path. The failure stderr is uploaded to your blob store like any other output. |
| `Cancelled`       | You explicitly cancelled (`tensor9 ops command cancel`) before execution finished. See "Cancelling" below for the per-state outcomes.                                                                                                                 |
| `Timeout`         | Your customer never decided within the `--timeout` window.                                                                                                                                                                                            |

### Encryption mechanism

Output is encrypted with a per-command AES-256-GCM key that the
appliance generates on-the-fly. The ciphertext carries a SHA-256
fingerprint of the key (in the AAD), so the decrypt path can find
the right key in the appliance's secret store even after a
re-execution overwrites the per-scope slot. Keys live in the
appliance's secret store under
`/t9-private/projection/.../ops-cmd/...` and are deleted after
successful release or output rejection.

When you see a "decryption failed" alert from the appliance, the
likely causes are: (a) a re-execution clobbered the scope-keyed
slot before your customer released the previous run's output, (b)
the secret store is unreachable, or (c) your customer rotated keys
mid-flight. Fingerprint addressing is the defense against (a); see
the appliance audit log for the specific failure mode.

## Watching progress

```bash theme={null}
# Snapshot of every command across this app
tensor9 ops command list --appName my-app

# Same, including completed and rejected history
tensor9 ops command list --appName my-app --history

# Drill into one command
tensor9 ops command retrieve --appName my-app --commandName check-myapp-disk
```

`retrieve` shows the current state, full audit chain (who approved
what, when), and (once `Completed`) the released stdout / stderr / exit
code. Both commands accept `--output json` for scripting; pipe into
`jq .lifecycle` to poll a single state value.
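
The polling idea can be sketched as a small loop. This is a hedged example, not an official wrapper: it assumes `.lifecycle` in the JSON output holds the state as a plain string, the function names and the 30-second default interval are arbitrary choices, and the terminal states are the ones listed in the lifecycle tables on this page.

```bash
# Succeeds when the given state is one of the terminal states on this page.
is_terminal_state() {
  case $1 in
    Completed|CmdRejected|OutputRejected|ExecutionFailed|Cancelled|Timeout) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls one command until it reaches a terminal state, then prints that state.
# $1 = app name, $2 = command name, $3 = seconds between polls (default 30)
wait_for_command() {
  local app="$1" name="$2" interval="${3:-30}" state
  while :; do
    state=$(tensor9 ops command retrieve --appName "$app" \
      --commandName "$name" --output json | jq -r .lifecycle) || return 1
    case $state in
      ''|null) echo "no lifecycle value for $name" >&2; return 1 ;;
    esac
    if is_terminal_state "$state"; then
      printf '%s\n' "$state"
      return 0
    fi
    sleep "$interval"
  done
}
```

For example, `wait_for_command my-app check-myapp-disk` blocks until the command settles and prints the final state, which a script can branch on.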

## What your customer sees

When you submit, your customer is sent (via your existing notification
channel) a unique `/support/<token>` web link. Clicking it opens the
four-step approval UI summarized in the table below.

<Note>
  If your customer hasn't subscribed a notification channel for
  ops-command events, submission silently produces no notification.
  The command still appears in your `tensor9 ops command list` (state:
  `Submitted`), and your customer would only see it if they happened
  to visit the support portal directly. Confirm channel subscription
  with each customer at onboarding; otherwise an "unanswered" command
  is more likely to mean "your customer doesn't know about it" than
  "your customer is ignoring it."
</Note>

| Step        | What your customer does                                                                                                                                                   |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Review**  | Reads the template description, declared `data_access` tags, your reason, and the exact template body and variable values you submitted. Picks Approve, Reject, or close. |
| **Approve** | Confirms the approval. State advances to `CmdApproved`; the appliance starts executing as soon as a worker is free (see "Resource limits and queueing").                  |
| **Execute** | Watches a status pane while the appliance runs the command. State stays `Executing` until the output lands.                                                               |
| **Release** | Reviews the decrypted stdout / stderr / exit code, then signs an Ed25519 release manifest. Picks Release or Reject.                                                       |

### What "review" actually shows

For Terraform templates, your customer sees the literal HCL source
plus the values you submitted for each `variable`. They do
**not** see a `tofu plan` output: a plan would require evaluating
data sources, which can only run after the appliance is authorized
to do so. Your customer is reviewing "the shape of what will execute"
plus the declared `data_access`, `side_effects`, and permission tier,
not a fully-resolved diff.

Things the HCL surface does NOT pre-evaluate for the reviewer:

* **`${var.X}` interpolations stay as literal strings in the
  displayed HCL.** Your customer sees `${var.MOUNT_PREFIX}` in the
  command body and the submitted value of `MOUNT_PREFIX` separately;
  they have to substitute mentally at review time. The approval UI
  shows the submitted variable values next to the HCL.
* **`for_each` cardinality is invisible at review time.** A `for_each =
  toset(data.aws_s3_buckets.all.buckets[*].name)` does not show
  whether it will iterate over 3 buckets or 30,000. Bound the
  potential blast radius via `data_access` + `side_effects`
  declarations and use `description` to explain the cardinality
  semantics in plain English.
* **`local-exec` heredocs are reviewed as shell.** A multi-line
  `command = <<-EOT ... EOT` is shown verbatim. Customers reviewing
  a `kubectl drain ... && kubectl ...` heredoc are reviewing a shell
  program, not a Terraform plan. Keep heredocs short and named in
  the `description`.
* **`null_resource.triggers` are not re-evaluated against prior
  state** (there is no prior state; see
  [Authoring templates](/fundamentals/operations/templates)).
  A `triggers = { mount_prefix = var.MOUNT_PREFIX }` block makes the
  resource look like it fires only on change, which is misleading.
  Document the behavior in the template's `description` or omit the
  triggers block.
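
Since the reviewer has to substitute `${var.X}` values by hand, it can help to read the substituted body yourself before submitting. The sketch below is a hypothetical local preview, not something the approval UI or the CLI does: it performs a plain textual replacement of `${var.KEY}` and does not evaluate any other Terraform expression.

```bash
# Hypothetical helper: prints a template body with each ${var.KEY} replaced
# by its submitted value.
# Usage: preview_template BODY KEY=VALUE [KEY=VALUE ...]
preview_template() {
  local body="$1" pair
  shift
  for pair in "$@"; do
    body=$(printf '%s\n' "$body" |
      awk -v key="${pair%%=*}" -v val="${pair#*=}" '
        {
          needle = "${var." key "}"
          rest = $0; out = ""
          # Replace every literal occurrence of the needle on this line.
          while ((i = index(rest, needle)) > 0) {
            out = out substr(rest, 1, i - 1) val
            rest = substr(rest, i + length(needle))
          }
          print out rest
        }')
  done
  printf '%s\n' "$body"
}
```

Reading the substituted text next to the raw body is a quick way to judge how much mental work the review asks of your customer.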

Because the review surface is the HCL, the `description` field on
`tensor9_command` carries a lot of weight. Treat it as the
plain-English equivalent of the HCL: name the exact APIs called, the
expected output shape, the cardinality of any fan-out, and the
intended side effects.

For Script and Kubectl templates the review surface is the literal
script body or kubectl invocation. Same caveats: `${VAR}` references
are unsubstituted; your customer reads the script and the variable
values side by side.

### Trust properties for the release step

Release has two properties to explain to your customer:

* **The plaintext output passes through the appliance your customer
  already controls before you see it.** Decryption happens on the
  appliance (which lives in your customer's cloud account under
  their IAM); the control plane only ever sees the ciphertext before
  release. Once your customer signs release, the control plane
  forwards the plaintext to you.
* **The release decision is non-repudiable.** Your customer's signed
  release manifest is preserved in the audit chain and can be
  verified independently with [`tensor9 ops command audit verify`](#audit-and-forensics).

Pre-approved templates skip the per-command Review and Approve steps;
see [Pre-approvals](/fundamentals/operations/preapproval).

## Resource limits and queueing

Operations is for diagnostics and short-lived interventions, not for
bulk data extraction. The appliance enforces a few limits that you
should size your templates against:

| Limit                             | Value                        | Implication                                                                                                                                                                                                           |
| --------------------------------- | ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Concurrent commands per appliance | 10                           | Approved commands dispatch onto the appliance's bounded execution pool (max 10 worker threads). Commands beyond the cap stay in `CmdApproved` and are picked up on the next polling cycle (\~10s) as workers free up. |
| Per-command runtime cap           | 10 minutes                   | A command running past 10 minutes will be killed and surfaced as `ExecutionFailed`.                                                                                                                                   |
| Stuck-execution recovery          | 20 minutes (2 × runtime cap) | A command that's been in `Executing` for 20 minutes (e.g. because the appliance restarted mid-run) is auto-transitioned to `ExecutionFailed` with stderr noting the likely cause.                                     |
| Per-stream output cap             | 5 GiB                        | stdout and stderr each route through your blob store (uploaded by the appliance, fetched by the customer's release script and your `retrieve` call via a presigned S3 URL). 5 GiB is the AWS S3 single-PUT ceiling.   |
| Children per batch                | 50                           | See "Batches" below.                                                                                                                                                                                                  |

The 5 GiB cap is high enough that you rarely need to think about
it. The appliance uploads each stream to your S3 bucket and stores
only a small `[blob: bucket=..., key=..., size=..., sha256=...]\n<presigned-url>`
payload (encrypted) on the command record. Your customer's release
script fetches the actual bytes, sha256-verifies them, and shows
them in the local preview. Your `tensor9 ops command retrieve`
returns the same payload; curling the URL gives you the bytes back.
Bulk log dumps (`journalctl --since 24h`, `kubectl logs deployment/...`)
flow through without the `| tail -c` self-capping that templates
used to need.
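
On your side, fetching a released stream and checking it against the recorded digest might look like the sketch below. It rests on assumptions about the payload layout that this page only outlines: a descriptor on the first line with `name=value` fields separated by `, `, and the presigned URL on the second line. The function names are hypothetical, and an object key containing a comma would defeat this simple parser.

```bash
# Extracts one descriptor field (bucket, key, size, or sha256).
# $1 = payload, $2 = field name
blob_field() {
  printf '%s\n' "$1" |
    sed -n '1s/^\[blob: \(.*\)]$/\1/p' |
    tr ',' '\n' |
    sed -n "s/^ *$2=//p"
}

# Prints the presigned URL, assumed to sit on the payload's second line.
blob_url() {
  printf '%s\n' "$1" | sed -n '2p'
}

# Downloads the stream and succeeds only if its SHA-256 matches the payload.
# $1 = payload, $2 = destination file
fetch_blob() {
  curl --fail --silent --show-error --output "$2" "$(blob_url "$1")" || return 1
  [ "$(sha256sum "$2" | cut -d' ' -f1)" = "$(blob_field "$1" sha256)" ]
}
```

Verifying the digest locally mirrors what the release script does on the customer side, so both parties can confirm they looked at the same bytes.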

On Kubernetes-form-factor appliances whose blob store does not yet
support presigned URLs (MinIO is in this category as of this writing),
the appliance falls back to an inline 4 MiB cap with a marker like
`[stdout truncated - 4 MB cap]`; the release script's preview still
works, but you and your customer see only the first 4 MiB of any
stream over the cap.

If you're on call and see a command stuck in `Executing` for
more than five minutes, you can either wait for the 20-minute
staleness recovery (automatic) or interrupt it manually with
`tensor9 ops command cancel --commandName <name>`. Cancelling an
`Executing` command transitions it to `Cancelling` while the
appliance tears down; the eventual terminal state depends on what
the appliance was doing at the time. See "Cancelling" below for the
full state-machine view.

### Not yet enforced

Two limits exist in the codebase as constants, but nothing enforces
them today. Treat these as documentation of intent, not as
guarantees:

* **Submissions per hour: 100 per appliance.** Plan around it; do not
  rely on it. The 101st submission this hour will succeed.
* **Cooldown between submissions: 60 seconds per appliance.** Same
  caveat: not enforced today.

Enforcement will land in a future release; until then, rate-limiting
in your own scripts is the only real bound.
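
A client-side cooldown can be as small as a timestamp file. The sketch below is one possible approach under stated assumptions: the function names, the `T9_STAMP_FILE` variable, and the stamp-file path are all hypothetical, and the 60-second default simply mirrors the intended limit above.

```bash
# Prints how many seconds remain before the next submission is allowed.
# $1 = epoch seconds of the last submission, $2 = current epoch seconds,
# $3 = cooldown in seconds (default 60)
cooldown_remaining() {
  local last="$1" now="$2" cooldown="${3:-60}" remaining
  remaining=$((last + cooldown - now))
  [ "$remaining" -gt 0 ] || remaining=0
  printf '%s\n' "$remaining"
}

# Sleeps out the cooldown, runs the given command, and records the time of
# a successful run in a stamp file.
submit_with_cooldown() {
  local stamp="${T9_STAMP_FILE:-$HOME/.tensor9-last-submit}" last=0
  [ -f "$stamp" ] && last=$(cat "$stamp")
  sleep "$(cooldown_remaining "$last" "$(date +%s)")"
  "$@" || return
  date +%s > "$stamp"
}
```

Prefixing each submission with `submit_with_cooldown` (for example `submit_with_cooldown tensor9 ops command create ...`) spaces submissions at least 60 seconds apart, which also keeps a single script at or below 60 submissions per hour, under the intended hourly figure.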

## Cancelling

```bash theme={null}
tensor9 ops command cancel --commandName check-myapp-disk
```

What happens depends on the command's current state:

| State at cancel time                                 | Outcome                                                                                                                                                                                                                                                                                                         |
| ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Submitted`, `CmdApproving`                          | Transitions directly to `Cancelled`.                                                                                                                                                                                                                                                                            |
| `CmdApproved`                                        | Transitions to `Cancelled`; the appliance never picks the command up to execute.                                                                                                                                                                                                                                |
| `Executing`                                          | Transitions to `Cancelling`; the appliance tears down its sandbox while the cancel intent propagates. The eventual terminal state depends on what the appliance was doing (typically `Cancelled`, but may surface as `ExecutionFailed` if the process had already produced partial state that needed teardown). |
| `Executed`, `OutputApproving`, or any terminal state | No-op. The cancel arrives too late; output release proceeds normally.                                                                                                                                                                                                                                           |

The `Cancelling` intermediate state exists specifically because a
running `tofu apply` or `kubectl` invocation needs a moment to wind
down cleanly. In-flight side effects (a partially-created cloud
resource, a partially-applied K8s manifest) may or may not be
backed out depending on the template; if you cancel a mutating
template mid-execution, expect to inspect your customer's environment
afterwards.

If you submitted many commands by accident and need to cancel them
all in one shot:

```bash theme={null}
# Cancel everything against this customer regardless of when submitted
tensor9 ops command batch cancel-bulk \
  --appName my-app \
  --customerName acme-corp \
  --yes

# Or scope to a time window
tensor9 ops command batch cancel-bulk \
  --appName my-app \
  --submittedAfter "2026-05-09T13:55:00Z"
```

`cancel-bulk` lists the commands it's about to cancel and prompts
for confirmation; pass `--yes` to skip the prompt for scripts. Only
commands still in cancellable states are touched; anything past
`CmdApproving` is reported as "skipped" so you know what's still
running.

## Batches

When you need to fan a command out across multiple customers or
appliances, use the batch surface:

```bash theme={null}
tensor9 ops command batch submit --appName my-app --file ./batch-spec.json
tensor9 ops command batch list --appName my-app
tensor9 ops command batch retrieve --appName my-app --batchId <id>
tensor9 ops command batch cancel --appName my-app --batchId <id>
```

A batch creates one underlying ops command per appliance. The
lifecycle tracks each child command independently, so different
customers approving at different times is normal. `batch retrieve`
rolls the children up into a single status summary.

A single batch is capped at **50 child commands** (one per
appliance). For larger fleets, submit multiple batches with a small
delay between them to avoid a thundering-herd against the
notification path. We are working on a higher cap; let us know what
your steady-state fan-out looks like.
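
For fleets above the cap, the splitting step can be scripted. The sketch below only partitions a list of targets into groups of at most 50; turning each group into a batch spec file is left out because the spec format is not covered on this page. The function name and the output file naming are hypothetical.

```bash
# Splits a newline-separated list (one target per line) into numbered chunk
# files with at most $3 lines each (default 50).
# $1 = input file, $2 = output prefix
chunk_targets() {
  local input="$1" prefix="$2" size="${3:-50}"
  awk -v size="$size" -v prefix="$prefix" '
    NF == 0 { next }   # skip blank lines
    {
      file = sprintf("%s-%03d.txt", prefix, int(n / size) + 1)
      print > file
      n++
    }
  ' "$input"
}
```

Each resulting file then maps to one `tensor9 ops command batch submit` call, with a pause of your choosing between calls.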

## Audit and forensics

Three Ed25519 signatures protect every ops command. Together they
form a non-repudiation chain your customer can verify independently:

| Signature         | Signed by                    | What it proves                                                                                                          |
| ----------------- | ---------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| `commandApproval` | Customer (buyer) signing key | "The command body you submitted, with these specific variable values, was approved by this person at this time."        |
| `outputIntegrity` | Appliance signing key        | "Exactly these stdout / stderr / exitCode bytes came out of the command's execution, before any encryption or transit." |
| `outputApproval`  | Customer (buyer) signing key | "These specific output bytes were approved for release to you by this person at this time."                             |

The signatures are stored on the command's audit record and survive
the encrypt/decrypt cycle (the integrity signature is computed over
plaintext before encryption, then preserved in the ciphertext
metadata).

Either you or your customer can verify the full chain on a specific
command:

```bash theme={null}
tensor9 ops command audit verify \
  --appName my-app \
  --commandName check-myapp-disk
```

The action retrieves the command record + the appliance's pinned
signing public key, reconstructs the canonical signed-data for each
of the three signatures, and verifies. The three checks use two
different trust anchors:

* `commandApproval` and `outputApproval` verify the buyer's (that is,
  your customer's) signature on the approval / release manifest, using
  the public key embedded in the manifest itself. The output reports
  the `signerPublicKeyFingerprint`, which you (or your customer) should
  cross-reference against the buyer-signing pubkey currently pinned
  in the appliance vault; that pinned key is the actual trust anchor.
* `outputIntegrity` verifies the appliance's signature over the
  plaintext output, against the customer's pinned `opsCmdPubKey`.

Output (in the healthy case):

```
Audit chain for check-myapp-disk (Completed)
Appliance signer fingerprint: 3a8c4b1f...

  ✓ commandApproval [OK]
  ✓ outputIntegrity [OK]
  ✓ outputApproval [OK]

✓ Audit chain verified.
```

Any failure is surfaced as `[FAIL]` with a one-line reason, and the
process exits non-zero. Customers running compliance audits should
script this against their full ops-command history; you should run
it whenever a customer reports "you ran something I didn't approve"
so the disagreement turns into a verifiable record very quickly.

`UNSIGNED_LEGACY` records (commands authored before the signature
chain was required, with no buyer-signed manifest attached) are
reported but do not cause a failure exit by default; pass `--strict`
to fail on those too.
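
Scripting the check across many commands can rely on the exit code alone. A minimal sketch, assuming you already have the command names one per line (for example from your own records; the shape of the `list` output is not described here). `audit_sweep` is a hypothetical name, and extra arguments such as `--strict` are passed through unchanged.

```bash
# Runs `audit verify` for every command name read from stdin and prints the
# names whose verification exited non-zero. Succeeds only if none failed.
# $1 = app name; remaining arguments are passed to `audit verify`.
audit_sweep() {
  local app="$1" name failed=0
  shift
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    # </dev/null keeps the CLI from consuming the list on stdin.
    if ! tensor9 ops command audit verify --appName "$app" \
         --commandName "$name" "$@" >/dev/null 2>&1 </dev/null; then
      printf 'FAILED %s\n' "$name"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

For example, `audit_sweep my-app --strict < command-names.txt` lists every command that needs a closer look and exits non-zero if there is at least one.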

For programmatic use, pass `--output json`. The JSON `checks` array
carries per-check `trustAnchor`, `signerPublicKeyFingerprint`,
`signedBy`, and `signedAt`, which let you distinguish
auto-approved-by-pre-approval from manually-approved commands and
archive the chain independently.

## Related

* [Authoring templates](/fundamentals/operations/templates): the templates this command body comes from.
* [Pre-approvals](/fundamentals/operations/preapproval): skip per-command approval for repeated runs.
* [Security model](/fundamentals/operations/security): the keys, signatures, and storage-side audit guarantees that make the audit chain non-repudiable.
* [Permissions model](/fundamentals/permissions-model): how permission tiers map onto appliance-side roles.
