A submitted ops command moves through a state machine on its way from “you want to run this” to “you have the released output.” This page walks your side of that flow: how to submit, how to watch progress, what your customer sees in parallel on the /support/<token> link they receive, and how to verify the audit chain after the fact.

[Diagram: how a command moves. You submit, your customer approves, the output is released to you.]

Submitting a command

tensor9 ops command create \
  --appName my-app \
  --customerName acme-corp \
  --template linux-disk-usage \
  --vars MOUNT_PREFIX=/var/lib/myapp \
  --commandName check-myapp-disk \
  --reason "investigating disk pressure on tenant alerts"
Required flags:

| Flag | Purpose |
| --- | --- |
| --appName | The Tensor9 app the command targets. |
| --customerName | Which of your customers’ appliances will execute the command. |
| --commandName | A memorable identifier (3-64 chars, lowercase + hyphens). You’ll use this to retrieve, cancel, etc. |
One of the following picks the body of the command:

| Flag | What it does |
| --- | --- |
| --template | Reference an already-imported template by id. Most common. |
| --command | Inline ad-hoc command body. Useful for one-off shell snippets that don’t merit a template. |
Common modifier flags:

| Flag | Purpose |
| --- | --- |
| --vars | Comma-separated KEY=VALUE pairs for template variables (--vars NAMESPACE=prod,DEPLOYMENT=api). See “A note on --vars escaping” below. |
| --permissions | ReadOnly (default), ReadWrite, or Admin. Drives the role minted on the appliance for Kubectl-tier commands. |
| --reason | Free-text justification. Shown to your customer at approval time and recorded in the audit trail. |
| --timeout | How long to wait for customer approval before the command times out. Default 7d. |
| --originRsxId | Required only for Kubectl ad-hoc and Kubectl templates: the Terraform resource address of the target cluster (e.g. aws_eks_cluster.production). Ignored for Tf and Script paths. The id is resolved against your latest published release of this app, not against the version currently running on your customer’s appliance; if those differ and the resource was renamed across versions, the cluster lookup may fail. There is no flag today to pin the resolution to a specific release. |
The --commandType flag selects between command shapes. It’s almost always inferred from --template or auto-defaulted; set it explicitly only if you’re authoring tooling that needs to be specific:

| --commandType value | When the system uses it |
| --- | --- |
| Kubectl | Ad-hoc kubectl invocation (no template). Default for ad-hoc. |
| KubectlFromTmpl | Auto-selected when --template resolves to a Kubectl template. |
| ScriptFromTmpl | Auto-selected when --template resolves to a Script template. |
| TfFromTmpl | Auto-selected when --template resolves to a Terraform template. |

A note on --vars escaping

Today --vars is a single comma-separated string, which means values cannot contain commas or = characters. This is a known limitation; support for repeated --var KEY=VALUE flags is planned. Until then, work around it with templates whose variable values are constrained to simple alphanumerics plus path characters.

After a successful create, the action prints the assigned commandName and the initial state, then returns. The command is now Submitted; your customer’s review experience begins next.
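The comma-split behavior, and a client-side guard you might add to your own submission tooling, can be sketched as follows. parse_vars and check_var_value are hypothetical helpers, not the tensor9 CLI’s actual parser:

```python
def parse_vars(spec: str) -> dict:
    """Naive comma-split parse of a --vars string (hypothetical helper,
    not the tensor9 CLI's actual implementation)."""
    pairs = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key:
            # A comma inside a value produces a fragment with no '=' at all.
            raise ValueError(f"malformed --vars entry: {item!r}")
        pairs[key] = value
    return pairs

def check_var_value(value: str) -> None:
    """Client-side guard: reject values the single-string format cannot carry."""
    if "," in value or "=" in value:
        raise ValueError("--vars values may not contain ',' or '='")
```

A value like /a,b splits into a second, key-less fragment, which is why the guard rejects commas up front.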

Lifecycle at a glance

[Diagram: ops command lifecycle. Three lanes (Command Approval, Execution, Output Release) with happy-path states across the top, terminal unhappy states across the bottom, and an intermediate Cancelling state reachable from Submitted, CmdApproving, or Executing.]

The happy path has six in-flight states, two terminal happy states, and five terminal unhappy states. One intermediate state (Cancelling) covers the brief window where a cancel request has landed but the appliance is still tearing down.

Happy path in words

1. Submitted: The command exists; your customer’s appliance has not picked it up yet. You see this immediately after tensor9 ops command create.

2. CmdApproving: Your customer’s appliance has the command in its inbox and is waiting on a human review decision. Your customer sees the /support/<token> link and walks the four-step approval UI.

3. CmdApproved: Your customer approved execution and the appliance is preparing to run the command.

4. Executing: The command body is running inside the appliance’s sandboxed working directory. The appliance captures stdout / stderr / exit code, uploads each output stream to your blob store (S3 in your customer’s account), and stores a small [blob: ...]\n<presigned-url> payload on the command record. That payload is then encrypted with a per-command key.

5. Executed: Execution finished. The appliance is now waiting for your customer to review the output and decide whether to release it to you.

6. OutputApproved: Your customer signed an Ed25519 release manifest. The control plane surfaces the decrypted blob payload to you; calling tensor9 ops command retrieve returns the payload, and you fetch the actual bytes by curling the presigned URL it carries. State advances to Completed next.

Terminal unhappy states

| State | What happened |
| --- | --- |
| CmdRejected | Your customer rejected the command at review (Step 1 of their approval UI). |
| OutputRejected | Your customer approved execution, but rejected releasing the output. You never see stdout / stderr. |
| ExecutionFailed | The command ran but the appliance reported a non-zero exit code or an internal error (or the staleness recovery fired). Terminal: no transition out, no output release path. The failure stderr is uploaded to your blob store like any other output. |
| Cancelled | You explicitly cancelled (tensor9 ops command cancel) before the command ran to completion. |
| Timeout | Your customer never decided within the --timeout window. |
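The transitions described in the happy path and the terminal states can be collected into a small transition map. This is a sketch assembled from this page’s prose, not the appliance’s authoritative state machine:

```python
# Transition map assembled from this page's lifecycle prose; illustrative only.
TRANSITIONS = {
    "Submitted":      {"CmdApproving", "Cancelled", "Timeout"},
    "CmdApproving":   {"CmdApproved", "CmdRejected", "Cancelled", "Timeout"},
    "CmdApproved":    {"Executing", "Cancelled"},
    "Executing":      {"Executed", "ExecutionFailed", "Cancelling"},
    "Cancelling":     {"Cancelled", "ExecutionFailed"},
    "Executed":       {"OutputApproved", "OutputRejected"},
    "OutputApproved": {"Completed"},
}

# States with no outgoing transitions.
TERMINAL = {"Completed", "CmdRejected", "OutputRejected",
            "ExecutionFailed", "Cancelled", "Timeout"}

def can_transition(src: str, dst: str) -> bool:
    """True when the page's prose describes a src -> dst transition."""
    return dst in TRANSITIONS.get(src, set())
```

A map like this is handy for validating state values you read back from --output json before acting on them.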

Encryption mechanism

Output is encrypted with a per-command AES-256-GCM key that the appliance generates on-the-fly. The ciphertext carries a SHA-256 fingerprint of the key (in the AAD), so the decrypt path can find the right key in the appliance’s secret store even after a re-execution overwrites the per-scope slot. Keys live in the appliance’s secret store under /t9-private/projection/.../ops-cmd/... and are deleted after successful release or output rejection. When you see a “decryption failed” alert from the appliance, the likely causes are: (a) a re-execution clobbered the scope-keyed slot before your customer released the previous run’s output, (b) the secret store is unreachable, (c) your customer rotated keys mid-flight. The fingerprint addressing is the defense for (a); see the appliance audit log for the specific failure mode.
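The fingerprint-addressed lookup that defends against cause (a) can be sketched with a toy key store. Names here are hypothetical, the store is an in-memory dict, and only the SHA-256 fingerprint mechanics are real; no actual AES is performed:

```python
import hashlib

class KeyStore:
    """Toy sketch of fingerprint-addressed key lookup (hypothetical names).

    Mirrors the defense described above: the ciphertext carries a SHA-256
    fingerprint of its key in the AAD, so the decrypt path can locate the
    right key even after a re-execution clobbered the per-scope slot.
    """

    def __init__(self):
        self._by_fingerprint = {}  # fingerprint -> key bytes
        self._by_scope = {}        # scope path  -> latest fingerprint

    @staticmethod
    def fingerprint(key: bytes) -> str:
        return hashlib.sha256(key).hexdigest()

    def put(self, scope: str, key: bytes) -> str:
        fp = self.fingerprint(key)
        self._by_fingerprint[fp] = key
        self._by_scope[scope] = fp  # a re-execution overwrites this slot
        return fp

    def get_for_decrypt(self, fingerprint: str) -> bytes:
        # Fingerprint addressing: an older run's key stays reachable even
        # though the scope slot now points at the newer key.
        return self._by_fingerprint[fingerprint]
```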

Watching progress

# Snapshot of every command across this app
tensor9 ops command list --appName my-app

# Same, including completed and rejected history
tensor9 ops command list --appName my-app --history

# Drill into one command
tensor9 ops command retrieve --appName my-app --commandName check-myapp-disk
retrieve shows the current state, full audit chain (who approved what, when), and (once Completed) the released stdout / stderr / exit code. Both commands accept --output json for scripting; pipe into jq .lifecycle to poll a single state value.
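A minimal polling loop over that JSON output might look like the sketch below. The fetch function is caller-supplied, for example a wrapper that shells out to tensor9 ops command retrieve --output json and extracts the lifecycle value; the exact JSON shape is an assumption:

```python
import time

# Terminal states from the lifecycle tables on this page.
TERMINAL_STATES = {"Completed", "CmdRejected", "OutputRejected",
                   "ExecutionFailed", "Cancelled", "Timeout"}

def poll_until_terminal(fetch_state, interval_s: float = 10.0,
                        max_polls: int = 1000) -> str:
    """Poll a command's lifecycle value until it settles.

    fetch_state() returns the current state string; how it gets that
    string (CLI shell-out, API call) is up to the caller.
    """
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("command never reached a terminal state")
```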

What your customer sees

When you submit, your customer is sent (via your existing notification channel) a unique /support/<token> web link. Clicking it opens a four-step approval UI:

| Step | What your customer does |
| --- | --- |
| Review | Reads the template description, declared data_access tags, your reason, and the exact template body and variable values you submitted. Picks Approve, Reject, or close. |
| Approve | Confirms the approval. State advances to CmdApproved; the appliance executes immediately. |
| Execute | Watches a status pane while the appliance runs the command. State stays Executing until the output lands. |
| Release | Reviews the decrypted stdout / stderr / exit code, then signs an Ed25519 release manifest. Picks Release or Reject. |

If your customer hasn’t subscribed a notification channel for ops-command events, submission silently produces no notification. The command still appears in your tensor9 ops command list (state: Submitted), and your customer would only see it if they happened to visit the support portal directly. Confirm channel subscription with each customer at onboarding; otherwise an “unanswered” command is more likely to mean “your customer doesn’t know about it” than “your customer is ignoring it.”

What “review” actually shows

For Terraform templates, your customer sees the literal HCL source plus the values they’re submitting for each variable. They do not see a tofu plan output: a plan would require evaluating data sources, which can only run after the appliance is authorized to do so. Your customer is reviewing “the shape of what will execute” plus the declared data_access, side_effects, and permission tier, not a fully-resolved diff. Things the HCL surface does NOT pre-evaluate for the reviewer:
  • ${var.X} interpolations stay as literal strings in the displayed HCL. Your customer sees ${var.MOUNT_PREFIX} in the command body and the submitted value of MOUNT_PREFIX separately; they have to substitute mentally at review time. The approval UI shows the submitted variable values next to the HCL.
  • for_each cardinality is invisible at review time. A for_each = toset(data.aws_s3_buckets.all.buckets[*].name) does not show whether it will iterate over 3 buckets or 30,000. Bound the potential blast radius via data_access + side_effects declarations and use description to explain the cardinality semantics in plain English.
  • local-exec heredocs are reviewed as shell. A multi-line command = <<-EOT ... EOT is shown verbatim. Customers reviewing a kubectl drain ... && kubectl ... heredoc are reviewing a shell program, not a Terraform plan. Keep heredocs short and named in the description.
  • null_resource.triggers are not re-evaluated against prior state (there is no prior state; see Authoring templates). A triggers = { mount_prefix = var.MOUNT_PREFIX } block makes the resource look like it fires only on change, which is misleading. Document the behavior in the template’s description or omit the triggers block.
Because the review surface is the HCL, the description field on tensor9_command carries a lot of weight. Treat it as the plain-English equivalent of the HCL: name the exact APIs called, the expected output shape, the cardinality of any fan-out, and the intended side effects.

For Script and Kubectl templates the review surface is the literal script body or kubectl invocation. The same caveats apply: ${VAR} references are unsubstituted; the customer reads the script and the variable values side by side.
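Since the review surface leaves ${var.X} and ${VAR} references unsubstituted, your own tooling can render the resolved body before submission as a sanity check on what the reviewer will have to assemble mentally. A hypothetical helper, not part of the product:

```python
import re

def preview_substitution(body: str, submitted_vars: dict) -> str:
    """Render the substitution a reviewer performs mentally: replace
    ${var.NAME} (Terraform) or ${NAME} (script) references with the
    submitted values. Unknown references are left as-is."""
    def repl(m):
        name = m.group(1) or m.group(2)
        return str(submitted_vars.get(name, m.group(0)))
    return re.sub(r"\$\{var\.(\w+)\}|\$\{(\w+)\}", repl, body)
```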

Trust properties for the release step

Release has two properties to explain to your customer:
  • The plaintext output passes through the appliance your customer already controls before you see it. Decryption happens on the appliance (which lives in your customer’s cloud account under their IAM); the control plane only ever sees the ciphertext before release. Once your customer signs release, the control plane forwards the plaintext to you.
  • The release decision is non-repudiable. Your customer’s signed release manifest is preserved in the audit chain and can be verified independently with tensor9 ops command audit verify.
Pre-approved templates skip the per-command Review and Approve steps; see Pre-approvals.

Resource limits and queueing

Operations is for diagnostics and short-lived interventions, not for bulk data extraction. The appliance enforces a few limits that you should size your templates against:
| Limit | Value | Implication |
| --- | --- | --- |
| Concurrent commands per appliance | 10 | Approved commands dispatch onto the appliance’s bounded execution pool (max 10 worker threads). Commands beyond the cap stay in CmdApproved and are picked up on the next polling cycle (~10s) as workers free up. |
| Per-command runtime cap | 10 minutes | A command running past 10 minutes is killed and surfaced as ExecutionFailed. |
| Stuck-execution recovery | 20 minutes (2 × runtime cap) | A command that’s been in Executing for 20 minutes (e.g. because the appliance restarted mid-run) is auto-transitioned to ExecutionFailed with stderr noting the likely cause. |
| Per-stream output cap | 5 GiB | stdout and stderr each route through your blob store (uploaded by the appliance, fetched by the customer’s release script and your retrieve call via a presigned S3 URL). 5 GiB is the AWS S3 single-PUT ceiling. |
| Children per batch | 50 | See “Batches” below. |
The 5 GiB cap is high enough that you generally don’t think about it. The appliance uploads each stream to your S3 bucket and stores only a small [blob: bucket=..., key=..., size=..., sha256=...]\n<presigned-url> payload (encrypted) on the command record. The customer’s release script fetches the actual bytes, sha256-verifies them, and shows them in the local preview. Your tensor9 ops command retrieve returns the same payload; curling the URL gives you the bytes back. Bulk log dumps (journalctl --since 24h, kubectl logs deployment/...) flow through without the per-template | tail -c self-capping that templates used to need.

On Kubernetes-form-factor appliances whose blob store does not yet support presigned URLs (MinIO is in this category as of this writing), the appliance falls back to an inline 4 MiB cap with a marker like [stdout truncated - 4 MB cap]; the release script’s preview still works, but the customer and you see only the first 4 MiB of any stream over the cap.

If you’re on-call and you see a command stuck in Executing for more than five minutes, you can either wait for the 20-minute staleness recovery (automatic) or interrupt it manually with tensor9 ops command cancel --commandName <name>. Cancelling an Executing command transitions it to Cancelling while the appliance tears down; the eventual terminal state depends on what the appliance was doing at the time. See “Cancelling” below for the full state-machine view.
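After retrieve returns the payload, your tooling fetches the presigned URL and should re-verify the hash, as the customer’s release script does. A sketch; the field names (bucket, key, size, sha256) come from the text, but the exact serialization parsed here is an assumption:

```python
import hashlib
import re

def parse_blob_payload(payload: str):
    """Split a blob-pointer payload into its header fields and presigned URL.

    Assumed shape: `[blob: bucket=..., key=..., size=..., sha256=...]`
    followed by the URL on the next line.
    """
    header, _, url = payload.partition("\n")
    fields = dict(re.findall(r"(\w+)=([^,\]]+)", header))
    return fields, url.strip()

def verify_stream(data: bytes, expected_sha256: str) -> bool:
    """Re-check fetched bytes against the sha256 recorded in the payload."""
    return hashlib.sha256(data).hexdigest() == expected_sha256
```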

Not yet enforced

Two limits ship in the codebase as constants but no enforcement call site exists today. Treat these as documentation-of-intent, not as guarantees:
  • Submissions per hour: 100 per appliance. Plan around it; do not rely on it. The 101st submission this hour will succeed.
  • Cooldown between submissions: 60 seconds per appliance. Same caveat: not enforced today.
Enforcement will land in a future release; until then, rate-limiting in your own scripts is the only real bound.

Cancelling

tensor9 ops command cancel --commandName check-myapp-disk
What happens depends on the command’s current state:
| State at cancel time | Outcome |
| --- | --- |
| Submitted, CmdApproving | Transitions directly to Cancelled. |
| CmdApproved | Transitions to Cancelled; the appliance never picks the command up to execute. |
| Executing | Transitions to Cancelling; the appliance tears down its sandbox while the cancel intent propagates. The eventual terminal state depends on what the appliance was doing (typically Cancelled, but may surface as ExecutionFailed if the process had already produced partial state that needed teardown). |
| Executed, OutputApproving, or any terminal state | No-op. The cancel arrives too late; output release proceeds normally. |
The Cancelling intermediate state exists specifically because a running tofu apply or kubectl invocation needs a moment to wind down cleanly. In-flight side effects (a partially-created cloud resource, a partially-applied K8s manifest) may or may not be backed out depending on the template; if you cancel a mutating template mid-execution, expect to inspect your customer’s environment afterwards.

If you submitted many commands by accident and need to cancel them all in one shot:
# Cancel everything against this customer regardless of when submitted
tensor9 ops command batch cancel-bulk \
  --appName my-app \
  --customerName acme-corp \
  --yes

# Or scope to a time window
tensor9 ops command batch cancel-bulk \
  --appName my-app \
  --submittedAfter "2026-05-09T13:55:00Z"
cancel-bulk lists the commands it’s about to cancel and prompts for confirmation; pass --yes to skip the prompt in scripts. Only commands still in cancellable states are touched; anything past CmdApproving is reported as “skipped” so you know what’s still running.

Batches

When you need to fan a command out across multiple customers or appliances, use the batch surface:
tensor9 ops command batch submit --appName my-app --file ./batch-spec.json
tensor9 ops command batch list --appName my-app
tensor9 ops command batch retrieve --appName my-app --batchId <id>
tensor9 ops command batch cancel --appName my-app --batchId <id>
A batch creates one underlying ops command per appliance. The lifecycle tracks each child command independently, so different customers approving at different times is normal. batch retrieve rolls the children up into a single status summary. A single batch is capped at 50 child commands (one per appliance). For larger fleets, submit multiple batches with a small delay between them to avoid a thundering-herd against the notification path. We are working on a higher cap; let us know what your steady-state fan-out looks like.
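For fleets larger than the 50-child cap, your submission script can pre-chunk the target list and submit one batch spec per chunk, pausing between submissions to avoid the thundering-herd problem mentioned above. A minimal sketch; plan_batches is a hypothetical helper, not a CLI feature:

```python
def plan_batches(appliance_ids: list, cap: int = 50) -> list:
    """Chunk a fleet into groups of at most `cap` children, matching the
    50-children-per-batch limit. Submit one batch spec per chunk, with a
    small delay between submissions."""
    return [appliance_ids[i:i + cap]
            for i in range(0, len(appliance_ids), cap)]
```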

Audit and forensics

Three Ed25519 signatures protect every ops command. Together they form a non-repudiation chain your customer can verify independently:
| Signature | Signed by | What it proves |
| --- | --- | --- |
| commandApproval | Your customer (buyer signing key) | “The command body you submitted, with these specific variable values, was approved by this person at this time.” |
| outputIntegrity | Appliance signing key | “Exactly these stdout / stderr / exitCode bytes came out of the command’s execution, before any encryption or transit.” |
| outputApproval | Your customer (buyer signing key) | “These specific output bytes were approved for release to you by this person at this time.” |
The signatures are stored on the command’s audit record and survive the encrypt/decrypt cycle (the integrity signature is computed over plaintext before encryption, then preserved in the ciphertext metadata). Either you or your customer can verify the full chain on a specific command:
tensor9 ops command audit verify \
  --appName my-app \
  --commandName check-myapp-disk
The action retrieves the command record + the appliance’s pinned signing public key, reconstructs the canonical signed-data for each of the three signatures, and verifies. The three checks use two different trust anchors:
  • commandApproval and outputApproval verify the buyer’s signature on the approval / release manifest, using the public key embedded in the manifest itself. The output reports the signerPublicKeyFingerprint, which you (or the customer) should cross-reference against the buyer-signing pubkey currently pinned on the appliance vault, which is the actual trust anchor.
  • outputIntegrity verifies the appliance’s signature over the plaintext output, against the projection’s pinned opsCmdPubKey.
Output (in the healthy case):
Audit chain for check-myapp-disk (Completed)
Appliance signer fingerprint: 3a8c4b1f...

  ✓ commandApproval [OK]
  ✓ outputIntegrity [OK]
  ✓ outputApproval [OK]

✓ Audit chain verified.
Any failure is surfaced as [FAIL] with a one-line reason, and the process exits non-zero. Customers running compliance audits should script this against their full ops-command history; you should run it whenever a customer reports “you ran something I didn’t approve” so the disagreement turns into a verifiable record very quickly.

UNSIGNED_LEGACY records (commands authored before the signature chain was required, with no buyer-signed manifest attached) are reported but do not cause a failure exit by default; pass --strict to fail on those too.

For programmatic use, pass --output json. The JSON checks array carries per-check trustAnchor, signerPublicKeyFingerprint, signedBy, and signedAt, which let you distinguish auto-approved-by-pre-approval from manually-approved commands and archive the chain independently.
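The verification flow can be sketched as an orchestration over the three checks. The record layout and the verify_sig callable below are hypothetical stand-ins; a real implementation would use an Ed25519 verification from a crypto library against the appropriate trust anchor for each check:

```python
def verify_chain(record: dict, verify_sig):
    """Walk the three-signature chain in order, as audit verify does.

    verify_sig(trust_anchor, signed_data, signature) -> bool is supplied
    by the caller (e.g. wrapping a crypto library's Ed25519 verify).
    Records missing a signature are reported as UNSIGNED_LEGACY and, as
    documented, do not fail the chain by default.
    """
    anchors = [
        ("commandApproval", "buyer key embedded in the approval manifest"),
        ("outputIntegrity", "projection's pinned opsCmdPubKey"),
        ("outputApproval", "buyer key embedded in the release manifest"),
    ]
    checks = []
    for name, anchor in anchors:
        sig = record.get(name)
        if sig is None:
            checks.append((name, "UNSIGNED_LEGACY"))
            continue
        ok = verify_sig(anchor, sig["signedData"], sig["signature"])
        checks.append((name, "OK" if ok else "FAIL"))
    return checks, all(status != "FAIL" for _, status in checks)
```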