A submitted ops command moves through a state machine on its way from “you want to run this” to “you have the released output.” This page walks your side of that flow: how to submit, how to watch progress, what your customer sees in parallel on the /support/<token> link they receive, and how to verify the audit chain after the fact.
Submitting a command
| Flag | Purpose |
|---|---|
| --appName | The Tensor9 app the command targets. |
| --customerName | Which of your customers’ appliances will execute the command. |
| --commandName | A memorable identifier (3-64 chars, lowercase + hyphens). You’ll use this to retrieve, cancel, etc. |
| Flag | What it does |
|---|---|
| --template | Reference an already-imported template by id. Most common. |
| --command | Inline ad-hoc command body. Useful for one-off shell snippets that don’t merit a template. |
| Flag | Purpose |
|---|---|
| --vars | Comma-separated KEY=VALUE pairs for template variables (--vars NAMESPACE=prod,DEPLOYMENT=api). See “A note on --vars escaping” below. |
| --permissions | ReadOnly (default), ReadWrite, or Admin. Drives the role minted on the appliance for Kubectl-tier commands. |
| --reason | Free-text justification. Shown to your customer at approval time and recorded in the audit trail. |
| --timeout | How long to wait for customer approval before the command times out. Default 7d. |
| --originRsxId | Required only for Kubectl ad-hoc and Kubectl templates: the Terraform resource address of the target cluster (e.g. aws_eks_cluster.production). Ignored for Tf and Script paths. The id is resolved against your latest published release of this app, not against the version currently running on your customer’s appliance; if those differ and the resource was renamed across versions, the cluster lookup may fail. There is no flag today to pin the resolution to a specific release. |
The --commandType flag selects between command shapes. It’s almost
always inferred from --template or auto-defaults; only set
explicitly if you’re authoring tooling that needs to be specific:
| --commandType value | When the system uses it |
|---|---|
| Kubectl | Ad-hoc kubectl invocation (no template). Default for ad-hoc. |
| KubectlFromTmpl | Auto-selected when --template resolves to a Kubectl template. |
| ScriptFromTmpl | Auto-selected when --template resolves to a Script template. |
| TfFromTmpl | Auto-selected when --template resolves to a Terraform template. |
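Putting the flags together, a full submission might look like the following sketch. The app, customer, template, and variable names are illustrative, not from a real deployment:

```shell
# Hypothetical values throughout; substitute your own app/customer/template.
tensor9 ops command create \
  --appName acme-analytics \
  --customerName globex \
  --commandName restart-api-prod \
  --template restart-deployment \
  --vars NAMESPACE=prod,DEPLOYMENT=api \
  --permissions ReadWrite \
  --reason "Customer-reported 502s; rolling restart of the api deployment" \
  --timeout 3d \
  --originRsxId aws_eks_cluster.production  # needed here because this is a Kubectl template
```

Because the template is a Kubectl template, --originRsxId is required; drop it for Tf or Script templates.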
A note on --vars escaping
Today --vars is a single comma-separated string, which means values
cannot contain commas or = characters. This is a known limitation;
support for repeated --var KEY=VALUE flags is planned. Until then,
work around it by constraining template variable values to
simple alphanumerics and path characters.
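Until repeated --var flags land, you can also guard against the comma/= limitation client-side before submitting. A minimal sketch (the helper function is ours, not part of the CLI):

```shell
# build_vars KEY=VALUE [KEY=VALUE ...] -> prints a string safe to pass to
# --vars, or fails if any VALUE contains a comma or an extra '=' character
# (which the single comma-separated --vars string cannot carry).
build_vars() {
  local out="" pair value
  for pair in "$@"; do
    value=${pair#*=}   # strip the leading KEY= to inspect only the value
    case "$value" in
      *,*|*=*) echo "unsafe value in '$pair'" >&2; return 1 ;;
    esac
    out=${out:+$out,}$pair
  done
  printf '%s\n' "$out"
}

build_vars NAMESPACE=prod DEPLOYMENT=api
# -> NAMESPACE=prod,DEPLOYMENT=api
```

The output can be spliced straight into the create call: `--vars "$(build_vars NAMESPACE=prod DEPLOYMENT=api)"`.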
The action prints the assigned commandName and the initial state,
then returns. The command is now Submitted; your customer’s review
experience begins next.
Lifecycle at a glance
An intermediate state (Cancelling) covers the brief window where a
cancel request has landed but the appliance is still tearing down.
Happy path in words
Submitted
The command exists; your customer’s appliance has not picked it up yet.
You see this immediately after
tensor9 ops command create.
CmdApproving
Your customer’s appliance has the command in its inbox and is waiting
on a human review decision. Your customer sees the
/support/<token> link and walks the four-step approval UI.
Executing
The command body is running inside the appliance’s sandboxed
working directory. The appliance captures stdout / stderr / exit code,
uploads each output stream to your blob store (S3 in your customer’s
account) and stores a small
[blob: ...]\\n<presigned-url> payload
on the command record. That payload is then encrypted with a
per-command key.
Executed
Execution finished. The appliance is now waiting for your customer to
review the output and decide whether to release it to you.
Terminal unhappy states
| State | What happened |
|---|---|
| CmdRejected | Your customer rejected the command at review (Step 1 of their approval UI). |
| OutputRejected | Your customer approved execution, but rejected releasing the output. You never see stdout / stderr. |
| ExecutionFailed | The command ran but the appliance reported a non-zero exit code or an internal error (or the staleness recovery fired). Terminal: no transition out, no output release path. The failure stderr is uploaded to your blob store like any other output. |
| Cancelled | You explicitly cancelled (tensor9 ops command cancel) before the command was approved. |
| Timeout | Your customer never decided within the --timeout window. |
Encryption mechanism
Output is encrypted with a per-command AES-256-GCM key that the appliance generates on the fly. The ciphertext carries a SHA-256 fingerprint of the key (in the AAD), so the decrypt path can find the right key in the appliance’s secret store even after a re-execution overwrites the per-scope slot. Keys live in the appliance’s secret store under /t9-private/projection/.../ops-cmd/... and are deleted after
successful release or output rejection.
When you see a “decryption failed” alert from the appliance, the
likely causes are: (a) a re-execution clobbered the scope-keyed
slot before your customer released the previous run’s output, (b)
the secret store is unreachable, (c) your customer rotated keys
mid-flight. The fingerprint addressing is the defense for (a); see
the appliance audit log for the specific failure mode.
Watching progress
tensor9 ops command list shows where each of your commands sits;
retrieve shows the current state, full audit chain (who approved
what, when), and (once Completed) the released stdout / stderr / exit
code. Both commands accept --output json for scripting; pipe into
jq .lifecycle to poll a single state value.
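For scripts, that polling pattern can be wrapped in a small helper. This is a sketch of ours, not part of the CLI; the only behavior it assumes from the page above is that retrieve --output json exposes a .lifecycle field, and it expects jq to be installed:

```shell
# Runs the given state-query snippet repeatedly until it prints a terminal
# lifecycle state or the attempt budget runs out. Prints the final state.
poll_lifecycle() {
  local query=$1 max_attempts=${2:-60} interval=${3:-10} state i=0
  while [ "$i" -lt "$max_attempts" ]; do
    state=$(eval "$query")
    case "$state" in
      Completed|CmdRejected|OutputRejected|ExecutionFailed|Cancelled|Timeout)
        printf '%s\n' "$state"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  return 1   # still non-terminal after the budget
}

# Usage:
# poll_lifecycle 'tensor9 ops command retrieve --commandName restart-api-prod \
#   --output json | jq -r .lifecycle'
```

Tune the interval against the appliance’s ~10s polling cycle; tighter loops buy you nothing.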
What your customer sees
When you submit, your customer is sent (via your existing notification channel) a unique /support/<token> web link. Clicking it opens a
four-step approval UI:
If your customer hasn’t subscribed a notification channel for
ops-command events, submission silently produces no notification.
The command still appears in your
tensor9 ops command list (state:
Submitted), and your customer would only see it if they happened
to visit the support portal directly. Confirm channel subscription
with each customer at onboarding; otherwise an “unanswered” command
is more likely to mean “your customer doesn’t know about it” than
“your customer is ignoring it.”
| Step | What your customer does |
|---|---|
| Review | Reads the template description, declared data_access tags, your reason, and the exact template body and variable values you submitted. Picks Approve, Reject, or close. |
| Approve | Confirms the approval. State advances to CmdApproved; the appliance executes immediately. |
| Execute | Watches a status pane while the appliance runs the command. State stays Executing until the output lands. |
| Release | Reviews the decrypted stdout / stderr / exit code, then signs an Ed25519 release manifest. Picks Release or Reject. |
What “review” actually shows
For Terraform templates, your customer sees the literal HCL source plus the values they’re submitting for each variable. They do
not see a tofu plan output: a plan would require evaluating
data sources, which can only run after the appliance is authorized
to do so. Your customer is reviewing “the shape of what will execute”
plus the declared data_access, side_effects, and permission tier,
not a fully-resolved diff.
Things the HCL surface does NOT pre-evaluate for the reviewer:
- ${var.X} interpolations stay as literal strings in the displayed HCL. Your customer sees ${var.MOUNT_PREFIX} in the command body and the submitted value of MOUNT_PREFIX separately; they have to substitute mentally at review time. The approval UI shows the submitted variable values next to the HCL.
- for_each cardinality is invisible at review time. A for_each = toset(data.aws_s3_buckets.all.buckets[*].name) does not show whether it will iterate over 3 buckets or 30,000. Bound the potential blast radius via data_access + side_effects declarations and use description to explain the cardinality semantics in plain English.
- local-exec heredocs are reviewed as shell. A multi-line command = <<-EOT ... EOT is shown verbatim. Customers reviewing a kubectl drain ... && kubectl ... heredoc are reviewing a shell program, not a Terraform plan. Keep heredocs short and named in the description.
- null_resource.triggers are not re-evaluated against prior state (there is no prior state; see Authoring templates). A triggers = { mount_prefix = var.MOUNT_PREFIX } block makes the resource look like it fires only on change, which is misleading. Document the behavior in the template’s description or omit the triggers block.
The description field on
tensor9_command carries a lot of weight. Treat it as the
plain-English equivalent of the HCL: name the exact APIs called, the
expected output shape, the cardinality of any fan-out, and the
intended side effects.
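As a sketch of a description that pulls that weight (the resource body here is illustrative only; apart from the description field and the data_access/side_effects concepts discussed above, nothing in it is authoritative schema):

```hcl
resource "tensor9_command" "list_stuck_pods" {
  # Hypothetical template; the description names APIs, output shape,
  # cardinality, and side effects in plain English for the reviewer.
  description = <<-EOT
    Read-only. Calls the Kubernetes list-pods API in the NAMESPACE
    namespace and prints one line per non-Running pod (name, phase,
    restart count). Output is bounded by the pod count of that single
    namespace; no fan-out, no side effects.
  EOT
}
```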
For Script and Kubectl templates the review surface is the literal
script body or kubectl invocation. Same caveats: ${VAR} references
are unsubstituted; the customer reads the script and the variable
values side-by-side.
Trust properties for the release step
Release has two properties to explain to your customer:
- The plaintext output passes through the appliance your customer already controls before you see it. Decryption happens on the appliance (which lives in your customer’s cloud account under their IAM); the control plane only ever sees the ciphertext before release. Once your customer signs release, the control plane forwards the plaintext to you.
- The release decision is non-repudiable. Your customer’s signed
release manifest is preserved in the audit chain and can be
verified independently with
tensor9 ops command audit verify.
Resource limits and queueing
Operations is for diagnostics and short-lived interventions, not for bulk data extraction. The appliance enforces a few limits that you should size your templates against:
| Limit | Value | Implication |
|---|---|---|
| Concurrent commands per appliance | 10 | Approved commands dispatch onto the appliance’s bounded execution pool (max 10 worker threads). Commands beyond the cap stay in CmdApproved and are picked up on the next polling cycle (~10s) as workers free up. |
| Per-command runtime cap | 10 minutes | A command running past 10 minutes will be killed and surfaced as ExecutionFailed. |
| Stuck-execution recovery | 20 minutes (2 × runtime cap) | A command that’s been in Executing for 20 minutes (e.g. because the appliance restarted mid-run) is auto-transitioned to ExecutionFailed with stderr noting the likely cause. |
| Per-stream output cap | 5 GiB | stdout and stderr each route through your blob store (uploaded by the appliance, fetched by the customer’s release script and your retrieve call via a presigned S3 URL). 5 GiB is the AWS S3 single-PUT ceiling. |
| Children per batch | 50 | See “Batches” below. |
The appliance stores a
[blob: bucket=..., key=..., size=..., sha256=...]\\n<presigned-url>
payload (encrypted) on the command record. The customer’s release
script fetches the actual bytes, sha256-verifies them, and shows
them in the local preview. Your tensor9 ops command retrieve
returns the same payload; curling the URL gives you the bytes back.
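The sha256 check the release script performs can be sketched like this, assuming you have already curled the presigned URL (second line of the payload) into a local file. The helper name is ours:

```shell
# Extracts the sha256=... field from the payload's [blob: ...] header line
# and compares it against the digest of the downloaded bytes.
verify_blob_payload() {
  local payload=$1 file=$2 expected actual
  expected=$(printf '%s\n' "$payload" | head -n1 \
    | sed -n 's/.*sha256=\([0-9a-fA-F]*\).*/\1/p')
  [ -n "$expected" ] || { echo "no sha256 in payload header" >&2; return 2; }
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  [ "$expected" = "$actual" ]
}
```

A mismatch means the bytes you fetched are not the bytes the appliance signed; treat that as an integrity incident, not a retry.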
Bulk log dumps (journalctl --since 24h, kubectl logs deployment/...)
flow through without the per-template | tail -c self-capping that
templates used to need.
On Kubernetes-form-factor appliances whose blob store does not yet
support presigned URLs (MinIO is in this category as of this writing),
the appliance falls back to an inline 4 MiB cap with a marker like
[stdout truncated - 4 MB cap]; the release script’s preview still
works, but the customer and you see only the first 4 MiB of any
stream over the cap.
If you’re on-call and you see a command stuck in Executing for
more than five minutes, you can either wait for the 20-minute
staleness recovery (automatic) or interrupt it manually with
tensor9 ops command cancel --commandName <name>. Cancelling an
Executing command transitions
it to Cancelling while the appliance tears down; the eventual
terminal state depends on what the appliance was doing at the time.
See “Cancelling” below for the full state-machine view.
Not yet enforced
Two limits ship in the codebase as constants but no enforcement call site exists today. Treat these as documentation-of-intent, not as guarantees:
- Submissions per hour: 100 per appliance. Plan around it; do not rely on it. The 101st submission this hour will succeed.
- Cooldown between submissions: 60 seconds per appliance. Same caveat: not enforced today.
Cancelling
| State at cancel time | Outcome |
|---|---|
| Submitted, CmdApproving | Transitions directly to Cancelled. |
| CmdApproved | Transitions to Cancelled; the appliance never picks the command up to execute. |
| Executing | Transitions to Cancelling; the appliance tears down its sandbox while the cancel intent propagates. The eventual terminal state depends on what the appliance was doing (typically Cancelled, but may surface as ExecutionFailed if the process had already produced partial state that needed teardown). |
| Executed, OutputApproving, or any terminal state | No-op. The cancel arrives too late; output release proceeds normally. |
Cancellation of a running command is not instant: a
tofu apply or kubectl invocation needs a moment to wind
down cleanly. In-flight side effects (a partially-created cloud
resource, a partially-applied K8s manifest) may or may not be
backed out depending on the template; if you cancel a mutating
template mid-execution, expect to inspect your customer’s environment
afterwards.
If you submitted many commands by accident and need to cancel them
all in one shot:
cancel-bulk lists the commands it’s about to cancel and prompts
for confirmation; pass --yes to skip the prompt for scripts. Only
commands still in cancellable states are touched; anything past
CmdApproving is reported as “skipped” so you know what’s still
running.
Batches
When you need to fan a command out across multiple customers or appliances, use the batch surface; batch retrieve
rolls the children up into a single status summary.
A single batch is capped at 50 child commands (one per
appliance). For larger fleets, submit multiple batches with a small
delay between them to avoid a thundering-herd against the
notification path. We are working on a higher cap; let us know what
your steady-state fan-out looks like.
Audit and forensics
Three Ed25519 signatures protect every ops command. Together they form a non-repudiation chain your customer can verify independently:
| Signature | Signed by | What it proves |
|---|---|---|
| commandApproval | Customer (buyer) signing key | “The command body you submitted, with these specific variable values, was approved by this person at this time.” |
| outputIntegrity | Appliance signing key | “Exactly these stdout / stderr / exitCode bytes came out of the command’s execution, before any encryption or transit.” |
| outputApproval | Customer (buyer) signing key | “These specific output bytes were approved for release to you by this person at this time.” |
commandApproval and outputApproval verify the buyer’s signature on the approval / release manifest, using the public key embedded in the manifest itself. The output reports the signerPublicKeyFingerprint, which you (or the customer) should cross-reference against the buyer-signing pubkey currently pinned on the appliance vault, which is the actual trust anchor. outputIntegrity verifies the appliance’s signature over the plaintext output, against the projection’s pinned opsCmdPubKey.
A failing check prints [FAIL] with a one-line reason, and the
process exits non-zero. Customers running compliance audits should
script this against their full ops-command history; you should run
it whenever a customer reports “you ran something I didn’t approve”
so the disagreement turns into a verifiable record very quickly.
UNSIGNED_LEGACY records (commands authored before the signature
chain was required, with no buyer-signed manifest attached) are
reported but do not cause a failure exit by default; pass --strict
to fail on those too.
For programmatic use, pass --output json. The JSON checks array
carries per-check trustAnchor, signerPublicKeyFingerprint,
signedBy, and signedAt, which let you distinguish
auto-approved-by-pre-approval from manually-approved commands and
archive the chain independently.
Related
- Authoring templates: the templates this command body comes from.
- Pre-approvals: skip per-command approval for repeated runs.
- Security model: the keys, signatures, and storage-side audit guarantees that make the audit chain non-repudiable.
- Permissions model: how permission tiers map onto appliance-side roles.