Observability - Tensor9

Observability in Tensor9 enables you to monitor all your customer appliances from a single observability platform. Logs, metrics, and traces flow from each customer’s infrastructure to your control plane, which then routes each telemetry stream to the observability sinks you’ve configured, giving you unified visibility across all deployments regardless of where they run.

How observability works

When you deploy applications through Tensor9, each customer appliance runs in isolated infrastructure (their AWS account, Google Cloud project, or private environment). Without observability, you would have no visibility into how these appliances are performing, whether deployments succeeded, or how customers are using your application. Tensor9’s observability system solves this by collecting telemetry from each appliance and forwarding it to your centralized observability platform:

Telemetry generation

Resources in the customer appliance (containers, Lambda functions, databases, load balancers) generate logs, metrics, and traces during normal operation.

Collection

Tensor9 uses steady-state permissions to collect telemetry from appliance resources. Collection runs inside the appliance, tapping the telemetry your resources already emit and forwarding it to your control plane (see How telemetry flows for the per-runtime mechanism).

Forwarding

Collected telemetry is forwarded from the customer appliance to your control plane over secure connections. Your control plane then routes each telemetry stream to the observability sink (or sinks) you’ve configured. You decide which sources feed which sinks, per signal (see Telemetry routing).

Analysis

Your team monitors all customer appliances from your observability platform. You can track deployment health, investigate incidents, analyze usage patterns, and troubleshoot issues across all customers from one place.

Observability collection uses steady-state permissions, which are always active and read-only. Customers do not need to approve observability access; it runs continuously to ensure you maintain visibility into appliance health.

What you can observe

Tensor9 collects comprehensive telemetry from customer appliances:

Application logs

Logs from your application components running in customer appliances:

Container logs: Logs from your workloads in EKS, GKE, AKS, or private Kubernetes, captured through your existing log agent (Datadog, OpenTelemetry, or Loki)
Function logs: Execution logs from Lambda, Cloud Functions, or Azure Functions
Application logs: Custom application logs written to CloudWatch, Cloud Logging, or other logging services

Logs include the t9_appliance_id and t9_customer_name (plus t9_service_name for CloudWatch-sourced logs), allowing you to filter and correlate logs across customers.
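As a rough illustration of how these tags support filtering and correlation, the sketch below groups log records by customer and pulls out one customer's errors. Only the t9_appliance_id and t9_customer_name tag names come from this page; the record layout, the level and message fields, and the identifier values are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log records as they might look once exported from a sink.
# Only the t9_* tag names are documented; every other field is illustrative.
records = [
    {"t9_appliance_id": "apl-001", "t9_customer_name": "acme-corp",
     "level": "ERROR", "message": "db timeout"},
    {"t9_appliance_id": "apl-002", "t9_customer_name": "globex",
     "level": "INFO", "message": "started"},
    {"t9_appliance_id": "apl-001", "t9_customer_name": "acme-corp",
     "level": "INFO", "message": "retry ok"},
]


def group_by_customer(records):
    """Bucket log records by their t9_customer_name tag."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["t9_customer_name"]].append(record)
    return dict(grouped)


def errors_for(records, customer_name):
    """Return the ERROR-level records that belong to one customer."""
    return [
        r for r in records
        if r["t9_customer_name"] == customer_name and r["level"] == "ERROR"
    ]


by_customer = group_by_customer(records)
acme_errors = errors_for(records, "acme-corp")
```

In practice your observability platform does this filtering for you; the sketch only shows what the tags make possible.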

Infrastructure metrics

Performance and health metrics from infrastructure resources:

Compute metrics: CPU, memory, network for containers, VMs, or functions
Database metrics: Connections, queries per second, replication lag, storage utilization
Storage metrics: Object count, storage used, request rates for S3, GCS, or Azure Blob Storage
Load balancer metrics: Request count, latency, error rates, healthy/unhealthy targets

Custom metrics

Application-level metrics you instrument in your code:

Business metrics: User signups, API calls, feature usage
Performance metrics: Request duration, queue depth, cache hit rates
Error tracking: Exception rates, failed operations, validation errors

Distributed traces

Request traces across your application components:

Cross-service traces: Track requests across microservices, databases, and external APIs
Performance analysis: Identify slow operations and bottlenecks
Dependency mapping: Visualize how services communicate within an appliance

Observability sinks

An observability sink is a destination your telemetry is forwarded to. You can configure multiple sinks and route different telemetry to each. For example, application logs can go to Datadog while high-volume infrastructure logs go to CloudWatch. Tensor9 supports these sink types natively:

Sink	Logs	Metrics	Traces	Configuration
Datadog	✓	✓	✓	API key and site
CloudWatch	✓	✓		Default credentials or cross-account role
OpenTelemetry (OTLP) (coming soon)	✓	✓		OTLP endpoint and optional authentication
Loki	✓			Endpoint and credentials
Elasticsearch (coming soon)	✓			Cluster endpoint(s) and credentials
Prometheus Remote Write		✓		Endpoint and credentials

Any backend that speaks OTLP (New Relic, Sumo Logic, Honeycomb, Grafana Cloud, and most modern observability platforms) can be used through the OpenTelemetry sink. Grafana stacks are reached directly through the Loki (logs) and Prometheus Remote Write (metrics) sinks. You configure sinks in your control plane, and Tensor9 applies the configuration to all of that app’s appliances automatically.

Telemetry routing

By default, each sink receives only the telemetry from its matching source: a Datadog sink receives Datadog telemetry, a CloudWatch sink receives CloudWatch logs, a Loki sink receives Loki logs. This keeps high-volume infrastructure logs (such as Kubernetes or CloudWatch control-plane logs) out of your SaaS sinks, where they would inflate cost, unless you deliberately send them there. When you need a different topology, you control exactly which sources feed which sinks, independently for logs, metrics, and traces. You edit routing visually in the portal’s Routing view; see Route your telemetry.

Telemetry sources

Tensor9 recognizes telemetry from these sources in your appliances:

Source	Logs	Metrics	Traces
Datadog (Datadog Agent)	✓	✓	✓
OpenTelemetry (OTLP)	✓	✓	✓
Loki	✓
Prometheus		✓
CloudWatch	✓

Per-signal routing

Routing is per signal. You can send a source’s logs to one sink and its metrics to another, or fan a single source out to several sinks. For each sink, logs, metrics, and traces are routed independently:

Default (no routes set): the sink receives only its matching source for each signal. OpenTelemetry and Elasticsearch sinks have no matching source (OTLP is vendor-neutral, so it is routed explicitly rather than matched by default), so they receive nothing until you route something to them. Both sink types are coming soon.
Routed: the sink receives exactly the sources you connect, for that signal.
Disabled: remove every route for a signal and that signal is no longer delivered to that sink.
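The three states above can be modelled in a few lines. This is a sketch of the routing rules as described on this page, not Tensor9's implementation: the signal tables are copied from the sink and source tables, while the function name, the use of None for "no routes set", the use of an empty set for "disabled", and the pairing of the Prometheus source with the Prometheus Remote Write sink are assumptions.

```python
# Signals each source emits, from the "Telemetry sources" table.
SOURCE_SIGNALS = {
    "datadog": {"logs", "metrics", "traces"},
    "otlp": {"logs", "metrics", "traces"},
    "loki": {"logs"},
    "prometheus": {"metrics"},
    "cloudwatch": {"logs"},
}

# Signals each sink accepts, from the "Observability sinks" table.
SINK_SIGNALS = {
    "datadog": {"logs", "metrics", "traces"},
    "cloudwatch": {"logs", "metrics"},
    "otlp": {"logs", "metrics"},
    "loki": {"logs"},
    "elasticsearch": {"logs"},
    "prometheus_remote_write": {"metrics"},
}

# Default matching source per sink. OTLP and Elasticsearch have none.
# The Prometheus pairing is an assumption made for this sketch.
MATCHING_SOURCE = {
    "datadog": "datadog",
    "cloudwatch": "cloudwatch",
    "loki": "loki",
    "prometheus_remote_write": "prometheus",
    "otlp": None,
    "elasticsearch": None,
}


def sources_for(sink, signal, routes=None):
    """Resolve which sources feed a sink for one signal.

    routes=None  -> default: only the matching source, if it emits the signal
    routes=set() -> disabled: nothing is delivered
    routes={...} -> routed: exactly the connected sources
    """
    if signal not in SINK_SIGNALS[sink]:
        return set()
    if routes is not None:
        return {s for s in routes if signal in SOURCE_SIGNALS[s]}
    match = MATCHING_SOURCE[sink]
    if match is not None and signal in SOURCE_SIGNALS[match]:
        return {match}
    return set()
```

For example, `sources_for("datadog", "logs")` resolves to the Datadog source alone, while `sources_for("datadog", "logs", {"datadog", "cloudwatch"})` models deliberately sending CloudWatch logs to Datadog as well.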

You can configure and manage observability sink settings from the Vendor Portal under Observability, or via the CLI.

Configuring observability

Observability is off by default; you turn it on per appliance, and optionally per resource, in the vendor portal.

See Configuring Observability to add sinks, wire up routing, control which appliances and resources are observed, and instrument telemetry in your origin stack.

How the pipeline works

Under the hood, telemetry moves through a fixed pipeline:

Observability pipeline: native cloud logging, to a forwarder, to a vendor-owned stream, to a router, to your sinks

Buffered in native cloud logging. Collected logs, metrics, and traces are written to the customer’s native logging service (CloudWatch Logs on AWS, Cloud Logging on Google Cloud, Azure Monitor Logs on Azure), which buffers them and doubles as the customer audit trail.
Forwarded to your control plane. A forwarder tags each record at the edge with its appliance and customer identity, then sends it on to a stream that you (the vendor) own.
Pushed to your sinks. A router reads the stream, applies your routing, and pushes each telemetry stream to its configured sink. Sink credentials are applied in your control plane and never leave it; they are never deployed to a customer appliance.

The observability pipeline scales automatically with the volume of telemetry, so it absorbs traffic spikes without any tuning on your part.
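To make the forwarder and router stages concrete, here is a minimal in-memory sketch. It is a toy model under stated assumptions, not the real pipeline: the deque stands in for the vendor-owned stream, the lists stand in for sinks, and the function names, record fields (source, signal, message), and routing-table shape are all invented for illustration. Only the two t9_* tag names come from this page.

```python
from collections import deque


def forward(record, appliance_id, customer_name, stream):
    """Forwarder: tag a record at the edge, then append it to the stream."""
    tagged = dict(
        record,
        t9_appliance_id=appliance_id,
        t9_customer_name=customer_name,
    )
    stream.append(tagged)


def route(stream, routing, sinks):
    """Router: drain the stream and push each record to its routed sinks."""
    while stream:
        record = stream.popleft()
        key = (record["source"], record["signal"])
        for sink_name in routing.get(key, []):
            sinks[sink_name].append(record)


# Stand-ins for the vendor-owned stream and two configured sinks.
stream = deque()
sinks = {"datadog": [], "cloudwatch": []}

# Hypothetical routing table: (source, signal) -> list of sink names.
routing = {
    ("datadog", "logs"): ["datadog"],
    ("cloudwatch", "logs"): ["cloudwatch"],
}

forward({"source": "datadog", "signal": "logs", "message": "request served"},
        "apl-001", "acme-corp", stream)
forward({"source": "cloudwatch", "signal": "logs", "message": "flow log entry"},
        "apl-001", "acme-corp", stream)
route(stream, routing, sinks)
```

Note that tagging happens before the record reaches the stream, and that nothing resembling a sink credential appears in the forwarder, mirroring the split described above.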

Customer audit trail

Before any telemetry leaves the customer’s environment, it is recorded in their own native log service: CloudWatch Logs on AWS, Cloud Logging on Google Cloud, and Azure Monitor Logs on Azure. The collected telemetry passes through this local record on its way to your control plane, so the customer keeps a complete, independent copy of exactly what was captured and forwarded out of their account. They can audit everything that crosses their boundary. In the future, customers will also be able to redact and filter this telemetry before it is forwarded, giving them direct control over what leaves their environment.

Observability across form factors

Observability collection adapts to each appliance’s form factor:

Form Factor	Log Collection	Metrics Collection	Trace Collection
AWS	CloudWatch Logs	CloudWatch Metrics, resource-specific metrics	X-Ray or application instrumentation
Google Cloud	Cloud Logging	Cloud Monitoring, resource-specific metrics	Cloud Trace or application instrumentation
Azure	Azure Monitor Logs	Azure Monitor Metrics, resource-specific metrics	Application Insights or application instrumentation
DigitalOcean	Logs via Fluent Bit/Fluentd	Prometheus metrics	OpenTelemetry Collector
Private Kubernetes	Logs via Fluent Bit/Fluentd	Prometheus metrics	OpenTelemetry Collector
On-prem	Logs via Fluent Bit/Fluentd	Prometheus metrics	OpenTelemetry Collector

Tensor9 provisions the appropriate collection infrastructure for each environment during compilation.

How telemetry flows

Tensor9 captures the telemetry your application already emits and carries it to your sinks, without new agents or application-code changes. How it taps in depends on the runtime:

Kubernetes

Tensor9 deploys a lightweight collection DaemonSet to the cluster and redirects your existing telemetry agents to it. Whatever your workloads already run (the Datadog Agent, an OpenTelemetry Collector, Prometheus, or Loki) keeps running unchanged but sends through the Tensor9 collector, which tags its logs, metrics, and traces with the appliance and customer metadata and forwards them to your control plane. No changes to your workloads.

AWS Lambda

Tensor9 injects a Lambda extension into your functions during compilation. The extension intercepts the function’s telemetry (Datadog, OpenTelemetry, and so on), tags it, and forwards it, again without changing your function code. In both cases the capture happens inside the customer’s environment; only the resulting telemetry leaves it.

Native cloud logs

Both paths above work by emitting your application’s logs, metrics, and traces into the native cloud log service (CloudWatch Logs on AWS, and the equivalent on other clouds). Tensor9 collects by mirroring that service, so anything written to it flows to your sinks. A useful consequence: the cloud’s own logs are forwarded too, with no extra setup. Control-plane logs (for example, EKS control-plane logs) and VPC flow logs reach your sinks through the same path.

Appliance identification

Every forwarded telemetry record is stamped with two Tensor9 identity tags so you can attribute it to an appliance and customer:

t9_appliance_id: Tensor9’s unique identifier for the appliance
t9_customer_name: Customer that owns the appliance

The emitting service is identified by the source’s own convention: CloudWatch-sourced telemetry adds a t9_service_name tag (derived from the log group), while Datadog, Loki, and Prometheus telemetry carry the service in their native tag/label (Datadog service, Kubernetes app/container). Tensor9 doesn’t override those. These tags let you filter, group, and correlate telemetry across customers and services. They’re distinct from instance_id, the origin-stack variable you tag resources with (see Configure telemetry in your origin stack).

Example: Filtering logs by customer

In Datadog:

service:myapp-api t9_customer_name:acme-corp

In Grafana Loki:

{t9_customer_name="acme-corp", app="myapp-api"}
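If you generate these filters programmatically (for example, to build per-customer dashboard links), a small helper can produce both forms. The function names are hypothetical, and the sketch assumes the values contain no characters that need escaping in either query language.

```python
def datadog_query(service, customer_name):
    """Build a Datadog log search scoped to one service and one customer."""
    return f"service:{service} t9_customer_name:{customer_name}"


def loki_selector(app, customer_name):
    """Build a Loki stream selector scoped to one app and one customer."""
    return f'{{t9_customer_name="{customer_name}", app="{app}"}}'


dd = datadog_query("myapp-api", "acme-corp")
loki = loki_selector("myapp-api", "acme-corp")
```

Called with `myapp-api` and `acme-corp`, the two helpers reproduce the queries shown above.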

Unified dashboards

With telemetry from all appliances flowing to your observability sink, you can create unified dashboards that aggregate metrics across customers:

Deployment health: Track successful vs. failed deployments across all appliances
Performance trends: Compare response times and error rates across customers
Resource utilization: Monitor database CPU, storage usage, function execution counts
Version adoption: See which customers are running which versions

You can also create customer-specific dashboards filtered to a single t9_appliance_id or t9_customer_name for troubleshooting individual appliances.
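A cross-customer panel such as "error rate by customer" boils down to a group-by on the identity tags. The sketch below shows that aggregation over hypothetical request records; the status field and the choice to count HTTP 5xx responses as errors are assumptions, and a real dashboard would express this in your platform's query language instead.

```python
from collections import Counter


def error_rate_by_customer(requests):
    """Return the fraction of 5xx responses per t9_customer_name."""
    totals, errors = Counter(), Counter()
    for request in requests:
        customer = request["t9_customer_name"]
        totals[customer] += 1
        if request["status"] >= 500:
            errors[customer] += 1
    return {customer: errors[customer] / totals[customer] for customer in totals}


# Hypothetical request telemetry; only the tag name is documented.
requests = [
    {"t9_customer_name": "acme-corp", "status": 200},
    {"t9_customer_name": "acme-corp", "status": 503},
    {"t9_customer_name": "acme-corp", "status": 200},
    {"t9_customer_name": "acme-corp", "status": 200},
    {"t9_customer_name": "globex", "status": 200},
    {"t9_customer_name": "globex", "status": 404},
]

rates = error_rate_by_customer(requests)
```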

Telemetry and customer data

Your responsibility: Tensor9 does not guarantee that your logs do not contain customer data. It is your responsibility as the vendor to ensure that your application does not log sensitive customer data (PII, financial information, proprietary content, or customer business data) that will be forwarded to your observability sink.

Observability telemetry should contain application logs and infrastructure metrics, not customer business data. While logs may include operational metadata (timestamps, user IDs, API endpoints, error codes), they should never include sensitive customer information. You must take precautions to prevent customer data from appearing in logs:

Sanitize logs: Remove or redact sensitive information before logging. Never log request/response payloads containing customer data.
Use structured logging: Log metadata and identifiers, not full payloads. Log user_id: 12345 instead of the entire user object.
Configure log levels: Use DEBUG/INFO for development, WARN/ERROR for production. Avoid verbose logging that may capture customer data.
Review what you collect: Audit what data flows to your observability sink. Test your logging to ensure no customer data leaks through.
Filter at the source: Configure log filters to exclude patterns that may contain sensitive data (credit card numbers, SSNs, API keys).
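One way to apply the "sanitize" and "filter at the source" precautions is a logging filter that redacts known patterns before a record is emitted. The sketch below uses Python's standard logging module. The two regular expressions are deliberately simple illustrations and will miss many real-world formats, so treat them as a starting point rather than a complete safeguard.

```python
import logging
import re

# Illustrative patterns only; real detection needs broader, tested rules.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[REDACTED-CARD]"),
]


def redact(text):
    """Replace anything that looks like an SSN or card number."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before handlers see the record."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True


logger = logging.getLogger("myapp")
logger.addFilter(RedactingFilter())

# Structured style: log identifiers, not payloads. The card number here is
# included only to show the filter catching a mistake.
logger.warning("payment failed user_id=%s card=%s", 12345, "4111 1111 1111 1111")
```

A filter like this is a backstop. The more reliable fix is to never pass sensitive values to the logger in the first place.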

Tensor9 forwards whatever telemetry your application emits; it is your responsibility to ensure that telemetry does not contain customer data.

Observability permissions

Telemetry collection requires steady-state permissions in customer appliances. These permissions are:

Read-only: Cannot modify infrastructure or customer data
Always active: Observability runs continuously without customer approval
Scoped to vendor resources: Can only access resources deployed by your application

Example steady-state role for observability in AWS:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/tensor9:instance": "${var.instance_id}"
        }
      }
    }
  ]
}

This role allows reading metrics only from resources tagged with the appliance’s instance_id.
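The scoping idea behind the condition can be pictured with a small check. This is a simplified mental model written for this page, not how AWS IAM evaluates policies: it only mirrors the two allowed actions and the tag comparison from the example above, and the function name and sample tag values are hypothetical.

```python
# Actions and tag key copied from the example policy above.
ALLOWED_ACTIONS = {"cloudwatch:GetMetricData", "cloudwatch:ListMetrics"}
TAG_KEY = "tensor9:instance"


def is_allowed(action, resource_tags, instance_id):
    """Allow only listed read actions on resources tagged for this appliance."""
    return action in ALLOWED_ACTIONS and resource_tags.get(TAG_KEY) == instance_id


own_resource = {"tensor9:instance": "inst-123"}
other_resource = {"tensor9:instance": "inst-999"}
untagged_resource = {}

read_own = is_allowed("cloudwatch:GetMetricData", own_resource, "inst-123")
read_other = is_allowed("cloudwatch:GetMetricData", other_resource, "inst-123")
read_untagged = is_allowed("cloudwatch:ListMetrics", untagged_resource, "inst-123")
write_own = is_allowed("cloudwatch:PutMetricData", own_resource, "inst-123")
```

The untagged case is the one to watch: a resource without the tag falls outside the scope, which is why tagging every resource matters.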

Alerting and incident response

Once telemetry flows to your observability platform, you can configure alerts that notify your team when issues occur across customer appliances:

Deployment failures: Alert when a deployment to any appliance fails
High error rates: Notify when error rates exceed thresholds
Performance degradation: Alert on slow response times or database latency
Resource exhaustion: Warn when databases approach storage limits

Alerts can include the t9_appliance_id and t9_customer_name, allowing you to quickly identify which customer is affected and route incidents to the right team.
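As a sketch of how the identity tags carry through to an alert, the function below turns per-appliance error rates into alert messages that name the affected customer. The input shape, the threshold value, and the message format are assumptions for illustration; in practice you would define this as a monitor in your observability platform.

```python
def build_alerts(error_rates, threshold):
    """Return one message per appliance whose error rate exceeds the threshold.

    error_rates maps (t9_appliance_id, t9_customer_name) to a rate in [0, 1].
    """
    alerts = []
    for (appliance_id, customer_name), rate in sorted(error_rates.items()):
        if rate > threshold:
            alerts.append(
                f"[{customer_name}/{appliance_id}] "
                f"error rate {rate:.1%} exceeds {threshold:.1%}"
            )
    return alerts


# Hypothetical per-appliance error rates.
error_rates = {
    ("apl-001", "acme-corp"): 0.08,
    ("apl-002", "globex"): 0.01,
}

alerts = build_alerts(error_rates, threshold=0.05)
```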

Best practices

Never log customer data

It is your responsibility to ensure your application does not log sensitive customer data (PII, financial information, customer business data). Tensor9 forwards whatever telemetry your application emits; it does not filter or sanitize logs for customer data. Implement log sanitization in your application code, avoid logging request/response payloads, and regularly audit what data flows to your observability sink.

Tag all resources with instance_id

Ensure every resource in your origin stack is tagged with the instance_id variable. This enables filtering telemetry by appliance and ensures observability permissions are correctly scoped.
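A pre-deployment check can catch untagged resources early. The sketch below assumes you can export your origin stack's resources as a list of dictionaries with a tags mapping, and it uses the tensor9:instance tag key from the example policy on this page; both the data shape and the function name are hypothetical, so adapt them to however your stack is defined.

```python
def untagged_resources(resources, tag_key="tensor9:instance"):
    """Return the names of resources missing a non-empty instance tag."""
    return [
        resource["name"]
        for resource in resources
        if not resource.get("tags", {}).get(tag_key)
    ]


# Hypothetical resource inventory exported from an origin stack.
resources = [
    {"name": "api-service", "tags": {"tensor9:instance": "inst-123"}},
    {"name": "orders-db", "tags": {"team": "payments"}},
    {"name": "uploads-bucket"},
]

missing = untagged_resources(resources)
```

Running a check like this in CI keeps telemetry filtering and permission scoping consistent as the stack grows.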

Related topics

Permissions Model: Understanding steady-state permissions for observability
Appliances: Customer environments where telemetry is collected
Deployments: Tracking deployment success through observability
Operations: Using observability to inform remote operations
