How observability works
When you deploy applications through Tensor9, each customer appliance runs in isolated infrastructure (their AWS account, Google Cloud project, or private environment). Without observability, you would have no visibility into how these appliances are performing, whether deployments succeeded, or how customers are using your application. Tensor9’s observability system solves this by collecting telemetry from each appliance and forwarding it to your centralized observability platform:
Telemetry generation
Resources in the customer appliance (containers, Lambda functions, databases, load balancers) generate logs, metrics, and traces during normal operation.
Collection
Tensor9 uses steady-state permissions to collect telemetry from appliance resources. Collection runs inside the appliance, tapping the telemetry your resources already emit and forwarding it to your control plane (see How telemetry flows for the per-runtime mechanism).
Forwarding
Collected telemetry is forwarded from the customer appliance to your control plane over secure connections. Your control plane then routes each telemetry stream to the observability sink (or sinks) you’ve configured. You decide which sources feed which sinks, per signal (see Telemetry routing).
Observability collection uses steady-state permissions, which are always active and read-only. Customers do not need to approve observability access; it runs continuously to ensure you maintain visibility into appliance health.
What you can observe
Tensor9 collects comprehensive telemetry from customer appliances:
Application logs
Logs from your application components running in customer appliances:
- Container logs: Logs from your workloads in EKS, GKE, AKS, or private Kubernetes, captured through your existing log agent (Datadog, OpenTelemetry, or Loki)
- Function logs: Execution logs from Lambda, Cloud Functions, or Azure Functions
- Application logs: Custom application logs written to CloudWatch, Cloud Logging, or other logging services
All logs are tagged with t9_appliance_id and t9_customer_name (plus t9_service_name for CloudWatch-sourced logs), allowing you to filter and correlate logs across customers.
Infrastructure metrics
Performance and health metrics from infrastructure resources:
- Compute metrics: CPU, memory, network for containers, VMs, or functions
- Database metrics: Connections, queries per second, replication lag, storage utilization
- Storage metrics: Object count, storage used, request rates for S3, GCS, or Azure Blob Storage
- Load balancer metrics: Request count, latency, error rates, healthy/unhealthy targets
Custom metrics
Application-level metrics you instrument in your code:
- Business metrics: User signups, API calls, feature usage
- Performance metrics: Request duration, queue depth, cache hit rates
- Error tracking: Exception rates, failed operations, validation errors
Distributed traces
Request traces across your application components:
- Cross-service traces: Track requests across microservices, databases, and external APIs
- Performance analysis: Identify slow operations and bottlenecks
- Dependency mapping: Visualize how services communicate within an appliance
Observability sinks
An observability sink is a destination your telemetry is forwarded to. You can configure multiple sinks and route different telemetry to each. For example, application logs can go to Datadog while high-volume infrastructure logs go to CloudWatch. Tensor9 supports these sink types natively:

| Sink | Logs | Metrics | Traces | Configuration |
|---|---|---|---|---|
| Datadog | ✓ | ✓ | ✓ | API key and site |
| CloudWatch | ✓ | ✓ | | Default credentials or cross-account role |
| OpenTelemetry (OTLP) (coming soon) | ✓ | ✓ | OTLP endpoint and optional authentication | |
| Loki | ✓ | | | Endpoint and credentials |
| Elasticsearch (coming soon) | ✓ | | | Cluster endpoint(s) and credentials |
| Prometheus Remote Write | | ✓ | | Endpoint and credentials |
Telemetry routing
By default, each sink receives only the telemetry from its matching source: a Datadog sink receives Datadog telemetry, a CloudWatch sink receives CloudWatch logs, a Loki sink receives Loki logs. This keeps high-volume infrastructure logs (such as Kubernetes or CloudWatch control-plane logs) out of your SaaS sinks, where they would inflate cost, unless you deliberately send them there. When you need a different topology, you control exactly which sources feed which sinks, independently for logs, metrics, and traces. You edit routing visually in the portal’s Routing view; see Route your telemetry.
Telemetry sources
Tensor9 recognizes telemetry from these sources in your appliances:

| Source | Logs | Metrics | Traces |
|---|---|---|---|
| Datadog (Datadog Agent) | ✓ | ✓ | ✓ |
| OpenTelemetry (OTLP) | ✓ | ✓ | ✓ |
| Loki | ✓ | | |
| Prometheus | | ✓ | |
| CloudWatch | ✓ | | |
Per-signal routing
Routing is per signal. You can send a source’s logs to one sink and its metrics to another, or fan a single source out to several sinks. For each sink, logs, metrics, and traces are routed independently:
- Default (no routes set): the sink receives only its matching source for each signal. OpenTelemetry and Elasticsearch sinks have no matching source (OTLP is vendor-neutral, so it is routed explicitly rather than matched by default), so they receive nothing until you route something to them. Both sink types are coming soon.
- Routed: the sink receives exactly the sources you connect, for that signal.
- Disabled: remove every route for a signal and that signal is no longer delivered to that sink.
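The three routing states above can be pictured with a small sketch. This is only an illustration of the described behavior, not Tensor9's implementation: the `MATCHING_SOURCE` table, the `sources_for` function, and the convention that `None` means "no routes set" while an empty set means "disabled" are all assumptions made for the example. It also ignores which signals a given source actually emits.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping from sink type to its "matching" source.
# OTLP and Elasticsearch sinks have no matching source, so they map to None.
MATCHING_SOURCE: Dict[str, Optional[str]] = {
    "datadog": "datadog",
    "cloudwatch": "cloudwatch",
    "loki": "loki",
    "otlp": None,
    "elasticsearch": None,
}


def sources_for(
    sink_type: str,
    signal: str,
    routes: Dict[str, Optional[Set[str]]],
) -> Set[str]:
    """Return the sources delivered to one sink for one signal.

    Assumed convention: a missing or None entry means no routes were ever
    set (default), a non-empty set means explicit routing, and an empty
    set means every route was removed (disabled).
    """
    configured = routes.get(signal)
    if configured is None:
        match = MATCHING_SOURCE.get(sink_type)
        return {match} if match else set()
    return set(configured)


# Default: a Datadog sink receives only Datadog logs.
default_logs = sources_for("datadog", "logs", {})
# Routed: the sink receives exactly the connected sources.
routed_logs = sources_for("datadog", "logs", {"logs": {"datadog", "cloudwatch"}})
# Disabled: all routes removed, so nothing is delivered for that signal.
disabled_metrics = sources_for("datadog", "metrics", {"metrics": set()})
```

Because each signal is looked up separately, changing the routes for logs in this model has no effect on metrics or traces for the same sink.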
Configuring observability
Observability is off by default; you turn it on per appliance, and optionally per resource, in the vendor portal.
Configuring Observability
Add sinks, wire up routing, control which appliances and resources are observed, and instrument telemetry in your origin stack.
How the pipeline works
Under the hood, telemetry moves through a fixed pipeline:
- Buffered in native cloud logging. Collected logs, metrics, and traces are written to the customer’s native logging service (CloudWatch Logs on AWS, Cloud Logging on Google Cloud, Azure Monitor Logs on Azure), which buffers them and doubles as the customer audit trail.
- Forwarded to your control plane. A forwarder tags each record at the edge with its appliance and customer identity, then sends it on to a stream that you (the vendor) own.
- Pushed to your sinks. A router reads the stream, applies your routing, and pushes each telemetry stream to its configured sink. Sink credentials are applied in your control plane and never leave it; they are never deployed to a customer appliance.
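The forward and push stages can be modeled in a few lines. The sketch below is a simplified, in-memory picture under stated assumptions: `Record`, `tag_at_edge`, `route`, the sink name `datadog-prod`, and the sample identifiers are hypothetical, and the buffering stage in the native log service is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    source: str  # e.g. "datadog" or "cloudwatch"
    signal: str  # "logs", "metrics", or "traces"
    body: str
    tags: Dict[str, str] = field(default_factory=dict)


def tag_at_edge(record: Record, appliance_id: str, customer_name: str) -> Record:
    """Forwarder stage: stamp identity tags before the record leaves the appliance."""
    tags = {
        **record.tags,
        "t9_appliance_id": appliance_id,
        "t9_customer_name": customer_name,
    }
    return Record(record.source, record.signal, record.body, tags)


def route(
    stream: List[Record],
    routing: Dict[Tuple[str, str], List[str]],
) -> Dict[str, List[Record]]:
    """Router stage: deliver each record to every sink routed for its (source, signal)."""
    delivered: Dict[str, List[Record]] = {}
    for record in stream:
        for sink in routing.get((record.source, record.signal), []):
            delivered.setdefault(sink, []).append(record)
    return delivered


# Hypothetical example: one Datadog log line from appliance "app-123".
stream = [tag_at_edge(Record("datadog", "logs", "GET /health 200"), "app-123", "acme")]
delivered = route(stream, {("datadog", "logs"): ["datadog-prod"]})
```

Note that sink credentials do not appear anywhere in the tagging step; in this model they would only be needed by whatever consumes `delivered` in the control plane.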
Customer audit trail
Before any telemetry leaves the customer’s environment, it is recorded in their own native log service: CloudWatch Logs on AWS, Cloud Logging on Google Cloud, and Azure Monitor Logs on Azure. The collected telemetry passes through this local record on its way to your control plane, so the customer keeps a complete, independent copy of exactly what was captured and forwarded out of their account. They can audit everything that crosses their boundary. In the future, customers will also be able to redact and filter this telemetry before it is forwarded, giving them direct control over what leaves their environment.
Observability across form factors
Observability collection adapts to each appliance’s form factor:

| Form Factor | Log Collection | Metrics Collection | Trace Collection |
|---|---|---|---|
| AWS | CloudWatch Logs | CloudWatch Metrics, resource-specific metrics | X-Ray or application instrumentation |
| Google Cloud | Cloud Logging | Cloud Monitoring, resource-specific metrics | Cloud Trace or application instrumentation |
| Azure | Azure Monitor Logs | Azure Monitor Metrics, resource-specific metrics | Application Insights or application instrumentation |
| DigitalOcean | Logs via Fluent Bit/Fluentd | Prometheus metrics | OpenTelemetry Collector |
| Private Kubernetes | Logs via Fluent Bit/Fluentd | Prometheus metrics | OpenTelemetry Collector |
| On-prem | Logs via Fluent Bit/Fluentd | Prometheus metrics | OpenTelemetry Collector |
How telemetry flows
Tensor9 captures the telemetry your application already emits and carries it to your sinks, without new agents or application-code changes. How it taps in depends on the runtime:
Kubernetes
Tensor9 deploys a lightweight collection DaemonSet to the cluster and redirects your existing telemetry agents to it. Whatever your workloads already run (the Datadog Agent, an OpenTelemetry Collector, Prometheus, or Loki) keeps running unchanged but sends through the Tensor9 collector, which tags its logs, metrics, and traces with the appliance and customer metadata and forwards them to your control plane. No changes to your workloads.
AWS Lambda
Tensor9 injects a Lambda extension into your functions during compilation. The extension intercepts the function’s telemetry (Datadog, OpenTelemetry, and so on), tags it, and forwards it, again without changing your function code. In both cases the capture happens inside the customer’s environment; only the resulting telemetry leaves it.
Native cloud logs
Both paths above work by emitting your application’s logs, metrics, and traces into the native cloud log service (CloudWatch Logs on AWS, and the equivalent on other clouds). Tensor9 collects by mirroring that service, so anything written to it flows to your sinks. A useful consequence: the cloud’s own logs are forwarded too, with no extra setup. Control-plane logs (for example, EKS control-plane logs) and VPC flow logs reach your sinks through the same path.
Appliance identification
Every forwarded telemetry record is stamped with two Tensor9 identity tags so you can attribute it to an appliance and customer:
- t9_appliance_id: Tensor9’s unique identifier for the appliance
- t9_customer_name: Customer that owns the appliance
CloudWatch-sourced logs also carry a t9_service_name tag (derived from the log group), while Datadog, Loki, and Prometheus telemetry carry the service in their native tag/label (Datadog service, Kubernetes app/container). Tensor9 doesn’t override those.
These tags let you filter, group, and correlate telemetry across customers and services. They’re distinct from instance_id, the origin-stack variable you tag resources with (see Configure telemetry in your origin stack).
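To make the filtering and grouping concrete, here is a minimal sketch that treats each forwarded record as a flat dictionary carrying the identity tags. The record shape, the helper names `filter_by_tags` and `count_by_customer`, and the sample values are assumptions for illustration; in practice you would run the equivalent query in your observability platform.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def filter_by_tags(records: Iterable[Record], **wanted: str) -> Iterator[Record]:
    """Yield only the records whose tags match every requested value."""
    for record in records:
        if all(record.get(tag) == value for tag, value in wanted.items()):
            yield record


def count_by_customer(records: Iterable[Record]) -> Counter:
    """Group records by customer, counting how many each one produced."""
    return Counter(r.get("t9_customer_name", "unknown") for r in records)


# Hypothetical sample records with made-up identifiers.
records = [
    {"t9_appliance_id": "app-1", "t9_customer_name": "acme", "message": "started"},
    {"t9_appliance_id": "app-2", "t9_customer_name": "globex", "message": "started"},
    {"t9_appliance_id": "app-1", "t9_customer_name": "acme", "message": "timeout"},
]

acme_only = list(filter_by_tags(records, t9_customer_name="acme"))
per_customer = count_by_customer(records)
```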
Example: Filtering logs by customer
In Datadog:
Unified dashboards
With telemetry from all appliances flowing to your observability sink, you can create unified dashboards that aggregate metrics across customers:
- Deployment health: Track successful vs. failed deployments across all appliances
- Performance trends: Compare response times and error rates across customers
- Resource utilization: Monitor database CPU, storage usage, function execution counts
- Version adoption: See which customers are running which versions
Filter dashboards by t9_appliance_id or t9_customer_name for troubleshooting individual appliances.
Telemetry and customer data
Your responsibility: Tensor9 does not guarantee that your logs do not contain customer data. It is your responsibility as the vendor to ensure that your application does not log sensitive customer data (PII, financial information, proprietary content, or customer business data) that will be forwarded to your observability sink.
- Sanitize logs: Remove or redact sensitive information before logging. Never log request/response payloads containing customer data.
- Use structured logging: Log metadata and identifiers, not full payloads. Log user_id: 12345 instead of the entire user object.
- Configure log levels: Use DEBUG/INFO for development, WARN/ERROR for production. Avoid verbose logging that may capture customer data.
- Review what you collect: Audit what data flows to your observability sink. Test your logging to ensure no customer data leaks through.
- Filter at the source: Configure log filters to exclude patterns that may contain sensitive data (credit card numbers, SSNs, API keys).
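One way to filter at the source is a redacting filter in your application's logging setup. The sketch below uses Python's standard `logging` module; the `RedactingFilter` class name and the two regular expressions are illustrative assumptions. The patterns are deliberately simple, will not catch every format, and may also redact other long digit runs, so treat this as a starting point rather than a complete safeguard.

```python
import logging
import re

# Illustrative patterns only: US SSN-style numbers and 13-16 digit card numbers.
_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[REDACTED-CARD]"),
]


class RedactingFilter(logging.Filter):
    """Rewrite each log message so matching patterns never reach a handler's output."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # format first so arguments are scanned too
        for pattern, replacement in _PATTERNS:
            message = pattern.sub(replacement, message)
        record.msg = message
        record.args = None  # the message is already formatted
        return True  # keep the record, now redacted


# Attach the filter to the handler so it applies to every logger that uses it.
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("app")
logger.addHandler(handler)

# Structured style: log the identifier, not the whole user object.
logger.warning("payment failed user_id=%s", 12345)
```

Attaching the filter to the handler rather than to a single logger means records from child loggers that propagate to this handler are redacted as well.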
Observability permissions
Telemetry collection requires steady-state permissions in customer appliances. These permissions are:
- Read-only: Cannot modify infrastructure or customer data
- Always active: Observability runs continuously without customer approval
- Scoped to vendor resources: Can only access resources deployed by your application
Scoping relies on resources being tagged with instance_id.
Alerting and incident response
Once telemetry flows to your observability platform, you can configure alerts that notify your team when issues occur across customer appliances:
- Deployment failures: Alert when a deployment to any appliance fails
- High error rates: Notify when error rates exceed thresholds
- Performance degradation: Alert on slow response times or database latency
- Resource exhaustion: Warn when databases approach storage limits
Alerts can include t9_appliance_id and t9_customer_name, allowing you to quickly identify which customer is affected and route incidents to the right team.
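A high-error-rate rule of this kind might be evaluated roughly as follows. This is a sketch under assumptions: the event shape (a dictionary with the two identity tags and an HTTP-style `status` field), the function name `appliances_over_threshold`, and the 5% default threshold are all hypothetical. Real alerts would normally be defined in your observability platform rather than in application code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (t9_customer_name, t9_appliance_id)


def appliances_over_threshold(
    events: Iterable[dict],
    threshold: float = 0.05,
) -> Dict[Key, float]:
    """Return the error rate of every appliance whose rate exceeds the threshold."""
    totals: Dict[Key, int] = defaultdict(int)
    errors: Dict[Key, int] = defaultdict(int)
    for event in events:
        key = (event["t9_customer_name"], event["t9_appliance_id"])
        totals[key] += 1
        if event["status"] >= 500:  # count server-side failures as errors
            errors[key] += 1
    rates = {key: errors[key] / totals[key] for key in totals}
    return {key: rate for key, rate in rates.items() if rate > threshold}


# Hypothetical events: one appliance is failing half of its requests.
events = [
    {"t9_customer_name": "acme", "t9_appliance_id": "app-1", "status": 200},
    {"t9_customer_name": "acme", "t9_appliance_id": "app-1", "status": 503},
    {"t9_customer_name": "globex", "t9_appliance_id": "app-2", "status": 200},
]
breaching = appliances_over_threshold(events)
```

Keying the result on both tags gives each alert enough context to identify the affected customer and appliance without a second lookup.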
Best practices
Never log customer data
It is your responsibility to ensure your application does not log sensitive customer data (PII, financial information, customer business data). Tensor9 forwards whatever telemetry your application emits; it does not filter or sanitize logs for customer data. Implement log sanitization in your application code, avoid logging request/response payloads, and regularly audit what data flows to your observability sink.
Tag all resources with instance_id
Ensure every resource in your origin stack is tagged with the instance_id variable. This enables filtering telemetry by appliance and ensures observability permissions are correctly scoped.
Related topics
- Permissions Model: Understanding steady-state permissions for observability
- Appliances: Customer environments where telemetry is collected
- Deployments: Tracking deployment success through observability
- Operations: Using observability to inform remote operations