
# Troubleshooting

Common issues and how to resolve them.

## General Issues

These apply regardless of environment.

### Terraform Apply Fails

**Common causes:**

* **Permission denied** - Verify your cloud credentials have the required permissions (see [Prerequisites](/customer/getting-started/prerequisites))
* **Resource quota exceeded** - Check your cloud account's service limits
* **State conflict after a partial failure** - Re-run `terraform apply`. Terraform records what it has already created in its state, so it should pick up where it left off

### DNS Not Resolving

**Symptom:** Domain configured but not resolving.

**Check:**

```bash theme={null}
dig <your-domain>
nslookup <your-domain>
```

**Common causes:**

* **Propagation delay** - DNS changes can take up to 48 hours to propagate (usually much faster)
* **Incorrect DNS provider credentials** - Verify your Route 53 or Cloudflare credentials are correct
* **Zone delegation** - For custom domains, verify you've delegated to the correct nameservers
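
One way to tell a propagation delay apart from a delegation problem is to compare what the zone's own nameserver returns with what a public resolver returns. The sketch below is illustrative only: `compare_dns` is a hypothetical helper name, and `1.1.1.1` is simply one public resolver chosen for the example.

```bash
# Hypothetical helper: print the answer from the zone's nameserver next to a public resolver's answer.
# $1 = domain to check, $2 = zone that should hold the NS records (for example, the apex domain)
compare_dns() {
  local domain="$1" zone="$2" ns
  # Take the first nameserver listed for the zone
  ns="$(dig +short NS "$zone" | head -n 1)"
  if [ -z "$ns" ]; then
    echo "no NS records found for $zone: check zone delegation"
    return 1
  fi
  echo "authoritative ($ns):"
  dig +short "@$ns" "$domain"
  echo "public (1.1.1.1):"
  dig +short @1.1.1.1 "$domain"
}

# Usage: compare_dns <your-domain> <your-zone>
```

If the authoritative nameserver returns the expected record but the public resolver does not, the change is probably still propagating. If no nameservers are found, or the authoritative nameserver returns nothing, delegation or record creation is the more likely cause.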

### Can't Reach the Application

<Steps>
  <Step title="Check DNS">
    Verify DNS is resolving: `dig <your-domain>`
  </Step>

  <Step title="Check load balancer">
    Check that the load balancer or ingress in front of the application is healthy and is routing to healthy targets.
  </Step>

  <Step title="Check TLS">
    Verify TLS certificates are valid: `curl -v https://<your-domain>`
  </Step>
</Steps>
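
The TLS step can be narrowed down by looking at the certificate's validity dates and the HTTP status separately. This is a minimal sketch, assuming `openssl` and `curl` are installed locally; `check_endpoint` is a hypothetical helper name.

```bash
# Hypothetical helper: print the served certificate's validity window and the HTTP status for a domain.
# $1 = domain to check
check_endpoint() {
  local domain="$1"
  echo "certificate dates:"
  # s_client fetches the certificate the server presents; x509 prints notBefore/notAfter
  openssl s_client -connect "${domain}:443" -servername "$domain" </dev/null 2>/dev/null \
    | openssl x509 -noout -dates
  echo "HTTP status:"
  # -w '%{http_code}' prints only the status code; 000 means no HTTP response was received
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "https://${domain}"
}

# Usage: check_endpoint <your-domain>
```

A `notAfter` date in the past points to an expired certificate, while a status of `000` with valid dates suggests the connection is failing before TLS completes (for example, at the load balancer).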

## Environment-Specific Issues

<Tabs>
  <Tab title="Kubernetes">
    ### Controller Pod Won't Start

    **Symptom:** Pod stuck in `Pending`, `CrashLoopBackOff`, or `ImagePullBackOff`.

    **Check pod status:**

    ```bash theme={null}
    kubectl describe pod -n <namespace> <pod-name>
    ```

    **Common causes:**

    | Status             | Likely Cause                              | Fix                                                               |
    | ------------------ | ----------------------------------------- | ----------------------------------------------------------------- |
    | `Pending`          | Insufficient CPU or memory on nodes       | Scale up your cluster or free resources: `kubectl describe nodes` |
    | `CrashLoopBackOff` | Configuration error or missing dependency | Check logs: `kubectl logs -n <namespace> <pod-name>`              |
    | `ImagePullBackOff` | Container registry access issue           | Verify your cluster can pull images from the internet             |
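
    For a pod that keeps restarting, the current container's logs are often empty because it has only just started. A small sketch that prints the waiting reason and the logs of the previous container instance follows; `pod_diagnose` is a hypothetical helper name.

    ```bash
    # Hypothetical helper: print why a pod's containers are waiting, then the previous container's logs.
    # $1 = namespace, $2 = pod name
    pod_diagnose() {
      local ns="$1" pod="$2"
      # Prints reasons such as CrashLoopBackOff or ImagePullBackOff (empty if no container is waiting)
      kubectl get pod -n "$ns" "$pod" \
        -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}{"\n"}'
      # --previous shows output from the prior container instance, usually the one that crashed
      kubectl logs -n "$ns" "$pod" --previous --tail=50
    }

    # Usage: pod_diagnose <namespace> <pod-name>
    ```

    For a `Pending` pod the first command prints nothing, since no container has been created yet; the Events section of `kubectl describe pod` is the better source in that case.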

    ### Setup Wizard Shows "Waiting for Controller"

    The controller hasn't connected to our systems yet. This usually resolves in 2-5 minutes.

    **If it persists:**

    <Steps>
      <Step title="Check pod">
        Verify the pod is running: `kubectl get pods -n <namespace>`
      </Step>

      <Step title="Check logs">
        Check logs for connection errors: `kubectl logs -n <namespace> <pod-name>`
      </Step>

      <Step title="Check network">
        Verify outbound internet access from the pod's namespace.
      </Step>

      <Step title="Check policies">
        Check whether network policies are blocking egress on port 443: `kubectl get networkpolicy -n <namespace>`
      </Step>
    </Steps>
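
    The network and policy steps above can be sketched as a single check run from a throwaway pod in the same namespace. Everything named here is an assumption for illustration: `check_egress` is a hypothetical helper, `egress-test` is an arbitrary pod name, `curlimages/curl` is one publicly available image that ships `curl`, and the URL is whatever HTTPS endpoint you choose to test against (this page does not state the controller's endpoint).

    ```bash
    # Hypothetical helper: list network policies, then try an outbound HTTPS request from a temporary pod.
    # $1 = namespace, $2 = HTTPS URL to test against
    check_egress() {
      local ns="$1" url="$2"
      # Any policy listed here that includes an Egress policy type may restrict outbound traffic
      kubectl get networkpolicy -n "$ns"
      # --rm deletes the pod when the command exits; -i is required for --rm to attach
      kubectl run egress-test -n "$ns" --rm -i --restart=Never \
        --image=curlimages/curl --command -- \
        curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "$url"
    }

    # Usage: check_egress <namespace> https://<endpoint>
    ```

    A timeout or a `000` status from inside the pod, when the same URL works from your workstation, suggests that egress is blocked by a network policy or by cluster-level routing.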

    ### Application Not Responding

    <Steps>
      <Step title="Controller status">
        Check controller status: `kubectl get pods -n <namespace>`
      </Step>

      <Step title="App pod status">
        Check application pod status: `kubectl get pods -n <namespace> -l app=<app-label>`
      </Step>

      <Step title="App logs">
        View application logs: `kubectl logs -n <namespace> <app-pod-name>`
      </Step>

      <Step title="Events">
        Check events: `kubectl get events -n <namespace> --sort-by='.lastTimestamp'`
      </Step>

      <Step title="Contact us">
        Contact us with the error details.
      </Step>
    </Steps>

    ### High Resource Usage

    ```bash theme={null}
    kubectl top pods -n <namespace>
    kubectl top nodes
    ```

    If the controller or application is consuming more resources than expected, contact us. It may indicate a configuration issue or a need to scale.
  </Tab>

  <Tab title="AWS">
    ### Controller Instance Won't Start

    **Symptom:** EC2 instance in `stopped` or `terminated` state.

    **Check:**

    ```bash theme={null}
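    # --include-all-instances is needed to see stopped instances; by default only running ones are returned
    aws ec2 describe-instance-status --include-all-instances --instance-ids <instance-id>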
    ```

    **Common causes:**

    * **Insufficient instance quota** - Request a limit increase in your AWS account
    * **IAM role issues** - Verify the IAM role from the infrastructure template was created successfully
    * **Subnet issues** - Ensure the private subnet has a route to a NAT gateway for outbound access

    ### Setup Wizard Shows "Waiting for Controller"

    The controller hasn't connected to our systems yet. This usually resolves in 2-5 minutes.

    **If it persists:**

    <Steps>
      <Step title="Check instance">
        Verify the instance is running: `aws ec2 describe-instances --instance-ids <instance-id>`
      </Step>

      <Step title="Check logs">
        Check the system log in the AWS console: EC2 > Instances > your instance > Actions > Monitor and troubleshoot > Get system log.
      </Step>

      <Step title="Check security group">
        Verify the security group allows outbound HTTPS (port 443).
      </Step>

      <Step title="Check routing">
        Verify the private subnet routes through a NAT gateway.
      </Step>
    </Steps>
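
    The log, security group, and routing checks above can also be run from the AWS CLI. This is a sketch under assumptions: `get_system_log` and `check_aws_egress` are hypothetical helper names, and you need to look up the security group and subnet IDs attached to the controller instance yourself.

    ```bash
    # Hypothetical helper: print the instance's console (system) log as plain text.
    # $1 = instance ID
    get_system_log() {
      aws ec2 get-console-output --instance-id "$1" --query Output --output text
    }

    # Hypothetical helper: show the egress rules and subnet routes relevant to outbound HTTPS.
    # $1 = security group ID, $2 = subnet ID
    check_aws_egress() {
      local sg="$1" subnet="$2"
      # Look for a rule that permits TCP 443 (or all traffic) to the internet
      aws ec2 describe-security-groups --group-ids "$sg" \
        --query 'SecurityGroups[].IpPermissionsEgress[]' --output json
      # Look for a 0.0.0.0/0 route whose target is a NAT gateway (NatGatewayId).
      # An empty result can mean the subnet uses the VPC's main route table instead.
      aws ec2 describe-route-tables \
        --filters "Name=association.subnet-id,Values=${subnet}" \
        --query 'RouteTables[].Routes[]' --output json
    }

    # Usage:
    #   get_system_log <instance-id>
    #   check_aws_egress <security-group-id> <subnet-id>
    ```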

    ### Application Not Responding

    <Steps>
      <Step title="Check instance">
        Verify the controller instance is running (see above).
      </Step>

      <Step title="Check Customer Portal">
        Check your Customer Portal for deployment errors.
      </Step>

      <Step title="Check logs">
        View the controller system log via the AWS console.
      </Step>

      <Step title="Contact us">
        Contact us with the error details.
      </Step>
    </Steps>

    ### High Resource Usage

    Check CloudWatch metrics for the controller instance:

    ```bash theme={null}
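    # Average CPU over the last hour in 5-minute buckets.
    # The start time tries BSD/macOS `date -v` first, then falls back to GNU `date -d` (Linux).
    aws cloudwatch get-metric-statistics \
      --namespace AWS/EC2 \
      --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value=<instance-id> \
      --start-time "$(date -u -v-1H +%Y-%m-%dT%H:%M:%S 2>/dev/null || date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
      --end-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \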
      --period 300 \
      --statistics Average
    ```

    If the controller is consuming more resources than expected, contact us. It may indicate a configuration issue or a need to resize the instance.
  </Tab>
</Tabs>

## Getting Help

If you can't resolve an issue:

<Steps>
  <Step title="Gather information">
    * Controller logs
    * Controller status
    * Any error messages from the setup wizard or Customer Portal
    * When the issue started
  </Step>

  <Step title="Contact us">
    Send us the above information. The more detail you provide, the faster we can help.
  </Step>

  <Step title="Enable troubleshooting permissions">
    If we need to investigate, enable the appropriate [permission tier](/customer/security/permissions). You can revoke it as soon as the investigation is complete.
  </Step>
</Steps>
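
For a Kubernetes environment, the information in the first step can be collected in one pass. The following is a minimal sketch rather than an official tool: `gather_diagnostics` is a hypothetical helper name, and the file names and the 1000-line log limit are arbitrary choices.

```bash
# Hypothetical helper: collect controller diagnostics into a directory and archive it.
# $1 = namespace, $2 = controller pod name, $3 = output directory
gather_diagnostics() {
  local ns="$1" pod="$2" out="$3"
  mkdir -p "$out"
  # Record when the bundle was collected (UTC)
  date -u +%Y-%m-%dT%H:%M:%SZ > "$out/collected-at.txt"
  # Capture stderr as well so that kubectl errors end up in the bundle
  kubectl get pods -n "$ns" -o wide > "$out/pods.txt" 2>&1
  kubectl describe pod -n "$ns" "$pod" > "$out/describe.txt" 2>&1
  kubectl logs -n "$ns" "$pod" --tail=1000 > "$out/logs.txt" 2>&1
  kubectl get events -n "$ns" --sort-by='.lastTimestamp' > "$out/events.txt" 2>&1
  # Archive the directory next to itself and print the archive path
  tar -czf "${out}.tar.gz" -C "$(dirname "$out")" "$(basename "$out")"
  echo "${out}.tar.gz"
}

# Usage: gather_diagnostics <namespace> <pod-name> ./controller-diagnostics
```

Logs can contain sensitive values, so it is worth reviewing the files before sending the archive.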
