Version: v4 (current)

Failures and Diagnostics

Orchestrator failures can come from provider setup, git synchronization, cache restore/save, container startup, engine execution, artifact upload, or cleanup. The right response depends on where the failure happened: retrying a provider API timeout is different from clearing a corrupt Unity Library, fixing an LFS pointer, or garbage-collecting stale cloud resources.

This page covers the failure-handling model, diagnostic options, and remediation paths available in Orchestrator. Unity build diagnostics are included because Unity is the primary built-in engine package, but the same investigation flow applies to Godot, Unreal, custom jobs, and custom provider types.

Failure Layers

Start by identifying the layer that failed before changing caches or retry settings.

| Layer | Common signals | First response |
| --- | --- | --- |
| Provider setup | Authentication errors, missing region/cluster, quota errors | Check provider credentials, provider-specific inputs, and IAM/RBAC. |
| Git sync and LFS | Clone errors, missing files, pointer text in binary files | Check gitPrivateToken, LFS hydration, submodule profile settings. |
| Cache restore/save | Empty Library, stale assemblies, failed archive extraction | Inspect cache mode/key/root; clear only the affected cache segment. |
| Container or runner startup | Docker pull failure, entrypoint errors, host path mismatch | Verify image, runner filesystem, mounts, and provider working dir. |
| Engine execution | Editor log errors, test failures, build method not invoked | Read the engine log and apply engine-specific diagnostics. |
| Output and artifacts | Build succeeded but upload failed or output path is empty | Check buildsPath, artifact paths, storage provider, and retention. |
| Cleanup and garbage collect | Stale tasks, old pods, orphaned volumes, locked workspaces | Run list/watch/garbage-collect commands before deleting state. |
| Custom provider/plugin logic | Provider method throws, protocol JSON invalid, stderr only | Run the provider directly with debug logging and inspect stderr. |

Avoid treating every failure as cache corruption. Cache deletion is expensive and can hide the real root cause.

Diagnostic Controls

Use these controls to increase visibility before changing behavior:

| Control | Where used | What it gives you |
| --- | --- | --- |
| orchestratorDebug | Unity Builder / Orchestrator | Verbose provider, workspace, and resolved-parameter logging. |
| resourceTracking | Unity Builder / Orchestrator | Disk/resource summaries during provider runs. |
| githubCheck | GitHub Actions integrations | GitHub Check records for Orchestrator steps. |
| watchToEnd | Async provider runs | Follows provider logs until completion when supported. |
| game-ci status | Standalone Orchestrator CLI | Local environment diagnostics. |
| game-ci list-resources | Standalone Orchestrator CLI | Active provider resources, when supported by the selected provider. |
| game-ci watch | Standalone Orchestrator CLI | Logs or state for an active workflow. |
| game-ci garbage-collect | Standalone Orchestrator CLI | Cleanup of stale provider resources and caches. |

Provider stderr is treated as diagnostic output. Config-defined providers, CLI protocol providers, and TypeScript providers should write actionable setup or API details to stderr/loggers and reserve thrown errors for failures Orchestrator should treat as unsuccessful lifecycle operations.
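
The stderr-versus-throw split described above could look roughly like the sketch below. The `SketchProviderResult` type and `setupSketchProvider` function are hypothetical names made up for illustration; they are not the actual Orchestrator provider interface.

```typescript
// Hypothetical shape for illustration only; not the real provider interface.
interface SketchProviderResult {
  ok: boolean;
  detail: string;
}

// Diagnostic details go to stderr (or an injected logger).
// Only a failed lifecycle operation throws.
function setupSketchProvider(
  region: string | undefined,
  logDiagnostic: (message: string) => void = (m) => console.error(m),
): SketchProviderResult {
  if (!region) {
    logDiagnostic('setup: no region configured; check provider-specific inputs');
    throw new Error('Provider setup failed: missing region');
  }
  logDiagnostic('setup: using region ' + region);
  return { ok: true, detail: region };
}
```

Injecting the logger keeps the sketch testable; a real provider would more likely write to stderr directly.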

Investigation Flow

  1. Find the failing layer - provider setup, sync, cache, runner startup, engine execution, output, or cleanup.
  2. Read the authoritative log - provider API output for setup failures, git output for sync failures, engine logs for build/test failures, and artifact/storage logs for upload failures.
  3. Check whether the failure is transient - cloud API throttling and network pulls may justify retry; deterministic compile errors do not.
  4. Apply targeted remediation - clear only the affected cache area, retry only the failed provider operation, or fix the input/configuration that caused the failure.
  5. Escalate to full reset last - full workspace, cache, or Library deletion should be the final recovery step, not the default.
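
Step 3 of the flow above, deciding whether a failure is transient, can be sketched as a small pattern check. The patterns here are illustrative assumptions, not a list taken from Orchestrator; tune them to the messages your provider and engine actually emit.

```typescript
type FailureKind = 'transient' | 'deterministic' | 'unknown';

// Illustrative patterns only (assumptions, not an official list).
const transientPatterns: RegExp[] = [/throttl/i, /rate limit/i, /timed? ?out/i, /connection reset/i];
const deterministicPatterns: RegExp[] = [/error CS\d{4}/, /Scripts have compiler errors/];

function classifyRetryability(logText: string): FailureKind {
  // Deterministic evidence wins: a compile error will not go away on retry,
  // even if the same log also mentions an incidental timeout.
  if (deterministicPatterns.some((p) => p.test(logText))) return 'deterministic';
  if (transientPatterns.some((p) => p.test(logText))) return 'transient';
  return 'unknown';
}
```

An `unknown` result is a prompt to read the authoritative log, not a licence to retry.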

Provider and Infrastructure Failures

Provider failures usually appear before the engine starts. Common causes include expired cloud credentials, missing Kubernetes context, unavailable Docker daemon, private image pull failures, quota limits, stale async workflows, and provider-specific resource naming conflicts.

Useful commands:

game-ci list-resources --provider-strategy aws
game-ci watch --provider-strategy aws
game-ci garbage-collect --provider-strategy aws
game-ci status

Use provider-specific docs when the failing layer is clearly infrastructure-related.

Cache and Workspace Failures

Cache failures should be handled surgically. For Unity, clearing Library/ScriptAssemblies, Library/Bee, or Library/PackageCache is often enough. For Godot, imported asset caches such as .godot/imported may be the right target. For Unreal, Derived Data Cache and intermediate build folders are usually better first targets than deleting the whole workspace.
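
As a minimal sketch, the first-choice cleanup targets named above can be kept as data so that a cleanup script never defaults to deleting the whole workspace. The function name is hypothetical, and the Unreal folder names (`DerivedDataCache`, `Intermediate`) are assumptions based on common project layouts rather than paths this page specifies.

```typescript
type Engine = 'unity' | 'godot' | 'unreal';

const firstCacheTargets: Record<Engine, string[]> = {
  unity: ['Library/ScriptAssemblies', 'Library/Bee', 'Library/PackageCache'],
  godot: ['.godot/imported'],
  // Assumed folder names for Unreal; verify against your project layout.
  unreal: ['DerivedDataCache', 'Intermediate'],
};

// Returns the paths to consider clearing first, relative to the given project root.
function cacheTargetsFor(engine: Engine, projectRoot: string): string[] {
  const root = projectRoot.replace(/\/+$/, '');
  return firstCacheTargets[engine].map((rel) => root + '/' + rel);
}
```

The function only computes paths; which one to clear still depends on the failure you observed.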

See Caching, Self-Hosting and Orchestrator, and Build Reliability for cache modes, fallback behavior, LFS pointer checks, and self-hosted runner cleanup options.

Unity Failure Classification

Unity CI builds fail in predictable patterns. Recognizing which failure mode you are dealing with determines whether the correct response is a retry, a targeted cleanup, or a full cache reset.

Unity engine failures fall into one of seven categories. The diagnostics service classifies failures by scanning the Unity Editor log for signal patterns and correlating them with the exit code and runtime duration.

The Seven Categories

| Category | Exit Code | Key Signals | Typical Cause |
| --- | --- | --- | --- |
| LICENSE | -1 | Access token is unavailable, Licensing is not yet initialized, runtime < 60s | Concurrent license activation, Hub in wrong session |
| CRASH | -1 | Fatal error, native stack trace, Crash!!! in log, runtime > 60s | Memory pressure, ILPP crash, asset import failure |
| COMPILE | 1 | Scripts have compiler errors, error CS | Missing assemblies, stale ScriptAssemblies, profile mismatch |
| PACKAGE | 1 or -1 | Could not restore immutable package asset, CS0246 under Library/PackageCache | Corrupt PackageCache |
| SKIP | 0 | Build method not invoked, no build output produced | InitializeOnLoad timestamp race, SourceAssetDB mismatch |
| EXIT_NEG1 | -1 | None of the above signals, runtime > 60s | Unclassified Unity failure |
| GENERIC | any | No recognized signals | Unknown failure mode |

Detection Details

LICENSE — Unity exits -1 within 60 seconds of launch. The Editor log contains licensing-related messages but no compilation or crash evidence. This almost always indicates a licensing startup race on multi-runner machines, not Library corruption.

CRASH — Unity exits -1 after running long enough to begin asset import or compilation. The log contains native crash evidence: stack traces with Unity.exe!, Crash!!! markers, or crash dump references. The Library may be in a partially-written state.

COMPILE — Unity exits with code 1 and the log contains C# compiler errors. Common causes: missing assembly definitions after a profile switch, stale ScriptAssemblies from a previous profile, or unhydrated LFS .dll files.

PACKAGE — The log contains PackageCache-specific errors. CS0246 errors that reference paths under Library/PackageCache or messages about immutable package assets indicate a corrupt PackageCache rather than a source code problem.

SKIP — Unity exits with code 0 but the build method was never invoked. This is the most dangerous category because it masquerades as success. Old build artifacts from a previous run can make it appear that a build succeeded when no new output was produced.

EXIT_NEG1 — Unity exits -1 without matching any specific signal pattern. This is a catch-all for unclassified -1 exits that need manual investigation.

GENERIC — Any failure that does not match the above categories. Check the Editor log directly.
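
The detection rules above can be approximated with a short classifier. This is a sketch under stated assumptions, not the diagnostics service's actual implementation: the rule ordering (PACKAGE checked before COMPILE, because a CS0246 under PackageCache also looks like a compiler error) and the `buildMethodInvoked` input, which the caller is assumed to determine itself, are both illustrative choices.

```typescript
type UnityFailureCategory =
  | 'LICENSE' | 'CRASH' | 'COMPILE' | 'PACKAGE' | 'SKIP' | 'EXIT_NEG1' | 'GENERIC';

interface UnityRunSketch {
  exitCode: number;
  runtimeSeconds: number;
  logText: string;
  buildMethodInvoked: boolean; // assumption: supplied by the caller
}

// Returns null for a genuine success (exit 0 and the build method ran).
function classifyUnityRunSketch(run: UnityRunSketch): UnityFailureCategory | null {
  const { exitCode, runtimeSeconds, logText } = run;
  const has = (needle: string) => logText.indexOf(needle) !== -1;

  const licensing = has('Access token is unavailable') || has('Licensing is not yet initialized');
  const crash = has('Crash!!!') || has('Fatal error') || has('Unity.exe!');
  const packageCache =
    has('Could not restore immutable package asset') ||
    /Library[\\/]PackageCache[^\n]*error CS0246/.test(logText);
  const compile = has('Scripts have compiler errors') || /error CS\d{4}/.test(logText);

  // Exit 0 is only a success if the build method actually ran.
  if (exitCode === 0) return run.buildMethodInvoked ? null : 'SKIP';
  if (packageCache && (exitCode === 1 || exitCode === -1)) return 'PACKAGE';
  if (exitCode === 1 && compile) return 'COMPILE';
  if (exitCode === -1) {
    if (runtimeSeconds < 60 && licensing && !crash && !compile) return 'LICENSE';
    if (runtimeSeconds > 60 && crash) return 'CRASH';
    if (runtimeSeconds > 60 && !licensing && !compile) return 'EXIT_NEG1';
  }
  return 'GENERIC';
}
```

Treating exit code 0 without an invoked build method as SKIP, rather than as success, is the part most worth copying into your own wrapper scripts.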

Remediation Per Category

Each category has a specific remediation that avoids unnecessary cache destruction:

| Category | First Response | Escalation | What NOT to Do |
| --- | --- | --- | --- |
| LICENSE | Retry after 30s delay | Check Hub session ID | Do not delete Library |
| CRASH | Clear Library/Bee, retry | Restore Library backup, retry | Do not immediately nuke Library |
| COMPILE | Clear Library/ScriptAssemblies, retry | Check LFS pointers, then clear Library | Do not retry without clearing assemblies |
| PACKAGE | Clear Library/PackageCache, retry | Full Library delete | Do not clear ScriptAssemblies (wrong target) |
| SKIP | Delete SourceAssetDB + auto-generated assets, retry | Two-phase recovery (import-only pass, then build) | Do not treat exit 0 as success |
| EXIT_NEG1 | Read Editor log, classify manually | Varies | Do not guess; read the log |
| GENERIC | Read Editor log, classify manually | Varies | Do not retry blindly |

Selective Library Cleanup

Full Library deletion is expensive and should be a last resort. Target the specific subdirectory that matches the failure:

| Symptom | Clear This | Preserves |
| --- | --- | --- |
| Compiler errors after profile switch | Library/ScriptAssemblies + Library/Bee | Asset imports |
| PackageCache GUID errors | Library/PackageCache | Compiled assemblies + asset imports |
| Exit 0 with no build output | Library/SourceAssetDB | Everything else |
| Native crash during import | Full Library/ | Nothing (last resort) |

Multi-Phase Retry Chains

The diagnostics service uses independent retry budgets per failure type. A licensing retry does not consume the budget for crash recovery. This prevents a single intermittent failure from exhausting all retry attempts.

Retry Budget Defaults

| Category | Max Retries | Cleanup Before Retry | Delay |
| --- | --- | --- | --- |
| LICENSE | 3 | None (preserve Library) | 30s |
| CRASH | 2 | Clear Library/Bee on first, full Library on second | 10s |
| PACKAGE | 2 | Clear Library/PackageCache | 0s |
| COMPILE | 1 | Clear Library/ScriptAssemblies + check LFS pointers | 0s |
| SKIP | 1 | Delete SourceAssetDB + auto-generated assets | 0s |
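
To illustrate what "independent budgets" means in practice, here is a minimal sketch that tracks one counter per category, using the retry counts and delays from the defaults table. The class and method names are hypothetical and do not mirror the package's internals.

```typescript
type RetryCategory = 'LICENSE' | 'CRASH' | 'PACKAGE' | 'COMPILE' | 'SKIP';

interface RetryPolicy {
  maxRetries: number;
  delaySeconds: number;
}

// Values copied from the retry budget defaults listed above.
const defaultPolicies: Record<RetryCategory, RetryPolicy> = {
  LICENSE: { maxRetries: 3, delaySeconds: 30 },
  CRASH: { maxRetries: 2, delaySeconds: 10 },
  PACKAGE: { maxRetries: 2, delaySeconds: 0 },
  COMPILE: { maxRetries: 1, delaySeconds: 0 },
  SKIP: { maxRetries: 1, delaySeconds: 0 },
};

class RetryBudgetsSketch {
  private used: Partial<Record<RetryCategory, number>> = {};

  // Returns the delay to wait before retrying, or null when this
  // category's budget is spent. Other categories are unaffected.
  tryConsume(category: RetryCategory): number | null {
    const policy = defaultPolicies[category];
    const usedSoFar = this.used[category] ?? 0;
    if (usedSoFar >= policy.maxRetries) return null;
    this.used[category] = usedSoFar + 1;
    return policy.delaySeconds;
  }
}
```

With this shape, three licensing retries leave the crash budget untouched, which is the property the section describes.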

Circuit Breaker

When a runner fails the same build repeatedly despite retry chains, the circuit breaker prevents infinite retry loops:

  1. After exhausting all category-specific retry budgets, the build is marked as a hard failure
  2. The failure is logged with full diagnostics (category, retry history, log excerpts)
  3. The next build on the same runner starts with a full state reset (Library delete + git integrity check)
  4. If the full reset build also fails, the runner is flagged for manual investigation

The circuit breaker resets after a successful build. A single success demonstrates the runner is healthy and clears any accumulated failure state.
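
The circuit breaker steps above amount to a small per-runner state machine. The sketch below uses hypothetical state names and assumes a `hard-failure` outcome means all category-specific retry budgets were already exhausted. Letting a success clear the `flagged` state follows the reset rule as written here, but a real deployment might instead require manual sign-off.

```typescript
type RunnerState = 'healthy' | 'needs-full-reset' | 'flagged';
type BuildOutcome = 'success' | 'hard-failure';

function nextRunnerState(current: RunnerState, outcome: BuildOutcome): RunnerState {
  // A single success clears any accumulated failure state.
  if (outcome === 'success') return 'healthy';
  // First hard failure: the next build starts with a full state reset.
  if (current === 'healthy') return 'needs-full-reset';
  // The full-reset build failed too (or the runner was already flagged).
  return 'flagged';
}
```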

GitHub Step Summary and Checks

Failure diagnostics are written to the GitHub Actions Step Summary when running in a GitHub Actions environment and the installed integration supports summaries. The summary includes:

  • Failure category and confidence level
  • Key log excerpts that triggered the classification
  • Recommended remediation
  • Retry history (if retries were attempted)

This provides at-a-glance failure analysis in the GitHub Actions UI without requiring log downloads. Enable githubCheck when you also want Orchestrator step status reflected as GitHub Checks.

- name: Build
  uses: game-ci/unity-builder@v4
  with:
    targetPlatform: StandaloneLinux64
    # Diagnostics are written to Step Summary automatically

Using Diagnostics in Custom Scripts

The Unity diagnostics service is available as a programmatic API for custom build scripts and provider plugins:

import { UnityBuildDiagnosticsService, UnityRecoveryService } from '@game-ci/orchestrator';

const diagnostics = UnityBuildDiagnosticsService.analyzeRun({
  exitCode,
  runtimeSeconds,
  logText: editorLog,
  projectPath,
});

// diagnostics.category: 'LICENSE' | 'CRASH' | 'COMPILE' | 'PACKAGE' | 'SKIP' | 'EXIT_NEG1' | 'GENERIC'
// diagnostics.signals: string[] (matched signal patterns)
// diagnostics.confidence: number (0.0 to 1.0)

const decision = UnityRecoveryService.decide(
  diagnostics,
  UnityRecoveryService.createDefaultBudgets(),
);

// decision.action: 'retry' | 'escalate' | 'fail'
// decision.cleanup: string[] (paths to clear before retry)
// decision.delaySeconds: number
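
One way a custom script might act on such a decision is to turn it into an ordered list of steps before executing anything. The sketch below declares a local `RecoveryDecisionSketch` type that mirrors only the three fields listed in the comments above, and does not import the package. How `escalate` is handled here (surfacing the failure for manual follow-up) is an assumption, since this page does not define it.

```typescript
// Local mirror of the documented fields; not the package's exported type.
interface RecoveryDecisionSketch {
  action: 'retry' | 'escalate' | 'fail';
  cleanup: string[];
  delaySeconds: number;
}

interface RecoveryStep {
  kind: 'clear' | 'wait' | 'rebuild' | 'notify' | 'abort';
  detail: string;
}

function planRecoverySteps(decision: RecoveryDecisionSketch): RecoveryStep[] {
  if (decision.action === 'fail') {
    return [{ kind: 'abort', detail: 'retry budget exhausted or failure not recoverable' }];
  }
  if (decision.action === 'escalate') {
    // Assumption: escalation means a human should look before anything is deleted.
    return [{ kind: 'notify', detail: 'manual investigation requested' }];
  }
  const steps: RecoveryStep[] = decision.cleanup.map(
    (path): RecoveryStep => ({ kind: 'clear', detail: path }),
  );
  if (decision.delaySeconds > 0) {
    steps.push({ kind: 'wait', detail: decision.delaySeconds + 's' });
  }
  steps.push({ kind: 'rebuild', detail: 'rerun the build' });
  return steps;
}
```

Separating planning from execution makes it easy to log the intended cleanup before any path is removed.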

Investigating Unclassified Failures

When a failure lands in EXIT_NEG1 or GENERIC, manual investigation is required. The diagnostics service cannot classify what it cannot match. Follow this sequence:

  1. Pull the actual Unity Editor log — CI wrapper output is a summary. The real failure details are in Editor.log on the runner or in the provider's build container.
  2. Search for error, Error, CRASH, Fatal — these keywords narrow the log to relevant sections.
  3. Check the last 100 lines before exit — Unity often logs the proximate cause immediately before shutting down.
  4. Compare with known signal patterns — if you find a new pattern, consider adding it to the classification system.

Caution: Never toggle CI features (caching, Accelerator, ILPP) to isolate an unclassified failure. Each toggle is a wasted CI run if the theory is wrong. Diagnose from log evidence first.
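
The keyword search and last-100-lines check from the investigation sequence can be sketched as a pure function over the log text. The function name and return shape are illustrative; reading Editor.log from the runner or build container is left to the caller.

```typescript
interface LogHit {
  lineNumber: number; // 1-based position in the log
  text: string;
}

// Case-insensitive, so it covers error, Error, CRASH, and Fatal.
const triageKeywords = /error|crash|fatal/i;

function triageEditorLog(
  logText: string,
  tailSize = 100,
): { hits: LogHit[]; tail: string[] } {
  const lines = logText.split(/\r?\n/);
  const hits = lines
    .map((text, index) => ({ lineNumber: index + 1, text }))
    .filter((line) => triageKeywords.test(line.text));
  // The last lines before exit often contain the proximate cause.
  const tail = tailSize > 0 ? lines.slice(-tailSize) : [];
  return { hits, tail };
}
```

Reviewing `hits` alongside `tail` gives a compact starting point before comparing against the known signal patterns.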