Run GitHub Actions on ephemeral Lambda Labs GPU instances.
Call `runner.yml` as a reusable workflow:

```yaml
name: GPU Tests
on: [push]
jobs:
  lambda:
    uses: Open-Athena/lambda-gha/.github/workflows/runner.yml@main
    secrets: inherit
    with:
      instance_type: gpu_1x_a10
  gpu-test:
    needs: lambda
    runs-on: ${{ needs.lambda.outputs.id }}
    steps:
      - run: nvidia-smi  # GPU node!
```

Get your API key from the Lambda Labs Cloud Dashboard and add it as a repository secret:
```bash
gh secret set LAMBDA_API_KEY --body "your_api_key_here"
```

Create a GitHub Personal Access Token with `repo` scope and admin access, and add it as a repository secret:

```bash
gh secret set GH_SA_TOKEN --body "your_personal_access_token_here"
```

Set the private key corresponding to one of your Lambda Labs SSH keys; it is used to connect to instances during setup:

```bash
gh secret set LAMBDA_SSH_PRIVATE_KEY < ~/.ssh/my-lambda-key
```

Register an SSH key in your Lambda Labs account, then set the key name(s):

```bash
gh variable set LAMBDA_SSH_KEY_NAMES --body "my-ssh-key"
```

| Input | Description | Default |
|---|---|---|
| `instance_type` | Instance type(s), comma-separated for fallback | `gpu_1x_a10` |
| `region` | Region(s), comma-separated for fallback; omit to auto-select | |
| `instance_count` | Number of instances for parallel jobs | `1` |
| `retry_count` | Retries per instance/region combination | `1` |
| `retry_delay` | Initial retry delay in seconds (exponential backoff) | `5` |
| `debug` | Debug mode: `false`=off, `true`=tracing, number=sleep N minutes | `false` |
| `extra_gh_labels` | Extra GitHub labels for the runner (comma-separated) | |
| `max_instance_lifetime` | Max lifetime in minutes before shutdown | `120` |
| `runner_grace_period` | Seconds before terminating after the last job | `60` |
| `runner_initial_grace_period` | Seconds before terminating if no jobs start | `180` |
| `userdata` | Additional script to run before runner setup | |
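A sketch combining several of these inputs in one call (all input names come from the table above; the values and the `userdata` script are illustrative placeholders):

```yaml
jobs:
  lambda:
    uses: Open-Athena/lambda-gha/.github/workflows/runner.yml@main
    secrets: inherit
    with:
      instance_type: gpu_1x_a10
      extra_gh_labels: gpu,smoke-test  # extra labels for the runner
      max_instance_lifetime: "60"      # hard cap: shut down after 60 minutes
      runner_grace_period: "120"       # wait 2 minutes after the last job
      userdata: |                      # runs before runner setup
        apt-get update
```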
| Output | Description |
|---|---|
| `id` | Runner label for `runs-on` (single instance) |
| `mtx` | JSON array for matrix strategies |
Common GPU instance types:
| Type | GPUs | Description |
|---|---|---|
| `gpu_1x_a10` | 1x A10 | Entry-level GPU |
| `gpu_1x_a100_sxm4` | 1x A100 40GB | High-end single GPU |
| `gpu_8x_a100_80gb_sxm4` | 8x A100 80GB | Multi-GPU workloads |
See Lambda Labs pricing for the full list.
Create multiple instances for parallel execution:
```yaml
jobs:
  lambda:
    uses: Open-Athena/lambda-gha/.github/workflows/runner.yml@main
    secrets: inherit
    with:
      instance_count: "3"
  parallel-jobs:
    needs: lambda
    strategy:
      matrix:
        runner: ${{ fromJson(needs.lambda.outputs.mtx) }}
    runs-on: ${{ matrix.runner.id }}
    steps:
      - run: echo "Running on instance ${{ matrix.runner.idx }}"
```

Lambda Labs GPU instances frequently have capacity constraints. Use comma-separated values to specify fallback options:
```yaml
with:
  # Try A10 first, then fall back to A100, then RTX 6000
  instance_type: gpu_1x_a10,gpu_1x_a100,gpu_1x_rtx6000
  # Try multiple regions for each instance type
  region: us-east-1,us-west-1,us-south-1
```

The action tries each instance type in order and, for each type, tries each region in order. On capacity failures it moves on to the next option. The job summary shows all attempts:
| # | Instance Type | Region | Result |
|---|---|---|---|
| 1 | `gpu_1x_a10` | us-east-1 | ❌ |
| 2 | `gpu_1x_a10` | us-west-1 | ❌ |
| 3 | `gpu_1x_a100` | us-east-1 | ✅ Launched |
For rate limit errors, use `retry_count` and `retry_delay` to retry with exponential backoff.
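For example, a call that retries each instance/region combination three times, starting from a 10-second delay (values are illustrative; both inputs are documented in the table above):

```yaml
with:
  instance_type: gpu_1x_a10,gpu_1x_a100
  region: us-east-1,us-west-1
  retry_count: "3"   # retries per instance/region combination
  retry_delay: "10"  # initial delay in seconds; grows exponentially
```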
| Aspect | ec2-gha | lambda-gha |
|---|---|---|
| Auth | AWS OIDC / IAM | API key |
| Instance types | `g4dn.xlarge`, etc. | `gpu_1x_a10`, etc. |
| Metadata service | IMDSv2 | None |
| Termination | `shutdown -h now` | API call |
| Networking | VPC, Security Groups | SSH keys only |
Enable debug mode to keep the instance alive for SSH access:
```yaml
with:
  debug: "30"  # Sleep 30 minutes before termination
```

Then SSH to the instance IP shown in the workflow logs:

```bash
ssh ubuntu@<instance-ip>
```

Log files:
- `/var/log/runner-setup.log` - Runner installation
- `/tmp/termination-check.log` - Termination checks
- `~/runner-*/` - GitHub Actions runner directories
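If the instance terminates before you can SSH in, a final step in your GPU job can surface the setup log in the workflow output. A minimal sketch (the log path comes from the list above; passwordless `sudo` for the `ubuntu` user is assumed):

```yaml
  gpu-test:
    needs: lambda
    runs-on: ${{ needs.lambda.outputs.id }}
    steps:
      - run: nvidia-smi
      # Dump the tail of the setup log even if earlier steps failed
      - if: always()
        run: sudo tail -n 100 /var/log/runner-setup.log
```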
Based on ec2-gha, adapted for Lambda Labs.