AnyVLM (Any Variant-Level Matching) is an off-the-shelf solution for adding local aggregate-level variant information to a Variant-Level Matching (VLM) network. It provides a REST API service that integrates with GA4GH standards for genomic data exchange.
AnyVLM enables genomic research organizations to:
- Ingest VCF files containing variant and allele frequency data
- Register variants using the GA4GH Variation Representation Specification (VRS) via AnyVar
- Store cohort allele frequencies (CAF) with zygosity-stratified counts
- Serve VLM protocol-compliant responses with Beacon handover capabilities
This service is designed for rare disease variant frequency tracking in genomic research networks such as GREGoR.
Key features:
- VCF File Ingestion: Streaming upload with comprehensive validation, batch processing, and support for multiple alternate alleles
- VRS Compliance: Integration with AnyVar for standardized variant representation
- Zygosity Tracking: Separate counts for homozygous, heterozygous, and hemizygous variants
- GA4GH Beacon v2 Compatible: Standards-compliant responses for network interoperability
- Flexible Deployment: Docker support, configurable storage backends, and CLI tools
- Assembly Support: Both GRCh37/hg19 and GRCh38/hg38 reference assemblies
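Zygosity-stratified counts and multi-allelic support both depend on splitting a VCF record's INFO column per ALT allele. The sketch below shows one way to do that; it is not AnyVLM's ingestion code. It assumes `AC`, `AC_Het`, `AC_Hom`, and `AC_Hemi` carry one comma-separated value per ALT allele (VCF `Number=A`) while `AN` is a single total, and the names `split_alleles` and `AlleleCounts` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: these INFO fields hold one value per ALT allele (VCF Number=A);
# AN is a single total per record.
PER_ALLELE_FIELDS = ("AC", "AC_Het", "AC_Hom", "AC_Hemi")


@dataclass
class AlleleCounts:
    """Counts for one ALT allele of a (possibly multi-allelic) VCF record."""

    alt: str
    ac: int
    an: int
    ac_het: int
    ac_hom: int
    ac_hemi: int

    @property
    def af(self) -> float:
        # Allele frequency AC/AN; 0.0 when no alleles were called.
        return self.ac / self.an if self.an else 0.0


def parse_info(info: str) -> dict[str, str]:
    """Turn 'AC=3,1;AN=100;DB' into a dict; flag fields map to ''."""
    fields: dict[str, str] = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def split_alleles(alt_column: str, info: str) -> list[AlleleCounts]:
    """Split one VCF record's counts by ALT allele (missing '.' values unsupported)."""
    alts = alt_column.split(",")
    fields = parse_info(info)
    missing = [name for name in ("AN", *PER_ALLELE_FIELDS) if name not in fields]
    if missing:
        raise ValueError(f"missing INFO fields: {missing}")
    per_allele: dict[str, list[int]] = {}
    for name in PER_ALLELE_FIELDS:
        values = [int(v) for v in fields[name].split(",")]
        if len(values) != len(alts):
            raise ValueError(f"{name} has {len(values)} values for {len(alts)} ALT alleles")
        per_allele[name] = values
    an = int(fields["AN"])
    return [
        AlleleCounts(
            alt=alt,
            ac=per_allele["AC"][i],
            an=an,
            ac_het=per_allele["AC_Het"][i],
            ac_hom=per_allele["AC_Hom"][i],
            ac_hemi=per_allele["AC_Hemi"][i],
        )
        for i, alt in enumerate(alts)
    ]
```

For example, `split_alleles("G,T", "AC=3,1;AN=100;AC_Het=1,1;AC_Hom=1,0;AC_Hemi=0,0")` yields two `AlleleCounts` entries whose `af` values are 0.03 and 0.01.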
Prerequisites:
- Python 3.11 to 3.14
- PostgreSQL 17+
- AnyVar for variant registration
- SeqRepo for sequence data
- UTA for transcript alignment
Install from PyPI:

```bash
pip install anyvlm
```

Or install from source:

```bash
git clone https://github.com/genomicmedlab/anyvlm.git
cd anyvlm
pip install -e ".[dev,test]"
```

Or pull the Docker image:

```bash
docker pull ghcr.io/genomicmedlab/anyvlm:latest
```

AnyVLM is configured via environment variables. Create a `.env` file in your project root:

```bash
# Required: Database connection
ANYVLM_STORAGE_URI=postgresql://anyvlm:anyvlm-pw@localhost:5435/anyvlm

# Required for /variant_counts endpoint: VLM handover configuration
HANDOVER_TYPE_ID="GREGoR-NCH"
HANDOVER_TYPE_LABEL="GREGoR AnyVLM Reference"
BEACON_HANDOVER_URL="https://variants.example.org/"
BEACON_NODE_ID="org.anyvlm.example"

# AnyVar configuration
UTA_DB_URL=postgresql://anonymous@localhost:5432/uta/uta_20241220
SEQREPO_DATAPROXY_URI=seqrepo+file:///usr/local/share/seqrepo/2024-12-20
ANYVAR_STORAGE_URI=postgresql://anyvar:anyvar-pw@localhost:5434/anyvar

# Optional: Service configuration
ANYVLM_ENV=local # local, test, dev, staging, prod
ANYVLM_SERVICE_URI=http://localhost:8080
ANYVLM_ANYVAR_URI=http://localhost:8000 # Omit to use embedded Python client

# Optional: Custom logging configuration
ANYVLM_LOGGING_CONFIG=/path/to/logging.yaml
```

See `.env.example` for a complete template.
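Before starting the service, a small pre-flight check can flag a missing setting. This hypothetical helper (not part of AnyVLM's `config.py`) parses plain `KEY=VALUE` lines, strips surrounding quotes and ` #` inline comments, and, following the comments in the template above, treats `ANYVLM_STORAGE_URI` as always required and the four handover variables as required only when `/variant_counts` will be served. It accepts a simpler syntax than full `.env` parsers such as python-dotenv.

```python
from pathlib import Path

# Marked "Required" in the .env template.
REQUIRED = ("ANYVLM_STORAGE_URI",)
# Required only for the /variant_counts endpoint.
HANDOVER = ("HANDOVER_TYPE_ID", "HANDOVER_TYPE_LABEL", "BEACON_HANDOVER_URL", "BEACON_NODE_ID")


def read_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and full-line comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] in ('"', "'") and value[-1] == value[0]:
            value = value[1:-1]  # quoted value: keep its contents verbatim
        else:
            value = value.split(" #", 1)[0].strip()  # drop a trailing " # comment"
        env[key.strip()] = value
    return env


def missing_settings(env: dict[str, str], serve_variant_counts: bool = True) -> list[str]:
    """Return required variable names that are absent or empty."""
    needed = REQUIRED + (HANDOVER if serve_variant_counts else ())
    return [name for name in needed if not env.get(name)]
```

For example, `missing_settings(read_env_file(Path(".env").read_text()))` lists any required variable that is absent or empty in the local `.env` file.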
To run AnyVLM with Docker Compose:

1. Create required volumes:
   ```bash
   make volumes
   ```
2. Start the full stack:
   ```bash
   # Development mode with hot-reload
   make up-dev

   # Or production mode
   ANYVLM_VERSION=latest make up
   ```
3. Access the service:
   - AnyVLM API: http://localhost:8080
   - API Documentation: http://localhost:8080/docs
   - AnyVar (if using `compose.anyvar.yaml`): http://localhost:8000
| File | Purpose |
|---|---|
| `compose.yaml` | Production deployment with pre-built images |
| `compose.dev.yaml` | Development with local build and hot-reload |
| `compose.anyvar.yaml` | AnyVar dependencies (SeqRepo, UTA, AnyVar service) |
| `compose.test.yaml` | Minimal services for testing |
Full stack with AnyVar:

```bash
docker compose -f compose.dev.yaml -f compose.anyvar.yaml up --build
```

Check service information:

```bash
curl http://localhost:8080/service-info
```

Returns GA4GH-compliant service metadata.
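To confirm that the endpoint's output has the shape the GA4GH service-info specification requires, a check like the following can be run on the parsed JSON. It covers only the required top-level fields (`id`, `name`, `type`, `organization`, `version`) and the required members of `type` and `organization`; `service_info_problems` is a hypothetical name and this is not an AnyVLM test.

```python
# Required members per the GA4GH service-info specification.
REQUIRED_TOP = {"id", "name", "type", "organization", "version"}
REQUIRED_NESTED = {
    "type": {"group", "artifact", "version"},
    "organization": {"name", "url"},
}


def service_info_problems(info: dict) -> list[str]:
    """List required service-info fields that are absent from a parsed response."""
    problems = [f"missing field: {key}" for key in sorted(REQUIRED_TOP - set(info))]
    for parent, required in REQUIRED_NESTED.items():
        child = info.get(parent)
        # Nested members are only checked when the parent object is present.
        if isinstance(child, dict):
            problems += [f"missing field: {parent}.{key}" for key in sorted(required - set(child))]
    return problems
```

An empty list means every required member is present; it does not validate value formats such as the semantic-version strings the specification expects.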
Ingest a VCF file:

```bash
curl -X POST "http://localhost:8080/ingest_vcf?assembly=grch38" \
  -F "file=@/path/to/variants.vcf.gz"
```

Requirements:
- File must be gzip-compressed (`.vcf.gz`)
- Maximum file size: 5 GB
- Required INFO fields: `AC`, `AN`, `AC_Het`, `AC_Hom`, `AC_Hemi`
A successful ingestion returns:

```json
{
  "status": "success",
  "message": "Successfully ingested variants.vcf.gz",
  "details": null
}
```

Query variant counts:

```bash
curl "http://localhost:8080/variant_counts?assemblyId=GRCh38&referenceName=22&start=44389414&referenceBases=A&alternateBases=G"
```

Parameters:
| Parameter | Description | Example |
|---|---|---|
| `assemblyId` | Reference assembly | GRCh37, GRCh38, hg19, hg38 |
| `referenceName` | Chromosome | 1-22, X, Y, MT |
| `start` | Position (1-based) | 44389414 |
| `referenceBases` | Reference allele | A, ACGT, etc. |
| `alternateBases` | Alternate allele | G, TGCA, etc. |
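When calling the endpoint from code, building the query string with `urllib.parse.urlencode` avoids hand-escaping. This hypothetical helper also rejects values outside the ranges listed in the table; the allowed allele alphabet (uppercase `A`, `C`, `G`, `T`, `N`) is an assumption, since the table only gives examples.

```python
from urllib.parse import urlencode

# Allowed values taken from the parameter table above.
ASSEMBLIES = {"GRCh37", "GRCh38", "hg19", "hg38"}
CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}
# Assumption: alleles are uppercase DNA strings over A/C/G/T/N.
BASES = set("ACGTN")


def variant_counts_url(
    base_url: str,
    assembly_id: str,
    reference_name: str,
    start: int,
    reference_bases: str,
    alternate_bases: str,
) -> str:
    """Build a /variant_counts query URL after basic parameter checks."""
    if assembly_id not in ASSEMBLIES:
        raise ValueError(f"unsupported assemblyId: {assembly_id}")
    if reference_name not in CHROMOSOMES:
        raise ValueError(f"unsupported referenceName: {reference_name}")
    if start < 1:
        raise ValueError("start is 1-based and must be >= 1")
    for allele in (reference_bases, alternate_bases):
        if not allele or not set(allele) <= BASES:
            raise ValueError(f"invalid allele: {allele!r}")
    query = urlencode(
        {
            "assemblyId": assembly_id,
            "referenceName": reference_name,
            "start": start,
            "referenceBases": reference_bases,
            "alternateBases": alternate_bases,
        }
    )
    return f"{base_url.rstrip('/')}/variant_counts?{query}"
```

Called as `variant_counts_url("http://localhost:8080", "GRCh38", "22", 44389414, "A", "G")`, it produces the same URL as the `curl` example above.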
The response is VLM protocol-compliant JSON containing:
- `beaconHandovers`: Handover metadata for network integration
- `meta`: Beacon metadata
- `responseSummary`: Whether the variant exists and the total number of results
- `response`: Result sets grouped by zygosity (Homozygous, Heterozygous, Hemizygous, Unknown)
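Client code often needs only the existence flag and the per-zygosity counts. This sketch assumes Beacon v2 conventions, where `responseSummary` carries `exists` and `numTotalResults` and each entry of `response.resultSets` carries an `id` and a `resultsCount`. The exact identifiers AnyVLM assigns to its zygosity result sets are not specified here, so the summary is keyed by whatever `id` each set reports; `fetch_json` and `summarize` are hypothetical names.

```python
import json
from typing import Any
from urllib.request import urlopen


def fetch_json(url: str) -> dict[str, Any]:
    """GET a URL and decode its JSON body (no retries or error handling)."""
    with urlopen(url) as resp:
        return json.load(resp)


def summarize(response: dict[str, Any]) -> tuple[bool, int, dict[str, int]]:
    """Return (exists, total results, {result set id: resultsCount})."""
    summary = response.get("responseSummary", {})
    exists = bool(summary.get("exists", False))
    total = int(summary.get("numTotalResults", 0))
    result_sets = response.get("response", {}).get("resultSets", [])
    # Fall back to a positional key if a result set has no id.
    counts = {
        rs.get("id", f"resultSet{i}"): int(rs.get("resultsCount", 0))
        for i, rs in enumerate(result_sets)
    }
    return exists, total, counts
```

A typical call is `summarize(fetch_json(url))`, with `url` pointing at `/variant_counts` on a running service.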
Command-line usage:

```bash
# Ingest a VCF file
anyvlm ingest-vcf --file /path/to/variants.vcf.gz --assembly grch38
```

The CLI sends VCF data to the running AnyVLM service endpoint.
Project structure:

```text
anyvlm/
├── src/anyvlm/
│   ├── main.py                  # FastAPI application
│   ├── cli.py                   # Command-line interface
│   ├── config.py                # Configuration management
│   ├── restapi/                 # REST API routes
│   │   ├── vlm.py               # VLM protocol endpoints
│   │   └── deps.py              # Dependency injection
│   ├── functions/               # Core business logic
│   │   ├── ingest_vcf.py        # VCF processing
│   │   ├── get_caf.py           # CAF retrieval
│   │   └── build_vlm_response.py
│   ├── storage/                 # Database layer
│   │   ├── postgres.py          # PostgreSQL implementation
│   │   └── orm.py               # SQLAlchemy models
│   ├── anyvar/                  # AnyVar integration
│   │   ├── http_client.py       # HTTP-based client
│   │   └── python_client.py     # Embedded Python client
│   └── schemas/                 # Pydantic data models
├── tests/
│   ├── unit/                    # Unit tests
│   └── integration/             # Integration tests
└── docs/                        # Sphinx documentation
```
Development setup:

```bash
# Clone repository
git clone https://github.com/genomicmedlab/anyvlm.git
cd anyvlm

# Install development dependencies
pip install -e ".[dev,test]"

# Install pre-commit hooks
pre-commit install
```

Testing:

```bash
# Run all tests
make test

# Run with coverage
pytest --cov=anyvlm --cov-report=term-missing

# Run specific test file
pytest tests/unit/test_restapi.py

# Start test database services
docker compose -f compose.test.yaml up -d
```

Formatting and linting:

```bash
# Format code
ruff format src tests

# Lint code
ruff check src tests

# Run all pre-commit hooks
pre-commit run --all-files
```

Building the documentation:

```bash
make -C docs html
# Output: docs/_build/html/index.html
```

| Command | Description |
|---|---|
| `make develop` | Install package in development mode |
| `make test` | Run test suite |
| `make volumes` | Create required Docker volumes |
| `make up` | Start production stack |
| `make up-dev` | Start development stack with hot-reload |
| `make up-test` | Start test services |
| `make down` | Remove all containers |
| `make stop` | Stop running services |
When the service is running, interactive API documentation is available at:
- Swagger UI: http://localhost:8080/docs
- ReDoc: http://localhost:8080/redoc
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and linting (`make test && ruff check src tests`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Repository: https://github.com/genomicmedlab/anyvlm
- Issues: https://github.com/genomicmedlab/anyvlm/issues
- Email: biocommons-dev@googlegroups.com
AnyVLM is developed by the Wagner Lab at Nationwide Children's Hospital and the Translational Genomics Group at the Broad Institute.
This project integrates with:
- GA4GH VRS - Variation Representation Specification
- AnyVar - VRS variant registration service
- GA4GH Beacon - Standards for genomic data discovery