AnyVLM

AnyVLM (Any Variant-Level Matching) is an off-the-shelf solution for adding local aggregate-level variant information to a Variant-Level Matching (VLM) network. It provides a REST API service that integrates with GA4GH standards for genomic data exchange.

Overview

AnyVLM enables genomic research organizations to:

- Ingest VCF files containing variant and allele frequency data
- Register variants using the GA4GH Variant Representation Specification (VRS) via AnyVar
- Store cohort allele frequencies (CAF) with zygosity-stratified counts
- Serve VLM protocol-compliant responses with Beacon handover capabilities
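
As a rough illustration of what a zygosity-stratified CAF record holds, the sketch below models one allele's cohort counts as a Python dataclass. The class and field names (`CohortAlleleCounts`, `ac_het`, and so on) and the sample values are hypothetical and do not mirror AnyVLM's actual storage schema. The only computation shown is the standard allele frequency, AC / AN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortAlleleCounts:
    """Hypothetical container for one allele's cohort counts (not AnyVLM's schema)."""

    vrs_id: str   # VRS identifier of the allele (placeholder value below)
    ac: int       # alternate allele count (INFO/AC)
    an: int       # total number of called alleles (INFO/AN)
    ac_het: int   # heterozygous count (INFO/AC_Het)
    ac_hom: int   # homozygous count (INFO/AC_Hom)
    ac_hemi: int  # hemizygous count (INFO/AC_Hemi)

    @property
    def allele_frequency(self) -> float:
        """Allele frequency AC / AN; 0.0 when no alleles were called."""
        return self.ac / self.an if self.an else 0.0


# Made-up counts for illustration only.
counts = CohortAlleleCounts(
    vrs_id="ga4gh:VA.example", ac=3, an=200, ac_het=1, ac_hom=1, ac_hemi=0
)
print(counts.allele_frequency)  # 0.015
```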

This service is designed for rare disease variant frequency tracking in genomic research networks such as GREGoR.

Features

- VCF File Ingestion: Streaming upload with comprehensive validation, batch processing, and support for multiple alternate alleles
- VRS Compliance: Integration with AnyVar for standardized variant representation
- Zygosity Tracking: Separate counts for homozygous, heterozygous, and hemizygous variants
- GA4GH Beacon v2 Compatible: Standards-compliant responses for network interoperability
- Flexible Deployment: Docker support, configurable storage backends, and CLI tools
- Assembly Support: Both GRCh37/hg19 and GRCh38/hg38 reference assemblies
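
A VCF record can list several alternate alleles. Before each allele is registered, its per-allele INFO values have to be split out. The sketch below shows one way to do that. It assumes that `AC`, `AC_Het`, `AC_Hom`, and `AC_Hemi` are declared `Number=A` (one value per ALT allele) and that `AN` is a single per-site value. It illustrates the idea only and is not AnyVLM's ingestion code; `split_alt_alleles` is a hypothetical name.

```python
PER_ALLELE_FIELDS = ("AC", "AC_Het", "AC_Hom", "AC_Hemi")  # assumed Number=A


def parse_info(info: str) -> dict[str, str]:
    """Parse a VCF INFO column (key=value pairs separated by ';')."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag fields (no '=') map to ''
    return fields


def split_alt_alleles(line: str) -> list[dict]:
    """Split one VCF data line into one record per ALT allele (hypothetical helper)."""
    chrom, pos, _id, ref, alts, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    fields = parse_info(info)
    records = []
    for i, alt in enumerate(alts.split(",")):
        record = {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
                  "AN": int(fields["AN"])}  # AN is shared by all ALT alleles
        for name in PER_ALLELE_FIELDS:
            record[name] = int(fields[name].split(",")[i])  # i-th value belongs to i-th ALT
        records.append(record)
    return records


# Illustrative record with made-up counts.
line = "22\t44389414\t.\tA\tG,T\t.\tPASS\tAC=3,1;AN=200;AC_Het=1,1;AC_Hom=1,0;AC_Hemi=0,0"
for r in split_alt_alleles(line):
    print(r["alt"], r["AC"], r["AN"])
# G 3 200
# T 1 200
```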

Requirements

- Python 3.11 - 3.14
- PostgreSQL 17+
- AnyVar for variant registration
- SeqRepo for sequence data
- UTA for transcript alignment

Installation

Via pip

```
pip install anyvlm
```

Development Installation

```
git clone https://github.com/genomicmedlab/anyvlm.git
cd anyvlm
pip install -e ".[dev,test]"
```

Docker

```
docker pull ghcr.io/genomicmedlab/anyvlm:latest
```

Configuration

AnyVLM is configured via environment variables. Create a .env file in your project root:

```
# Required: Database connection
ANYVLM_STORAGE_URI=postgresql://anyvlm:anyvlm-pw@localhost:5435/anyvlm

# Required for /variant_counts endpoint: VLM handover configuration
HANDOVER_TYPE_ID="GREGoR-NCH"
HANDOVER_TYPE_LABEL="GREGoR AnyVLM Reference"
BEACON_HANDOVER_URL="https://variants.example.org/"
BEACON_NODE_ID="org.anyvlm.example"

# AnyVar configuration
UTA_DB_URL=postgresql://anonymous@localhost:5432/uta/uta_20241220
SEQREPO_DATAPROXY_URI=seqrepo+file:///usr/local/share/seqrepo/2024-12-20
ANYVAR_STORAGE_URI=postgresql://anyvar:anyvar-pw@localhost:5434/anyvar

# Optional: Service configuration
ANYVLM_ENV=local                          # local, test, dev, staging, prod
ANYVLM_SERVICE_URI=http://localhost:8080
ANYVLM_ANYVAR_URI=http://localhost:8000   # Omit to use embedded Python client

# Optional: Custom logging configuration
ANYVLM_LOGGING_CONFIG=/path/to/logging.yaml
```

See .env.example for a complete template.
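
Before starting the service, it can help to confirm that the variables marked "Required" above are set. The sketch below does this with the standard library only. `check_anyvlm_env` is a hypothetical helper, not part of AnyVLM, and the variable names are copied from the example above. It reads the process environment (or any mapping you pass in), so a `.env` file must already be loaded or exported for the check to see it. It also follows the comment on `ANYVLM_ANYVAR_URI`: when that variable is unset, the embedded Python client is used.

```python
import os
from collections.abc import Mapping

# Variable names taken from the example .env above.
REQUIRED = ("ANYVLM_STORAGE_URI",)
REQUIRED_FOR_VARIANT_COUNTS = (
    "HANDOVER_TYPE_ID",
    "HANDOVER_TYPE_LABEL",
    "BEACON_HANDOVER_URL",
    "BEACON_NODE_ID",
)


def check_anyvlm_env(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Report missing variables and which AnyVar client mode applies (hypothetical helper)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    missing_handover = [name for name in REQUIRED_FOR_VARIANT_COUNTS if not env.get(name)]
    return {
        "missing_required": missing,
        "missing_for_variant_counts": missing_handover,
        # Per the .env comment: omitting ANYVLM_ANYVAR_URI selects the embedded Python client.
        "anyvar_client": "http" if env.get("ANYVLM_ANYVAR_URI") else "embedded",
    }
```

Calling `print(check_anyvlm_env())` in a shell where the variables are exported shows at a glance whether `/variant_counts` will have the handover settings it needs.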

Quick Start

Using Docker Compose (Recommended)

Create required volumes:
```
make volumes
```

Start the full stack:

```
# Development mode with hot-reload
make up-dev

# Or production mode
ANYVLM_VERSION=latest make up
```

Access the service:
- AnyVLM API: http://localhost:8080
- API Documentation: http://localhost:8080/docs
- AnyVar (if using compose.anyvar.yaml): http://localhost:8000

Available Docker Compose Configurations

| File | Purpose |
|---|---|
| `compose.yaml` | Production deployment with pre-built images |
| `compose.dev.yaml` | Development with local build and hot-reload |
| `compose.anyvar.yaml` | AnyVar dependencies (SeqRepo, UTA, AnyVar service) |
| `compose.test.yaml` | Minimal services for testing |

Full stack with AnyVar:

```
docker compose -f compose.dev.yaml -f compose.anyvar.yaml up --build
```

Usage

REST API

Service Info

```
curl http://localhost:8080/service-info
```

Returns GA4GH-compliant service metadata.

Ingest VCF File

```
curl -X POST "http://localhost:8080/ingest_vcf?assembly=grch38" \
  -F "file=@/path/to/variants.vcf.gz"
```

Requirements:

- File must be gzip-compressed (.vcf.gz)
- Maximum file size: 5 GB
- Required INFO fields: AC, AN, AC_Het, AC_Hom, AC_Hemi

Response:

```
{
  "status": "success",
  "message": "Successfully ingested variants.vcf.gz",
  "details": null
}
```

Query Variant Counts

```
curl "http://localhost:8080/variant_counts?assemblyId=GRCh38&referenceName=22&start=44389414&referenceBases=A&alternateBases=G"
```

Parameters:

| Parameter | Description | Example |
|---|---|---|
| `assemblyId` | Reference assembly | `GRCh37`, `GRCh38`, `hg19`, `hg38` |
| `referenceName` | Chromosome | `1-22`, `X`, `Y`, `MT` |
| `start` | Position (1-based) | `44389414` |
| `referenceBases` | Reference allele | `A`, `ACGT`, etc. |
| `alternateBases` | Alternate allele | `G`, `TGCA`, etc. |
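
The query string can also be built programmatically. The sketch below assembles a `/variant_counts` URL from the parameters in the table using the standard library. It restricts `assemblyId` to the four values listed there, which may be stricter than the service itself. For orientation, it also shows how a 1-based `start` maps onto VRS-style interbase (0-based, half-open) coordinates. The service presumably performs that translation internally, so `to_interbase` is illustrative only. Both function names are hypothetical.

```python
from urllib.parse import urlencode

VALID_ASSEMBLIES = {"GRCh37", "GRCh38", "hg19", "hg38"}  # values from the table above


def variant_counts_url(base_url: str, assembly_id: str, reference_name: str,
                       start: int, reference_bases: str, alternate_bases: str) -> str:
    """Build a /variant_counts query URL from the documented parameters."""
    if assembly_id not in VALID_ASSEMBLIES:
        raise ValueError(f"unsupported assemblyId: {assembly_id}")
    params = {
        "assemblyId": assembly_id,
        "referenceName": reference_name,
        "start": start,  # 1-based position, as documented
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
    }
    return f"{base_url.rstrip('/')}/variant_counts?{urlencode(params)}"


def to_interbase(start_1based: int, ref: str) -> tuple[int, int]:
    """Map a 1-based start and reference allele to interbase (0-based, half-open) coordinates."""
    return start_1based - 1, start_1based - 1 + len(ref)


print(variant_counts_url("http://localhost:8080", "GRCh38", "22", 44389414, "A", "G"))
print(to_interbase(44389414, "A"))  # (44389413, 44389414)
```

The first call reproduces the `curl` URL shown earlier in this section.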

Response:

VLM protocol-compliant JSON with:

- beaconHandovers: Handover metadata for network integration
- meta: Beacon metadata
- responseSummary: Whether variant exists and total results
- response: ResultSets grouped by zygosity (Homozygous, Heterozygous, Hemizygous, Unknown)
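
A client usually wants one count per zygosity. The sketch below sums counts from a parsed response under an assumed layout: `response.resultSets` is a list, each result set carries an `id` containing one of the four zygosity labels, and it has an integer `resultsCount`. That layout follows common Beacon v2 conventions but is not confirmed here. `counts_by_zygosity` and the sample dictionary are hypothetical.

```python
ZYGOSITIES = ("Homozygous", "Heterozygous", "Hemizygous", "Unknown")


def counts_by_zygosity(vlm_response: dict) -> dict[str, int]:
    """Sum resultsCount per zygosity label found in each result set's id (assumed layout)."""
    totals = dict.fromkeys(ZYGOSITIES, 0)
    for result_set in vlm_response.get("response", {}).get("resultSets", []):
        set_id = result_set.get("id", "")
        for zygosity in ZYGOSITIES:
            if zygosity in set_id:
                totals[zygosity] += int(result_set.get("resultsCount", 0))
                break
    return totals


# Made-up response fragment whose structure is an assumption, not real output.
sample = {
    "responseSummary": {"exists": True, "numTotalResults": 3},
    "response": {
        "resultSets": [
            {"id": "GREGoR-NCH Homozygous", "resultsCount": 1},
            {"id": "GREGoR-NCH Heterozygous", "resultsCount": 2},
        ]
    },
}
print(counts_by_zygosity(sample))
```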

Command-Line Interface

```
# Ingest a VCF file
anyvlm ingest-vcf --file /path/to/variants.vcf.gz --assembly grch38
```

The CLI sends VCF data to the running AnyVLM service endpoint.

Project Structure

```
anyvlm/
├── src/anyvlm/
│   ├── main.py              # FastAPI application
│   ├── cli.py               # Command-line interface
│   ├── config.py            # Configuration management
│   ├── restapi/             # REST API routes
│   │   ├── vlm.py           # VLM protocol endpoints
│   │   └── deps.py          # Dependency injection
│   ├── functions/           # Core business logic
│   │   ├── ingest_vcf.py    # VCF processing
│   │   ├── get_caf.py       # CAF retrieval
│   │   └── build_vlm_response.py
│   ├── storage/             # Database layer
│   │   ├── postgres.py      # PostgreSQL implementation
│   │   └── orm.py           # SQLAlchemy models
│   ├── anyvar/              # AnyVar integration
│   │   ├── http_client.py   # HTTP-based client
│   │   └── python_client.py # Embedded Python client
│   └── schemas/             # Pydantic data models
├── tests/
│   ├── unit/                # Unit tests
│   └── integration/         # Integration tests
└── docs/                    # Sphinx documentation
```

Development

Setup

```
# Clone repository
git clone https://github.com/genomicmedlab/anyvlm.git
cd anyvlm

# Install development dependencies
pip install -e ".[dev,test]"

# Install pre-commit hooks
pre-commit install
```

Running Tests

```
# Start test database services
docker compose -f compose.test.yaml up -d

# Run all tests
make test

# Run with coverage
pytest --cov=anyvlm --cov-report=term-missing

# Run specific test file
pytest tests/unit/test_restapi.py
```

Code Quality

```
# Format code
ruff format src tests

# Lint code
ruff check src tests

# Run all pre-commit hooks
pre-commit run --all-files
```

Building Documentation

```
make -C docs html
# Output: docs/_build/html/index.html
```

Makefile Commands

| Command | Description |
|---|---|
| `make develop` | Install package in development mode |
| `make test` | Run test suite |
| `make volumes` | Create required Docker volumes |
| `make up` | Start production stack |
| `make up-dev` | Start development stack with hot-reload |
| `make up-test` | Start test services |
| `make down` | Remove all containers |
| `make stop` | Stop running services |

API Documentation

When the service is running, interactive API documentation is available at:

- Swagger UI: http://localhost:8080/docs
- ReDoc: http://localhost:8080/redoc

Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run tests and linting (`make test && ruff check src tests`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Contact

- Repository: https://github.com/genomicmedlab/anyvlm
- Issues: https://github.com/genomicmedlab/anyvlm/issues
- Email: biocommons-dev@googlegroups.com

Acknowledgments

AnyVLM is developed by the Wagner Lab at Nationwide Children's Hospital and the Translational Genomics Group at the Broad Institute.

This project integrates with:

- GA4GH VRS: Variant Representation Specification
- AnyVar: Variant registration service
- GA4GH Beacon: Standards for genomic data discovery
