Add database schema and adapter for CMIP7 datasets based on CMIP7 Global Attributes v1.0 specification (DOI: 10.5281/zenodo.17250297).
Changes:
- Add CMIP7Dataset model with core DRS attributes, parent info, and variable metadata
- Add tracking_id column to DatasetFile for CMIP7 file identifiers
- Create CMIP7DatasetAdapter with find_local_datasets() and instance_id construction following CMIP7 DRS format
- Add database migration for cmip7_dataset table with indexes
- Add unit tests for adapter and model
Pull request overview
This pull request adds comprehensive support for CMIP7 datasets to the climate-ref system based on the CMIP7 Global Attributes v1.0 specification. The PR introduces a new database schema, adapter implementation, and complete test coverage for handling CMIP7 climate model data.
Changes:
- Added CMIP7Dataset model with core DRS attributes (activity_id, institution_id, source_id, experiment_id, variant_label, variable_id, grid_label, frequency, region, branding_suffix, version), mandatory attributes (mip_era, realm, nominal_resolution), parent information fields, and variable metadata
- Introduced tracking_id column to DatasetFile table for CMIP7 file-level handle-based identifiers
- Implemented CMIP7DatasetAdapter with file parsing, instance_id construction following CMIP7 DRS format, and comprehensive metadata handling
- Created database migration to establish cmip7_dataset table with appropriate indexes and foreign key constraints
- Registered CMIP7DatasetAdapter in the factory method for dataset type routing
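The instance_id construction mentioned above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the attribute order is simply the order in which this PR lists the core DRS attributes, and the `CMIP7` prefix and the `build_instance_id` name are assumptions. The authoritative order is whatever the CMIP7 specification defines.

```python
from __future__ import annotations

from typing import Mapping, Sequence

# Assumption: illustrative order only, copied from the order in which this PR
# lists the core DRS attributes. Check the CMIP7 spec for the real DRS order.
ASSUMED_DRS_ORDER: tuple[str, ...] = (
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "variant_label",
    "variable_id",
    "grid_label",
    "frequency",
    "region",
    "branding_suffix",
    "version",
)


def build_instance_id(
    attrs: Mapping[str, str],
    order: Sequence[str] = ASSUMED_DRS_ORDER,
    prefix: str = "CMIP7",  # hypothetical prefix, not confirmed by the PR
) -> str:
    """Join the DRS attribute values with '.' in the given order."""
    # Fail early if any attribute needed for the identifier is absent or empty.
    missing = [key for key in order if not attrs.get(key)]
    if missing:
        raise ValueError(f"missing DRS attributes: {missing}")
    return ".".join([prefix, *(str(attrs[key]) for key in order)])
```

Passing the order as a parameter keeps the sketch usable if the real DRS ordering differs from the one assumed here.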
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| packages/climate-ref/src/climate_ref/models/dataset.py | Adds CMIP7Dataset model class and tracking_id field to DatasetFile for CMIP7 file identifiers |
| packages/climate-ref/src/climate_ref/datasets/cmip7.py | Implements CMIP7DatasetAdapter with parsing, instance_id construction, and metadata handling functions |
| `packages/climate-ref/src/climate_ref/datasets/__init__.py` | Registers CMIP7DatasetAdapter in get_dataset_adapter factory method |
| packages/climate-ref/src/climate_ref/migrations/versions/2026-02-02T1645_c47703d514ba_add_cmip7_tables.py | Database migration creating cmip7_dataset table and adding tracking_id column |
| packages/climate-ref/tests/unit/datasets/test_cmip7.py | Comprehensive test suite covering adapter initialization, metadata structure, parsing, instance_id construction, and database operations |
| changelog/503.feature.md | Documents the new CMIP7 dataset support feature |
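For readers unfamiliar with the factory registration listed in the table, a rough sketch of type-based routing is shown below. The adapter classes here are empty stand-ins, and the registry keys and the exact signature of `get_dataset_adapter` are assumptions, since the page does not show the real implementation.

```python
class CMIP6DatasetAdapter:
    """Stand-in for an existing adapter (hypothetical body)."""


class CMIP7DatasetAdapter:
    """Stand-in for the adapter added in this PR (hypothetical body)."""


# Assumption: adapters are looked up by a lowercase source-type string.
_ADAPTERS = {
    "cmip6": CMIP6DatasetAdapter,
    "cmip7": CMIP7DatasetAdapter,
}


def get_dataset_adapter(source_type: str):
    """Return a new adapter instance for the requested dataset type."""
    try:
        adapter_cls = _ADAPTERS[source_type.lower()]
    except KeyError:
        # Surface unknown types as a clear error instead of a bare KeyError.
        raise ValueError(f"Unknown source type: {source_type}") from None
    return adapter_cls()
```

With a registry like this, supporting a new dataset type only requires adding one entry to the mapping.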
- Add CMIP7DatasetAdapter to ExecutionSolver.build_from_db
- Add CMIP7 to parametrized test_get_dataset_adapter test
- Extract parse_datetime and clean_branch_time to shared utils module
- Update docstring for clean_branch_time explaining EC-Earth3 suffixes
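As a rough illustration of what a helper like clean_branch_time might do, the sketch below strips a trailing `D` before converting to a float. It assumes the EC-Earth3 suffix in question is a trailing `D` (as in a value like `0.0D`) and that unparseable values should become `None`. Neither detail is confirmed by this page, and the shared utils version may well operate on whole columns rather than single values.

```python
from __future__ import annotations


def clean_branch_time(value: object) -> float | None:
    """Convert a branch-time attribute to float, tolerating a trailing 'D'.

    Assumption: the problematic suffix is a single trailing 'D' or 'd'.
    """
    if value is None:
        return None
    text = str(value).strip()
    # Drop the assumed suffix before numeric conversion.
    if text.endswith(("D", "d")):
        text = text[:-1]
    try:
        return float(text)
    except ValueError:
        # Assumption: values that still fail to parse are treated as missing.
        return None
```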
Add two missing CMIP7 spec attributes:
- license_id (mandatory): creative commons license identifier
- external_variables (conditionally required): cell measure variable names

Updates the DB model, file parser, dataset adapter, and includes an Alembic migration. Also improves the DRS comment to clarify the omitted leading drs_specs/mip_era fixed values.
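The effect of such a migration can be sketched with the standard library alone. This uses plain `sqlite3` as a stand-in for Alembic, not the migration shipped in the PR. The minimal starting table, the `TEXT` column types, and the choice to leave both columns nullable are all assumptions made so that the example runs on its own.

```python
import sqlite3


def add_spec_columns(conn: sqlite3.Connection) -> None:
    """Add the two new attribute columns to an existing cmip7_dataset table."""
    # Assumption: both columns are nullable so that existing rows stay valid.
    conn.execute("ALTER TABLE cmip7_dataset ADD COLUMN license_id TEXT")
    conn.execute("ALTER TABLE cmip7_dataset ADD COLUMN external_variables TEXT")


def column_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table in declaration order."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Hypothetical, heavily reduced starting schema for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cmip7_dataset (id INTEGER PRIMARY KEY, instance_id TEXT NOT NULL)"
)
add_spec_columns(conn)
```

In an Alembic migration the same change would normally be expressed with `op.add_column` in `upgrade()` and `op.drop_column` in `downgrade()`.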
Description
Add database schema and adapter for CMIP7 datasets based on CMIP7 Global Attributes v1.0 specification (DOI: 10.5281/zenodo.17250297).
Changes
- CMIP7Dataset model (models/dataset.py): New SQLAlchemy model with core DRS attributes (activity_id, institution_id, source_id, experiment_id, variant_label, variable_id, grid_label, frequency, region, branding_suffix, version), additional mandatory attributes (mip_era, realm, nominal_resolution), parent info fields, and variable metadata (standard_name, long_name, units)
- tracking_id column: Added to the DatasetFile table for CMIP7 file-level identifiers (handle-based)
- CMIP7DatasetAdapter (datasets/cmip7.py): New adapter implementing find_local_datasets() with instance_id construction following CMIP7 DRS format
- Database migration: Creates the cmip7_dataset table with indexes on source_id, experiment_id, and instance_id
- Adapter registration (datasets/__init__.py): Registers CMIP7DatasetAdapter in get_dataset_adapter()

Checklist
Please confirm that this pull request has done the following:
changelog/