@dimitri-yatsenko commented on Jan 16, 2026

Summary

  1. Fix incorrect codec notation in type-system.md comparison table
  2. Add comprehensive how-to guide for using plugin codecs
  3. Reference dj-zarr-codecs and dj-photon-codecs as DataJoint plugin codec examples

Note: This PR targets main branch. The same changes were previously merged to pre/v2.0 via PR #121.

Changes

1. Fix Codec Notation

File: src/reference/specs/type-system.md (line 638)

Changed the "External dtype" row in the codec comparison table from <hash> to <hash@> to match the correct store-only notation.

Before:

| External dtype | `<hash>` | `<hash>` | `json` | `json` | `json` |

After:

| External dtype | `<hash@>` | `<hash@>` | `json` | `json` | `json` |

2. Add Plugin Codecs Documentation

New file: src/how-to/use-plugin-codecs.md (332 lines)

Comprehensive guide for using plugin codec packages that extend DataJoint via entry point discovery.

Key sections:

  • Installation and automatic registration via Python entry points
  • Complete dj-zarr-codecs usage example with Zarr array storage
  • Schema-addressed storage structure explanation
  • Finding DataJoint-maintained codecs (dj-zarr-codecs, dj-photon-codecs)
  • Comparison with built-in codecs (<npy@>, <blob@>)
  • Best practices for dependency management
  • Troubleshooting common issues

Terminology: Uses "plugin codecs" instead of "external/third-party" to accurately describe the architectural pattern (separate packages with entry point discovery) without implying ownership.

DataJoint Plugin Codecs:

  • dj-zarr-codecs - Zarr array storage for general numpy arrays
  • dj-photon-codecs - Photon-limited movies with Anscombe transformation and compression

Note: anscombe-transform is a Zarr/Numcodecs codec (dependency), not a DataJoint plugin codec.

Updated navigation:

  • src/how-to/index.md - Added entry
  • mkdocs.yaml - Added to Object Storage section

3. Reference Plugin Codecs in Explanations

File: src/explanation/custom-codecs.md

Added "Before Creating Your Own" section that directs readers to check existing plugin codecs before implementing custom solutions:

  • dj-zarr-codecs — General numpy arrays with Zarr
  • dj-photon-codecs — Photon-limited movies with Anscombe + compression

Context

The <hash@> codec is external-only and requires the @ modifier. The original table incorrectly showed <hash> (without @) in the "External dtype" row.

The plugin codecs guide establishes terminology and best practices for DataJoint plugin codecs - packages that register via datajoint.codecs entry points. Both dj-zarr-codecs and dj-photon-codecs follow this pattern.

Related

Changed <hash> to <hash@> in the External dtype row to match
the correct store-only notation used throughout the documentation.

Add comprehensive how-to guide for using plugin codecs - codec packages
that extend DataJoint via entry point discovery. Uses dj-zarr-codecs
as the primary example.

Key sections:
- Installation and automatic registration via entry points
- Complete Zarr codec usage example with storage structure
- Finding DataJoint-maintained and community codecs
- Comparison with built-in codecs (<npy@>, <blob@>)
- Best practices for dependency management
- Troubleshooting common issues

Terminology: Uses 'plugin codecs' instead of 'external/third-party' to
accurately describe the architectural pattern (separate packages with
entry point discovery) without implying ownership.

Update plugin codecs documentation to include dj-photon-codecs:
- Add to DataJoint-maintained codecs list
- Include in imaging domain examples
- Reference in See Also section

dj-photon-codecs provides Anscombe transformation + Zarr compression
for photon-limited imaging data (calcium imaging, low-light microscopy).

Add 'Before Creating Your Own' section to custom-codecs.md that directs
readers to check existing plugin codecs (dj-zarr-codecs, dj-photon-codecs,
anscombe-transform) before implementing their own.

Encourages reuse and ensures users are aware of existing solutions.

anscombe-transform is a Zarr/Numcodecs codec (not a DataJoint codec).
It doesn't have a datajoint.codecs entry point - it's a dependency
used by dj-photon-codecs, not a standalone DataJoint plugin codec.

Removed from:
- DataJoint-maintained codecs list in use-plugin-codecs.md
- Before Creating Your Own section in custom-codecs.md

Add detailed guidance on versioning plugin codecs for backward compatibility:

- Version strategy: package version vs data format version
- When to bump versions (breaking vs non-breaking changes)
- Implementation patterns for version dispatch
- Migration strategies (lazy, explicit, deprecation warnings)
- Real-world example with dj-photon-codecs evolution
- Testing version compatibility
- Semantic versioning guidelines for codec packages

Critical for maintaining data accessibility as codecs evolve.
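
The version dispatch pattern listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the dj-photon-codecs implementation: the stored-record layout, the `payload` key, and the two format versions are all hypothetical. Only the `codec_version` field name comes from this PR.

```python
import json
import zlib

# Hypothetical data format version, tracked separately from the package version.
CURRENT_VERSION = 2


def encode(values: list[float]) -> dict:
    """Write the current (v2) format: zlib-compressed JSON, hex-encoded."""
    payload = zlib.compress(json.dumps(values).encode("utf-8"))
    return {"codec_version": CURRENT_VERSION, "payload": payload.hex()}


def _decode_v1(stored: dict) -> list[float]:
    # Hypothetical v1 format: the payload is plain JSON text.
    return json.loads(stored["payload"])


def _decode_v2(stored: dict) -> list[float]:
    raw = zlib.decompress(bytes.fromhex(stored["payload"]))
    return json.loads(raw.decode("utf-8"))


_DECODERS = {1: _decode_v1, 2: _decode_v2}


def decode(stored: dict) -> list[float]:
    """Dispatch on the stored version so that old records stay readable."""
    # Assumption: records written before versioning existed are treated as v1.
    version = stored.get("codec_version", 1)
    try:
        decoder = _DECODERS[version]
    except KeyError:
        raise ValueError(f"unsupported codec_version: {version}") from None
    return decoder(stored)
```

Keeping every old decoder in the dispatch table corresponds to the lazy migration strategy: existing data is never rewritten, and new writes always use the current format.
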
Add section explaining why built-in codecs don't need explicit versioning:
- Built-in codecs versioned with DataJoint releases
- Plugin codecs have independent lifecycles and need codec_version
- DataJoint's semantic versioning handles built-in codec evolution
- Plugin versioning protects against independent evolution

Key distinction: Built-in codecs are part of DataJoint's API surface
(versioned by framework), while plugin codecs are independent packages
(need self-versioning).

Add comprehensive documentation of DataJoint's custom blob serialization:

Explanation docs (type-system.md):
- Protocol headers (mYm for MATLAB compat, dj0 for Python-extended)
- Optional zlib compression for data > 1KB
- Type-specific encoding with serialization codes
- Version detection via embedded protocol headers
- Supported types list
- Storage modes (<blob> vs <blob@>)

Reference docs (type-system.md):
- Detailed type code mapping for all supported Python types
- Protocol header format (mYm\0, dj0\0)
- Version detection mechanism
- MD5 deduplication for <blob@>

Clarifies that <blob> does NOT use pickle - it uses DataJoint's
custom binary format with intrinsic versioning via protocol headers.
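
Version detection via the protocol header can be illustrated with a short sketch. It assumes the bytes passed in are already decompressed and begin directly with one of the two headers named above. The function name `detect_protocol` is hypothetical, and the compression framing is not covered here.

```python
# Protocol headers as listed in this PR; the descriptive labels are illustrative.
PROTOCOLS = {
    b"mYm\0": "mYm (MATLAB-compatible)",
    b"dj0\0": "dj0 (Python-extended)",
}


def detect_protocol(blob: bytes) -> str:
    """Return the protocol label for a blob based on its leading header bytes.

    Hypothetical helper; assumes `blob` is uncompressed.
    """
    for header, label in PROTOCOLS.items():
        if blob.startswith(header):
            return label
    raise ValueError("unrecognized blob protocol header")
```

Because the header travels with the data, a reader can pick the right decoding path without any external version field.
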
Add references to mYm format documentation:
- MATLAB FileExchange: https://www.mathworks.com/matlabcentral/fileexchange/81208-mym
- GitHub repository: https://github.com/datajoint/mym

Add intrinsic versioning explanation to plugin codecs guide:
- How built-in codecs embed version in data format
- Protocol headers in <blob> (mYm\0, dj0\0)
- NumPy format version in <npy@> headers
- Self-describing structure in <object@>
- Why built-in codecs don't need explicit codec_version field

Clarifies the distinction between built-in codecs (intrinsic versioning)
and plugin codecs (explicit codec_version field).
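
As an example of intrinsic versioning, the NPY format stores its version in the two bytes that follow the 6-byte magic string `\x93NUMPY`. The sketch below reads that version using only the standard library. The function name `read_npy_version` is hypothetical, and how `<npy@>` uses this header internally is not shown in this PR.

```python
NPY_MAGIC = b"\x93NUMPY"  # magic string defined by the NPY file format


def read_npy_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) from the start of an NPY byte stream.

    Hypothetical helper for illustration; it does not parse the array header.
    """
    if len(data) < 8 or not data.startswith(NPY_MAGIC):
        raise ValueError("not an NPY byte stream")
    # Byte 6 is the major version and byte 7 is the minor version.
    return data[6], data[7]
```

When NumPy is available, `numpy.lib.format.read_magic` performs the same check on an open file object.
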
@MilagrosMarin merged commit fa22cf7 into main on Jan 16, 2026
1 check passed