Column Filtering for Change Retention #2110
Open
leejustin wants to merge 10 commits into sequinstream:main from
Add support for excluding or including specific columns in WAL pipeline CDC events. This allows users to filter out sensitive columns (such as passwords or SSNs) or include only the specific columns they need.
- Add include_column_attnums and exclude_column_attnums fields to the SourceTable schema, with mutual exclusivity validation
- Create a ColumnSelection module with the filtering logic
- Integrate column filtering into message_record and message_changes
- Add a database migration to update existing WAL pipelines
- Add comprehensive tests for column filtering functionality
- Add a ColumnSelectionForm Svelte component for selecting columns in the UI
- Support include/exclude column filtering in WAL pipeline configuration
- Update the backend to handle column selection (includeColumnAttnums/excludeColumnAttnums)
- Add column selection support to the YAML loader and transforms
- Update the WAL pipeline form and show pages to display column selection
- Add comprehensive tests for column selection functionality
- Update the change retention documentation with column filtering details
This allows users to selectively include or exclude specific columns when setting up change retention pipelines, giving fine-grained control over which column data is replicated.
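The first commit's two core pieces are mutually exclusive include/exclude attnum lists on the source table, and a filter applied to both the record and changes payloads. They can be sketched roughly as follows. This is a hedged Python illustration, not the PR's Elixir code. SourceTable and its two field names come from the PR. The Column type, the validate/kept_column_names/filter_payload helpers, and the choice to keep primary keys at filter time are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set


@dataclass(frozen=True)
class Column:
    # Hypothetical column metadata: PostgreSQL attnum, column name, PK flag.
    attnum: int
    name: str
    is_pk: bool = False


@dataclass
class SourceTable:
    # Illustrative stand-in for the SourceTable embedded schema.
    # None means "no filter of this kind" (what migrated pipelines get).
    include_column_attnums: Optional[List[int]] = None
    exclude_column_attnums: Optional[List[int]] = None

    def validate(self) -> None:
        # Mutual exclusivity: at most one of the two lists may be set.
        if (
            self.include_column_attnums is not None
            and self.exclude_column_attnums is not None
        ):
            raise ValueError(
                "include_column_attnums and exclude_column_attnums are mutually exclusive"
            )

    def kept_column_names(self, columns: List[Column]) -> Set[str]:
        # Primary keys always survive, even in include mode
        # (assumed here to be enforced at filter time).
        if self.include_column_attnums is not None:
            wanted = set(self.include_column_attnums)
            return {c.name for c in columns if c.is_pk or c.attnum in wanted}
        if self.exclude_column_attnums is not None:
            dropped = set(self.exclude_column_attnums)
            return {c.name for c in columns if c.is_pk or c.attnum not in dropped}
        return {c.name for c in columns}  # no filter configured: keep everything


def filter_payload(
    payload: Optional[Dict[str, Any]], keep: Set[str]
) -> Optional[Dict[str, Any]]:
    # Applied to both the record map and the changes map; a missing map stays None.
    if payload is None:
        return None
    return {k: v for k, v in payload.items() if k in keep}


columns = [Column(1, "id", is_pk=True), Column(2, "email"), Column(3, "ssn")]
table = SourceTable(exclude_column_attnums=[3])
table.validate()
keep = table.kept_column_names(columns)
record = {"id": 7, "email": "a@example.com", "ssn": "123-45-6789"}
changes = {"email": "old@example.com", "ssn": "987-65-4321"}
print(filter_payload(record, keep))   # {'id': 7, 'email': 'a@example.com'}
print(filter_payload(changes, keep))  # {'email': 'old@example.com'}
```

In this sketch, keeping the primary key in exclude mode is purely defensive. Per the PR, an excluded primary key should already have been rejected by validation before any events are filtered.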
Table Column Filtering
Adds column filtering to change retention pipelines, allowing users to exclude sensitive or unneeded columns (such as personal data fields or large metadata columns), or to include only the specific columns they need. This gives fine-grained control over which column data is captured and stored in change retention tables.
Data schema changes
- Adds include_column_attnums and exclude_column_attnums array fields to the source_tables JSONB column in wal_pipelines. Existing pipelines are migrated with NULL values for these fields.
- Extends the SourceTable embedded schema with two new optional array fields storing column attribute numbers. The changeset validates that these fields are mutually exclusive: only one can be set at a time.
- Columns are stored as PostgreSQL attribute numbers (attnum), which are resolved from column names during YAML parsing and UI configuration.
UI changes
A new ColumnSelectionForm component is added to the WAL pipeline form (Form.svelte), appearing after table selection. It displays the available columns with checkboxes, automatically handles primary key constraints (PKs cannot be excluded and are always included), and shows the selected columns as removable tags.
Setting Column Filtering at Change Retention Creation
Note that the primary key column is disabled: it cannot be included or excluded.
Viewing Change Retention Details
When filtering is enabled, the details page shows the selected columns.
Editing Change Retention
Users can edit the configuration to change the filtering. A note on this field explains that changes will not backfill historical data for the affected columns in existing rows; backfilling is out of scope for this PR.
Implementation details
The ColumnSelection module handles filtering at multiple points:
- It filters both the record and changes payloads.
- It is integrated into MessageHandler.wal_event/2 and Consumers.message_record/2 / message_changes/2, ensuring filtered columns are excluded from all downstream consumers and sinks.
- The YAML loader accepts exclude_columns and include_columns (column name lists) and converts them to attribute numbers. It validates that primary keys are never excluded and that the two options are not specified together.
Configuration
Configure column selection via:
- exclude_columns or include_columns arrays in the change_retentions source table configuration (YAML)
Primary key columns are automatically protected: they cannot be excluded and are always included, even in include mode.
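The YAML-side conversion described under Implementation details turns column names into attnums, rejects excluded primary keys, and rejects configurations that set both options. It might look roughly like the Python sketch below. Only the include_columns/exclude_columns keys and those two validation rules come from this PR. The function name, the ColumnSelectionError type, the name -> (attnum, is_pk) lookup, and the unknown-column check are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup: column name -> (attnum, is_primary_key).
ColumnInfo = Dict[str, Tuple[int, bool]]


class ColumnSelectionError(ValueError):
    """Hypothetical error for an invalid column selection config."""


def resolve_column_selection(
    table_columns: ColumnInfo,
    include_columns: Optional[List[str]] = None,
    exclude_columns: Optional[List[str]] = None,
) -> Tuple[Optional[List[int]], Optional[List[int]]]:
    """Return (include_column_attnums, exclude_column_attnums)."""
    # Rule from the PR: the two options may not be combined.
    if include_columns is not None and exclude_columns is not None:
        raise ColumnSelectionError("use include_columns or exclude_columns, not both")

    def to_attnums(names: List[str]) -> List[int]:
        # Assumed extra check: every listed name must exist on the table.
        unknown = [n for n in names if n not in table_columns]
        if unknown:
            raise ColumnSelectionError(f"unknown columns: {', '.join(unknown)}")
        return [table_columns[n][0] for n in names]

    if exclude_columns is not None:
        attnums = to_attnums(exclude_columns)
        # Rule from the PR: primary keys can never be excluded.
        pks = [n for n in exclude_columns if table_columns[n][1]]
        if pks:
            raise ColumnSelectionError(
                f"primary key columns cannot be excluded: {', '.join(pks)}"
            )
        return None, attnums
    if include_columns is not None:
        # Per the PR, PKs are always included; this sketch assumes that is
        # enforced when filtering, so they need not be listed here.
        return to_attnums(include_columns), None
    return None, None  # no column filtering configured


users = {"id": (1, True), "email": (2, False), "ssn": (3, False)}
print(resolve_column_selection(users, exclude_columns=["ssn"]))    # (None, [3])
print(resolve_column_selection(users, include_columns=["email"]))  # ([2], None)
```

Resolving names to attnums once, at load time, is in line with the PR's choice to store attnums rather than names in source_tables.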
Tests
Added a few tests and ran mix test: