Batch concat_ws calls to support tables with 100+ columns #80

Merged
mason-sharp merged 1 commit into main from fix/ACE-168-wide-tables on Feb 18, 2026
Conversation

@mason-sharp
Member

PostgreSQL limits functions to 100 arguments. The row-hashing expression used concat_ws('|', col1, ..., colN), which fails on wide tables. Split column expressions into groups of 99 with nested concat_ws calls. This affects both table-diff and merkle tree hash queries.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot commented Feb 18, 2026

📝 Walkthrough

Added batching logic for PostgreSQL concat_ws function calls to handle the 100-argument limit. Introduced a concatWSBatched helper function that nests concat_ws calls when expressions exceed 99 arguments per call, with corresponding comprehensive test coverage.

Changes

Query Expression Batching (db/queries/queries.go)
Introduced the concatWSBatched helper, with explanatory comments on PostgreSQL's 100-argument limit and the batching strategy. Modified buildRowTextExpr to use batching, splitting expressions into nested concat_ws calls when they exceed the 99-argument threshold.

Batching Test Suite (db/queries/queries_test.go)
Added TestConcatWSBatched with four subtests validating single-call behavior under and at the limit (50 and 99 exprs) and nested batching over the limit (100 and 200 exprs).

Poem

🐰 Hopping through queries with glee,
Batching concat_ws calls, so spry!
PostgreSQL limits we see,
Nested functions reach the sky,
A hundred arguments? No cry!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

Title check: ✅ Passed. The title clearly and concisely summarizes the main change (batching concat_ws calls to support tables with 100+ columns), matching the changeset's primary objective.
Description check: ✅ Passed. The description directly relates to the changeset, explaining the PostgreSQL 100-argument limit, the problem with wide tables, and the batching solution implemented.


@coderabbitai bot left a comment


🧹 Nitpick comments (3)
db/queries/queries.go (1)

583-599: Document the single-depth nesting assumption and guard against empty input.

Two minor gaps worth addressing:

  1. Single-depth nesting: The outer concat_ws itself has no overflow protection — if len(batches) ever exceeded 99, it would hit the same PostgreSQL 100-arg limit. In practice this can't happen because PostgreSQL's column limit (~1600) means at most ⌈1600/99⌉ = 17 batches, but the assumption is implicit. A brief comment would prevent confusion if this function is ever reused outside its current context.

  2. Empty-slice precondition: concatWSBatched([]string{}) emits concat_ws('|', ) — invalid SQL. The only call site (buildRowTextExpr) guards this with an early return, but the function itself carries no documented precondition.

♻️ Proposed improvements
 // concatWSBatched produces a concat_ws('|', ...) expression. When len(exprs)
 // exceeds 99 (the max value-arguments per concat_ws call, since the separator
 // takes one slot), it splits the expressions into batches and nests the calls.
+//
+// Only one level of nesting is applied. This is safe because PostgreSQL's
+// maximum column count (~1600) means at most ceil(1600/99)=17 inner batches,
+// keeping the outer concat_ws well within the 100-argument limit.
+//
+// Precondition: exprs must not be empty.
 func concatWSBatched(exprs []string) string {
 	const maxArgs = 99 // 100 total - 1 for the separator
 
+	if len(exprs) == 0 {
+		return "''"
+	}
+
 	if len(exprs) <= maxArgs {
 		return fmt.Sprintf("concat_ws('|', %s)", strings.Join(exprs, ", "))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/queries/queries.go` around lines 583 - 599, The concatWSBatched function
needs a short doc comment stating it assumes only single-depth nesting (the
outer concat_ws is not batched and therefore must not exceed PostgreSQL's
~100-arg limit in this usage) and must guard against an empty input slice;
update concatWSBatched to return an empty string immediately when len(exprs) ==
0 (or otherwise no-op) to avoid emitting invalid SQL like "concat_ws('|', )",
and add the comment near the const maxArgs / batches logic referencing
concatWSBatched and batches so future readers understand the single-depth
assumption.
db/queries/queries_test.go (2)

457-468: Add expression-presence check to the 200-expression subtest.

Unlike the 100-expression subtest, this case only verifies the number of concat_ws calls but not that all 200 expressions appear in the output. A bug that drops the third batch entirely would still pass.

♻️ Proposed addition
 		if strings.Count(result, "concat_ws(") != 4 {
 			t.Errorf("expected 4 concat_ws calls for 200 exprs, got %d in: %s",
 				strings.Count(result, "concat_ws("), result)
 		}
+		for _, e := range exprs {
+			if !strings.Contains(result, e) {
+				t.Errorf("missing expression %s in result", e)
+			}
+		}
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/queries/queries_test.go` around lines 457 - 468, Update the "200
expressions nests into 3 inner batches" subtest to also assert that all
expressions are present in the result: after calling concatWSBatched(exprs)
iterate the expected expression names (e.g., "col0".."col199") and assert each
appears in result (or assert the total occurrences of "col" labels equals 200).
Modify the test that calls concatWSBatched to include this presence check so a
dropped batch would fail; reference the concatWSBatched call and the local exprs
slice when adding the assertions.

449-455: Substring collision weakens the expression-presence check.

Column names col0–col99 include prefix collisions: "col1" is a substring of "col10" through "col19", so strings.Contains(result, "col1") passes even if col1 is absent but any of col10–col19 is present. The test will not catch a regression where a specific column is silently dropped.

♻️ More precise presence check
-		// Every expression should be present
-		for _, e := range exprs {
-			if !strings.Contains(result, e) {
-				t.Errorf("missing expression %s in result", e)
-			}
-		}
+		// Every expression should be present — check for the exact token
+		// (comma or parenthesis delimited) to avoid substring false-positives.
+		for _, e := range exprs {
+			// A column appears as ", colN" or "(colN" inside the concat_ws calls.
+			if !strings.Contains(result, ", "+e) && !strings.Contains(result, "("+e) {
+				t.Errorf("missing expression %s in result: %s", e, result)
+			}
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@db/queries/queries_test.go` around lines 449 - 455, The check using
strings.Contains(result, e) is vulnerable to substring collisions (e.g., "col1"
matching "col10"); update the loop that iterates over exprs and checks result to
use a word-boundary regex match instead: compile regexp.MustCompile(`\b` +
regexp.QuoteMeta(e) + `\b`) and use its MatchString(result) in place of
strings.Contains; ensure you add the regexp import and use exprs/result as
before so each column name is matched as a whole token.

mason-sharp merged commit f8374a4 into main on Feb 18, 2026
3 checks passed
