@roomote
Contributor

@roomote roomote bot commented Jan 17, 2026

This PR attempts to address Issue #10653. Feedback and guidance are welcome.

Summary

Implements file content hashing in the FileContextTracker to detect when files are unchanged and still in the conversation context, allowing ReadFileTool to skip redundant re-reads and save tokens.

Changes

FileContextTrackerTypes.ts

  • Added content_hash field to FileMetadataEntry schema (optional, nullable string)

FileContextTracker.ts

  • Added computeContentHash(content: string) static method for MD5 hashing
  • Added computeFileHash(filePath: string) method to compute hash of file on disk
  • Added isFileUnchangedInContext(filePath: string) method to check if a file can be skipped
  • Modified trackFileContext() to accept an optional contentHash parameter
  • Modified addFileToFileContextTracker() to store/invalidate content hashes
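A minimal sketch of what an MD5 content hash helper could look like, assuming Node's built-in crypto module. The standalone function below is a hypothetical stand-in for the static computeContentHash method, not the code from this PR.

```typescript
import { createHash } from "node:crypto"

// Hypothetical stand-in for FileContextTracker.computeContentHash.
// MD5 serves here as a fast change detector, not as a security measure.
function computeContentHash(content: string): string {
  return createHash("md5").update(content, "utf8").digest("hex")
}

console.log(computeContentHash("hello")) // 5d41402abc4b2a76b9719d911017c592
```

Identical content always yields the same hex digest, so comparing two digests is enough to tell whether a file's text changed between reads.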

ReadFileTool.ts

  • Added check at the start of file processing to detect unchanged files
  • Returns a brief "file unchanged" status message instead of re-reading the full file
  • Computes and stores content hash after reading files successfully
  • Tracks file context with content hash

Tests

  • Added comprehensive test suite for new FileContextTracker functionality
  • Updated existing ReadFileTool tests to mock the new isFileUnchangedInContext method

How it works

  1. When a file is read, its content is hashed with MD5 and stored in the context tracker
  2. On subsequent reads, the tool checks if the file is already in context and unchanged
  3. If unchanged, it returns a brief status message instead of re-reading the entire file
  4. The hash is invalidated when the user edits the file (detected via file watchers)
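The four steps above can be sketched with a small in-memory tracker. This is an illustration only: HashTracker, recordRead, invalidate, and isUnchanged are hypothetical names, and the real FileContextTracker stores the hash in its metadata entries rather than in a plain Map.

```typescript
import { createHash } from "node:crypto"

const md5 = (text: string): string => createHash("md5").update(text, "utf8").digest("hex")

// Hypothetical in-memory tracker keyed by file path.
class HashTracker {
  private hashes = new Map<string, string>()

  // Step 1: remember the hash of content that was read into context.
  recordRead(path: string, content: string): void {
    this.hashes.set(path, md5(content))
  }

  // Step 4: drop the hash when an edit to the file is detected.
  invalidate(path: string): void {
    this.hashes.delete(path)
  }

  // Steps 2-3: a read can be skipped only if a hash is stored and still matches.
  isUnchanged(path: string, currentContent: string): boolean {
    const stored = this.hashes.get(path)
    return stored !== undefined && stored === md5(currentContent)
  }
}

const tracker = new HashTracker()
tracker.recordRead("src/app.ts", "const a = 1\n")
console.log(tracker.isUnchanged("src/app.ts", "const a = 1\n")) // true
tracker.invalidate("src/app.ts")
console.log(tracker.isUnchanged("src/app.ts", "const a = 1\n")) // false
```

Deleting the stored hash on invalidation means a later check fails closed: without a hash, the file is always re-read.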

Benefits

  • Reduces token usage by avoiding redundant file re-reads
  • Especially useful for large files that are referenced multiple times
  • Transparent to the user and the AI model

Fixes #10653


Important

This PR adds file content hashing to FileContextTracker to skip redundant reads in ReadFileTool, reducing token usage.

  • Behavior:
    • FileContextTracker now tracks file content changes using MD5 hashes to skip redundant reads.
    • ReadFileTool checks if a file is unchanged before reading, returning a status message if so.
  • Functions:
    • computeContentHash(content: string) in FileContextTracker computes MD5 hash of content.
    • isFileUnchangedInContext(filePath: string) checks if a file's content is unchanged.
    • trackFileContext() and addFileToFileContextTracker() updated to handle content hashes.
  • Tests:
    • Added tests for FileContextTracker in FileContextTracker.spec.ts.
    • Updated readFileTool.spec.ts to mock isFileUnchangedInContext method.

This description was created by Ellipsis for ea696ce.

- Add content_hash field to FileMetadataEntry schema for tracking file content
- Add isFileUnchangedInContext() method to FileContextTracker to detect unchanged files
- Add computeContentHash() and computeFileHash() methods for MD5 hashing
- Modify trackFileContext() to store content hash on read/edit operations
- Update ReadFileTool to skip re-reading files that are unchanged and in context
- Add comprehensive tests for the new functionality
- Update existing ReadFileTool tests to mock new isFileUnchangedInContext method

This reduces token usage by skipping redundant file re-reads when:
1. The file has an active read entry in context tracking
2. The file has not been edited since the last read
3. The file content hash matches the stored hash

Addresses issue #10653
@roomote
Contributor Author

roomote bot commented Jan 17, 2026


All previously flagged issues have been addressed. No new issues found.

Previously flagged (resolved):

  • ReadFileTool.ts (lines 629-648): Dead code, redundant file read, and incorrect hash storage for truncated files. The code always stores the content hash, even when the file was truncated, so subsequent reads incorrectly report the file as "unchanged".

Comment on lines 629 to 648
// Compute content hash for skip-redundant-reads optimization
// Only store hash for complete reads (not truncated)
let contentHash: string | undefined
if (safeReadBudget > 0) {
    const readResult = await readFileWithTokenBudget(fullPath, { budgetTokens: safeReadBudget })
    // Re-read to get content for hash (already done above, use same result)
    // Actually we need to use the result.content from above, so we compute hash here
    // Note: content variable above is already the numbered content, we need raw
    // For simplicity, compute hash from the result we already have (pre-addLineNumbers)
}
// For full reads, compute hash from raw content
if (safeReadBudget > 0) {
    try {
        const rawContent = await fs.readFile(fullPath, "utf-8")
        contentHash = FileContextTracker.computeContentHash(rawContent)
    } catch {
        // If we can't read for hash, that's ok - just don't store hash
    }
}
await task.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource, contentHash)
Contributor Author


This code has several issues: (1) The first if (safeReadBudget > 0) block is dead code: it calls readFileWithTokenBudget but never uses the result. (2) The second block reads the file again with fs.readFile even though the content was already read earlier via readFileWithTokenBudget. (3) Despite the comment saying "Only store hash for complete reads (not truncated)", the code stores the hash even when result.complete is false. As a result, a truncated file is incorrectly reported as "unchanged" on subsequent reads, even though only a portion of it is actually in context.

The fix should use the already-read content and only store the hash when the read was complete:

Suggested change
// Compute content hash for skip-redundant-reads optimization
// Only store hash for complete reads (not truncated)
let contentHash: string | undefined
if (safeReadBudget > 0) {
    const result = await readFileWithTokenBudget(fullPath, { budgetTokens: safeReadBudget })
    content = addLineNumbers(result.content)
    if (result.complete) {
        // Only store hash for complete reads
        contentHash = FileContextTracker.computeContentHash(result.content)
    }
}
await task.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource, contentHash)
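
To illustrate why the hash should be stored only for complete reads, here is a self-contained sketch. readWithBudget is a hypothetical stand-in for readFileWithTokenBudget that truncates by character count rather than tokens, and hashIfComplete is an illustrative helper, not part of the PR.

```typescript
import { createHash } from "node:crypto"

interface BudgetedRead {
  content: string
  complete: boolean // false when the budget cut the file short
}

// Hypothetical stand-in for readFileWithTokenBudget; the budget here is in characters.
function readWithBudget(fileText: string, budgetChars: number): BudgetedRead {
  const complete = fileText.length <= budgetChars
  return { content: complete ? fileText : fileText.slice(0, budgetChars), complete }
}

// Returns a hash only when the whole file made it into context.
function hashIfComplete(read: BudgetedRead): string | undefined {
  if (!read.complete) return undefined
  return createHash("md5").update(read.content, "utf8").digest("hex")
}

console.log(hashIfComplete(readWithBudget("hello", 100))) // 5d41402abc4b2a76b9719d911017c592
console.log(hashIfComplete(readWithBudget("hello", 3))) // undefined
```

With no hash stored for a truncated read, a later "is it unchanged?" check has nothing to match against, so the file is read again instead of being skipped.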


@roomote Try the suggested fix, if you're game. I appreciate your attentiveness :)

Contributor Author

@roomote roomote bot Jan 17, 2026


Fixed the reported issues. All local checks passed.


@roomote do you have the ability to run any e2e tests that exercise file reads that could validate whether this patch produces the expected token savings?

Contributor Author


Good question! I investigated the e2e test infrastructure to assess token validation capabilities:

Current State:

  • The read-file.test.ts e2e tests exist but are currently skipped (suite.skip)
  • The test infrastructure does support token tracking via TaskCompleted and TaskTokenUsageUpdated events, which emit TokenUsage data including totalTokensIn, totalTokensOut, and contextTokens

Feasibility:
An e2e test to validate token savings could:

  1. Read a file, capture token usage
  2. Read the same file again in the same conversation
  3. Verify the second read returns the "file unchanged" message
  4. Compare token counts - the second read should show significantly fewer tokens
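
The comparison in step 4 could be sketched as a pure function over usage snapshots. The field names come from the TokenUsage data mentioned above; treating contextTokens as the current context size, the helper names, and the 0.25 threshold are all assumptions for illustration, and the numbers in the usage lines are made up, not measured.

```typescript
// Field names taken from the TokenUsage description; their exact semantics are assumed.
interface TokenUsage {
  totalTokensIn: number
  totalTokensOut: number
  contextTokens: number
}

// How much the context grew between two snapshots.
function contextGrowth(before: TokenUsage, after: TokenUsage): number {
  return after.contextTokens - before.contextTokens
}

// Hypothetical check: the second read should add at most maxRatio of what the first read added.
function secondReadWasCheaper(
  start: TokenUsage,
  afterFirst: TokenUsage,
  afterSecond: TokenUsage,
  maxRatio = 0.25,
): boolean {
  const firstGrowth = contextGrowth(start, afterFirst)
  const secondGrowth = contextGrowth(afterFirst, afterSecond)
  return firstGrowth > 0 && secondGrowth <= firstGrowth * maxRatio
}

// Illustrative snapshots only.
const start = { totalTokensIn: 1000, totalTokensOut: 50, contextTokens: 1000 }
const afterFirst = { totalTokensIn: 6000, totalTokensOut: 120, contextTokens: 5000 }
const afterSecond = { totalTokensIn: 11100, totalTokensOut: 160, contextTokens: 5100 }
console.log(secondReadWasCheaper(start, afterFirst, afterSecond)) // true
```

Comparing growth between snapshots, rather than absolute totals, keeps the check independent of how large the conversation already was before the first read.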

Limitation:
I cannot run e2e tests in this environment since they require a full VSCode extension host. However, the unit tests in FileContextTracker.spec.ts and readFileTool.spec.ts can be run to verify the hashing and skip logic works correctly. Would running those unit tests be helpful for validating the implementation?


@roomote sure, please run those tests. Thanks!

Contributor Author

@roomote roomote bot Jan 17, 2026


All tests passed:

  • FileContextTracker.spec.ts: 22 tests passed
  • readFileTool.spec.ts: 52 tests passed

- Remove dead code that called readFileWithTokenBudget but never used result
- Remove redundant fs.readFile call (content was already read earlier)
- Only compute and store contentHash when result.complete is true
- Fixes incorrect hash storage for truncated files

Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] Add file context tracking to skip redundant re-reads (reduces token usage)
