-
Notifications
You must be signed in to change notification settings - Fork 2.8k
feat: Add file context tracking to skip redundant re-reads (reduces token usage) #10807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add content_hash field to FileMetadataEntry schema for tracking file content - Add isFileUnchangedInContext() method to FileContextTracker to detect unchanged files - Add computeContentHash() and computeFileHash() methods for MD5 hashing - Modify trackFileContext() to store content hash on read/edit operations - Update ReadFileTool to skip re-reading files that are unchanged and in context - Add comprehensive tests for the new functionality - Update existing ReadFileTool tests to mock new isFileUnchangedInContext method This reduces token usage by skipping redundant file re-reads when: 1. The file has an active read entry in context tracking 2. The file has not been edited since the last read 3. The file content hash matches the stored hash Addresses issue #10653
All previously flagged issues have been addressed. No new issues found.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
src/core/tools/ReadFileTool.ts
Outdated
| // Compute content hash for skip-redundant-reads optimization | ||
| // Only store hash for complete reads (not truncated) | ||
| let contentHash: string | undefined | ||
| if (safeReadBudget > 0) { | ||
| const readResult = await readFileWithTokenBudget(fullPath, { budgetTokens: safeReadBudget }) | ||
| // Re-read to get content for hash (already done above, use same result) | ||
| // Actually we need to use the result.content from above, so we compute hash here | ||
| // Note: content variable above is already the numbered content, we need raw | ||
| // For simplicity, compute hash from the result we already have (pre-addLineNumbers) | ||
| } | ||
| // For full reads, compute hash from raw content | ||
| if (safeReadBudget > 0) { | ||
| try { | ||
| const rawContent = await fs.readFile(fullPath, "utf-8") | ||
| contentHash = FileContextTracker.computeContentHash(rawContent) | ||
| } catch { | ||
| // If we can't read for hash, that's ok - just don't store hash | ||
| } | ||
| } | ||
| await task.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource, contentHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has several issues: (1) The first if (safeReadBudget > 0) block is dead code - it calls readFileWithTokenBudget but never uses the result. (2) The second block reads the file again with fs.readFile even though the content was already read earlier via readFileWithTokenBudget. (3) Despite the comment saying "Only store hash for complete reads (not truncated)", the code always stores the hash regardless of whether result.complete is false. This means truncated files will incorrectly report as "unchanged" on subsequent reads, even though only a portion is actually in context.
The fix should use the already-read content and only store the hash when the read was complete:
| // Compute content hash for skip-redundant-reads optimization | |
| // Only store hash for complete reads (not truncated) | |
| let contentHash: string | undefined | |
| if (safeReadBudget > 0) { | |
| const readResult = await readFileWithTokenBudget(fullPath, { budgetTokens: safeReadBudget }) | |
| // Re-read to get content for hash (already done above, use same result) | |
| // Actually we need to use the result.content from above, so we compute hash here | |
| // Note: content variable above is already the numbered content, we need raw | |
| // For simplicity, compute hash from the result we already have (pre-addLineNumbers) | |
| } | |
| // For full reads, compute hash from raw content | |
| if (safeReadBudget > 0) { | |
| try { | |
| const rawContent = await fs.readFile(fullPath, "utf-8") | |
| contentHash = FileContextTracker.computeContentHash(rawContent) | |
| } catch { | |
| // If we can't read for hash, that's ok - just don't store hash | |
| } | |
| } | |
| await task.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource, contentHash) | |
| // Compute content hash for skip-redundant-reads optimization | |
| // Only store hash for complete reads (not truncated) | |
| let contentHash: string | undefined | |
| if (safeReadBudget > 0) { | |
| const result = await readFileWithTokenBudget(fullPath, { budgetTokens: safeReadBudget }) | |
| content = addLineNumbers(result.content) | |
| if (result.complete) { | |
| // Only store hash for complete reads | |
| contentHash = FileContextTracker.computeContentHash(result.content) | |
| } | |
| } | |
| await task.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource, contentHash) |
Fix it with Roo Code or mention @roomote and request a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roomote Try the suggested fix, if you're game. I appreciate your attentiveness :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixaroo
See task on Roo Cloud
Fixed the reported issues. All local checks passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roomote do you have the ability to run any e2e tests that exercise file reads that could validate whether this patch produces the expected token savings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixaroo
See task on Roo Cloud
Good question! I investigated the e2e test infrastructure to assess token validation capabilities:
Current State:
- The
read-file.test.tse2e tests exist but are currently skipped (suite.skip) - The test infrastructure does support token tracking via
TaskCompletedandTaskTokenUsageUpdatedevents, which emitTokenUsagedata includingtotalTokensIn,totalTokensOut, andcontextTokens
Feasibility:
An e2e test to validate token savings could:
- Read a file, capture token usage
- Read the same file again in the same conversation
- Verify the second read returns the "file unchanged" message
- Compare token counts - the second read should show significantly fewer tokens
Limitation:
I cannot run e2e tests in this environment since they require a full VSCode extension host. However, the unit tests in FileContextTracker.spec.ts and readFileTool.spec.ts can be run to verify the hashing and skip logic works correctly. Would running those unit tests be helpful for validating the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@roomote sure, please run those tests. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixaroo
See task on Roo Cloud
All tests passed:
- FileContextTracker.spec.ts: 22 tests passed
- readFileTool.spec.ts: 52 tests passed
- Remove dead code that called readFileWithTokenBudget but never used result - Remove redundant fs.readFile call (content was already read earlier) - Only compute and store contentHash when result.complete is true - Fixes incorrect hash storage for truncated files
This PR attempts to address Issue #10653. Feedback and guidance are welcome.
Summary
Implements file content hashing in the FileContextTracker to detect when files are unchanged and still in the conversation context, allowing ReadFileTool to skip redundant re-reads and save tokens.
Changes
FileContextTrackerTypes.ts
content_hashfield toFileMetadataEntryschema (optional, nullable string)FileContextTracker.ts
computeContentHash(content: string)static method for MD5 hashingcomputeFileHash(filePath: string)method to compute hash of file on diskisFileUnchangedInContext(filePath: string)method to check if a file can be skippedtrackFileContext()to accept an optionalcontentHashparameteraddFileToFileContextTracker()to store/invalidate content hashesReadFileTool.ts
Tests
isFileUnchangedInContextmethodHow it works
Benefits
Fixes #10653
Important
This PR adds file content hashing to
FileContextTrackerto skip redundant reads inReadFileTool, reducing token usage.FileContextTrackernow tracks file content changes using MD5 hashes to skip redundant reads.ReadFileToolchecks if a file is unchanged before reading, returning a status message if so.computeContentHash(content: string)inFileContextTrackercomputes MD5 hash of content.isFileUnchangedInContext(filePath: string)checks if a file's content is unchanged.trackFileContext()andaddFileToFileContextTracker()updated to handle content hashes.FileContextTrackerinFileContextTracker.spec.ts.readFileTool.spec.tsto mockisFileUnchangedInContextmethod.This description was created by
for ea696ce. You can customize this summary. It will automatically update as commits are pushed.