CodeFrame - Multi-Language Code Parser

A Tree-sitter-based code parser that extracts structural information from source files across multiple programming languages.

Supported Languages

Java (.java)
JavaScript (.js)
TypeScript (.ts)
Python (.py)
C# (.cs)
PHP (.php)
Ruby (.rb)
Rust (.rs)
SQL (.sql)
COBOL (.cbl, .cob, .cpy)
Markdown (.md, .markdown, .mkd, .mkdn, .mdwn, .mdown)

Features

For each supported language, CodeFrame extracts:

Type Information
- Class/Interface declarations
- Base classes (extends)
- Implemented interfaces

Method Information
- Method/Function names
- Parameters
- Local variables
- Method calls with object context

File-Level Elements
- Module/file-level constants and variables
- Top-level function calls (outside any class/function)

Usage

Build the project

./gradlew build

Run analysis (two arguments required)

CodeFrame requires two arguments: <input-path> and <output-file>.

# Gradle
./gradlew run --args="<input-path> <output-file>"

# Direct JAR
java -jar codeframe.jar <input-path> <output-file>

Examples:

# Analyze a single file, write to codeframe-out/analysis.jsonl
./gradlew run --args="src/main/java/org/example/MyClass.java codeframe-out/analysis.jsonl"

# Analyze an entire directory
./gradlew run --args="src/main/java codeframe-out/analysis.jsonl"

# Analyze the entire project
./gradlew run --args=". codeframe-out/analysis.jsonl"

# Run directly via java
java -jar codeframe.jar src codeframe-out/analysis.jsonl

Docker

# Build
docker build -t codeframe-dev .

# Run (mount your code at /src)
docker run --rm -it -v "$PWD:/workspace" -v "/path/to/code:/src:ro" -w /workspace codeframe-dev

# Inside container
./gradlew run --args="/src /workspace/.out/analysis.jsonl"

Output

The analysis results are written to the path you pass as the second argument (e.g., /workspace/.out/analysis.jsonl) in JSONL format (JSON Lines - one JSON object per line). Parent directories for the output file are created automatically, and .out/ is gitignored by default.

Ignore patterns (.ignore)

Location: the .ignore file in the project root (included in releases).

Default contents:

**node_modules**
**.git**
**.Designer.cs**
**.Designer.vb**

Syntax:
- Blank lines and lines starting with # are ignored.
- Globs supported: * (within a segment), ** (across segments).
- Paths are matched against normalized project paths relative to the input root.

Examples:
- **node_modules** → ignore anything under any node_modules folder.
- **.Designer.cs → ignore files ending with .Designer.cs anywhere.
- src/generated/** → ignore everything under src/generated/.

How it works:

CodeFrame loads .ignore at startup using dx-ignore and filters files before analysis.
If .ignore is missing, no files are excluded by ignore rules.
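
The glob rules above can be sketched as a small matcher. This is only an illustration of the documented semantics (`*` stays within a path segment, `**` crosses segments, blank lines and `#` comments are skipped); it is not dx-ignore's actual implementation. The helper names `glob_to_regex`, `load_patterns`, and `is_ignored` are hypothetical, and the sketch assumes that a pattern must match the whole normalized relative path.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex: '**' crosses '/' boundaries, '*' does not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")        # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # any characters within one segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def load_patterns(text):
    """Skip blank lines and '#' comments, compile every remaining line."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(glob_to_regex(line))
    return patterns


def is_ignored(relative_path, patterns):
    """Assumption: a pattern must match the whole normalized relative path."""
    normalized = relative_path.replace("\\", "/")
    return any(p.fullmatch(normalized) for p in patterns)
```

Under these assumptions, `is_ignored("src/generated/a/B.java", load_patterns("src/generated/**"))` is true, while a pattern such as `src/*.java` does not reach into `src/a/B.java` because `*` stops at the segment boundary.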

Configuration (codeframe-config.yml)

CodeFrame supports optional configuration via a codeframe-config.yml file in the project root.

Available options:

| Option | Type | Default | Description |
|---|---|---|---|
| `maxFileLines` | integer | 20000 | Maximum number of lines a file can have. Files exceeding this limit are skipped during analysis. |
| `hideSqlTableColumns` | boolean | false | When true, SQL analysis output omits table column definitions for CREATE/ALTER TABLE operations. |

Analyzer Configuration:

You can selectively enable/disable analyzers using the analyzers map. All analyzers are enabled by default.

Available analyzer keys: java, javascript, typescript, python, csharp, php, sql, cobol, ruby, rust, markdown

Example configuration:

maxFileLines: 20000
hideSqlTableColumns: false
analyzers:
  java: true
  python: true
  sql: true

Behavior:

If codeframe-config.yml is missing, default values are used.
If the file exists but contains invalid YAML or missing/invalid values, defaults are applied silently.
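
The fallback behavior described above can be sketched as a merge over defaults. This is a rough Python illustration, not CodeFrame's own (Java) loading code. The function name `effective_config` is hypothetical, and the input is assumed to be the already-parsed YAML mapping (or `None` when the file is missing or unparsable). Two further assumptions are made here: `maxFileLines` must be a positive integer to be accepted, and analyzers omitted from the `analyzers` map stay enabled.

```python
DEFAULTS = {"maxFileLines": 20000, "hideSqlTableColumns": False}
ANALYZER_KEYS = ("java", "javascript", "typescript", "python", "csharp",
                 "php", "sql", "cobol", "ruby", "rust", "markdown")


def effective_config(raw):
    """Merge a parsed config mapping over the defaults, silently ignoring invalid values."""
    config = dict(DEFAULTS)
    config["analyzers"] = {key: True for key in ANALYZER_KEYS}
    if not isinstance(raw, dict):
        return config  # missing file or invalid YAML: defaults only

    max_lines = raw.get("maxFileLines")
    # bool is a subclass of int in Python, so exclude it explicitly.
    # Assumption: only positive integers are considered valid.
    if isinstance(max_lines, int) and not isinstance(max_lines, bool) and max_lines > 0:
        config["maxFileLines"] = max_lines

    hide = raw.get("hideSqlTableColumns")
    if isinstance(hide, bool):
        config["hideSqlTableColumns"] = hide

    analyzers = raw.get("analyzers")
    if isinstance(analyzers, dict):
        for key, enabled in analyzers.items():
            # Unknown keys and non-boolean values are dropped without an error.
            if key in config["analyzers"] and isinstance(enabled, bool):
                config["analyzers"][key] = enabled
    return config
```

For example, `effective_config({"maxFileLines": "lots"})` keeps the default of 20000 rather than raising, which mirrors the "applied silently" behavior.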

Output Format

Output is JSONL (one JSON object per line) for memory efficiency and streaming.

Each line has a kind field:

"run" - Start metadata (timestamp, input path, file count)
File analysis objects (one per file)
"error" - Parse errors (if any)
"done" - Completion metadata (duration, counts)

Example outputs: see the approved test outputs for real analysis results.
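
Because every line is an independent JSON object with a kind field, the output can be consumed as a stream. The following Python sketch counts records per kind and collects the "error" records. The function name `summarize` is hypothetical, and the sketch relies only on the `kind` field; no other field names are assumed.

```python
import json
from collections import Counter


def summarize(lines):
    """Count records per 'kind' and collect the 'error' records."""
    kinds = Counter()
    errors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        kind = record.get("kind", "<missing>")
        kinds[kind] += 1
        if kind == "error":
            errors.append(record)
    return kinds, errors
```

An open file handle can be passed directly, for instance `summarize(open("codeframe-out/analysis.jsonl", encoding="utf-8"))`, so the whole file never has to be held in memory.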

SQL Analysis

SQL file analysis uses a hybrid JSqlParser + ANTLR approach to support multiple dialects (PostgreSQL, MySQL, T-SQL, PL/SQL) without configuration.

For complete documentation on SQL support, see SQL_SPEC.md.

COBOL Analysis

COBOL file analysis extracts structural information including:

Program identification and metadata
File control entries and data definitions
Data items and variables
Sections and paragraphs
Copy book statements
Embedded SQL/CICS/IMS detection

For complete documentation on COBOL support, see COBOL_SPEC.md.

Markdown Analysis

Markdown file analysis extracts document structure including:

Preamble and heading hierarchy
Block-level elements (paragraphs, code blocks, tables, lists, block quotes, thematic breaks, HTML blocks, images)
Line spans for extracted elements

For complete documentation on Markdown support, see MARKDOWN_SPEC.md.

Architecture

See ARCHITECTURE.md for details on core components and design decisions.

Contributing

See CONTRIBUTING.md for guidelines on adding new languages and analyzer conventions.

Requirements

Java 17+
Gradle 8.x
No native toolchain required (Tree-sitter natives are bundled via Maven artifacts)

License

This project uses Tree-sitter and its language grammars, which are licensed under MIT.

Limitations

General

Nested functions (e.g., arrow functions inside other functions, decorator wrappers) are not extracted as separate methods. Instead, their calls and local variables are attributed to the parent function, which avoids duplicate function names and keeps semantically related code grouped together.
Parameter modifiers (e.g., final, ref, out) are not captured
Constructor calls (new ...) are not captured in methodCalls
Loop header variables are not added to localVariables
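
To illustrate the nested-function limitation above, here is a small Python input of the kind it describes (a decorator wrapper). The comments restate the rule as documented; they are not taken from actual CodeFrame output, and the names `logged`, `wrapper`, and `add` are made up for this example.

```python
import functools


def logged(func):                       # extracted as a function named "logged"
    prefix = "calling"                  # local variable of logged

    @functools.wraps(func)
    def wrapper(*args, **kwargs):       # nested: not reported as a separate method
        label = f"{prefix} {func.__name__}"   # per the rule above, attributed to logged
        print(label)                    # this call is attributed to logged as well
        return func(*args, **kwargs)

    return wrapper


@logged
def add(a, b):                          # extracted as its own top-level function
    return a + b
```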

Language-Specific

JavaScript/TypeScript: Destructured parameters emit leaf names only; dynamic imports are ignored; constructor functions appear as methods; object literal methods (e.g., const obj = { method() {} }) are not extracted, and only the containing variable is captured as a field; functions inside IIFEs are extracted as top-level methods because their enclosing scope is not tracked.
C#: Events not handled; see test samples for details
Java: Local/anonymous classes not extracted as separate types
Python: Type aliases using TypeAlias annotation are captured with kind: "type_alias". PEP 695 style (type X = ...) is not yet supported by the tree-sitter-python grammar
SQL: See SQL_SPEC.md
COBOL: See COBOL_SPEC.md
Markdown: Front matter is ignored in output; links and inline formatting are not emitted as dedicated elements. See MARKDOWN_SPEC.md

Testing

./gradlew test

See CONTRIBUTING.md for testing workflow and conventions.
