Skip to content

Conversation

@jneen
Copy link
Member

@jneen jneen commented Feb 3, 2026

Fixes #2193

Rationale

Currently Rouge spews out warnings on account of an overly-lazy approach to loading keyword files using self-redefining methods. #1962 attempted to fix this by using static require_relative, but this results in a much larger memory footprint when using Rouge without all lexers fully loaded, which is a common use case. Users shouldn't have to bring in the entire PHP and httpd keyword sets to lex a small amount of Ruby, for example. I find it important not to undo @ashmaroli's past work to reduce our memory footprint in this use-case.

While I was in this code, it turned out the keyword generators' output did not pass linting on account of lacking newlines at the ends of files. I took the opportunity to refactor them to use a shared subclass, which handles this correctly.

Approach

Lexers can now use the lazy { ... } helper method like so:

lazy do
  require_relative "my_cool/keywords.rb"
end

This block will be evaluated once the first time the lexer class is initialized. This is distinct from autoload behaviour, because the load is performed on lexer initialization rather than on constant lookup, with the assumption that initializing a lexer indicates an intention to use all of its functionality.

The exceptions are PHP and Lua (and their subclasses and dependents such as Hack and Moonscript), with builtin lists so massive that the lexers have a runtime option to not load them - IIRC these options were inherited from Pygments for similar reasons. For these, the auto: false option is provided, which will be inherited by subclasses. In this case, a manual self.class.eager_load! must be called in the initializer before any lazy_load constants are used.

Any lexer can be manually eager-loaded by calling, e.g. Rouge::Lexers::PHP.eager_load!. This will be a no-op if lazy_load was never used.

For applications such as Gitlab that definitely want all lexers with their required keywords loaded, and want to avoid request-time file loading, Rouge.eager_load! or require "rouge/eager" will eagerly load all lexers and their keyword files.

Going Forward

For 5.0, I think we can do a little bit better than this, both from a usability and performance standpoint, if we separate lexers into separate metadata and lexing implementation components. Some care is needed on this for subclassed and embedded lexers, but the idea is that we can statically load the metadata portions of all lexers, enabling guessing and lexer lookup, and lazily load the entire regex stack along with the keyword data on lexer initialization.

Changes

  • Removes all uses of self-redefining methods which were causing warning spam
  • Rewrites all generated keyword lists to use constant assignment in place of method definition
  • Eagerly load a lexer's keywords at lexer initialization instead of waiting for lexing to start
  • Eagerly load all keywords in Rouge::Lexer.load_lexers, for applications that need all lexers eagerly loaded, and cannot load at runtime (e.g. Gitlab)
  • Rewrites the keyword file generation tasks to share more logic and pass linting by default
  • Run lexers_spec in a random order to flush out hidden load-order bugs.
  • Removing the top-level Rouge constant and reloading will no longer work.
  • Uses guard-puma to run the dev server. Can be run with bundle exec guard. The dev preview server now uses a custom reloader which filters $LOADED_FEATURES before reloading. This can be replaced with a Ruby::Box when that API is stable, but should be more than enough for now.
  • Removes load_lexer in favour of require_relative

TODO (in later work):

  • Rewrite the Apache httpd keywords generator, as the source we were relying on seems to have disappeared from the internet. Compare before and after
  • Replace load_lexer with plain require_relative and refactor the dev code reloader, potentially using Ruby::Box.
  • The ISBL lexer (Add ISBL lexer #891) appears to have a hand-written builtins file. Let's see if we can generate this from documentation like the others.

@jneen jneen force-pushed the maint.initial-cleanup branch from 6525125 to 1bda81a Compare February 3, 2026 05:12
@jneen jneen changed the title DRAFT: Silence warnings while maintaining lazy-loading behaviour. DRAFT: Fix warnings while maintaining lazy-loading behaviour. Feb 3, 2026
@jneen jneen changed the title DRAFT: Fix warnings while maintaining lazy-loading behaviour. DRAFT: Fix warnings while maintaining lazy-loading behaviour, and re-generate keyword sets. Feb 3, 2026
@jneen jneen added this to the 5.0 Modernization milestone Feb 3, 2026
@jneen jneen requested a review from tancnle February 3, 2026 18:00
@jneen jneen changed the title DRAFT: Fix warnings while maintaining lazy-loading behaviour, and re-generate keyword sets. Fix warnings while maintaining lazy-loading behaviour, and re-generate keyword sets. Feb 3, 2026
@jneen jneen force-pushed the maint.initial-cleanup branch from be86638 to d208e25 Compare February 3, 2026 18:07
@jneen jneen marked this pull request as ready for review February 3, 2026 19:57
@jneen jneen requested a review from a team February 4, 2026 17:29
@jneen
Copy link
Member Author

jneen commented Feb 5, 2026

@tancnle what do you think of this approach? I think we have users who are very keen on clearing out the warning spam.

@tancnle
Copy link
Collaborator

tancnle commented Feb 6, 2026

@jneen The change LGTM overall. Great improvement 👍🏼 However, I encountered the following the error when running the visual app.

2026-02-06 14:23:34 - NameError - uninitialized constant Rouge::Lexers::Apache::SECTIONS:
	
	        if SECTIONS.include? token
	           ^^^^^^^^
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexers/apache.rb:20:in 'Rouge::Lexers::Apache#name_for_token'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexers/apache.rb:45:in 'block (2 levels) in <class:Apache>'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:377:in 'BasicObject#instance_exec'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:377:in 'block in Rouge::RegexLexer#step'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:359:in 'Array#each'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:359:in 'Rouge::RegexLexer#step'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:340:in 'Rouge::RegexLexer#stream_tokens'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexer.rb:513:in 'Rouge::Lexer#continue_lex'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexer.rb:503:in 'Rouge::Lexer#lex'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatter.rb:58:in 'Enumerator#each'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatter.rb:58:in 'Rouge::Formatter#filter_escapes'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatters/html.rb:20:in 'Enumerator#each'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatters/html.rb:20:in 'Rouge::Formatters::HTML#stream'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatters/html_pygments.rb:13:in 'Rouge::Formatters::HTMLPygments#stream'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatter.rb:74:in 'Rouge::Formatter#format'
	/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge.rb:36:in 'Rouge.highlight'
	spec/visual/templates/index.erb:29:in 'block in Tilt::CompiledTemplates#__tilt_2672'
	spec/visual/templates/index.erb:24:in 'Array#each'
	spec/visual/templates/index.erb:24:in 'Tilt::CompiledTemplates#__tilt_2672'
	/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:394:in 'UnboundMethod#bind_call'
	/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:394:in 'Tilt::Template#evaluate_method'
	/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:266:in 'Tilt::Template#evaluate'
	/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:134:in 'Tilt::Template#render'
	/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:874:in 'Sinatra::Templates#render'
	/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:756:in 'Sinatra::Templates#erb'
	spec/visual/app.rb:100:in 'block in <class:VisualTestApp>'

Could you have another look?

@jneen
Copy link
Member Author

jneen commented Feb 6, 2026

Hm... How are you running the server? bundle exec puma seems to work fine for me on c3d2cac.

@jneen
Copy link
Member Author

jneen commented Feb 6, 2026

Ope, nevermind, it happens on refresh. I think I might bundle my next set of changes into here, using require_relative. Was hoping there'd be an intermediate step, but I think ripping off the band-aid is maybe better here.

@jneen
Copy link
Member Author

jneen commented Feb 6, 2026

Okay @tancnle should be fixed now, though the changes are a bit further-reaching. load_lexer is gone, and running the dev server uses bundle exec guard to support reloading.

@jneen
Copy link
Member Author

jneen commented Feb 6, 2026

Memory checks:

[1: jneen@lavender rouge ] -> maint.initial-cleanup $ 
; b rake check:memory

Total allocated: 10.88 MB (94895 objects)
Total retained:  7.43 MB (45764 objects)

Detailed report saved to file: /Users/jneen/src/rouge/rouge-memory.tmp

[0: jneen@lavender rouge ] -> maint.initial-cleanup $ 
; b rake check:memory[eager]

Total allocated: 13.92 MB (120524 objects)
Total retained:  10.38 MB (70782 objects)

Detailed report saved to file: /Users/jneen/src/rouge/rouge-memory.tmp

@jneen
Copy link
Member Author

jneen commented Feb 6, 2026

vs master:

[0: jneen@lavender rouge ] -> master $ 
; b rake check:memory

Total allocated: 14.09 MB (138901 objects)
Total retained:  8.17 MB (46870 objects)

Detailed report saved to file: /Users/jneen/src/rouge/rouge-memory.tmp

@jneen jneen force-pushed the maint.initial-cleanup branch from d8ac96e to f597012 Compare February 8, 2026 16:59
@jneen jneen force-pushed the maint.initial-cleanup branch from f597012 to f3508a7 Compare February 8, 2026 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Eliminate use of self-redefining methods.

2 participants