-
Notifications
You must be signed in to change notification settings - Fork 795
Fix warnings while maintaining lazy-loading behaviour, and re-generate keyword sets. #2202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
6525125 to
1bda81a
Compare
be86638 to
d208e25
Compare
d208e25 to
af765a1
Compare
|
@tancnle what do you think of this approach? I think we have users who are very keen on clearing out the warning spam. |
|
@jneen The change LGTM overall. Great improvement 👍🏼 However, I encountered the following the error when running the visual app. 2026-02-06 14:23:34 - NameError - uninitialized constant Rouge::Lexers::Apache::SECTIONS:
if SECTIONS.include? token
^^^^^^^^
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexers/apache.rb:20:in 'Rouge::Lexers::Apache#name_for_token'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexers/apache.rb:45:in 'block (2 levels) in <class:Apache>'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:377:in 'BasicObject#instance_exec'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:377:in 'block in Rouge::RegexLexer#step'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:359:in 'Array#each'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:359:in 'Rouge::RegexLexer#step'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/regex_lexer.rb:340:in 'Rouge::RegexLexer#stream_tokens'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexer.rb:513:in 'Rouge::Lexer#continue_lex'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/lexer.rb:503:in 'Rouge::Lexer#lex'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatter.rb:58:in 'Enumerator#each'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatter.rb:58:in 'Rouge::Formatter#filter_escapes'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatters/html.rb:20:in 'Enumerator#each'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatters/html.rb:20:in 'Rouge::Formatters::HTML#stream'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatters/html_pygments.rb:13:in 'Rouge::Formatters::HTMLPygments#stream'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge/formatter.rb:74:in 'Rouge::Formatter#format'
/Users/tle/code/open-source/rouge-ruby/rouge/lib/rouge.rb:36:in 'Rouge.highlight'
spec/visual/templates/index.erb:29:in 'block in Tilt::CompiledTemplates#__tilt_2672'
spec/visual/templates/index.erb:24:in 'Array#each'
spec/visual/templates/index.erb:24:in 'Tilt::CompiledTemplates#__tilt_2672'
/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:394:in 'UnboundMethod#bind_call'
/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:394:in 'Tilt::Template#evaluate_method'
/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:266:in 'Tilt::Template#evaluate'
/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/tilt-2.7.0/lib/tilt/template.rb:134:in 'Tilt::Template#render'
/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:874:in 'Sinatra::Templates#render'
/Users/tle/.local/share/mise/installs/ruby/3.4.8/lib/ruby/gems/3.4.0/gems/sinatra-4.2.1/lib/sinatra/base.rb:756:in 'Sinatra::Templates#erb'
spec/visual/app.rb:100:in 'block in <class:VisualTestApp>'Could you have another look? |
|
Hm... How are you running the server? |
|
Ope, nevermind, it happens on refresh. I think I might bundle my next set of changes into here, using require_relative. Was hoping there'd be an intermediate step, but I think ripping off the band-aid is maybe better here. |
|
Okay @tancnle should be fixed now, though the changes are a bit further-reaching. |
|
Memory checks: [1: jneen@lavender rouge ] -> maint.initial-cleanup $
; b rake check:memory
Total allocated: 10.88 MB (94895 objects)
Total retained: 7.43 MB (45764 objects)
Detailed report saved to file: /Users/jneen/src/rouge/rouge-memory.tmp
[0: jneen@lavender rouge ] -> maint.initial-cleanup $
; b rake check:memory[eager]
Total allocated: 13.92 MB (120524 objects)
Total retained: 10.38 MB (70782 objects)
Detailed report saved to file: /Users/jneen/src/rouge/rouge-memory.tmp |
|
vs master: [0: jneen@lavender rouge ] -> master $
; b rake check:memory
Total allocated: 14.09 MB (138901 objects)
Total retained: 8.17 MB (46870 objects)
Detailed report saved to file: /Users/jneen/src/rouge/rouge-memory.tmp |
d8ac96e to
f597012
Compare
f597012 to
f3508a7
Compare
Fixes #2193
Rationale
Currently Rouge spews out warnings on account of an overly-lazy approach to loading keyword files using self-redefining methods. #1962 attempted to fix this by using static
require_relative, but this results in a much larger memory footprint when using Rouge without all lexers fully loaded, which is a common use case. Users shouldn't have to bring in the entire PHP and httpd keyword sets to lex a small amount of Ruby, for example. I find it important not to undo @ashmaroli's past work to reduce our memory footprint in this use-case.While I was in this code, it turned out the keyword generators' output did not pass linting on account of lacking newlines at the ends of files. I took the opportunity to refactor them to use a shared subclass, which handles this correctly.
Approach
Lexers can now use the
lazy { ... }helper method like so:This block will be evaluated once the first time the lexer class is initialized. This is distinct from
autoloadbehaviour, because the load is performed on lexer initialization rather than on constant lookup, with the assumption that initializing a lexer indicates an intention to use all of its functionality.The exceptions are PHP and Lua (and their subclasses and dependents such as Hack and Moonscript), with builtin lists so massive that the lexers have a runtime option to not load them - IIRC these options were inherited from Pygments for similar reasons. For these, the
auto: falseoption is provided, which will be inherited by subclasses. In this case, a manualself.class.eager_load!must be called in the initializer before anylazy_loadconstants are used.Any lexer can be manually eager-loaded by calling, e.g.
Rouge::Lexers::PHP.eager_load!. This will be a no-op iflazy_loadwas never used.For applications such as Gitlab that definitely want all lexers with their required keywords loaded, and want to avoid request-time file loading,
Rouge.eager_load!orrequire "rouge/eager"will eagerly load all lexers and their keyword files.Going Forward
For 5.0, I think we can do a little bit better than this, both from a usability and performance standpoint, if we separate lexers into separate metadata and lexing implementation components. Some care is needed on this for subclassed and embedded lexers, but the idea is that we can statically load the metadata portions of all lexers, enabling guessing and lexer lookup, and lazily load the entire regex stack along with the keyword data on lexer initialization.
Changes
Rouge::Lexer.load_lexers, for applications that need all lexers eagerly loaded, and cannot load at runtime (e.g. Gitlab)lexers_specin a random order to flush out hidden load-order bugs.UsesThe dev preview server now uses a custom reloader which filtersguard-pumato run the dev server. Can be run withbundle exec guard.$LOADED_FEATURESbefore reloading. This can be replaced with a Ruby::Box when that API is stable, but should be more than enough for now.load_lexerin favour ofrequire_relativeTODO (in later work):
load_lexerwith plainrequire_relativeand refactor the dev code reloader, potentially usingRuby::Box.