
Conversation

@majiayu000
Contributor

Summary

This PR addresses two bugs:

Fix #3544: Add revision parameter support

The revision parameter in FastLlamaModel.from_pretrained was defined in the function signature but never passed to the underlying HuggingFace calls. This prevented users from loading specific model revisions/branches.

Changes:

  • Added revision parameter to AutoConfig.from_pretrained
  • Added revision parameter to AutoModelForCausalLM.from_pretrained
  • Added revision parameter to AutoModelForSequenceClassification.from_pretrained
  • Added revision parameter to load_correct_tokenizer and _load_correct_tokenizer functions (see the sketch below)
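
For illustration, a minimal sketch of the threading (call sites and surrounding arguments in llama.py are simplified here; only the revision keyword is new):

from transformers import AutoConfig, AutoModelForCausalLM

# Sketch under assumptions: model_name, token, and revision are the values
# received by FastLlamaModel.from_pretrained.
model_config = AutoConfig.from_pretrained(
    model_name,
    token    = token,
    revision = revision,  # previously accepted by the signature but never forwarded
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config   = model_config,
    token    = token,
    revision = revision,  # pins the weights to the same branch/commit as the config
)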

Fix #3667: Escape single quotes in Vicuna chat template

When using the Vicuna chat template with get_chat_template(), Jinja2 throws a TemplateSyntaxError because the default system message contains apostrophes (e.g., "user's questions") that break the single-quoted string in the template.

Changes:

  • Added single-quote escaping (' → \') in the _change_system_message function before substituting system messages into templates (demonstrated below)
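
For context, a minimal reproduction of the failure and of the fix (the system message is abbreviated; jinja2 is the only dependency):

from jinja2 import Template, TemplateSyntaxError

system_message = "The assistant answers the user's questions."

# The message is substituted into a single-quoted Jinja2 string literal,
# so the apostrophe in "user's" terminates the literal early.
try:
    Template("{{ '" + system_message + "' }}")
except TemplateSyntaxError as error:
    print(error)  # expected token 'end of print statement', got 's'

# Escaping the quote first, as this PR does, lets the template compile.
escaped = system_message.replace("'", "\\'")
print(Template("{{ '" + escaped + "' }}").render())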

Test Plan

  • Verified Python syntax check passes for all modified files
  • Created and ran a test script to verify the Jinja2 template fix:
    • Original (buggy) version fails with "expected token 'end of print statement', got 's'"
    • Fixed version compiles and renders successfully

- Fix unslothai#3544: Add revision parameter to AutoConfig, AutoModelForCausalLM,
  AutoModelForSequenceClassification, and load_correct_tokenizer calls
  in FastLlamaModel.from_pretrained. This enables loading specific model
  revisions/branches from HuggingFace Hub.

- Fix unslothai#3667: Escape single quotes in system messages before substituting
  into Jinja2 templates. This prevents TemplateSyntaxError when system
  messages contain apostrophes (e.g., "user's" in Vicuna templates).

Signed-off-by: majiayu000 <1835304752@qq.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @majiayu000, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves two distinct issues to enhance the robustness and functionality of the system. It ensures that model and tokenizer loading respects the specified revision parameter, providing greater control over versioning. Additionally, it fixes a critical bug in chat template processing by correctly escaping special characters, preventing template parsing errors and improving the reliability of chat message generation.

Highlights

  • Revision Parameter Support: The revision parameter is now correctly passed to underlying HuggingFace calls (AutoConfig, AutoModelForCausalLM, AutoTokenizer) when loading models and tokenizers, enabling users to load specific model revisions or branches.
  • Chat Template Quote Escaping: Single quotes in system messages are now properly escaped (' to \') before being substituted into Jinja2 chat templates, resolving TemplateSyntaxError issues, particularly with the Vicuna template.


@gemini-code-assist bot left a comment


Code Review

This PR fixes two bugs: adding revision parameter support and escaping single quotes in chat templates. The changes for the revision parameter look correct and are well-implemented. For the chat template fix, while the escaping logic is correct, it introduces code duplication and there's a pre-existing logic bug that makes some of the new code unreachable. I've provided a suggestion to refactor the function to address these issues for better maintainability and correctness.

Comment on lines 2108 to 2124

        if has_placeholder:
            if system_message is None:
                raise ValueError("Unsloth: You need to provide a system message for custom templates.")
-           new_template = re.sub(system_message_pattern, system_message, template)
+           # Escape single quotes to prevent Jinja2 template syntax errors
+           escaped_message = system_message.replace("'", "\\'")
+           new_template = re.sub(system_message_pattern, escaped_message, template)
            return new_template, system_message

        return template, system_message

    # For predefined templates with default system message
    message_to_use = system_message if system_message is not None else default_system_message
-   new_template = re.sub(system_message_pattern, message_to_use, template)
+   # Escape single quotes to prevent Jinja2 template syntax errors
+   escaped_message = message_to_use.replace("'", "\\'")
+   new_template = re.sub(system_message_pattern, escaped_message, template)

    return new_template, message_to_use

Severity: high

This function has some duplicated logic for escaping and substituting the system message. More importantly, there seems to be a logic bug making some code unreachable.

Dead Code Issue
When type_chat_template is None (for custom templates), default_system_message on line 2094 becomes None. This causes the function to enter the if default_system_message is None: block and return early on line 2102. Consequently, the logic for custom templates starting on line 2105 is never executed.

Duplication Issue
The logic for escaping single quotes and substituting the message is repeated for custom templates (lines 2112-2113) and predefined templates (lines 2121-2122).

To fix both issues and improve clarity, I suggest refactoring the entire _change_system_message function. Since I can't suggest a change for the whole function, I'll provide the suggested implementation here for you to apply:

def _change_system_message(template: str, type_chat_template: str, system_message: str = None):
    system_message_pattern = r"\{system_message\}"

    # For custom templates
    if type_chat_template is None:
        if not re.search(system_message_pattern, template):
            return template, system_message
        
        if system_message is None:
            raise ValueError("Unsloth: You need to provide a system message for custom templates.")
        
        message_to_process = system_message
        message_to_return = system_message

    # For predefined templates
    else:
        default_system_message = DEFAULT_SYSTEM_MESSAGE.get(f"{type_chat_template}", None)
        if default_system_message is None:
            if system_message is not None:
                logger.warning_once(
                    f"Unsloth: You tried to change the system message for {type_chat_template}, "
                    "but it doesn't have a default system message. "
                    "You need to manually add the system message in your data."
                )
            return template, system_message
        
        message_to_process = system_message if system_message is not None else default_system_message
        message_to_return = message_to_process

    # Common logic for substitution
    # Escape single quotes to prevent Jinja2 template syntax errors
    escaped_message = message_to_process.replace("'", "\\'")
    new_template = re.sub(system_message_pattern, escaped_message, template)

    return new_template, message_to_return

This refactoring first separates the logic for custom and predefined templates, then uses a common block for the substitution, making the code cleaner and fixing the dead code issue.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 2223 to +2226

        model_name,
        token = token,
        attn_implementation = "sdpa",
+       revision = revision,


P2: Thread revision into vLLM fast inference

When revision is set and fast_inference=True, AutoConfig.from_pretrained now loads that specific revision, but the fast‑inference path later calls load_vllm without any revision parameter. That means the config can come from revision X while vLLM still fetches the default (latest) weights, which can yield a config/weights mismatch or silently ignore the user’s requested revision. This mismatch is introduced by adding revision here without propagating it to the vLLM loader.
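
A hypothetical sketch of the suggested propagation (load_vllm_kwargs is named after the follow-up commit below; the other fields are assumed):

# Sketch under assumptions: keep config and weights on the same revision
# when fast_inference = True by forwarding the user's revision to vLLM.
load_vllm_kwargs = dict(
    model_name = model_name,
    config     = model_config,  # already loaded via AutoConfig(..., revision = revision)
    revision   = revision,      # without this, vLLM fetches the default (latest) weights
)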


@Datta0
Collaborator

Datta0 commented Dec 29, 2025

Hey @majiayu000
Thanks for the changes. I do see the need for it.
If you can spend some more time and add similar changes to other files like loader.py, that would be of great help

- Add revision to load_vllm_kwargs in llama.py to fix config/weights mismatch
- Add revision to PEFT AutoConfig calls in loader.py (FastLanguageModel & FastModel)

Addresses reviewer feedback from @chatgpt-codex-connector and @Datta0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@majiayu000
Contributor Author

Hi @Datta0, thanks for the review!

I've added the revision parameter to loader.py as requested. The changes include:

  • FastLanguageModel.from_pretrained: Added revision to PEFT's AutoConfig.from_pretrained call
  • FastModel.from_pretrained: Added revision to PEFT's AutoConfig.from_pretrained call

I also addressed the feedback from @chatgpt-codex-connector by propagating revision to load_vllm_kwargs in llama.py to prevent config/weights version mismatch when using fast_inference=True.
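
For reference, a sketch of the loader.py side of the change (the exact surrounding code is assumed; only the revision keyword is new):

from transformers import AutoConfig

# When the checkpoint is a PEFT adapter, the base-model config lookup
# should honor the same pinned revision as the weights.
peft_config = AutoConfig.from_pretrained(
    model_name,
    token    = token,
    revision = revision,  # newly threaded through in this PR
)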

Regarding Gemini's suggestion to refactor _change_system_message: the dead code issue is a pre-existing bug in the original function, not introduced by this PR. I'd prefer to keep this PR focused on the original fixes (#3544 and #3667). The refactoring could be addressed in a separate issue/PR if needed.

Please let me know if there's anything else you'd like me to change!


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +2111 to +2113

+           # Escape single quotes to prevent Jinja2 template syntax errors
+           escaped_message = system_message.replace("'", "\\'")
+           new_template = re.sub(system_message_pattern, escaped_message, template)


P2: Avoid escaping apostrophes in raw custom templates

In the custom-template branch, {system_message} is replaced directly in the raw template string. Unconditionally escaping ' here means any custom template that inserts {system_message} as literal text (e.g., SYSTEM: {system_message} outside a quoted Jinja2 string) will now render backslashes in the prompt (SYSTEM: user\'s ...). That changes the prompt content rather than just preventing Jinja2 parse errors. Consider escaping only when the placeholder is inside a quoted literal, or deferring escaping to render-time.
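
One possible shape of that suggestion (a sketch, not this PR's implementation; the quoting heuristic is an assumption):

import re

def substitute_system_message(template: str, message: str) -> str:
    # Escape only when {system_message} sits inside a single-quoted Jinja2
    # string literal; raw-text placeholders keep their apostrophes intact.
    inside_quotes = re.search(r"'[^']*\{system_message\}[^']*'", template)
    replacement = message.replace("'", "\\'") if inside_quotes else message
    return re.sub(r"\{system_message\}", replacement, template)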


@Datta0 self-requested a review on December 29, 2025 at 17:41
@Datta0
Collaborator

Datta0 commented Dec 30, 2025

Hey @majiayu000, thanks for the changes, but I think there are still a few places where we might benefit from adding the same parameter. Like this, for example.

If you can do a thorough check on from_pretrained calls and make the same change, that would be great :)

Propagate revision parameter to all from_pretrained calls in vision.py
to ensure consistent version pinning for vision models.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@majiayu000
Contributor Author

Thanks for the review! Added the revision parameter to FastBaseModel in vision.py (four places: the function signature and three from_pretrained calls).

@Datta0
Collaborator

Datta0 commented Dec 30, 2025

Yeah, I think this looks fine now. If you would like to continue on this, you can explore adding the same to the load_vllm function here and its calls in llama.py and loader.py (I assume) to complete it and make this work for fast_inference as well. But that can be done as a separate PR too, if you prefer it that way.
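
If picked up, the follow-up might look roughly like this (load_vllm's real signature is assumed; vllm.LLM itself accepts a revision argument):

from vllm import LLM

# Hypothetical sketch: accept the pinned revision and forward it to vLLM.
def load_vllm(model_name: str, revision: str = None, **vllm_kwargs):
    return LLM(model = model_name, revision = revision, **vllm_kwargs)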

Collaborator

@Datta0 left a comment


LGTM
As a follow-up, maybe we can think about passing the revision parameter on to load_vllm to ensure full end-to-end integration.

@majiayu000
Contributor Author

Hi @Datta0, I gave it a try as you suggested. The code is on another computer of mine; I will push it later.



Development

Successfully merging this pull request may close these issues.

  • [Bug] Vicuna chat template
  • [Issue] FastLlamaModel Class doesn't use revision argument at all
