bug: ModelBuilder overwrites user-provided HF_MODEL_ID for DJL Serving, preventing S3 model loading #5529

@michaelraskansky

Description

When using ModelBuilder with ModelServer.DJL_SERVING, the SDK unconditionally overwrites the HF_MODEL_ID environment variable with the value of the model parameter. This makes it impossible to load a model from S3, even when HF_MODEL_ID is explicitly set via env_vars.

Expected Behavior

Users should be able to specify an S3 URI for model loading via env_vars:

env_vars={"HF_MODEL_ID": "s3://bucket/model/"}

Per DJL documentation, option.model_id accepts both HuggingFace Hub IDs and S3 URIs.

Actual Behavior

The SDK ignores user-provided values and always sets HF_MODEL_ID to the HuggingFace model ID from the model parameter.

Root Cause

In sagemaker-serve/src/sagemaker/serve/model_builder_servers.py, multiple build methods use:

self.env_vars.update({"HF_MODEL_ID": self.model})

This unconditionally overwrites any user-provided HF_MODEL_ID. Affected lines: 139, 215, 323, 429, 535.

Suggested Fix

Use setdefault() to preserve user-provided values:

self.env_vars.setdefault("HF_MODEL_ID", self.model)
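The difference comes down to plain dict semantics: dict.update() replaces an existing key, while dict.setdefault() only fills in a missing one. A minimal sketch (values taken from the reproduction below):

```python
# User-provided environment, as passed via env_vars=...
env_vars = {"HF_MODEL_ID": "s3://my-bucket/models/Qwen/"}

# Current behavior: update() clobbers the user-provided value.
clobbered = dict(env_vars)
clobbered.update({"HF_MODEL_ID": "Qwen/Qwen3-VL-4B-Instruct"})
print(clobbered["HF_MODEL_ID"])  # Qwen/Qwen3-VL-4B-Instruct (user value lost)

# Suggested fix: setdefault() keeps the user-provided value and
# only falls back to the model parameter when the key is absent.
fixed = dict(env_vars)
fixed.setdefault("HF_MODEL_ID", "Qwen/Qwen3-VL-4B-Instruct")
print(fixed["HF_MODEL_ID"])  # s3://my-bucket/models/Qwen/ (user value preserved)
```

With setdefault(), callers who never set HF_MODEL_ID still get the current default behavior, so the change is backward compatible.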

Reproduction

from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.core.image_uris import retrieve as get_image_uri

S3_PATH = 's3://my-bucket/models/Qwen/'

mb = ModelBuilder(
    model='Qwen/Qwen3-VL-4B-Instruct',
    image_uri=get_image_uri(framework='djl-lmi', version='0.30.0', region='us-west-2'),
    model_server=ModelServer.DJL_SERVING,
    schema_builder=SchemaBuilder({'inputs': 'Hello', 'parameters': {}}, [{'generated_text': 'Hi'}]),
    role_arn='arn:aws:iam::123456789012:role/SageMakerRole',
    env_vars={'HF_MODEL_ID': S3_PATH},
    instance_type='ml.g5.2xlarge',
)

print('Before build:', mb.env_vars.get('HF_MODEL_ID'))  # s3://my-bucket/models/Qwen/
mb.build()
print('After build:', mb.env_vars.get('HF_MODEL_ID'))   # Qwen/Qwen3-VL-4B-Instruct (WRONG)

Use Case

Loading models from S3 is essential for:

  • Pre-synced models to avoid runtime download latency
  • Models with architectures not supported by TGI (e.g., qwen3_vl) that require vLLM backend

Environment

  • SageMaker Python SDK: 3.4.0 (also affects 3.3.x)
  • Python: 3.12
  • DJL LMI container: 0.30.0

I'm happy to submit a PR with the fix.
