Description
When using ModelBuilder with ModelServer.DJL_SERVING, the SDK unconditionally overwrites HF_MODEL_ID with the model parameter value. This prevents users from loading models from S3, even when explicitly setting HF_MODEL_ID in env_vars.
Expected Behavior
Users should be able to specify an S3 URI for model loading via env_vars:
```python
env_vars={"HF_MODEL_ID": "s3://bucket/model/"}
```

Per DJL documentation, option.model_id accepts both HuggingFace Hub IDs and S3 URIs.
Actual Behavior
The SDK ignores user-provided values and always sets HF_MODEL_ID to the HuggingFace model ID from the model parameter.
Root Cause
In sagemaker-serve/src/sagemaker/serve/model_builder_servers.py, multiple build methods use:
```python
self.env_vars.update({"HF_MODEL_ID": self.model})
```

This unconditionally overwrites any user-provided HF_MODEL_ID. Affected lines: 139, 215, 323, 429, 535.
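The mechanism can be demonstrated with plain dicts (a minimal sketch of the overwrite behavior, not the SDK code itself):

```python
# Minimal sketch: dict.update() replaces an existing key unconditionally.
env_vars = {"HF_MODEL_ID": "s3://my-bucket/models/Qwen/"}  # user-provided value
model = "Qwen/Qwen3-VL-4B-Instruct"

# What the build methods do today:
env_vars.update({"HF_MODEL_ID": model})
print(env_vars["HF_MODEL_ID"])  # Qwen/Qwen3-VL-4B-Instruct -- the S3 URI is lost
```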
Suggested Fix
Use setdefault() to preserve user-provided values:
```python
self.env_vars.setdefault("HF_MODEL_ID", self.model)
```

Reproduction

```python
from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.core.image_uris import retrieve as get_image_uri

S3_PATH = 's3://my-bucket/models/Qwen/'

mb = ModelBuilder(
    model='Qwen/Qwen3-VL-4B-Instruct',
    image_uri=get_image_uri(framework='djl-lmi', version='0.30.0', region='us-west-2'),
    model_server=ModelServer.DJL_SERVING,
    schema_builder=SchemaBuilder({'inputs': 'Hello', 'parameters': {}}, [{'generated_text': 'Hi'}]),
    role_arn='arn:aws:iam::123456789012:role/SageMakerRole',
    env_vars={'HF_MODEL_ID': S3_PATH},
    instance_type='ml.g5.2xlarge',
)

print('Before build:', mb.env_vars.get('HF_MODEL_ID'))  # s3://my-bucket/models/Qwen/
mb.build()
print('After build:', mb.env_vars.get('HF_MODEL_ID'))   # Qwen/Qwen3-VL-4B-Instruct (WRONG)
```

Use Case
Loading models from S3 is essential for:
- Pre-synced models to avoid runtime download latency
- Models with architectures not supported by TGI (e.g., qwen3_vl) that require the vLLM backend
Environment
- SageMaker Python SDK: 3.4.0 (also affects 3.3.x)
- Python: 3.12
- DJL LMI container: 0.30.0
I'm happy to submit a PR with the fix.
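For reference, a quick standalone check of the setdefault() semantics the proposed fix relies on (plain dicts, independent of the SDK; resolve_model_id is a hypothetical helper, not SDK API):

```python
# setdefault() assigns only when the key is absent, so a user-provided
# HF_MODEL_ID survives while the default still applies otherwise.
def resolve_model_id(env_vars, model):  # hypothetical helper, not SDK API
    env_vars.setdefault("HF_MODEL_ID", model)
    return env_vars["HF_MODEL_ID"]

# User set an S3 URI: it is preserved.
print(resolve_model_id({"HF_MODEL_ID": "s3://my-bucket/models/Qwen/"},
                       "Qwen/Qwen3-VL-4B-Instruct"))  # s3://my-bucket/models/Qwen/

# No user value: falls back to the model parameter.
print(resolve_model_id({}, "Qwen/Qwen3-VL-4B-Instruct"))  # Qwen/Qwen3-VL-4B-Instruct
```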