@rauldiaz commented Feb 3, 2026

This PR adds a feature that lets users of `EMRStep` pass the output data a step generates directly as inputs to subsequent steps. The dependency between the steps is inferred automatically.

In essence, `EMRStepConfig` accepts a new parameter that maps output names to values (e.g., S3 URIs) and appends those values to the `args` list. While extending the list, it records the index at which each output value is stored.
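The bookkeeping described above can be sketched in plain Python. This is a minimal, self-contained illustration, not the PR's code: `EMRStepConfigSketch` is a hypothetical stand-in for `EMRStepConfig`, and the PR text does not name the new parameter, so `outputs` and `output_indices` are assumed names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EMRStepConfigSketch:
    """Hypothetical stand-in for EMRStepConfig; `outputs` is an assumed parameter name."""

    jar: str
    args: List[str] = field(default_factory=list)
    outputs: Dict[str, str] = field(default_factory=dict)
    # Assumed field: maps each output name to the position of its value inside `args`.
    output_indices: Dict[str, int] = field(init=False, default_factory=dict)

    def __post_init__(self) -> None:
        # Copy so the caller's list is not mutated.
        self.args = list(self.args)
        for name, value in self.outputs.items():
            # Record the index the value will occupy, then append it to args.
            self.output_indices[name] = len(self.args)
            self.args.append(value)


config = EMRStepConfigSketch(
    jar="command-runner.jar",
    args=["spark-submit", "job.py"],
    outputs={"output": "s3://bucket/out/"},
)
print(config.args)            # ['spark-submit', 'job.py', 's3://bucket/out/']
print(config.output_indices)  # {'output': 2}
```

Recording the index at append time means the step never has to search `args` for the value later; the position alone is enough to build a reference to it.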

When declaring an `EMRStep`, a new field is then available: `emr_outputs`, a dictionary that maps each output name declared in the config to the precise location of that value within the step description exposed through the step's `properties` field (the `args` entry at the recorded index). Users can pass these references as inputs to other steps, e.g., `ProcessingInput(source=emr_step.emr_outputs["output"], ...)`.
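To show how the recorded indices turn into step references, and how a dependency can be inferred from them, here is a minimal sketch. It is an assumption-based illustration: the real SDK represents these references as `Properties` objects rather than strings, and the `Steps.<name>.Config.Args[<index>]` path format, `build_emr_outputs`, and `infer_dependencies` are all hypothetical.

```python
import re
from typing import Dict, List, Set

# Assumed reference format: "Steps.<step_name>.<rest of path>".
_STEP_REF = re.compile(r"^Steps\.([^.]+)\.")


def build_emr_outputs(step_name: str, output_indices: Dict[str, int]) -> Dict[str, str]:
    """Hypothetical helper: map each output name to a reference into the
    step description's Config.Args list (path format is an assumption)."""
    return {
        name: f"Steps.{step_name}.Config.Args[{index}]"
        for name, index in output_indices.items()
    }


def infer_dependencies(input_sources: List[str]) -> Set[str]:
    """Hypothetical helper: collect the names of the steps whose
    descriptions are referenced by a downstream step's inputs."""
    deps: Set[str] = set()
    for source in input_sources:
        match = _STEP_REF.match(source)
        if match:
            deps.add(match.group(1))
    return deps


emr_outputs = build_emr_outputs("MyEMRStep", {"output": 2})
print(emr_outputs)  # {'output': 'Steps.MyEMRStep.Config.Args[2]'}

# A plain S3 URI is not a step reference, so it adds no dependency.
print(infer_dependencies([emr_outputs["output"], "s3://bucket/static/"]))  # {'MyEMRStep'}
```

Because the downstream input carries a reference to the EMR step rather than a copied URI, the dependency falls out of the input itself and does not need to be declared separately.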
