
@atsareg
Contributor

@atsareg atsareg commented Jan 19, 2026

Jobs that fail during initialization (e.g. while downloading the input sandbox) are rescheduled by the jobwrapper, which then returns a non-zero status. JobAgent checks the job status only after some delay, which can be a few minutes in the case of a Pool/Singularity inner CE. By that time, the job may already have been rescheduled, matched, and started running again at another site. JobAgent sets the job status to Failed whenever it finds the job in the Running status, even if the job is running elsewhere after a successful rescheduling. In that case the Failed status is wrong and should not be set.
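
To make the race concrete, here is a minimal sketch of the kind of guard described above. It is not the code of this PR and does not use DIRAC APIs: `JobSnapshot`, `should_mark_failed`, the reschedule counter, and the site comparison are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSnapshot:
    """Hypothetical view of a job record (an assumption, not a DIRAC class)."""
    status: str            # e.g. "Running", "Rescheduled", "Failed"
    site: str              # site the job is currently assigned to
    reschedule_count: int  # assumed counter, incremented on each reschedule


def should_mark_failed(at_submission: JobSnapshot, now: JobSnapshot, wrapper_exit_code: int) -> bool:
    """Decide whether an agent that saw a non-zero wrapper exit may set Failed."""
    if wrapper_exit_code == 0:
        return False  # the wrapper succeeded, nothing to fail
    if now.status != "Running":
        return False  # only a job still seen as Running is a candidate
    # If the job was rescheduled after this agent picked it up, the Running
    # status belongs to a new attempt and must be left alone.
    rescheduled_since = now.reschedule_count > at_submission.reschedule_count
    moved_elsewhere = now.site != at_submission.site
    return not (rescheduled_since or moved_elsewhere)


# Usage: the same job as seen when the agent submitted it and when it checks later.
submitted = JobSnapshot(status="Running", site="SiteA", reschedule_count=0)
still_here = JobSnapshot(status="Running", site="SiteA", reschedule_count=0)
running_elsewhere = JobSnapshot(status="Running", site="SiteB", reschedule_count=1)

print(should_mark_failed(submitted, still_here, wrapper_exit_code=1))
print(should_mark_failed(submitted, running_elsewhere, wrapper_exit_code=1))
```

The point of the sketch is only that the delayed check needs some evidence that the Running status still refers to the attempt this agent started; which evidence the real fix uses is not stated in this description.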

BEGINRELEASENOTES

*WorkloadManagement
FIX: JobAgent - do not fail already rescheduled job

ENDRELEASENOTES

@atsareg atsareg added the alsoTargeting:integration label (Cherry pick this PR to integration after merge) Jan 19, 2026
@atsareg atsareg changed the title from "[8.0] JobAgent - do not reschedule already rescheduled job" to "[8.0] JobAgent - do not fail already rescheduled job" Jan 19, 2026
@fstagni fstagni merged commit 8c8262e into DIRACGrid:rel-v8r0 Jan 27, 2026
41 of 44 checks passed
@DIRACGridBot DIRACGridBot added the sweep:done (All sweeping actions have been done for this PR) and sweep:failed (Sweeping failed and needs manual intervention) labels Jan 27, 2026
@DIRACGridBot

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/21401546530

Failed:

  • integration
    cherry-pick 8c8262e into integration failed
    check merge conflicts on a local copy of this repository
    git fetch upstream
    git checkout upstream/integration -b cherry-pick-2-8c8262ebb7-integration
    git cherry-pick -x -m 1 8c8262ebb7
    # Fix the conflicts
    git cherry-pick --continue
    git commit --amend -m 'sweep: #8427 JobAgent - do not fail already rescheduled job' --author='Andrei Tsaregorodtsev <atsareg@in2p3.fr>'
    git push -u origin cherry-pick-2-8c8262ebb7-integration
    
    # If you have the GitHub CLI installed the PR can be made with
    gh pr create \
         --label 'sweep:from rel-v8r0' \
         --base integration \
         --repo DIRACGrid/DIRAC \
         --title '[sweep:integration] JobAgent - do not fail already rescheduled job' \
         --body 'Sweep #8427 `JobAgent - do not fail already rescheduled job` to `integration`.
    
    Adding original author @atsareg as watcher.
    
    BEGINRELEASENOTES
    
    *WorkloadManagement
    FIX: JobAgent - do not fail already rescheduled job
    
    ENDRELEASENOTES
    Closes #8438'
