PBS Pro rocoto issue #121

Using PBS Pro on Derecho, we are seeing tasks in the UFS-SRW reported as `DEAD` after they hit the maximum unknown count:
```
12/29/25 08:04:22 MST :: FV3LAM_wflow.xml :: Cycle 202311100000, Task make_grid, jobid=4479944, in state DEAD (Unknown), giving up because job state could not be determined 3 consecutive times, try=2 (of 2)
```
However, the jobs (and their job logs) report an exit status of 0. I could not determine from the rocoto source where the `DEAD` state originates; it appears that the job state is already `DEAD` by the time the exit status is checked: https://github.com/christopherwharrop/rocoto/blob/79304a1c47a18ee68c45a52799935c481d0d6d56/lib/workflowmgr/pbsprobatchsystem.rb#L299C22-L299C27. (FYI, I am new to the rocoto source code.)
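To see how widespread this is, the rocoto log can be scanned for every task that gave up this way. The Python sketch below is not part of rocoto. Its regular expression is written against the single log line quoted above, so it assumes other entries of this kind use exactly the same wording; the log file path is left to the reader.

```python
import re

# Pattern written against the one "giving up" line quoted in this issue.
# Assumption: other rocoto versions or messages may word this differently.
GIVE_UP = re.compile(
    r"Cycle (?P<cycle>\d+), Task (?P<task>[^,]+), jobid=(?P<jobid>[^,]+), "
    r"in state (?P<state>\w+) \((?P<native>[^)]*)\), "
    r"giving up because job state could not be determined (?P<count>\d+) consecutive times, "
    r"try=(?P<try>\d+) \(of (?P<max_tries>\d+)\)"
)


def find_unknown_giveups(lines):
    """Return (cycle, task, jobid, unknown_count, try, max_tries) for each matching line."""
    hits = []
    for line in lines:
        m = GIVE_UP.search(line)
        if m:
            hits.append((
                m["cycle"], m["task"], m["jobid"],
                int(m["count"]), int(m["try"]), int(m["max_tries"]),
            ))
    return hits
```

For example, `with open(log_path) as f: print(find_unknown_giveups(f))`, where `log_path` is a placeholder for the workflow's rocoto log, lists every cycle and task that died after too many unknown states, which makes it easier to see whether particular tasks or cycles are affected more than others.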

Here is a database snippet for the `DEAD` task `make_grid`, which has an `exit_status` of 0 (other tasks look similar). I'm not sure why the `make_grid` entry is duplicated.
```
> sqlite3 FV3LAM_wflow.db ".mode column" ".headers on" "select * from jobs where taskname = 'make_grid'"
id  jobid    taskname   cycle       cores  state  native_state  exit_status  tries  nunknowns  duration
--  -------  ---------  ----------  -----  -----  ------------  -----------  -----  ---------  --------
1   4479944  make_grid  1699574400  24     DEAD   Unknown       0            2      3          0.0    
10  4479944  make_grid  1699574400  24     DEAD   Unknown       0            2      3          0.0
```
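The same check can be run across all tasks and cycles instead of one task at a time. The Python sketch below uses only the standard `sqlite3` module; the table and column names come from the snippet above, while the column types are guesses. It lists `DEAD` rows whose exit status is 0, plus any job ID recorded in more than one row. The `cycle` column appears to hold Unix epoch seconds (1699574400 is 2023-11-10 00:00 UTC, matching cycle 202311100000), so the sketch converts it back to that form.

```python
import sqlite3
from datetime import datetime, timezone


def cycle_label(epoch_seconds):
    """Convert the db's cycle value (assumed UTC epoch seconds) to YYYYMMDDHHMM."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y%m%d%H%M")


def dead_with_zero_exit(conn):
    """(id, jobid, taskname, cycle, nunknowns) for DEAD rows that recorded exit status 0."""
    rows = conn.execute(
        "SELECT id, jobid, taskname, cycle, nunknowns FROM jobs "
        "WHERE state = 'DEAD' AND exit_status = 0 ORDER BY cycle, taskname, id"
    ).fetchall()
    return [(i, j, t, cycle_label(c), n) for i, j, t, c, n in rows]


def duplicated_jobs(conn):
    """(jobid, taskname, cycle, row_count) for any job stored in more than one row."""
    rows = conn.execute(
        "SELECT jobid, taskname, cycle, COUNT(*) FROM jobs "
        "GROUP BY jobid, taskname, cycle HAVING COUNT(*) > 1"
    ).fetchall()
    return [(j, t, cycle_label(c), n) for j, t, c, n in rows]
```

Usage: open the workflow database with `conn = sqlite3.connect("FV3LAM_wflow.db")` and print the results of both functions. Taking a connection rather than a path keeps the functions easy to test against an in-memory copy of the table.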
A couple of other items of note:
* A rewind/reboot will often address the status issue but not always.
* The UFS-WM does not experience the same polling issue.
* Increasing the interval between `rocotorun` calls also seems to address the issue.
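The effect of the `rocotorun` interval can be illustrated with a toy model. This is a hypothetical explanation, not something confirmed from the rocoto source or from PBS Pro. It assumes that after a job finishes the status query returns nothing for some lag, that each `rocotorun` call polls once, and that a task is marked `DEAD` after 3 consecutive unknown results, as in the log line above. Under those assumptions a task dies exactly when all 3 polls land inside the lag window (3 * interval < lag), so lengthening the interval avoids it.

```python
def poll_outcome(lag, interval, max_unknowns=3):
    """Toy model (hypothetical): the job finishes at t=0, the batch system reports
    nothing until t=lag, and rocotorun polls at t=interval, 2*interval, ...
    Returns "DEAD" if max_unknowns consecutive polls see no state, else "SUCCEEDED"."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    unknowns = 0
    t = interval
    while t < lag:
        unknowns += 1  # no state reported yet: one more consecutive unknown
        if unknowns >= max_unknowns:
            return "DEAD"
        t += interval
    # The first poll at or after the lag sees the finished job (exit status 0).
    return "SUCCEEDED"
```

For example, with a hypothetical 300 s lag, polling every 60 s returns `"DEAD"`, while polling every 180 s returns `"SUCCEEDED"`. If this model is close to what happens, the relevant quantities are how long PBS Pro takes to report a finished job and how many unknowns rocoto tolerates.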

Are there any recommendations for additional troubleshooting or knobs we can check? Thank you!

cc @MichaelLueken
