[padb-users] vmrss and vmlck question

Duncan Harris harris.duncan at gmail.com
Mon Jul 11 14:06:31 BST 2011


Hi.
I have a question related to vmrss and vmlck (not technically a padb
question I realise).

We've modified our padb --proc-summary command to also print out the
vmlck value. We did this because one of our jobs was hitting the vmlck
hard limit on our machine and hanging as a result.

However, we have a different code hanging now. Running padb shows that
on one of the nodes (which has 12 cores) none of the vmlck values are
hitting the limit. However, if we sum the vmlck and vmrss values for
the whole node, we do exceed the total memory available on the node
(49 GB vs 48 GB). A stack trace shows that 3 cores are stuck in
MPI_Waitsome and 1 in a later MPI_Waitall. According to the code, all
of the sends have been issued, so we've lost a message somewhere.

My question is, how do the vmlck and vmrss values relate to each
other? Should we be adding them together, or is the vmlck included in
the vmrss value? We're assuming that they are separate as we have some
cores where vmlck > vmrss.
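
For reference, here is a minimal sketch of how the per-node sums could be
reproduced outside padb. It assumes the values correspond to the VmRSS and
VmLck fields of /proc/<pid>/status on Linux (both reported in kB). The
names parse_status and node_totals are hypothetical and not part of padb.
It only collects the two fields separately and does not settle whether
they overlap.

```python
import os


def parse_status(text):
    """Extract VmRSS and VmLck (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmRSS", "VmLck"):
            # Lines look like "VmRSS:      2048 kB"; keep the integer part.
            fields[key] = int(rest.split()[0])
    return fields


def node_totals(pids):
    """Sum VmRSS and VmLck (kB) separately over the given process IDs."""
    totals = {"VmRSS": 0, "VmLck": 0}
    for pid in pids:
        try:
            with open(f"/proc/{pid}/status") as fh:
                values = parse_status(fh.read())
        except OSError:
            # Process has exited or is not readable; skip it.
            continue
        for key in totals:
            totals[key] += values.get(key, 0)
    return totals


if __name__ == "__main__":
    # Example: report the two totals for the current process only.
    print(node_totals([os.getpid()]))
```

Passing the PIDs of all 12 ranks on the node to node_totals would give the
two sums side by side, so they can be compared with the 48 GB of node
memory individually as well as added together.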

Thanks,
Duncan



