From harris.duncan at gmail.com Mon Jul 11 14:06:31 2011
From: harris.duncan at gmail.com (Duncan Harris)
Date: Mon, 11 Jul 2011 14:06:31 +0100
Subject: [padb-users] vmrss and vmlck question
Message-ID:

Hi.

I have a question related to vmrss and vmlck (not technically a padb question, I realise).

We've modified our padb --proc-summary command to also print out the vmlck value. We did this because one of our jobs was hitting the vmlck hard limit on our machine and hanging as a result.

However, we have a different code hanging now. Running padb shows that for one of the nodes (which has 12 cores) none of the vmlck values are hitting the limit. However, if we sum the vmlck and vmrss values for the whole node, we do exceed the total memory available on the node (49 GB vs 48 GB). A stack trace shows that 3 cores are stuck in an MPI_Waitsome and 1 in a later MPI_Waitall. According to the code all of the sends have been posted, so we've lost a message somewhere.

My question is, how do the vmlck and vmrss values relate to each other? Should we be adding them together, or is the vmlck included in the vmrss value? We're assuming that they are separate, as we have some cores where vmlck > vmrss.

Thanks,
Duncan

From daniel.kidger at googlemail.com Mon Jul 11 15:41:03 2011
From: daniel.kidger at googlemail.com (Daniel Kidger)
Date: Mon, 11 Jul 2011 15:41:03 +0100
Subject: [padb-users] vmrss and vmlck question
In-Reply-To:
References:
Message-ID:

Duncan,

I tried a short piece of C code that calls mlock() with a user-adjustable size, then sleeps. Then I queried it using cat /proc//status

In this case, it does appear that VmLck is a subset of VmRSS (and of VmSize, for that matter). If I increase the value passed to mlock() by, say, 128 MB, then both VmLck and VmRSS increase by this amount.

Can you post an example where vmrss exceeds vmlck?

Daniel

_______________________________________________
padb-users mailing list
padb-users at pittman.org.uk
http://pittman.org.uk/mailman/listinfo/padb-users_pittman.org.uk

From harris.duncan at gmail.com Tue Jul 12 12:12:49 2011
From: harris.duncan at gmail.com (Duncan Harris)
Date: Tue, 12 Jul 2011 12:12:49 +0100
Subject: [padb-users] vmrss and vmlck question
In-Reply-To:
References:
Message-ID:

Here's an example of vmlck > vmrss. We've since had one where vmlck was 4 GB for a core and vmrss was still only around 1.3 GB.

rank  hostname  pid   vmsize      vmrss       vmlck       S  uptime  %cpu  lcore  command
...
...
3924  node822   8265  3125968 kB  1281236 kB  2183152 kB  R  13.14   100   0      ./a.out
3925  node822   8266  3171040 kB  1328928 kB   485320 kB  R  13.14   100   1      ./a.out
3926  node822   8267  3144016 kB  1300564 kB   692028 kB  R  13.14   100   2      ./a.out
3927  node822   8268  3127740 kB  1283676 kB   580352 kB  R  13.14   100   3      ./a.out
3928  node822   8269  3108208 kB  1263544 kB   418840 kB  R  13.14   100   4      ./a.out
3929  node822   8270  3079540 kB  1235072 kB   779184 kB  R  13.14   98    5      ./a.out
3930  node822   8271  3062328 kB  1217008 kB   184380 kB  R  13.14   100   6      ./a.out
3931  node822   8272  3056056 kB  1212048 kB   324308 kB  R  13.14   100   7      ./a.out
3932  node822   8273  3047864 kB  1202824 kB   190376 kB  R  13.14   100   8      ./a.out
3933  node822   8274  3046032 kB  1200984 kB    65964 kB  R  13.14   100   9      ./a.out
3934  node822   8275  3046028 kB  1200880 kB    51928 kB  R  13.14   100   10     ./a.out
3935  node822   8276  3047324 kB  1202960 kB    87784 kB  R  13.14   100   11     ./a.out
...
...
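
The per-node arithmetic discussed in this thread (summing vmrss and vmlck over the ranks on a node, and spotting ranks where vmlck > vmrss) can be sketched as below. This is only an illustration, not part of padb: the function names `parse_rows` and `summarise` are made up for the example, and the column layout is assumed to match the locally modified --proc-summary output posted in this thread (rank, hostname, pid, vmsize, vmrss, vmlck, ...). It does not settle whether VmLck is counted inside VmRSS; it only makes the two sums easy to compare.

```python
# Rows copied from the padb --proc-summary output posted in this thread.
# Assumed field order: rank host pid vmsize kB vmrss kB vmlck kB S uptime %cpu lcore command
ROWS = """\
3924 node822 8265 3125968 kB 1281236 kB 2183152 kB R 13.14 100 0 ./a.out
3925 node822 8266 3171040 kB 1328928 kB 485320 kB R 13.14 100 1 ./a.out
3926 node822 8267 3144016 kB 1300564 kB 692028 kB R 13.14 100 2 ./a.out
3927 node822 8268 3127740 kB 1283676 kB 580352 kB R 13.14 100 3 ./a.out
3928 node822 8269 3108208 kB 1263544 kB 418840 kB R 13.14 100 4 ./a.out
3929 node822 8270 3079540 kB 1235072 kB 779184 kB R 13.14 98 5 ./a.out
3930 node822 8271 3062328 kB 1217008 kB 184380 kB R 13.14 100 6 ./a.out
3931 node822 8272 3056056 kB 1212048 kB 324308 kB R 13.14 100 7 ./a.out
3932 node822 8273 3047864 kB 1202824 kB 190376 kB R 13.14 100 8 ./a.out
3933 node822 8274 3046032 kB 1200984 kB 65964 kB R 13.14 100 9 ./a.out
3934 node822 8275 3046028 kB 1200880 kB 51928 kB R 13.14 100 10 ./a.out
3935 node822 8276 3047324 kB 1202960 kB 87784 kB R 13.14 100 11 ./a.out
"""


def parse_rows(text):
    """Return a list of (rank, vmrss_kb, vmlck_kb) tuples, one per non-empty line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # fields[5] is vmrss and fields[7] is vmlck under the assumed layout.
        rows.append((int(fields[0]), int(fields[5]), int(fields[7])))
    return rows


def summarise(rows):
    """Return (total vmrss in kB, total vmlck in kB, ranks where vmlck > vmrss)."""
    total_rss = sum(rss for _, rss, _ in rows)
    total_lck = sum(lck for _, _, lck in rows)
    lck_above_rss = [rank for rank, rss, lck in rows if lck > rss]
    return total_rss, total_lck, lck_above_rss


if __name__ == "__main__":
    total_rss, total_lck, flagged = summarise(parse_rows(ROWS))
    print("sum vmrss (kB):", total_rss)
    print("sum vmlck (kB):", total_lck)
    print("ranks with vmlck > vmrss:", flagged)
```

For the twelve rows posted, only rank 3924 has vmlck above vmrss. Keeping the two totals separate, rather than adding them, avoids baking in an answer to the question of whether locked pages are already included in the resident set.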