Hello,
First I want to point out that I am not a Linux kernel developer; however, I
have done kernel development on Berkeley Unix (4.X) in the distant past.

What I'm trying to discover in the Linux kernel is how the RSS is calculated
in the 3.X kernels. I know that the current release is in the 4.X series;
however, I must work with what our customers want to use, not what I would
prefer.

The kernel mailing list has excellent coverage of adding more reporting for
HugeTLB pages to the 4.X kernel, and that is an interesting read. I was
attempting to use that discussion as a way to discover the RSS calculations
in the 3.X kernel; however, it didn't get me far.

The problem:
I have a program that uses a lot of data, literally as much as physical RAM.
I need to load this data in a way that lets me detect when I'm running out of
RAM, so I know when to push the 'stop loading' button; and likewise, when
executing scans of this data, to know when I've allocated too much working
memory and again push the 'stop' button.

The program only uses mmap/mprotect/munmap/madvise to manage memory. It
preallocates a very large amount of virtual address space with mmap as
unbacked memory and then backs the memory on an as-needed basis. The program
traps all calls to malloc/calloc/realloc, as well as both forms of operator
new and the associated free/delete routines, and redirects all memory
allocation into mmap operations.
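
For concreteness, the reserve-then-back pattern looks roughly like this (a
minimal sketch, not my actual allocator; the sizes, flags, and error handling
are simplified for illustration):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define RESERVE_SIZE (1UL << 40)   /* reserve 1 TiB of address space */
#define CHUNK_SIZE   (1UL << 20)   /* back it 1 MiB at a time */

int main(void)
{
    /* Reserve address space only: PROT_NONE plus MAP_NORESERVE means
     * nothing is committed yet. */
    char *base = mmap(NULL, RESERVE_SIZE, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Back one chunk on demand; this is where I would expect ENOMEM
     * when the system cannot supply the memory. */
    if (mprotect(base, CHUNK_SIZE, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect");        /* my 'stop loading' signal */
        return 1;
    }

    memset(base, 0xab, CHUNK_SIZE);    /* touching the pages is what
                                          actually raises RSS */
    munmap(base, RESERVE_SIZE);
    return 0;
}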

While running I can't afford to spend time reading a file (/proc/<pid>/statm)
to see if memory is full; I need to know at the time of allocation that I'm
done. In other words, I need to know whether the Linux OOM killer will shoot
me down for oversubscription on the next call to allocate more memory.

Since I'm trapping all calls to any memory allocation, including allocations
through the C and C++ libraries, I don't understand why the kernel is
reporting a higher RSS than I think I should have. If I think I've allocated
120GB, the kernel will report that my RSS is over 160GB, and this discrepancy
grows larger as I load more data. I'm not getting an error from mprotect when
I attempt to back more memory than the system supports, which would be
acceptable as an OOM error to my program. I would expect ENOMEM if the
required memory could not be mapped; instead I get hit by the OOM killer. If
anyone would like a program that demonstrates this, I have one. An
interesting point about that program is that after mapping 10GB of RAM and
subsequently unmapping it, my RSS has increased from 1.5MB to 2.4MB. I need
to understand this kind of 'behind the scenes' allocation charged to my
program.
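
The core of that demonstration is essentially the following (a stripped-down
sketch; the rss_kb() helper and the 10GB size are just for illustration):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Read resident pages from /proc/self/statm (second field) and
 * convert to kB.  Illustrative helper, not my real instrumentation. */
static long rss_kb(void)
{
    long size = 0, resident = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    fscanf(f, "%ld %ld", &size, &resident);
    fclose(f);
    return resident * (sysconf(_SC_PAGESIZE) / 1024);
}

int main(void)
{
    size_t len = 10UL << 30;           /* 10GB */

    printf("RSS before: %ld kB\n", rss_kb());

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    memset(p, 0, len);                 /* fault every page in */
    munmap(p, len);                    /* give it all back */

    printf("RSS after map+unmap: %ld kB\n", rss_kb());
    return 0;
}

The 'after' number comes back higher than the 'before' number even though
every byte has been unmapped, which is the 1.5MB to 2.4MB jump I mentioned.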

To this end I'm appealing to the Linux kernel developers for a helping hint
(or three) toward understanding the accounting of RSS for the 3.X kernel. I
don't need a complete walkthrough, just a 'look here' kind of thing. I've
been through mm/mmap.c and mm/memory.c and I'm having no luck putting the
pieces together.

I know that the reporting is held in the mm_rss_stat structure, initialized
in init_rss_vec, and updated by inline functions in mm.h.
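
For reference, the layout as I read it in include/linux/mm_types.h in a 3.X
tree is roughly the following (quoting from my reading of the source, so
treat it as my understanding rather than gospel); the SPLIT_RSS_COUNTING
per-task cache in particular caught my eye, since it means the per-mm
counters can lag behind reality by a number of events per thread:

enum {
        MM_FILEPAGES,   /* resident file-backed pages */
        MM_ANONPAGES,   /* resident anonymous pages */
        MM_SWAPENTS,    /* anonymous swap entries */
        NR_MM_COUNTERS
};

#if USE_SPLIT_PTLOCKS && defined(CONFIG_MMU)
#define SPLIT_RSS_COUNTING
/* per-thread cached information, synced back to the mm counters
 * every TASK_RSS_EVENTS_THRESH events */
struct task_rss_stat {
        int events;     /* for synchronization threshold */
        int count[NR_MM_COUNTERS];
};
#endif

struct mm_rss_stat {
        atomic_long_t count[NR_MM_COUNTERS];
};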

When I walk through unmap_page_range I see where (eventually) zap_pte_range
is called, and that eventually calls add_mm_rss_vec to update the various mm
counters.
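
For anyone following along, that function in mm/memory.c looks like this in
the 3.X tree I'm reading (with the caveat that the sync_mm_rss signature
changed during the 3.X series):

static void add_mm_rss_vec(struct mm_struct *mm, int *rss)
{
        int i;

        if (current->mm == mm)
                sync_mm_rss(mm);        /* flush this task's cached counts */
        for (i = 0; i < NR_MM_COUNTERS; i++)
                if (rss[i])
                        add_mm_counter(mm, i, rss[i]);
}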

When mapping I can see a call to sys_mmap_pgoff from sys_x86_64.c, yet I
can't find any definition of sys_mmap_pgoff in the kernel sources. I do see a
__SYSCALL(192, sys_mmap_pgoff) and a __SYSCALL(80, sys_mmap_pgoff, 6).
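
My best guess is that the definition hides behind the SYSCALL_DEFINE macros,
which paste the sys_ prefix onto the name, so grepping for sys_mmap_pgoff
finds only the callers. mm/mmap.c has (again quoting a 3.X tree as I read
it):

SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
                unsigned long, prot, unsigned long, flags,
                unsigned long, fd, unsigned long, pgoff)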

What else could modify the RSS of a running process? I'm not creating any new
threads and I'm not forking the program. I'm just loading data (read from
file, convert to internal format, mmap some space, write to memory) for later
use, and that is causing me grief, as the kernel's idea of my RSS far exceeds
my own.

I'm not on the Linux Developers mailing list, so please CC me in any reply.

Thanks for your time and consideration,

  David Barto
  ba...@cambrigesemantics.com
