Carlos et al., I'm sorry, but I cannot seem to build the master branch right now. I ran the autogen.sh script, and then configure dies on me with this:
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.16... yes
./configure: line 19136: syntax error near unexpected token `0.9.5'
./configure: line 19136: `GOBJECT_INTROSPECTION_CHECK(0.9.5)'

I'm not entirely sure what's going on there. (Sorry, programming is not my forte.) I'll have to wait for 1.7.2 and give that a try. I can only work on this in the evenings, when I'm not at work and the server that's housing all of this data is otherwise not terribly busy.

Cheers!
-Joe Rhodes

> On Jan 11, 2016, at 5:21 AM, Philip Van Hoof <phi...@codeminded.be> wrote:
>
> Hi Carlos,
>
> Looks like my git account has been closed on GNOME, so here is a patch
> for one of the issues in that valgrind.
>
> Kind regards,
> Philip
>
> On Sun, 2016-01-10 at 16:05 -0500, Joe Rhodes wrote:
>> Carlos:
>>
>> Yes, there are a LOT of files on this volume. The makeup of the 5 TB of
>> data is PDFs, Photoshop files, Word docs, InDesign & Illustrator docs.
>> There are very few large files like MP3s or videos. If I disable all the
>> extractors and just build an index based on file names, I get an index of
>> about 3 GB.
>>
>> I did notice that I was possibly indexing all of the snapshots of my volumes.
>> I'm using ZFS, and they're available under "/volume/.zfs". I've added that
>> folder to my list of excluded directories:
>>
>> org.freedesktop.Tracker.Miner.Files ignored-directories ['.zfs', 'ZZZ
>> Snapshots', 'po', 'CVS', 'core-dumps', 'lost+found']
>>
>> I'll see if that makes any difference. If it was digging into those, that
>> would greatly increase the number of files.
>>
>> I'm not entirely sure how to start tracker with the valgrind command.
>> Tracker is currently started automatically by the Netatalk file server
>> process.
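(For what it's worth, the `GOBJECT_INTROSPECTION_CHECK(0.9.5)` syntax error near the top of the thread usually means the macro was never expanded: aclocal could not find introspection.m4 when autogen.sh ran, so the literal macro name ended up in the generated configure script. A possible fix, assuming a CentOS box and that the package name is the stock one, is simply to install the gobject-introspection development package and regenerate:)

```shell
# The unexpanded GOBJECT_INTROSPECTION_CHECK macro suggests that
# introspection.m4 (shipped by gobject-introspection's devel package)
# was not visible to aclocal. Install it, then regenerate configure.
# Package name assumed for CentOS 7; adjust for other distributions.
sudo yum install -y gobject-introspection-devel

# Re-run autogen.sh so aclocal picks up the macro this time.
./autogen.sh
./configure
```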
>> In order to run the tracker processes, I have to execute the following:
>>
>> PREFIX="/main-storage"
>> export XDG_DATA_HOME="$PREFIX/var/netatalk/"
>> export XDG_CACHE_HOME="$PREFIX/var/netatalk/"
>> export DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc"
>> /usr/local/bin/tracker daemon -t
>>
>> So after stopping the daemon, I tried the following:
>>
>> valgrind --leak-check=full --log-file=valgrind-tracker-extract-log \
>>   --num-callers=30 /usr/local/libexec/tracker-extract
>> valgrind --leak-check=full --log-file=valgrind-tracker-miner-fs-log \
>>   --num-callers=30 /usr/local/libexec/tracker-miner-fs
>>
>> Hopefully that will get you what you want?
>>
>> I've uploaded the log files to Dropbox. Hopefully you can easily grab
>> those without having to jump through too many hoops.
>>
>> https://www.dropbox.com/s/o3w10hnaa6ikvn3/valgrind-tracker-extract-log.gz?dl=0
>> https://www.dropbox.com/s/5s4vqk0owrf5gjd/valgrind-tracker-miner-fs-log.gz?dl=0
>>
>> I let them run for a bit. I could definitely see RAM usage start to climb.
>> I didn't bother to let it grow to GBs in size. I think I was at about
>> 300 MB when I hit Ctrl-C.
>>
>> Cheers!
>> -Joe Rhodes
>>
>>> On Jan 10, 2016, at 2:25 PM, Carlos Garnacho <carl...@gnome.org> wrote:
>>>
>>> Hi Joe,
>>>
>>> On Sun, Jan 10, 2016 at 6:40 PM, Joe Rhodes <li...@joerhodes.com> wrote:
>>>> I have just compiled and installed tracker-1.7.1 on a CentOS 7.1 box. I
>>>> just used the default configuration ("./configure" with no additional
>>>> options). I'm indexing around 5 TB of data. I'm noticing that both the
>>>> tracker-extract and tracker-miner-fs processes are using a large amount
>>>> of RAM. The tracker-extract process is currently using 11 GB of RAM (RES,
>>>> not VIRT, as reported by top), while tracker-miner-fs is sitting at 4.5 GB.
>>>>
>>>> Both processes start out modestly, but continue to grow as they do their work.
>>>> The tracker-miner-fs levels off at 4.5 GB once it appears to have
>>>> finished crawling the entire volume (once the CPU usage goes back down to
>>>> near 0). The tracker-extract process also continues to grow as it works.
>>>> Once it is done, it levels off. Last time it stayed at about 9 GB.
>>>>
>>>> If I restart tracker (with 'tracker daemon -t' followed by 'tracker
>>>> daemon -s'), a similar thing will happen with tracker-miner-fs. It will
>>>> grow back to 4.5 GB as it crawls its way across the entire volume. The
>>>> tracker-extract process, though, because all of the files were just indexed
>>>> and it doesn't need to do much, uses a very modest amount of RAM. I don't
>>>> have that number right now because I'm re-indexing the entire volume, but
>>>> it's well below 100 MB.
>>>>
>>>> Is this expected behaviour? Or is there a memory leak? Or perhaps tracker
>>>> just isn't designed to operate on this large a volume?
>>>
>>> It totally sounds like a memory leak, although it is strange that
>>> it hits both tracker-miner-fs and tracker-extract.
>>>
>>> There is obviously an impact to running Tracker on large directory
>>> trees, such as:
>>>
>>> - Possibly exhausted inotify handles; the directories we fail to
>>>   create a monitor for would just be checked/updated on the next miner
>>>   startup.
>>> - More (longer, rather) IO/CPU usage during startup, because the miner
>>>   has to check mtimes for all directories and files.
>>> - The miner also needs to keep an in-memory representation of the
>>>   directory tree for accounting purposes (file monitors, etc.). Regular
>>>   files are represented in this model only as long as they're being
>>>   checked/processed, and disappear soon after. This might account for a
>>>   memory peak at startup, if there are many items left to process, because
>>>   Tracker dumps files into processing queues ASAP, but I think the
>>>   memory usage should be nowhere near as big.
>>>
>>> So I think nothing accounts for such memory usage in tracker-miner-fs.
>>> The only known source of unbounded memory growth is the number of
>>> directories (and regular files, for the peak at startup) to be indexed,
>>> but you would need millions of those to have tracker-miner-fs grow up
>>> to 4.5 GB.
>>>
>>> And tracker-extract has a much shorter memory: it just checks the
>>> files that need extraction in small batches, and processes those one
>>> by one before querying the next batch. 9 GB shouts memory leak. We've
>>> had other memory leak situations in tracker-extract, and the culprit
>>> is most often in the various libraries we're using in our extract
>>> modules; if many files end up triggering that module (and the leaky
>>> code path in the specific library), the effect will accumulate over
>>> time.
>>>
>>> The downside of this situation is that most often we Tracker
>>> developers can't reproduce it unless we have a file that triggers the
>>> leak, so that we can fix it or channel it to the appropriate maintainers.
>>> So it would be great if you could provide valgrind logs; just run:
>>>
>>> valgrind --leak-check=full --log-file=valgrind-log --num-callers=30 \
>>>   /path/to/built/tracker-extract
>>>
>>> Hit Ctrl-C when enough time has passed, and send back the valgrind-log
>>> file. The same applies to tracker-miner-fs.
>>>
>>>> My tracker meta.db file is about 13 GB right now, though still growing. I
>>>> suspect it's close to fully indexed, though.
>>>
>>> This is also suspicious; you again need either a hideous amount of
>>> files to have meta.db grow that large, or an equally hideous amount of
>>> plain text content that gets indexed. Out of curiosity, how many
>>> directories/files does that partition contain? Is the content
>>> primarily video/documents/etc.?
>>> Cheers,
>>> Carlos
>>
>> _______________________________________________
>> tracker-list mailing list
>> tracker-list@gnome.org
>> https://mail.gnome.org/mailman/listinfo/tracker-list
>
> <0001-Fix-small-memory-leak.patch>
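(Carlos's question about how many directories and files the partition contains can be answered with a quick sketch like the one below. The `/main-storage` mount point is an assumption taken from the environment setup earlier in the thread; the `.zfs` prune mirrors Joe's snapshot exclusion, and `-xdev` keeps find on one filesystem. Here `VOL` defaults to the current directory so the snippet runs anywhere.)

```shell
# Count directories and regular files on the indexed volume, pruning
# ZFS snapshot directories (.zfs) and staying on one filesystem.
# On Joe's server VOL would be /main-storage (assumed from the thread).
VOL="${VOL:-.}"
dirs=$(find "$VOL" -xdev -name .zfs -prune -o -type d -print | wc -l)
files=$(find "$VOL" -xdev -name .zfs -prune -o -type f -print | wc -l)
echo "directories: $dirs, files: $files"
```

(Note that the directory count includes the top-level volume itself; single-digit millions of entries here would make the 4.5 GB miner figure less surprising, per Carlos's estimate above.)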