Hi Joe,

If you run a Debian-based distro: apt-get build-dep tracker. Otherwise
you need to install a package which is usually called
gobject-introspection (its development files, since configure needs the
m4 macros it ships).
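Spelled out for both package families (Joe mentions CentOS 7 further down; the package names below are the customary ones and may differ per distro, so treat them as assumptions):

```shell
# Debian/Ubuntu: pull in every build dependency of the packaged tracker
sudo apt-get build-dep tracker

# CentOS/RHEL/Fedora: the m4 macros configure needs ship in the -devel package
sudo yum install gobject-introspection-devel
```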

P.S. For the office files you'll need GSF; the package is libgsf-1-dev or something similar.

Kind regards,

Philip

On Mon, 2016-01-11 at 21:33 -0500, Joe Rhodes wrote:
> Carlos, et al.,
> 
> I'm sorry, but I cannot seem to build the master branch right now.  I ran the 
> autogen.sh script and then configure dies on me with this:
> 
> checking for pkg-config... /usr/bin/pkg-config
> checking pkg-config is at least version 0.16... yes
> ./configure: line 19136: syntax error near unexpected token `0.9.5'
> ./configure: line 19136: `GOBJECT_INTROSPECTION_CHECK(0.9.5)'
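That syntax error is the classic sign of an unexpanded m4 macro: autoconf never found introspection.m4 (shipped with the gobject-introspection development files), so the macro name leaked verbatim into the generated script and the shell choked on it. A small sketch for spotting such leftovers (the helper name is made up for illustration):

```shell
# List all-caps *_CHECK( macro calls that survived into a generated
# configure script; any hit means a required .m4 file was missing when
# autogen.sh/autoreconf ran.
find_unexpanded_macros() {
  grep -n '[A-Z_][A-Z_]*_CHECK(' "$1"
}
# After installing the missing package, re-run ./autogen.sh so the macro
# actually expands this time.
```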
> 
> 
> I'm not entirely sure what's going on there.  (Sorry, programming is not my 
> forte.)  I'll have to wait for 1.7.2 and give that a try.  I can only work 
> on this in the evenings, when I'm not at work and the server that's housing 
> all of this data is otherwise not terribly busy.
> 
> Cheers!
> -Joe Rhodes
> 
> 
> 
> 
> > On Jan 11, 2016, at 5:21 AM, Philip Van Hoof <phi...@codeminded.be> wrote:
> > 
> > Hi Carlos,
> > 
> > Looks like my git-account has been closed on GNOME, so here is a patch
> > for one of the issues in that valgrind.
> > 
> > 
> > Kind regards,
> > 
> > Philip
> > 
> > On Sun, 2016-01-10 at 16:05 -0500, Joe Rhodes wrote:
> >> Carlos:
> >> 
> >> Yes, there are a LOT of files on this volume.  The makeup of the 5 TB of 
> >> data is PDFs, Photoshop files, Word docs, InDesign & Illustrator docs.  
> >> There are very few large files like MP3s or videos.  If I disable all 
> >> the extractors and just build an index based on file names, I get an index 
> >> of about 3 GB.  
> >> 
> >> I did notice that I was possibly indexing all of my snapshots of my 
> >> volumes. I'm using ZFS and they're available under "/volume/.zfs".  I've 
> >> added that folder to my list of excluded directories:
> >> 
> >> org.freedesktop.Tracker.Miner.Files ignored-directories ['.zfs', 'ZZZ 
> >> Snapshots', 'po', 'CVS', 'core-dumps', 'lost+found']
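For reference, the full command for that setting would look like the sketch below; note it assumes the same custom XDG/D-Bus environment is exported first, so the value lands in the dconf database the miner actually reads:

```shell
gsettings set org.freedesktop.Tracker.Miner.Files ignored-directories \
  "['.zfs', 'ZZZ Snapshots', 'po', 'CVS', 'core-dumps', 'lost+found']"

# Read the key back to confirm the miner will see it:
gsettings get org.freedesktop.Tracker.Miner.Files ignored-directories
```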
> >> 
> >> I'll see if that makes any difference.  If it was digging into those, that 
> >> would greatly increase the number of files.
> >> 
> >> I'm not entirely sure how to start tracker with the valgrind command.  
> >> Tracker is currently started automatically by the Netatalk file server 
> >> process.  In order to run the tracker processes, I have to execute the 
> >> following:
> >> 
> >> PREFIX="/main-storage"
> >> export XDG_DATA_HOME="$PREFIX/var/netatalk/"
> >> export XDG_CACHE_HOME="$PREFIX/var/netatalk/"
> >> export DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc"
> >> /usr/local/bin/tracker daemon -t
> >> 
> >> So after stopping the daemon, I just tried the following:
> >> 
> >> valgrind --leak-check=full --log-file=valgrind-tracker-extract-log 
> >> --num-callers=30 /usr/local/libexec/tracker-extract
> >> valgrind --leak-check=full --log-file=valgrind-tracker-miner-fs-log 
> >> --num-callers=30 /usr/local/libexec/tracker-miner-fs
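One caveat with those invocations: unless the exports above are still in the environment, the daemons will fall back to the default XDG locations instead of the Netatalk ones. A hypothetical wrapper (the function name is made up) keeps the environment and the valgrind flags together:

```shell
# Run a tracker binary under valgrind with the same environment Netatalk
# gives it; the log file is named after the binary.
run_under_valgrind() {
  PREFIX="/main-storage"
  XDG_DATA_HOME="$PREFIX/var/netatalk/" \
  XDG_CACHE_HOME="$PREFIX/var/netatalk/" \
  DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc" \
  valgrind --leak-check=full --num-callers=30 \
    --log-file="valgrind-$(basename "$1")-log" "$@"
}
# e.g. run_under_valgrind /usr/local/libexec/tracker-extract
```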
> >> 
> >> Hopefully that will get you what you want?
> >> 
> >> I've uploaded the log files to Dropbox.  Hopefully you can easily 
> >> grab those without having to jump through too many hoops.
> >> 
> >> https://www.dropbox.com/s/o3w10hnaa6ikvn3/valgrind-tracker-extract-log.gz?dl=0
> >> https://www.dropbox.com/s/5s4vqk0owrf5gjd/valgrind-tracker-miner-fs-log.gz?dl=0
> >> 
> >> I let them run for a bit.  I could definitely see RAM usage start to 
> >> climb.  I didn't bother to let it go to GBs in size.  I think it was at 
> >> about 300 MB when I hit Ctrl-C.
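To put numbers on that growth between runs, the resident set size can be sampled periodically; a minimal sketch (the interval and log name are arbitrary choices):

```shell
# Append a process's resident set size (in KiB, as reported by ps) to a
# log file until the process exits; $1 is the PID, $2 an optional interval.
log_rss() {
  while kill -0 "$1" 2>/dev/null; do
    ps -o rss= -p "$1" >> "rss-$1.log"
    sleep "${2:-10}"
  done
}
# e.g. log_rss "$(pgrep -f tracker-extract)" 10
```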
> >> 
> >> Cheers!
> >> -Joe Rhodes
> >> 
> >> 
> >>> On Jan 10, 2016, at 2:25 PM, Carlos Garnacho <carl...@gnome.org> wrote:
> >>> 
> >>> Hi Joe,
> >>> 
> >>> On Sun, Jan 10, 2016 at 6:40 PM, Joe Rhodes <li...@joerhodes.com> wrote:
> >>>> I have just compiled and installed tracker-1.7.1 on a CentOS 7.1 box.  I
> >>>> just used the default configuration ("./configure" with no additional
> >>>> options).  I'm indexing around 5 TB of data.  I'm noticing that both the
> >>>> tracker-extract and tracker-miner-fs processes are using a large amount
> >>>> of RAM.  The tracker-extract process is currently using 11 GB of RAM (RES,
> >>>> not VIRT, as reported by top), while tracker-miner-fs is sitting at
> >>>> 4.5 GB.
> >>>> 
> >>>> Both processes start out modestly, but continue to grow as they do their
> >>>> work.  The tracker-miner-fs levels off at 4.5 GB once it appears to have
> >>>> finished crawling the entire volume (once the CPU usage goes back down
> >>>> to near 0).  The tracker-extract process also continues to grow as it
> >>>> works.  Once it is done, it levels off.  Last time it stayed at about
> >>>> 9 GB.
> >>>> 
> >>>> If I restart tracker (with 'tracker daemon -t' followed by 'tracker
> >>>> daemon -s'), a similar thing will happen with tracker-miner-fs.  It will
> >>>> grow back to 4.5 GB as it crawls its way across the entire volume.  The
> >>>> tracker-extract process, though, because all of the files were just
> >>>> indexed and it doesn't need to do much, uses a very modest amount of
> >>>> RAM.  I don't have that number right now because I'm re-indexing the
> >>>> entire volume, but it's well below 100 MB.
> >>>> 
> >>>> Is this expected behaviour?  Or is there a memory leak?  Or perhaps
> >>>> tracker just isn't designed to operate on a volume this large?
> >>> 
> >>> It certainly sounds like a memory leak, although it is strange that
> >>> it hits both tracker-miner-fs and tracker-extract.
> >>> 
> >>> There is obviously some impact from running Tracker on large directory
> >>> trees, such as:
> >>> 
> >>> - Possibly exhausted inotify handles; the directories we fail to
> >>> create a monitor for would just be checked/updated on the next miner
> >>> startup
> >>> - More (longer, rather) IO/CPU usage during startup, because the miner
> >>> has to check mtimes for all directories and files
> >>> - The miner also needs to keep an in-memory representation of the
> >>> directory tree for accounting purposes (file monitors, etc.).  Regular
> >>> files are represented in this model only while they're being
> >>> checked/processed, and disappear soon after.  This might account for a
> >>> memory peak at startup if there are many items left to process, because
> >>> Tracker dumps files into processing queues ASAP, but the memory usage
> >>> should be nowhere near as big.
> >>> 
> >>> So I think nothing accounts for such memory usage in tracker-miner-fs.
> >>> The only known source of unbounded memory growth is the number of
> >>> directories (and regular files, for the peak at startup) to be indexed,
> >>> but you would need millions of those for tracker-miner-fs to grow to
> >>> 4.5 GB.
> >>> 
> >>> And tracker-extract has a much shorter memory: it just checks the
> >>> files that need extraction in small batches, and processes those one
> >>> by one before querying the next batch.  9 GB shouts memory leak.  We've
> >>> had other memory leak situations in tracker-extract, and the culprit
> >>> is most often in one of the various libraries we use in our extract
> >>> modules; if many files end up triggering that module (and the leaky
> >>> code path in the specific library), the effect will accumulate over
> >>> time.
> >>> 
> >>> The downside of this situation is that we Tracker developers most
> >>> often can't reproduce the problem unless we have a file that triggers
> >>> the leak, so we can fix it or channel it to the appropriate
> >>> maintainers.  It would be great if you could provide valgrind logs;
> >>> just run:
> >>> 
> >>> valgrind --leak-check=full --log-file=valgrind-log --num-callers=30
> >>> /path/to/built/tracker-extract
> >>> 
> >>> Hit Ctrl-C when enough time has passed, and send back the valgrind-log
> >>> file.  The same applies to tracker-miner-fs.
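When skimming the resulting log, the records worth sending upstream are the "definitely lost" ones; a rough way to pull them out of the default memcheck output (the helper name is invented):

```shell
# Print the 'definitely lost' summary lines from a valgrind log, with the
# ==PID== prefix stripped for readability.
list_definite_leaks() {
  grep 'definitely lost' "$1" | sed 's/^==[0-9]*== *//'
}
```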
> >>> 
> >>>> 
> >>>> My tracker meta.db file is about 13 GB right now, though still growing.
> >>>> I suspect it's close to fully indexed.
> >>> 
> >>> This is also suspicious: you again need either a hideous number of
> >>> files to make meta.db grow that large, or an equally hideous amount of
> >>> plain-text content that gets indexed.  Out of curiosity, how many
> >>> directories/files does that partition contain?  Is the content
> >>> primarily video/documents/etc.?
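In case it helps answer that, rough counts can be gathered with find; the sketch below stays on one filesystem via -xdev (which may or may not skip the .zfs snapshots, depending on how they are mounted), and the path is just the one from this thread:

```shell
# Print "<files> <directories>" for a path, without crossing mount points.
count_entries() {
  printf '%s %s\n' \
    "$(find "$1" -xdev -type f | wc -l)" \
    "$(find "$1" -xdev -type d | wc -l)"
}
# e.g. count_entries /main-storage
```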
> >>> 
> >>> Cheers,
> >>> Carlos
> >> 
> >> _______________________________________________
> >> tracker-list mailing list
> >> tracker-list@gnome.org
> >> https://mail.gnome.org/mailman/listinfo/tracker-list
> > 
> > <0001-Fix-small-memory-leak.patch>
> 

