Carlos et al., I'm sorry, but I cannot seem to build the master branch right now. I ran the autogen.sh script, and then configure dies on me with this:
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.16... yes
./configure: line 19136: syntax error near unexpected token `0.9.5'
./configure: line 19136: `GOBJECT_INTROSPECTION_CHECK(0.9.5)'

I'm not entirely sure what's going on there. (Sorry, programming is not my forte.) I'll have to wait for 1.7.2 and give that a try. I can only work on this in the evenings, when I'm not at work and the server that's housing all of this data is otherwise not terribly busy.

Cheers!
-Joe Rhodes

> On Jan 11, 2016, at 5:21 AM, Philip Van Hoof <phi...@codeminded.be> wrote:
>
> Hi Carlos,
>
> Looks like my git account has been closed on GNOME, so here is a patch
> for one of the issues in that valgrind.
>
> Kind regards,
> Philip
>
> On Sun, 2016-01-10 at 16:05 -0500, Joe Rhodes wrote:
>> Carlos:
>>
>> Yes, there are a LOT of files on this volume. The makeup of the 5 TB of
>> data is PDFs, Photoshop files, Word docs, InDesign & Illustrator docs.
>> There are very few large files like MP3s or videos. If I disable all the
>> extractors and just build an index based on file names, I get an index of
>> about 3 GB.
>>
>> I did notice that I was possibly indexing all of the snapshots of my volumes.
>> I'm using ZFS, and they're available under "/volume/.zfs". I've added that
>> folder to my list of excluded directories:
>>
>> org.freedesktop.Tracker.Miner.Files ignored-directories ['.zfs', 'ZZZ
>> Snapshots', 'po', 'CVS', 'core-dumps', 'lost+found']
>>
>> I'll see if that makes any difference. If it was digging into those, that
>> would greatly increase the number of files.
>>
>> I'm not entirely sure how to start tracker with the valgrind command.
>> Tracker is currently started automatically by the Netatalk file server
>> process.
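(For what it's worth, the `GOBJECT_INTROSPECTION_CHECK(0.9.5)` syntax error near the top of the thread usually means the macro was never expanded: aclocal could not find introspection.m4 when autogen.sh ran, so the literal macro name ended up in the generated configure script. A possible fix, assuming a CentOS box and that the package name is the stock one, is simply to install the gobject-introspection development package and regenerate:)

```shell
# The unexpanded GOBJECT_INTROSPECTION_CHECK macro suggests that
# introspection.m4 (shipped by gobject-introspection's devel package)
# was not visible to aclocal. Install it, then regenerate configure.
# Package name assumed for CentOS 7; adjust for other distributions.
sudo yum install -y gobject-introspection-devel

# Re-run autogen.sh so aclocal picks up the macro this time.
./autogen.sh
./configure
```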
>> In order to run the tracker processes, I have to execute the following:
>>
>> PREFIX="/main-storage"
>> export XDG_DATA_HOME="$PREFIX/var/netatalk/"
>> export XDG_CACHE_HOME="$PREFIX/var/netatalk/"
>> export DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc"
>> /usr/local/bin/tracker daemon -t
>>
>> So after stopping the daemon, I tried the following:
>>
>> valgrind --leak-check=full --log-file=valgrind-tracker-extract-log \
>>   --num-callers=30 /usr/local/libexec/tracker-extract
>> valgrind --leak-check=full --log-file=valgrind-tracker-miner-fs-log \
>>   --num-callers=30 /usr/local/libexec/tracker-miner-fs
>>
>> Hopefully that will get you what you want?
>>
>> I've uploaded the log files to Dropbox. Hopefully you can easily grab
>> those without having to jump through too many hoops.
>>
>> https://www.dropbox.com/s/o3w10hnaa6ikvn3/valgrind-tracker-extract-log.gz?dl=0
>> https://www.dropbox.com/s/5s4vqk0owrf5gjd/valgrind-tracker-miner-fs-log.gz?dl=0
>>
>> I let them run for a bit. I could definitely see RAM usage start to climb.
>> I didn't bother to let it grow to GBs in size. I think I was at about
>> 300 MB when I hit Ctrl-C.
>>
>> Cheers!
>> -Joe Rhodes
>>
>>> On Jan 10, 2016, at 2:25 PM, Carlos Garnacho <carl...@gnome.org> wrote:
>>>
>>> Hi Joe,
>>>
>>> On Sun, Jan 10, 2016 at 6:40 PM, Joe Rhodes <li...@joerhodes.com> wrote:
>>>> I have just compiled and installed tracker-1.7.1 on a CentOS 7.1 box. I
>>>> just used the default configuration ("./configure" with no additional
>>>> options). I'm indexing around 5 TB of data. I'm noticing that both the
>>>> tracker-extract and tracker-miner-fs processes are using a large amount
>>>> of RAM. The tracker-extract process is currently using 11 GB of RAM (RES,
>>>> not VIRT, as reported by top), while tracker-miner-fs is sitting at 4.5 GB.
>>>>
>>>> Both processes start out modestly, but continue to grow as they do their work.
>>>> The tracker-miner-fs levels off at 4.5 GB once it appears to have
>>>> finished crawling the entire volume (once the CPU usage goes back down to
>>>> near 0). The tracker-extract process also continues to grow as it works.
>>>> Once it is done, it levels off. Last time it stayed at about 9 GB.
>>>>
>>>> If I restart tracker (with 'tracker daemon -t' followed by 'tracker
>>>> daemon -s'), a similar thing will happen with tracker-miner-fs. It will
>>>> grow back to 4.5 GB as it crawls its way across the entire volume. The
>>>> tracker-extract process, though, because all of the files were just indexed
>>>> and it doesn't need to do much, uses a very modest amount of RAM. I don't
>>>> have that number right now because I'm re-indexing the entire volume, but
>>>> it's well below 100 MB.
>>>>
>>>> Is this expected behaviour? Or is there a memory leak? Or perhaps tracker
>>>> just isn't designed to operate on this large a volume?
>>>
>>> It totally sounds like a memory leak, although it is strange that
>>> it hits both tracker-miner-fs and tracker-extract.
>>>
>>> There is obviously an impact to running Tracker on large directory
>>> trees, such as:
>>>
>>> - Possibly exhausted inotify handles; the directories we fail to
>>>   create a monitor for would just be checked/updated on the next miner
>>>   startup.
>>> - More (longer, rather) IO/CPU usage during startup, because the miner
>>>   has to check mtimes for all directories and files.
>>> - The miner also needs to keep an in-memory representation of the
>>>   directory tree for accounting purposes (file monitors, etc.). Regular
>>>   files are represented in this model only as long as they're being
>>>   checked/processed, and disappear soon after. This might account for a
>>>   memory peak at startup, if there are many items left to process, because
>>>   Tracker dumps files into processing queues ASAP, but I think the
>>>   memory usage should be nowhere near as big.
>>>
>>> So I think nothing accounts for such memory usage in tracker-miner-fs.
>>> The only known source of unbounded memory growth is the number of
>>> directories (and regular files, for the peak at startup) to be indexed,
>>> but you would need millions of those to have tracker-miner-fs grow up
>>> to 4.5 GB.
>>>
>>> And tracker-extract has a much shorter memory: it just checks the
>>> files that need extraction in small batches, and processes those one
>>> by one before querying the next batch. 9 GB shouts memory leak. We've
>>> had other memory leak situations in tracker-extract, and the culprit
>>> is most often in the various libraries we're using in our extract
>>> modules; if many files end up triggering that module (and the leaky
>>> code path in the specific library), the effect will accumulate over
>>> time.
>>>
>>> The downside of this situation is that most often we Tracker
>>> developers can't reproduce it unless we have a file that triggers the
>>> leak, so that we can fix it or channel it to the appropriate maintainers.
>>> So it would be great if you could provide valgrind logs; just run:
>>>
>>> valgrind --leak-check=full --log-file=valgrind-log --num-callers=30 \
>>>   /path/to/built/tracker-extract
>>>
>>> Hit Ctrl-C when enough time has passed, and send back the valgrind-log
>>> file. The same applies to tracker-miner-fs.
>>>
>>>> My tracker meta.db file is about 13 GB right now, though still growing. I
>>>> suspect it's close to fully indexed, though.
>>>
>>> This is also suspicious; you again need either a hideous amount of
>>> files to have meta.db grow that large, or an equally hideous amount of
>>> plain text content that gets indexed. Out of curiosity, how many
>>> directories/files does that partition contain? Is the content
>>> primarily video/documents/etc.?
>>> Cheers,
>>> Carlos
>>
>> _______________________________________________
>> tracker-list mailing list
>> tracker-list@gnome.org
>> https://mail.gnome.org/mailman/listinfo/tracker-list
>
> <0001-Fix-small-memory-leak.patch>
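(Carlos's question about how many directories and files the partition contains can be answered with a quick sketch like the one below. The `/main-storage` mount point is an assumption taken from the environment setup earlier in the thread; the `.zfs` prune mirrors Joe's snapshot exclusion, and `-xdev` keeps find on one filesystem. Here `VOL` defaults to the current directory so the snippet runs anywhere.)

```shell
# Count directories and regular files on the indexed volume, pruning
# ZFS snapshot directories (.zfs) and staying on one filesystem.
# On Joe's server VOL would be /main-storage (assumed from the thread).
VOL="${VOL:-.}"
dirs=$(find "$VOL" -xdev -name .zfs -prune -o -type d -print | wc -l)
files=$(find "$VOL" -xdev -name .zfs -prune -o -type f -print | wc -l)
echo "directories: $dirs, files: $files"
```

(Note that the directory count includes the top-level volume itself; single-digit millions of entries here would make the 4.5 GB miner figure less surprising, per Carlos's estimate above.)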