On Tue, 21.10.14 14:48, Martyn Russell (mar...@lanedo.com) wrote:

> >Hmm. I would always have assumed that tracker is strictly IO-bound,
> >not CPU-bound, hence 100% sounds suspicious to me. What precisely is
> >tracker doing there that it needs to crunch that much data? Just
> >extracting some meta-data from a set of files doesn't sound like
> >something CPU intensive.
>
> Tracker does quite a number of things that could require some level of
> processing power.
>
> Just some off the top of my head:
>
> - Parsing words in any language in large quantities of text
> - Unaccenting words
> - Unicode normalization
> - Case folding
> - Stemming
>
> and more.
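To give a feel for the per-word work in the list above, here is a minimal Python sketch of the unaccent, normalize and case-fold steps using only the standard `unicodedata` module. It is an illustration under assumptions, not Tracker's code (Tracker is written in C): stemming is left out because the standard library has no stemmer, and the word splitting is a naive regular expression standing in for a real language-aware word breaker. `unaccent` and `index_terms` are hypothetical names.

```python
import re
import unicodedata

# Naive word splitter (an assumption): runs of Unicode word characters.
# A real indexer would use a language-aware word-break algorithm.
_WORD_RE = re.compile(r"\w+")


def unaccent(word: str) -> str:
    """Decompose (NFKD) and drop combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text: str) -> list:
    """Split text into words, then unaccent, NFC-normalize and case-fold each."""
    terms = []
    for word in _WORD_RE.findall(text):
        word = unaccent(word)
        word = unicodedata.normalize("NFC", word)
        word = word.casefold()  # full case folding: 'ß' -> 'ss'
        terms.append(word)
    return terms


print(index_terms("Crème Brûlée STRASSE Straße"))
# → ['creme', 'brulee', 'strasse', 'strasse']
```

Every step walks each word character by character, so the cost grows linearly with the amount of text, which is why it only becomes noticeable on very large inputs.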
That all doesn't sound too excessive in CPU terms, unless you index whole
encyclopedias... But in that case I'd really time-bound things: stop
processing each file after 500ms or so of time spent on it.

> It also depends on which process or binary you're talking about, but
> extractors (like the one using poppler for PDFs) can easily require a LOT of
> processing power to handle complex PDFs. We only care about the text
> usually, but that's not always under our control unless we write our own
> extractor.

Well, it certainly sounds like a great chance to work together with the
poppler folks to figure out a way to hand you only the text. But either
way, given the variable quality of the extractors, it really sounds as if
you want to run them individually out-of-process, kill each one after a
fixed time limit of 500ms and continue with the next one.

> >Well, looking at that bug it appears to me that this is caused because
> >you try to use inotify for something it shouldn't be used for: to
> >recursively watch an entire directory subtree. If you fake recursive
> >fs watching by recursively adding all subdirs to the inotify watch
> >then of course you eat a lot of CPU.
>
> In our experience, watching a tree and seeing changes in that tree through
> inotify is not the expensive part (unless your currency is FDs). It does
> depend on what operations are taking place.

Well, the bug report you linked suggests it is an inotify add loop that is
the culprit here...

> >The appropriate fix for this is to not make use of inotify this way,
> >which might mean fixing the kernel to provide recursive subscription
> >to fs events for unprivileged processes. Sorry if that's
> >disappointing, but no amount of cgroups can work around that.
>
> Not at all. Actually, I really would like something like that in the kernel
> and user space has been asking for a while :)

Well, I think I said this before: it's quite possible that the Linux
kernel is not quite ready for something like Tracker.
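The suggestion earlier in this mail, to run each extractor out-of-process and kill it once it overruns a fixed 500ms budget, can be sketched in Python with `subprocess.run(..., timeout=...)`, which kills the child and raises `TimeoutExpired` when the budget expires. `extract_all` and `fake_extractor` are hypothetical names; the fake extractor is a throwaway Python one-liner standing in for a real poppler-based extractor, whose command line is not given in the thread.

```python
import subprocess
import sys

# Per-file budget in seconds, as suggested in the mail (500ms).
EXTRACT_TIMEOUT_S = 0.5


def extract_all(files, make_cmd, timeout=EXTRACT_TIMEOUT_S):
    """Run one extractor process per file; kill it if it overruns, move on.

    make_cmd(path) returns the argv list for a (hypothetical) extractor.
    Returns {path: extracted text, or None if it timed out or failed}.
    """
    results = {}
    for path in files:
        try:
            proc = subprocess.run(
                make_cmd(path),
                capture_output=True,
                text=True,
                timeout=timeout,  # on expiry run() kills the child, then raises
            )
        except subprocess.TimeoutExpired:
            results[path] = None  # too slow: give up on this file, keep going
            continue
        results[path] = proc.stdout if proc.returncode == 0 else None
    return results


def fake_extractor(path):
    """Stand-in extractor: files with 'slow' in the name sleep past the budget."""
    code = (
        "import sys, time; p = sys.argv[1]; "
        "time.sleep(5 if 'slow' in p else 0); print('text of ' + p)"
    )
    return [sys.executable, "-c", code, path]


print(extract_all(["a.pdf", "slow.pdf", "b.pdf"], fake_extractor))
```

Because the timeout is wall-clock time, process startup counts against the budget; reusing one worker process across files would be the obvious refinement, at the cost of restarting it whenever it has to be killed.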
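A minimal sketch of the "fake recursive watching" pattern criticized in this thread, assuming Linux and glibc: it calls `inotify_init1()` and `inotify_add_watch()` through `ctypes` (Python's standard library has no inotify binding), with the mask constants copied from `<sys/inotify.h>`. It is not Tracker's code, and `watch_tree` is a hypothetical helper. The point is the loop: one `inotify_add_watch()` call, and one kernel watch, per directory, so the setup cost is O(n) in the number of directories.

```python
import ctypes
import os
import tempfile

# Event mask bits from <sys/inotify.h> (Linux).
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_ONLYDIR = 0x01000000
IN_CLOEXEC = os.O_CLOEXEC  # IN_CLOEXEC is defined as O_CLOEXEC

_libc = ctypes.CDLL(None, use_errno=True)  # assumes glibc on Linux
_libc.inotify_init1.argtypes = [ctypes.c_int]
_libc.inotify_init1.restype = ctypes.c_int
_libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]
_libc.inotify_add_watch.restype = ctypes.c_int


def watch_tree(root):
    """Fake recursive watching: one inotify_add_watch() per directory.

    Returns (inotify fd, number of watches added). The number of syscalls
    and kernel watches grows linearly with the number of directories.
    """
    fd = _libc.inotify_init1(IN_CLOEXEC)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    mask = IN_MODIFY | IN_CREATE | IN_DELETE | IN_ONLYDIR
    watches = 0
    for dirpath, _dirnames, _filenames in os.walk(root):
        wd = _libc.inotify_add_watch(fd, os.fsencode(dirpath), mask)
        if wd < 0:
            err = ctypes.get_errno()
            os.close(fd)
            raise OSError(err, os.strerror(err), dirpath)
        watches += 1
    return fd, watches


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    os.makedirs(os.path.join(root, "c"))
    fd, n = watch_tree(root)
    os.close(fd)
    print(n)  # 4: root, a, a/b, c
```

Besides the cost, the pattern is racy: a directory created while the walk is in progress can be missed, and every watch counts against the per-user limit in `/proc/sys/fs/inotify/max_user_watches`.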
Apparently nobody is working to make the kernel ready for Tracker,
though. So either you have to do the work yourself or find somebody to
fix the missing bits (which would mean fixing fanotify for unprivileged
clients).

> >Don't try to work around limitations of kernel APIs by implementing
> >inherently non-scalable algorithms in userspace. I mean, you
> >implemented something that scales O(n), with n the number of
> >dirs. That's what you need to fix, there's no way around that. Just
> >looking for magic wands in cgroups and scheduling facilities to make
> >an algorithm that fundamentally scales badly acceptable is not going
> >to work.
>
> OK.
>
> Could I ask one more favour from you Lennart, could you possibly reply on
> the bug report where your fellow RedHat-ers :) suggest using cgroups?
>
> https://bugzilla.gnome.org/show_bug.cgi?id=737663#c6

Well, just link this thread there really, I think that should be enough...

> >This is misleading, as RLIMIT_AS and RLIMIT_DATA limit address space,
> >not actual memory usage. In particular, limiting RLIMIT_AS like this
> >is actively wrong as this just breaks mmap(). I mean, this is the
> >beauty of modern memory management: you can set up mappings, and they
> >are relatively cheap and only become backed by real RAM when you
> >access them. Really, you should not limit RLIMIT_AS, it's not for what
> >you think it is.
>
> When would you use this functionality? I struggle to see a use case.

Well, RLIMIT_AS is certainly not useful for your purpose, that's true. As
it turns out, both RLIMIT_DATA and RLIMIT_RSS are NOPs these days. That
basically means the memory cgroup controller is the only technology that
would come close, but there you'd still have the problem of finding a
value to initialize its limit to (also, as mentioned, it isn't open to
unprivileged processes just now).
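To make the RLIMIT_AS point concrete, here is a minimal Linux sketch (the helper `can_map` is hypothetical and the sizes are arbitrary): a forked child optionally lowers RLIMIT_AS and then asks for an anonymous mapping it never touches. Without the cap the mapping normally succeeds even though it uses no RAM; with a cap below the mapping size, mmap() fails with ENOMEM, which is the breakage described above. Forking keeps the limit from affecting the calling process.

```python
import mmap
import os
import resource

MiB = 1024 * 1024


def can_map(map_bytes, as_limit=None):
    """In a forked child, optionally cap RLIMIT_AS, then try an anonymous mmap.

    The mapping is never touched, so it costs address space but no RAM.
    Returns True if mmap() succeeded in the child.
    """
    pid = os.fork()
    if pid == 0:  # child process
        code = 2  # 2 = unexpected failure (e.g. setrlimit refused)
        try:
            if as_limit is not None:
                resource.setrlimit(resource.RLIMIT_AS, (as_limit, as_limit))
            try:
                m = mmap.mmap(-1, map_bytes,
                              flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
                m.close()
                code = 0
            except (OSError, MemoryError):
                code = 1  # the cap refused a mapping we never even touched
        finally:
            os._exit(code)  # never fall back into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


print(can_map(256 * MiB))                      # no cap on address space
print(can_map(256 * MiB, as_limit=128 * MiB))  # False: cap below mapping size
```

Note that the capped child fails even though it would never have used a single page of the mapping; the limit only counts address space.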
Lennart

-- 
Lennart Poettering, Red Hat
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list