Hi,

The main reason for not modifying the original extractor is that I want to keep it as a fallback in case the new extractor fails on an unexpected file structure. png-faster tries to skip to the end of the file by estimating, from the file size and the IDAT chunk size, where the metadata at the end of the file is located, so I expect it to fail more often than the original. Since tracker-extract handles these failures gracefully, this is not a problem, however.
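To illustrate the general idea of seeking instead of reading, here is a minimal Python sketch. It is not the code from the patch, and it does not implement the file-size/IDAT estimate described above; it only shows the simpler variant of walking the chunk headers and seeking over every chunk body that is not textual metadata. It relies on the standard PNG chunk layout (4-byte big-endian length, 4-byte type, data, 4-byte CRC). The function name read_text_chunks is made up for this example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_CHUNKS = {b"tEXt", b"zTXt", b"iTXt"}  # textual metadata chunk types


def read_text_chunks(stream):
    """Return (type, data) pairs for textual chunks in a seekable PNG stream.

    Hypothetical helper: only textual chunks are read; all other chunk
    bodies (including IDAT) are skipped with seek().
    """
    if stream.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of data or truncated file
        length, chunk_type = struct.unpack(">I4s", header)
        if chunk_type in TEXT_CHUNKS:
            found.append((chunk_type, stream.read(length)))
            stream.seek(4, 1)  # skip the CRC
        else:
            stream.seek(length + 4, 1)  # skip data and CRC unread
        if chunk_type == b"IEND":
            break
    return found
```

It could be used as `with open("image.png", "rb") as f: chunks = read_text_chunks(f)`, where image.png is a placeholder path. Because this version still visits every chunk header, it cannot miss metadata the way an estimated jump can, but it needs one seek per chunk.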

The best way I can see to get similar functionality into the existing extractor would be to modify libpng to allow skipping to the end of the file (right now there is a comment in the existing PNG extractor noting that this functionality is missing from the library). Since reading the PNG format is relatively simple, I opted to put this functionality in the extractor rather than first patching libpng (I am not sure how much work that would be, either).

What are your thoughts on keeping png-faster as a separate, optional extractor module which can be enabled when extraction speed is of primary concern?

On 27 June 2013 19:06, Martyn Russell <mar...@lanedo.com> wrote:

> On 27/06/13 16:08, Jonatan Pålsson wrote:
>
>> On 27 June 2013 16:48, Aleksander Morgado <aleksan...@lanedo.com> wrote:
>>
>> On 27/06/13 16:26, Jonatan Pålsson wrote:
>> > To start with, I would like to submit a patch containing a new extractor
>> > for PNG files, which is faster than the original.
>> >
>> > The reason behind the speed increase with this extractor compared to the
>> > old extractor is that the new extractor seek()s out the metadata fields
>> > in the PNG, rather than traverse the entire file to find them, as the
>> > old extractor did (using libpng).
>>
>> Could you share some numbers on which is the actual speed improvement?
>> E.g. extracting 1000 random PNGs before took Xs, now it takes Ys.
>>
>> Certainly!
>>
>> I'm running Tracker on a PandaBoard Rev A4. 1000 replicated PNGs were
>> used, I could make the replicated file available, there is nothing
>> special about it.
>>
>> I used the following command to measure the running times:
>>
>> # For png-faster
>> tracker-control -r ; echo 3 > /proc/sys/vm/drop_caches ; sync ; sync ;
>> time /usr/lib/tracker/tracker-miner-fs -v 0 --no-daemon
>>
>> # For the original PNG extractor
>> tracker-control -r ; /usr/lib/tracker/tracker-extract -m png
>> echo 3 > /proc/sys/vm/drop_caches ; sync ; sync ; time
>> /usr/lib/tracker/tracker-miner-fs -v 0 --no-daemon
>>
>> And here are the results:
>> # png-faster
>> real 0m14.804s
>> user 0m4.945s
>> sys 0m1.313s
>>
>> # original
>> real 1m33.274s
>> user 0m5.250s
>> sys 0m1.820s
>>
>
> That's quite some difference!
>
> Thanks for posting some numbers. Important! :)
>
> My first thought is, why did you create a new extractor instead of improve
> the original one?
>
> The patch link you gave is good, but I would love to see a diff from our
> actual extractor right now to see how easily we could merge the changes
> into that one.
>
> --
> Regards,
> Martyn
>
> Founder and CEO of Lanedo GmbH.
>

--
Regards,
Jonatan Pålsson
Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list