Hi,

The main reason for not modifying the original extractor is that I want to
keep it as a fallback in case the new extractor fails on an unexpected file
structure. png-faster tries to skip to the end of the file by estimating,
from the file size and the IDAT chunk size, where the trailing metadata is
located, so I expect it to fail more often than the original. Since
tracker-extract handles these failures gracefully, however, this is not a
problem.
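
To illustrate the general idea (this is not the code from the patch, only a
minimal Python sketch with hypothetical names such as read_text_chunks and
make_chunk): every PNG chunk starts with a 4-byte big-endian length and a
4-byte type, so a reader can seek past the IDAT data instead of reading it.
Unlike png-faster, the sketch does not estimate a position from the file
size; it simply hops from chunk header to chunk header.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(stream):
    """Collect tEXt key/value pairs, seeking past all other chunk data."""
    if stream.read(8) != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    found = {}
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # truncated file: return whatever was found so far
        length, chunk_type = struct.unpack(">I4s", header)
        if chunk_type == b"IEND":
            break
        if chunk_type == b"tEXt":
            # tEXt payload is "keyword NUL text", both Latin-1 encoded.
            keyword, _, text = stream.read(length).partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
            stream.seek(4, io.SEEK_CUR)  # skip the CRC
        else:
            # Skip the chunk data and its CRC without reading them.
            stream.seek(length + 4, io.SEEK_CUR)
    return found


def make_chunk(chunk_type, data):
    """Hypothetical helper: serialize one chunk (length, type, data, CRC)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


# Structurally valid chunk layout; the image data itself is dummy zeros.
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", bytes(13))
    + make_chunk(b"IDAT", bytes(1000))
    + make_chunk(b"tEXt", b"Title\x00Hello")
    + make_chunk(b"IEND", b"")
)

print(read_text_chunks(io.BytesIO(sample)))  # {'Title': 'Hello'}
```

The sketch neither verifies CRCs nor handles iTXt/zTXt chunks, and a real
extractor would need to treat a failed read as a signal to fall back to the
libpng-based extractor.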

The best way I can see to get similar functionality into the existing
extractor would be to modify libpng to allow skipping to the end of the
file (there is currently a comment in the existing PNG extractor noting
that this functionality is missing from the library). Since reading the
PNG format is relatively simple, I opted to put this functionality in the
extractor rather than patching libpng first (I am also not sure how much
work that would be).

What are your thoughts on keeping png-faster as a separate, optional
extractor module which can be enabled when extraction speed is of primary
concern?


On 27 June 2013 19:06, Martyn Russell <mar...@lanedo.com> wrote:

> On 27/06/13 16:08, Jonatan Pålsson wrote:
>
>> On 27 June 2013 16:48, Aleksander Morgado <aleksan...@lanedo.com> wrote:
>>
>>     On 27/06/13 16:26, Jonatan Pålsson wrote:
>>      > To start with, I would like to submit a patch containing a new
>>      > extractor for PNG files, which is faster than the original.
>>      >
>>      > The reason behind the speed increase with this extractor compared
>>      > to the old extractor is that the new extractor seek()s out the
>>      > metadata fields in the PNG, rather than traverse the entire file
>>      > to find them, as the old extractor did (using libpng).
>>
>>     Could you share some numbers on which is the actual speed improvement?
>>     E.g. extracting 1000 random PNGs before took Xs, now it takes Ys.
>>
>> Certainly!
>>
>> I'm running Tracker on a PandaBoard Rev A4. 1000 replicated PNGs were
>> used, I could make the replicated file available, there is nothing
>> special about it.
>> I used the following command to measure the running times:
>>
>> # For png-faster
>> tracker-control -r ; echo 3 > /proc/sys/vm/drop_caches ; sync ; sync ;
>> time /usr/lib/tracker/tracker-miner-fs -v 0 --no-daemon
>>
>> # For the original PNG extractor
>> tracker-control -r ; /usr/lib/tracker/tracker-extract -m png
>> echo 3 > /proc/sys/vm/drop_caches ; sync ; sync ; time
>> /usr/lib/tracker/tracker-miner-fs -v 0 --no-daemon
>>
>> And here are the results:
>> # png-faster
>> real    0m14.804s
>> user    0m4.945s
>> sys     0m1.313s
>>
>> # original
>> real    1m33.274s
>> user    0m5.250s
>> sys     0m1.820s
>>
>
> That's quite some difference!
>
> Thanks for posting some numbers. Important! :)
>
> My first thought is, why did you create a new extractor instead of improve
> the original one?
>
> The patch link you gave is good, but I would love to see a diff from our
> actual extractor right now to see how easily we could merge the changes
> into that one.
>
> --
> Regards,
> Martyn
>
> Founder and CEO of Lanedo GmbH.
>



-- 
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden
_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
https://mail.gnome.org/mailman/listinfo/tracker-list
