Hey :),
On Fri, Feb 24, 2017 at 11:19 AM, Debarshi Ray <[email protected]> wrote:
> Hey,
>
> Every now and then, there is a bug or regression in one of the miners
> which leads to broken metadata being inserted into the database. For
> example, here is a low impact one:
> https://bugzilla.gnome.org/show_bug.cgi?id=767472#c48
>
> However, the breakage can be more serious and visibly impact
> applications. For example, this one:
> https://bugzilla.gnome.org/show_bug.cgi?id=776723
>
> It broke various fields in gnome-photos' properties dialog, and due to
> the wrong nfo:orientation values images were no longer correctly
> oriented.
>
> Bugs happen. Such is life. What is the recommended way to deal with
> these situations? So far I have been telling users to hard reset their
> database and restart the miners using the command line. I am afraid
> that isn't an elegant solution.
tracker reset --file is friendlier :), but I do agree. We already keep
several files in ~/.cache/tracker to ensure that the database itself
is up-to-date; the easy way would be to add a fuck-up counter for
miner-fs (and thus tracker-extract) that forces a reindex, doing the
same maintenance for the database content.
Ideally, this would have better granularity: if a bug only affects
image files, we shouldn't need to reindex everything from scratch. I
wonder if we could apply the same approach as we have for the FTS
tokenizer: keep the most recent commit ID affecting it and check it
against a file; whenever it changes in the user setup, an update is
due.
However, the files to track in git are rather varied and spread out:
there are the tracker-extract-*.c files, there's tracker-resource.c,
there's e.g. tracker-xmp.c wherever it applies... I guess that can get
under control soon.
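
To make the stamp idea a bit more concrete, here is a rough sketch in
plain C. It is not existing Tracker code: the function names, the
one-line file format and the stamp path are all made up for
illustration, and a real version would keep one stamp per group of
extractor sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the stamp stored at `path` is
 * missing or differs from `current` (a reindex is due), 0 if it
 * matches. Stamps longer than the buffer are treated as stale. */
int stamp_needs_update(const char *path, const char *current)
{
    char stored[128] = "";
    FILE *f;

    assert(path != NULL && current != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return 1; /* no stamp yet: treat the data as stale */

    if (fgets(stored, sizeof stored, f) == NULL)
        stored[0] = '\0';
    fclose(f);

    stored[strcspn(stored, "\r\n")] = '\0'; /* drop trailing newline */
    return strcmp(stored, current) != 0;
}

/* Hypothetical helper: records `current` as the new stamp once the
 * affected files have been reindexed. Returns 0 on success, -1 on
 * error. */
int stamp_write(const char *path, const char *current)
{
    FILE *f;

    assert(path != NULL && current != NULL);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%s\n", current) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The miner would call stamp_needs_update() at startup with the commit
ID baked in at build time, reindex only the affected file types if it
returns 1, and call stamp_write() afterwards.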
>
> Do the Tracker miners version the metadata that they insert into the
> database? Or, is it possible to programmatically discard metadata
> coming from a certain miner and force a reindex?
There's no versioning... For dropping all of a miner's data, I wish we
supported the DROP GRAPH syntax; all filesystem miners in tracker
share the same TRACKER_OWN_GRAPH_URN define.
This could however be open-coded as:
"DELETE WHERE { GRAPH <" TRACKER_OWN_GRAPH_URN "> { ?u a rdfs:Resource }}"
That should leave a clean slate for miners, though maybe a bit too clean :).
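
For a miner that keeps its data in its own graph, the same query can
be built at runtime. The following is only a sketch in plain C: the
function name and the EXAMPLE_GRAPH_URN value are placeholders I made
up, not anything Tracker defines, and I'm assuming the URN is a plain
IRI without a '>' in it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder value for illustration only; substitute the graph URN
 * that the miner actually inserts its data into. */
#define EXAMPLE_GRAPH_URN "urn:example:miner-graph"

/* Hypothetical helper: writes a "delete everything in this graph"
 * update into `buf`. Returns 0 on success, -1 if the URN contains
 * '>' (it would break out of the <...> IRI) or if `buf` is too
 * small to hold the whole query. */
int build_graph_delete_query(char *buf, size_t len, const char *graph_urn)
{
    int n;

    assert(buf != NULL && graph_urn != NULL);

    if (strchr(graph_urn, '>') != NULL)
        return -1;

    n = snprintf(buf, len,
                 "DELETE WHERE { GRAPH <%s> { ?u a rdfs:Resource } }",
                 graph_urn);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The resulting string would then be handed to whatever SPARQL update
call the miner already uses, e.g. tracker_sparql_connection_update().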
>
> In gnome-online-miners (those are the out-of-tree miners used by
> gnome-documents/photos to index online accounts advertised by
> gnome-online-accounts), we handle this by having each miner tag their
> insertions with nie:version (grep for 'version' in
> src/gom-miner.c). Whenever a bug that could have inserted broken
> metadata is fixed, we bump the miner version. When the user installs
> the updated miner, it will automatically purge the old metadata and
> re-index.
>
> So, any suggestions? Thoughts?
For data maintenance, I suggest you look into inserting g-o-m data
into its own graph; version management is more open to discussion. The
approach you picked does seem the nepomuk-y way, although I'm not
sure how strong an argument that is nowadays :).
Cheers,
Carlos