Extra note
This one is in the patch. I'm posting it a second time to catch
Garnacho's full attention:
Index: src/tracker-indexer/tracker-module-metadata.c
===================================================================
--- src/tracker-indexer/tracker-module-metadata.c (revision 2683)
+++ src/tracker-indexer/tracker-module-metadata.c (working copy)
@@ -188,13 +188,15 @@
list = g_list_prepend (list, value);
data = list;
} else {
- /* FIXME: warn if data already exists */
+ data = g_hash_table_lookup (metadata->table, field);
+ g_free (data);
## If you don't do this then inserting the same field twice will
## leak the first
data = value;
}
- g_hash_table_insert (metadata->table,
- g_object_ref (field),
- data);
+ g_hash_table_replace (metadata->table,
+ g_object_ref (field),
+ data);
+
## If you don't use _replace here then the GHashTable wont exec the
## GDestroyNotify. This means that you always leak the reference
## per each time that you add a string to a list-element.
return TRUE;
}
On Tue, 2008-12-09 at 18:17 +0100, Philip Van Hoof wrote:
> Hi there,
>
> This is a new version of the patch.
>
> Garnacho: You want to review the part where I'm changing your precious
> src/tracker-indexer/tracker-module-metadata.c at line 1016 in this diff.
>
> Ivan: Continue committing in the "turtle" branch. I'll cope by merging
> your changes into the next patch and using some SVN command-line tricks.
>
> Martyn: Re-review ;-). I commented inline (see lower)
>
> All you guys: Have fun
>
>
> > 1. Please remember in the next patch to svn add the files which are
> > new so people can compile with the applied patch.
>
> Done this time
>
> > 2. Use AM_CONDITIONAL in configure.ac:
> >
> > +if test x$have_raptor == "xyes"; then
> > + AC_DEFINE(HAVE_RAPTOR, 1, [Raptor RDF parsers])
> > +fi
> >
> > Everywhere else does:
> >
> > AM_CONDITIONAL(HAVE_RAPTOR, test "$have_raptor" = "yes")
>
> AM_CONDITIONAL is not the same as getting the define in config.h.
> There's also a discussion started about making libraptor mandatory.
>
> The conclusion of that discussion will influence this configure.ac
> magic.
>
> I left this as is (using AC_DEFINE)
>
> > 3. Do we require a particular version of raptor?
>
> Not at this moment. Also this is affected by the discussion about
> libraptor taking place. Not adapting the patch for this yet.
>
> > 4. Do we need to check S.Enabled here and/or the Volumes table at all?
> > Also, isn't it more logical to use "M" instead of "F" for the
> > metadata
> >
> > [CUT SQLite stored statements]
>
> This is under Ivan's supervision. This new patch has some updates about
> the things that I will in this reply annotate as Ivan's responsibility.
>
> Not everything being done by Ivan is already in this patch yet, though.
>
> Ivan is continuing with fixing the issues in the "turtle" branch. I will
> continue merging my merged checkout of trunk with his improvements.
>
> So far these three are the merge commands executed:
>
> svn merge -r 2555:2615 ../branches/turtle/
> svn merge -r 2615:2680 ../branches/turtle/
> svn merge -r 2680:2682 ../branches/turtle
>
> About this specific issue, I think I saw Ivan commit something about
> these queries in the turtle branch. I also noticed some differences in
> the original merge patch, and this new one, about these queries.
>
> I'll let Ivan reply if he wants to about this.
>
> > 5. Is the notation "(out)" the same notation used in gtk-doc now?
> >
> > + * @mount_mount: (out): if @path is on a removable device, the mount
> > point will
> > + * be filled in here
>
> I have removed the annotations for now. The GObject-Introspection's
> annotation format has not yet been finalized. I have added a humanly
> explained comment about how to deal with @mount_point.
>
> > 6. This needs fixing in tracker-hal.c, also it doesn't say if it
> > returns
> > TRUE or FALSE, usually that's a good idea:
>
> > + * Returns Whether or not @path is on a known removable device
> > + *
> > + * Returns: Whether or not @path is on a known removable devic
>
> Fixed in the "Returns" annotation. Same here, I am not yet using the
> GObject-Introspection annotation syntax. I think it would be better if
> we make one huge patch adding annotations everywhere, than to do it
> sporadically API per API.
>
> > 6. This doesn't need to be cast:
> >
> > + udi = (const gchar*) key;
>
> Fixed
>
> >
> > 7. Indentation here is wrong and we could use g_strcmp0() instead so
> > we
> > don't have to check mp for NULL too:
> >
> > + if (mp && strcmp (mp, path) != 0) {
> > + if (g_strrstr (path, mp)) {
> >
>
> Done
>
>
> > 8. This looks like a memory leak, volume is not freed:
> >
> > if (available)
> > *available = libhal_volume_is_mounted
> > (volume);
> > break;
>
> It's not. Volume is freed. Double checked this. Maybe the original patch
> didn't have it. This now one, in any case, has it.
>
> + libhal_volume_free (volume);
>
> >
> > 10. The changes to tracker_data_metadata_insert() actually look like
> > they fix a potential leak in TRUNK, i.e. inserting the same thing
> > twice. Nice catch.
>
> Sure. Carefully review all those, though. I went over all users of the
> APIs to make sure they are still all validly using it. But feel free to
> add some extra eyes to that process.
>
> > 11. This looks like it would cause a crasher now that we don't own all
> > data sent to libtracker-data/tracker-data-metadata.c, I don't think we
> > should be freeing the list there or its contents:
> >
> > - g_return_if_fail (TRACKER_IS_FIELD (field));
> > + if (!field) {
> > + g_warning ("Field name '%s' has isn't described in the
> > ontology",
> > field_name);
> > + g_list_foreach (list, (GFunc) g_free, NULL);
> > + g_list_free (list);
> > + return;
> > + }
> > +
>
> This was indeed different in the Turtle branch and has indeed been
> refactored in Trunk. I have merge-rewritten these functions to act the
> way they are expected to act as Trunk got refactored to.
>
> > 12. Same as #11 for tracker_data_metadata_append_to_list() where we
> > free "value".
>
> Agreed, fixed too.
>
>
> > 13. Just wondering about the name here, shouldn't it be
> > GetMetadataBackup? I mean, the Turtle file is just a metadata backup
> > right, I don't quite know how the "Embedded" part comes into it.
> >
> > + result_set = tracker_data_manager_exec_proc (iface,
> > +
> > "GetEmbeddedMetadataBackup",
> > + NULL);
>
> This is under supervision of Ivan. I don't think it is fixed in this new
> version of the patch yet. I'll let Ivan reply and handle this issue.
>
> > 14. Please put _ALL_ structures and typedefs at the top of the file:
> >
> > +/* TODO: URI branch path -> uri */
> > +
> > +typedef struct {
> > + TrackerService *service;
> > + guint32 iid_value;
> > + TrackerLanguage *language;
> > + TrackerConfig *config;
> > +} ForeachInMetadataInfo;
> > +
>
> Fixed
>
> > 15. Please put all static variables at the top of the file:
> >
> > + static TrackerConfig *config = NULL;
> > + static TrackerLanguage *language = NULL;
>
> Fixed
>
> >
> > 16. When linning up variables don't add extra spaces:
> >
> > +void
> > +tracker_data_delete_service (const gchar *path,
> > + const gchar *rdf_type)
> > +{
> > + TrackerService *service;
> > + const gchar *service_type;
> > + guint32 service_id;
> >
> Fixed
>
> > 17. I wonder if we should recursively delete here even if there is no
> > top level service ID found - in case there is ever something below it
> > left in the database. Also remember, we changed this at the code camp
> > in Helsinki, I am still going through the patch to see if this is a
> > conflict at all:
> >
> > + if (service_id != 0) {
> > + tracker_data_update_delete_service (service, service_id);
> > + tracker_data_update_delete_service_recursively (service,
> > (gchar *) path);
> > + tracker_data_update_delete_all_metadata (service,
> > service_id);
> > + }
>
> Carlos replied that this is logically correct
>
>
> > 18. Can we format these arguments according to the coding style
> > please:
> >
> > +set_metadata (TrackerField *field, gpointer value,
> > ForeachInMetadataInfo *info)
>
> Fixed
>
> > 19. Every time I look at this, I think it is crack, that's why I added
> > the comment. I know we do this in the indexer too, but I wonder if we
> > should change this to something sensible with a reason too:
> >
> > + /* Throttle indexer, value 9 is from older code, why 9? */
> > + throttle = tracker_config_get_throttle (info->config);
> > + if (throttle > 9) {
> > + tracker_throttle (info->config, throttle * 100);
> > + }
>
> Sure, if the merge happens before the fix then thw two locations.
>
> I think we should keep the logic in this patch because what is
> functionally being added is basically the exact same. Except that it's
> coming from the Turtle file instead of from the FS scanner.
>
> I try to keep patches about functionality A separated from patches that
> fixes obscure weird hard coded B things.
>
> If trunk fixes this before this merge takes place, then let me know and
> I'll adapt this one to the fixed situation too. Of course.
>
> If it gets fixed after merge, we just fix the two locations. It's not
> that this is a flood of code here ;-) (just search for "why 9?").
>
> > 20. I would have used for() instead of while() for clarity here:
> >
> > + GList *list;
> > +
> > + list = value;
> > +
> > + while (list) {
> > + set_metadata (field, list->data, user_data);
> > + list = list->next;
> > + }
> >
> > i.e.
> >
> > for (list = value; list; list = list->next)
> > set_metadata (field, list->data, user_data);
>
> Fixed
>
> >
> > 21. I think the API should be a bit more uniform here, i.e.
> > tracker_data_turtle_import_{start|stop}:
> >
> > + void tracker_data_start_turtle_import (void);
> > + void tracker_data_stop_turtle_import (void);
>
>
> Fixed
>
> >
> > 22. Instead of duplicating the config and language modules, can we not
> > just call an initialize and termination function to do all this and
> > wrap all static variables in a GStaticPrivate? To save memory and
> > also if this is done more than once per invocation of the application
> > it really makes sense. Right now I think TrackerConfig and
> > TrackerLanguage are only ever created once per application. I would
> > like to keep it that way if possible.
> >
> > +void
> > +tracker_data_start_turtle_import (void)
> > +{
> > + if (!config)
> > + config = tracker_config_new ();
> > + if (!language)
> > + language = tracker_language_new (config);
> > +}
>
> Then please make TrackerConfig and TrackerLanguage singleton classes.
>
> > 23. All public functions we need variable checking, e.g.
> >
> > tracker_data_delete_service()
> > tracker_data_replace_service()
> >
> > need (for example):
> >
> > g_return_if_fail (path != NULL);
>
> Fixed
>
> > 24. We should use g_debug here not g_print, and the preprocessor
> > comments should be on column 0 at the start of the line.
> >
> > + #ifdef TURTLE_DEBUG
> > + g_print ("Q ERROR: %s\n", error->message);
> > + #endif /* TURTLE_DEBUG */
> > + g_error_free (error);
>
> Removed/Fixed/Adapted/Stuffed
>
> > 25. Indentation:
> >
> > + GValue id_value = { 0, };
> > + GValue is_value = { 0, };
> > + gint iid_value, iis_value;
> >
> > + guint32 id;
>
>
> Fixed
>
> > 26. API wise, what is the difference between
> >
> > tracker_data_delete_service()
> >
> > and
> >
> > tracker_data_update_delete_service()
> >
> > both look like they should do the same thing except one is the uniform
> > API which Jürg worked on. That should be clearer I think. The same
> > namespace should be used for the _replace_ variant too.
>
> The namespace has been fixed ('update' is added to it)
>
> The difference is that the new one removes using the path or (soon when
> the path->URI work is done) using the URI. The first removes using the
> service-id. This service-id is outside of the scope of the Turtle code,
> therefore is the functionality moved into the database specific pieces
> and is the Turtle code just calling it that way.
>
> >
> > 27. We need spaces in here:
> >
> > + info->config = config?g_object_ref (config):NULL;
> > + info->language = language?g_object_ref
> > (language):NULL;
>
> Fixed
>
>
> > 28. Bloody good catch guys :)
> >
> > - if (file_iface)
> > + if (file_iface) {
> > g_object_unref (file_iface);
> > - if (email_iface)
> > + file_iface = NULL;
> > + }
> > +
> > + if (email_iface) {
> > g_object_unref (email_iface);
> > - if (xesam_iface)
> > + email_iface = NULL;
> > + }
> > +
> > + if (xesam_iface) {
> > g_object_unref (xesam_iface);
> > + xesam_iface = NULL;
> > + }
>
> Np
>
>
> > 29. More spaces we don't need :)
> >
> > +void
> > +tracker_indexer_commit_transaction (TrackerIndexer *indexer)
> > +{
> > + stop_transaction (indexer);
> > + tracker_indexer_set_running (indexer, TRUE);
> > +
> > +}
> > +
> > +void
> > +tracker_indexer_open_transaction (TrackerIndexer *indexer)
> > +{
> > + tracker_indexer_set_running (indexer, FALSE);
> > + start_transaction (indexer);
> > +}
> > +
>
> Fixed
>
> >
> > 30. I would change tracker_indexer_commit_transaction() and
> > tracker_indexer_open_transaction() to be
> > tracker_indexer_transaction_*()
> > instead.
>
> Fixed
>
>
> > 31. There is no need to set mount_point to NULL after freeing it if it
> > is unused again.
>
> Fixed
>
>
> > 32. Shouldn't we do something in an "else" for the
> > tracker_hal_path_is_on_removable_device() for "path", like we do for
> > "other_path"?
> >
> > + if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
> > + path,
> > + &mount_point,
> > + NULL)) {
> > + if (tracker_hal_path_is_on_removable_device
> > (indexer->private->hal,
> > + other_path,
> > + NULL,
> > + NULL)) {
> > + tracker_removable_device_add_move (indexer,
> > + mount_point,
> > + path,
> > + other_path,
> > +
> > tracker_service_get_name (service));
> > + } else {
> > + tracker_removable_device_add_removal (indexer,
> > + mount_point,
> > + path,
> > +
> > tracker_service_get_name (service));
> > + }
> > + }
>
> Nope, the logic of that code is good
>
>
> > 33. APIs like this really confuse me:
> >
> > tracker_removable_device_add_move()
> > tracker_removable_device_add_removal()
> >
> > Can we come up with something a bit more intuitive?
> >
>
> Nope. It means "add a record about a move", "add a record about a
> remove".
>
> It's not the same as updating your database as the Turtle file is an
> always-append format. You append "that you removed something" and "that
> you moved something". You don't go into the file and remove something,
> nor do you go into the file and move something. You append that piece of
> info, in the form of an "event", to the file. That's the format.
>
> That's why it's called that way.
>
> > 34. We don't need to allocate memory here:
> >
> > + values[0] = g_strdup (triple->object);
> > + values[1] = NULL;
> > ...
> > +
> > + g_free (values[0]);
> >
>
> Fixed
>
> > 35. All request IDs are generated before anything else. This is for
> > debugging purposes and so we know how many requests we have _REALLY_
> > received regardless of if there was some parameter check failure.
> > This
> > needs to be before any tracker_dbus_async_return_if_fail() calls:
> >
> > + request_id = tracker_dbus_get_next_request_id ();
> >
>
> Fixed
>
> > 36. This normally comes up as a warning depending on compiler flags,
> > "(void)" is needed:
> >
> > +static gchar *
> > +get_turtle_userdata_backup_filename ()
>
> Me fixed for Ivan
>
>
> > 37. Checking private is NULL or not is pointless here because it is
> > created in main() and never freed - I just noticed :) so I should fix
> > that. Also, we should store this value in private, since we often get
> > "finished" more than once from the indexer and recreating the string
> > here would be unnecessary:
> >
> > + if (private) {
> > + return g_build_filename (private->user_data_dir,
> > + "tracker-userdata-backup.ttl",
> > + NULL);
> > + } else {
> > + g_critical ("Directories not initialized");
> > + return NULL;
> > + }
>
> Under supervision of Ivan.
>
> > 38. This should be fixed now (sending finished twice at the same time).
> > But also you should know that "finished" is sent whenever the indexer
> > finishes indexing ANY content from ANY file changes on the disk. So this
> > solution is broken anyway. It will need fixing:
> >
> > + if (counter >= 2) {
>
> Under supervision of Ivan.
>
> >
> > 39. I would rather we are a bit more descriptive for variable names,
> > which callback_id this is, isn't obvious without searching further in
> > the code, this is in main() for trackerd:
> >
> > + gulong callback_id;
>
> Under supervision of Ivan.
>
> > 40. More spacing issues:
> >
> > if (force_reindex) {
> > +
> > + gchar *turtle_file;
> > +
> > + turtle_file = get_turtle_userdata_backup_filename ();
>
> Under supervision of Ivan.
>
>
> > 41. Eeeek, what is this for? You are duplicating a lot of stuff here!
> > The tracker_db_manager_init() is called just after this. The
> > tracker_db_index_manager_init() is also called after this. The
> > tracker_data_manager_init() is also called after this. As I understand
> > it you just want to save the turtle file after an index right -
> > specifically the initial index? If so, then why not just check the
> > "initial index" state using tracker_status_get_is_first_time_index().
> > You could also use this in the "finished" callback.
> >
> > This code looks to me like it would break a lot of things and it
> > shouldn't get committed:
> >
> > if (force_reindex) {
> > +
> > + gchar *turtle_file;
> > +
> > + turtle_file = get_turtle_userdata_backup_filename ();
> > +
> > + g_message ("Saving metadata in %s", turtle_file);
> > +
> > + /* Init the DB stack */
> > + tracker_db_manager_init (0, &is_first_time_index, TRUE);
> > +
> > + tracker_db_index_manager_init (0,
> > +
> > tracker_config_get_min_bucket_count (config),
> > +
> > tracker_config_get_max_bucket_count (config));
> > +
> > + file_index = tracker_db_index_manager_get_index
> > (TRACKER_DB_INDEX_FILE);
> > + email_index = tracker_db_index_manager_get_index
> > (TRACKER_DB_INDEX_EMAIL);
> > +
> > + tracker_data_manager_init (config, language, file_index,
> > email_index);
> > +
> > + tracker_backup_save (turtle_file);
> > +
> > + /* Shutdown the DB stack */
> > + tracker_data_manager_shutdown ();
> > +
> > + tracker_db_index_manager_shutdown ();
> > + tracker_db_manager_shutdown ();
> > +
> > + g_free (turtle_file);
> > +
>
> Under supervision of Ivan.
>
>
> > 42. I don't think you want to do this ONLY on FORCE_REINDEX. Since that
> > is rarely the case. Although I admit I am not entirely sure what
> > org_freedesktop_Tracker_Indexer_restore_backup() is supposed to do from
> > the API name. Does it write the Turtle file to the DB? If so, then why
> > would you want to write metadata to the database AFTER it has just been
> > written to with all the latest information from the indexer? If your
> > intention (on the other hand) is to write initial data to the database
> > on a reindex before the indexer has had a chance to index the real data
> > it gets, then I think this is going about it the wrong way.
> >
> > + if (flags & TRACKER_DB_MANAGER_FORCE_REINDEX) {
> > + g_debug ("Setting callback for crawling finish detection");
> > + callback_id = g_signal_connect (private->processor,
> > "finished",
> > + G_CALLBACK
> > (crawling_finished_cb),
> > + &callback_id);
> > + }
> > +
>
> Under supervision of Ivan.
>
>
> > 43. About tracker-turtle.h:
> >
> > a) Why is MetadataItem public, it looks unused in the header.
> > b) Why doesn't it use a proper namespace?
> > i.e. TrackerTurtleMetadataItem
> > c) Why don't the callback typedefs have Tracker in the namespace?
> > i.e. TrackerTurtleTripleCallback
> > d) What happened to the coding style here?
> > e) Include order is incorrect.
> > f) Includes should have config.h
> > g) G_BEGIN_DECLS and G_END_DECLS are needed.
> > h) Why not use Glib types here? gint, gchar, etc.
> > i) When I saw "stmt" in the code, I had to find out what it was, where
> > it came from, etc. Namespacing issues there again need sortingout.
> > j) I wouldn't include Jamie's name in the copyright since he didn't
> > write it.
>
> All fixed
>
>
> > 44. About tracker-turtle.c:
> >
> > a) The "tracker-turtle.h" include should come after all external
> > includes.
> > b) There is no need to include the same files already included in the
> > tracker-turtle.h.
> > c) In tracker_turtle_open(), you just print a critical if not
> > initialised. I would suggest you use g_return_val_if_fail () instead
> > since.
> > d) I would use g_fopen() not fopen() for compatibility across
> > platforms.
> > e) Is turtle->uri a uri or a path? Variable names should reflect this
> > accurately:
> > i.e. turtle->uri = raptor_new_uri ("/");
>
> raptor_new_uri converts the string to a URI, so it's a URI and that
> makes the variable name crystal clear (uri is a URI, indeed).
>
> > f) Can we use proper variable names like "count" instead of 3 letter
> > variable names that look awefully like another English word I dare
> > not
> > utter :)
> > i.e. guint cnt;
> > g) There are some serious code style issues there.
> > h) I wouldn't include Jamie's name in the copyright since he didn't
> > write it.
>
> All fixed
>
> > 45. About tracker-removable-device.h:
> >
> > a) Include order is incorrect.
> > b) G_BEGIN_DECLS and G_END_DECLS are needed.
> > c) I wouldn't include Jamie's name in the copyright since he didn't
> > write it.
>
> All fixed
>
> > d) tracker_removable_device_add_move() shouldn't this be
> > tracker_removable_device_move()?
>
> No, you are not moving a removable device. You are adding info about the
> moving of a file that is located on a removable device to the metadata
> of the removable device. Your API proposal would make it look like as if
> you re moving the removable device.
>
> > e) tracker_removable_device_add_removal() shouldn't this be
> > tracker_removable_device_removal()?
>
> Same. It has nothing to do with removing the removable device. This has
> to do with adding the removal of a file on the removable device to the
> metadata of the removable device.
>
>
> > 46. About tracker-removable-device.c:
> >
> > a) The #ifdef HAVE_RAPTOR at the top of this file (I would expect)
> > should go right to the end of the file since it is so high up, but it
> > doesn't.
>
> Uh? Reread the file. It's not because the ifdef is high-up that this
> means that the entire file is being shadowed away with the ifdef.
>
> > I think we should decide if we are making the whole file
> > require that define or just parts of it and fix it accordingly.
>
> Sure, this is being discussed on the ML atm. Leaving as-is.
>
> > b) Do you really need all those includes or is that a copy and paste
> > from tracker-indexer.c :)
>
> Fixed. Seems it was indeed a C/P from tracker-indexer.c.
>
> > c) Again all typedefs need proper namespacing including the Tracker
> > name
> > otherwise it is confusing as hell when you see it somewhere.
> > d) Coding style needs fixing.
>
> All fixed
>
>
--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list