Extra note This one is in the patch. I'm posting it a second time to catch Garnacho's full attention:
Index: src/tracker-indexer/tracker-module-metadata.c =================================================================== --- src/tracker-indexer/tracker-module-metadata.c (revision 2683) +++ src/tracker-indexer/tracker-module-metadata.c (working copy) @@ -188,13 +188,15 @@ list = g_list_prepend (list, value); data = list; } else { - /* FIXME: warn if data already exists */ + data = g_hash_table_lookup (metadata->table, field); + g_free (data); ## If you don't do this then inserting the same field twice will ## leak the first data = value; } - g_hash_table_insert (metadata->table, - g_object_ref (field), - data); + g_hash_table_replace (metadata->table, + g_object_ref (field), + data); + ## If you don't use _replace here then the GHashTable wont exec the ## GDestroyNotify. This means that you always leak the reference ## per each time that you add a string to a list-element. return TRUE; } On Tue, 2008-12-09 at 18:17 +0100, Philip Van Hoof wrote: > Hi there, > > This is a new version of the patch. > > Garnacho: You want to review the part where I'm changing your precious > src/tracker-indexer/tracker-module-metadata.c at line 1016 in this diff. > > Ivan: Continue committing in the "turtle" branch. I'll cope by merging > your changes into the next patch and using some SVN command-line tricks. > > Martyn: Re-review ;-). I commented inline (see lower) > > All you guys: Have fun > > > > 1. Please remember in the next patch to svn add the files which are > > new so people can compile with the applied patch. > > Done this time > > > 2. Use AM_CONDITIONAL in configure.ac: > > > > +if test x$have_raptor == "xyes"; then > > + AC_DEFINE(HAVE_RAPTOR, 1, [Raptor RDF parsers]) > > +fi > > > > Everywhere else does: > > > > AM_CONDITIONAL(HAVE_RAPTOR, test "$have_raptor" = "yes") > > AM_CONDITIONAL is not the same as getting the define in config.h. > There's also a discussion started about making libraptor mandatory. > > The conclusion of that discussion will influence this configure.ac > magic. > > I left this as is (using AC_DEFINE) > > > 3. Do we require a particular version of raptor? > > Not at this moment. Also this is affected by the discussion about > libraptor taking place. Not adapting the patch for this yet. > > > 4. Do we need to check S.Enabled here and/or the Volumes table at all? > > Also, isn't it more logical to use "M" instead of "F" for the > > metadata > > > > [CUT SQLite stored statements] > > This is under Ivan's supervision. This new patch has some updates about > the things that I will in this reply annotate as Ivan's responsibility. > > Not everything being done by Ivan is already in this patch yet, though. > > Ivan is continuing with fixing the issues in the "turtle" branch. I will > continue merging my merged checkout of trunk with his improvements. > > So far these three are the merge commands executed: > > svn merge -r 2555:2615 ../branches/turtle/ > svn merge -r 2615:2680 ../branches/turtle/ > svn merge -r 2680:2682 ../branches/turtle > > About this specific issue, I think I saw Ivan commit something about > these queries in the turtle branch. I also noticed some differences in > the original merge patch, and this new one, about these queries. > > I'll let Ivan reply if he wants to about this. > > > 5. Is the notation "(out)" the same notation used in gtk-doc now? > > > > + * @mount_mount: (out): if @path is on a removable device, the mount > > point will > > + * be filled in here > > I have removed the annotations for now. The GObject-Introspection's > annotation format has not yet been finalized. I have added a humanly > explained comment about how to deal with @mount_point. > > > 6. This needs fixing in tracker-hal.c, also it doesn't say if it > > returns > > TRUE or FALSE, usually that's a good idea: > > > + * Returns Whether or not @path is on a known removable device > > + * > > + * Returns: Whether or not @path is on a known removable devic > > Fixed in the "Returns" annotation. Same here, I am not yet using the > GObject-Introspection annotation syntax. I think it would be better if > we make one huge patch adding annotations everywhere, than to do it > sporadically API per API. > > > 6. This doesn't need to be cast: > > > > + udi = (const gchar*) key; > > Fixed > > > > > 7. Indentation here is wrong and we could use g_strcmp0() instead so > > we > > don't have to check mp for NULL too: > > > > + if (mp && strcmp (mp, path) != 0) { > > + if (g_strrstr (path, mp)) { > > > > Done > > > > 8. This looks like a memory leak, volume is not freed: > > > > if (available) > > *available = libhal_volume_is_mounted > > (volume); > > break; > > It's not. Volume is freed. Double checked this. Maybe the original patch > didn't have it. This now one, in any case, has it. > > + libhal_volume_free (volume); > > > > > 10. The changes to tracker_data_metadata_insert() actually look like > > they fix a potential leak in TRUNK, i.e. inserting the same thing > > twice. Nice catch. > > Sure. Carefully review all those, though. I went over all users of the > APIs to make sure they are still all validly using it. But feel free to > add some extra eyes to that process. > > > 11. This looks like it would cause a crasher now that we don't own all > > data sent to libtracker-data/tracker-data-metadata.c, I don't think we > > should be freeing the list there or its contents: > > > > - g_return_if_fail (TRACKER_IS_FIELD (field)); > > + if (!field) { > > + g_warning ("Field name '%s' has isn't described in the > > ontology", > > field_name); > > + g_list_foreach (list, (GFunc) g_free, NULL); > > + g_list_free (list); > > + return; > > + } > > + > > This was indeed different in the Turtle branch and has indeed been > refactored in Trunk. I have merge-rewritten these functions to act the > way they are expected to act as Trunk got refactored to. > > > 12. Same as #11 for tracker_data_metadata_append_to_list() where we > > free "value". > > Agreed, fixed too. > > > > 13. Just wondering about the name here, shouldn't it be > > GetMetadataBackup? I mean, the Turtle file is just a metadata backup > > right, I don't quite know how the "Embedded" part comes into it. > > > > + result_set = tracker_data_manager_exec_proc (iface, > > + > > "GetEmbeddedMetadataBackup", > > + NULL); > > This is under supervision of Ivan. I don't think it is fixed in this new > version of the patch yet. I'll let Ivan reply and handle this issue. > > > 14. Please put _ALL_ structures and typedefs at the top of the file: > > > > +/* TODO: URI branch path -> uri */ > > + > > +typedef struct { > > + TrackerService *service; > > + guint32 iid_value; > > + TrackerLanguage *language; > > + TrackerConfig *config; > > +} ForeachInMetadataInfo; > > + > > Fixed > > > 15. Please put all static variables at the top of the file: > > > > + static TrackerConfig *config = NULL; > > + static TrackerLanguage *language = NULL; > > Fixed > > > > > 16. When linning up variables don't add extra spaces: > > > > +void > > +tracker_data_delete_service (const gchar *path, > > + const gchar *rdf_type) > > +{ > > + TrackerService *service; > > + const gchar *service_type; > > + guint32 service_id; > > > Fixed > > > 17. I wonder if we should recursively delete here even if there is no > > top level service ID found - in case there is ever something below it > > left in the database. Also remember, we changed this at the code camp > > in Helsinki, I am still going through the patch to see if this is a > > conflict at all: > > > > + if (service_id != 0) { > > + tracker_data_update_delete_service (service, service_id); > > + tracker_data_update_delete_service_recursively (service, > > (gchar *) path); > > + tracker_data_update_delete_all_metadata (service, > > service_id); > > + } > > Carlos replied that this is logically correct > > > > 18. Can we format these arguments according to the coding style > > please: > > > > +set_metadata (TrackerField *field, gpointer value, > > ForeachInMetadataInfo *info) > > Fixed > > > 19. Every time I look at this, I think it is crack, that's why I added > > the comment. I know we do this in the indexer too, but I wonder if we > > should change this to something sensible with a reason too: > > > > + /* Throttle indexer, value 9 is from older code, why 9? */ > > + throttle = tracker_config_get_throttle (info->config); > > + if (throttle > 9) { > > + tracker_throttle (info->config, throttle * 100); > > + } > > Sure, if the merge happens before the fix then thw two locations. > > I think we should keep the logic in this patch because what is > functionally being added is basically the exact same. Except that it's > coming from the Turtle file instead of from the FS scanner. > > I try to keep patches about functionality A separated from patches that > fixes obscure weird hard coded B things. > > If trunk fixes this before this merge takes place, then let me know and > I'll adapt this one to the fixed situation too. Of course. > > If it gets fixed after merge, we just fix the two locations. It's not > that this is a flood of code here ;-) (just search for "why 9?"). > > > 20. I would have used for() instead of while() for clarity here: > > > > + GList *list; > > + > > + list = value; > > + > > + while (list) { > > + set_metadata (field, list->data, user_data); > > + list = list->next; > > + } > > > > i.e. > > > > for (list = value; list; list = list->next) > > set_metadata (field, list->data, user_data); > > Fixed > > > > > 21. I think the API should be a bit more uniform here, i.e. > > tracker_data_turtle_import_{start|stop}: > > > > + void tracker_data_start_turtle_import (void); > > + void tracker_data_stop_turtle_import (void); > > > Fixed > > > > > 22. Instead of duplicating the config and language modules, can we not > > just call an initialize and termination function to do all this and > > wrap all static variables in a GStaticPrivate? To save memory and > > also if this is done more than once per invocation of the application > > it really makes sense. Right now I think TrackerConfig and > > TrackerLanguage are only ever created once per application. I would > > like to keep it that way if possible. > > > > +void > > +tracker_data_start_turtle_import (void) > > +{ > > + if (!config) > > + config = tracker_config_new (); > > + if (!language) > > + language = tracker_language_new (config); > > +} > > Then please make TrackerConfig and TrackerLanguage singleton classes. > > > 23. All public functions we need variable checking, e.g. > > > > tracker_data_delete_service() > > tracker_data_replace_service() > > > > need (for example): > > > > g_return_if_fail (path != NULL); > > Fixed > > > 24. We should use g_debug here not g_print, and the preprocessor > > comments should be on column 0 at the start of the line. > > > > + #ifdef TURTLE_DEBUG > > + g_print ("Q ERROR: %s\n", error->message); > > + #endif /* TURTLE_DEBUG */ > > + g_error_free (error); > > Removed/Fixed/Adapted/Stuffed > > > 25. Indentation: > > > > + GValue id_value = { 0, }; > > + GValue is_value = { 0, }; > > + gint iid_value, iis_value; > > > > + guint32 id; > > > Fixed > > > 26. API wise, what is the difference between > > > > tracker_data_delete_service() > > > > and > > > > tracker_data_update_delete_service() > > > > both look like they should do the same thing except one is the uniform > > API which Jürg worked on. That should be clearer I think. The same > > namespace should be used for the _replace_ variant too. > > The namespace has been fixed ('update' is added to it) > > The difference is that the new one removes using the path or (soon when > the path->URI work is done) using the URI. The first removes using the > service-id. This service-id is outside of the scope of the Turtle code, > therefore is the functionality moved into the database specific pieces > and is the Turtle code just calling it that way. > > > > > 27. We need spaces in here: > > > > + info->config = config?g_object_ref (config):NULL; > > + info->language = language?g_object_ref > > (language):NULL; > > Fixed > > > > 28. Bloody good catch guys :) > > > > - if (file_iface) > > + if (file_iface) { > > g_object_unref (file_iface); > > - if (email_iface) > > + file_iface = NULL; > > + } > > + > > + if (email_iface) { > > g_object_unref (email_iface); > > - if (xesam_iface) > > + email_iface = NULL; > > + } > > + > > + if (xesam_iface) { > > g_object_unref (xesam_iface); > > + xesam_iface = NULL; > > + } > > Np > > > > 29. More spaces we don't need :) > > > > +void > > +tracker_indexer_commit_transaction (TrackerIndexer *indexer) > > +{ > > + stop_transaction (indexer); > > + tracker_indexer_set_running (indexer, TRUE); > > + > > +} > > + > > +void > > +tracker_indexer_open_transaction (TrackerIndexer *indexer) > > +{ > > + tracker_indexer_set_running (indexer, FALSE); > > + start_transaction (indexer); > > +} > > + > > Fixed > > > > > 30. I would change tracker_indexer_commit_transaction() and > > tracker_indexer_open_transaction() to be > > tracker_indexer_transaction_*() > > instead. > > Fixed > > > > 31. There is no need to set mount_point to NULL after freeing it if it > > is unused again. > > Fixed > > > > 32. Shouldn't we do something in an "else" for the > > tracker_hal_path_is_on_removable_device() for "path", like we do for > > "other_path"? > > > > + if (tracker_hal_path_is_on_removable_device (indexer->private->hal, > > + path, > > + &mount_point, > > + NULL)) { > > + if (tracker_hal_path_is_on_removable_device > > (indexer->private->hal, > > + other_path, > > + NULL, > > + NULL)) { > > + tracker_removable_device_add_move (indexer, > > + mount_point, > > + path, > > + other_path, > > + > > tracker_service_get_name (service)); > > + } else { > > + tracker_removable_device_add_removal (indexer, > > + mount_point, > > + path, > > + > > tracker_service_get_name (service)); > > + } > > + } > > Nope, the logic of that code is good > > > > 33. APIs like this really confuse me: > > > > tracker_removable_device_add_move() > > tracker_removable_device_add_removal() > > > > Can we come up with something a bit more intuitive? > > > > Nope. It means "add a record about a move", "add a record about a > remove". > > It's not the same as updating your database as the Turtle file is an > always-append format. You append "that you removed something" and "that > you moved something". You don't go into the file and remove something, > nor do you go into the file and move something. You append that piece of > info, in the form of an "event", to the file. That's the format. > > That's why it's called that way. > > > 34. We don't need to allocate memory here: > > > > + values[0] = g_strdup (triple->object); > > + values[1] = NULL; > > ... > > + > > + g_free (values[0]); > > > > Fixed > > > 35. All request IDs are generated before anything else. This is for > > debugging purposes and so we know how many requests we have _REALLY_ > > received regardless of if there was some parameter check failure. > > This > > needs to be before any tracker_dbus_async_return_if_fail() calls: > > > > + request_id = tracker_dbus_get_next_request_id (); > > > > Fixed > > > 36. This normally comes up as a warning depending on compiler flags, > > "(void)" is needed: > > > > +static gchar * > > +get_turtle_userdata_backup_filename () > > Me fixed for Ivan > > > > 37. Checking private is NULL or not is pointless here because it is > > created in main() and never freed - I just noticed :) so I should fix > > that. Also, we should store this value in private, since we often get > > "finished" more than once from the indexer and recreating the string > > here would be unnecessary: > > > > + if (private) { > > + return g_build_filename (private->user_data_dir, > > + "tracker-userdata-backup.ttl", > > + NULL); > > + } else { > > + g_critical ("Directories not initialized"); > > + return NULL; > > + } > > Under supervision of Ivan. > > > 38. This should be fixed now (sending finished twice at the same time). > > But also you should know that "finished" is sent whenever the indexer > > finishes indexing ANY content from ANY file changes on the disk. So this > > solution is broken anyway. It will need fixing: > > > > + if (counter >= 2) { > > Under supervision of Ivan. > > > > > 39. I would rather we are a bit more descriptive for variable names, > > which callback_id this is, isn't obvious without searching further in > > the code, this is in main() for trackerd: > > > > + gulong callback_id; > > Under supervision of Ivan. > > > 40. More spacing issues: > > > > if (force_reindex) { > > + > > + gchar *turtle_file; > > + > > + turtle_file = get_turtle_userdata_backup_filename (); > > Under supervision of Ivan. > > > > 41. Eeeek, what is this for? You are duplicating a lot of stuff here! > > The tracker_db_manager_init() is called just after this. The > > tracker_db_index_manager_init() is also called after this. The > > tracker_data_manager_init() is also called after this. As I understand > > it you just want to save the turtle file after an index right - > > specifically the initial index? If so, then why not just check the > > "initial index" state using tracker_status_get_is_first_time_index(). > > You could also use this in the "finished" callback. > > > > This code looks to me like it would break a lot of things and it > > shouldn't get committed: > > > > if (force_reindex) { > > + > > + gchar *turtle_file; > > + > > + turtle_file = get_turtle_userdata_backup_filename (); > > + > > + g_message ("Saving metadata in %s", turtle_file); > > + > > + /* Init the DB stack */ > > + tracker_db_manager_init (0, &is_first_time_index, TRUE); > > + > > + tracker_db_index_manager_init (0, > > + > > tracker_config_get_min_bucket_count (config), > > + > > tracker_config_get_max_bucket_count (config)); > > + > > + file_index = tracker_db_index_manager_get_index > > (TRACKER_DB_INDEX_FILE); > > + email_index = tracker_db_index_manager_get_index > > (TRACKER_DB_INDEX_EMAIL); > > + > > + tracker_data_manager_init (config, language, file_index, > > email_index); > > + > > + tracker_backup_save (turtle_file); > > + > > + /* Shutdown the DB stack */ > > + tracker_data_manager_shutdown (); > > + > > + tracker_db_index_manager_shutdown (); > > + tracker_db_manager_shutdown (); > > + > > + g_free (turtle_file); > > + > > Under supervision of Ivan. > > > > 42. I don't think you want to do this ONLY on FORCE_REINDEX. Since that > > is rarely the case. Although I admit I am not entirely sure what > > org_freedesktop_Tracker_Indexer_restore_backup() is supposed to do from > > the API name. Does it write the Turtle file to the DB? If so, then why > > would you want to write metadata to the database AFTER it has just been > > written to with all the latest information from the indexer? If your > > intention (on the other hand) is to write initial data to the database > > on a reindex before the indexer has had a chance to index the real data > > it gets, then I think this is going about it the wrong way. > > > > + if (flags & TRACKER_DB_MANAGER_FORCE_REINDEX) { > > + g_debug ("Setting callback for crawling finish detection"); > > + callback_id = g_signal_connect (private->processor, > > "finished", > > + G_CALLBACK > > (crawling_finished_cb), > > + &callback_id); > > + } > > + > > Under supervision of Ivan. > > > > 43. About tracker-turtle.h: > > > > a) Why is MetadataItem public, it looks unused in the header. > > b) Why doesn't it use a proper namespace? > > i.e. TrackerTurtleMetadataItem > > c) Why don't the callback typedefs have Tracker in the namespace? > > i.e. TrackerTurtleTripleCallback > > d) What happened to the coding style here? > > e) Include order is incorrect. > > f) Includes should have config.h > > g) G_BEGIN_DECLS and G_END_DECLS are needed. > > h) Why not use Glib types here? gint, gchar, etc. > > i) When I saw "stmt" in the code, I had to find out what it was, where > > it came from, etc. Namespacing issues there again need sortingout. > > j) I wouldn't include Jamie's name in the copyright since he didn't > > write it. > > All fixed > > > > 44. About tracker-turtle.c: > > > > a) The "tracker-turtle.h" include should come after all external > > includes. > > b) There is no need to include the same files already included in the > > tracker-turtle.h. > > c) In tracker_turtle_open(), you just print a critical if not > > initialised. I would suggest you use g_return_val_if_fail () instead > > since. > > d) I would use g_fopen() not fopen() for compatibility across > > platforms. > > e) Is turtle->uri a uri or a path? Variable names should reflect this > > accurately: > > i.e. turtle->uri = raptor_new_uri ("/"); > > raptor_new_uri converts the string to a URI, so it's a URI and that > makes the variable name crystal clear (uri is a URI, indeed). > > > f) Can we use proper variable names like "count" instead of 3 letter > > variable names that look awefully like another English word I dare > > not > > utter :) > > i.e. guint cnt; > > g) There are some serious code style issues there. > > h) I wouldn't include Jamie's name in the copyright since he didn't > > write it. > > All fixed > > > 45. About tracker-removable-device.h: > > > > a) Include order is incorrect. > > b) G_BEGIN_DECLS and G_END_DECLS are needed. > > c) I wouldn't include Jamie's name in the copyright since he didn't > > write it. > > All fixed > > > d) tracker_removable_device_add_move() shouldn't this be > > tracker_removable_device_move()? > > No, you are not moving a removable device. You are adding info about the > moving of a file that is located on a removable device to the metadata > of the removable device. Your API proposal would make it look like as if > you re moving the removable device. > > > e) tracker_removable_device_add_removal() shouldn't this be > > tracker_removable_device_removal()? > > Same. It has nothing to do with removing the removable device. This has > to do with adding the removal of a file on the removable device to the > metadata of the removable device. > > > > 46. About tracker-removable-device.c: > > > > a) The #ifdef HAVE_RAPTOR at the top of this file (I would expect) > > should go right to the end of the file since it is so high up, but it > > doesn't. > > Uh? Reread the file. It's not because the ifdef is high-up that this > means that the entire file is being shadowed away with the ifdef. > > > I think we should decide if we are making the whole file > > require that define or just parts of it and fix it accordingly. > > Sure, this is being discussed on the ML atm. Leaving as-is. > > > b) Do you really need all those includes or is that a copy and paste > > from tracker-indexer.c :) > > Fixed. Seems it was indeed a C/P from tracker-indexer.c. > > > c) Again all typedefs need proper namespacing including the Tracker > > name > > otherwise it is confusing as hell when you see it somewhere. > > d) Coding style needs fixing. > > All fixed > > -- Philip Van Hoof, freelance software developer home: me at pvanhoof dot be gnome: pvanhoof at gnome dot org http://pvanhoof.be/blog http://codeminded.be _______________________________________________ tracker-list mailing list tracker-list@gnome.org http://mail.gnome.org/mailman/listinfo/tracker-list