Extra note

This one is in the patch. I'm posting it a second time to catch
Garnacho's full attention:

Index: src/tracker-indexer/tracker-module-metadata.c
===================================================================
--- src/tracker-indexer/tracker-module-metadata.c       (revision 2683)
+++ src/tracker-indexer/tracker-module-metadata.c       (working copy)
@@ -188,13 +188,15 @@
                list = g_list_prepend (list, value);
                data = list;
        } else {
-               /* FIXME: warn if data already exists */
+               data = g_hash_table_lookup (metadata->table, field);
+               g_free (data);


## If you don't do this then inserting the same field twice will 
## leak the first

                data = value;
        }
 
-       g_hash_table_insert (metadata->table,
-                            g_object_ref (field),
-                            data);
+       g_hash_table_replace (metadata->table,
+                             g_object_ref (field),
+                             data);
+

## If you don't use _replace here then the GHashTable wont exec the
## GDestroyNotify. This means that you always leak the reference
## per each time that you add a string to a list-element.


        return TRUE;
 }

On Tue, 2008-12-09 at 18:17 +0100, Philip Van Hoof wrote:
> Hi there,
> 
> This is a new version of the patch.
> 
> Garnacho: You want to review the part where I'm changing your precious 
> src/tracker-indexer/tracker-module-metadata.c at line 1016 in this diff.
> 
> Ivan: Continue committing in the "turtle" branch. I'll cope by merging
> your changes into the next patch and using some SVN command-line tricks.
> 
> Martyn: Re-review ;-). I commented inline (see lower)
> 
> All you guys: Have fun
> 
> 
> > 1. Please remember in the next patch to svn add the files which are
> > new  so people can compile with the applied patch.
> 
> Done this time
> 
> > 2. Use AM_CONDITIONAL in configure.ac:
> > 
> > +if test x$have_raptor == "xyes"; then
> > +  AC_DEFINE(HAVE_RAPTOR, 1, [Raptor RDF parsers])
> > +fi
> > 
> > Everywhere else does:
> > 
> >    AM_CONDITIONAL(HAVE_RAPTOR, test "$have_raptor" = "yes")
> 
> AM_CONDITIONAL is not the same as getting the define in config.h.
> There's also a discussion started about making libraptor mandatory. 
> 
> The conclusion of that discussion will influence this configure.ac
> magic.
> 
> I left this as is (using AC_DEFINE)
> 
> > 3. Do we require a particular version of raptor?
> 
> Not at this moment. Also this is affected by the discussion about
> libraptor taking place. Not adapting the patch for this yet.
> 
> > 4. Do we need to check S.Enabled here and/or the Volumes table at all?
> > Also, isn't it more logical to use "M" instead of "F" for the
> > metadata 
> > 
> > [CUT SQLite stored statements]
> 
> This is under Ivan's supervision. This new patch has some updates about
> the things that I will in this reply annotate as Ivan's responsibility.
> 
> Not everything being done by Ivan is already in this patch yet, though.
> 
> Ivan is continuing with fixing the issues in the "turtle" branch. I will
> continue merging my merged checkout of trunk with his improvements.
> 
> So far these three are the merge commands executed:
> 
> svn merge -r 2555:2615 ../branches/turtle/
> svn merge -r 2615:2680 ../branches/turtle/
> svn merge -r 2680:2682 ../branches/turtle
> 
> About this specific issue, I think I saw Ivan commit something about
> these queries in the turtle branch. I also noticed some differences in
> the original merge patch, and this new one, about these queries.
> 
> I'll let Ivan reply if he wants to about this.
> 
> > 5. Is the notation "(out)" the same notation used in gtk-doc now?
> > 
> > + * @mount_mount: (out): if @path is on a removable device, the mount 
> > point will
> > + * be filled in here
> 
> I have removed the annotations for now. The GObject-Introspection's
> annotation format has not yet been finalized. I have added a humanly
> explained comment about how to deal with @mount_point.
> 
> > 6. This needs fixing in tracker-hal.c, also it doesn't say if it
> > returns 
> > TRUE or FALSE, usually that's a good idea:
> 
> > + * Returns Whether or not @path is on a known removable device
> > + *
> > + * Returns: Whether or not @path is on a known removable devic
> 
> Fixed in the "Returns" annotation. Same here, I am not yet using the
> GObject-Introspection annotation syntax. I think it would be better if
> we make one huge patch adding annotations everywhere, than to do it
> sporadically API per API.
> 
> > 6. This doesn't need to be cast:
> > 
> > +               udi = (const gchar*) key;
> 
> Fixed
> 
> > 
> > 7. Indentation here is wrong and we could use g_strcmp0() instead so
> > we 
> > don't have to check mp for NULL too:
> > 
> > +               if (mp && strcmp (mp, path) != 0) {
> > +                 if (g_strrstr (path, mp)) {
> > 
> 
> Done
> 
> 
> > 8. This looks like a memory leak, volume is not freed:
> > 
> >                         if (available)
> >                                 *available = libhal_volume_is_mounted
> > (volume);
> >                         break;
> 
> It's not. Volume is freed. Double checked this. Maybe the original patch
> didn't have it. This now one, in any case, has it.
> 
> + libhal_volume_free (volume);
> 
> > 
> > 10. The changes to tracker_data_metadata_insert() actually look like 
> > they fix a potential leak in TRUNK, i.e. inserting the same thing
> > twice. Nice catch.
> 
> Sure. Carefully review all those, though. I went over all users of the
> APIs to make sure they are still all validly using it. But feel free to
> add some extra eyes to that process.
> 
> > 11. This looks like it would cause a crasher now that we don't own all
> > data sent to libtracker-data/tracker-data-metadata.c, I don't think we
> > should be freeing the list there or its contents:
> > 
> > -       g_return_if_fail (TRACKER_IS_FIELD (field));
> > +       if (!field) {
> > +               g_warning ("Field name '%s' has isn't described in the
> > ontology", 
> > field_name);
> > +               g_list_foreach (list, (GFunc) g_free, NULL);
> > +               g_list_free (list);
> > +               return;
> > +       }
> > +
> 
> This was indeed different in the Turtle branch and has indeed been
> refactored in Trunk. I have merge-rewritten these functions to act the
> way they are expected to act as Trunk got refactored to.
> 
> > 12. Same as #11 for tracker_data_metadata_append_to_list() where we
> > free "value".
> 
> Agreed, fixed too.
> 
> 
> > 13. Just wondering about the name here, shouldn't it be 
> > GetMetadataBackup? I mean, the Turtle file is just a metadata backup 
> > right, I don't quite know how the "Embedded" part comes into it.
> > 
> > +       result_set = tracker_data_manager_exec_proc (iface,
> > +
> > "GetEmbeddedMetadataBackup",
> > +                                                    NULL);
> 
> This is under supervision of Ivan. I don't think it is fixed in this new
> version of the patch yet. I'll let Ivan reply and handle this issue.
> 
> > 14. Please put _ALL_ structures and typedefs at the top of the file:
> > 
> > +/* TODO: URI branch path -> uri */
> > +
> > +typedef struct {
> > +       TrackerService *service;
> > +       guint32 iid_value;
> > +       TrackerLanguage *language;
> > +       TrackerConfig *config;
> > +} ForeachInMetadataInfo;
> > +
> 
> Fixed
> 
> > 15. Please put all static variables at the top of the file:
> > 
> > + static TrackerConfig *config = NULL;
> > + static TrackerLanguage *language = NULL;
> 
> Fixed
> 
> > 
> > 16. When linning up variables don't add extra spaces:
> > 
> > +void
> > +tracker_data_delete_service (const gchar         *path,
> > +                            const gchar         *rdf_type)
> > +{
> > +       TrackerService      *service;
> > +       const gchar         *service_type;
> > +       guint32              service_id;
> > 
> Fixed
> 
> > 17. I wonder if we should recursively delete here even if there is no 
> > top level service ID found - in case there is ever something below it 
> > left in the database. Also remember, we changed this at the code camp 
> > in Helsinki, I am still going through the patch to see if this is a 
> > conflict at all:
> > 
> > +       if (service_id != 0) {
> > +               tracker_data_update_delete_service (service, service_id);
> > +               tracker_data_update_delete_service_recursively (service, 
> > (gchar *) path);
> > +               tracker_data_update_delete_all_metadata (service, 
> > service_id);
> > +       }
> 
> Carlos replied that this is logically correct
> 
> 
> > 18. Can we format these arguments according to the coding style
> > please:
> > 
> > +set_metadata (TrackerField *field, gpointer value, 
> > ForeachInMetadataInfo *info)
> 
> Fixed
> 
> > 19. Every time I look at this, I think it is crack, that's why I added
> > the comment. I know we do this in the indexer too, but I wonder if we 
> > should change this to something sensible with a reason too:
> > 
> > +       /* Throttle indexer, value 9 is from older code, why 9? */
> > +       throttle = tracker_config_get_throttle (info->config);
> > +       if (throttle > 9) {
> > +               tracker_throttle (info->config, throttle * 100);
> > +       }
> 
> Sure, if the merge happens before the fix then thw two locations.
> 
> I think we should keep the logic in this patch because what is
> functionally being added is basically the exact same. Except that it's
> coming from the Turtle file instead of from the FS scanner.
> 
> I try to keep patches about functionality A separated from patches that
> fixes obscure weird hard coded B things.
> 
> If trunk fixes this before this merge takes place, then let me know and
> I'll adapt this one to the fixed situation too. Of course.
> 
> If it gets fixed after merge, we just fix the two locations. It's not
> that this is a flood of code here ;-) (just search for "why 9?").
> 
> > 20. I would have used for() instead of while() for clarity here:
> > 
> > +               GList *list;
> > +
> > +               list = value;
> > +
> > +               while (list) {
> > +                       set_metadata (field, list->data, user_data);
> > +                       list = list->next;
> > +               }
> > 
> > i.e.
> > 
> >    for (list = value; list; list = list->next)
> >           set_metadata (field, list->data, user_data);
> 
> Fixed
> 
> > 
> > 21. I think the API should be a bit more uniform here, i.e. 
> > tracker_data_turtle_import_{start|stop}:
> > 
> > + void     tracker_data_start_turtle_import               (void);
> > + void     tracker_data_stop_turtle_import                (void);
> 
> 
> Fixed
> 
> > 
> > 22. Instead of duplicating the config and language modules, can we not
> > just call an initialize and termination function to do all this and
> > wrap  all static variables in a GStaticPrivate? To save memory and
> > also if  this is done more than once per invocation of the application
> > it really  makes sense. Right now I think TrackerConfig and
> > TrackerLanguage are  only ever created once per application. I would
> > like to keep it that way  if possible.
> > 
> > +void
> > +tracker_data_start_turtle_import (void)
> > +{
> > +       if (!config)
> > +               config = tracker_config_new ();
> > +       if (!language)
> > +               language = tracker_language_new (config);
> > +}
> 
> Then please make TrackerConfig and TrackerLanguage singleton classes.
> 
> > 23. All public functions we need variable checking, e.g.
> > 
> >    tracker_data_delete_service()
> >    tracker_data_replace_service()
> > 
> > need (for example):
> > 
> >    g_return_if_fail (path != NULL);
> 
> Fixed
> 
> > 24. We should use g_debug here not g_print, and the preprocessor 
> > comments should be on column 0 at the start of the line.
> > 
> > +               #ifdef TURTLE_DEBUG
> > +               g_print ("Q ERROR: %s\n", error->message);
> > +               #endif /* TURTLE_DEBUG */
> > +               g_error_free (error);
> 
> Removed/Fixed/Adapted/Stuffed
> 
> > 25. Indentation:
> > 
> > +               GValue         id_value = { 0, };
> > +               GValue         is_value = { 0, };
> > +               gint           iid_value, iis_value;
> > 
> > +               guint32        id;
> 
> 
> Fixed
> 
> > 26. API wise, what is the difference between
> > 
> >    tracker_data_delete_service()
> > 
> > and
> > 
> >    tracker_data_update_delete_service()
> > 
> > both look like they should do the same thing except one is the uniform
> > API which Jürg worked on. That should be clearer I think. The same
> > namespace should be used for the _replace_ variant too.
> 
> The namespace has been fixed ('update' is added to it)
> 
> The difference is that the new one removes using the path or (soon when
> the path->URI work is done) using the URI. The first removes using the
> service-id. This service-id is outside of the scope of the Turtle code,
> therefore is the functionality moved into the database specific pieces
> and is the Turtle code just calling it that way.
> 
> > 
> > 27. We need spaces in here:
> > 
> > +               info->config = config?g_object_ref (config):NULL;
> > +               info->language = language?g_object_ref
> > (language):NULL;
> 
> Fixed
> 
> 
> > 28. Bloody good catch guys :)
> > 
> > -       if (file_iface)
> > +       if (file_iface) {
> >                 g_object_unref (file_iface);
> > -       if (email_iface)
> > +               file_iface = NULL;
> > +       }
> > +
> > +       if (email_iface) {
> >                 g_object_unref (email_iface);
> > -       if (xesam_iface)
> > +               email_iface = NULL;
> > +       }
> > +
> > +       if (xesam_iface) {
> >                 g_object_unref (xesam_iface);
> > +               xesam_iface = NULL;
> > +       }
> 
> Np
> 
> 
> > 29. More spaces we don't need :)
> > 
> > +void
> > +tracker_indexer_commit_transaction (TrackerIndexer *indexer)
> > +{
> > +       stop_transaction (indexer);
> > +       tracker_indexer_set_running (indexer, TRUE);
> > +
> > +}
> > +
> > +void
> > +tracker_indexer_open_transaction   (TrackerIndexer *indexer)
> > +{
> > +       tracker_indexer_set_running (indexer, FALSE);
> > +       start_transaction (indexer);
> > +}
> > +
> 
> Fixed
> 
> > 
> > 30. I would change tracker_indexer_commit_transaction() and 
> > tracker_indexer_open_transaction() to be
> > tracker_indexer_transaction_*() 
> > instead.
> 
> Fixed
> 
> 
> > 31. There is no need to set mount_point to NULL after freeing it if it
> > is unused again.
> 
> Fixed
> 
> 
> > 32. Shouldn't we do something in an "else" for the 
> > tracker_hal_path_is_on_removable_device() for "path", like we do for 
> > "other_path"?
> > 
> > +       if (tracker_hal_path_is_on_removable_device (indexer->private->hal,
> > +                                                    path,
> > +                                                    &mount_point,
> > +                                                    NULL)) {
> > +               if (tracker_hal_path_is_on_removable_device 
> > (indexer->private->hal,
> > +                                                    other_path,
> > +                                                    NULL,
> > +                                                    NULL)) {
> > +                       tracker_removable_device_add_move (indexer,
> > +                                                          mount_point,
> > +                                                          path,
> > +                                                          other_path,
> > +                                                          
> > tracker_service_get_name (service));
> > +               } else {
> > +                       tracker_removable_device_add_removal (indexer,
> > +                                                             mount_point,
> > +                                                             path,
> > +                                                             
> > tracker_service_get_name (service));
> > +               }
> > +       }
> 
> Nope, the logic of that code is good
> 
> 
> > 33. APIs like this really confuse me:
> > 
> >    tracker_removable_device_add_move()
> >    tracker_removable_device_add_removal()
> > 
> > Can we come up with something a bit more intuitive?
> > 
> 
> Nope. It means "add a record about a move", "add a record about a
> remove".
> 
> It's not the same as updating your database as the Turtle file is an
> always-append format. You append "that you removed something" and "that
> you moved something". You don't go into the file and remove something,
> nor do you go into the file and move something. You append that piece of
> info, in the form of an "event", to the file. That's the format.
> 
> That's why it's called that way.
> 
> > 34. We don't need to allocate memory here:
> > 
> > +       values[0] = g_strdup (triple->object);
> > +       values[1] = NULL;
> > ...
> > +
> > +       g_free (values[0]);
> > 
> 
> Fixed
> 
> > 35. All request IDs are generated before anything else. This is for 
> > debugging purposes and so we know how many requests we have _REALLY_ 
> > received regardless of if there was some parameter check failure.
> > This 
> > needs to be before any tracker_dbus_async_return_if_fail() calls:
> > 
> > +       request_id = tracker_dbus_get_next_request_id ();
> > 
> 
> Fixed
> 
> > 36. This normally comes up as a warning depending on compiler flags, 
> > "(void)" is needed:
> > 
> > +static gchar *
> > +get_turtle_userdata_backup_filename ()
> 
> Me fixed for Ivan
> 
> 
> > 37. Checking private is NULL or not is pointless here because it is 
> > created in main() and never freed - I just noticed :) so I should fix 
> > that. Also, we should store this value in private, since we often get 
> > "finished" more than once from the indexer and recreating the string 
> > here would be unnecessary:
> > 
> > +       if (private) {
> > +               return g_build_filename (private->user_data_dir,
> > +                                        "tracker-userdata-backup.ttl",
> > +                                        NULL);
> > +       } else {
> > +               g_critical ("Directories not initialized");
> > +               return NULL;
> > +       }
> 
> Under supervision of Ivan.
> 
> > 38. This should be fixed now (sending finished twice at the same time). 
> > But also you should know that "finished" is sent whenever the indexer 
> > finishes indexing ANY content from ANY file changes on the disk. So this 
> > solution is broken anyway. It will need fixing:
> > 
> > +       if (counter >= 2) {
> 
> Under supervision of Ivan.
> 
> > 
> > 39. I would rather we are a bit more descriptive for variable names, 
> > which callback_id this is, isn't obvious without searching further in 
> > the code, this is in main() for trackerd:
> > 
> > +       gulong                      callback_id;
> 
> Under supervision of Ivan.
> 
> > 40. More spacing issues:
> > 
> >         if (force_reindex) {
> > +
> > +               gchar              *turtle_file;
> > +
> > +               turtle_file = get_turtle_userdata_backup_filename ();
> 
> Under supervision of Ivan.
> 
> 
> > 41. Eeeek, what is this for? You are duplicating a lot of stuff here! 
> > The tracker_db_manager_init() is called just after this. The 
> > tracker_db_index_manager_init() is also called after this. The 
> > tracker_data_manager_init() is also called after this. As I understand 
> > it you just want to save the turtle file after an index right - 
> > specifically the initial index? If so, then why not just check the 
> > "initial index" state using tracker_status_get_is_first_time_index(). 
> > You could also use this in the "finished" callback.
> > 
> > This code looks to me like it would break a lot of things and it 
> > shouldn't get committed:
> > 
> >         if (force_reindex) {
> > +
> > +               gchar              *turtle_file;
> > +
> > +               turtle_file = get_turtle_userdata_backup_filename ();
> > +
> > +               g_message ("Saving metadata in %s", turtle_file);
> > +
> > +               /* Init the DB stack */
> > +               tracker_db_manager_init (0, &is_first_time_index, TRUE);
> > +
> > +               tracker_db_index_manager_init (0,
> > +                                              
> > tracker_config_get_min_bucket_count (config),
> > +                                              
> > tracker_config_get_max_bucket_count (config));
> > +               
> > +               file_index = tracker_db_index_manager_get_index 
> > (TRACKER_DB_INDEX_FILE);
> > +               email_index = tracker_db_index_manager_get_index 
> > (TRACKER_DB_INDEX_EMAIL);
> > +               
> > +               tracker_data_manager_init (config, language, file_index, 
> > email_index);
> > +               
> > +               tracker_backup_save (turtle_file);
> > +
> > +               /* Shutdown the DB stack */
> > +               tracker_data_manager_shutdown ();
> > +               
> > +               tracker_db_index_manager_shutdown ();
> > +               tracker_db_manager_shutdown ();
> > +
> > +               g_free (turtle_file);
> > +
> 
> Under supervision of Ivan.
> 
> 
> > 42. I don't think you want to do this ONLY on FORCE_REINDEX. Since that 
> > is rarely the case. Although I admit I am not entirely sure what 
> > org_freedesktop_Tracker_Indexer_restore_backup() is supposed to do from 
> > the API name. Does it write the Turtle file to the DB? If so, then why 
> > would you want to write metadata to the database AFTER it has just been 
> > written to with all the latest information from the indexer? If your 
> > intention (on the other hand) is to write initial data to the database 
> > on a reindex before the indexer has had a chance to index the real data 
> > it gets, then I think this is going about it the wrong way.
> > 
> > +       if (flags & TRACKER_DB_MANAGER_FORCE_REINDEX) {
> > +               g_debug ("Setting callback for crawling finish detection");
> > +               callback_id = g_signal_connect (private->processor, 
> > "finished",
> > +                                               G_CALLBACK 
> > (crawling_finished_cb),
> > +                                               &callback_id);
> > +       }
> > +
> 
> Under supervision of Ivan.
> 
> 
> > 43. About tracker-turtle.h:
> > 
> > a) Why is MetadataItem public, it looks unused in the header.
> > b) Why doesn't it use a proper namespace?
> >     i.e. TrackerTurtleMetadataItem
> > c) Why don't the callback typedefs have Tracker in the namespace?
> >     i.e. TrackerTurtleTripleCallback
> > d) What happened to the coding style here?
> > e) Include order is incorrect.
> > f) Includes should have config.h
> > g) G_BEGIN_DECLS and G_END_DECLS are needed.
> > h) Why not use Glib types here? gint, gchar, etc.
> > i) When I saw "stmt" in the code, I had to find out what it was, where
> >     it came from, etc. Namespacing issues there again need sortingout.
> > j) I wouldn't include Jamie's name in the copyright since he didn't 
> > write it.
> 
> All fixed
> 
> 
> > 44. About tracker-turtle.c:
> > 
> > a) The "tracker-turtle.h" include should come after all external
> > includes.
> > b) There is no need to include the same files already included in the 
> > tracker-turtle.h.
> > c) In tracker_turtle_open(), you just print a critical if not 
> > initialised. I would suggest you use g_return_val_if_fail () instead
> > since.
> > d) I would use g_fopen() not fopen() for compatibility across
> > platforms.
> > e) Is turtle->uri a uri or a path? Variable names should reflect this 
> > accurately:
> >     i.e. turtle->uri = raptor_new_uri ("/");
> 
> raptor_new_uri converts the string to a URI, so it's a URI and that
> makes the variable name crystal clear (uri is a URI, indeed).
> 
> > f) Can we use proper variable names like "count" instead of 3 letter 
> > variable names that look awefully like another English word I dare
> > not 
> > utter :)
> >     i.e. guint cnt;
> > g) There are some serious code style issues there.
> > h) I wouldn't include Jamie's name in the copyright since he didn't 
> > write it.
> 
> All fixed
> 
> > 45. About tracker-removable-device.h:
> > 
> > a) Include order is incorrect.
> > b) G_BEGIN_DECLS and G_END_DECLS are needed.
> > c) I wouldn't include Jamie's name in the copyright since he didn't 
> > write it.
> 
> All fixed
> 
> > d) tracker_removable_device_add_move() shouldn't this be 
> > tracker_removable_device_move()?
> 
> No, you are not moving a removable device. You are adding info about the
> moving of a file that is located on a removable device to the metadata
> of the removable device. Your API proposal would make it look like as if
> you re moving the removable device.
> 
> > e) tracker_removable_device_add_removal() shouldn't this be 
> > tracker_removable_device_removal()?
> 
> Same. It has nothing to do with removing the removable device. This has
> to do with adding the removal of a file on the removable device to the
> metadata of the removable device.
> 
> 
> > 46. About tracker-removable-device.c:
> > 
> > a) The #ifdef HAVE_RAPTOR at the top of this file (I would expect) 
> > should go right to the end of the file since it is so high up, but it 
> > doesn't. 
> 
> Uh? Reread the file. It's not because the ifdef is high-up that this
> means that the entire file is being shadowed away with the ifdef.
> 
> > I think we should decide if we are making the whole file 
> > require that define or just parts of it and fix it accordingly.
> 
> Sure, this is being discussed on the ML atm. Leaving as-is.
> 
> > b) Do you really need all those includes or is that a copy and paste 
> > from tracker-indexer.c :)
> 
> Fixed. Seems it was indeed a C/P from tracker-indexer.c.
> 
> > c) Again all typedefs need proper namespacing including the Tracker
> > name 
> > otherwise it is confusing as hell when you see it somewhere.
> > d) Coding style needs fixing.
> 
> All fixed
> 
> 
-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be

_______________________________________________
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list

Reply via email to