Mark Florian wrote:
> Well done on the release!
> 
> I downloaded and built the tarball on Ubuntu edgy, using the included
> debian directory without any changes (except 'dch -i'ing to make the
> package with the right version number), and then of course installed the
> new libtrackerclient0, tracker, tracker-gnome-search-tool, tracker-utils
> debs.

Any help getting the debs to use the right version number would be
appreciated. I'm at a loss, and that's why I did not ship debs last night.

> 
> I removed the old ~/.Tracker directory and set the new tracker indexing,
> with the --turbo switch. After that was all done, I ran a few test
> searches, and unfortunately, there still seem to be problems. To be
> sure, I've since rebooted, re-removed ~/.Tracker and set trackerd to
> index everything again, but without --turbo. Same story.
> 
> Firstly, I have a (text) file called "Jesus May Ball photos" with
> various words inside. I ran a search for eaden (it's a name inside the
> file) and this returns no results. I then did a 'cp "Desktop/Jesus May
> Ball photos" Desktop/place', and watched ~/.Tracker/tracker.log. The
> result was this:
> 
> 07 Nov 2006, 12:33:02:221 - File /home/markrian/Desktop/place has
> finished changing
> 07 Nov 2006, 12:33:02:227 - saving basic metadata for *new*
> file /home/markrian/Desktop/place with mime unknown and service type 8
> 07 Nov 2006, 12:33:07:717 - Total entities index : 2567
> 07 Nov 2006, 12:33:07:717 - Please wait while remaining data is flushed
> to the inverted word index. This may take some time...
> 07 Nov 2006, 12:33:07:719 - flushing data (3 words left) - please wait
> 07 Nov 2006, 12:33:07:720 - flushing data (2 words left) - please wait
> 07 Nov 2006, 12:33:07:721 - flushing data (1 words left) - please wait
> 07 Nov 2006, 12:33:07:762 - flushing data (0 words left) - please wait
> 07 Nov 2006, 12:33:07:762 - All data has been flushed - waiting for new
> file events...
>


Service type 8 is "other files". It is assigned when xdgmime reports an
unknown mime type *and* the first 4kb of the file is not valid UTF-8.

Please check the file in nautilus (which also uses xdgmime) to see how
it identifies it.

Also try opening it in gedit. If it is invalid UTF-8 then gedit won't be
able to open it, and tracker was therefore right to ignore it.

Also rerun with --enable-debug to get more detailed info in the log.
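
A minimal sketch of the check described above, written in Python for
illustration rather than taken from tracker's source. The function names
and the convention that an unknown mime type is passed as None are
assumptions made for this example; only the "unknown mime plus invalid
UTF-8 in the first 4kb" rule comes from the explanation above.

```python
import codecs

PROBE_SIZE = 4096  # the "first 4kb" mentioned above


def starts_with_valid_utf8(path, probe_size=PROBE_SIZE):
    """Return True if the first probe_size bytes of the file decode as UTF-8."""
    with open(path, "rb") as handle:
        chunk = handle.read(probe_size)
    # An incremental decoder with final=False tolerates a multi-byte
    # character that is cut in half by the probe boundary.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def is_other_file(mime_type, path):
    """Hypothetical helper: mime_type is None when the lookup reports unknown."""
    return mime_type is None and not starts_with_valid_utf8(path)
```

Running starts_with_valid_utf8() on the problem file gives a quick answer
to the same question as the gedit test.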


> It seems that the mime type is unknown. If I do the same operation but
> append a .txt to the file name, the following happens:
> 
> 07 Nov 2006, 12:53:32:504 - File /home/markrian/Desktop/place.txt has
> finished changing
> 07 Nov 2006, 12:53:32:508 - saving basic metadata for *new*
> file /home/markrian/Desktop/place.txt with mime text/plain and service
> type 6
> 07 Nov 2006, 12:53:32:512 - Extracting Metadata for *new*
> file /home/markrian/Desktop/place.txt with mime text/plain and service
> type 6
> 07 Nov 2006, 12:53:37:906 - Total entities index : 2576
> 07 Nov 2006, 12:53:37:906 - Please wait while remaining data is flushed
> to the inverted word index. This may take some time...
> 07 Nov 2006, 12:53:37:912 - flushing data (17 words left) - please wait
> [...etc...]
> 07 Nov 2006, 12:53:37:920 - flushing data (0 words left) - please wait
> 07 Nov 2006, 12:53:37:920 - All data has been flushed - waiting for new
> file events...
> 
> And searches for eaden return the result of place.txt, but nothing
> else.

That's right, because only recognised text files (or files that can be
converted to text) have their contents indexed. If tracker thinks a file
is not valid text then its contents won't get indexed (as is the case
here).

> 
> If I search for 'jesus ball' then I get the result "Desktop/Jesus May
> Ball photos" as expected.
> 
> The second problem is that I have a file called
> bills_Maids_Causeway.ods. If I run a search for "bills maids causeway"
> no results are returned. If I search for "bills_maids_causeway" then I
> get that one result. The same effect can be seen with files
> named-with-dashes-like-this.txt.

That's deliberate. We do not treat underscores or hyphens as word breaks,
so a name joined with them is effectively one word.

(This is important for searching source code.)

If there are good reasons for also breaking them up then please let me
know (does beagle do this?).
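
To illustrate the effect, here is a small Python sketch of a tokenizer
that keeps underscores and hyphens inside a word. It is not tracker's
parser: the regular expression, the lowercasing, and the choice to split
on "." are all assumptions made for this example.

```python
import re

# Letters, digits, underscores and hyphens all belong to the same word;
# anything else (spaces, dots, ...) is treated as a word break.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def tokenize(text):
    """Split text into lowercase words without breaking on '_' or '-'."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


indexed = tokenize("bills_Maids_Causeway.ods")
query_with_spaces = tokenize("bills maids causeway")
query_with_underscores = tokenize("bills_maids_causeway")
```

Under these assumptions the indexed name yields the single word
"bills_maids_causeway" (plus "ods"), so the query with underscores shares
a word with it while none of "bills", "maids" or "causeway" does.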


> 
> The third issue involves a test file I created, called 'whisper'. The
> file, which I created in gedit, contains only the line:
> 
> I hear the sound of ticking clocks
> 
> Tracker's log picked it up, and registered the file with the correct
> mimetype, text/plain. When I run a search for "ticking here", the file
> whisper appears in the results. Is this expected behaviour? I would have
> thought that search implied ticking AND here, not ticking OR here.

"here" is a stopword and is ignored in a search, so only "ticking" was
actually searched for. Check the log file for the exact search terms that
were used.



> 
> The fourth and final issue involves the other test file I created,
> own_way.txt, containing the line:
> 
> To the end of the last page.
> 
> Searching for 'last' returns no results, and the log contains the
> following interesting entries for this:
> 
> 07 Nov 2006, 13:09:34:256 - Executing search with params Files, last
> 07 Nov 2006, 13:09:34:257 - tracker_indexer_get_hits: assertion
> `(indexer && words && words[0] && (limit > 0))' failed
> 07 Nov 2006, 13:09:34:257 - search returned no results

"last" is a stopword. See /usr/share/data/languages/stopwords.en for the
full list.

(If you have set the language code to anything other than "en" then see
the appropriate file.)

Once the stopword is removed there are no search terms left, which is
what triggers the assertion. We should include code to make sure that
assertion does not pop up in those cases.
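
The kind of guard meant here can be sketched as follows. This is
illustrative Python, not tracker's C code: the two-word stopword set only
contains the words mentioned in this thread, and effective_terms(),
search() and the lookup callback are hypothetical names.

```python
# Illustrative subset only; the real list lives in the stopwords file.
STOPWORDS = {"here", "last"}


def effective_terms(query, stopwords=STOPWORDS):
    """Return the lowercase query words that are not stopwords."""
    return [word for word in query.lower().split() if word not in stopwords]


def search(query, lookup):
    """Run lookup(terms) only if at least one term survives filtering."""
    terms = effective_terms(query)
    if not terms:
        # Every word was a stopword: return no hits instead of calling
        # the index with an empty word list.
        return []
    return lookup(terms)
```

With this guard a query of just "last" returns an empty result without
reaching the index, and "ticking here" is looked up as "ticking" alone.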

> 
> I hope this information helps. If anyone would like more details/tests,
> just say. Again, well done to all involved with this release! Tracker is
> *totally* awesome.

Only the first issue really needs investigation. Any info from
--enable-debug would be helpful.


-- 
Mr Jamie McCracken
http://jamiemcc.livejournal.com/
