On Tue, 2006-11-07 at 13:38 +0000, Jamie McCracken wrote:
> please check the file in nautilus (which also uses xdgmime) to see how
> it identifies it.

Nautilus identifies the file as mimetype text/plain.

> also try opening it in gedit - if it's invalid utf8 then it won't be able
> to and therefore tracker was right to ignore it.

Opening it in gedit poses no problems at all. In fact, I think it was
created in gedit...

> also rerun with --enable-debug to see more detailed info in log

Done. The log says this after creating a copy (called 'other') of the
problematic file:

07 Nov 2006, 22:06:38:735 - File /home/markrian/Desktop/other has finished changing
07 Nov 2006, 22:06:38:739 - file /home/markrian/Desktop/other is indexable
07 Nov 2006, 22:06:38:739 - /home/markrian/Desktop/other is not a text file
07 Nov 2006, 22:06:38:741 - saving basic metadata for *new* file /home/markrian/Desktop/other with mime unknown and service type 8
07 Nov 2006, 22:06:38:746 - file /home/markrian/Desktop/other is indexable
07 Nov 2006, 22:06:38:750 - 0 files are pending with count 0
07 Nov 2006, 22:06:44:139 - Total entities index : 1379
07 Nov 2006, 22:06:44:139 - Please wait while remaining data is flushed to the inverted word index. This may take some time...
07 Nov 2006, 22:06:44:141 - flushing data (2 words left) - please wait
07 Nov 2006, 22:06:44:142 - flushing data (1 words left) - please wait
07 Nov 2006, 22:06:44:143 - flushing data (0 words left) - please wait
07 Nov 2006, 22:06:44:143 - All data has been flushed - waiting for new file events...

It seems odd that tracker thinks it's indexable, but isn't a text file,
and has an unknown mimetype, doesn't it?

Mark

> >
> > It seems that the mime type is unknown.
> > If I do the same operation but
> > append a .txt to the file name, the following happens:
> >
> > 07 Nov 2006, 12:53:32:504 - File /home/markrian/Desktop/place.txt has finished changing
> > 07 Nov 2006, 12:53:32:508 - saving basic metadata for *new* file /home/markrian/Desktop/place.txt with mime text/plain and service type 6
> > 07 Nov 2006, 12:53:32:512 - Extracting Metadata for *new* file /home/markrian/Desktop/place.txt with mime text/plain and service type 6
> > 07 Nov 2006, 12:53:37:906 - Total entities index : 2576
> > 07 Nov 2006, 12:53:37:906 - Please wait while remaining data is flushed to the inverted word index. This may take some time...
> > 07 Nov 2006, 12:53:37:912 - flushing data (17 words left) - please wait
> > [...etc...]
> > 07 Nov 2006, 12:53:37:920 - flushing data (0 words left) - please wait
> > 07 Nov 2006, 12:53:37:920 - All data has been flushed - waiting for new file events...
> >
> > And searches for eaden return the result of places.txt, but nothing
> > else.
>
> that's right because only recognised text files (or files that can be
> converted to text) have their contents indexed. If tracker thinks it's
> not valid text then it won't get indexed (as is the case here)
>
> >
> > If I search for 'jesus ball' then I get the result "Desktop/Jesus May
> > Ball photos" as expected.
> >
> > The second problem is that I have a file called
> > bills_Maids_Causeway.ods. If I run a search for "bills maids causeway"
> > no results are returned. If I search for "bills_maids_causeway" then I
> > get that one result. The same effect can be seen with files
> > named-with-dashes-like-this.txt.
>
> that's deliberate - we do not treat underscores or hyphens as word breaks
> so they are effectively one word
>
> (this is important for searching source code)
>
> if there are good reasons for also breaking them up then please let me
> know (does beagle do this?)
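
To make the underscore/hyphen behaviour above concrete, here is a rough
sketch in Python (not Tracker's own C code, which is not shown in this
thread). The two regular expressions and the helper names tokenize() and
matches() are assumptions made up for illustration; they only model the
two policies being discussed: treating '_' and '-' as part of a word, or
as word breaks.

```python
import re

# Hypothetical token patterns; Tracker's real tokenizer is not shown in this thread.
KEEP_JOINED = re.compile(r"[A-Za-z0-9_-]+")  # '_' and '-' count as word characters
SPLIT_ALL = re.compile(r"[A-Za-z0-9]+")      # '_' and '-' act as word breaks


def tokenize(text, pattern):
    """Return the lowercased tokens that `pattern` finds in `text`."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


def matches(query, filename, pattern):
    """True if every query token is also a token of the file name (AND search)."""
    name_tokens = set(tokenize(filename, pattern))
    query_tokens = tokenize(query, pattern)
    return bool(query_tokens) and all(t in name_tokens for t in query_tokens)


if __name__ == "__main__":
    name = "bills_Maids_Causeway.ods"
    print(tokenize(name, KEEP_JOINED))                         # ['bills_maids_causeway', 'ods']
    print(matches("bills maids causeway", name, KEEP_JOINED))  # False
    print(matches("bills_maids_causeway", name, KEEP_JOINED))  # True
    print(matches("bills maids causeway", name, SPLIT_ALL))    # True
```

Under the first policy the whole name is one token, so only the exact
underscore form matches, which is what was reported. Under the second
policy both query forms match, at the cost of splitting identifiers such
as those found in source code.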
>
> >
> > The third issue involves a test file I created, called 'whisper'. The
> > file, which I created in gedit, contains only the line:
> >
> > I hear the sound of ticking clocks
> >
> > Tracker's log picked it up, and registered the file with the correct
> > mimetype, text/plain. When I run a search for "ticking here", the file
> > whisper appears in the results. Is this expected behaviour? I would have
> > thought that search implied ticking AND here, not ticking OR here.
>
> "here" is a stopword and is ignored in a search - check the log file for
> the exact search terms that were used
>
> >
> > The fourth and final issue involves the other test file I created,
> > own_way.txt, containing the line:
> >
> > To the end of the last page.
> >
> > Searching for 'last' returns no results, and the log contains the
> > following interesting entries for this:
> >
> > 07 Nov 2006, 13:09:34:256 - Executing search with params Files, last
> > 07 Nov 2006, 13:09:34:257 - tracker_indexer_get_hits: assertion `(indexer && words && words[0] && (limit > 0))' failed
> > 07 Nov 2006, 13:09:34:257 - search returned no results
>
> "last" is a stopword. see /usr/share/data/languages/stopwords.en for
> full list.
>
> (if you have set the language code to anything other than "en" then see
> appropriate file)
>
> We should include code to make sure that assertion does not pop up in
> those cases.
>
> >
> > I hope this information helps. If anyone would like more details/tests,
> > just say. Again, well done to all involved with this release! Tracker is
> > *totally* awesome.
>
> only the first issue needs investigation really. Any info from
> --enable-debug would be helpful
>
> _______________________________________________
> tracker-list mailing list
> http://mail.gnome.org/mailman/listinfo/tracker-list
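
The two stopword replies above ("ticking here" and "last") can be
sketched roughly as follows. This is Python for illustration only, not
Tracker's C code: the STOPWORDS set, the search_terms() and search()
helpers, and the dict-based index are all hypothetical stand-ins. The
early return models the kind of guard suggested above, so that a query
made up entirely of stopwords never reaches the lookup with an empty
word list.

```python
# Hypothetical stopword set; the real list lives in the stopwords file named above.
STOPWORDS = {"here", "last"}


def search_terms(query, stopwords=STOPWORDS):
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [w for w in query.lower().split() if w not in stopwords]


def search(query, index, stopwords=STOPWORDS):
    """AND-search over `index`, a dict mapping word -> set of file names."""
    terms = search_terms(query, stopwords)
    if not terms:
        # Every word was a stopword: return early instead of handing an
        # empty word list to the lookup code.
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


if __name__ == "__main__":
    # Toy index built by hand from the two test files in this thread.
    index = {
        "hear": {"whisper"},
        "ticking": {"whisper"},
        "end": {"own_way.txt"},
        "page": {"own_way.txt"},
    }
    print(search("ticking here", index))  # {'whisper'}
    print(search("last", index))          # set()
```

With "here" dropped, "ticking here" is effectively a search for
"ticking" alone, so the result is still an AND over the remaining terms
rather than an OR.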
