On Tue, 2006-11-07 at 13:38 +0000, Jamie McCracken wrote:
> please check the file in nautilus (which also uses xdgmime) to see how 
> it identifies it.

Nautilus identifies the file as mimetype text/plain.

> also try opening it in gedit - if it's invalid UTF-8 then it won't be able 
> to and therefore tracker was right to ignore it.

Opening it in gedit poses no problems at all. In fact, I think it was
created in gedit...
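
For anyone who wants to check the UTF-8 question without gedit, here is a
minimal sketch in Python. It only tests whether the bytes decode as UTF-8;
it is not tracker's own check, and the helper names are made up for
illustration.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_valid_utf8(path: str) -> bool:
    """Hypothetical helper: read a whole file and test its bytes."""
    return is_valid_utf8(Path(path).read_bytes())
```

Calling file_is_valid_utf8() on the problem file would show whether
encoding is even a plausible reason for it being skipped.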

> also rerun with --enable-debug to see more detailed info in log

Done. The log says this after creating a copy (called 'other') of the
problematic file:

07 Nov 2006, 22:06:38:735 - File /home/markrian/Desktop/other has
finished changing
07 Nov 2006, 22:06:38:739 - file /home/markrian/Desktop/other is
indexable
07 Nov 2006, 22:06:38:739 - /home/markrian/Desktop/other is not a text
file
07 Nov 2006, 22:06:38:741 - saving basic metadata for *new*
file /home/markrian/Desktop/other with mime unknown and service
type 8
07 Nov 2006, 22:06:38:746 - file /home/markrian/Desktop/other is
indexable
07 Nov 2006, 22:06:38:750 - 0 files are pending with count 0
07 Nov 2006, 22:06:44:139 - Total entities index : 1379
07 Nov 2006, 22:06:44:139 - Please wait while remaining data is flushed
to the inverted word index. This may take some time...
07 Nov 2006, 22:06:44:141 - flushing data (2 words left) - please wait
07 Nov 2006, 22:06:44:142 - flushing data (1 words left) - please wait
07 Nov 2006, 22:06:44:143 - flushing data (0 words left) - please wait
07 Nov 2006, 22:06:44:143 - All data has been flushed - waiting for new
file events...

It seems odd that tracker considers the file indexable, yet says it isn't
a text file and gives it an unknown mimetype, doesn't it?

Mark


> 
> 
> > It seems that the mime type is unknown. If I do the same operation but
> > append a .txt to the file name, the following happens:
> > 
> > 07 Nov 2006, 12:53:32:504 - File /home/markrian/Desktop/place.txt has
> > finished changing
> > 07 Nov 2006, 12:53:32:508 - saving basic metadata for *new*
> > file /home/markrian/Desktop/place.txt with mime text/plain and service
> > type 6
> > 07 Nov 2006, 12:53:32:512 - Extracting Metadata for *new*
> > file /home/markrian/Desktop/place.txt with mime text/plain and service
> > type 6
> > 07 Nov 2006, 12:53:37:906 - Total entities index : 2576
> > 07 Nov 2006, 12:53:37:906 - Please wait while remaining data is flushed
> > to the inverted word index. This may take some time...
> > 07 Nov 2006, 12:53:37:912 - flushing data (17 words left) - please wait
> > [...etc...]
> > 07 Nov 2006, 12:53:37:920 - flushing data (0 words left) - please wait
> > 07 Nov 2006, 12:53:37:920 - All data has been flushed - waiting for new
> > file events...
> > 
> > And searches for eaden return place.txt as a result, but nothing
> > else.
> 
> that's right because only recognised text files (or files that can be 
> converted to text) have their contents indexed. If tracker thinks it's 
> not valid text then it won't get indexed (as is the case here)
> 
> > 
> > If I search for 'jesus ball' then I get the result "Desktop/Jesus May
> > Ball photos" as expected.
> > 
> > The second problem is that I have a file called
> > bills_Maids_Causeway.ods. If I run a search for "bills maids causeway"
> > no results are returned. If I search for "bills_maids_causeway" then I
> > get that one result. The same effect can be seen with files
> > named-with-dashes-like-this.txt.
> 
> that's deliberate - we do not treat underscores or hyphens as word breaks 
> so a name joined by them is effectively one word
> 
> (this is important for searching source code)
> 
> if there are good reasons for also breaking them up then please let me 
> know (does beagle do this?)
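
To make the two options concrete, here is a rough sketch in Python. The
regular expressions and function names are assumptions for illustration
only, not tracker's actual word breaker.

```python
import re


def tokenize_keep_joiners(text):
    """Underscores and hyphens stay inside a word (the behaviour described above)."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9_-]+", text)]


def tokenize_split_joiners(text):
    """Alternative: underscores and hyphens act as word breaks."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
```

With the first function, bills_Maids_Causeway.ods yields the single word
bills_maids_causeway (plus ods), so a query for "bills maids causeway"
has nothing to match. With the second it yields bills, maids and causeway
separately.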
> 
> 
> > 
> > The third issue involves a test file I created, called 'whisper'. The
> > file, which I created in gedit, contains only the line:
> > 
> > I hear the sound of ticking clocks
> > 
> > Tracker's log picked it up, and registered the file with the correct
> > mimetype, text/plain. When I run a search for "ticking here", the file
> > whisper appears in the results. Is this expected behaviour? I would have
> > thought that search implied ticking AND here, not ticking OR here.
> 
> "here" is a stopword and is ignored in a search - check the log file for 
> the exact search terms that were used
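
A small sketch of why "ticking here" still matches under AND semantics
once stopwords are dropped. The stopword set holds only the two words
named in this thread, and the function names are hypothetical; this is
not tracker's code.

```python
# Only the two stopwords mentioned in this thread; the real list is longer.
STOPWORDS = {"here", "last"}


def search_terms(query):
    """Lowercase the query and drop stopwords before searching."""
    return [w for w in query.lower().split() if w not in STOPWORDS]


def matches_all(terms, document):
    """AND semantics: every remaining term must occur in the document."""
    words = set(document.lower().split())
    return all(t in words for t in terms)
```

Here search_terms("ticking here") leaves just ticking, so the whisper file
satisfies the AND of the remaining terms even though it does not contain
"here".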
> 
> 
> 
> > 
> > The fourth and final issue involves the other test file I created,
> > own_way.txt, containing the line:
> > 
> > To the end of the last page.
> > 
> > Searching for 'last' returns no results, and the log contains the
> > following interesting entries for this:
> > 
> > 07 Nov 2006, 13:09:34:256 - Executing search with params Files, last
> > 07 Nov 2006, 13:09:34:257 - tracker_indexer_get_hits: assertion
> > `(indexer && words && words[0] && (limit > 0))' failed
> > 07 Nov 2006, 13:09:34:257 - search returned no results
> 
> "last" is a stopword. See /usr/share/data/languages/stopwords.en for 
> the full list.
> 
> (if you have set the language code to anything other than "en" then see 
> appropriate file)
> 
> We should include code to make sure that assertion does not pop up in 
> those cases.
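
The kind of guard meant here could look roughly like the following,
written in Python for brevity. run_search and the get_hits callback are
hypothetical stand-ins, not the real tracker_indexer_get_hits signature.

```python
# Only the two stopwords mentioned in this thread; the real list is longer.
STOPWORDS = {"here", "last"}


def run_search(query, get_hits):
    """Skip the index lookup when stopword removal leaves no terms."""
    words = [w for w in query.lower().split() if w not in STOPWORDS]
    if not words:
        # Nothing left to look up, so return no hits instead of
        # handing an empty word list to the indexer.
        return []
    return get_hits(words)
```

A search for 'last' would then return an empty result quietly rather than
tripping the assertion.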
> 
> > 
> > I hope this information helps. If anyone would like more details/tests,
> > just say. Again, well done to all involved with this release! Tracker is
> > *totally* awesome.
> 
> only the first issue needs investigation really. Any info from 
> --enable-debug would be helpful
> 
> 

_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
