[Tracker] tracker as full text index/search tool for a large collection of pdf, ps, djvu, dvi documents?

Meik Hellmund Sun, 05 Oct 2008 06:55:47 -0700

Dear tracker developers,

I have a collection of ca. 10,000 documents, mostly in PostScript, PDF,
DjVu and DVI format, and I am looking for a full-text index/search
tool. I tried tracker 0.6.6 from Debian/unstable and now have some
questions whose answers I could not find in the documentation or FAQ.


 - Tracker works great with PDF documents. Full points!
   That's what I am looking for.
   But:

 - It seems that PostScript, DVI and DjVu documents are not fully
   indexed; only their metadata is used. How can I change this?

 - It seems that DjVu files are classified as "images".
   This may be true in a technical sense, but DjVu is a format
   designed especially for scanned text, and most DjVu documents are
   scanned books and similar material.
   I think you should reclassify them as "documents".

 - How about compressed files? The documentation mentions that .gz
   files are supported. What about .bz2? Is it possible to add a filter
   for other compression methods?

 - Are there plans to extend the query capabilities with respect to
   the full-text index? E.g., queries for documents containing one
   word but not another, or containing several words within a short
   distance of each other?

 - At the moment my collection of documents is mostly organized in a
   hierarchy of directories. Is it possible to take this into
   account in queries, e.g., to query only for documents from a
   subtree of the indexed tree?
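
To make the last two questions concrete, here is a rough Python sketch
of the query semantics I mean. It is not Tracker's API: all function
names are made up for illustration, and it scans plain strings instead
of using a real index.

```python
import re
from pathlib import PurePosixPath


def tokens(text):
    """Split text into lowercase words (hypothetical, very crude tokenizer)."""
    return re.findall(r"\w+", text.lower())


def contains_but_not(text, wanted, unwanted):
    """True if the text contains `wanted` but does not contain `unwanted`."""
    words = set(tokens(text))
    return wanted.lower() in words and unwanted.lower() not in words


def within_distance(text, first, second, max_gap):
    """True if `first` and `second` occur at most `max_gap` words apart."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)


def in_subtree(path, root):
    """True if `path` is `root` itself or lies somewhere below it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def search(docs, root, wanted, unwanted):
    """Paths below `root` whose text has `wanted` but not `unwanted`.

    `docs` maps a file path to its already extracted text (an assumption
    made only for this sketch).
    """
    return sorted(
        path
        for path, text in docs.items()
        if in_subtree(path, root) and contains_but_not(text, wanted, unwanted)
    )
```

For example, search(docs, "/lib/algebra", "lie", "manifolds") should
return only those files below /lib/algebra that mention "lie" and never
mention "manifolds".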

I know this is quite a list of questions. Pointers to answers to any
of them are very welcome.

Of course, tracker may simply be the wrong tool for what I want. Any
pointers to alternatives are also welcome. 

-- 
Meik Hellmund
Mathematisches Institut, Uni Leipzig
e-mail: [EMAIL PROTECTED]
http://www.math.uni-leipzig.de/~hellmund
_______________________________________________
tracker-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/tracker-list
