The KDE thing is very interesting, thanks for the link!  I was hoping for
something cross-platform though.

As regards using Nutch: how would it handle file updates?  It seems to me a
Web crawler would only pick up new files and changes on each crawl, whereas a
desktop search engine like Spotlight, for instance, indexes a file as soon as
it gets made or modified.
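
To make the distinction concrete, here is a minimal Python sketch of the
crawl-style model as I understand it. It assumes nothing about how Nutch
itself works; `snapshot` and `diff_snapshots` are names I made up for
illustration. Each pass records every file's modification time, and changes
only become visible when two passes are compared:

```python
import os


def snapshot(root):
    """Map every file path under root to its last-modified time (ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime_ns
            except OSError:
                # File vanished between listing and stat; skip it.
                continue
    return state


def diff_snapshots(old, new):
    """Return (added, modified, removed) path lists between two passes."""
    added = sorted(p for p in new if p not in old)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    removed = sorted(p for p in old if p not in new)
    return added, modified, removed
```

With this approach the index can be stale for up to one full crawl interval,
whereas a notification-driven indexer would presumably react to each change
as it happens.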

There's also this document I found on the Web (link below).  It describes
some problems with using Nutch on the personal scale owing to its
specialization for web crawling: it says there is a limit on the number of
files crawled per directory and on the size of files crawled.  This was all
I was able to find under "Nutch desktop search" in Google.  However, now that
I look at it more closely, it's from 2004, so Nutch may well have gotten rid
of these problems in the interim.

http://docs.google.com/viewer?a=v&q=cache:bDjjs__eYPcJ:www.commercenet.com/images/0/06/CN-TR-04-04.pdf+nutch+desktop+search&hl=en&gl=us&pid=bl&srcid=ADGEESg12Bq0VDGk3FpevwOHIdbfr1bCkEZ3CH1yojEliyfeCJv_3JhGRe1gMPx66LiywsUYFWJhKKzsLBVoCtATNcghrW4DRLWlT5sd4YhIWMVaQjMKs5xN-8vqTOHFV2pw9bzCtoQY&sig=AHIEtbTpxSL0xmZJxa5CWm8MzDWD4vyAAg

Thanks,

Andrew

On Mon, Aug 15, 2011 at 6:07 AM, Markus Jelsma <[email protected]> wrote:

> With Nutch you can crawl your FS with ease and index to a Solr instance.
> It'll surely work. But you may also be interested in the cool KDE
> technologies that are specifically built for desktop search.
>
> http://thomasmcguire.wordpress.com/2009/10/03/akonadi-nepomuk-and-strigi-explained/
>
> On Monday 15 August 2011 04:41:11 Andrew Naylor wrote:
> > Any suggestions for the best way to get desktop search in the
> > Lucene/Solr/Nutch/Tika ecosystem?  I want to be able to access (from my
> > own program) lists of terms that are indexed and weights for each file, for
> > example, but if a filesystem indexer and index updater already exists
> > somewhere I'd like to use it rather than write my own.
> >
> > I'm planning on working in Clojure, btw, not that that should make any
> > difference.
> >
> > Thanks,
> >
> > Andrew
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
