Distributed Indexes, Searches and HDFS

2006-09-21 Thread Chris D
Hi List, As a bit of an experiment I'm redoing some of our indexing and searching code to try to make it easier to manage and distribute. The system has to modify its indexes frequently, sometimes in huge batches, and the documents in the indexes are frequently modified (deleted, modified and re

Re: Distributed Indexes, Searches and HDFS

2006-09-21 Thread Yonik Seeley
On 9/21/06, Chris D <[EMAIL PROTECTED]> wrote: The cronjob/link solution, which is quite clean, doesn't work well in a Windows environment. While it's my favorite, no dice... Rats. There may be hope yet for that on Windows. Hard links work on Windows, but the only problem is that you can't renam

Re: Distributed Indexes, Searches and HDFS

2006-09-22 Thread Michael McCandless
I think this is a great question ("what's the best way to really scale up Lucene?"). I don't have a lot of experience in that area so I'll defer to others (and I'm eager to learn myself!). I think understanding Solr's overall approach (whose design I believe came out of the thread you've referen

Re: Distributed Indexes, Searches and HDFS

2006-09-22 Thread Chris D
Afternoon (here anyway), I think understanding Solr's overall approach (whose design I believe came out of the thread you've referenced) is also a good step here. Even if you can't re-use the hard links trick, you might be able to reuse its snapshotting & index distribution protocol. I'll ha