Re: about RAMDirectory based B/S platform problem

2010-08-17 Thread xiaoyan Zheng
Hey, Anshum u mean Indexwriter based on RAMdirectory must be a singleton/static, yeah, that works, finally success, thanks a lot! Regards Hilly 2010/8/17 Anshum > Hi Hilly, > Seems like you are trying to use an already closed writer. Could you keep > the writer open and continu

Re: lucene usage on TREC data

2010-08-17 Thread Ramneek Maan Singh
Thanks for the info Glen. ~Ramneek On Sun, Aug 15, 2010 at 9:18 AM, Glen Newton wrote: > Lucene has been used - usually as a starting base that has been > modified for specific tasks - by a number of IR researchers for > various TREC challenges. Here are some (there are many more): > > IBM Haif

Re: cluster documents based on fields' values

2010-08-17 Thread Grant Ingersoll
Hi Nik, Inline below. On Aug 15, 2010, at 5:01 PM, Nik Kolev wrote: > Hi, > > I am researching the possibility of using Lucene for discovering > clusters of documents and since I am new to Lucene I decided to > ask the community for advice before I poke the APIs and the internals. > Your input w

Re: LUCENE-2456 (A Column-Oriented Cassandra-Based Lucene Directory)

2010-08-17 Thread Utku Can Topçu
Hi Otis, Thank you for the notice. I'll do so. "What happened with Lucandra?" is really a hard question to answer. After testing the CassandraDirectory and Lucandra against a real-time stream of "large" data, I've concluded that the approach to make this data searchable in Lucene over Cassandra

RE: cluster documents based on fields' values

2010-08-17 Thread Nik Kolev
Thanks Grant. I'll take a look at Solr's faceting. A colleague of mine also discovered solr's clustering component - http://wiki.apache.org/solr/ClusteringComponent. It's still labeled as experimental - does anybody have experience with it? Another option (pointed out by your post: http://www.luc

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Michel Nadeau
Hi Erick, Here's some more details about our structure. First here's an example of document in our index : PrimaryKey= SJAsfsf353JHGada66GH6 (it's a hash) DocType = X Data = This is the data SearchableContent = This is the data DateCreated

Apache Dinner DUS (co-located with FSFE fellowship meetup)

2010-08-17 Thread Isabel Drost
Hello, the evening after FrOSCon - that is on August 22nd 2010 at 7:30p.m. CEST - a combined "FSFE Fellowship meetup/ Apache dinner*" takes place in Tigges in Düsseldorf (Brunnenstraße 1, at Bilker S-Bahnhof). Given it doesn't rain, we'll be sitting outside. Would be great to meet you there f

Re: LUCENE-2456 (A Column-Oriented Cassandra-Based Lucene Directory)

2010-08-17 Thread William Newport
I think it's similar to datagrid directory plugins which are likely higher performance than Cassandra but still have performance issues with large indexes. Sent from my iPhone On Aug 17, 2010, at 12:20 PM, Utku Can Topçu wrote: > Hi Otis, > > Thank you for the notice. I'll do so. > > "What happ

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Erick Erickson
If you have tens of millions of documents, almost all with unique fields that you're sorting on, you'll chew through memory like there's no tomorrow. Have you looked at trie fields? See: http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/ I'm a littl

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Michel Nadeau
Would our approach to limit the search to the top 250 documents (and then sort these 250 documents) work fine? Or is even 250 unique terms with a lot of users bad on memory when sorting? We didn't look at trie fields - I will do though, thanks for the tip! We do store the original 'Data' field (only

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Erick Erickson
Hmmm, I glossed over your comment about sorting the top 250. There's no reason that wouldn't work. Well, one way for, say, dates is to store separate fields: YYYY, MM, DD, HH, MM, SS, MS. That gives you say, 100 year terms, + 12 month +31 days + for a very small total. You pay the price thoug
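To put rough numbers on the separate-fields idea, here is a minimal sketch in plain Python (not Lucene code). It compares the number of unique terms in a single millisecond-resolution timestamp field with the number of unique terms spread across one field per component. The function name `term_counts`, the sample size and the ten-year date range are all assumptions made for illustration, not anything from the thread.

```python
import random
from datetime import datetime, timedelta

def term_counts(timestamps):
    """Return (unique full-timestamp terms, unique terms per component field)."""
    # One term per distinct timestamp, truncated to millisecond resolution.
    full = {ts.strftime("%Y%m%d%H%M%S%f")[:17] for ts in timestamps}
    # One small set of terms per component, as if each were its own field.
    components = {
        "year": {ts.year for ts in timestamps},
        "month": {ts.month for ts in timestamps},
        "day": {ts.day for ts in timestamps},
        "hour": {ts.hour for ts in timestamps},
        "minute": {ts.minute for ts in timestamps},
        "second": {ts.second for ts in timestamps},
        "millisecond": {ts.microsecond // 1000 for ts in timestamps},
    }
    return len(full), {name: len(values) for name, values in components.items()}

# Hypothetical data: 50,000 random timestamps spread over roughly ten years.
rng = random.Random(0)
start = datetime(2000, 1, 1)
span_ms = 10 * 365 * 24 * 3600 * 1000
stamps = [start + timedelta(milliseconds=rng.randrange(span_ms)) for _ in range(50_000)]

full_terms, per_field = term_counts(stamps)
print("single field:", full_terms, "unique terms")
print("separate fields:", sum(per_field.values()), "unique terms in total", per_field)
```

The single field grows with the number of documents, while the component fields are capped by the size of each component's range (12 months, 31 days, and so on). The cost is that sorting has to consult several fields instead of one.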

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Michel Nadeau
I could at least drop hours/mins/sec, we don't need them, so my timestamp could become 'YYYYMMDD', that would cut the number of unique terms at least for dates. What about my other question about numbers : *" We do pad our numbers with zeros though (for example: 10 becomes 0010, etc.) because
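The quoted message is cut off before it gives the reason for padding, so the following is only a sketch of the usual motivation: terms compared as strings sort lexicographically, and zero-padding to a fixed width makes that order agree with numeric order. The helper `pad` and the width of 4 are assumptions taken from the "10 becomes 0010" example.

```python
def pad(n, width=4):
    """Zero-pad a non-negative integer to a fixed width, e.g. 10 -> '0010'."""
    if n < 0 or len(str(n)) > width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return str(n).zfill(width)

values = [10, 9, 200, 1000, 3]

# Plain string sort: '1000' lands before '200' and '9' comes last.
unpadded = sorted(str(v) for v in values)

# Padded string sort: string order now matches numeric order.
padded = sorted(pad(v) for v in values)

print(unpadded)
print(padded)
```

The width has to be chosen up front to cover the largest value expected; anything wider than that breaks the ordering, which is why the helper rejects it.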

Re: "Natural sorting" of documents in a Lucene index - possible?

2010-08-17 Thread Ian Lea
Using NumericField for dates and other numbers is likely to help a lot, and removes padding problems. I'd try that first, or just sort the top n hits yourself. -- Ian. On Tue, Aug 17, 2010 at 8:46 PM, Michel Nadeau wrote: > I could at least drop hours/mins/sec, we don't need them, so my times
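As an illustration of the "sort the top n hits yourself" option, here is a minimal sketch in plain Python rather than Lucene code. It keeps only the n best hits by relevance score and then sorts that small set on another field in application code. The dictionary layout of a hit (`id`, `score`, `date`) and the function name `top_n_then_sort` are hypothetical.

```python
import heapq
from operator import itemgetter

def top_n_then_sort(hits, n, sort_key):
    """Keep the n highest-scoring hits, then sort only those by sort_key."""
    top = heapq.nlargest(n, hits, key=itemgetter("score"))
    return sorted(top, key=itemgetter(sort_key))

# Hypothetical hits as they might come back from a relevance-ranked search.
hits = [
    {"id": "a", "score": 0.9, "date": "20100815"},
    {"id": "b", "score": 0.2, "date": "20100101"},
    {"id": "c", "score": 0.7, "date": "20100301"},
    {"id": "d", "score": 0.8, "date": "20100817"},
]

result = top_n_then_sort(hits, 3, "date")
print([hit["id"] for hit in result])
```

Only n values of the sort field are ever compared, so the memory cost of the sort depends on n (250 in the thread) rather than on the number of unique terms in the index.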

Solr SynonymFilter in Lucene analyzer

2010-08-17 Thread Arun Rangarajan
I am trying to have multi-word synonyms work in lucene using Solr's * SynonymFilter*. I need to match synonyms at index time, since many of the synonym lists are huge. Actually they are really not synonyms, but are words that belong to a concept. For example, I would like to map {"New York", "Los