Re: FieldCache

2011-10-22 Thread Simon Willnauer
I think I'd try to use a bitset instead of a string for your categories. Is that possible? Roughly how many categories do you have? simon On Sat, Oct 22, 2011 at 6:01 AM, Peyman Faratin wrote: > Hi > > I have a field that is indexed as follows > > for(String c: article.getCategories()){ >
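
To illustrate the bitset suggestion, here is a minimal sketch in plain Java using `java.util.BitSet`. It assumes the set of categories is small enough to assign each one a bit position. The `CategoryBits` class and its methods are hypothetical names for illustration and are not part of Lucene or of the original poster's code.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CategoryBits {
    // Hypothetical dictionary: category name -> bit position (assigned on first use).
    private final Map<String, Integer> ordinals = new HashMap<>();

    /** Returns the bit position for a category, assigning a new one if unseen. */
    public int ordinalOf(String category) {
        Integer existing = ordinals.get(category);
        if (existing != null) {
            return existing;
        }
        int next = ordinals.size();
        ordinals.put(category, next);
        return next;
    }

    /** Encodes a document's categories as one BitSet instead of many strings. */
    public BitSet encode(List<String> categories) {
        BitSet bits = new BitSet();
        for (String c : categories) {
            bits.set(ordinalOf(c));
        }
        return bits;
    }

    /** Membership test: a single bit lookup, no string comparison. */
    public boolean hasCategory(BitSet doc, String category) {
        Integer ord = ordinals.get(category);
        return ord != null && doc.get(ord);
    }
}
```

Whether this helps depends on how many distinct categories there are, which is the question asked above. With a few hundred categories, each document costs a few dozen bytes. With millions, a sparse representation may be a better fit.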

Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
Hi All, I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..." (http://na11.apachecon.com/talks/18396). It's based on my observation that over the years a number of us in the community have done some pretty cool things using Lucene that don't fit under the core premise of

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Paul Libbrecht
Grant, for years the ActiveMath learning environment has been using Lucene as its storage engine. At the time (~2004), it was by far the best storage engine available in the pure-Java world. It is still perfect in terms of performance. We had an issue with the separate versions where the stored-fields w

Re: No longer able to set merge factor since updating to Lucene 3.4

2011-10-22 Thread Michael McCandless
Hmm, this is because as of 3.2.0 the default MergePolicy is now TieredMergePolicy. But: if you pass Version.LUCENE_31 when you create the IndexWriterConfig you should get the old default (LogMergePolicy) and then IW.setMergeFactor should work. But it's better to use TieredMergePolicy (it's able t

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Sujit Pal
Hi Grant, Not sure if this qualifies as a "bet you didn't know", but one could use Lucene term vectors to construct document vectors for similarity, clustering and classification tasks. I found this out recently (although I am probably not the first one), and I think this could be quite useful. -
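
As a rough illustration of the document-vector idea, the sketch below computes cosine similarity between two term-frequency maps. It is plain Java and does not call Lucene. It assumes the term frequencies have already been obtained (for example from stored term vectors), and the `TermVectorCosine` class and the whitespace tokenization are stand-ins invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class TermVectorCosine {
    /** Builds a term -> frequency map from whitespace-separated text (a stand-in for a real analyzer). */
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine similarity of two sparse term-frequency vectors; 0.0 if either is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += (double) va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += (double) va * vb;
            }
        }
        for (int vb : b.values()) {
            normB += (double) vb * vb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Raw term frequencies are used here for brevity. Weighting by inverse document frequency is a common refinement for clustering and classification, but it is not shown.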

using lucene to find neighbouring points in an n-dimensional space

2011-10-22 Thread prasenjit mukherjee
My use case is the following: given an n-dimensional vector (positive coordinates only), find its closest neighbours. I would like to try this with Lucene's default ranking. Here is what a typical document will look like: ( or same thing ) doc1 = 1245:15 3490:20 8856:20 etc. As reflected in th
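
For comparison with whatever Lucene's ranking returns, a brute-force baseline may be useful. The sketch below parses the `dimension:value` format shown above into a sparse map and finds the closest document by Euclidean distance. This is not Lucene's default scoring. The `SparseNeighbours` class, its method names, and the choice of Euclidean distance are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SparseNeighbours {
    /** Parses "1245:15 3490:20" into a sparse vector: dimension id -> value. */
    public static Map<Integer, Double> parse(String line) {
        Map<Integer, Double> vector = new HashMap<>();
        for (String pair : line.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue;
            }
            String[] kv = pair.split(":");
            vector.put(Integer.parseInt(kv[0]), Double.parseDouble(kv[1]));
        }
        return vector;
    }

    /** Euclidean distance; a dimension missing from a vector counts as 0. */
    public static double distance(Map<Integer, Double> a, Map<Integer, Double> b) {
        Set<Integer> dims = new HashSet<>(a.keySet());
        dims.addAll(b.keySet());
        double sum = 0.0;
        for (int d : dims) {
            double diff = a.getOrDefault(d, 0.0) - b.getOrDefault(d, 0.0);
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Linear scan over all documents; returns the id of the closest one, or null if there are none. */
    public static String nearest(Map<Integer, Double> query, Map<String, Map<Integer, Double>> docs) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Map<Integer, Double>> e : docs.entrySet()) {
            double d = distance(query, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

A linear scan like this only suits small collections, but it gives a ground truth against which an index-based ranking can be checked.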

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Wouter Heijke
Hi Grant, Here are two cases from work I've done that I can think of: - use Lucene to match products in a database with eBay auctions; the title of the auction is used as the query to Lucene. - use a servlet filter and Lucene to map well-formed URLs into a website to its individual (product) page

Re: Language Identifier with Lucene?

2011-10-22 Thread Petite Abeille
On Oct 22, 2011, at 2:49 AM, Luca Rondanini wrote: > I usually use Nutch for this but, just for fun, I tried to create a language > identifier based on Lucene only. Talking of which: Google's Compact Language Detector http://blog.mikemccandless.com/2011/10/language-detection-with-googles-compac

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Grant Ingersoll
On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote: > Hi Grant, > > Not sure if this qualifies as a "bet you didn't know", but one could use > Lucene term vectors to construct document vectors for similarity, > clustering and classification tasks. I found this out recently (although > I am probably no

Re: Bet you didn't know Lucene can...

2011-10-22 Thread Shashi Kant
Using Lucene as a recommendation engine. On Sat, Oct 22, 2011 at 6:33 PM, Grant Ingersoll wrote: > > On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote: > >> Hi Grant, >> >> Not sure if this qualifies as a "bet you didn't know", but one could use >> Lucene term vectors to construct document vectors for