Questions about MultiSimilarity

2013-06-28 Thread Nikos Voskarides
I am using MultiSimilarity to compute CombSum and I have noticed that the computeNorm() method takes the value of the first Similarity in the array of similarities. Is it safe to use MultiSimilarity with similarities that have different computeNorm() implementations? Also, I would like to perform
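
For readers unfamiliar with the term, CombSum fuses rankings by adding up the scores that several scoring functions assign to the same document. The following is a minimal Python sketch of that idea only; it is not Lucene's MultiSimilarity code, it does not model norms or computeNorm(), and the names `comb_sum`, `scores_a`, and `scores_b` are made up for illustration.

```python
from typing import Callable, Dict, List

# A scorer maps a document id to a score (hypothetical stand-in for a Similarity).
Scorer = Callable[[str], float]


def comb_sum(doc_ids: List[str], scorers: List[Scorer]) -> Dict[str, float]:
    """Return, for each document, the sum of the scores from every scorer."""
    return {doc_id: sum(scorer(doc_id) for scorer in scorers) for doc_id in doc_ids}


# Made-up scores from two scoring functions.
scores_a = {"d1": 1.0, "d2": 0.5}
scores_b = {"d1": 0.25, "d2": 2.0}

combined = comb_sum(
    ["d1", "d2"],
    [lambda d: scores_a.get(d, 0.0), lambda d: scores_b.get(d, 0.0)],
)
```

Because the scores are simply added, scorers that produce values on very different scales will not contribute equally, which is one reason the question about differing norms matters.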

Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
Hi, I'm using lucene and solr right now in a production environment with an index of about a million docs. I'm working on a recommender that basically would list the n most similar items to the user based on the current item he is viewing. I've been thinking of using solr/lucene since I already h

RE: Content based recommender using lucene/solr

2013-06-28 Thread Saikat Kanjilal
Why not just use mahout to do this, there is an item similarity algorithm in mahout that does exactly this :) https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html You can use mahout in distributed and non-distributed mode a

Re: Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
Hey Saikat, thanks for your suggestion. I've looked into mahout and other alternatives for computing k nearest neighbors. I would have to run a job and compute the k nearest neighbors and track them in the index for retrieval. I wanted to see if this was something I could do with lucene using luce

Re: Content based recommender using lucene/solr

2013-06-28 Thread Otis Gospodnetic
Hi, Have a look at http://www.youtube.com/watch?v=13yQbaW2V4Y . I'd say it's easier than Mahout, especially if you already have and know your way around Solr. Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Fri, Jun 28, 2013 at

RE: Content based recommender using lucene/solr

2013-06-28 Thread Saikat Kanjilal
You could build a custom recommender in mahout to accomplish this. Also, just out of curiosity, why the content-based approach as opposed to building a recommender based on co-occurrence? One other thing: what is your data size, are you looking at scale where you need something like hadoop? > Fro

Re: Content based recommender using lucene/solr

2013-06-28 Thread Walter Underwood
More Like This already is kNN. It extracts features from the document (makes a query), and runs that query against the collection. If you want the items most similar to the current item, use MLT. wunder On Jun 28, 2013, at 11:02 AM, Luis Carlos Guerrero Covo wrote: > Hey saikat, thanks for you
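
The description above (extract features from the document, turn them into a query, run the query against the collection) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Lucene's MoreLikeThis implementation: the term weighting, the overlap-count scoring, and the function names `top_terms` and `more_like_this` are all invented here.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def top_terms(doc: str, docs: List[str], k: int = 3) -> List[str]:
    """Pick the k terms of `doc` with the highest tf-idf style weight."""
    tf = Counter(tokenize(doc))
    n = len(docs)

    def idf(term: str) -> float:
        df = sum(1 for d in docs if term in set(tokenize(d)))
        return math.log(1 + n / (1 + df))

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(tf, key=lambda t: (-tf[t] * idf(t), t))[:k]


def more_like_this(idx: int, docs: List[str], k_terms: int = 3, n: int = 2) -> List[int]:
    """Return the indexes of up to n documents sharing the most query terms with docs[idx]."""
    query = set(top_terms(docs[idx], docs, k_terms))
    scored = []
    for j, d in enumerate(docs):
        if j == idx:
            continue
        overlap = len(query & set(tokenize(d)))
        if overlap:
            scored.append((overlap, j))
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [j for _, j in scored[:n]]


docs = ["red running shoes", "blue running shoes", "red wool sweater", "garden hose"]
similar = more_like_this(0, docs)
```

The nearest neighbors are whatever the generated query ranks highest, which is why no separate batch job is needed to precompute them.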

Re: Content based recommender using lucene/solr

2013-06-28 Thread Otis Gospodnetic
Hi, It doesn't have to be one or the other. In the past I've built a news recommender engine based on CF (Mahout) and combined it with Content Similarity-based engine (wasn't Solr/Lucene, but something custom that worked with ngrams, but it may have as well been Lucene/Solr/ES). It worked well.

Re: Content based recommender using lucene/solr

2013-06-28 Thread Luis Carlos Guerrero Covo
I only have about a million docs right now so scaling is not a big issue. I'm looking to provide a quick implementation and then worry about scale when I get around to implementing a more robust recommender. I'm looking at a content based approach because we are not tracking users and items viewed

How to Perform a Full Text Search on a Number with Leading Zeros or Decimals?

2013-06-28 Thread Todd Hunt
I have an application that is indexing the text from various reports and forms that are generated from our core system. The reports will contain dollar amounts and various indexes that contain all numbers, but have leading zeros. If a document contains the following text that is stored in one

Re: How to Perform a Full Text Search on a Number with Leading Zeros or Decimals?

2013-06-28 Thread Jack Krupansky
The user could use a regular expression query to match the numbers, but otherwise, you will have to write some specialized token filter to recognize numeric tokens and generate extra tokens at the same position for each token variant that you want to search for. -- Jack Krupansky -Origina

Re: How to Perform a Full Text Search on a Number with Leading Zeros or Decimals?

2013-06-28 Thread Uwe Schindler
You can add PatternReplaceFilter (http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/pattern/PatternReplaceFilter.html) to replace the tokens only consisting of digits by their variant with leading zeroes removed. Uwe Jack Krupansky wrote: >The user could use
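
As a rough illustration of the replacement described above, the Python sketch below strips leading zeros from tokens that consist only of digits and leaves every other token alone. It demonstrates the pattern logic only; it is not a Lucene analyzer configuration, and the pattern shown is an assumption about what one might pass to such a filter, not something taken from the thread.

```python
import re

# Matches one or more leading zeros that are followed by at least one more digit,
# so a token such as "0" or "000" still keeps a single zero.
LEADING_ZEROS = re.compile(r"^0+(?=[0-9])")
ALL_DIGITS = re.compile(r"[0-9]+")


def strip_leading_zeros(token: str) -> str:
    """Remove leading zeros from all-digit tokens; return other tokens unchanged."""
    if ALL_DIGITS.fullmatch(token):
        return LEADING_ZEROS.sub("", token)
    return token


tokens = ["000123", "0", "000", "12.50", "00ab"]
normalized = [strip_leading_zeros(t) for t in tokens]
```

Applying the same normalization at both index time and query time is what lets a search for 123 match a document containing 000123.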

In memory index (current status in Lucene)

2013-06-28 Thread Emmanuel Espina
I'm building a distributed index (mostly as a research project for school) and I'm evaluating indexing the entire collection in memory (like Google, Facebook and others have done years ago). The obvious reason for this is performance considering that the replication will give me a reasonably good

Re: In memory index (current status in Lucene)

2013-06-28 Thread Steven Schlansker
On Jun 28, 2013, at 2:29 PM, Emmanuel Espina wrote: > I'm building a distributed index (mostly as a research project for > school) and I'm evaluating indexing the entire collection in memory > (like Google, Facebook and others have done years ago). The obvious > reason for this is performance c