Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Em
Hi Yuval, 1. Regarding the performances - the similarity class (And my subtype as well) gets the IDF and TF and SQUARED SUMS calculations as inputs - they just factor them differently. Even though I ignore the values they are being computed. Good point. However I think that these values are

Re: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Alan Woodward
Hi Yuval, You can just override Similarity, rather than DefaultSimilarity - that way you don't burn any CPU cycles on TF/IDF calculations. Alan On 22 Feb 2012, at 07:17, Yuval Kesten wrote: Hi Em, 1. Regarding the performances - the similarity class (And my subtype as well) gets the IDF

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-22 Thread Yuval Kesten
Hi all, Inspired by another thread here (Question about CustomScoreQuery) I am using this solution which is working really well (with one drawback): I discovered that some of my problems were due to the fact that my assumption was wrong: I did have many fields/queries terms with the same field
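
For readers skimming the thread, the scoring goal named in the subject line can be written as plain arithmetic. The sketch below is not Lucene code and is not the solution described in the message above. It assumes one boost per term on the field side and one per term on the query side, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: illustrates the dot product only, no Lucene API involved.
class BoostDotProduct {

    /** Sums fieldBoost * queryBoost over the terms that appear in both maps. */
    static double score(Map<String, Double> fieldBoosts, Map<String, Double> queryBoosts) {
        double sum = 0.0;
        for (Map.Entry<String, Double> q : queryBoosts.entrySet()) {
            Double fieldBoost = fieldBoosts.get(q.getKey());
            if (fieldBoost != null) {
                // Only terms present on both sides contribute to the score.
                sum += fieldBoost * q.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Double> doc = new HashMap<>();
        doc.put("sports", 2.0);
        doc.put("music", 0.5);

        Map<String, Double> query = new HashMap<>();
        query.put("sports", 3.0);
        query.put("travel", 4.0); // no matching field boost, contributes nothing

        System.out.println(score(doc, query));
    }
}
```

TF, IDF and length normalisation play no part in this number, which is why the replies in this thread discuss how to avoid computing them.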

Impact of max merged segment setting

2012-02-22 Thread Vitaly Funstein
Hello, I am currently experimenting with tuning of max merged segment MB parameter on TieredMergePolicy in Lucene 3.5, and seeing significant gains in index writing speed from values dramatically lower than the default (5 GB). For instance, when setting it to 5 or 10 MB, I can see my writing

TaxonomySearch similar words?

2012-02-22 Thread Cheng
Hi, I am using Taxonomy Search to build a facet comprising things such as “/author/American/Mark Twain”. Since the word author has a synonym of writer, can I use writer instead of author to get the path? Currently I can only use exactly the word author to do it. Thanks

Re: TaxonomySearch similar words?

2012-02-22 Thread Shai Erera
Hi Cheng, You will need to use the exact path labels in order to get to the category 'Mark Twain', unless you index multiple paths from start, e.g.: /author/American/Mark Twain /writer/American/Mark Twain The taxonomy index does not process the CategoryPath labels in any way to e.g. produce
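
The "index multiple paths" suggestion can be sketched as a small pre-processing step that runs before categories are handed to the taxonomy writer. This is a sketch only: it uses plain strings rather than the Lucene facet API, it substitutes synonyms for the first path component only, and the class name, method name and synonym table are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expands one category path into its synonym variants.
class PathVariants {

    /**
     * Returns the original path followed by one variant per synonym of its
     * first component. Paths that do not start with '/' are returned unchanged.
     */
    static List<String> expand(String path, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        out.add(path);

        // For "/author/American/Mark Twain" this yields
        // ["", "author", "American/Mark Twain"].
        String[] parts = path.split("/", 3);
        if (parts.length < 2 || !parts[0].isEmpty()) {
            return out;
        }
        String root = parts[1];
        String rest = parts.length == 3 ? "/" + parts[2] : "";

        for (String alt : synonyms.getOrDefault(root, Collections.<String>emptyList())) {
            out.add("/" + alt + rest);
        }
        return out;
    }
}
```

Each returned string would then be indexed as its own category path, so that a lookup under either root label finds the same leaf. The cost is one extra path per synonym in the taxonomy.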

Re: TaxonomySearch similar words?

2012-02-22 Thread Cheng
Thank you. The alternative sounds reasonable. On Thu, Feb 23, 2012 at 12:54 PM, Shai Erera ser...@gmail.com wrote: Hi Cheng, You will need to use the exact path labels in order to get to the category 'Mark Twain', unless you index multiple paths from start, e.g.: /author/American/Mark Twain

date issues

2012-02-22 Thread Jason Toy
I have a solr instance with about 400m docs. For text searches it is perfectly fine. When I do searches that calculate the amount of times a word appeared in the doc set for every day of a month, it usually causes solr to crash with out of memory errors. I calculate this by running ~30

Re: date issues

2012-02-22 Thread findbestopensource
Hi, You could consider storing date field as String in MMDD format. This will save space and it will perform better. Regards Aditya www.findbestopensource.com On Thu, Feb 23, 2012 at 11:55 AM, Jason Toy jason...@gmail.com wrote: I have a solr instance with about 400m docs. For text

When deletes will be removed?

2012-02-22 Thread Ganesh
Hello all, I am using v3.5 with all default options. In my index the deletes are not removed. When will they be removed? I have not run optimize (forced merge). 1618714 Feb 22 20:42 _11y_l.del 499 Feb 22 20:42 _195_k.del 591 Feb 22 20:42 _1hs_l.del 556 Feb 22 20:42 _1pl_l.del

Multiple index vs Single Index

2012-02-22 Thread Ganesh
Hello all, This debate has probably come up in the group many times, but I want to clarify it once more. I was using multiple indexes (per week one index) with previous versions of Lucene (2.4 - 3.0.3). The performance was really good for incremental indexing. I used to optimize once per

Re: date issues

2012-02-22 Thread Jason Toy
Can I still do range searches on a string? It seems like it would be more efficient to store as an integer. Hi, You could consider storing date field as String in MMDD format. This will save space and it will perform better. Regards Aditya www.findbestopensource.com On Thu, Feb
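
On the range question: a date stored as a fixed-width, zero-padded string sorts lexicographically in the same order as the dates themselves, so an inclusive range test works on the string form as well as on an integer. The sketch below assumes a yyyyMMdd layout (the reply only names the format as "MMDD"), and its class and method names are hypothetical. It demonstrates the ordering property only and says nothing about which Solr field type is faster or smaller.

```java
import java.util.Locale;

// Hypothetical helper: two encodings of a calendar date that preserve ordering.
class DateKeys {

    /** Zero-padded yyyyMMdd string, e.g. 2012-02-22 becomes "20120222". */
    static String asString(int year, int month, int day) {
        return String.format(Locale.ROOT, "%04d%02d%02d", year, month, day);
    }

    /** The same digits as a single int, e.g. 2012-02-22 becomes 20120222. */
    static int asInt(int year, int month, int day) {
        return year * 10000 + month * 100 + day;
    }

    /** Inclusive range test on string keys using plain lexicographic comparison. */
    static boolean inRange(String key, String from, String to) {
        return key.compareTo(from) >= 0 && key.compareTo(to) <= 0;
    }

    public static void main(String[] args) {
        String key = asString(2012, 2, 22);
        // February 2012 as an inclusive string range.
        System.out.println(inRange(key, "20120201", "20120229"));
    }
}
```

The padding is what makes this work: without it, "2012222" and "20120222" would no longer compare in date order.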