Re: date issues

2012-02-23 Thread findbestopensource
Yes. If you store the value as a String, you should be able to do range searches. I am not sure which is better, storing it as a String or as an Integer. Regards, Aditya www.findbestopensource.com On Thu, Feb 23, 2012 at 1:25 PM, Jason Toy jason...@gmail.com wrote: Can I still do range searches on a string? It seems

Re: date issues

2012-02-23 Thread Danil ŢORIN
Range queries on String fields are painfully slow. Format the dates as MMDD and store them as class=solr.TrieIntField precisionStep=8 omitNorms=true positionIncrementGap=0 On Thu, Feb 23, 2012 at 10:19, findbestopensource findbestopensou...@gmail.com wrote: Yes. By storing as String, You should be able to do range
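
The integer encoding suggested in this reply can be sketched in plain Java, with no Solr dependency. This is only an illustration under stated assumptions: it uses a full yyyyMMdd key (the archived message reads "MMDD", which may be truncated), and the names `DateKey`, `toKey`, and `inRange` are hypothetical, not part of Solr or Lucene.

```java
import java.time.LocalDate;

// Hypothetical helper: packs a calendar date into a sortable int such as 20120223.
class DateKey {
    // Assumption: a yyyyMMdd layout, so numeric order matches chronological order.
    static int toKey(LocalDate d) {
        return d.getYear() * 10000 + d.getMonthValue() * 100 + d.getDayOfMonth();
    }

    // An inclusive range test on the packed keys; a numeric field type would
    // evaluate the equivalent comparison inside the index.
    static boolean inRange(int key, int fromInclusive, int toInclusive) {
        return key >= fromInclusive && key <= toInclusive;
    }

    public static void main(String[] args) {
        int key = toKey(LocalDate.of(2012, 2, 23));
        System.out.println(key);                               // 20120223
        System.out.println(inRange(key, 20120201, 20120229));  // true
    }
}
```

Because the key is a plain int, a date range becomes a single numeric range over the packed values.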

data extraction architecture

2012-02-23 Thread chris chisolm
I'm relatively new to this field and I have a problem that seems to be solvable in lots of different ways, and I'm looking for some recommendations on how to approach a data refining pipeline. I'm not sure where to look for this type of architecture description. My best finds so far have been

Re: When deletes will be removed?

2012-02-23 Thread Ian Lea
Eventually, as more modifications take place and merges are triggered. If you really care, and are using the default TieredMergePolicy, you could try playing with TieredMergePolicy.setForceMergeDeletesPctAllowed(double v). Might help. Or you could call IndexWriter.forceMergeDeletes(). The

Re: Multiple index vs Single Index

2012-02-23 Thread Ian Lea
Millions of docs in a single index is definitely OK. If it was my system I'd willingly trade slightly slower indexing for simplicity and ease of use. If it works and is fast enough, job done. -- Ian. On Thu, Feb 23, 2012 at 7:31 AM, Ganesh emailg...@yahoo.co.in wrote: Hello all, This

Re: Multiple index vs Single Index

2012-02-23 Thread Ian Lea
Well, you certainly can force a merge if you wish. I guess it's a trade-off: an expensive, disk-intensive operation that may make other operations quicker. Your choice. I only have one set of multi-million doc indexes whose performance I care about and they are updated in bulk every night

RE: Custom lucene scoring - Dot product between field boost and query boost

2012-02-23 Thread Yuval Kesten
One important thing: since I am not using the norms of the indexed documents' fields (the weight is the value of the field itself), I am now indexing the fields using: Field field = new Field(field_name, Float.toString(weight), Store.YES, Index.NOT_ANALYZED_NO_NORMS); And the memory usage is back to
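
As a rough illustration of the idea in this thread's subject line, the sketch below computes a dot product between per-field weights (stored as strings, as in the `Float.toString(weight)` call above) and per-field query boosts. It is plain Java and does not use Lucene; the class name `DotProductSketch`, the method `score`, and the map-based representation are assumptions made for the example, not the poster's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: score = sum over shared fields of (stored weight * query boost).
class DotProductSketch {
    // storedWeights maps field name -> weight serialized with Float.toString,
    // mirroring how the message stores the weight as the field's value.
    static float score(Map<String, String> storedWeights, Map<String, Float> queryBoosts) {
        float sum = 0f;
        for (Map.Entry<String, Float> q : queryBoosts.entrySet()) {
            String stored = storedWeights.get(q.getKey());
            if (stored != null) {
                // Fields missing from the document contribute nothing.
                sum += Float.parseFloat(stored) * q.getValue();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("a", Float.toString(0.5f));
        doc.put("b", Float.toString(2.0f));

        Map<String, Float> query = new HashMap<>();
        query.put("a", 2.0f);
        query.put("b", 1.0f);
        query.put("c", 3.0f); // not present in the document

        System.out.println(score(doc, query)); // 0.5*2.0 + 2.0*1.0 = 3.0
    }
}
```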

Re: date issues

2012-02-23 Thread Erick Erickson
1. Don't use sint; it's being deprecated, and it'll take up more space than a TrieDate. 2. Precision: sure, use the coarsest time you can; normalizing everything to day would be a good thing. You won't get any space savings by storing to day resolution, it's just a long under the covers. But

Custom scoring

2012-02-23 Thread Damerian
Hello, I am trying to implement my own Jaccard similarity for Lucene. So far I have the following code: public class JaccardSimilarity extends DefaultSimilarity { int numberOfDocumentTerms; //String field=contents; // Should the Jaccard similarity be only based in the contents field
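
For reference, the Jaccard coefficient itself (size of the intersection divided by size of the union of two term sets) can be sketched in plain Java as below. This is not the poster's class, whose code is cut off in the archive, and it does not plug into Lucene's Similarity API; `JaccardSketch` and `jaccard` are hypothetical names, and returning 0 for two empty sets is a convention chosen for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the set-based Jaccard coefficient |A ∩ B| / |A ∪ B|.
class JaccardSketch {
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumed convention: two empty sets score 0
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        // |A ∪ B| = |A| + |B| - |A ∩ B|
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<String> queryTerms = new HashSet<>(Arrays.asList("lucene", "custom", "scoring"));
        Set<String> docTerms = new HashSet<>(Arrays.asList("lucene", "scoring", "index", "merge"));
        // 2 shared terms, 5 distinct terms overall
        System.out.println(jaccard(queryTerms, docTerms)); // 0.4
    }
}
```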

Re: Custom scoring

2012-02-23 Thread Ahmet Arslan
The problem is that the coord() method is not used (at least as far as I understand), either in searching or in indexing. What am I doing wrong? If you want to see coord() values, use a multi-word query (two or more query terms) and go to the last page of the result set.
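
A small sketch of why a multi-word query is needed to see the effect, assuming the default formula overlap / maxOverlap used by DefaultSimilarity.coord(int overlap, int maxOverlap). The class `CoordSketch` is a hypothetical stand-alone illustration in plain Java, not Lucene code.

```java
// Hypothetical stand-alone illustration of the default coordination factor.
class CoordSketch {
    // overlap: number of query terms the document matched
    // maxOverlap: total number of terms in the query
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // With a one-term query every hit matches 1 of 1 terms, so the factor is always 1.
        System.out.println(coord(1, 1)); // 1.0
        // With a three-term query, a document matching only one term is scaled down.
        System.out.println(coord(3, 3)); // 1.0
        System.out.println(coord(1, 3));
    }
}
```

Documents that match only some of the query terms tend to rank lowest, which is presumably why the reply points to the last page of results: that is where factors below 1 show up.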