Re: sort question

2012-05-21 Thread Chaoqing Li
Sorry for the confusion. It's the first one you mentioned below. We sort on the discount field, but if the keyword matches the name field, that match needs to be more important than the sort. If we don't sort, how can we implement this requirement? I'm stuck here. And the discount has been converted to a number already, t

Re: lucene (search) performance tuning

2012-05-21 Thread Li Li
Something went wrong when writing in my Android client. If RAMDirectory does not help, I think the bottleneck is the CPU. You may try to tune the JVM, but I do not expect much improvement. The best option is splitting your index into 2 or more smaller ones. You can then use Solr's distributed searching. If the CPU is

Re: lucene (search) performance tuning

2012-05-21 Thread Li Li
On 2012-5-22 at 4:59 AM, "Yang" wrote: > > I'm trying to make my search faster. right now a query like > > name:Joe Moe Pizza address:77 main street city:San Francisco >is this a conjunction query or a disjunction query? > in a index with 20mil such short business descriptions (total size about 3GB) take

Re: Searching accross 2 fields

2012-05-21 Thread Mohit Anchlia
Thanks! Are there any good examples I can look at? In some cases it's a nested document; in other cases it's within the same document. Something like: In the example below I want to search for form.id = 1040 and name = age and value = 20 and return only doc1. Does this also fall under "cross matchin

Re: Searching accross 2 fields

2012-05-21 Thread Mark Harwood
You're describing what I call the "cross matching" problem, which arises if you flatten nested, repeating structures with multiple fields into a single flat Lucene document model. The approach for handling the more complex mappings is to use nested child docs in Lucene; for that, look at BlockJoinQuery. Ho

Re: FilterClause serializable

2012-05-21 Thread Simon Willnauer
We removed almost all Serializable support from Lucene since it was causing many problems and wasn't complete either. Users should serialize classes / logic themselves or use higher-level implementations that deal with that already. simon On Mon, May 21, 2012 at 1:05 PM, Lars Gjengedal wrote: > Hi > > I have not bee

Re: Memory question

2012-05-21 Thread Chris Bamford
This is a progress update on the issue: I have tried several things and they all gave improvements. In order of magnitude they are 1) Reduced heap space from 6GB to 3GB. This on its own has so far been the biggest win, as swapping almost completely stopped after this step. 2) Began limiting t

Re: Grouping Based on Multiple Fields Similarity

2012-05-21 Thread Robby
Hi All, Sorry... I gave the wrong example; it should actually be like this: On Mon, May 21, 2012 at 9:31 PM, Robby wrote: > - Grouping 1, count : 3 > - row id = 1 > - row id = 23 > - row id = 100 > - Grouping 2 > - row id = 11 > - row id = 133 > - ... > Regards, Ro

Grouping Based on Multiple Fields Similarity

2012-05-21 Thread Robby
Hi Everyone, I'm quite new to Lucene and would like to ask if my case below is possible with a Lucene solution. Let's say I have 200,000 rows from a relational table with multiple fields, and I will have them indexed with Lucene. After indexing, I'd like to have a grouping / clustering based on sim

FilterClause serializable

2012-05-21 Thread Lars Gjengedal
Hi, I have not been able to figure out from the change log for 3.5 or from Lucene itself why FilterClause is no longer serializable. http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/changes/Changes.html#3.5.0.test_cases Does anyone have a pointer to the explanation? -- Lars Gjengedal Insper

Re: sort question

2012-05-21 Thread Ian Lea
I'm not clear what you are asking. Are you saying that you want keyword matching to be more important than sorting? If that's the case, don't sort. Or are you saying that sorting of null values isn't doing what you want? Use an actual value instead of null, whatever makes sense in your applicati

Re: Per User filtering of public/common documents

2012-05-21 Thread Ian Lea
Certainly lots of questions, and I can't answer most of them, but a couple of comments/opinions. Collecting all docs will potentially use a lot of memory but isn't necessarily excessively slow. It's generally only doing something like reading field values for all docs that can be prohibitively sl

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Shashi Kant
A related thread on Stackoverflow: http://stackoverflow.com/questions/3215029/nosql-mongodb-vs-lucene-or-solr-as-your-database/3216550#3216550 On Fri, May 18, 2012 at 10:44 AM, Konstantyn Smirnov wrote: > Hi all, > > apologies, if this question was already asked before. > > If I need to store a l

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Apostolis Xekoukoulotakis
There is IndexDocValues (renamed DocValues) in Lucene 4, which maps ids to a value and has different characteristics than the inverted index. If someone could answer my question as well, it entails using a k-v database for personalized ranking (see the previous mail

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Li Li
What do you mean by performance of storage? Lucene just stores all fields of a document (or columns of a row, if in a DB) together. It can only store strings. You can't store int or long (unless you convert it to a string). Retrieving a given field of a document will cause many I/O operations. It's des

Re: Sort runs out of memory

2012-05-21 Thread Toke Eskildsen
On Thu, 2012-05-17 at 23:03 +0200, Robert Bart wrote: > I am running Lucene 3.6 in a system that indexes about 4 billion documents > across several indexes, and I'm hoping to get documents in order of a > certain NumericField. What is the maximum size on any single index, in terms of number of doc

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Konstantyn Smirnov
That's ok, but what is the real difference? Are there any performance tests? I can assume that, up to a 1 GB index size, there will be no noticeable difference between stored fields and something like MongoDB, but what if the index size grows? -- View this message in context: http://lucene.472066.n3.