Re: highlighting performance

2011-06-20 Thread Michael Sokolov
Koji- I'm not familiar with the benchmarking system, but maybe I'll see if I can run that benchmark on my test data as a point of comparison - thanks for the pointer! -Mike On 6/20/2011 8:21 PM, Koji Sekiguchi wrote: Mike, FVH used to be faster for large docs. I wrote FVH section for Lucene

RE: how to do something like sql in () clause

2011-06-20 Thread Hiller, Dean x66079
Thanks much, Dean -Original Message- From: Denis Bazhenov [mailto:dot...@gmail.com] Sent: Monday, June 20, 2011 6:27 PM To: java-user@lucene.apache.org Subject: Re: how to do something like sql in () clause The SQL IN operator behaves as an OR operator. So does Occur.SHOULD. It will match

Re: how to do something like sql in () clause

2011-06-20 Thread Denis Bazhenov
The SQL IN operator behaves as an OR operator. So does Occur.SHOULD. It will match a document only if _one or more_ of the child queries match. BooleanQuery query = new BooleanQuery(); query.add(new TermQuery(new Term("accountId", "1")), Occur.SHOULD); query.add(new TermQuery(new Term("accountId", "2")
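For the query-syntax form of the same OR semantics, the query parser accepts a field applied to a parenthesized group, e.g. accountId:(1 OR 2 OR 3 OR 4). The following is only a minimal sketch of building that string: the class and method names are hypothetical (they are not part of Lucene), and it assumes the values contain no whitespace or query-syntax special characters that would need escaping.

```java
// Hypothetical helper, not part of Lucene: builds a query-parser string
// that matches documents whose field equals any one of the given values.
class InClauseSketch {

    // Assumption: values are simple tokens (no spaces, quotes, or
    // query-syntax special characters), so no escaping is attempted.
    static String inClause(String field, String... values) {
        // Each value becomes an optional (OR) alternative inside one group.
        return field + ":(" + String.join(" OR ", values) + ")";
    }
}
```

Calling InClauseSketch.inClause("accountId", "1", "2", "3", "4") yields a string that can be handed to the query parser. An empty value list produces an empty group, so callers would want to guard against that case.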

Re: highlighting performance

2011-06-20 Thread Koji Sekiguchi
Mike, FVH used to be faster for large docs. I wrote FVH section for Lucene in Action and it said: In contrib/benchmark (covered in appendix C), there’s an algorithm file called highlight-vs-vector-highlight.alg that lets you see the difference between two highlighters in processing time. As of

RE: how to do something like sql in () clause

2011-06-20 Thread Hiller, Dean x66079
But the issue is that it MUST be 1, OR MUST be 2 so does that still work? Also, how do you write that in the query syntax? Thanks, Dean -Original Message- From: Denis Bazhenov [mailto:dot...@gmail.com] Sent: Monday, June 20, 2011 5:50 PM To: java-user@lucene.apache.org Subject: Re: how

Re: how to do something like sql in () clause

2011-06-20 Thread Denis Bazhenov
You could use a BooleanQuery with Occur.SHOULD clauses http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/BooleanClause.Occur.html#SHOULD On Jun 21, 2011, at 9:24 AM, Hiller, Dean x66079 wrote: > I need to do something like a lucene query with > > Where accountId in ( 1,

RE: any documentation on creating a query without query language

2011-06-20 Thread Hiller, Dean x66079
Sweet, thanks, Dean -Original Message- From: Raf [mailto:r.ventag...@gmail.com] Sent: Monday, June 20, 2011 11:34 AM To: java-user@lucene.apache.org Subject: Re: any documentation on creating a query without query language You can always "create" your query by hand, using the various

how to do something like sql in () clause

2011-06-20 Thread Hiller, Dean x66079
I need to do something like a lucene query with Where accountId in ( 1, 2, 3, 4) Is there a way to do that in Lucene Query language? Thanks, Dean This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If

highlighting performance

2011-06-20 Thread Mike Sokolov
Our apps use highlighting, and I expect that highlighting is an expensive operation since it requires processing the text of the documents, but I ran a test and was surprised just how expensive it is. I made a test index with three fields: path, modified, and contents. I made the index using

anyway to store value as bytes?

2011-06-20 Thread Hiller, Dean x66079
I see the API in Lucene is new Field(String, String, Store, Index). Is there any way to store one of the fields as byte[]? Specifically, I would like the value I am looking up to be byte[] instead of String. All my other ones are Strings anyway. Thanks, Dean

Re: any documentation on creating a query without query language

2011-06-20 Thread Raf
You can always "create" your query by hand, using the various Query objects. For example: BooleanQuery bq = new BooleanQuery(); bq.add(new TermQuery(new Term("account", myAccount)), Occur.MUST); bq.add(new TermQuery(new Term("strategy", myStrategy)), Occur.MUST); bq.add(new TermQuery(n

Re: How to deal with not analyzed fields and analyzed ones in the same query

2011-06-20 Thread Raf
You can simply use a KeywordAnalyzer for your NOT_ANALYZED fields. This analyzer, in fact, does not modify your input. Regards, *Raf* On Mon, Jun 20, 2011 at 5:12 PM, G.Long wrote: > Ok, I'll try this. > > But will it work if one of the fields has no analyzers assigned ? > > For example field1

any documentation on creating a query without query language

2011-06-20 Thread Hiller, Dean x66079
I would like to skip creating the query using the query language. Our queries are simple and fixed, like account = :account and strategy = :strategy and date > :date. So I would prefer maybe not to use a parser in the future sometime and am really just wondering how. For now, I am just going to us

Re: How to deal with not analyzed fields and analyzed ones in the same query

2011-06-20 Thread G.Long
Ok, I'll try this. But will it work if one of the fields has no analyzer assigned? For example, field1 is associated with a KeywordAnalyzer, field2 with a StandardAnalyzer, and field3 has no analyzer because it was indexed as Field.Index.NOT_ANALYZED. Is there something to specify in the co

Re: How to deal with not analyzed fields and analyzed ones in the same query

2011-06-20 Thread Erick Erickson
See PerFieldAnalyzerWrapper, then form your query like field1:word1 OR field2:word1 Best Erick On Mon, Jun 20, 2011 at 10:40 AM, G.Long wrote: > Hi :) > > I know it is possible to create a query on different fields with different > analyzers with PerFieldAnalyzer class but is it possible to also

How to deal with not analyzed fields and analyzed ones in the same query

2011-06-20 Thread G.Long
Hi :) I know it is possible to create a query on different fields with different analyzers with the PerFieldAnalyzerWrapper class, but is it possible to also include fields which are not analyzed? I want some fields not to be tokenized (an exact reference of an article for example) and others to be tok

Re: looks like no allowing of paging without counting entire result set?

2011-06-20 Thread Erick Erickson
<<< that if the first page took 3 seconds to come up, the second page took 3 seconds + x seconds>>> This is really suspicious, what all are you trying to do in your process? Because I'm starting to guess that Solr isn't the performance problem here, assuming reasonably-sized pages (e.g. < thousand

RE: looks like no allowing of paging without counting entire result set?

2011-06-20 Thread Hiller, Dean x66079
One more note: We hit a big performance problem in that if the first page took 3 seconds to come up, the second page took 3 seconds + x seconds to come up. This was the major problem we hit. Our client is not a web app but automated software so the timings on the second page really need to b

RE: looks like no allowing of paging without counting entire result set?

2011-06-20 Thread Hiller, Dean x66079
The noSQL world flips indexing upside down. Instead of the database doing it for you, you do it, and this turns out to be a huge advantage in noSQL when I have huge data. I need to create an index on my activity table account, security, activityDate columns...one index for each account instead

Re: getting OutOfMemoryError

2011-06-20 Thread harsh srivastava
Hi Erick, In continuation of my mails below, I have a socket-based multithreaded server that serves on average 1 request per second. The index size is 31GB and the document count is about 22 million. The index directories are first divided into 4 directories and then each is subdivided into 21 directories.

Re: looks like no allowing of paging without counting entire result set?

2011-06-20 Thread Erick Erickson
re: 20020101 to the end of time.. Use a clause like [2002-01-01 TO *] About paging... Yes, you have to start all over again for each search. The basic problem is that you have to score every document each search, the last document scored might be the highest-scoring document. But let's back up a
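The point about paging (every search scores every document, because the last one scored may rank highest) can be illustrated with a small sketch. This is not Lucene's collector code; the class and method names are hypothetical, and plain score values stand in for scored documents. It keeps only the best (pageIndex + 1) * pageSize scores in a bounded min-heap, then returns the slice belonging to the requested page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not Lucene code: paging over scored hits.
class PagingSketch {

    // Returns the scores for one zero-based page, best first.
    // Every score is visited, since any document may turn out to rank highest.
    static List<Double> page(double[] scores, int pageIndex, int pageSize) {
        // Hits that must be retained to reach the end of the requested page.
        int needed = (pageIndex + 1) * pageSize;

        // Min-heap: the weakest retained hit sits on top and is evicted first.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double score : scores) {
            heap.offer(score);
            if (heap.size() > needed) {
                heap.poll(); // drop the weakest hit once the heap is over capacity
            }
        }

        // Order the retained hits best-first, then skip the earlier pages.
        List<Double> best = new ArrayList<>(heap);
        best.sort(Collections.reverseOrder());
        int from = Math.min(pageIndex * pageSize, best.size());
        return new ArrayList<>(best.subList(from, best.size()));
    }
}
```

In this sketch a later page retains more hits than the first page, but the pass over all scores is the same for every page, which is why each page request amounts to a fresh search.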