RE: Improve Search Speed

2018-08-16 Thread thturk
Thank you for your advice. As I researched, many people suggested the same thing to me: making better, more complex queries to get more specific results. But I didn't exactly get what more specific queries are. More indexed fields, and many different kinds of boolean queries in them? Mostly I am using fuzzy

RE: Improve Search Speed

2018-08-15 Thread Uwe Schindler
Hi, In general, the speed of a Lucene search depends on several aspects; one of those is of course the underlying hardware and its I/O performance. One way to improve search performance is to use SSDs or to increase the available file system cache of your operating system. Full text search engines

Improve Search Speed

2018-08-15 Thread thturk
Hey, I am new to Lucene and I want to search indexes in *100* ms. File size is *2 GB*, the indexes are separated by type (6 files), and there are *20m records* in the file. I set the searcher and reader in the constructor, and I have a boolean query which includes a fuzzy query and a wildcard query. ev

Re: Lucene Speed

2018-07-18 Thread Michael McCandless
Hi Ehson, Have you looked at the luceneutil source code that runs the benchmarks? https://github.com/mikemccand/luceneutil The sources are not super clean, but that's what's running the nightly benchmarks, starting from src/main/perf/Indexer.java. Mike McCandless http://blog.mikemccandless.com

Re: Lucene Speed

2018-07-18 Thread Adrien Grand
Have you already checked https://wiki.apache.org/lucene-java/ImproveIndexingSpeed? Often when running such benchmarks, the bottleneck is not indexing but opening or parsing input files, so you should review that part as well. On Wed, 18 Jul 2018 at 16:12, Ehson Umrani wrote: > Hello, > > My

Lucene Speed

2018-07-18 Thread Ehson Umrani
Hello, My name is Ehson Umrani and I am currently running some experiments using Lucene. For the experiments I am running I need Lucene to run as fast as possible. Do you have any suggestions on how to achieve the speeds listed on the nightly benchmark page? I am also using 1kb Wikipedia files and

how to speed up the Scorer

2017-08-17 Thread wu...@mxtrip.cn
hi~ I have some problems when I use the Lucene.

How to use FieldCache and Custom Collector to improve search speed

2017-04-10 Thread neeraj shah
I am using Lucene 3.6 and I am trying to implement FieldCache. I have seen some posts but did not get a clear idea. Can anyone please suggest a link where I can find a proper example of FieldCache and how to use it while searching?

Re: Lucene indexing speed on NVMe drive

2015-05-01 Thread Michael McCandless
ocs) with similar parameters > as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 processor ,40 with > hyperthreading, 64G Memory) and study indexing speed on HDD, SSD and NVMe. > While I do see benefit when switching from HDD to SSD, there is not much > noticeable benefit m

RE: Lucene indexing speed on NVMe drive

2015-04-30 Thread Anahita Shayesteh-SSI
AM To: java-user@lucene.apache.org Cc: Anahita Shayesteh-SSI Subject: Re: Lucene indexing speed on NVMe drive : Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. : parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 : proc

Re: Lucene indexing speed on NVMe drive

2015-04-30 Thread Chris Hostetter
: Hi. I am studying Lucene performance and in particular how it benefits from faster I/O such as SSD and NVMe. : parameters as used in nightlyBench. (Hardware: Intel Xeon, 2.5GHz, 20 : processor ,40 with hyperthreading, 64G Memory) and study indexing speed ... : I get best performance

Lucene indexing speed on NVMe drive

2015-04-30 Thread Anahita Shayesteh-SSI
study indexing speed on HDD, SSD and NVMe. While I do see benefit when switching from HDD to SSD, there is not much noticeable benefit moving to NVMe. I get best performance (200GB/hour) with 20 indexing threads, increasing number of threads to 40 hurts performance. Similarly increasing

Re: Re: Speed up searching in multiple-thread?

2014-09-15 Thread Toke Eskildsen
On Mon, 2014-09-15 at 11:41 +0200, Harry Yu wrote: > 17ms / searches is the whole process of search service, include > accessing complete data form db, calling REST service etc. Try looking at QTime in solr.log and compare it with your measured response times, to see if it is Solr or your other s

Re: Speed up searching in multiple-thread?

2014-09-15 Thread Harry Yu
rvice, include accessing complete data from db, calling REST service etc. Regards, Harry Yu -- Original Message -- From: "Toke Eskildsen"; Sent: Monday, 15 September 2014, 4:47 PM To: "java-user@lucene.apache.org"; Subject: Re: Speed up searching in multiple-thread?

Re: Speed up searching in multiple-thread?

2014-09-15 Thread Toke Eskildsen
On Mon, 2014-09-15 at 09:10 +0200, Harry Yu wrote: > I'm developing poi search application using lucene 4.8 . Recently, I > met a trouble that the performance of IndexSearcher.search is bad in > multiple-thread environment. According the test results, I found that > if thread number is 1, the resp

Re: Speed up searching in multiple-thread?

2014-09-15 Thread Harry Yu
. Best Regards, Harry Yu -- Original Message -- From: "Michael McCandless"; Sent: Monday, 15 September 2014, 3:48 PM To: "Lucene Users"; Subject: Re: Speed up searching in multiple-thread? If you run 30 search threads on a core i5 it's expected there will be big s

Re: Speed up searching in multiple-thread?

2014-09-15 Thread Michael McCandless
If you run 30 search threads on a core i5 it's expected there will be big slowdowns in the per-query latency since core i5 only has 2 real (4 with hyperthreading) cores? Mike McCandless http://blog.mikemccandless.com On Mon, Sep 15, 2014 at 3:10 AM, Harry Yu <502437...@qq.com> wrote: > Dear mem
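
The point about thread count versus core count can be sketched in plain Java. This is only an illustration, not code from the thread: the class `SearchPoolSizing` and the method `cappedThreads` are hypothetical names, and capping the pool at the number of logical processors is an assumed starting point rather than a rule stated in the reply.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: keeps a search thread pool from exceeding the
// number of logical processors the JVM reports.
final class SearchPoolSizing {

    // Returns the smaller of the requested thread count and the processor count.
    static int cappedThreads(int requested, int availableProcessors) {
        if (requested < 1 || availableProcessors < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        return Math.min(requested, availableProcessors);
    }

    public static void main(String[] args) {
        // Logical processors include hyperthreads, e.g. 4 on a 2-core CPU with hyperthreading.
        int cpus = Runtime.getRuntime().availableProcessors();
        int threads = cappedThreads(30, cpus);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("requested=30, processors=" + cpus + ", pool size=" + threads);
        pool.shutdown();
    }
}
```

On a machine that reports 4 logical processors, a request for 30 threads is capped at 4. Whether that is the best size for a given workload still has to be measured.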

Speed up searching in multiple-thread?

2014-09-15 Thread Harry Yu
Dear members of the Lucene project, I'm developing a POI search application using Lucene 4.8. Recently, I ran into trouble: the performance of IndexSearcher.search is bad in a multi-threaded environment. According to the test results, I found that if thread number is 1, the response time of searching

Re: Speed up searching on index created using JdbcDirectory

2014-08-23 Thread Pradeep Bhattiprolu
213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> >>> -Original Message- >>> From: Mahesh Charegaonkar [mailto:mahesh.charegaon...@gmail.com] >>> Sent: Saturday, August 23, 2014 11:42 PM >>> To: java-user@luce

Re: Speed up searching on index created using JdbcDirectory

2014-08-23 Thread Mahesh Charegaonkar
onkar [mailto:mahesh.charegaon...@gmail.com] > > Sent: Saturday, August 23, 2014 11:42 PM > > To: java-user@lucene.apache.org > > Subject: Re: Speed up searching on index created using JdbcDirectory > > > > Thanks Uwe for your response. > > > > Could you please tell

RE: Speed up searching on index created using JdbcDirectory

2014-08-23 Thread Uwe Schindler
, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mahesh Charegaonkar [mailto:mahesh.charegaon...@gmail.com] > Sent: Saturday, August 23, 2014 11:42 PM > To: java-user@lucene.apache.org > Subject: Re: Speed up searching on inde

Re: Speed up searching on index created using JdbcDirectory

2014-08-23 Thread Mahesh Charegaonkar
charegaon...@gmail.com] > > Sent: Saturday, August 23, 2014 11:12 PM > > To: java-user@lucene.apache.org > > Subject: Re: Speed up searching on index created using JdbcDirectory > > > > HI All, > > > > Please help me out to resolve this issue. Your help is real

RE: Speed up searching on index created using JdbcDirectory

2014-08-23 Thread Uwe Schindler
haregaonkar [mailto:mahesh.charegaon...@gmail.com] > Sent: Saturday, August 23, 2014 11:12 PM > To: java-user@lucene.apache.org > Subject: Re: Speed up searching on index created using JdbcDirectory > > HI All, > > Please help me out to resolve this issue. Your help is really appriciated.

Re: Speed up searching on index created using JdbcDirectory

2014-08-23 Thread Mahesh Charegaonkar
Hi All, Please help me out to resolve this issue. Your help is really appreciated. Thanks Mahesh On Wed, Aug 20, 2014 at 1:57 PM, Mahesh Charegaonkar < mahesh.charegaon...@gmail.com> wrote: > Hi Lucene masters, > > I was using lucene couple of years back. We have developed application > which

Speed up searching on index created using JdbcDirectory

2014-08-20 Thread Mahesh Charegaonkar
Hi Lucene masters, I was using Lucene a couple of years back. We developed an application which uses Lucene's JdbcDirectory feature. Using JdbcDirectory, we are writing and reading data from a database. Over time the data has increased tremendously, and that is why we are facing performance issues with se

Re: improve indexing speed with nomergepolicy

2014-08-14 Thread Shai Erera
> forceMerge(). > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Shai Erera [mailto:ser...@gmail.com] > > Sent: Thursday, August 07, 2014

RE: improve indexing speed with nomergepolicy

2014-08-07 Thread Uwe Schindler
.@gmail.com] > Sent: Thursday, August 07, 2014 4:11 PM > To: java-user@lucene.apache.org > Subject: Re: improve indexing speed with nomergepolicy > > Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you pass > it > at construction time an

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Shai Erera
you use NRTCachingDirectory? It's usually recommended for NRT, > > even with default MP, since the tiny segments are merged in-memory, and > > your NRT reopens don't result in flushing new segments to disk. > > > > Shai > > > > > > On Thu, Aug 7, 20

Re: Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
16:05 From: "Jon Stewart" To: java-user@lucene.apache.org Subject: Re: improve indexing speed with nomergepolicy Related, how does one change the MergePolicy on an IndexWriter (e.g., use NoMergePolicy during batch indexing, then change to something better once finished with batch)

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Jon Stewart
ly the same size). You can increase that. > > Also, do you use NRTCachingDirectory? It's usually recommended for NRT, > even with default MP, since the tiny segments are merged in-memory, and > your NRT reopens don't result in flushing new segments to disk. > > Shai > > >

Re: Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
many thanks again. this was a good tip. after switching from FSDirectory to NRTCachingDirectory queries run at double speed. Sascha Sent: Thursday, 07 August 2014 at 14:54 From: "Sascha Janz" To: java-user@lucene.apache.org Subject: Re: Re: improve indexing

Re: Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
many thanks for the tip with NRTCachingDirectory. didn't know that. i will try it. Sascha Sent: Thursday, 07 August 2014 at 13:37 From: "Shai Erera" To: "java-user@lucene.apache.org" Subject: Re: improve indexing speed with nomergepolicy Using NoMerge

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Shai Erera
t MP, since the tiny segments are merged in-memory, and your NRT reopens don't result in flushing new segments to disk. Shai On Thu, Aug 7, 2014 at 1:14 PM, Sascha Janz wrote: > hi, > > i try to speed up our indexing process. we use SeacherManager with > applydeletes to get n

improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
hi, i try to speed up our indexing process. we use SearcherManager with applydeletes to get near real time Reader. we have not really "much" incoming documents, but the documents must be updated from time to time and the amount of documents to be updated could be quite large. i

Re: IndexUpgrade - Any ways to speed up?

2013-08-03 Thread Ramprakash Ramamoorthy
On Fri, Aug 2, 2013 at 5:56 PM, Shai Erera wrote: > Unfortunately you cannot upgrade directly from 2.3.1 to 4.1. > > You can consider upgrading to 3.6.2 and stop there. Lucene 4.1 can read 3.x > indexes, and when segments will are merged, they are upgraded automatically > to the newest file forma

Re: IndexUpgrade - Any ways to speed up?

2013-08-02 Thread Shai Erera
Unfortunately you cannot upgrade directly from 2.3.1 to 4.1. You can consider upgrading to 3.6.2 and stop there. Lucene 4.1 can read 3.x indexes, and when segments are merged, they are upgraded automatically to the newest file format. However, if this single segment is too big, such that it w

Re: IndexUpgrade - Any ways to speed up?

2013-08-02 Thread Ramprakash Ramamoorthy
Thank you Shai for the quick response. Have responded inline. On Fri, Aug 2, 2013 at 5:37 PM, Shai Erera wrote: > Hi > > You cannot just update headers -- the file formats have changed. Therefore > you need to rewrite the index entirely, at least from 2.3.1 to 3.6.2 (for > 4.1 to be able to rea

Re: IndexUpgrade - Any ways to speed up?

2013-08-02 Thread Shai Erera
Hi You cannot just update headers -- the file formats have changed. Therefore you need to rewrite the index entirely, at least from 2.3.1 to 3.6.2 (for 4.1 to be able to read it). If your index is already optimized, then IndexUpgrader is your best option. The reason it calls forceMerge(1) is that

IndexUpgrade - Any ways to speed up?

2013-08-02 Thread Ramprakash Ramamoorthy
Team, We are migrating from lucene version 2.3.1 to 4.1. We are migrating the indices as well, and we do this in two steps 2.3.1 to 3.6.2 and 3.6.2 to 4. We just call IndexUpgrader.upgrade(), using the IndexUpgraderMergePolicy. I see that, the upgrade() method actually calls a forcemerge(1

Any CommonGrams-inspired tricks to speed up other proximity query types?

2012-06-21 Thread Chris Harris
rms in relationship, you can speed things up. Bigrams are probably the simplest such proximity-capturing structure, but it seems like others could exist. What I'm curious about is whether there are more advanced structures that could potentially speed up proximity search more generally, ideall

Re: slow speed of searching

2012-02-08 Thread Cheng
I have about 6.5 million documents which lead to 1.5G index. The speed of > > search a couple terms, like "dvd" and "price", causes about 0.1 second. > > > > I am afraid that our data will grow rapidly. Except for dividing > documents > > into multi

Re: slow speed of searching

2012-02-08 Thread Ian Lea
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed (the 3rd item is Use a local filesystem!) -- Ian. On Wed, Feb 8, 2012 at 12:44 PM, Cheng wrote: > Hi, > > I have about 6.5 million documents which lead to 1.5G index. The speed of > search a couple terms, like "dvd"

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-07 Thread Peter K
Hi Uwe, > Die, Maven, die :-) Well, I for myself have a love-hate-relationship to maven: its simple and works nice for deps management. also others can set it up quickly and IDE support is nice. But sometimes it does a bit too much (unexpected ;)) or is too complicated to customize. > (I assum

RE: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-07 Thread Uwe Schindler
Hi, > > I mean my benchmarks show up > > to 300% improvement with 4.x versus older versions so something is > > weird ie. non-realistic here or there is a bug so lets figure this > > out. Can you profile you app and see if you find something suspicious? > > I'll try now and report back. > > It s

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-07 Thread Peter K
> I mean my benchmarks show up > to 300% improvement with 4.x versus older versions so something is > weird ie. non-realistic here or there is a bug so lets figure this > out. Can you profile you app and see if you find something suspicious? > I'll try now and report back. It seems to be largely

Shared IndexWriter does not increase speed

2012-01-06 Thread Cheng
Hi, I am trying to use a shared IndexWriter instance for a multi-threaded application. Surprisingly, this underperforms creating a writer instance within a thread. My code is as follows. Can someone help explain why? Thanks. Scenario 1: shared IndexWriter instance RAMDirectory ramDir = new RA

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-05 Thread Peter K
Hi Simon, answers below. >> It does not seem to be an 'IO related issue' because using RAMDirectory >> results in the same times. >> And indexing via Luc4 with only one thread shouldn't be slower than 3.5 (?) > it could be since we use a different term dictionary impl which is > more expensive in

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-05 Thread Simon Willnauer
ict takes longer on those kind of inputs? > > Could it be due to some garbage collector or thread overhead with luc4? > As I see a bigger execution speed variation for single lucene 4.0 runs > (differences of seconds!) than for 3.5 (differences in 0.1seconds!). > E.g. how could I

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-03 Thread Peter K
s a generated and incremented id from AtomicLong and two types. Or do you have an explanation why luc4 can be slower on such 'simple' fields? Could it be due to some garbage collector or thread overhead with luc4? As I see a bigger execution speed variation for single lucene 4.0 r

Re: Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-03 Thread Simon Willnauer
hub.com/karussell/lucene-tmp > Or is there something wrong with my too simplistic scenario? > > Furthermore: How could I further improve Lucene 4.0 indexing speed? > (I already read through the performance list on the wiki) > > Regards, > Peter. > > * > ope

Comparing Indexing Speed of Lucene 3.5 and 4.0

2012-01-03 Thread Peter K
my too simplistic scenario? Furthermore: How could I further improve Lucene 4.0 indexing speed? (I already read through the performance list on the wiki) Regards, Peter. * open jdk 1.6.0_20 (but also confirmed with latest java6 from oracle) ubuntu/10.10 linux/2.6.35-31 i686, 2GB ram ** lucene

Re: Improving indexing speed

2011-11-17 Thread Wouter Heijke
mat need to parse to extract information >> after >> >> that i had to index. >> >> Single thread process one file at a time then i decided to use multi >> >> threads when the main thread that loops the directory and pass the >> file >> >> int

Re: Improving indexing speed

2011-11-17 Thread KARTHIK SHIVAKUMAR
decided to use multi > >> threads when the main thread that loops the directory and pass the file > >> into pool of worker threads using a queue > >> all of the which share same index writer, How ever there is no any > >> significant changes in indexing speed

Re: Improving indexing speed

2011-11-10 Thread Ian Lea
using a queue >> all of the which share same index writer, How ever there is no any >> significant changes in indexing speed >> >> Any hints I am doing wrong or any suggestion >> >> >> Thanks >> Antony >> > > --

Re: Improving indexing speed

2011-11-10 Thread Simon Willnauer
cess one file at a time then i decided to use multi > threads when the main thread that loops the directory and pass the file > into pool of worker threads using a queue > all of the which share same index writer, How ever there is no any > significant changes in indexing speed > >

Improving indexing speed

2011-11-10 Thread antony jospeh
directory and pass the file into pool of worker threads using a queue, all of which share the same index writer. However, there is no significant change in indexing speed. Any hints on what I am doing wrong, or any suggestions? Thanks, Antony

Re: Indexing speed on NTFS

2011-05-31 Thread Toke Eskildsen
On Tue, 2011-05-31 at 08:52 +0200, Maciej Klimczuk wrote: > I did some testing with 3.1.0 demo on Windows and encountered some strange > bahaviour. I tried to index ~6 small text documents using the demo. > - First trial took about 18 minutes. > - Second and third trial took about 2 minutes.

Indexing speed on NTFS

2011-05-30 Thread Maciej Klimczuk
Hello everyone, I did some testing with the 3.1.0 demo on Windows and encountered some strange behaviour. I tried to index ~6 small text documents using the demo. - First trial took about 18 minutes. - Second and third trial took about 2 minutes. I then made another test on other, bigger docum

Re: Speed up payload loading?

2011-05-03 Thread Michael McCandless
On Tue, May 3, 2011 at 5:35 AM, Chris Bamford wrote: > Hi, > > I have been experimenting with using a int payload as a unique identifier, > one per Document.  I have successfully loaded them in using the TermPositions > API with something like: > >    public static void loadPayloadIntArray(Index

Speed up payload loading?

2011-05-03 Thread Chris Bamford
Hi, I have been experimenting with using an int payload as a unique identifier, one per Document. I have successfully loaded them in using the TermPositions API with something like: public static void loadPayloadIntArray(IndexReader reader, Term term, int[] intArray, int from, int to) thro

Re: speed of CheckIndex

2011-04-14 Thread jm
mostly status of the indexes, whether there is some corruption or all is ok. On Thu, Apr 14, 2011 at 9:20 PM, Simon Willnauer < simon.willna...@googlemail.com> wrote: > what kind of diagnostics are you looking for? > > simon > > On Thu, Apr 14, 2011 at 9:14 PM, jm wrote: > > Thanks Erick, but I

Re: speed of CheckIndex

2011-04-14 Thread Simon Willnauer
what kind of diagnostics are you looking for? simon On Thu, Apr 14, 2011 at 9:14 PM, jm wrote: > Thanks Erick, but I guess what you refer to lives in Solr right? I am using > plain Lucene. > > On Thu, Apr 14, 2011 at 7:33 PM, Erick Erickson > wrote: > >> What information do you need? Could you

Re: speed of CheckIndex

2011-04-14 Thread jm
Thanks Erick, but I guess what you refer to lives in Solr right? I am using plain Lucene. On Thu, Apr 14, 2011 at 7:33 PM, Erick Erickson wrote: > What information do you need? Could you just ping the stats component > and parse the results (basically the info on the admin/stats page). > > Best >

Re: speed of CheckIndex

2011-04-14 Thread Erick Erickson
What information do you need? Could you just ping the stats component and parse the results (basically the info on the admin/stats page). Best Erick On Thu, Apr 14, 2011 at 11:56 AM, jm wrote: > Hi, > > I need to collect some diagnostic info from customer sites, so I would like > to get info on

speed of CheckIndex

2011-04-14 Thread jm
Hi, I need to collect some diagnostic info from customer sites, so I would like to get info on the status of Lucene indexes... but I don't want the process of collecting to take very long. So I am considering CheckIndex. I tested on a small index (60k docs) and it took 12 seconds. A site usually h

Re: custom low-level indexer (to speed things up) when fields, terms and docids are in order

2010-04-08 Thread Michael McCandless
>> writing, it'll mean you can freely swap in different codecs. >>> >>> The only thing you can do further is to conflate your custom code with >>> the codec, ie, so that you make a single chain that directly writes >>> index files. But I'm not s

Re: custom low-level indexer (to speed things up) when fields, terms and docids are in order

2010-04-07 Thread britske
ndex files. But I'm not sure you'll gain much performance by doing >> so... (and then you can't [as easily] swap codecs). >> >> Have you profiled to see where the time is being spent? >> >> Mike >> >> On Thu, Mar 25, 2010 at 7:40 PM, bri

Re: custom low-level indexer (to speed things up) when fields, terms and docids are in order

2010-03-26 Thread britske
dex files. But I'm not sure you'll gain much performance by doing > so... (and then you can't [as easily] swap codecs). > > Have you profiled to see where the time is being spent? > > Mike > > On Thu, Mar 25, 2010 at 7:40 PM, britske <[hidden > email]<htt

Re: custom low-level indexer (to speed things up) when fields, terms and docids are in order

2010-03-26 Thread Danil ŢORIN
oing > so... (and then you can't [as easily] swap codecs). > > Have you profiled to see where the time is being spent? > > Mike > > On Thu, Mar 25, 2010 at 7:40 PM, britske wrote: > > > > Hi, > > > > perhaps first some background: > >

Re: custom low-level indexer (to speed things up) when fields, terms and docids are in order

2010-03-26 Thread Michael McCandless
hen you can't [as easily] swap codecs). Have you profiled to see where the time is being spent? Mike On Thu, Mar 25, 2010 at 7:40 PM, britske wrote: > > Hi, > > perhaps first some background: > > I need to speed-up indexing for an particular application which has a pretty &

custom low-level indexer (to speed things up) when fields, terms and docids are in order

2010-03-25 Thread britske
Hi, perhaps first some background: I need to speed up indexing for a particular application which has a pretty unusual schema: besides the normal stored and indexed fields we have about 20,000 fields per document which are all indexed/non-stored sInts. Obviously indexing was really slow

UNC speed vs DOS path speed

2010-03-22 Thread Woolf, Ross
\general\index, which is to the exact location as the localhost UNC path, it only takes 14 seconds to index the same 3000 documents. I realize that using the UNC path causes the involvement of the IP stack, but I'm surprised at the difference of speed. Is there anything in Lucene itself that

Re: Question about how to speed up custom scoring

2009-10-11 Thread scott w
On Sun, Oct 11, 2009 at 9:10 AM, Jake Mannix wrote: > What do you mean "not something I can plug in on top of my original query"? > > Do you mean that you can't do it like the more complex example in the class > you posted earlier in the thread, where you take a linear combination of > the > Map

Re: Question about how to speed up custom scoring

2009-10-11 Thread Jake Mannix
What do you mean "not something I can plug in on top of my original query"? Do you mean that you can't do it like the more complex example in the class you posted earlier in the thread, where you take a linear combination of the Map -based score, and the regular text score? Another option is to j

Re: Question about how to speed up custom scoring

2009-10-10 Thread scott w
Haven't tried it yet but looking at it closer it looks like it's not something I can plug in on top of my original query. I am definitely happy using an approximation for the sake of performance but I do need to be able to have the original results stay the same. On Fri, Oct 9, 2009 at 5:32 PM, Ja

Re: Question about how to speed up custom scoring

2009-10-09 Thread Jake Mannix
Great Scott (hah!) - please do report back, even if it just works fine and you have no more questions, I'd like to know whether this really is what you were after and actually works for you. Note that the FieldCache is kinda "magic" - it's lazy (so the first query will be slow and you should fire

Re: Question about how to speed up custom scoring

2009-10-09 Thread scott w
Thanks Jake! I will test this out and report back soon in case it's helpful to others. Definitely appreciate the help. Scott On Fri, Oct 9, 2009 at 3:33 PM, Jake Mannix wrote: > On Fri, Oct 9, 2009 at 3:07 PM, scott w wrote: > > > Example Document: > > model_1_score = 0.9 > > model_2_score = 0

Re: Question about how to speed up custom scoring

2009-10-09 Thread Jake Mannix
On Fri, Oct 9, 2009 at 3:07 PM, scott w wrote: > Example Document: > model_1_score = 0.9 > model_2_score = 0.3 > model_3_score = 0.7 > > I want to be able to pass in the following map at query time: > {model_1_score=0.4, model_2_score=0.7} and have that map get used as input > to a custom score f
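
The example in the quoted message can be worked through in plain Java. This is a sketch under assumptions: it takes the scoring function to be the sum-product described in the opening question of this thread (each query-time weight multiplied by the document's value for the same key, skipping keys the document lacks), and `SumProductScore` and `score` are hypothetical names, not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of a sum-product score over query-time weights.
final class SumProductScore {

    // Sums weight * documentValue for every query key that the document also has.
    static double score(Map<String, Double> queryWeights, Map<String, Double> docValues) {
        double total = 0.0;
        for (Map.Entry<String, Double> entry : queryWeights.entrySet()) {
            Double docValue = docValues.get(entry.getKey());
            if (docValue != null) {          // weight applies only if the key exists in the document
                total += entry.getValue() * docValue;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Values taken from the example document in the quoted message.
        Map<String, Double> doc = new LinkedHashMap<>();
        doc.put("model_1_score", 0.9);
        doc.put("model_2_score", 0.3);
        doc.put("model_3_score", 0.7);

        // Query-time map from the quoted message.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("model_1_score", 0.4);
        weights.put("model_2_score", 0.7);

        System.out.println("score = " + score(weights, doc));
    }
}
```

With these inputs the arithmetic is 0.4 × 0.9 + 0.7 × 0.3 = 0.57, up to floating-point rounding; model_3_score contributes nothing because the query map has no weight for it.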

Re: Question about how to speed up custom scoring

2009-10-09 Thread scott w
Hi Jake -- Sorry for the confusion. I have two similar but slightly different use cases in mind and the example I gave you corresponds to one use case while the code corresponds to the other slightly more complicated one. Ignore the original example, and let me restate the one I have in mind so it

Re: Question about how to speed up custom scoring

2009-10-09 Thread Jake Mannix
Hey Scott, I'm still not sure I understand what your dynamic boosts are for: they are the names of fields, right, not terms in the fields? So in terms of your example { company = microsoft, city = redmond, size = big }, the three possible choices for keys in your map are company, city, or size,

Re: Question about how to speed up custom scoring

2009-10-09 Thread scott w
(Apologies if this message gets sent more than once. I received an error sending it the first two times so sent directly to Jake but reposting to group.) Hi Jake -- Thanks for the feedback. What I am trying to implement is a way to custom score documents using a scoring function that takes as inp

Re: Question about how to speed up custom scoring

2009-10-09 Thread scott w
Right exactly. I looked into payload initially and realized it wouldn't work for my use case. On Fri, Oct 9, 2009 at 2:00 PM, Grant Ingersoll wrote: > Oops, just reread and realized you wanted query time weights. Payloads are > an index time thing. > > > On Oct 9, 2009, at 5:49 PM, Grant Ingers

Re: Question about how to speed up custom scoring

2009-10-09 Thread Grant Ingersoll
Oops, just reread and realized you wanted query time weights. Payloads are an index time thing. On Oct 9, 2009, at 5:49 PM, Grant Ingersoll wrote: If you are trying to add specific term weights to terms in the index and then incorporate them into scoring, you might benefit from payloads a

Re: Question about how to speed up custom scoring

2009-10-09 Thread Grant Ingersoll
If you are trying to add specific term weights to terms in the index and then incorporate them into scoring, you might benefit from payloads and the PayloadTermQuery option. See http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/ -Grant On Oct 8, 2009, at 11:56 AM

Re: Question about how to speed up custom scoring

2009-10-09 Thread Jake Mannix
Scott, To reiterate what Erick and Andrzej said: calling IndexReader.document(docId) in your inner scoring loop is the source of your performance problem - iterating over all these stored fields is what is killing you. To do this a better way, can you try to explain exactly what this Scorer

Re: Question about how to speed up custom scoring

2009-10-09 Thread scott w
Thanks for the suggestions Erick. I am using Lucene 2.3. Terms are stored and given Andrzej's comments in the follow up email sounds like it's not the stored field issue. I'll keep investigating... thanks, Scott On Thu, Oct 8, 2009 at 8:06 AM, Erick Erickson wrote: > I suspect your problem here

Re: Question about how to speed up custom scoring

2009-10-08 Thread Andrzej Bialecki
Erick Erickson wrote: I suspect your problem here is the line: document = indexReader.document( doc ); See the caution in the docs. You could try using lazy loading (so you don't load all the terms of the document, just those you're interested in). And I *think* (but it's been a while) that if t

Re: Question about how to speed up custom scoring

2009-10-08 Thread Erick Erickson
I suspect your problem here is the line: document = indexReader.document( doc ); See the caution in the docs. You could try using lazy loading (so you don't load all the terms of the document, just those you're interested in). And I *think* (but it's been a while) that if the terms you load are in

Re: Question about how to speed up custom scoring

2009-10-08 Thread scott w
Oops, forgot to include the class I mentioned. Here it is: public class QueryTermBoostingQuery extends CustomScoreQuery { private Map queryTermWeights; private float bias; private IndexReader indexReader; public QueryTermBoostingQuery( Query q, Map termWeights, IndexReader indexReader, fl

Question about how to speed up custom scoring

2009-10-08 Thread scott w
I am trying to come up with a performant query that will allow me to use a custom score, where the custom score is a sum-product over a set of query-time weights and each weight gets applied only if the query-time term exists in the document. So for example if I have a doc with three fields: comp

UNC speed vs DOS path speed

2009-08-07 Thread Woolf, Ross
\index, which is to the exact location as the localhost UNC path, it only takes 14 seconds to index the same 3000 documents. I realize that using the UNC path causes the involvement of the IP stack, but I'm surprised at the difference of speed. Is there anything in Lucene itself that would ac

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Erick Erickson
item for list of BADs... but not for me > as we do not use phrase Qs to be honest, I do not even know how they > are implemented... but no, there are no positions in such cache... > > well, they remain slower (but they work!) the rest will be faster... with > existing api... >

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
remain slower (but they work!) the rest will be faster... with existing api... It is maybe even possible somehow to speed them up with it, at the end of a day, even for phrase queries, you need first to determine which document matches term... But as said, I never looked into this part of code. I

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Jason Rutherglen
be honest, I do not know is anyone today runs high volume search from disk > (maybe SSD), even than, significant portion has to be in RAM... > > One day we could throw many CPUs at Query... but this is not an easy one... > > > > > > - Original Message >> F

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
day, 16 July, 2009 19:22:28 > Subject: Re: speed of BooleanQueries on 2.9 > > Do we think that we'll be able to support indexing stop words > using PFOR (with relaxation on the compression to gain > performance?) Today it seems like the best approach to indexing > stop word

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Jason Rutherglen
ptimist without numbers to prove it). > > Cheers, Eks > > > > > > > > > - Original Message >> From: Michael McCandless >> To: java-user@lucene.apache.org >> Sent: Thursday, 16 July, 2009 16:23:57 >> Subject: Re: speed of BooleanQueries on 2.9 >>

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
a-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 16:23:57 > Subject: Re: speed of BooleanQueries on 2.9 > > Super, thanks for testing! > > And, the 10% speedup overall is good progress... > > Mike > > On Thu, Jul 16, 2009 at 9:16 AM, eks devwrote: > > > > and o

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread Michael McCandless
rom: eks dev >> To: java-user@lucene.apache.org >> Sent: Thursday, 16 July, 2009 14:40:26 >> Subject: Re: speed of BooleanQueries on 2.9 >> >> >> ok new facts, less chaos :) >> >> - LUCENE-1744 fixed it definitely; I have it confirmed >> Also

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
up you will hear from me... Thanks again to all. Cheers, Eks - Original Message > From: eks dev > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 14:40:26 > Subject: Re: speed of BooleanQueries on 2.9 > > > ok new facts, less chaos :) >

Re: speed of BooleanQueries on 2.9

2009-07-16 Thread eks dev
0.28227 ZIPS:berien^0.25947002 ZIPS:berling^0.23232001 ZIPS:perlin^0.2615))^1.2) Thanks! - Original Message > From: Michael McCandless > To: java-user@lucene.apache.org > Sent: Thursday, 16 July, 2009 13:52:06 > Subject: Re: speed of BooleanQueries on 2.9 > > On Thu, J
