Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-16 Thread Navneet Verma
Hi Uwe, Thanks for the prompt response. I have created the gh issue: https://github.com/apache/lucene/issues/13920 for more discussion. We can move all discussions to the gh issues. Thanks Navneet On Tue, Oct 15, 2024 at 3:17 AM Uwe Schindler wrote: > Hi, > > The problem with your approach is th

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-15 Thread Uwe Schindler
Hi, The problem with your approach is that you can change the madvise on a clone, but as the underlying memory is the same for the cloned index input, it won't revert back to RANDOM. Basically there's no need to clone or create a slice. We should rather allow changing the advice for an Index

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-15 Thread Navneet Verma
Hi Uwe, >> thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it b

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Navneet Verma
Hi Uwe, Thanks for sharing the link and providing the useful information. I will definitely go ahead and create a gh issue. In the meantime I did some testing by changing the IOContext from RANDOM to READ for FlatVectors

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
Hi, thinking about it a bit more: In 10.x we already have some ways to preload data with WILL_NEED (or similar). Maybe this can also be used on merging when we reuse an already open IndexInput. Maybe it is possible to change the madvise on an already open IndexInput and change it before merg

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
Hi, great. I still think the difference between RANDOM and READ is huge in your case. Are you sure that you have not misconfigured your system? The most important thing for Lucene is to make sure that heap space of the Java VM is limited as much as possible (just above the OOM boundary) and

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Navneet Verma
Hi Uwe, To answer your question about the RAM and heap size, here are some details: RAM: 128GB Heap: 32GB CPU: 16 This is where I will put some reproducible benchmarks using Lucene alone. I have currently used OpenSearch 2.17 to run these benchmarks. *In general, the correct fix for this is

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-10-01 Thread Uwe Schindler
Hi, this seems to be a special case in FlatVectors, because normally there's a separate method to open an IndexInput for checksumming: https://github.com/apache/lucene/blob/524ea208c870861a719f21b1ea48943c8b7520da/lucene/core/src/java/org/apache/lucene/store/Directory.java#L155-L157 Could you o

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-30 Thread Navneet Verma
Hi Uwe and Mike, Thanks for providing such a quick response. Let me try to answer a few things here: *In addition, in Lucene 9.12 (latest 9.x) version released today there are some changes to ensure that checksumming is always done with IOContext.READ_ONCE (which uses READ behind the scenes).* I didn'

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-30 Thread Uwe Schindler
Hi, please also note: In Lucene 10 the checksum IndexInput will always be opened with IOContext.READ_ONCE. If you want to sequentially read a whole index file for other reasons than checksumming, please pass the correct IOContext. In addition, in Lucene 9.12 (latest 9.x) version released t

Re: Performance Difference between files getting opened with IoContext.RANDOM vs IoContext.READ

2024-09-29 Thread Michael McCandless
Hi Navneet, With RANDOM IOContext, on modern OS's / Java versions, Lucene will hint the memory mapped segment that the IO will be random using madvise POSIX API with MADV_RANDOM flag. For READ IOContext, Lucene maybe hints with MADV_SEQUENTIAL, I'm not sure. Or maybe it doesn't hint anything? It'

Re: Performance changes within the Lucene 8 branch

2023-12-14 Thread Michael McCandless
Hi Marc, How are you retrieving your hits? Lucene's stored fields, or doc values, or both? Do you sort the hits docids and then retrieve them in docid order (NOT in the sorted order Lucene returned them in)? I think that might be faster as Lucene's stored fields use block compression and if the
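
The suggestion to fetch hits in docid order can be pictured with a toy model. The sketch below is not Lucene code: it assumes stored fields are compressed in fixed-size blocks (`DOCS_PER_BLOCK` is a made-up value) and that only the most recently decompressed block stays cached. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class DocIdOrderDemo {
    // Hypothetical block size for illustration; real block sizes depend on the codec.
    static final int DOCS_PER_BLOCK = 16;

    /**
     * Counts block decompressions when documents are fetched in the given order,
     * assuming only the last decompressed block is kept in memory.
     */
    static int countBlockLoads(int[] docIdsInFetchOrder) {
        int loads = 0;
        int cachedBlock = -1; // -1 means nothing has been decompressed yet
        for (int docId : docIdsInFetchOrder) {
            int block = docId / DOCS_PER_BLOCK;
            if (block != cachedBlock) {
                cachedBlock = block;
                loads++;
            }
        }
        return loads;
    }

    /** Returns a copy of the docids sorted ascending, leaving the input untouched. */
    static int[] sortedCopy(int[] docIds) {
        int[] copy = docIds.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        int[] byRank = {40, 3, 41, 5, 42, 7}; // hits in the order the search returned them
        System.out.println("rank order loads:  " + countBlockLoads(byRank));
        System.out.println("docid order loads: " + countBlockLoads(sortedCopy(byRank)));
    }
}
```

In this toy setup the rank-ordered fetch alternates between two blocks and decompresses six times, while the docid-ordered fetch decompresses twice. The fetched documents can be re-sorted into rank order afterwards.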

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-09 Thread Michael McCandless
I'd also love to understand this: > using SimpleFSDirectoryFactory (since Mmap doesn't quite work well on Windows for our index sizes which commonly run north of 1 TB) Is this a known problem on certain versions of Windows? Normally memory mapped IO can scale to very large sizes (well beyond s

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-07 Thread Adrien Grand
I agree it's worth discussing. I opened https://github.com/apache/lucene/issues/12355 and https://github.com/apache/lucene/issues/12356. On Tue, Jun 6, 2023 at 9:17 PM Rahul Goswami wrote: > > Thanks Adrien. I spent some time trying to understand the readByte() in > ReverseRandomAccessReader (thr

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Rahul Goswami
Thanks Adrien. I spent some time trying to understand the readByte() in ReverseRandomAccessReader (through FST) and compare with 7.x. Although I don't understand ALL of the details and reasoning for always loading the FST (and in turn the term index) off-heap (as discussed in https://github.com/ap

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
Yes, this changed in 8.x: - 8.0 moved the terms index off-heap for non-PK fields with MMapDirectory. https://github.com/apache/lucene/issues/9681 - Then in 8.6 the FST was moved off-heap all the time. https://github.com/apache/lucene/issues/10297 More generally, there's a few files that are no l

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Rahul Goswami
Thanks Adrien. Is this behavior of FST something that has changed in Lucene 8.x (from 7.x)? Also, is the terms index not loaded into memory anymore in 8.x? To your point on MMapDirectoryFactory, it is much faster as you anticipated, but the indexes commonly being >1 TB makes the Windows machine fr

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
+Alan Woodward helped me better understand what is going on here. BufferedIndexInput (used by NIOFSDirectory and SimpleFSDirectory) doesn't play well with the fact that the FST reads bytes backwards: every call to readByte() triggers a refill of 1kB because it wants to read the byte that is just be
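
The effect described in this message can be reproduced with a toy buffer. The sketch below is not `BufferedIndexInput` itself: it assumes a 1 kB buffer that, on a miss, is refilled forward starting at the requested position. All names in it are hypothetical.

```java
public class BackwardReadDemo {
    // Assumed buffer size, matching the 1 kB refill mentioned in the message.
    static final int BUFFER_SIZE = 1024;

    /** Toy forward-filling buffer: a miss refills the range [pos, pos + BUFFER_SIZE). */
    static final class ToyBufferedInput {
        private long bufferStart = -1; // -1 means nothing is buffered yet
        private int refills = 0;

        void readByte(long pos) {
            boolean hit = bufferStart >= 0
                    && pos >= bufferStart
                    && pos < bufferStart + BUFFER_SIZE;
            if (!hit) {
                bufferStart = pos; // refill starts at the requested byte and extends forward
                refills++;
            }
        }

        int refills() {
            return refills;
        }
    }

    /** Reads count bytes at start, start + 1, ... and returns the number of refills. */
    static int refillsReadingForward(long start, int count) {
        ToyBufferedInput in = new ToyBufferedInput();
        for (int i = 0; i < count; i++) {
            in.readByte(start + i);
        }
        return in.refills();
    }

    /** Reads count bytes at end, end - 1, ... and returns the number of refills. */
    static int refillsReadingBackward(long end, int count) {
        ToyBufferedInput in = new ToyBufferedInput();
        for (int i = 0; i < count; i++) {
            in.readByte(end - i);
        }
        return in.refills();
    }

    public static void main(String[] args) {
        System.out.println("forward:  " + refillsReadingForward(0, 4096));
        System.out.println("backward: " + refillsReadingBackward(4095, 4096));
    }
}
```

Under these assumptions, reading 4096 bytes forward costs 4 refills, while reading the same bytes backward costs 4096, because the byte just before the buffer start is never in the buffer.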

Re: Performance regression in getting doc by id in Lucene 8 vs Lucene 7

2023-06-06 Thread Adrien Grand
My best guess based on your description of the issue is that SimpleFSDirectory doesn't like the fact that the terms index now reads data directly from the directory instead of loading the terms index in heap. Would you be able to run the same benchmark with MMapDirectory to check if it addresses th

Re: Performance Comparison of Benchmarks by using Lucene 9.1.0 vs 8.5.1

2022-07-26 Thread Baris Kazar
Great, this was very helpful. This gives a rough idea using the dates of the Lucene bugs/features added on those graphs. Best regards From: Michael Sokolov Sent: Tuesday, July 26, 2022 3:55 PM To: java-user@lucene.apache.org Cc: Baris Kazar Subject: Re

Re: Performance Comparison of Benchmarks by using Lucene 9.1.0 vs 8.5.1

2022-07-26 Thread Michael Sokolov
https://home.apache.org/~mikemccand/lucenebench/ shows how various benchmarks have evolved over time *on the main branch*. There is no direct comparison of every version against every other version that I have seen though. On Tue, Jul 26, 2022 at 2:12 PM Baris Kazar wrote: > > Dear Folks,- > Sim

RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
May 2021 13:55 To: Michael McCandless ; Lucene Users Subject: RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0) Hi, thanks for getting back to me so fast! Your hint that there were changes to NRTCachingDirectory was the right pointer: I copied the 8.3 NRTCachingDirectory impl

Re: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Adrien Grand
w-down. > I’ll report here. > > Bye, > > Markus > > > From: Michael McCandless > Sent: Wednesday, 19 May 2021 13:39 > To: Lucene Users ; Gietzen, Markus < > markus.giet...@softwareag.com> > Subject: Re: Performance decrease with NRT use-case in 8.8.x (coming from &

RE: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Gietzen, Markus
fine. Now 8.8 performs as fast as 8.3! I will check the differences and put them in step by step to find out which change causes the slow-down. I’ll report here. Bye, Markus From: Michael McCandless Sent: Wednesday, 19 May 2021 13:39 To: Lucene Users ; Gietzen, Markus Subject: Re

Re: Performance decrease with NRT use-case in 8.8.x (coming from 8.3.0)

2021-05-19 Thread Michael McCandless
> The update showed no issues (e.g. compiled without changes) but I noticed that our test-suites take a lot longer to finish. Hmm, that sounds bad. We need our tests to stay fast but also do a good job testing things ;) Does your production usage also slow down? Tests do other interesting thing

Re: Performance of Prefix, Wildcard and Regex queries?

2016-10-17 Thread Michael McCandless
It doesn't matter at all if you try to e.g. optimize a WildcardQuery like foo* into a PrefixQuery, because Lucene turns all of these queries into an AutomatonQuery anyway, which efficiently intersects a term automaton with the terms dictionary. Mike McCandless http://blog.mikemccandless.com On

Re: Performance of Prefix, Wildcard and Regex queries?

2016-10-16 Thread Trejkaz
On Sat, Oct 15, 2016 at 1:21 AM, Rajnish Kamboj wrote: > Hi > > Performance of Prefix, Wildcard and Regex queries? > Does Lucene internally optimize this (using rewrite or something else) or > I have to manually create specific queries depending on input pattern. > > Example > if input is 78* cre

RE: Performance impact of searching across multiple fields

2015-07-28 Thread aurelien . mazoyer
Hi, Thank you for your answer. Is it something that is somehow theoretically quantifiable, or the only way to quantify the overhead is to prototype and to benchmark? Regards, Aurelien On 28.07.2015 17:15, Uwe Schindler wrote: It depends on the number of fields. If you search on 3 fields it

RE: Performance impact of searching across multiple fields

2015-07-28 Thread Uwe Schindler
It depends on the number of fields. If you search on 3 fields it is not likely to be a problem (the general use case 3 fields: plain, stemmed, folded). But if you have like 50 fields, the slow down is likely very large! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi

Aw: RE: RE: Re: Performance StringCoding.decode

2014-08-07 Thread Sascha Janz
we use jdk 1.7.55 and lucene 4.9.0 Sascha Sent: Wednesday, 06 August 2014 at 18:11 From: "Uwe Schindler" To: java-user@lucene.apache.org Subject: RE: RE: Re: Performance StringCoding.decode What Java version are you using? In Java 7 decoding of bytes to strings should be

RE: RE: Re: Performance StringCoding.decode

2014-08-06 Thread Uwe Schindler
Wednesday, August 06, 2014 5:57 PM > To: java-user@lucene.apache.org > Subject: Aw: RE: Re: Performance StringCoding.decode > > > > hi, > > no, not for all results, but user can configure the result list size up to 100 > documents. > > i was already afraid, th

Aw: RE: Re: Performance StringCoding.decode

2014-08-06 Thread Sascha Janz
g.toString(timestamp), Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.NO) greetings sascha Sent: Wednesday, 06 August 2014 at 10:50 From: "Uwe Schindler" To: java-user@lucene.apache.org Subject: RE: Re: Performance StringCoding.decode Hi, It looks like you are fetc

RE: Re: Performance StringCoding.decode

2014-08-06 Thread Uwe Schindler
rg > Subject: Aw: Re: Performance StringCoding.decode > > i used JMC ( Java Mission Control) from jdk7 u40+ > > > see here > > > http://www.oracle.com/technetwork/java/javase/2col/jmc-relnotes- > 2004763.html > > > > Gesendet: Dienstag, 05. August 20

Aw: Re: Performance StringCoding.decode

2014-08-06 Thread Sascha Janz
i used JMC ( Java Mission Control) from jdk7 u40+ see here http://www.oracle.com/technetwork/java/javase/2col/jmc-relnotes-2004763.html     Gesendet: Dienstag, 05. August 2014 um 17:41 Uhr Von: "d...@neusoft.com" An: "java-user@lucene.apache.org" Betr

Re: Performance StringCoding.decode

2014-08-05 Thread Erick Erickson
Well, that code is when you're reading the fields of documents off disk. Stored fields are compressed/decompressed automatically. So one question is what is your test doing? In other words, is it artificially hitting this? The theory is that this should only be done when you gather the final top N

Re: Performance StringCoding.decode

2014-08-05 Thread d...@neusoft.com
how to monitor? use jprofile? From: Sascha Janz Date: 2014-08-05 22:36 To: java-user@lucene.apache.org Subject: Performance StringCoding.decode hi, i want to speed up our search performance. so i run test and monitor them with java mission control. the analysis showed that one hotspot is

Re: Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Liviu Matei
Thanks for the reply. When you mention system memory, are you referring to RAM (or heap, as this is running as a Java process)? The index size is around 13G and the Java process is not given that much memory (in terms of Xmx). Could this be the cause? My understanding while reading some articles on the in

Re: Performance issue when using multiple PhraseQueries against a 1+ million entries index

2014-05-19 Thread Jack Krupansky
Does your index fit fully in system memory - the OS file cache? If not, there could be a lot of thrashing (I/O) as Lucene accesses the index. -- Jack Krupansky -Original Message- From: Liviu Matei Sent: Monday, May 19, 2014 4:21 PM To: java-user@lucene.apache.org Subject: Performance

Re: Performance issues with the default field compression

2014-04-10 Thread Alex Parvulescu
Hi Adrien, Thanks for clarifying! We're going to go the custom codec & custom visitor route. best, alex On Wed, Apr 9, 2014 at 10:38 PM, Adrien Grand wrote: > Hi Alex, > > Indeed, one or several (the number depends on the size of your > documents) documents need to be fully decompressed in o

Re: Performance issues with the default field compression

2014-04-09 Thread Adrien Grand
Hi Alex, Indeed, one or several (the number depends on the size of your documents) documents need to be fully decompressed in order to read a single field of a single document. Regarding the stored fields visitor, the default one doesn't return STOP when the field has been found because other fie

RE: Performance testing Lucene

2014-01-27 Thread Scott Schneider
many unit tests! Scott > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Friday, January 24, 2014 3:03 AM > To: java-user@lucene.apache.org > Subject: RE: Performance testing Lucene > > Hi Scott, > > the unit tests are also a good

RE: Performance testing Lucene

2014-01-24 Thread Uwe Schindler
[mailto:scott_schnei...@symantec.com] > Sent: Friday, January 24, 2014 2:41 AM > To: java-user@lucene.apache.org > Subject: RE: Performance testing Lucene > > Thanks! I ran this Directory subclass through the Lucene unit tests (and > found 3 race conditions). Unit tests are wonder

Re: Performance testing Lucene

2014-01-24 Thread Michael McCandless
> found 3 race conditions). Unit tests are wonderful. > > Scott > > >> -Original Message- >> From: Michael McCandless [mailto:luc...@mikemccandless.com] >> Sent: Wednesday, January 22, 2014 7:05 AM >> To: Lucene Users >> Subject: Re: Performance testi

RE: Performance testing Lucene

2014-01-23 Thread Scott Schneider
Lucene Users > Subject: Re: Performance testing Lucene > > All the source code for the nightly Lucene perf tests I run ( > http://people.apache.org/~mikemccand/lucenebench/ ) are here: > https://code.google.com/a/apache-extras.org/p/luceneutil/ > > These are also the scripts I use fo

Re: Performance testing Lucene

2014-01-22 Thread Michael McCandless
All the source code for the nightly Lucene perf tests I run ( http://people.apache.org/~mikemccand/lucenebench/ ) are here: https://code.google.com/a/apache-extras.org/p/luceneutil/ These are also the scripts I use for A/B performance tests for a new patch. It's somewhat tricky getting those Pyth

Re: Performance/scoring impacts with multiple occurrences of a field

2013-10-11 Thread Ian Lea
With multiple fields of the same name vs a single field I doubt you'd be able to tell the difference in performance or matching or scoring in normal use. There may be some matching/ranking effect if you are looking at, say, span queries across the multiple fields. Try it out and see what happens.

Re: Performance measurements

2013-08-20 Thread Sriram Sankar
fast even though it matches everything - no > scoring. > > -- Jack Krupansky > > -Original Message- From: Arjen van der Meijden > Sent: Thursday, July 25, 2013 3:06 PM > > To: java-user@lucene.apache.org > Subject: Re: Performance measurements > > Hi Sriram, >

Re: Performance measurements

2013-07-25 Thread Jack Krupansky
fast even though it matches everything - no scoring. -- Jack Krupansky -Original Message- From: Arjen van der Meijden Sent: Thursday, July 25, 2013 3:06 PM To: java-user@lucene.apache.org Subject: Re: Performance measurements Hi Sriram, I don't see any obvious mistakes, although

Re: Performance measurements

2013-07-25 Thread Arjen van der Meijden
Hi Sriram, I don't see any obvious mistakes, although you don't need to create a FilteredQuery: There are plenty of search-methods on the IndexSearcher that accept both a query (your TermQuery) and a filter (your TermsFilter). The way I understand Filters (but I have no advanced in-depth know

Re: Performance measurements

2013-07-25 Thread Sriram Sankar
Thanks everyone. I'm trying this out: > So searching would become: > - Create a Query with only your termA > - Create a TermsFilter with all your termB's > - execute your preferred search-method with both the query and the filter I don't get the same results as before - and am still debuggin

Re: Performance measurements

2013-07-25 Thread Arjen van der Meijden
On 24-7-2013 21:58 Sriram Sankar wrote: On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky wrote: Scoring has been a major focus of Lucene. Non-scored filters are also available, but the query parsers are focused (exclusively) on scored-search. When you say "filter" do you mean a step performed

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
d be wrapped as a CSQ for search so that no scoring would be done. -- Jack Krupansky -Original Message- From: Sriram Sankar Sent: Wednesday, July 24, 2013 3:58 PM To: java-user@lucene.apache.org Subject: Re: Performance measurements On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
his more). Sriram. > > > -- Jack Krupansky > > -Original Message- From: Sriram Sankar > Sent: Wednesday, July 24, 2013 1:03 PM > To: java-user@lucene.apache.org > Subject: Re: Performance measurements > > > No I do not need scoring. This is a pur

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
1:03 PM To: java-user@lucene.apache.org Subject: Re: Performance measurements No I do not need scoring. This is a pure retrieval query - which matches what we used to do with Unicorn in Facebook - something like: (name:sriram AND (friend:1 OR friend:2 ...)) This automatically gives us second

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
No I do not need scoring. This is a pure retrieval query - which matches what we used to do with Unicorn in Facebook - something like: (name:sriram AND (friend:1 OR friend:2 ...)) This automatically gives us second degree. With Unicorn, we would always get sub-millisecond performance even for n

Re: Performance measurements

2013-07-24 Thread Adrien Grand
Hi, On Wed, Jul 24, 2013 at 6:11 PM, Sriram Sankar wrote: > termA AND (termB1 OR termB2 OR ... OR termBn) Maybe this comment is not appropriate for your use-case, but if you don't actually need scoring from the disjunction on the right of the query, a TermsFilter will be faster when n gets large

Re: Performance measurements

2013-07-24 Thread Jack Krupansky
Thanks for the detailed numbers. Nothing seems unexpected to me. Increasing query complexity or term count is simply going to increase query execution time. I think I'll add a new rule to my informal performance guidance - Query complexity of no more than ten to twenty terms is a "slam dunk",

Re: Performance measurements

2013-07-24 Thread Sriram Sankar
Clarification - I used an MMap'd index and warmed it up with similar queries, as well as running the identical query many times before starting measurements. I had ample heap space. Sriram. On Wed, Jul 24, 2013 at 9:11 AM, Sriram Sankar wrote: > I did some performance tests on a real index us

RE: Performance of NULL check *:* -category:[* TO *]

2013-05-14 Thread srividhyau
Hi - We are using 3.0.3. Could you point me to similar functionality prior to 4.0? - Vidhya -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-NULL-check-category-TO-tp4063021p4063158.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

RE: Performance of NULL check *:* -category:[* TO *]

2013-05-13 Thread Uwe Schindler
There is a Filter that can find documents *without* or *with any* value: FieldValueFilter http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/FieldValueFilter.html You can create a query out of it: new ConstantScoreQuery(new FieldValueFilter("fieldname", true)) Uwe - Uwe Sch

Re: Performance of IndexSearcher.explain(Query)

2012-11-20 Thread Trejkaz
On Wed, Nov 21, 2012 at 10:40 AM, Robert Muir wrote: > Explain is not performant... but the comment is fair I think? It's more of a > worst-case, depends on the query. > Explain is going to rewrite the query/create the weight and so on just to > advance() the scorer to that single doc > So if this

Re: Performance of IndexSearcher.explain(Query)

2012-11-20 Thread Robert Muir
On Tue, Nov 20, 2012 at 6:18 PM, Trejkaz wrote: > I have a feature I wanted to implement which required a quick way to > check whether an individual document matched a query or not. > > IndexSearcher.explain seemed to be a good fit for this. > > The query I tested was just a BooleanQuery with two

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-24 Thread Andrei
I wrote a REST server that uses MongoDB as its datastore and Lucene for search: https://sites.google.com/site/mongodbjavarestserver/home You can try it. On Fri, May 18, 2012 at 9:44 AM, Konstantyn Smirnov wrote: > Hi all, > > apologies, if this question was already asked before. > > If I need to sto

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-23 Thread Aditya
Agreed. Here the discussion is whether Lucene could be considered for storing data? Whether Lucene could be used as NoSQL? The Answer is YES. Regards Aditya www.findbestopensource.com On Tue, May 22, 2012 at 2:12 PM, Konstantyn Smirnov wrote: > simple > > what is the speed of indexing of docum

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-22 Thread Konstantyn Smirnov
Simple: what is the speed of indexing documents with stored fields? What is the retrieval rate? How well does it scale? How well do MongoDB and others perform in the same discipline? Has anyone conducted such comparison tests? To dump about 1 million documents into the index (with the single inde

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-22 Thread findbestopensource
Just updated my view in the article.. Feel free to add your comments.. http://www.findbestopensource.com/article-detail/lucene-solr-as-nosql-db Regards Aditya www.findbestopensource.com On Mon, May 21, 2012 at 2:25 PM, Shashi Kant wrote: > A related thread on Stackoverflow: > > http://stackov

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Shashi Kant
A related thread on Stackoverflow: http://stackoverflow.com/questions/3215029/nosql-mongodb-vs-lucene-or-solr-as-your-database/3216550#3216550 On Fri, May 18, 2012 at 10:44 AM, Konstantyn Smirnov wrote: > Hi all, > > apologies, if this question was already asked before. > > If I need to store a l

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Apostolis Xekoukoulotakis
There is an IndexDocValues (renamed DocValues) in Lucene 4 which maps ids to a value and has different characteristics than the inverted index. If someone could answer my question as well, it entails using a k-v database for personalized ranking (see the previous mail

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Li Li
What do you mean by performance of storage? Lucene just stores all fields of a document (or columns of a row, in DB terms) together. It can only store strings; you can't store an int or long (unless you convert it to a string). Retrieving a given field of a document will cause many I/O operations. It's des

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-21 Thread Konstantyn Smirnov
That's ok, but what is the real difference? Are there any performance tests? I can assume, that up to 1 GB index size, there will be no noticeable difference with stored fields in comparison with some MongoDB, but if the index size grows? -- View this message in context: http://lucene.472066.n3.

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-20 Thread findbestopensource
Hi, Lucene is not a data store. You should store data in the file system / DB and store only the reference key and the data needed to display summary results in Lucene. In most applications, once the search is performed, a list of search results with just a little information is displayed. O

Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-18 Thread Glen Newton
Storing content in large indexes can significantly add to index time. The model of indexing fields only in Lucene and storing just a key, and then storing the content in some other container (DBMS, NoSql, etc) with the key as lookup is almost a necessity for this use case unless you have a complet

Re: Performance improvements for fuzzy queries ?

2012-03-08 Thread Paul Taylor
On 03/02/2012 15:01, Paul Taylor wrote: Using Lucene 3.5, I created a query parser based on the dismax parser but in order to get matches on misspellings etcetera I additionally do a fuzzy search and a wildcard search http://svn.musicbrainz.org/search_server/trunk/servlet/src/main/java/org/mu

Re: Performance of MultiFieldQueryParser versus QueryParser

2012-03-02 Thread Ian Lea
Highly unlikely unless your subclass was slow for some reason. -- Ian. On Fri, Mar 2, 2012 at 7:31 AM, Paul Taylor wrote: > If I happen to subclass MultiFieldQueryParser unnecessarily (thought need > more than one default search but don't after all) would it have any impact > on performance

Re: performance question - number of documents

2011-10-27 Thread Felipe Hummel
Thanks again. > > > > > - Original Message - > From: Erick Erickson > To: java-user@lucene.apache.org; sol myr > Cc: > Sent: Sunday, October 23, 2011 7:18 PM > Subject: Re: performance question - number of documents > > "Why would it matter...top 5 mat

Re: performance question - number of documents

2011-10-24 Thread sol myr
Thanks again. - Original Message - From: Erick Erickson To: java-user@lucene.apache.org; sol myr Cc: Sent: Sunday, October 23, 2011 7:18 PM Subject: Re: performance question - number of documents "Why would it matter...top 5 matches" Because Lucene has to calculate the

Re: performance question - number of documents

2011-10-23 Thread Antony Sequeira
This may not be directly relevant to Lucene, but I wanted to learn: How does a web search engine do something like this? Do they also "score every matching document on every query" OR do they pick a subset first based on some static/offline ranking criteria then do what Lucene does OR do they sea

Re: performance question - number of documents

2011-10-23 Thread Erick Erickson
"Why would it matter...top 5 matches" Because Lucene has to calculate the score of all documents in order to ensure that it returns those 5 documents. What if the very last document scored was the most relevant? Best Erick On Sun, Oct 23, 2011 at 3:06 PM, sol myr wrote: > Hi, > > We've noticed s
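
The reasoning in this reply can be sketched as a top-N selection over per-document scores. This is a toy model of the argument, not Lucene's collector: the `scores` array stands in for the scores of all matching documents, and the class and method names are hypothetical. A bounded min-heap keeps the best N seen so far, but every score still has to be looked at, since the last one may beat everything before it.

```java
import java.util.PriorityQueue;

public class TopNDemo {
    /** Returns the indices (toy doc ids) of the n highest scores, best first. */
    static int[] topN(float[] scores, int n) {
        if (n <= 0) {
            return new int[0];
        }
        // Min-heap ordered by score: the weakest current candidate sits at the head.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
        for (int doc = 0; doc < scores.length; doc++) { // every document is examined
            if (heap.size() < n) {
                heap.add(doc);
            } else if (scores[doc] > scores[heap.peek()]) {
                heap.poll(); // evict the weakest candidate
                heap.add(doc);
            }
        }
        // Drain the heap weakest-first, filling the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.1f, 0.5f, 0.3f, 0.95f};
        int[] best = topN(scores, 2);
        // The last document (id 5) turns out to be the best hit.
        System.out.println(best[0] + ", " + best[1]);
    }
}
```

With the sample scores, the top two are documents 5 and 1, and document 5 is only discovered at the very end of the scan.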

Re: Performance question

2011-07-14 Thread Mihai Caraman
Thank you for the reply, if you need more info to understand the question, I'll try to be as prompt as possible. > -if i search on last week's index and the individual index (this needs to be > opened at search request!?) will it be faster than using a single huge index > for all groups, for all w

Re: Performance question

2011-07-14 Thread Ian Lea
Searching billions of anything is likely to be challenging. Mark Miller's document at http://www.lucidimagination.com/content/scaling-lucene-and-solr looks well worth a read. > -if i search on last week's index and the individual index (this needs to be > opened at search request!?) will it be fas

Re: Performance and index size (rephrased question)

2011-03-31 Thread Erick Erickson
5-10 G indexes are pretty small by Lucene/Solr standards, so given reasonable hardware resources this should be no problem. That said, only measurement will nail this down. But an often-used rule of thumb is that you need to consider some better strategies in the 40G range. CAUTION: you haven't sp

Re: Performance problems with lazily loaded fields

2011-03-22 Thread Erick Erickson
Don't do that. Let's back up a second and ask why in the world you want to do this, what's the use-case you're satisfying? Because spinning through all the results and getting information from the underlying documents is inherently expensive since, as Sanne says, you're doing disk seeks. Most L

Re: Performance problems with lazily loaded fields

2011-03-21 Thread Sanne Grinovero
2011/3/21 Brian Hurt : > I'm having a problem with the performance of lazily-loaded fields with > lucene.  The basic structure of the code is that I get a set of documents > back from a query, then iterate through them, reading most fields to collect > fragments.  This is taking an excessively long

RE: performance issues in multivalued fields

2011-03-07 Thread suman.holani
, March 07, 2011 5:50 PM To: java-user@lucene.apache.org Subject: Re: performance issues in multivalued fields You have to describe in detail what "taking a huge performance hit" means, there's not much to go on here... But in general, adding N elements to a multi-valued field isn

Re: performance issues in multivalued fields

2011-03-07 Thread Erick Erickson
You have to describe in detail what "taking a huge performance hit" means, there's not much to go on here... But in general, adding N elements to a multi-valued field isn't a problem at all. This bit of code: Document D = searcher.doc(hits[i].doc); is very suspicious. Does your cLucene version h

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Shai Erera
That's right. In 3x though you have to call addIndexes followed by maybeMerge if you want to achieve the same effect of addIndexesNoOptimize. Shai On Friday, November 12, 2010, Marc Sturlese wrote: > > Thanks, so clarifying. As far as I've understood, if I have to end up > optimizing the index j

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Marc Sturlese
Thanks, so clarifying. As far as I've understood, if I have to end up optimizing the index just after merging it, no matter if I use the lucene 3.X addIndexes or addIndexesNoOptimize as the sum of time of doing both things will be the same in one case or other. Am I right? -- View this message i

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Shai Erera
Ok, so a couple of clarifications: addIndexes(Directory...) *does not* trigger any merges. It simply registers the incoming directories in the target index, and returns. You can later call maybeMerge() or optimize() as you see fit. Compound files are irrelevant to addIndexes - it just adds the in

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Marc Sturlese
Thanks a lot Shai, couple of questions: >> In Lucene 3x there is a new addIndexes which accepts Directory… that >> simply registers the new indexes in the index, without running merges. >> That makes addIndexes very fast. With the lucene 3.X addIndexes which accepts Directory, if after the mer

Re: performance merging indexes with addIndexesNoOptimize

2010-11-12 Thread Shai Erera
In Lucene 3x there is a new addIndexes which accepts Directory… that simply registers the new indexes in the index, without running merges. That makes addIndexes very fast. Also, you can consider calling close(false) to not wait for merges. That can speed things up as well. But note that not run

Re: Performance Results on changing the way fields are stored

2010-01-13 Thread Paul Taylor
Grant Ingersoll wrote: On Jan 5, 2010, at 7:44 AM, Paul Taylor wrote: So currently in my index I index and store a number of small fields, I need both so I can search on the fields, then I use the stored versions to generate the output document (which is either an XML or JSON representatio

Re: Performance Results on changing the way fields are stored

2010-01-07 Thread Otis Gospodnetic
You could try Avro instead of JSON/XML/Java Serialization. It's compact (and new). http://hadoop.apache.org/avro/ Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Paul Taylor > To: java-user@lucene.apache.org > Sent: Tue, January 5, 2010

Re: Performance Results on changing the way fields are stored

2010-01-06 Thread Grant Ingersoll
On Jan 5, 2010, at 7:44 AM, Paul Taylor wrote: > So currently in my index I index and store a number of small fields, I need > both so I can search on the fields, then I use the stored versions to > generate the output document (which is either an XML or JSON representation), > because I read

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler
0, 2009 6:37 PM > To: java-user@lucene.apache.org > Subject: Re: Performance problems with Lucene 2.9 > > The problem with this method is that I won't be able to know how many > total > results / pages a search have? > > For example if I do a search X that returns 1,00

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
> > useful, because the first 200 hits cannot be ranked. > > > > > > > > - > > > > Uwe Schindler > > > > H.-H.-Meier-Allee 63, D-28213 Bremen > > > > http://www.thetaphi.de > > > > eMail: u...@thetaphi.de > > >

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler
ctors? > >> > > > > > >> > > > > - Mike > >> > > > > aka...@gmail.com > >> > > > > > >> > > > > > >> > > > > On Mon, Nov 30, 2009 at 11:03 AM, Uwe Schindler > > >> > > wrot

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler
st 200 hits cannot be ranked. > > > > > > - > > > Uwe Schindler > > > H.-H.-Meier-Allee 63, D-28213 Bremen > > > http://www.thetaphi.de > > > eMail: u...@thetaphi.de > > > > > > > -Original Message- > > >

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
esults, TopDocs is not >> > very >> > useful, because the first 200 hits cannot be ranked. >> > >> > - >> > Uwe Schindler >> > H.-H.-Meier-Allee 63, D-28213 Bremen >> > http://www.thetaphi.de >> > eMail: u.

Re: Performance problems with Lucene 2.9

2009-11-30 Thread Michel Nadeau
e > > eMail: u...@thetaphi.de > > > > > -Original Message- > > > From: Michel Nadeau [mailto:aka...@gmail.com] > > > Sent: Monday, November 30, 2009 5:35 PM > > > To: java-user@lucene.apache.org > > > Subject: Re: Performance proble

RE: Performance problems with Lucene 2.9

2009-11-30 Thread Uwe Schindler
> > query. > > > > > > > > > > And if you iterate over all results never-ever use Hits! (it's > > already > > > > > deprecated). Write a Collector instead (as you are not interested > in > > > > > scoring). > > > > >
