IndexDocValues

2014-06-26 Thread Sandeep Khanzode
I came across this type when I checked this blog: http://blog.trifork.com/2011/10/27/introducing-lucene-index-doc-values/ The blog mentions that IndexDocValues are created as sorting types, indexed specifically for that purpose, and that they reduce the overhead created by the FieldCache. I could not l
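A toy model of the overhead the blog refers to, written in plain Java rather than against the Lucene API (every class and method name below is hypothetical). FieldCache builds its per-document array at search time by walking every term and its postings ("uninverting" the index) and keeps the result on the heap. Doc values instead write that per-document column at index time, so there is nothing to rebuild when a sort first touches the field. The sketch ignores segments and on-disk encoding entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (not Lucene code): FieldCache-style uninversion vs. a doc-values-style column.
public class UninvertModel {

    // FieldCache-style: walk every term and its postings to rebuild a docId -> value column.
    // In the real FieldCache this happens at search time, the first time the field is
    // sorted on, and the resulting array lives on the Java heap.
    public static String[] uninvert(Map<String, List<Integer>> invertedIndex, int maxDoc) {
        String[] column = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : invertedIndex.entrySet()) {
            for (int docId : entry.getValue()) {
                column[docId] = entry.getKey(); // assumes a single-valued field
            }
        }
        return column;
    }

    // Doc-values-style: the column is written in docId order while indexing,
    // so nothing has to be rebuilt at search time.
    public static String[] columnAtIndexTime(List<String> valuesInDocOrder) {
        return valuesInDocOrder.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> docs = List.of("pear", "apple", "pear", "fig");

        // Inverted view, as the term dictionary sees it: term -> docIds containing it.
        Map<String, List<Integer>> inverted = new TreeMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            inverted.computeIfAbsent(docs.get(doc), k -> new ArrayList<>()).add(doc);
        }

        // Both lines print [pear, apple, pear, fig]; only the cost of getting there differs.
        System.out.println(Arrays.toString(uninvert(inverted, docs.size())));
        System.out.println(Arrays.toString(columnAtIndexTime(docs)));
    }
}
```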

Re: QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load

2014-06-26 Thread Michael McCandless
The test case is "only" parsing this query, not trying to run it, right? So it doesn't involve automaton/FST ... just the flexible query parser code? It seems bad that flexible QP would take so long, even if the query is "strange". Can you open an issue, and maybe attach a thread dump so we can

Re: Batch wise Indexing Structured Documents

2014-06-26 Thread parnab kumar
Download the Lucene source code and check the demo source files that are shipped with it; you should find a sample indexing file... On Thu, Jun 26, 2014 at 9:27 PM, Venkata krishna wrote: > Hi, > > I have to index millions of files, that's why i am thinking batch wise > indexing is good. > >

Re: QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load

2014-06-26 Thread Erick Erickson
I suspect you're getting leading wildcard searches as well, which must do entire term scans unless you're doing the reverse trick. Replacing all successive whitespace gives you: Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr,*sed*diam*nonumy*eirmod*tempor*invidunt*ut*labore*et*dolore*magn
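Why a leading wildcard forces a full term scan, and what the "reverse trick" buys, can be sketched with a sorted `TreeSet` standing in for the term dictionary. This is a simplified model, not how Lucene executes a `WildcardQuery` (which intersects an automaton with the terms), and the class is hypothetical. A trailing wildcard such as `abc*` can seek to a sorted position; `*abc` cannot, so every term is checked. Indexing each token a second time in reversed form (the idea behind Solr's `ReversedWildcardFilterFactory`) turns `*abc` into the prefix query `cba*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model (not Lucene's WildcardQuery): a sorted TreeSet stands in for the term dictionary.
public class ReverseWildcardModel {
    private final TreeSet<String> terms = new TreeSet<>();
    private final TreeSet<String> reversedTerms = new TreeSet<>(); // second "field" of reversed tokens

    public ReverseWildcardModel(List<String> vocabulary) {
        for (String term : vocabulary) {
            terms.add(term);
            reversedTerms.add(reverse(term));
        }
    }

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // "abc*": seek to the first term >= prefix and stop after the last match, so only
    // matching terms are visited. Character.MAX_VALUE as upper bound suits this toy data.
    private static List<String> prefixRange(TreeSet<String> dict, String prefix) {
        return new ArrayList<>(dict.subSet(prefix, true, prefix + Character.MAX_VALUE, false));
    }

    public List<String> prefix(String prefix) {
        return prefixRange(terms, prefix);
    }

    // "*abc" without the trick: there is no sorted position to seek to, so every term is checked.
    public List<String> suffixByFullScan(String suffix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    // "*abc" with the reverse trick: it becomes the prefix query "cba*" on the reversed terms.
    public List<String> suffixByReversedPrefix(String suffix) {
        List<String> matches = new ArrayList<>();
        for (String reversed : prefixRange(reversedTerms, reverse(suffix))) {
            matches.add(reverse(reversed));
        }
        return matches;
    }

    public static void main(String[] args) {
        ReverseWildcardModel model = new ReverseWildcardModel(
                List.of("lorem", "ipsum", "dolor", "sit", "amet", "sum", "album"));
        System.out.println(model.prefix("do"));                 // [dolor]
        System.out.println(model.suffixByFullScan("um"));       // [album, ipsum, sum]
        System.out.println(model.suffixByReversedPrefix("um")); // [album, sum, ipsum]
    }
}
```

The reversed lookup returns the same terms as the full scan, only in reversed-term order; the price is roughly doubling the number of indexed terms for that field.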

Batch wise Indexing Structured Documents

2014-06-26 Thread Venkata krishna
Hi, I have to index millions of files, which is why I think batch-wise indexing would be a good approach. Is it possible to do batch indexing using Lucene? If so, please provide a sample snippet. Any suggestions would be appreciated. Thanks, Venkata krishna
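A minimal sketch of the batching loop, assuming the usual Lucene pattern of keeping one long-lived `IndexWriter` for the whole run and calling `commit()` periodically rather than opening a new writer per batch (the writer already buffers documents in RAM and flushes segments on its own, tunable via `IndexWriterConfig.setRAMBufferSizeMB`). To keep it self-contained, the hypothetical `DocumentSink` interface stands in for the writer; the demo's `IndexFiles` class shows how each file is actually turned into a `Document`.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {

    // Hypothetical stand-in for a Lucene IndexWriter: add() plays the role of
    // IndexWriter.addDocument(...) and commit() the role of IndexWriter.commit().
    public interface DocumentSink {
        void add(String path);
        void commit();
    }

    // Feeds every path to ONE sink (i.e. one long-lived writer), committing after each
    // full batch and once more for a trailing partial batch. Returns the number of commits.
    public static int indexInBatches(List<String> paths, int batchSize, DocumentSink sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int inCurrentBatch = 0;
        for (String path : paths) {
            sink.add(path);
            if (++inCurrentBatch == batchSize) {
                sink.commit();
                commits++;
                inCurrentBatch = 0;
            }
        }
        if (inCurrentBatch > 0) {
            sink.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            paths.add("file-" + i + ".txt"); // hypothetical file names
        }
        DocumentSink printingSink = new DocumentSink() {
            @Override public void add(String path) { System.out.println("add " + path); }
            @Override public void commit() { System.out.println("commit"); }
        };
        System.out.println(indexInBatches(paths, 4, printingSink) + " commits"); // 3 commits
    }
}
```

Committing is what makes a batch durable and visible to newly opened readers, so the batch size trades restart granularity against commit overhead; it does not need to match the writer's internal flushing.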

Re: QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load

2014-06-26 Thread Jack Krupansky
I'll defer to the hard-core Lucene committers for the technical details, but I would suggest that a very large term with dozens of wildcards is a "known limitation" (albeit not well documented). IOW, to use wildcards in Lucene in a performant manner, they need to be "brief". -- Jack Krupansky
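One way to act on the "keep wildcards brief" advice is to screen user input before it reaches the query parser. The sketch below is a hypothetical application-side guard, not a Lucene feature: it rejects terms with a leading wildcard and terms with more than an arbitrary number of `*`/`?` characters. The limit of 2 is an assumption to be tuned per index. (Lucene's classic `QueryParser` can also refuse leading wildcards itself via `setAllowLeadingWildcard(false)`.)

```java
public class WildcardGuard {

    // Hypothetical limit; there is no universal "safe" number, tune it against your own index.
    public static final int MAX_WILDCARDS = 2;

    // Counts '*' and '?' characters, skipping any character that is backslash-escaped.
    public static int countWildcards(String term) {
        int count = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '*' || c == '?') {
                count++;
            }
        }
        return count;
    }

    public static boolean isAcceptable(String term) {
        if (term.isEmpty()) {
            return false;
        }
        char first = term.charAt(0);
        if (first == '*' || first == '?') {
            return false; // leading wildcard: no prefix to seek on, so the whole dictionary is scanned
        }
        return countWildcards(term) <= MAX_WILDCARDS;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("lor*m"));                 // true
        System.out.println(isAcceptable("*ipsum"));                // false
        System.out.println(isAcceptable("Lorem*ipsum*dolor*sit")); // false (3 wildcards)
    }
}
```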

Re: SortedDocValuesField

2014-06-26 Thread Robert Muir
Don't use RAMDirectory: it's not very performant and is really intended for testing and the like. Also, using a RAMDirectory here defeats the purpose: the idea behind using a DocValues field in most cases is to keep (most of) such data structures out of heap memory. The data structures and even the com

SortedDocValuesField

2014-06-26 Thread Sandeep Khanzode
Hi, I was comparing the sort performance of a SortedDocValuesField with that of a normal field, i.e. a StringField, in the same sort. So, I used the same string/BytesRef value in both fields and launched the two sorts in separate JVM processes. I used a RAMDirectory and cre
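The idea that makes a sorted doc-values field cheap to sort on can be shown with a small in-memory model: each document stores an int ordinal into a sorted, de-duplicated dictionary of values, so ordering documents compares ints rather than strings. This is a hypothetical class, not the Lucene API. It compares values in Java's natural `String` order, whereas Lucene compares `BytesRef` bytes, and it ignores that real ordinals are assigned per segment and that the data is read from the `Directory` rather than held on the heap.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Toy model of sorted doc values: a sorted value dictionary plus a docId -> ordinal column.
public class SortedOrdinalsModel {
    private final String[] dictionary; // unique values in sorted order; array index = ordinal
    private final int[] ordByDoc;      // docId -> ordinal into dictionary

    public SortedOrdinalsModel(String[] valueByDoc) {
        dictionary = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        ordByDoc = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            ordByDoc[doc] = Arrays.binarySearch(dictionary, valueByDoc[doc]);
        }
    }

    // Sorts docIds by value while comparing only ints; no strings are compared here.
    // Arrays.sort on objects is stable, so ties keep docId order.
    public Integer[] sortDocs() {
        Integer[] docs = new Integer[ordByDoc.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparingInt(doc -> ordByDoc[doc]));
        return docs;
    }

    // Resolves a document's value only when it is actually needed, e.g. for display.
    public String valueOf(int doc) {
        return dictionary[ordByDoc[doc]];
    }

    public static void main(String[] args) {
        SortedOrdinalsModel model =
                new SortedOrdinalsModel(new String[] {"pear", "apple", "pear", "fig"});
        for (int doc : model.sortDocs()) {
            System.out.println(doc + " " + model.valueOf(doc)); // 1 apple, 3 fig, 0 pear, 2 pear
        }
    }
}
```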

QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load

2014-06-26 Thread Clemens Wyss DEV
The following "testcase" runs endlessly and produces VERY heavy load. ... String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut " + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et