[jira] Resolved: (LUCENE-2111) Wrapup flexible indexing

2010-04-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2111. Resolution: Fixed > Wrapup flexible index

Re: Controlling the maximum size of a segment during indexing

2010-04-09 Thread Lance Norskog
ce Norskog wrote: >>> >>> Here is a Java unit test that uses the LogByteSizeMergePolicy to >>> control the maximum size of segment files during indexing. That is, it >>> tries. It does not succeed. Will someone who truly understands the >>> merge policy c

Re: Controlling the maximum size of a segment during indexing

2010-04-09 Thread Lance Norskog
a unit test that uses the LogByteSizeMergePolicy to >> control the maximum size of segment files during indexing. That is, it >> tries. It does not succeed. Will someone who truly understands the >> merge policy code please examine it. There is probably one tiny >> paramet

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-04-09 Thread Shai Erera (JIRA)
too :). At least the one I received. But never mind that ... as long as we both agree the implementation should change. I didn't mean to say anything bad about what you did .. I know the limitations you had to work with. > Parallel incremen

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-04-09 Thread Michael Busch (JIRA)
an upgraded version that works with 3.0 within IBM, Shai. > Parallel incremental indexing > - > > Key: LUCENE-1879 > URL: https://issues.apache.org/jira/browse/LUCENE-1879 > Project: Lucene - Java >

Re: Controlling the maximum size of a segment during indexing

2010-04-09 Thread Mark Miller
smaller than maxMergeMB that create a segment larger than maxMergeMB. -- - Mark http://www.lucidimagination.com On 04/09/2010 01:01 AM, Lance Norskog wrote: Here is a Java unit test that uses the LogByteSizeMergePolicy to control the maximum size of segment files during indexing. That is, it
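The point in this reply, that segments each smaller than maxMergeMB can merge into one larger than maxMergeMB, comes down to simple arithmetic. The sketch below is a simplified model, not Lucene's merge code. It assumes the cap is checked only on the segments going into a merge and that the merged segment is roughly the sum of its inputs. `MergeSizeSketch`, `eligible`, and `mergedSizeMB` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

/** Simplified model of a per-input size cap on merging; hypothetical, not Lucene code. */
final class MergeSizeSketch {

    /** Assumption: a segment may take part in a merge only while it is below the cap. */
    static boolean eligible(double segmentMB, double maxMergeMB) {
        return segmentMB < maxMergeMB;
    }

    /** Assumption: the merged segment is about as large as the sum of its inputs. */
    static double mergedSizeMB(double[] segmentsMB, double maxMergeMB) {
        double total = 0.0;
        for (double s : segmentsMB) {
            if (!eligible(s, maxMergeMB)) {
                throw new IllegalArgumentException("segment of " + s + " MB is not eligible");
            }
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        double maxMergeMB = 1.0;
        double[] segments = new double[10];
        Arrays.fill(segments, 0.9); // ten segments, each just under the cap
        double merged = mergedSizeMB(segments, maxMergeMB);
        System.out.println("merged size in MB: " + merged + ", cap in MB: " + maxMergeMB);
    }
}
```

Under these assumptions every input passes the check, yet the result is far above the cap, so a cap on merge inputs alone does not bound the size of the segments that end up on disk.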

Re: Controlling the maximum size of a segment during indexing

2010-04-08 Thread Shai Erera
olicy to > control the maximum size of segment files during indexing. That is, it > tries. It does not succeed. Will someone who truly understands the > merge policy code please examine it. There is probably one tiny > parameter missing. > > It adds 20 documents that each are 100k

Controlling the maximum size of a segment during indexing

2010-04-08 Thread Lance Norskog
Here is a Java unit test that uses the LogByteSizeMergePolicy to control the maximum size of segment files during indexing. That is, it tries. It does not succeed. Will someone who truly understands the merge policy code please examine it. There is probably one tiny parameter missing. It adds 20

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-06 Thread Shivender Devarakonda (JIRA)
any infostream output when it had gone OOM. I see the same exception trace that I posted above. Do you think I missed anything? > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-06 Thread Michael McCandless (JIRA)
ream output shows no exception... I'm confused. > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-06 Thread Shivender Devarakonda (JIRA)
m on all writers. > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java > Issue Type: Bug >

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-06 Thread Michael McCandless (JIRA)
oryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java > Issue Type: Bug > Components: I

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-05 Thread Shivender Devarakonda (JIRA)
1371 _9yq:c42466 _cml:c45324 _fgk:c46186 _i63:c45931 _kz8:c46568 _ntr:c47071 _qiq:c46080 _qin:c18 _qip:c1 _qir:c4 _qit:c19 _qiv:c1 _qix:c51 _qiz:c2 > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 >

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-03 Thread Shivender Devarakonda (JIRA)
ways reproducible, I will post the infostream output once I get it reproduced again. > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 >

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-03 Thread Michael McCandless (JIRA)
ways 100? Why are you using NoLockFactory? That's generally very dangerous. Also, why do you call FSDir's setReadChunkSizeMB? Can you post the infoStream output? > OutOfMemoryException while Indexing > --- > > Key: LUCENE-

[jira] Issue Comment Edited: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-03 Thread Shivender Devarakonda (JIRA)
insertionTime = MasterClock.currentTimeMillis() - startTime; } We do not set any RAMBufferSize in this case. Please let me know if you need anything on this. > OutOfMemoryException while Indexing > --- > > Key: LUCE

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-03 Thread Shivender Devarakonda (JIRA)
ow if you need anything on this. > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java > Issue Typ

Re: [jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-03 Thread Lars Grote
y? > > Can you enable IndexWriter's infoStream and post the output? > > > OutOfMemoryException while Indexing > > --- > > > > Key: LUCENE-2361 > > URL: https://issues.apache.org/jira/browse/

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-03 Thread Michael McCandless (JIRA)
s on how you're using Lucene? What's IndexWriter's ramBufferSizeMB? How much heap are you giving the JVM? Are you calling commit (or closing/opening a new IndexWriter) frequently or rarely? Can you enable IndexWriter's infoStream and post the output? > OutOfMemoryEx

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Shivender Devarakonda (JIRA)
the same in other cases too. Basically, we have a load that continuously pushes specific objects and we index each incoming object. If we do not apply that load, it works fine. > OutOfMemoryException while Indexing > --- > >

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Earwin Burrfoot (JIRA)
tion always have this stacktrace? Maybe someone else litters in PermGen? > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 >

[jira] Issue Comment Edited: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Shivender Devarakonda (JIRA)
7 AM: I am sure we have less than 20 fields in the document. Do you think it still causes this issue? was (Author: shivenderd): I am sure we have less than 20 fields in the index > OutOfMemoryException while I

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Shivender Devarakonda (JIRA)
less than 20 fields in the index > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java > Issue Type:

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Earwin Burrfoot (JIRA)
I tested String.intern(), it failed to cause OOMs being bombarded by random strings, like it did in java 1.4.something. > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Shivender Devarakonda (JIRA)
s it happens without the profiler also. Thanks > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java >

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Earwin Burrfoot (JIRA)
iler? Does that happen without it? > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java >

[jira] Commented: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Earwin Burrfoot (JIRA)
that? > OutOfMemoryException while Indexing > --- > > Key: LUCENE-2361 > URL: https://issues.apache.org/jira/browse/LUCENE-2361 > Project: Lucene - Java > Issue Type: Bug > Components: Index

[jira] Created: (LUCENE-2361) OutOfMemoryException while Indexing

2010-04-02 Thread Shivender Devarakonda (JIRA)
OutOfMemoryException while Indexing --- Key: LUCENE-2361 URL: https://issues.apache.org/jira/browse/LUCENE-2361 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-04-01 Thread Michael McCandless (JIRA)
andle > 2B terms; I also strengthened CheckIndex to verify that .ord() of the TermsEnum always returns the right result, for codecs that implement .ord. I'll commit shortly... > Wrapup flexible indexing > > > Key: LUCENE-2111 >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-31 Thread Michael McCandless (JIRA)
rms -- attached patch creates such an index. I'm still getting to the bottom of it... > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene -

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-30 Thread Michael McCandless (JIRA)
problem around (you pass a DFA to the codec and it does the intersection & enums the result), and we used byte-based DFAs, I think we'd get a good speedup. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL:

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-30 Thread Michael McCandless (JIRA)
s me -- that can't be a terms dict thing (just one lookup); i'm not sure offhand why it's faster. That code is not very different than trunk. bq. How's the indexing performance? Unchanged -- I indexed first 10M docs of wikipedia and the times were nearly identical.

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-30 Thread Robert Muir (JIRA)
de today. All this being said, I think flex is a great move forward for multitermqueries, at least we have a seeking-friendly API! One step at a time. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-30 Thread Michael Busch (JIRA)
some work! What changes make those queries run faster with the default codec? Mostly terms dict changes and automaton for fuzzy/wildcard? How's the indexing performance? bq. I think net/net we are good to land flex! +1! Even if there are still small things to change/fix I think it makes

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-30 Thread Michael McCandless (JIRA)
exc if it's run on a field that omitTFAPs (matches PhraseQuery), fixes all jdoc warnings, spells out back compat breaks in changes. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jir

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-30 Thread Robert Muir (JIRA)
flex! +1. The tests have been passing for some time now, and Solr tests pass too. It would be nice to look at merging flex into the trunk soon so that it gets more exposure. > Wrapup flexible indexing > > > Key: LUCENE-2111 >

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-30 Thread Michael McCandless (JIRA)
's followed by a suffix). * Flex API on a trunk index does take a perf hit but it looks contained enough that we don't need to spend any time optimizing that emulation layer... I also ran an indexing test (index first 10M docs of wikipedia) and flex a

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-27 Thread Grant Ingersoll (JIRA)
the intent, and was likely further confused due to the fact that Michael B and I discussed it over tasty Belgian Beer in Oakland. I'll open a discussion on list for incremental field updates. > Parallel incremental indexing > - > >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-27 Thread Michael McCandless (JIRA)
oot/src/flex.clean/contrib/benchmark/logs/flexOnTrunk.1 25.87 QPS [-58.6% worse] run flex on flex index... cd /root/src/flex.clean/contrib/benchmark log: /root/src/flex.clean/contrib/benchmark/logs/flexOnFlex.2 39.30 QPS [-37.1% worse] 124623 hits {code} Other queries I

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Shai Erera (JIRA)
is interesting, and deserves discussion, but in a separate issue/thread? > Parallel incremental indexing > - > > Key: LUCENE-1879 > URL: https://issues.apache.org/jira/browse/LUCENE-1879 > Project: Lucene - Ja

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Grant Ingersoll (JIRA)
t more than a few thousand changes in a minute or two and the background merger would be responsible for keeping the total number of disjoint documents low. > Parallel incremental indexing > - > > Key: LUCENE-1879 > URL: https://i

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Shai Erera (JIRA)
aded indexing is to do a two-phase addDocument. First, allocate a doc ID from DocumentsWriter (synchronized) and then add the Document to each Slice with that doc ID. DocumentsWriter was not supposed to know it is a parallel index ... something like the following. {code} int docId = obtainDocId();
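The two-phase addDocument described in this comment can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the patch attached to LUCENE-1879. `Slice` and `ParallelWriterSketch` are hypothetical stand-ins rather than Lucene classes, documents are reduced to strings, and an `AtomicInteger` replaces the synchronized allocation in DocumentsWriter that the comment mentions. Only `obtainDocId` is a name taken from the comment itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for one slice of a parallel index; not a Lucene class. */
final class Slice {
    private final List<String> docs = new ArrayList<>();

    /** Stores this slice's part of a document at the doc ID chosen in phase one. */
    synchronized void add(int docId, String fields) {
        while (docs.size() <= docId) {
            docs.add(null); // pad so that the position always equals the doc ID
        }
        docs.set(docId, fields);
    }

    synchronized String get(int docId) {
        return docId < docs.size() ? docs.get(docId) : null;
    }
}

/** Hypothetical writer that keeps doc IDs aligned across all slices. */
final class ParallelWriterSketch {
    private final AtomicInteger nextDocId = new AtomicInteger();
    private final List<Slice> slices;

    ParallelWriterSketch(List<Slice> slices) {
        this.slices = slices;
    }

    /** Phase one: allocate a doc ID; this is the only step shared by all threads. */
    int obtainDocId() {
        return nextDocId.getAndIncrement();
    }

    /** Phase two: add each slice's part of the document under the same doc ID. */
    int addDocument(List<String> perSliceFields) {
        int docId = obtainDocId();
        for (int i = 0; i < slices.size(); i++) {
            slices.get(i).add(docId, perSliceFields.get(i));
        }
        return docId;
    }
}
```

Because the ID is fixed before any slice is touched, the slices never need to coordinate with each other; each one only has to store its part at the position it is given.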

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Michael Busch (JIRA)
port multi-threaded parallel-indexing. If we have single-threaded DocumentsWriters, then it should be easy to have a ParallelDocumentsWriter? > Parallel incremental indexing > - > > Key: LUCENE-1879 > URL: https://issues.apa

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-24 Thread Michael McCandless (JIRA)
on flex branch! I turned most of them into TODOs :) > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Is

[jira] Reopened: (LUCENE-2111) Wrapup flexible indexing

2010-03-19 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-2111: Duh -- wrong issue! I only wish ;) > Wrapup flexible index

RE: [jira] Resolved: (LUCENE-2111) Wrapup flexible indexing

2010-03-19 Thread Uwe Schindler
e.apache.org > Subject: [jira] Resolved: (LUCENE-2111) Wrapup flexible indexing > > > [ https://issues.apache.org/jira/browse/LUCENE- > 2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel > ] > > Mic

[jira] Resolved: (LUCENE-2111) Wrapup flexible indexing

2010-03-19 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2111. Resolution: Fixed Thanks Shai! > Wrapup flexible index

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-11 Thread Michael McCandless (JIRA)
e at the right times. Vs the current approach that makes "faker" merge policy/scheduler (I think?). Some of this will require IW to open up some APIs -- eg making docID assignment a separate method call. Likely many of these will just be protected APIs w/in IW. > Pa

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-10 Thread Michael McCandless (JIRA)
TermsEnum. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: Ind

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-10 Thread Michael McCandless (JIRA)
pup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Af

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-10 Thread Michael McCandless (JIRA)
o let int block codecs provide direct access to the int[] they have (saves extra copy). * Down to 9 nocommits!! > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-08 Thread Michael McCandless (JIRA)
a large patch... > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: I

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-07 Thread Michael McCandless (JIRA)
(adding @Override to interface). > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-07 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2111: --- Attachment: LUCENE-2111.patch Down to 15 nocommits! > Wrapup flexible index

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-07 Thread Michael McCandless (JIRA)
ooks good Robert... thanks! > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Co

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-07 Thread Michael McCandless (JIRA)
, MTQ.getTermsEnum() to never return null, but IR.fields() and Fields.terms(String field), and .docs/.docsAndPositions can return null. Also whittled down more nocommits -- down to 53 now! > Wrapup flexible indexing > > > Key: LUCENE-2111 >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-07 Thread Robert Muir (JIRA)
one. I think it looks kinda dumb but if its useful, I'll commit it. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Jav

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-06 Thread Robert Muir (JIRA)
compat, so null can have some other meaning. Instead it uses VirtualMethod, with the default implementations throwing UOE. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/bro

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-03-06 Thread Michael McCandless (JIRA)
y to determine that a codec does not store positions. Thinking more about this... I think we should switch back to a null return from .docsAndPositionsEnum if the codec doesn't support positions. We only return .EMPTY if the enum is really just empty. > Wrapup flexi

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-03-06 Thread Michael McCandless (JIRA)
against .EMPTY. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Compon

[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-03 Thread Shai Erera (JIRA)
t in time from the master's. * The current approach does not support multi-threaded indexing, but I think that's a limitation that could be solved by exposing some API on IW or DW. * Only SMS is supported on the slaves. * Optimize, expungeDeletes are unsupported. Though they could and perhaps ju

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-26 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2111: Attachment: flex_merge_916543.patch patch for review of flex merge > Wrapup flexible index

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-26 Thread Robert Muir (JIRA)
> Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Af

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-25 Thread Michael McCandless (JIRA)
(instead return .EMPTY objects). > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-25 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2111: Attachment: LUCENE-2111.patch a few more easy nocommits > Wrapup flexible index

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-24 Thread Robert Muir (JIRA)
at the backwards tests now > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-24 Thread Michael McCandless (JIRA)
renaming BytesRef.toString -> BytesRef.utf8ToString. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java >

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-24 Thread Robert Muir (JIRA)
5791. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-24 Thread Robert Muir (JIRA)
to use @lucene.experimental i didnt mess with IndexFileNames as there is an open issue about it right now. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 >

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-23 Thread Robert Muir (JIRA)
tted in revision 915511 > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Co

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-23 Thread Michael McCandless (JIRA)
anks :) Why not just remove UTF8Result altogether? (ie don't bother deprecating). It's an internal API... The new method to compute hash is great, saving the extra pass in THPF. > Wrapup flexible indexing > > > Key: LUCENE-2111

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-23 Thread Robert Muir (JIRA)
from a StringBuilder if you want) there are some breaks (e.g. binary api compat), but its an internal api. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 >

[jira] Issue Comment Edited: (LUCENE-2246) While indexing Turkish web pages, "Parse Aborted: Lexical error...." occurs

2010-02-18 Thread Petr Nekvinda (JIRA)
UNICODE_ESCAPE=true;" (without quotes) into options in HTMLParser.jj and regenerate the lexical analyzer (javacc HTMLParser.jj). That solved the problem for me. > While indexing Turkish web pages, "Parse Ab

[jira] Issue Comment Edited: (LUCENE-2246) While indexing Turkish web pages, "Parse Aborted: Lexical error...." occurs

2010-02-18 Thread Petr Nekvinda (JIRA)
enrate the lexical analyzer (javacc HTMLParser.jj). That solved the problem for me. > While indexing Turkish web pages, "Parse Aborted: Lexical error" occurs > --- > > Key: LUCE

[jira] Commented: (LUCENE-2246) While indexing Turkish web pages, "Parse Aborted: Lexical error...." occurs

2010-02-18 Thread Petr Nekvinda (JIRA)
into options in HTMLParser.jj and regenerate the lexical analyzer (javacc HTMLParser.jj). That solved the problem for me. > While indexing Turkish web pages, "Parse Aborted: Lexical error" occurs > --- > >

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Robert Muir (JIRA)
ches from String to char[], which should improve perf. actually i didnt apply your LUCENE-2111 when running the benchmark (the improvement is simply the char[]). the test is now actually slightly slower now with the rest of LUCENE-2111 > Wrapup flexible i

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Michael McCandless (JIRA)
was faster before because the previous impl of Multi*Enums was using the same Docs/AndPositionsEnums before. This patch fixes that. Ahh, and also because your patch switches from String to char[], which should improve perf. Your patch looks good Robert! Thanks. > Wrapup flexible i

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Michael McCandless (JIRA)
Uwe -- thanks for re-merging! > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Co

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Uwe Schindler (JIRA)
singleton in TermsEnum class itself. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Michael McCandless (JIRA)
ster before because the previous impl of Multi*Enums was using the same Docs/AndPositionsEnums before. This patch fixes that. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Uwe Schindler (JIRA)
flex trunk). > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Compon

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-10 Thread Uwe Schindler (JIRA)
ayer. > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: Index

[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

2010-02-09 Thread Robert Muir (JIRA)
ived benchmark for LUCENE-2089, weird that flex was slower than trunk before. numbers are stable across many iterations. ||unpatched flex||patched flex||trunk|| |4362ms|3239ms|3459ms| > Wrapup flexible indexing > > > Key: LUCENE-2111 >

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-09 Thread Robert Muir (JIRA)
: * remove synchronization (not necessary, history here: LUCENE-296) * reuse char[] rather than create Strings * remove unused ctors > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/bro

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-02-09 Thread Michael McCandless (JIRA)
sugar static methods on MultiFields (eg, MultiFields.getFields(IndexReader)) to easily do this, and cutover places in Lucene that may need direct postings from a multi-reader to use this method. I've updated the javadocs explaining this. > Wrapup flexible indexing > -

[jira] Created: (LUCENE-2246) While indexing Turkish web pages, "Parse Aborted: Lexical error...." occurs

2010-01-31 Thread Selim Nadi (JIRA)
While indexing Turkish web pages, "Parse Aborted: Lexical error" occurs --- Key: LUCENE-2246 URL: https://issues.apache.org/jira/browse/LUCENE-2246 Project: Luc

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-01-28 Thread Michael McCandless (JIRA)
e all fields share the 1024 sized cache) I cutover all codecs to the new API... all tests pass if you switch the default codec (in oal.index.codec.Codecs.getWriter) to any of the four. > Wrapup flexible indexing > > > Key: LUCENE-211

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-01-19 Thread Michael McCandless (JIRA)
add > Wrapup flexible indexing > > > Key: LUCENE-2111 > URL: https://issues.apache.org/jira/browse/LUCENE-2111 > Project: Lucene - Java > Issue Type: Improvement > Components: Index

[jira] Updated: (LUCENE-2111) Wrapup flexible indexing

2010-01-17 Thread Michael McCandless (JIRA)
-> oal.util.BytesRef. I think, eventually, we should fix the various places that refer to byte slices, eg Field.get/setBinary*, Payload, UnicodeUtil.UTF8Result, IndexOutput.writeBytes, IndexInput.readBytes, to use BytesRef instead. > Wrapup flexible in

Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts

2010-01-15 Thread Sanne Grinovero
he Lucene index from an Hibernate mapped database. While I recommend reading for newcomers, I'd also appreciate feedback and comments from Lucene experts and developers :-) Regards, Sanne 2010/1/14 Michael McCandless : > Calling commit after every addition will drastically slow down your >

Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts

2010-01-14 Thread Michael McCandless
Calling commit after every addition will drastically slow down your indexing throughput, and concurrency (commit is internally synchronized), but should not create lock timeouts, unless you are also opening a new IndexWriter for every addition? Mike On Thu, Jan 14, 2010 at 12:15 PM, jchang

Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts

2010-01-14 Thread jchang
With only 10 concurrent consumers, I do get lock problems. However, I am calling commit() at the end of each addition. Could I expect better concurrency without timeouts if I did not commit as often? -- View this message in context: http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread Michael McCandless
Lucene (or actually Lucene + > Compass), not Zoie at the moment.  For some of the responses, I'm not clear > if the information applies to Zoie specifically, or also to straight Lucene. > -- > View this message in context: > http://old.nabble.com/Lucene-2.9.0-Near-Real-Tim

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread Michael McCandless
IndexWriter should show good concurrency, ie, as you add threads you should see indexing speedup, assuming you have no external synchronization, your hardware has free concurrency and you use a large enough RAM buffer, and don't commit too frequently. But you should use a single IndexW

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread jchang
d updates in a crash? BTW, I'm using straight Lucene (or actually Lucene + Compass), not Zoie at the moment. For some of the responses, I'm not clear if the information applies to Zoie specifically, or also to straight Lucene. -- View this message in context: http://old.nabble.com/L

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread jchang
disk alongside the Lucene index which it uses to decide > where it must reindex from to "catch up" if there have been incoming > indexing events while the server was out of commission. > > Zoie does not support multiple servers using the same index, because each >

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread Michael McCandless
en to reopen the reader" (freshness). How often to commit is entirely a safety vs performance tradeoff, to be made by the app. Commit often and you lose (have to replay) very little on crash, but, have worse indexing throughput. How often to reopen is a freshness vs performance tradeoff. The
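The safety-versus-throughput tradeoff around commit frequency can be put into numbers with a toy model. The sketch below is only an illustration, not a measurement and not Lucene code. It assumes a fixed policy of committing after every `commitEvery` added documents and counts how many commits that costs and how many documents would have to be replayed if a crash happened right now. `CommitTradeoffSketch` and its methods are hypothetical names.

```java
/** Toy model of the commit-frequency tradeoff; hypothetical, not Lucene code. */
final class CommitTradeoffSketch {

    /** Commits issued so far when committing after every commitEvery added documents. */
    static long commits(long docsAdded, long commitEvery) {
        return docsAdded / commitEvery;
    }

    /** Documents added since the last commit, i.e. what a crash now would force a replay of. */
    static long docsToReplay(long docsAdded, long commitEvery) {
        return docsAdded % commitEvery;
    }

    public static void main(String[] args) {
        long docsAdded = 123_456L;
        for (long commitEvery : new long[] {1L, 100L, 10_000L}) {
            System.out.println("commit every " + commitEvery
                    + ": commits=" + commits(docsAdded, commitEvery)
                    + ", docs to replay after a crash=" + docsToReplay(docsAdded, commitEvery));
        }
    }
}
```

Committing after every document never leaves anything to replay but pays for one commit per add; larger intervals reduce the number of commits at the price of up to `commitEvery - 1` documents to replay. How often to reopen the reader is a separate knob and is not modeled here.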

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread John Wang
"NRT reader "simply" lets you search the full index, including un-committed changes." I am not sure I understand: I think the context of the discussion is for when the indexer crashes before IW.commit. At which point, does not really matter if you are using NRT, e.g. IW.getReader, or IndexReader.

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-13 Thread Michael McCandless
On Tue, Jan 12, 2010 at 6:10 PM, jchang wrote: > Does anybody know how this works out with service restarts (both orderly > shutdown and a crash)?  If the service goes down while indexed items are in > RAMDir but not on disk, are they lost?  Or is there some kind of log > recovery? Lucene expose

Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

2010-01-12 Thread Jake Mannix
replicate, and handle server crashes (as well as doing background batch indexing followed with incremental realtime catchup). > The created_at column for near realtime seems like it could hurt > the database due to excessive polling? Has anyone tried it yet? > I haven't tried it i
