[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-2111.
Resolution: Fixed
> Wrapup flexible indexing
Lance Norskog wrote:
>> Here is a Java unit test that uses the LogByteSizeMergePolicy to
>> control the maximum size of segment files during indexing. That is, it
>> tries. It does not succeed. Will someone who truly understands the
>> merge policy code please examine it. There is probably one tiny
>> parameter missing.
too :). At least the one I received.
But never mind that ... as long as we both agree the implementation should
change. I didn't mean to say anything bad about what you did .. I know the
limitations you had to work with.
> Parallel incremental indexing
an upgraded version that works with 3.0 within IBM,
Shai.
> Parallel incremental indexing
> -----------------------------
>
> Key: LUCENE-1879
> URL: https://issues.apache.org/jira/browse/LUCENE-1879
> Project: Lucene - Java
smaller than maxMergeMB that create a segment larger than
maxMergeMB.
--
- Mark
http://www.lucidimagination.com
On 04/09/2010 01:01 AM, Lance Norskog wrote:
> Here is a Java unit test that uses the LogByteSizeMergePolicy to
> control the maximum size of segment files during indexing. That is, it
> tries. It does not succeed. Will someone who truly understands the
> merge policy code please examine it. There is probably one tiny
> parameter missing.
>
> It adds 20 documents that each are 100k ...
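For readers following along, here is a minimal sketch of the configuration under discussion, written against the 2.9-era API (constructor signatures shifted across releases); the 1 MB cap and RAMDirectory are illustrative. As Mark's comment above notes, setMaxMergeMB only excludes large *input* segments from merge selection, so several segments each under the cap can still merge into one larger segment:
{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MaxMergeMBSketch {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_29),
        IndexWriter.MaxFieldLength.UNLIMITED);
    LogByteSizeMergePolicy mp = new LogByteSizeMergePolicy(writer);
    mp.setMaxMergeMB(1.0);      // illustrative: exclude segments over ~1 MB from merges
    writer.setMergePolicy(mp);
    // ... addDocument calls, as in the unit test under discussion ...
    writer.close();
  }
}
{code}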
any infostream output when it had gone OOM. I see the same
exception trace that I posted above. Do you think I missed anything?
> OutOfMemoryException while Indexing
The infoStream output shows no exception... I'm confused.
> OutOfMemoryException while Indexing
m on all writers.
> OutOfMemoryException while Indexing
1371 _9yq:c42466
_cml:c45324 _fgk:c46186 _i63:c45931 _kz8:c46568 _ntr:c47071 _qiq:c46080
_qin:c18 _qip:c1 _qir:c4 _qit:c19 _qiv:c1 _qix:c51 _qiz:c2
> OutOfMemoryException while Indexing
ways reproducible, I will post the infostream output
once I get it reproduced again.
> OutOfMemoryException while Indexing
ways 100?
Why are you using NoLockFactory? That's generally very dangerous.
Also, why do you call FSDir's setReadChunkSizeMB?
Can you post the infoStream output?
> OutOfMemoryException while Indexing
insertionTime = MasterClock.currentTimeMillis() - startTime;
}
We do not set any RAMBufferSize in this case.
Please let me know if you need anything on this.
> OutOfMemoryException while Indexing
Can you share more details on how you're using Lucene? What's IndexWriter's
ramBufferSizeMB? How much heap are you giving the JVM? Are you calling commit
(or closing/opening a new IndexWriter) frequently or rarely?
Can you enable IndexWriter's infoStream and post the output?
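For reference, a small sketch of the diagnostics being requested here, against the 2.9-era API; the path, analyzer, and 32 MB value are illustrative:
{code}
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class InfoStreamSketch {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/tmp/idx")),
        new StandardAnalyzer(Version.LUCENE_29),
        IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setRAMBufferSizeMB(32.0);   // flush the in-memory buffer at ~32 MB
    writer.setInfoStream(System.out);  // print flush/merge diagnostics
    // ... addDocument calls ...
    writer.close();
  }
}
{code}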
> OutOfMemoryException while Indexing
the same in other cases too. Basically, we have a load that
continuously pushes specific objects, and we index each incoming object. If
we do not apply that load, it works fine.
> OutOfMemoryException while Indexing
Does the exception always have this stacktrace? Maybe someone
else litters in PermGen?
> OutOfMemoryException while Indexing
I am sure we have less than 20 fields in the document. Do you think it still
causes this issue?
was (Author: shivenderd):
I am sure we have less than 20 fields in the index
> OutOfMemoryException while Indexing
I tested String.intern(); bombarding it with random strings failed to cause
OOMs the way it did in Java 1.4.something.
> OutOfMemoryException while Indexing
Yes, it happens without the profiler also.
Thanks
> OutOfMemoryException while Indexing
Are you running under a profiler? Does that happen without it?
> OutOfMemoryException while Indexing
OutOfMemoryException while Indexing
-----------------------------------
Key: LUCENE-2361
URL: https://issues.apache.org/jira/browse/LUCENE-2361
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.9.1
... handle > 2B terms; I also
strengthened CheckIndex to verify that .ord() of the TermsEnum always returns
the right result, for codecs that implement .ord. I'll commit shortly...
> Wrapup flexible indexing
> ------------------------
>
> Key: LUCENE-2111
> URL: https://issues.apache.org/jira/browse/LUCENE-2111
> Project: Lucene - Java
... terms -- attached patch creates such an
index. I'm still getting to the bottom of it...
> Wrapup flexible indexing
If we turned the problem around (you pass a DFA to the codec
and it does the intersection & enums the result), and we used byte-based DFAs,
I think we'd get a good speedup.
> Wrapup flexible indexing
That puzzles me -- that can't be a terms dict thing (just one lookup); I'm
not sure offhand why it's faster. That code is not very different than trunk.
bq. How's the indexing performance?
Unchanged -- I indexed the first 10M docs of Wikipedia and the times were
nearly identical.
All this being said, I think flex is a great move forward for multitermqueries;
at least we have a seeking-friendly API! One step at a time.
> Wrapup flexible indexing
some work! What changes make those queries run faster with the default
codec? Mostly terms dict changes and automaton for fuzzy/wildcard?
How's the indexing performance?
bq. I think net/net we are good to land flex!
+1! Even if there are still small things to change/fix I think it makes
exc if it's run on a
field that omitTFAPs (matches PhraseQuery), fixes all jdoc warnings, spells out
back compat breaks in changes.
> Wrapup flexible indexing
bq. I think net/net we are good to land flex!
+1. The tests have been passing for some time now, and Solr tests pass too.
It would be nice to look at merging flex into the trunk soon so that it gets
more exposure.
> Wrapup flexible indexing
... followed by a suffix).
* Flex API on a trunk index does take a perf hit but it looks contained enough
that we don't need to spend any time optimizing that emulation layer...
I also ran an indexing test (index first 10M docs of wikipedia) and
flex a
the intent, and was likely further confused
due to the fact that Michael B and I discussed it over tasty Belgian Beer in
Oakland. I'll open a discussion on list for incremental field updates.
> Parallel incremental indexing
{code}
log: /root/src/flex.clean/contrib/benchmark/logs/flexOnTrunk.1
25.87 QPS [-58.6% worse]
run flex on flex index...
cd /root/src/flex.clean/contrib/benchmark
log: /root/src/flex.clean/contrib/benchmark/logs/flexOnFlex.2
39.30 QPS [-37.1% worse]
124623 hits
{code}
Other queries I've ...
This is interesting, and
deserves discussion, but in a separate issue/thread?
> Parallel incremental indexing
... more than a few thousand
changes in a minute or two and the background merger would be responsible for
keeping the total number of disjoint documents low.
> Parallel incremental indexing
One way to support multi-threaded indexing is to do a two-phase
addDocument. First, allocate a doc ID from DocumentsWriter (synchronized) and
then add the Document to each Slice with that doc ID. DocumentsWriter was not
supposed to know it is a parallel index ... something like the following.
{code}
int docId = obtainDocId(); // synchronized allocation from DocumentsWriter
// ... then add the Document to each slice under that doc ID ...
{code}
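Filling out the shape of that idea as a sketch; Slice, obtainDocId, and ParallelDocumentsWriter are hypothetical names from this discussion, not real Lucene APIs:
{code}
import java.io.IOException;
import java.util.List;
import org.apache.lucene.document.Document;

// Hypothetical two-phase addDocument per the comment above; none of these
// names are actual Lucene APIs.
class ParallelDocumentsWriter {
  interface Slice {
    void addDocument(int docId, Document doc) throws IOException;
  }

  private final List<Slice> slices;
  private int nextDocId;                     // guarded by obtainDocId()

  ParallelDocumentsWriter(List<Slice> slices) {
    this.slices = slices;
  }

  private synchronized int obtainDocId() {   // phase 1: ordered ID allocation
    return nextDocId++;
  }

  public void addDocument(Document doc) throws IOException {
    int docId = obtainDocId();
    for (Slice slice : slices) {             // phase 2: index in every slice
      slice.addDocument(docId, doc);
    }
  }
}
{code}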
... support multi-threaded parallel-indexing. If we
have single-threaded DocumentsWriters, then it should be easy to have a
ParallelDocumentsWriter?
> Parallel incremental indexing
on flex branch! I turned
most of them into TODOs :)
> Wrapup flexible indexing
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reopened LUCENE-2111:
Duh -- wrong issue! I only wish ;)
> Wrapup flexible indexing
> Subject: [jira] Resolved: (LUCENE-2111) Wrapup flexible indexing
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-2111.
Resolution: Fixed
Thanks Shai!
> Wrapup flexible indexing
e at
the right times. Vs the current approach that makes "faker" merge
policy/scheduler (I think?).
Some of this will require IW to open up some APIs -- eg making docID
assignment a separate method call. Likely many of these will just be
protected APIs w/in IW.
> Parallel incremental indexing
... to let int block codecs provide direct access to the int[] they have
(saves an extra copy).
* Down to 9 nocommits!!
> Wrapup flexible indexing
... a large patch...
> Wrapup flexible indexing
(adding @Override to
interface).
> Wrapup flexible indexing
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-2111:
---
Attachment: LUCENE-2111.patch
Down to 15 nocommits!
> Wrapup flexible indexing
Looks good Robert... thanks!
> Wrapup flexible indexing
... changed MTQ.getTermsEnum() to never return
null, but IR.fields(), Fields.terms(String field), and
.docs/.docsAndPositions can return null.
Also whittled down more nocommits -- down to 53 now!
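As a consuming-side sketch of that contract (flex-branch API as described in these comments; reader and the field name are placeholders, and exact signatures may differ from what finally landed):
{code}
Fields fields = reader.fields();          // may be null: reader has no fields
if (fields != null) {
  Terms terms = fields.terms("body");     // may be null: field is absent
  if (terms != null) {
    TermsEnum te = terms.iterator();
    BytesRef term;
    while ((term = te.next()) != null) {
      // may be null, e.g. when the codec doesn't store positions
      DocsAndPositionsEnum postings = te.docsAndPositions(null, null);
    }
  }
}
{code}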
> Wrapup flexible indexing
I think it looks kinda dumb, but if it's useful, I'll commit it.
> Wrapup flexible indexing
... back compat,
so null can have some other meaning. Instead it uses VirtualMethod,
with the default implementations throwing UOE.
> Wrapup flexible indexing
... a way to determine that a codec does not store positions.
Thinking more about this... I think we should switch back to a null return from
.docsAndPositionsEnum if the codec doesn't support positions. We only return
.EMPTY if the enum is really just empty.
> Wrapup flexible indexing
against .EMPTY.
> Wrapup flexible indexing
... point in time from the master's.
* The current approach does not support multi-threaded indexing, but I think
that's a limitation that could be solved by exposing some API on IW or DW.
* Only SMS is supported on the slaves.
* Optimize and expungeDeletes are unsupported, though they could be ...
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2111:
Attachment: flex_merge_916543.patch
patch for review of flex merge
> Wrapup flexible indexing
(instead return .EMPTY
objects).
> Wrapup flexible indexing
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2111:
Attachment: LUCENE-2111.patch
a few more easy nocommits
> Wrapup flexible indexing
... looking at the backwards tests now.
> Wrapup flexible indexing
renaming BytesRef.toString ->
BytesRef.utf8ToString.
> Wrapup flexible indexing
5791.
> Wrapup flexible indexing
... to use @lucene.experimental.
I didn't mess with IndexFileNames as there is an open issue about it right now.
> Wrapup flexible indexing
Committed in revision 915511.
> Wrapup flexible indexing
Thanks :)
Why not just remove UTF8Result altogether? (ie don't bother deprecating).
It's an internal API...
The new method to compute hash is great, saving the extra pass in THPF.
> Wrapup flexible indexing
... from a StringBuilder if you want).
There are some breaks (e.g. binary API compat), but it's an internal API.
> Wrapup flexible indexing
Put "JAVA_UNICODE_ESCAPE=true;" (without quotes) into the options section of
HTMLParser.jj and regenerate the lexical analyzer (javacc HTMLParser.jj). That
solved the problem for me.
> While indexing Turkish web pages, "Parse Aborted: Lexical error" occurs
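For concreteness, a sketch of where that flag goes in the demo's grammar file; JAVA_UNICODE_ESCAPE is the standard JavaCC option name, and the rest of HTMLParser.jj is unchanged:
{code}
options {
  // Let the lexer accept \uXXXX escapes in the input; per the comment above,
  // this avoids the "Parse Aborted: Lexical error" on some Turkish pages.
  JAVA_UNICODE_ESCAPE = true;
  // ... existing options unchanged ...
}
// ... rest of HTMLParser.jj unchanged; regenerate with: javacc HTMLParser.jj
{code}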
... switches from String to char[], which
should improve perf.
Actually I didn't apply your LUCENE-2111 patch when running the benchmark (the
improvement is simply the char[]).
The test is now actually slightly slower with the rest of LUCENE-2111.
> Wrapup flexible indexing
... was faster before because the previous impl of
Multi*Enums was reusing the same Docs/AndPositionsEnums. This patch fixes
that.
Ahh, and also because your patch switches from String to char[], which should
improve perf.
Your patch looks good Robert! Thanks.
> Wrapup flexible indexing
Uwe -- thanks for re-merging!
> Wrapup flexible indexing
... singleton in the TermsEnum class itself.
> Wrapup flexible indexing
... benchmark for LUCENE-2089; weird
that flex was slower than trunk before.
Numbers are stable across many iterations.
||unpatched flex||patched flex||trunk||
|4362ms|3239ms|3459ms|
> Wrapup flexible indexing
:
* remove synchronization (not necessary, history here: LUCENE-296)
* reuse char[] rather than create Strings
* remove unused ctors
> Wrapup flexible indexing
... added sugar static methods on
MultiFields (eg, MultiFields.getFields(IndexReader)) to easily do
this, and cut over places in Lucene that may need direct postings from
a multi-reader to use this method.
I've updated the javadocs explaining this.
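A quick usage sketch of that sugar (same flex API; the field name is a placeholder):
{code}
// Merged postings view over all segments of a possibly-composite reader;
// per the javadocs mentioned above, prefer per-segment access when you can.
Fields fields = MultiFields.getFields(reader);
if (fields != null) {
  Terms terms = fields.terms("body");
  // ... iterate terms/postings as usual ...
}
{code}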
> Wrapup flexible indexing
While indexing Turkish web pages, "Parse Aborted: Lexical error" occurs
------------------------------------------------------------------------
Key: LUCENE-2246
URL: https://issues.apache.org/jira/browse/LUCENE-2246
Project: Lucene - Java
... (ie all fields share the 1024-sized cache).
I cutover all codecs to the new API... all tests pass if you switch
the default codec (in oal.index.codec.Codecs.getWriter) to any of the
four.
> Wrapup flexible indexing
-> oal.util.BytesRef.
I think, eventually, we should fix the various places that refer to byte
slices, eg Field.get/setBinary*, Payload, UnicodeUtil.UTF8Result,
IndexOutput.writeBytes, IndexInput.readBytes, to use BytesRef instead.
> Wrapup flexible indexing
... the Lucene index from a Hibernate-mapped database.
While I'd recommend it as reading for newcomers, I'd also appreciate feedback
and comments from Lucene experts and developers :-)
Regards,
Sanne
2010/1/14 Michael McCandless :
> Calling commit after every addition will drastically slow down your
>
Calling commit after every addition will drastically slow down your
indexing throughput, and concurrency (commit is internally
synchronized), but should not create lock timeouts, unless you are
also opening a new IndexWriter for every addition?
Mike
On Thu, Jan 14, 2010 at 12:15 PM, jchang
With only 10 concurrent consumers, I do get lock problems. However, I am
calling commit() at the end of each addition. Could I expect better
concurrency without timeouts if I did not commit as often?
IndexWriter should show good concurrency, ie, as you add threads you
should see indexing speedup, assuming you have no external
synchronization, your hardware has free concurrency and you use a
large enough RAM buffer, and don't commit too frequently.
But you should use a single IndexWriter, shared across all indexing threads.
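A minimal sketch of that advice: one IndexWriter shared by all indexing threads, with a single commit at the end rather than one per document (thread count, buffer size, and field layout are illustrative; 2.9-era API):
{code}
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SharedWriterSketch {
  public static void main(String[] args) throws Exception {
    final IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/tmp/idx")),
        new StandardAnalyzer(Version.LUCENE_29),
        IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setRAMBufferSizeMB(64.0);       // larger buffer, fewer flushes

    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 100000; i++) {
      final int n = i;
      pool.submit(new Runnable() {
        public void run() {
          try {
            Document doc = new Document();
            doc.add(new Field("id", Integer.toString(n),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);       // IndexWriter is thread-safe
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    writer.commit();                       // one commit, not one per add
    writer.close();
  }
}
{code}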
... updates in a crash? BTW, I'm using straight Lucene (or actually Lucene +
Compass), not Zoie at the moment. For some of the responses, I'm not clear
if the information applies to Zoie specifically, or also to straight Lucene.
disk alongside the Lucene index which it uses to decide
> where it must reindex from to "catch up" if there have been incoming
> indexing events while the server was out of commission.
>
> Zoie does not support multiple servers using the same index, because each
>
... "when to reopen the reader" (freshness).
How often to commit is entirely a safety vs performance tradeoff, to
be made by the app. Commit often and you lose (have to replay) very
little on crash, but, have worse indexing throughput.
How often to reopen is a freshness vs performance tradeoff.
"NRT reader "simply" lets you search the full index, including
un-committed changes."
I am not sure I understand:
I think the context of the discussion is when the indexer crashes before
IW.commit. At that point, it does not really matter whether you are using NRT
(e.g. IW.getReader) or a regular IndexReader.
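For readers unfamiliar with the API being discussed: the NRT reader comes from IndexWriter (2.9-era API); a sketch, with dir, analyzer, and doc as placeholders:
{code}
// NRT: search un-committed changes without a commit (2.9-era API).
IndexWriter writer = new IndexWriter(dir, analyzer,
    IndexWriter.MaxFieldLength.UNLIMITED);
writer.addDocument(doc);
IndexReader nrt = writer.getReader();  // sees the added doc, even pre-commit
// ... search with nrt; but an un-committed change is still lost on crash ...
{code}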
On Tue, Jan 12, 2010 at 6:10 PM, jchang wrote:
> Does anybody know how this works out with service restarts (both orderly
> shutdown and a crash)? If the service goes down while indexed items are in
> RAMDir but not on disk, are they lost? Or is there some kind of log
> recovery?
Lucene exposes ...
... replicate, and handle server crashes
(as well as doing background batch indexing followed by incremental realtime
catchup).
> The created_at column for near realtime seems like it could hurt
> the database due to excessive polling? Has anyone tried it yet?
>
I haven't tried it ...