> of course I will distribute my index over many machines:
> storing everything on one computer is just crazy, 1.4B docs is going
> to be an index of almost 2T (in my case)
billion = giga (10^9) in English
Billion = tera (10^12) in German and other non-English usage
2T docs = 2,000,000,000,000 docs... ;)
AFAIK 2^31 - 1 docs is the maximum a single index can hold
> > How can I open it "readonly"?
>
> See the javadocs for IndexReader.
I did that already for 2.3 - I cannot find a readOnly option there.
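For reference, the readOnly flag only exists from 2.4 on; there it is a
parameter of the static factory method. A minimal sketch (path and
variable names made up):

  // Lucene 2.4+; 2.3 has no read-only open
  Directory dir = FSDirectory.getDirectory("/path/to/index");
  IndexReader reader = IndexReader.open(dir, true); // true = readOnly
  IndexSearcher searcher = new IndexSearcher(reader);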
-
> Are you opening your IndexReader with readOnly=true? If not, you're
> likely hitting contention on the "isDeleted" method.
How can I open it "readonly"?
-
> It's the similarity scoring formula. EG see here:
>
>http://lucene.apache.org/java/2_4_0/scoring.html
>
> and here:
>
>
> http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene
> /search/Similarity.html
OK; thank you
-
> I think for "ordinary" Lucene queries, "score" and "relevance" mean
> the same thing.
>
> But if you do eg function queries, or you "mixin" recency into your
> scoring, etc., then "score" could be anything you computed, a value
> from a field, etc.
Hm, how is relevance then defined?
---
Hi,
When I say "sorted by relevance" or "sorted by score" -
are relevance and score synonyms for each other, or what is the
difference in relation to sorting?
Thank you
-
> Yes. DBSight helps to flatten database objects into Lucene's
> documents.
OK, thx for the advice.
But back to my original question.
When I have to merge both result sets, what is the best approach to do this?
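One straightforward approach, assuming both sides share a stored unique
"id" field (field name made up): load the database IDs into a hash set
and filter the Lucene hits against it. A sketch:

  // Collect the unique IDs from the SQL result set first.
  Set<String> dbIds = new HashSet<String>();
  // ... fill dbIds from the database query ...

  // Keep only Lucene hits whose stored "id" is in the set;
  // iterating the Hits preserves the score ranking.
  Hits hits = searcher.search(query);
  List<Document> merged = new ArrayList<Document>();
  for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      if (dbIds.contains(doc.get("id"))) {
          merged.add(doc);
      }
  }

For very large result sets, wrapping the ID set in a Filter would avoid
loading every candidate document.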
-
> Actually you can use DBSight(disclaimer:I work on it) to
> collect the data
> and keep them in sync.
Hm... does it full-text index a database?
Does it support document content outside the database (custom crawler)?
What query syntax does it support?
--
> Contrariwise, look for anything by Marcelo Ochoa on the user list
> about embedding Lucene in Oracle (which I confess I haven't looked
> into at all, but seems interesting).
I know about this Lucene-Oracle text cartridge.
But my solution has to work with any of the big databases (MS, IBM, Oracle).
-
> I feel this may not be a good example.
It was a very simple example.
The real database query is very complex and joins several tables.
It would be an absolute nightmare to copy all these tables into Lucene
and keep both in sync.
Hi,
what is the best approach to merge a database index with a Lucene
full-text index? Both databases store a unique ID per doc. This is the
join criterion.
Requirements:
* both result sets may be very big (100,000 docs and much more)
* the merged result set must be sorted by database index and/or relevance
> > * How can a hit have a score of <=0?
>
> A function query, or a negative boost would do it.
Ah ok.
> Solr has always allowed all scores through w/o screening out <=0
Why?
-
> That works fine, because hq.size() is still less than numHits. So
> no matter what, the first numHits hits will be added to the queue.
>
> > public void collect(int doc, float score) {
> >   if (score > 0.0f) {
> >     if (hq.size() < numHits || score >= minScore) {
Oh damned... it'
Looking into TopDocCollector code, I have some questions:
* How can a hit have a score of <=0?
* What happens if the first hit has the highest score of all hits? It
seems that topDocs would then contain only this doc!?
public void collect(int doc, float score) {
  if (score > 0.0f) {
Hi,
which fields does IndexSearcher.doc(int i) load? Only those with
Field.Store.YES?
I'm asking because I do not need to load the tokens - should I use a
FieldSelector, or are those fields not loaded anyway?
Thank you
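As far as I know, doc(int) only ever returns stored fields (the indexed
tokens are not loadable at all), and a FieldSelector merely narrows down
which stored fields are read. A sketch with MapFieldSelector (field
names made up):

  // Load only the listed stored fields; other stored fields
  // (e.g. a large content field) are skipped entirely.
  FieldSelector sel = new MapFieldSelector(new String[] { "id", "title" });
  Document doc = reader.document(docId, sel);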
-
> The HitCollector used will determine how things are ordered.
> In 2.4, the TopDocCollector will order by relevancy and the
> TopFieldDocCollector can order by relevancy, index order, or by
> field. Lucene delivers the hit ids to the HitCollector and it can
> order as it pleases.
So
Hi,
in what order does search(Query query, HitCollector results) return the
results? By relevance?
Thank you.
-
Hi,
> You get one answer if each document is 1K, another if it's
> 1G. If you have 2 users or 10,000 users. If you require
> 100 queries/sec response time or 1 query can take 10
> seconds. If you require an update to the index every
> second or month...
Each doc has up to 10 A4 pages of text.
There
Hi,
We have an application which manages the data of multiple customers.
A customer can only search its own data, never the data of other
customers.
So what is more efficient in terms of performance and resources:
one big single index filtered by an index field (customer ID), or
multiple small indexes, one per customer?
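For the single-index variant, the usual pattern is one cached filter per
customer; a sketch (the "customerId" field name and value are made up):

  // The filter's bitset is computed once and then reused
  // for every search of this customer.
  Filter customerFilter = new CachingWrapperFilter(
      new QueryWrapperFilter(
          new TermQuery(new Term("customerId", "4711"))));
  Hits hits = searcher.search(query, customerFilter);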
> The fastest way to reconstruct the token stream would be to use the
> TermFreqVector, but if you didn't store it at index time you would
> have to traverse the inverted index using TermEnum and TermPositions
> in order to pick up the term values and positions. This can be a
> rather
Hi,
I have already indexed documents. I want to recombine them into new
documents. Is this possible without the original documents - only with the
index?
Example:
doc1, doc2, doc3 are indexed.
I want a new indexed doc4 which is indexed as if I had concatenated
doc1, doc2, doc3 into doc4 and then indexed the result.
The problem is the logical combination of documents in folders, not of
terms in documents.
See original post.
Original Message
> Date: Tue, 14 Oct 2008 16:29:15 +0530
> From: "Ganesh" <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Subject: Re: Searching sets of documents
The folder name and the document name are stored for each document.
Original Message
> Date: Tue, 14 Oct 2008 14:11:09 +0530
> From: "Ganesh" <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Subject: Re: Searching sets of documents
> You should have stored the folder name
The docs are already indexed.
> -Original Message-
> From: ??? [mailto:[EMAIL PROTECTED]
> Sent: Monday, 13 October 2008 02:28
> To: java-user@lucene.apache.org
> Subject: Re: Searching sets of documents
>
> all folders which match "A AND Y", do you search for file name?
> If yes, A or
Hi,
I want to search for sets of documents. For instance, I index some
folders with documents in them, and now I do not want to find certain
documents but folders.
Sample:
folder A
doc 1, contains X, Y
doc 2, contains Y, Z
folder B
doc 3, contains X, Y
doc 4, contains A, Z
Now I want to find all folders which match a query like "X AND Z",
even when the terms occur in different documents of the same folder.
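One way to get folder-level matches from per-document hits, assuming
each document carries a stored "folder" field: run one search per term,
collect the folder names, and intersect the sets. A rough sketch:

  // Folders containing X somewhere AND Z somewhere,
  // possibly in different documents of the same folder.
  Set<String> withX = foldersMatching(searcher,
      new TermQuery(new Term("content", "x")));
  Set<String> withZ = foldersMatching(searcher,
      new TermQuery(new Term("content", "z")));
  withX.retainAll(withZ); // the folder-level result

  static Set<String> foldersMatching(Searcher searcher, Query q)
      throws IOException {
    Set<String> folders = new HashSet<String>();
    Hits hits = searcher.search(q);
    for (int i = 0; i < hits.length(); i++) {
      folders.add(hits.doc(i).get("folder"));
    }
    return folders;
  }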
> This isn't quite true. If you open IndexWriter with autoCommit=false,
> then none of the changes you do with it will be visible to an
> IndexReader, even one reopened while IndexWriter is doing its work,
> until you close the IndexWriter.
Where are the docs for this transaction buffered?
> How about just copying and performing your indexing (or index write
> related) operations on the copy, and then performing a rename
> operation followed by reopening of the index readers.
This is how we did it until now. But the indexes become bigger and bigger (50
GB and more) and so we are
Hi,
I have some questions about indexing:
1. Is it possible to open indexes with MultiReader+IndexSearcher and add
documents to these indexes simultaneously?
2. Is it possible to open indexes with MultiReader+IndexSearcher and
optimize these indexes simultaneously?
3. Is it possible to open index
> Even if they're in multiple indexes, the doc IDs being ints
> will still prevent
> it going past 2Gi unless you wrap your own framework around it.
Hm. Does this mean that a MultiReader has the int-limit too?
I thought that this limit applies to a single index only...
Yes of course, the answers to your questions are important too.
But no answer at all until now :(
For me I can say (not production yet):
2 ID fields and one content field per doc. Search on the content field
only.
Simple searches like "content:foo" or "content:foo*".
1.5 GB index per 1 million docs.
A
Hi,
I have some questions about the index size on a single machine:
What is your biggest index you use in production?
Do you use MultiReader/Searcher?
What hardware do you need to serve it?
What kind of application is it?
Thank you.
--
> Right... but trust me, you really wouldn't want to. You need
> distributed search at that level anyway.
Hm, 2 billion small docs are not that many.
Why do I need distributed search, and what exactly do you mean by
distributed search? Multiple IndexSearchers? Multiple processes?
Multiple machin
Does this mean that I cannot search indexes with more than 2 billion docs at
all with a single IndexSearcher?
> -Original Message-
> From: Mark Miller [mailto:[EMAIL PROTECTED]
> Sent: Saturday, 8 March 2008 18:57
> To: java-user@lucene.apache.org
> Subject: Re: MultiSearcher to overcome
> With a commit after every add: (286 sec / 10,000 docs) 28.6 ms.
> With a commit after every 100 add: (12 sec / 10,000 docs) 1.2 ms.
> Only one commit: (8 sec / 10,000 docs) 0.8 ms.
Of course. If creating a document takes that little time, then a
commit, which may take, let's say, 10 - 500 ms, will slow things down
considerably.
> > With a commit after every add: 30 min.
> > With a commit after 100 add: 23 min.
> > Only one commit: 20 min.
>
> All of these times look pretty slow... perhaps lucene is not the
> bottleneck here?
Therefore I wrote:
"(including time to get the document from the archive)"
Not the absolute
> Since Lucene buffers in memory, you will always have the risk of
> losing recently added documents that haven't been flushed yet.
> Committing on every document would be too slow to be practical.
Well, it is not sooo slow...
I have indexed 10,000 docs, resulting in a 14 MB index. The index has
Hm, what exactly does NO_NORM mean?
Thank you
-
> If you want something from an index it has to be IN the
> index. So, store a
> summary field in each document and make sure that field is part of the
> query.
And how could one automatically create such a summary?
Taking the first 2 lines of a document does not always make much sense.
How does Google do it?
> I don't think creating an IndexWriter is very expensive at all.
Ah ok. I tested it. Creating an IndexWriter on an index with 10,000
docs (about 15 MB) takes about 200 ms.
This is a very cheap operation for me ;)
I only saw the many calls in init() which read files and so on, and
therefore I thought it would be expensive.
> > For what time is the 2.4 release planned?
>
> Not really sure at this point ...
Hm. Digging into IndexWriter#init it seems that this is a really
expensive operation, and thus my self-made "commit" is too. Isn't it?
-
> In 2.4, commit() sets the rollback point. So abort() will roll the
> index back to the last time you called commit() (or to when the
> writer was opened if you haven't called commit).
>
> In 2.3, your only choice is to close & re-open the writer to reset
> the rollback point.
OK, thank you.
> Then, you can call close() to commit the changes to the index, or
> abort() to rollback the index to the starting state (when the writer
> was opened).
As I understand the docs, the index will get rolled back to the state
it was in when the writer was opened.
How can I achieve a rollback which only goes back to a chosen point,
not all the way to when the writer was opened?
Hi,
is it possible to change the wildcard characters which are used by
QueryParser?
Or do I have to replace them myself in the query string?
Thank you
-
> That will let you do it, be warned however there is most definitely a
> significant performance degradation associated with doing this.
Yes of course. Like in a relational database with a leading wildcard.
-
> 1) See setAllowLeadingWildcard in QP.
Oh damned... late in the evening ;)
Hm, just tested it:
Searching for "format" works.
Searching for "form*" works.
Searching for "*ormat" works NOT.
Confused again ;)
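For comparison, this is roughly how it is supposed to be wired up (2.x
API; the flag must be set on the parser instance before parse() is
called, otherwise "*ormat" throws a ParseException):

  QueryParser qp = new QueryParser("content", new StandardAnalyzer());
  qp.setAllowLeadingWildcard(true); // must come before parse()
  Query q = qp.parse("*ormat");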
-
Hi,
using WildcardQuery directly, it is possible to search for suffixes
like "*foo".
The QueryParser throws an exception that this is not allowed in a
WildcardQuery.
Hm, now I'm confused ;)
How can I configure the QueryParser to allow a wildcard as the first character?
Thank you
-
You can use Luke to rebuild the document. It will show you the terms of the
analyzed document, not the original content.
And this is what you want, if I understood you correctly.
> -Original Message-
> From: Itamar Syn-Hershko [mailto:[EMAIL PROTECTED]
> Sent: Friday, 22 February 2008 1
Thank you.
> -Original Message-
> From: Shai Erera [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 21 February 2008 14:11
> To: java-user@lucene.apache.org
> Subject: Re: How to construct a MultiReader?
>
> Hi
>
> You can use IndexReader.open() static method to open a reader over
> direc
Hi,
how can I construct a MultiReader?
There is only a constructor with an IndexReader array. But IndexReader
is abstract, and all other IndexReader implementations also need an
IndexReader as constructor param.
Now I'm a bit confused...
I want to construct a MultiReader which reads multiple FDD
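Putting the answer above into code, a sketch (dir1/dir2 are made-up
Directory instances):

  // The concrete readers come from the static factory method,
  // not from a constructor - that is what the abstract class hides.
  IndexReader r1 = IndexReader.open(dir1);
  IndexReader r2 = IndexReader.open(dir2);
  MultiReader multi = new MultiReader(new IndexReader[] { r1, r2 });
  IndexSearcher searcher = new IndexSearcher(multi);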
No ideas? :(
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Saturday, 16 February 2008 15:42
> To: java-user@lucene.apache.org
> Subject: Searching multiple indexes
>
> Hi,
>
> I have some questions about searching multiple indexes.
>
> 1. IndexSearche
Hi,
I have some questions about searching multiple indexes.
1. IndexSearcher with a MultiReader will search the indexes sequentially?
2. ParallelMultiSearcher searches in parallel. How is this done? One thread
per index? When will it return? When the slowest search is finished?
3. When I have t
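Regarding 2.: as far as I can tell, it spawns one thread per
sub-searcher and joins on all of them, so it returns when the slowest
search is finished. Wiring it up looks roughly like this (dir1/dir2
made up):

  Searchable[] shards = {
      new IndexSearcher(dir1),
      new IndexSearcher(dir2)
  };
  ParallelMultiSearcher searcher = new ParallelMultiSearcher(shards);
  Hits hits = searcher.search(query);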
> You need to watch both the positionincrementgap
> (which, as I remember, gets added for each new field of the
> same name you add to the document). Make it 0 rather than
> whatever it is currently. You may have to create a new analyzer
> by subclassing your favorite analyzer and overriding the
>
Well, it seems that this may be a solution for me too.
But I'm afraid that someone one day will change this string. And then my app
will not work anymore...
> -Original Message-
> From: Adrian Smith [mailto:[EMAIL PROTECTED]
> Sent: Friday, 15 February 2008 13:02
> To: java-user@lucene
> > Document doc = new Document();
> > for (int i = 0; i < pages.length; i++) {
> > doc.add(new Field("text", pages[i], Field.Store.NO,
> > Field.Index.TOKENIZED));
> > doc.add(new Field("text", "$$", Field.Store.NO,
> > Field.Index.UN_TOKENIZED));
> > }
>
> UN_TOKENIZED. Nice idea!
> Document doc = new Document();
> for (int i = 0; i < pages.length; i++) {
> doc.add(new Field("text", pages[i], Field.Store.NO,
> Field.Index.TOKENIZED));
> doc.add(new Field("text", "$$", Field.Store.NO,
> Field.Index.UN_TOKENIZED));
> }
UN_TOKENIZED. Nice idea!
I will check this
> Why not just use ?
Because nearly every analyzer removes it (SimpleAnalyzer, German,
Russian, French...).
I just tested it with Luke in the search dialog.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional comman
> Rather than index one doc per page, you could index a special
> token between pages. Say you index $ as the special
> token.
I have decided to use this version, but...
What token can I use? It must be a token which never gets removed by an
analyzer, or altered in a way that it is no longer unique
The metadata is quite often altered and there are millions of
documents.
Also, document access is secured by complex SQL statements which Lucene
might not support.
So this is not an option, I think.
> -Original Message-
> From: John Byrne [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 13 Febr
Hi,
I have the following scenario:
RDBMS which contains the metadata for documents (ID, customer number,
doctype etc.).
Now I want to add fulltext search support.
So I will index the documents' content in Lucene and add the document
ID as a stored field in Lucene.
Now somebody wants to search l
OK, understood.
Maybe a little hint in the legend, like "Only for stored fields".
> -Original Message-
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 12 February 2008 19:13
> To: java-user@lucene.apache.org
> Subject: Re: Lukes document hitlist display
>
> [EMAIL PR
Hi,
using Luke 0.7.1.
The document hitlist has a column header ITSVop0LBC.
When I add a field like this:
new Field("CONTENT", contentReader, TermVector.WITH_OFFSETS)
Luke shows only "--". Why?
Shouldn't it be "IT-Vo-"?
Thank you
-
This would be really nice!
> -Original Message-
> From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 12 February 2008 16:41
> To: java-user@lucene.apache.org
> Subject: Re: TermPositionVector
>
> [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > could somebody please explain wha
TermA TermB
TermA has position 0 and offset 0
TermB has position 1 and offset 6
Right?
> -Original Message-
> From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 12 February 2008 15:16
> To: java-user@lucene.apache.org
> Subject: Re: TermPositionVector
>
> Position is jus
Hi,
could somebody please explain what the difference between positions and
offsets is?
And: Is there a trick to show these infos in Luke?
Thank you.
-
Thank you.
So I will call flush() in 2.3 (and may lose data when the machine
dies) and commit() in 2.4+ (there a sync() will save the data).
> -Original Message-
> From: Michael McCandless [mailto:[EMAIL PROTECTED]
> Sent: Friday, 8 February 2008 21:01
> To: java-user@lucene.apache.org
> Subjec
OK, so there is nothing in 2.3 besides IndexWriter.close to ensure that the
docs are written to disk and that the index will survive an application /
machine death?
> -Original Message-
> From: Michael McCandless [mailto:[EMAIL PROTECTED]
> Sent: Friday, 8 February 2008 19:34
> To: java
Hi,
if I understand this property correctly, every time the RAM buffer is
full it gets automatically written to disk. Something like a commit in
a database.
Thus if my application dies, all docs in the buffer get lost. Right?
If so, is there any event/callback etc. which informs my application that
OK, I will try it.
Thank you.
> -Original Message-
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Friday, 8 February 2008 14:25
> To: java-user@lucene.apache.org
> Subject: Re: Which analyzer
>
> WhitespaceAnalyzer should do the trick. Give it a try...
>
> My point was that
Hello,
let's say the document contains
01.02.1999
and
152,45
Then I want to search for:
01.02.1999 AND 152,45
01.02.1999
152,45
1999
152
Thank you.
> -Original Message-
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Friday, 8 February 2008 00:20
> To: java-user@lucene.apa
Hi,
I have a huge number of documents which contain mainly numbers and
dates (German format dd.MM.), like this:
Tgr. gilt ab 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99
01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 46X0 01
0480101080512070010
Gefahren
> > And how can I find the offsets of something like "foo bar"? I
> > think this will get tokenized into 2 terms and thus I have no
> > chance to find it, right?
>
> I wouldn't say no chance... TermVectorMapper would be good for this,
> as you can watch the terms as they are being
> Also, search the archives for Term Vector, as you will find
> discussion
> of it there.
Ah I see, I need to cast it to TermPositionVector. OK.
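In code, that cast looks roughly like this (doc id and field name made
up; note positions require WITH_POSITIONS or WITH_POSITIONS_OFFSETS at
index time, offsets require WITH_OFFSETS):

  TermPositionVector tpv =
      (TermPositionVector) reader.getTermFreqVector(docId, "content");
  int idx = tpv.indexOf("foo");                         // term slot
  int[] positions = tpv.getTermPositions(idx);          // token positions
  TermVectorOffsetInfo[] offsets = tpv.getOffsets(idx); // char offsets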
> You may also, eventually, be interested in the new
> TermVectorMapper capabilities in 2.3 which should help speed up the
> processing of term
Sorry, this was a bit of nonsense ;)
I store a document with a content field like this:
Document#add(new Field("content", someReader, TermVector.WITH_OFFSETS));
Later I search this document with an IndexSearcher and want the
TermPositions from this single document.
There is an IndexReader#termPosit
Hi,
how do I get the TermVector from a document which I have gotten from an
IndexSearcher via IndexSearcher#search(Query q).
Luke can do it, but I do not know how...
Thank you.
-
> Or, you could just do things twice. That is, send your text through
> a TokenStream, then call next() and count. Then send it all
> through doc.add().
Hm.
This means reading the content twice, no matter whether I use my own
analyzer or override/wrap the main analyzer.
Is there a hook anywhere?
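The two-pass variant from the suggestion above would look something
like this (field name and text are placeholders):

  // Pass 1: count tokens by consuming the analyzer's TokenStream.
  TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
  int count = 0;
  while (ts.next() != null) {
    count++;
  }
  // Pass 2: hand the same text to doc.add(...) as usual.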
OK, I will give this a try.
Now I have the problem that I do not know how to get the offsets (or
positions? What is the difference?) back from the searched document...
There is an IndexReader#termPositions(Term t) - but this returns the
positions for the whole index, not for a single document.
> -
> -Original Message-
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Friday, 11 January 2008 16:16
> To: java-user@lucene.apache.org
> Subject: Re: Design questions
> But you could also vary this scheme by simply storing in your document
> the offsets for the beginning of each p
Yes, sorry, that's the case.
Thank you!
> -Original Message-
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, 24 January 2008 19:49
> To: java-user@lucene.apache.org
> Subject: Re: Creating search query
>
> That should work fine, assuming that foo and bar are the un
Thank you.
> -Original Message-
> From: Lukas Vlcek [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, 23 January 2008 08:23
> To: java-user@lucene.apache.org
> Subject: Re: Compass
>
> Hi,
>
> I am using Compass with Spring and JPA. It works pretty nice.
&
Hi,
I have an index with some fields which are indexed and UN_TOKENIZED
(keywords) and one field which is indexed and TOKENIZED (content).
Now I want to create a Query-Object:
TermQuery k1 = new TermQuery(new Term("foo", "some foo"));
TermQuery k2 = new TermQuery(new Term("bar",
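A hedged completion of the sketch (the "bar" value, the analyzer and the
user input are placeholders): the keyword terms stay TermQuerys, only
the free-text part goes through the QueryParser/analyzer, and everything
is combined in a BooleanQuery:

  TermQuery k2 = new TermQuery(new Term("bar", "some bar"));
  BooleanQuery query = new BooleanQuery();
  query.add(k1, BooleanClause.Occur.MUST);
  query.add(k2, BooleanClause.Occur.MUST);
  // Only the free-text part is analyzed:
  query.add(new QueryParser("content", analyzer).parse("user input"),
            BooleanClause.Occur.MUST);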
Hi,
Compass (http://www.opensymphony.com/compass/content/lucene.html)
promises many nice things, in my opinion.
Does anybody have production experience with it?
Especially Jdbc Directory and Updates?
Thank you.
-
> Exactly! Indices are simply merged on disk; their content is
> not re-analyzed.
Thank you!
-
> A non-clustered or clustered index would resolve the problem, but
> Lucene cannot do the same thing like that?
Well, I bet the database solution is the best, as long as you do not
search in big text fields or need special full-text features like fuzzy
search etc.
Synchronizing a Lucene in
> I can use a clustered index on the table. But you can create only
> one clustered index per table. In this table, lots of data need to
> be searched, so I chose Lucene to do that.
Why do you need a clustered index in the database?
A non-clustered one would do the job as well.
--
Hi,
looking into the code of IndexMergeTool I saw this:
IndexWriter writer = new IndexWriter(mergedIndex, new SimpleAnalyzer(),
true);
Then the indexes are added to this new index.
My question is:
How does the Analyzer of this IndexWriter instance affect the merge
process?
It seems that it do
> Firstly, I submit a query like "select * from [tablename]". In this
> table, there are around 30 columns and 40,000 rows of data. And I
> use the StandardAnalyzer to generate the index.
Why don't you use a database index?
-
> But it also seems that the parallel/not parallel decision is
> something you control on the back end, so I'm not sure the user
> is involved in the merge question at all. In other words, you could
> easily split the indexing task up amongst several machines and/or
> processes and combine all the
> Then why would you want to combine them?
>
> I really think you need to explain what you're trying to accomplish
> rather then obsess on the details.
I have to create indexes in parallel because the amount of data is
very high.
Then I want to merge them into bigger indexes and move them to the s
> You can answer an awful lot of this much faster than waiting
> for someone
> to reply by getting a copy of Luke and look at the parse results using
> various
> analyzers.
Ah cool, you mean the "explain structure" button.
> Try KeywordAnalyzer for your query.
>
> Combine queries programmatica
> I admit I've never used IndexMergeTool, I've always used
> IndexWriter.AddIndexex and then execute
> IndexWriter.optimize().
>
> And I've seen no problems. That call takes no
> analyzer.
So you take the first index and add the remaining indexes via
addIndexes?
What happens if the indexes were created with different analyzers?
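Roughly like this, I assume (directory names made up; per the reply
above, nothing is re-analyzed during the merge, so the analyzer
argument is irrelevant here):

  IndexWriter writer = new IndexWriter(mergedDir, new SimpleAnalyzer(), true);
  writer.addIndexes(new Directory[] { dir1, dir2 }); // merges and optimizes
  writer.close();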
> The caution to use the same analyzer at index and query time is,
> in my experience, simply good advice to follow until you are
> familiar enough with how Lucene uses analyzers to keep from
> getting really, really, really confused. Once you understand
> when analyzers are used and how they effec
> > How can I search for fields stored with Field.Index.UN_TOKENIZED?
>
> Use TermQuery.
>
> > Why do I need an analyzer for searching?
>
> Consider a full-text field that will be tokenized removing special
> characters and lowercased, and then a user querying for an uppercase
> word. The
> OG: again, it depends. If the index you'd get by merging is
> of manageable size, then merge your indices.
OK, this is what I thought.
A single index should be faster than multiple indexes with a MultiSearcher,
right?
But what about the ParallelMultiSearcher? As I understand the docs it
searc
> See org.apache.lucene.misc.IndexMergeTool
Thank you.
But this uses a hardcoded analyzer and deprecated API calls.
How does the used analyzer affect the merge process?
Is everything reindexed with this new analyzer again? Does this make
sense? What if the source indexes had other analyzers us
Hi,
is there any maximum size for an index?
Are there any recommendations for a useful max size?
I want to index in parallel. So I have to create multiple indexes.
Shall I merge them together, or shall I leave them as they are, using
(Parallel)MultiSearcher?
Thank you.
---
> I think that method was renamed somewhere along the way to
> setMaxBufferedDocs.
>
> However, in 2.3 (to be released in a few weeks), it's better to use
> setRAMBufferSizeMB instead.
>
> For more ideas on speeding up indexing, look here:
>
> http://wiki.apache.org/lucene-java/ImproveI
Hi,
are there any ready-to-use tools out there which I can use for merging
and optimizing?
I have seen that Luke can optimize, but not merge?
Or do I have to write my own utility?
Thank you
-
Hi,
I have some doubts about Analyzer usage. I read that one shall always use
the same analyzer for searching and indexing.
Why? How does the Analyzer affect the search process? What is analyzed here
again?
I have tried this out. I used a SimpleAnalyzer for indexing with
Field.Store.YES and Field
Hi,
http://wiki.apache.org/lucene-java/PainlessIndexing says that I shall use
setMinMergeDocs.
But I cannot find this method in Lucene 2.2.
What is wrong here?
Thank you.
-
OK, thank you! I will try this out.
-