RE: best practice: 1.4 billions documents

2010-11-22 Thread spring
> of course I will distribute my index over many machines: store everything on one computer is just crazy, 1.4B docs is going to be an index of almost 2T (in my case) Billion = giga in English; billion = tera in non-English. 2T docs = 2.000.000.000.000 docs... ;) AFAIK 2^32 - 1 docs is

RE: Unable to improve performance

2009-03-27 Thread spring
> > How can I open it "readonly"? > > See the javadocs for IndexReader. I did it already for 2.3 - cannot find readonly - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-use

RE: Unable to improve performance

2009-03-27 Thread spring
> Are you opening your IndexReader with readOnly=true? If not, you're > likely hitting contention on the "isDeleted" method. How can I open it "readonly"? - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For ad
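The flag the thread is looking for only arrived in Lucene 2.4: the static IndexReader.open(...) factory gained a readOnly boolean, which is why it cannot be found in the 2.3 javadocs. A minimal sketch for 2.4+, with a placeholder index path:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

// Sketch for Lucene 2.4+: the second argument opens the reader read-only,
// which avoids the synchronization on isDeleted() mentioned in the thread.
// "/path/to/index" is a placeholder.
public class ReadOnlyOpen {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"), true);
        IndexSearcher searcher = new IndexSearcher(reader);
        // ... run searches ...
        searcher.close();
        reader.close();
    }
}
```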

RE: relevance vs. score

2009-03-04 Thread spring
> It's the similarity scoring formula. EG see here: > >http://lucene.apache.org/java/2_4_0/scoring.html > > and here: > > > http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene > /search/Similarity.html OK; thank you -

RE: relevance vs. score

2009-03-04 Thread spring
> I think for "ordinary" Lucene queries, "score" and "relevance" mean > the same thing. > > But if you do eg function queries, or you "mixin" recency into your > scoring, etc., then "score" could be anything you computed, a value > from a field, etc. Hm, how is relevance then defined? ---

relevance vs. score

2009-03-04 Thread spring
Hi, When I say: sorted by relevance or sorted by score - are relevance and score synonym for each other or what is the difference in relation to sorting? Thank you - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apach

RE: Merging database index with fulltext index

2009-02-28 Thread spring
> Yes. DBSight helps to flatten database objects into Lucene's > documents. OK, thx for the advice. But back to my original question. When I have to merge both resultsets, what is the best approach to do this? - To unsubscrib

RE: Merging database index with fulltext index

2009-02-28 Thread spring
> Actually you can use DBSight(disclaimer:I work on it) to > collect the data > and keep them in sync. Hm... it fulltext-indexes a database? It supports document content outside the database (custom crawler)? What query-syntax it supports? --

RE: Merging database index with fulltext index

2009-02-28 Thread spring
> Contrariwise, look for anything by Marcelo Ochoa on the user list > about embedding Lucene in Oracle (which I confess I haven't looked > into at all, but seems interesting). I know this lucene-oracle text cartridge. But my solution has to work with any of the big databases (MS, IBM, Oracle). -

RE: Merging database index with fulltext index

2009-02-28 Thread spring
> I feel this may not be a good example. It was a very simple example. The real database query is very complex and joins several tables. It would be an absolute nightmare to copy all these tables into lucene and keep both in sync.

Merging database index with fulltext index

2009-02-28 Thread spring
Hi, what is the best approach to merge a database index with a lucene fulltext index? Both databases store a unique ID per doc. This is the join criteria. requirements: * both resultsets may be very big (100.000 and much more) * the merged resultset must be sorted by database index and/or releva

RE: TopDocCollector

2009-02-28 Thread spring
> > * How can a hit have a score of <=0? > > A function query, or a negative boost would do it. Ah ok. > Solr has always allowed all scores through w/o screening out <=0 Why? - To unsubscribe, e-mail: java-user-unsubscr...@lu

RE: TopDocCollector

2009-02-28 Thread spring
> That works fine, because hq.size() is still less than numHits. So > no matter what, the first numHits hits will be added to the queue. > > > public void collect(int doc, float score) { > > 57 if (score > 0.0f) { > > 59 if (hq.size() < numHits || score >= minScore) { Oh damned... it'

TopDocCollector

2009-02-27 Thread spring
Looking into the TopDocCollector code, I have some questions: * How can a hit have a score of <=0? * What happens if the first hit has the highest score of all hits? It seems that topDocs would then contain only this doc!? public void collect(int doc, float score) { 57 if (score > 0.0f) { 58
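The logic being quoted can be sketched without Lucene: keep a bounded min-heap of the top-N scores, and once the heap is full only accept a hit that beats the current minimum. A plain-Java sketch of that idea (not the actual TopDocCollector source):

```java
import java.util.PriorityQueue;

// Bounded top-N collection mimicking the quoted collect() logic:
// hits with score <= 0 are skipped; while the queue holds fewer than
// numHits entries every other hit is accepted; afterwards a hit must
// beat the smallest retained score, which it then displaces.
public class TopNSketch {
    private final int numHits;
    private final PriorityQueue<Float> hq = new PriorityQueue<>(); // min-heap

    public TopNSketch(int numHits) { this.numHits = numHits; }

    public void collect(int doc, float score) {
        if (score > 0.0f && (hq.size() < numHits || score >= hq.peek())) {
            hq.offer(score);
            if (hq.size() > numHits) {
                hq.poll(); // drop the current minimum
            }
        }
    }

    public PriorityQueue<Float> queue() { return hq; }
}
```

This also answers the "first hit has the highest score" worry: a high first score fills one slot but does not block later hits, because the size check comes before the minScore check.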

FieldSelector

2009-02-16 Thread spring
Hi, which kinds of fields does IndexSearcher's Document doc(int i) load? Only those with Field.Store.YES? I'm asking because I do not need to load the tokens - should I use a FieldSelector, or are these fields not loaded anyway? Thank you - To

RE: search(Query query, HitCollector results)

2009-02-15 Thread spring
> The HitCollector used will determine how things are ordered. > In 2.4, the > TopDocCollector will order by relevancy and the > TopFieldDocCollector can > order by > relevancy, index order, or by field. Lucene delivers the hit > ids to the > HitCollector and it can order as it pleases. So

search(Query query, HitCollector results)

2009-02-15 Thread spring
Hi, in what order does search(Query query, HitCollector results) return the results? By relevance? Thank you. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lu

RE: Multiple indexes vs single index

2009-02-14 Thread spring
Hi, > You get one answer if each document is 1K, another if it's > 1G. If you have 2 users or 10,000 users. If you require > 100 queries/sec response time or 1 query can take 10 > seconds. If you require an update to the index every > second or month... Each doc has up to 10 A4 pages text. There

Multiple indexes vs single index

2009-02-14 Thread spring
Hi, We have an application which manages the data of multiple customers. A customer can only search its own data, never the data of other customers. So what is more efficient in respect of performance and resources: one big single index filtered by an index field (customer-Id), or multiple sm
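The single-big-index variant usually means AND-ing a required clause on the customer field into every user query. A sketch of that, assuming an untokenized "customerId" field (the field name is illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Sketch: scope any user query to one customer's documents by adding a
// required TermQuery on an UN_TOKENIZED "customerId" field. The field
// name is an assumption for illustration.
public class CustomerScopedQuery {
    public static Query scope(Query userQuery, String customerId) {
        BooleanQuery q = new BooleanQuery();
        q.add(userQuery, BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("customerId", customerId)), BooleanClause.Occur.MUST);
        return q;
    }
}
```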

RE: Re-combining already indexed documents

2009-01-03 Thread spring
> The fastest way to reconstruct the token > stream would > be to use the TermFreqVector but if you didn't store it at > index time > you would have traverse the inverted index using TermEnum and > TermPositions in order to pick up the term values and > positions. This > can be a rather

Re-combining already indexed documents

2009-01-02 Thread spring
Hi, I have already indexed documents. I want to recombine them into new documents. Is this possible without the original documents - only with the index? Example: doc1, doc2, doc3 are indexed. I want a new indexed doc4 which is indexed as if I had concatenated doc1, doc2, doc3 into doc4 and then

Re: Searching sets of documents

2008-10-14 Thread spring
The problem is the logical combination of documents in folders, not of terms in documents. See original post. Original message > Date: Tue, 14 Oct 2008 16:29:15 +0530 > From: "Ganesh" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: Searching sets of documen

Re: Searching sets of documents

2008-10-14 Thread spring
The folder name and the document name are stored for each document. Original message > Date: Tue, 14 Oct 2008 14:11:09 +0530 > From: "Ganesh" <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Subject: Re: Searching sets of documents > You should have stored the foldernam

RE: Searching sets of documents

2008-10-13 Thread spring
The docs are already indexed. > -Original Message- > From: ??? [mailto:[EMAIL PROTECTED] > Sent: Montag, 13. Oktober 2008 02:28 > To: java-user@lucene.apache.org > Subject: Re: Searching sets of documents > > all folders which match "A AND Y", do you search for file name? > If yes, A or

Searching sets of documents

2008-10-12 Thread spring
Hi, I want to search for sets of documents. For instance I index some folders with documents in it and now I do not want to find certain documents but folders. Sample: folder A doc 1, contains X, Y doc 2, contains Y, Z folder B doc 3, contains X, Y doc 4, contains A, Z Now I want to fi

Re: Indexing questions

2008-07-15 Thread spring
> This isn't quite true. If you open IndexWriter with autoCommit=false, > then none of the changes you do with it will be visible to an > IndexReader, even one reopened while IndexWriter is doing its work, > until you close the IndexWriter. Where are the docs for this transaction buffered?

Re: Indexing questions

2008-07-15 Thread spring
> How about just copying and performing your indexing (or index write > related) > operations on the copy and then performing a rename operation followed by > reopening of the index readers. This is how we did it until now. But the indexes become bigger and bigger (50 GB and more) and so we are

Indexing questions

2008-07-13 Thread spring
Hi, I have some questions about indexing: 1. Is it possible to open indexes with Multireader+IndexSearcher and add documents to these indexes simultaneously? 2. Is it possible to open indexes with Multireader+IndexSearcher and optimize these indexes simultaneously? 3. Is it possible to open index

RE: Does Lucene Supports Billions of data

2008-05-01 Thread spring
> Even if they're in multiple indexes, the doc IDs being ints > will still prevent > it going past 2Gi unless you wrap your own framework around it. Hm. Does this mean that a MultiReader has the int-limit too? I thought that this limit applies to a single index only...

RE: Biggest index

2008-03-14 Thread spring
Yes of course, the answers to your questions are important too. But no answer at all until now :( For me I can say (not production yet): 2 ID fields and one content field per doc. Search on the content field only. Simple searches like "content:foo" or "content:foo*". 1.5 GB index per 1 million docs. A

Biggest index

2008-03-10 Thread spring
Hi, I have some question about the index size on a single machine: What is your biggest index you use in production? Do you use MultiReader/Searcher? What hardware do you need to serve it? What kind of application is it? Thank you. --

RE: MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-08 Thread spring
> Right... but trust me, you really wouldn't want to. You need > distributed search at that level anyway. Hm, 2 billion small docs are not so much. Why do I need distributed search, and what exactly do you mean by distributed search? Multiple IndexSearchers? Multiple processes? Multiple machin

RE: MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-08 Thread spring
Does this mean that I cannot search indexes with more than 2 billion docs at all with a single IndexSearcher? > -Original Message- > From: Mark Miller [mailto:[EMAIL PROTECTED] > Sent: Samstag, 8. März 2008 18:57 > To: java-user@lucene.apache.org > Subject: Re: MultiSearcher to overcome

RE: Swapping between indexes

2008-03-07 Thread spring
> With a commit after every add: (286 sec / 10,000 docs) 28.6 ms. > With a commit after every 100 adds: (12 sec / 10,000 docs) 1.2 ms. > Only one commit: (8 sec / 10,000 docs) 0.8 ms. Of course. If you need so little time to create a document, then a commit which may take, let's say, 10 - 500 ms, will s

RE: Swapping between indexes

2008-03-06 Thread spring
> > With a commit after every add: 30 min. > > With a commit after 100 add: 23 min. > > Only one commit: 20 min. > > All of these times look pretty slow... perhaps lucene is not the > bottleneck here? Therefore I wrote: "(including time to get the document from the archive)" Not the absolute

RE: Swapping between indexes

2008-03-06 Thread spring
> Since Lucene buffers in memory, you will always have the risk of > losing recently added documents that haven't been flushed yet. > Committing on every document would be too slow to be practical. Well, it is not sooo slow... I have indexed 10.000 docs, resulting in a 14 MB index. The index has

RE: NO_NORM and TOKENIZED

2008-03-05 Thread spring
Hm, what exactly does NO_NORM mean? Thank you - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

RE: How do i get a text summary

2008-02-28 Thread spring
> If you want something from an index it has to be IN the > index. So, store a > summary field in each document and make sure that field is part of the > query. And how could one automatically create such a summary? Taking the first 2 lines of a document does not always make much sense. How does goog

RE: Transactions in Lucene

2008-02-25 Thread spring
> I don't think creating an IndexWriter is very expensive at all. Ah ok. I tested it. Creating an IndexWriter on an index with 10.000 docs (about 15 MB) takes about 200 ms. This is a very cheap operation for me ;) I only saw the many calls in init() which reads files and so on and therefore I to

RE: Transactions in Lucene

2008-02-25 Thread spring
> > For what time is the 2.4 release planned? > > Not really sure at this point ... Hm. Digging into IndexWriter#init it seems that this is a really expensive operation and thus my self made "commit" too. Isn't it? - To unsubsc

RE: Transactions in Lucene

2008-02-25 Thread spring
> In 2.4, commit() sets the rollback point. So abort() will > roll index > back to the last time you called commit() (or to when the writer was > opened if you haven't called commit). > > In 2.3, your only choice is to close & re-open the writer to reset > the rollback point. OK, thank yo

RE: Transactions in Lucene

2008-02-25 Thread spring
> Then, you can call close() to commit the changes to the index, or > abort() to rollback the index to the starting state (when the writer > was opened). As I understand the docs, the index will get rolled back to the state as it was when the index was opened. How can I achieve a rollback which o

Changing wildcard characters

2008-02-23 Thread spring
Hi, is it possible to change the wildcard characters which are used by QueryParser? Or do I have to replace them myself in the query string? Thank you - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail:

RE: Suffix search

2008-02-22 Thread spring
> That will let you do it, be warned however there is most definitely a > significant performance degradation associated with doing this. Yes of course. Like in a relational database with a leading wildcard. - To unsubscribe, e

RE: Suffix search

2008-02-22 Thread spring
> 1) See setAllowLeadingWildcard in QP. Oh damned... late in the evening ;) Hm, just tested it: Searching for "format" works. Searching for "form*" works. Searching for "*ormat" does NOT work. Confused again ;) - To unsubscribe,

Suffix search

2008-02-22 Thread spring
Hi, using WildcardQuery directly it is possible to search for suffixes like "*foo". The QueryParser throws an exception that this is not allowed in a WildcardQuery. Hm, now I'm confused ;) How can I configure the QueryParser to allow a wildcard as first character? Thank you -

RE: Rebuilding Document from index?

2008-02-22 Thread spring
You can use Luke to rebuild the document. It will show you the terms of the analyzed document, not the original content. And this is what you want, if I understood you correctly. > -Original Message- > From: Itamar Syn-Hershko [mailto:[EMAIL PROTECTED] > Sent: Freitag, 22. Februar 2008 1

RE: How to construct a MultiReader?

2008-02-21 Thread spring
Thank you. > -Original Message- > From: Shai Erera [mailto:[EMAIL PROTECTED] > Sent: Donnerstag, 21. Februar 2008 14:11 > To: java-user@lucene.apache.org > Subject: Re: How to construct a MultiReader? > > Hi > > You can use IndexReader.open() static method to open a reader over > direc

How to construct a MultiReader?

2008-02-21 Thread spring
Hi, how can I construct a MultiReader? There is only a constructor with an IndexReader-array. But IndexReader is abstract and all other IndexReader-implementations also need an IndexReader as constructor param. Now I'm a bit confused... I want to construct a MultiReader which reads multiple FDD
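The confusion in the question resolves once you know that IndexReader, though abstract, has a static open(...) factory: open one reader per directory and pass the array to MultiReader. A sketch with placeholder paths (Lucene 2.x API):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

// Sketch: build a MultiReader over two FSDirectory-based indexes.
// "/index1" and "/index2" are placeholders.
public class MultiReaderOpen {
    public static void main(String[] args) throws Exception {
        IndexReader r1 = IndexReader.open(FSDirectory.getDirectory("/index1"));
        IndexReader r2 = IndexReader.open(FSDirectory.getDirectory("/index2"));
        MultiReader multi = new MultiReader(new IndexReader[] { r1, r2 });
        IndexSearcher searcher = new IndexSearcher(multi);
        // ... search across both indexes through one searcher ...
        searcher.close();
        multi.close(); // also closes the sub-readers
    }
}
```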

RE: Searching multiple indexes

2008-02-19 Thread spring
No ideas? :( > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Samstag, 16. Februar 2008 15:42 > To: java-user@lucene.apache.org > Subject: Searching multiple indexes > > Hi, > > I have some questions about searching multiple indexes. > > 1. IndexSearche

Searching multiple indexes

2008-02-16 Thread spring
Hi, I have some questions about searching multiple indexes. 1. IndexSearcher with a MultiReader will search the indexes sequentially? 2. ParallelMultiSearcher searches in parallel. How is this done? One thread per index? When will it return? When the slowest search is finished? 3. When I have t

RE: Design questions

2008-02-15 Thread spring
> You need to watch both the positionincrementgap > (which, as I remember, gets added for each new field of the > same name you add to the document). Make it 0 rather than > whatever it is currently. You may have to create a new analyzer > by subclassing your favorite analyzer and overriding the >

RE: Design questions

2008-02-15 Thread spring
Well, it seems that this may be a solution for me too. But I'm afraid that someone one day will change this string. And then my app will not work anymore... > -Original Message- > From: Adrian Smith [mailto:[EMAIL PROTECTED] > Sent: Freitag, 15. Februar 2008 13:02 > To: java-user@lucene

RE: Design questions

2008-02-15 Thread spring
> > Document doc = new Document() > > for (int i = 0; i < pages.length; i++) { > > doc.add(new Field("text", pages[i], Field.Store.NO, > > Field.Index.TOKENIZED)); > > doc.add(new Field("text", "$$", Field.Store.NO, > > Field.Index.UN_TOKENIZED)); > > } > > UN_TOKENIZED. Nice idea!

RE: Design questions

2008-02-15 Thread spring
> Document doc = new Document() > for (int i = 0; i < pages.length; i++) { > doc.add(new Field("text", pages[i], Field.Store.NO, > Field.Index.TOKENIZED)); > doc.add(new Field("text", "$$", Field.Store.NO, > Field.Index.UN_TOKENIZED)); > } UN_TOKENIZED. Nice idea! I will check this

RE: Design questions

2008-02-15 Thread spring
> Why not just use ? Because nearly every analyzer removes it (SimpleAnalyzer, German, Russian, French...) Just tested it with luke in the search dialog. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional comman

RE: Design questions

2008-02-14 Thread spring
> Rather than index one doc per page, you could index a special > token between pages. Say you index $ as the special > token. I have decided to use this version, but... What token can I use? It must be a token which gets never removed by an analyzer or altered in a way that it not uniqu
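The suggested scheme, spelled out as in the later messages of this thread: tokenized page text interleaved with an UN_TOKENIZED marker field, so the separator bypasses the analyzer entirely. A sketch; "$$pagesep$$" is a made-up marker, any string no analyzer could ever emit as a token works:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch (Lucene 2.x API): each page is analyzed normally, and an
// UN_TOKENIZED marker is indexed after it. UN_TOKENIZED fields skip
// the analyzer, so the marker survives verbatim regardless of which
// analyzer is used for the page text.
public class PageMarkedDocument {
    public static Document build(String[] pages) {
        Document doc = new Document();
        for (int i = 0; i < pages.length; i++) {
            doc.add(new Field("text", pages[i], Field.Store.NO, Field.Index.TOKENIZED));
            doc.add(new Field("text", "$$pagesep$$", Field.Store.NO, Field.Index.UN_TOKENIZED));
        }
        return doc;
    }
}
```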

RE: design: merging resultset from RDBMS with lucene search results

2008-02-13 Thread spring
The metadata is quite often altered and there are millions of documents. Also, document access is secured by complex SQL statements which lucene might not support. So this is not an option, I think. > -Original Message- > From: John Byrne [mailto:[EMAIL PROTECTED] > Sent: Mittwoch, 13. Febr

design: merging resultset from RDBMS with lucene search results

2008-02-13 Thread spring
Hi, I have the following scenario: RDBMS which contains the metadata for documents (ID, customer number, doctype etc.). Now I want to add fulltext search support. So I will index the documents content in lucene and add the documents ID as a stored field in lucene. Now somebody wants to search l

RE: Lukes document hitlist display

2008-02-12 Thread spring
OK, understood. Maybe a little hint in the legend, like "Only for stored fields". > -Original Message- > From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] > Sent: Dienstag, 12. Februar 2008 19:13 > To: java-user@lucene.apache.org > Subject: Re: Lukes document hitlist display > > [EMAIL PR

Lukes document hitlist display

2008-02-12 Thread spring
Hi, using Luke 0.7.1. The document hitlist has a column header ITSVop0LBC. When I add a field like this: new Field("CONTENT", contentReader, TermVector.WITH_OFFSETS) Luke shows only "--". Why? Shouldn't it be "IT-Vo-"? Thank you -

RE: TermPositionVector

2008-02-12 Thread spring
This would be really nice! > -Original Message- > From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] > Sent: Dienstag, 12. Februar 2008 16:41 > To: java-user@lucene.apache.org > Subject: Re: TermPositionVector > > [EMAIL PROTECTED] wrote: > > Hi, > > > > could somebody please explain wha

RE: TermPositionVector

2008-02-12 Thread spring
TermA TermB TermA has position 0 and offset 0 TermB has position 1 and offset 6 Right? > -Original Message- > From: Grant Ingersoll [mailto:[EMAIL PROTECTED] > Sent: Dienstag, 12. Februar 2008 15:16 > To: java-user@lucene.apache.org > Subject: Re: TermPositionVector > > Position is jus

TermPositionVector

2008-02-12 Thread spring
Hi, could somebody please explain what the difference between positions and offsets is? And: is there a trick to show this info in Luke? Thank you. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail

RE: IndexWriter: setRAMBufferSizeMB

2008-02-10 Thread spring
Thank you. So I will call flush in 2.3 (and may lose data when machine dies) and commit() in 2.4+ (here a sync() will save the data). > -Original Message- > From: Michael McCandless [mailto:[EMAIL PROTECTED] > Sent: Freitag, 8. Februar 2008 21:01 > To: java-user@lucene.apache.org > Subjec

RE: IndexWriter: setRAMBufferSizeMB

2008-02-08 Thread spring
OK, so there is nothing in 2.3 besides IndexWriter.close to ensure that the docs are written to disk and that the index will survive an application / machine death? > -Original Message- > From: Michael McCandless [mailto:[EMAIL PROTECTED] > Sent: Freitag, 8. Februar 2008 19:34 > To: java

IndexWriter: setRAMBufferSizeMB

2008-02-08 Thread spring
Hi, if I understand this property correctly, every time the RAM buffer is full it gets automatically written to disk. Something like a commit in a database. Thus if my application dies, all docs in the buffer get lost. Right? If so, is there any event/callback etc. which informs my application that
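As the follow-ups in this thread clarify, an automatic flush is not a durable commit: in 2.3 only close() guarantees the docs are on disk, in 2.4+ commit() does. A minimal configuration sketch (path and buffer size are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Sketch (Lucene 2.3-era API): buffer added docs by RAM usage instead of
// doc count. The automatic flush at ~32 MB writes segments but is NOT a
// durable commit; only close() (2.3) or commit() (2.4+) survives a crash.
public class RamBufferConfig {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
        writer.setRAMBufferSizeMB(32.0);
        // ... addDocument() calls here ...
        writer.close();
    }
}
```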

RE: Which analyzer

2008-02-08 Thread spring
OK, I will try it. Thank you. > -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: Freitag, 8. Februar 2008 14:25 > To: java-user@lucene.apache.org > Subject: Re: Which analyzer > > WhitespaceAnalyzer should do the trick. Give it a try... > > My point was that

RE: Which analyzer

2008-02-08 Thread spring
Hello, lets say the document contains 01.02.1999 and 152,45 Then I want to search for: 01.02.1999 AND 152,45 01.02.1999 152,45 1999 152 Thank you. > -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: Freitag, 8. Februar 2008 00:20 > To: java-user@lucene.apa

Which analyzer

2008-02-07 Thread spring
Hi, I have a huge number of documents which contain mainly numbers and dates (german format dd.MM.), like this: Tgr. gilt ab 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 01.01.99 46X0 01 0480101080512070010 Gefahren

RE: TermVector

2008-01-29 Thread spring
> > And how can I find the offsets of something like "foo bar"? > I think > > this > > will get tokenized into 2 terms and thus I have no chance to find > > it, right? > > I wouldn't say no chance... TermVectorMapper would be good > for this, > as you can watch the terms as they are being

RE: TermVector

2008-01-28 Thread spring
> Also, search the archives for Term Vector, as you will find > discussion > of it there. Ah I see, I need to cast it to TermPositionVector. OK. > You may also, eventually, be interested in the new > TermVectorMapper capabilities in 2.3 which should help speed up the > processing of term

RE: TermVector

2008-01-28 Thread spring
Sorry, this was a bit nonsense ;) I store a document with a content field like this: Document#add(new Field("content", someReader, TermVector.WITH_OFFSETS)); Later I search this document with an IndexSearcher and want the TermPositions from this single document. There is a IndexReader#termPosit

TermVector

2008-01-28 Thread spring
Hi, how do I get the TermVector from a document which I have gotten from an IndexSearcher via IndexSearcher#search(Query q). Luke can do it, but I do not know how... Thank you. - To unsubscribe, e-mail: [EMAIL PROTECTED] For a

RE: Design questions

2008-01-24 Thread spring
> Or, you could just do things twice. That is, send your text through > a TokenStream, then call next() and count. Then send it all > through doc.add(). Hm. This means reading the content twice, no matter whether I use my own analyzer or override/wrap the main analyzer. Is there anywhere a hoo

RE: Design questions

2008-01-24 Thread spring
OK, I will give this a try. Now I have the problem that I do not know how to get the offsets (or positions? What is the difference?) back from the searched document... There is a IndexReader#termPositions (Term t) - but this returns the positions for the whole index, not a single document. > -

RE: Design questions

2008-01-24 Thread spring
> -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: Freitag, 11. Januar 2008 16:16 > To: java-user@lucene.apache.org > Subject: Re: Design questions > But you could also vary this scheme by simply storing in your document > the offsets for the beginning of each p
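The quoted advice (store the offset at which each page begins) can be sketched in plain Java: record cumulative start offsets, then map any match offset in the concatenated text back to its page with a binary search.

```java
import java.util.Arrays;

// Sketch: per-page start offsets over the concatenated text, plus a
// lookup that maps a character offset back to the page containing it.
public class PageOffsets {
    public static int[] startOffsets(String[] pages) {
        int[] starts = new int[pages.length];
        int pos = 0;
        for (int i = 0; i < pages.length; i++) {
            starts[i] = pos;
            pos += pages[i].length();
        }
        return starts;
    }

    public static int pageOf(int[] starts, int offset) {
        int idx = Arrays.binarySearch(starts, offset);
        // exact hit: the offset is a page start; miss: take the page
        // whose start precedes the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }
}
```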

RE: Creating search query

2008-01-24 Thread spring
Yes, sorry, that's the case. Thank you! > -Original Message- > From: Erick Erickson [mailto:[EMAIL PROTECTED] > Sent: Donnerstag, 24. Januar 2008 19:49 > To: java-user@lucene.apache.org > Subject: Re: Creating search query > > That should work fine, assuming that foo and bar are the un

RE: Compass

2008-01-24 Thread spring
Thank you. > -Original Message- > From: Lukas Vlcek [mailto:[EMAIL PROTECTED] > Sent: Mittwoch, 23. Januar 2008 08:23 > To: java-user@lucene.apache.org > Subject: Re: Compass > > Hi, > > I am using Compass with Spring and JPA. It works pretty nice. &

Creating search query

2008-01-24 Thread spring
Hi, I have an index with some fields which are indexed and un_tokenized (keywords) and one field which is indexed and tokenized (content). Now I want to create a Query-Object: TermQuery k1 = new TermQuery(new Term("foo", "some foo")); TermQuery k2 = new TermQuery(new Term("bar",
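For the mixed-field case the question describes, the usual pattern is: exact TermQuery clauses for the UN_TOKENIZED keyword fields (no analysis applied), QueryParser only for the tokenized content field, all combined with MUST. A sketch; the field names follow the post, the content text is illustrative:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Sketch (Lucene 2.x API): keyword fields get verbatim TermQuerys so the
// stored term matches exactly; only the tokenized "content" field goes
// through an analyzer, which must match the one used at index time.
public class MixedQuery {
    public static Query build() throws Exception {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("foo", "some foo")), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("bar", "some bar")), BooleanClause.Occur.MUST);
        QueryParser parser = new QueryParser("content", new StandardAnalyzer());
        query.add(parser.parse("hello world"), BooleanClause.Occur.MUST);
        return query;
    }
}
```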

Compass

2008-01-21 Thread spring
Hi, compass (http://www.opensymphony.com/compass/content/lucene.html) promises many nice things in my opinion. Has anybody production experience with it? Especially the Jdbc Directory and updates? Thank you. - To unsubscribe, e-

RE: IndexWriter#addIndexes

2008-01-21 Thread spring
> Exactly! Indices are simply merged on disk; their content is > not re-analyzed. Thank you! - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

RE: How?

2008-01-17 Thread spring
> A non-clustered and clustered index has resolved the problem, > but Lucene can > not do the same thing like that? Well, I bet the database solution is best, as long as you do not search in big text fields or need special fulltext features like fuzzy search etc. Synchronizing a lucene in

RE: How?

2008-01-16 Thread spring
> I can use the cluster index on the table. But you can create only one > cluster index in a table. In this table , lots of data need > to search, so I > choose the Lucene to do that. Why do you need a clustered index in the database? A non-clustered would do the job as well. --

IndexWriter#addIndexes

2008-01-16 Thread spring
Hi, looking into the code of IndexMergeTool I saw this: IndexWriter writer = new IndexWriter(mergedIndex, new SimpleAnalyzer(), true); Then the indexes are added to this new index. My question is: How does the Analyzer of this IndexWriter instance effect the merge process? It seems that is do
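As the reply in this thread confirms, addIndexes() merges segments on disk without re-analyzing anything, so the writer's analyzer only matters for later addDocument() calls. A sketch of what IndexMergeTool does, with placeholder paths:

```java
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Sketch (Lucene 2.x API): merge two source indexes into a new one.
// addIndexes() copies/merges the existing segments; the analyzer passed
// to the writer plays no role in this operation.
public class MergeSketch {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/merged", new SimpleAnalyzer(), true);
        Directory[] sources = {
            FSDirectory.getDirectory("/part1"),
            FSDirectory.getDirectory("/part2"),
        };
        writer.addIndexes(sources); // no re-analysis of the source docs
        writer.optimize();
        writer.close();
    }
}
```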

RE: How?

2008-01-16 Thread spring
> firstly, I submit the query like "select * from [tablename]". > And in this > table, there are around 30 columns and 40,000 rows of data. > And I use the > StandardAnalyzer to generate the index. Why don't you use a database index? -

RE: Index merging and optimizing

2008-01-15 Thread spring
> But it also seems that the parallel/not parallel decision is > something you control on the back end, so I'm not sure the user > is involved in the merge question at all. In other words, you could > easily split the indexing task up amongst several machines and/or > processes and combine all the

RE: Index merging and optimizing

2008-01-14 Thread spring
> Then why would you want to combine them? > > I really think you need to explain what you're trying to accomplish > rather than obsess on the details. I have to create indexes in parallel because the amount of data is very high. Then I want to merge them into bigger indexes and move them to the s

RE: When to use which Analyzer

2008-01-14 Thread spring
> You can answer an awful lot of this much faster than waiting > for someone > to reply by getting a copy of Luke and look at the parse results using > various > analyzers. Ah cool, you mean the "explain structure" button. > Try KeywordAnalyzer for your query. > > Combine queries programmatica

RE: Index merging and optimizing

2008-01-14 Thread spring
> I admit I've never used IndexMergeTool, I've always used > IndexWriter.addIndexes and then execute > IndexWriter.optimize(). > > And I've seen no problems. That call takes no > analyzer. So you take the first index and add the remaining indexes via addIndexes? What happens if the indexes were crea

RE: When to use which Analyzer

2008-01-14 Thread spring
> The caution to use the same analyzer at index and query time is, > in my experience, simply good advice to follow until you are > familiar enough with how Lucene uses analyzers to keep from > getting really, really, really confused. Once you understand > when analyzers are used and how they effec

RE: When to use which Analyzer

2008-01-14 Thread spring
> > How can I search for fields stored with Field.Index.UN_TOKENIZED? > > Use TermQuery. > > > Why do I need an analyzer for searching? > > Consider a full-text field that will be tokenized removing special > characters and lowercased, and then a user querying for an uppercase > word. The

RE: Max size of index (FSDirectory )

2008-01-14 Thread spring
> OG: again, it depends. If the index you'd get by merging is > of manageable size, then merge your indices. OK, this is what I thought. A single index should be faster than multiple indexes with a MultiSearcher, right? But what about the ParallelMultiSearcher? As I understand the docs it searc

RE: Index merging and optimizing

2008-01-14 Thread spring
> See org.apache.lucene.misc.IndexMergeTool Thank you. But this uses a hardcoded analyzer and deprecated API calls. How does the used analyzer affect the merge process? Is everything reindexed with this new analyzer again? Does this make sense? What if the source indexes had other analyzers us

Max size of index (FSDirectory )

2008-01-13 Thread spring
Hi, is there any maximum size for an index? Are there any recommendations for a useful max size? I want to index in parallel. So I have to create multiple indexes. Shall I merge them together or shall I let them as they are using (Parallel)MultiSearcher? Thank you. ---

RE: IndexWriter minMergeDocs

2008-01-13 Thread spring
> I think that method was renamed somewhere along the way to > setMaxBufferedDocs. > > However, in 2.3 (to be released in a few weeks), it's better to use > setRAMBufferSizeMB instead. > > For more ideas on speeding up indexing, look here: > > http://wiki.apache.org/lucene-java/ImproveI

Index merging and optimizing

2008-01-13 Thread spring
Hi, are there any ready-to-use tools out there which I can use for merging and optimizing? I have seen that Luke can optimize, but not merge? Or do I have to write my own utility? Thank you - To unsubscribe, e-mail: [EMAIL PRO

When to use which Analyzer

2008-01-13 Thread spring
Hi, I have some doubts about Analyzer usage. I read that one shall always use the same analyzer for searching and indexing. Why? How does the Analyzer effect the search process? What is analyzed here again? I have tried this out. I used a SimpleAnalyzer for indexing with Field.Store.YES and Field

IndexWriter minMergeDocs

2008-01-13 Thread spring
Hi, http://wiki.apache.org/lucene-java/PainlessIndexing says that I shall use setMinMergeDocs. But I cannot find this method in lucene 2.2. What is wrong here? Thank you. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additi

RE: Design questions

2008-01-13 Thread spring
OK, thank you! I will try this out. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
