Phrase Query

2008-09-15 Thread Cam Bazz
Hello, Let's say I have two documents, both containing field F. Document 0 has the string "a b" as F; document 1 has the string "b a" as F. I am trying to make a PhraseQuery like: PhraseQuery pq = new PhraseQuery(); pq.add(new Term("F", "a")); pq.add(new Term("F", "b"));

Re: Phrase Query

2008-09-15 Thread Cam Bazz
I noticed this was because I was using a KeywordAnalyzer. Is it possible to write a document with different analyzers in different fields? Best. On Tue, Sep 16, 2008 at 8:33 AM, Cam Bazz <[EMAIL PROTECTED]> wrote: > Hello, > > Lets say I have two documents, both containing field F. > > document

Re: TopDocs question

2008-09-15 Thread Cam Bazz
Yes, I did it that way, but I still have to port some of my code. Thanks a lot. On Tue, Sep 16, 2008 at 6:28 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > I think Daniel was suggesting you write your own HitCollector with its own > "int hits" counter var. > > Otis > -- > Sematext -- http://se

Re: IndexSearcher.search

2008-09-15 Thread Cam Bazz
In cases where we don't know the possible number of hits, and want to test the new 2.4 way of doing things, could I use custom HitCollectors for everything? Is there any performance penalty for this? From what I understand, both TopDocCollector and TopDocs will try to allocate an array of Integer.MAX_V

Re: TopDocs question

2008-09-15 Thread Otis Gospodnetic
I think Daniel was suggesting you write your own HitCollector with its own "int hits" counter var. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Cam Bazz <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Monday, September 15,

Re: TopDocs question

2008-09-15 Thread Cam Bazz
Yes, I looked into implementing a custom collector that would return the number of hits, but I could not. collect() cannot access any local variable that is not final, and a final variable cannot be incremented. Any ideas? Best. On Tue, Sep 16, 2008 at 6:05 AM, Daniel Noll <[EMAIL PROTECTED]> wrote: > Cam Bazz wrote: >
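
A minimal sketch of two common ways around the final-variable restriction, in plain Java. Note that `SimpleCollector` is a hypothetical stand-in for Lucene's HitCollector (whose `collect(int doc, float score)` callback it mirrors) so that the example runs without Lucene on the classpath, and the `feed` helper only simulates a searcher; neither is a Lucene API.

```java
class HitCountDemo {
    // Option 1: an anonymous class may capture a final array reference;
    // the reference is final, but the array's content can still change.
    static int countWithArray(int[] matchingDocs) {
        final int[] hits = {0};
        SimpleCollector collector = new SimpleCollector() {
            @Override
            void collect(int doc, float score) {
                hits[0]++;
            }
        };
        feed(collector, matchingDocs);
        return hits[0];
    }

    // Option 2: a named subclass that keeps the counter as an instance field.
    static int countWithField(int[] matchingDocs) {
        CountingCollector collector = new CountingCollector();
        feed(collector, matchingDocs);
        return collector.getHits();
    }

    // Simulates a searcher calling collect() once per matching document.
    private static void feed(SimpleCollector collector, int[] matchingDocs) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 9};
        System.out.println(countWithArray(docs));
        System.out.println(countWithField(docs));
    }
}

// Hypothetical stand-in for Lucene's HitCollector (assumption, not the real class).
abstract class SimpleCollector {
    abstract void collect(int doc, float score);
}

class CountingCollector extends SimpleCollector {
    private int hits = 0;

    @Override
    void collect(int doc, float score) {
        hits++; // instance fields can be mutated freely
    }

    int getHits() {
        return hits;
    }
}
```

The named subclass is probably the cleaner of the two when only the hit count is needed, since no scores or document ids have to be stored at all.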

Re: warming up searchers

2008-09-15 Thread Otis Gospodnetic
I don't think the "exists vs. doesn't exist" distinction matters (but I should really try it and see) as much as using Sort vs. not using it, if you use sorting, because sorting requires FieldCache loading. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > Fr

Re: TopDocs question

2008-09-15 Thread Daniel Noll
Cam Bazz wrote: Hello, Could it harm if I make a searcher.search(query, Integer.MAX_VALUE) ? I just need to make a query to get the number of hits in this case, but I don't know what the max hits will be. PriorityQueue will attempt to allocate an array of that size. But if you only need to k

Re: IndexSearcher.search

2008-09-15 Thread Daniel Noll
Otis Gospodnetic wrote: Hi, Check the Hits javadoc: * @deprecated Hits will be removed in Lucene 3.0. * Instead e.g. {@link TopDocCollector} and {@link TopDocs} can be used: * * TopDocCollector collector = new TopDocCollector(hitsPerPage); * searcher.search(qu

TopDocs question

2008-09-15 Thread Cam Bazz
Hello, Could it harm if I make a searcher.search(query, Integer.MAX_VALUE) ? I just need to make a query to get the number of hits in this case, but I don't know what the max hits will be. Also, when I read topdocs.totalHits, is that the same as topdocs.scoreDocs.length? Best. -C.A.

warming up searchers

2008-09-15 Thread Cam Bazz
Hello, What kind of query is best to warm up a searcher? How many searches should I do? Are we supposed to search for things we know exist, or is it better to make queries for things we know don't exist? Best. -C.B.

Re: more on isDeleted

2008-09-15 Thread Michael McCandless
Sorry, I was talking about the future (when we can get realtime search working with Lucene). You have to change your code below to open a new reader (or reopen the reader from your IndexSearcher) and call isDeleted on the new reader, to see the deletion you did with the writer. Or, you have

Re: patching lucene-1314

2008-09-15 Thread Jason Rutherglen
I am updating it to work with trunk. On Mon, Sep 15, 2008 at 2:11 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Yes, probably out of sync with the 2.3.2 code. Have you tried applying it to > the trunk? > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Or

Re: instantiated index in 2.4

2008-09-15 Thread Cam Bazz
Hello Karl; This is good good good news. It works. However, I added a document like doc.add(new Field("f", "a", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)); and then searched. The score is 0.3~ for the found document. should not it be 1.0? also it will find when searched for "f","b" o

Re: more on isDeleted

2008-09-15 Thread Cam Bazz
Well, Document da = new Document(); da.add(new Field("word", "a", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)); writer.addDocument(da); writer.commit(); searcher = new IndexSearcher(dir); IndexReader reader = searcher.getIndexReader();

Re: more on isDeleted

2008-09-15 Thread Michael McCandless
It will return true if the provided docID was deleted, by term or query or docID (due to exception, privately) prior to when you asked IndexWriter to give you a "realtime" IndexReader. Mike Cam Bazz wrote: ok. but then under what circumstances isDeleted() will return true? Best. On Mon

Re: 2.4 questions

2008-09-15 Thread Michael McCandless
We are in the [slowish] process of releasing 2.4 now -- we are down to 3 2.4 issues: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310110&fixfor=12312681 Once these are resolved then we'll work

Re: instantiated index in 2.4

2008-09-15 Thread Karl Wettin
On 15 Sep 2008 at 18:51, Karl Wettin wrote: Are the adds reflected directly to the index? Yes. An InstantiatedIndexReader is always current. You will probably still have to reconstruct your searcher. I never really looked into what happens if you don't. The second statement was wrong. There

Re: more on isDeleted

2008-09-15 Thread Cam Bazz
OK, but then under what circumstances will isDeleted() return true? Best. On Mon, Sep 15, 2008 at 10:57 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Until we can get realtime search integrated into Lucene (which I'm gradually > trying to work on) I think the answer is no -- for now yo

Re: 2.4 questions

2008-09-15 Thread Cam Bazz
Out of curiosity, and somewhat unrelated to this thread: when can we expect to see 2.4? It seems much has changed, so people would want to port their code? Best. On Mon, Sep 15, 2008 at 10:56 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Cam Bazz wrote: > >> well, I did not understan

Re: IndexWriter commit

2008-09-15 Thread Cam Bazz
Hello Dipen, I think what he meant is that if power is off the last transaction is trashed, but your index is not. Best. On Mon, Sep 15, 2008 at 10:55 PM, Dipen <[EMAIL PROTECTED]> wrote: > hi michael, > this is rather hard for me to understand, if a system loses power > (electricity), how can

Re: IndexWriter commit

2008-09-15 Thread Michael McCandless
It's only if power is lost *after* the call to IndexWriter.commit() has successfully returned, that the guarantee holds. commit() does not return until all newly written and referenced files in the index have been successfully fsync'd (and the OS does not return from fsync until all bytes
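
To illustrate the "does not return until fsync'd" idea in isolation, here is a small plain-Java sketch. It is not Lucene's implementation; it only shows `FileChannel.force(true)`, the JDK call that asks the OS to flush a file's content and metadata to the storage device before returning. The class name, the `writeDurably` and `roundTrip` helpers, and the sample string are all made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DurableWriteDemo {
    // Writes the content and returns only after the OS reports it synced.
    static void writeDurably(Path file, String content) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // Blocks until content and metadata have been forced to storage.
            channel.force(true);
        }
    }

    // Hypothetical helper: writes to a temp file durably and reads it back.
    static String roundTrip(String content) {
        try {
            Path file = Files.createTempFile("durable-demo", ".txt");
            try {
                writeDurably(file, content);
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hypothetical commit marker"));
    }
}
```

The point matching the message above is the ordering: anything done after `force(true)` returns can rely on the bytes having been handed to the device, while a power loss before that call returns offers no such guarantee.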

Re: more on isDeleted

2008-09-15 Thread Michael McCandless
Until we can get realtime search integrated into Lucene (which I'm gradually trying to work on) I think the answer is no -- for now you have to keep your own record of which docIDs you've deleted. Because IndexWriter allows deletes by query and term (and also by docID, privately, when

Re: 2.4 questions

2008-09-15 Thread Michael McCandless
Cam Bazz wrote: Well, I did not understand this. So there is no way of using the new constructor and specifying autoCommit=false? That's right, until 3.0. I would prefer to have a new API, introduced in 2.4 and kept in 3.0, that has autoCommit=false as its default (without being speci

Re: IndexWriter commit

2008-09-15 Thread Dipen
Hi Michael, this is rather hard for me to understand. If a system loses power (electricity), how can it be ensured that the fsync() call will happen at all? This commit function relies on fsync(), but what if the OS doesn't have time or power in this case to actually call fsync() and synchronize? I read ab

Re: 2.4 questions

2008-09-15 Thread Cam Bazz
Well, I did not understand this. So there is no way of using the new constructor and specifying autoCommit=false? Best On Mon, Sep 15, 2008 at 10:30 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Cam Bazz wrote: > >> However the documentation states that autoCommit=true. > > For now,

Re: more on isDeleted

2008-09-15 Thread Cam Bazz
So, apart from the searcher, is there any way to access the deletion marks in an IndexWriter? I have a live cache, and I was keeping two caches: one for new adds, the other for deletes. I am trying to get rid of the deleted cache, and ask the index if a fetched document is marked deleted. Best. -C.B.

Re: 2.4 questions

2008-09-15 Thread Michael McCandless
Cam Bazz wrote: However the documentation states that autoCommit=true. For now, keep using the deprecated API and specify autoCommit=false. Then in 3.0, when IndexWriter switches to autoCommit=false, remove the boolean autoCommit from your constructor. How do we disable this? In 2.3 I

Re: more on isDeleted

2008-09-15 Thread Michael McCandless
You'll have to open a new IndexReader after the delete is committed. An IndexReader (or IndexSearcher) only searches the point-in-time snapshot of the index as of when it was opened. Mike Cam Bazz wrote: Hello, Here is what I am trying to do: dir = FSDirectory.getDirectory("/test

Re: 2.4 questions

2008-09-15 Thread Cam Bazz
great! well I never use autoCommit=true. However the documentation states that autoCommit=true. How do we disable this? In 2.3 I used to do a: writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); would that totally disable autoCommit, or will it autoCommit when the ram usage reaches a ce

Re: 2.4 questions

2008-09-15 Thread Michael McCandless
Cam Bazz wrote: Hello, I see that IndexWriter.flush() is deprecated in 2.4. What do we use? Looks like you already found it, but the javadoc says this: * @deprecated please call {@link #commit()}) instead Also I used to make a: try { nodeWriter = new I

more on isDeleted

2008-09-15 Thread Cam Bazz
Hello, Here is what I am trying to do: dir = FSDirectory.getDirectory("/test"); writer = new IndexWriter(dir, analyzer, true, new IndexWriter.MaxFieldLength(2)); writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH); Document da = new Document(); da.ad

Re: IndexWriter commit

2008-09-15 Thread Cam Bazz
Hello, Thanks a bunch Michael. I have wanted to upgrade to 2.4 for a long time. It seems major changes have been made. Best. On Mon, Sep 15, 2008 at 9:49 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Oh and I just committed a fix to IndexWriter's javadocs -- commit(long) is a > private met

Re: IndexWriter commit

2008-09-15 Thread Michael McCandless
Oh and I just committed a fix to IndexWriter's javadocs -- commit(long) is a private method that should never have been in the javadocs. Thanks for raising this! Mike Cam Bazz wrote: Hello, What is the difference between flush in <2.4 and commit? Also I have been looking over docs, an

Re: IndexWriter commit

2008-09-15 Thread Michael McCandless
There is no difference, unless your computer/OS crashes or loses power shortly after you call the method. In that case, there's a big difference: commit() guarantees your index will be intact (assuming the storage system holding your index was not damaged) but with flush(), which does

IndexReader.isDeleted

2008-09-15 Thread Cam Bazz
Hello, I would like to take advantage of isDeleted. If I delete a document from the index, and do not commit, and the index searcher is not reinstantiated, how can I check if a document is marked for deletion? I tried it both with commit() and without committing; the isDeleted(mydeleteddocid) returns always f

Re: patching lucene-1314

2008-09-15 Thread Otis Gospodnetic
Yes, probably out of sync with the 2.3.2 code. Have you tried applying it to the trunk? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Cam Bazz <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Monday, September 15, 2008 11:14

Re: IndexSearcher.search

2008-09-15 Thread Otis Gospodnetic
Hi, Check the Hits javadoc: * @deprecated Hits will be removed in Lucene 3.0. * Instead e.g. {@link TopDocCollector} and {@link TopDocs} can be used: * * TopDocCollector collector = new TopDocCollector(hitsPerPage); * searcher.search(query, collector); * Scor

IndexSearcher.search

2008-09-15 Thread Cam Bazz
Hello, What is the new preferred way of running a query? I understand Hits will be deprecated. So how do we do it the new way? With a HitCollector? Best.

IndexWriter commit

2008-09-15 Thread Cam Bazz
Hello, What is the difference between flush in <2.4 and commit? Also I have been looking over the docs, and they mention commit(long), but there is no commit(long) method, only commit(). Best.

2.4 questions

2008-09-15 Thread Cam Bazz
Hello, I see that IndexWriter.flush() is deprecated in 2.4. What do we use? Also I used to make a: try { nodeWriter = new IndexWriter(nodeDir, true, analyzer, false); } catch(FileNotFoundException e) { nodeWriter = new IndexWriter(nodeDir, true, analyzer,

Re: instantiated index in 2.4

2008-09-15 Thread Karl Wettin
On 15 Sep 2008 at 18:45, Cam Bazz wrote: I have been looking at instantiated index in the trunk. Does this come with a searcher? Pass an InstantiatedIndexReader to the constructor of an IndexSearcher. Are the adds reflected directly to the index? Yes. An InstantiatedIndexReader is always cur

instantiated index in 2.4

2008-09-15 Thread Cam Bazz
Hello, I have been looking at instantiated index in the trunk. Does this come with a searcher? Are the adds reflected directly to the index? Or is it just an experimental thing only with reader and writer? Best.

RE: Sorting in lucene through Document boosting

2008-09-15 Thread Dragan Jotanovic
Hm, probably that is not needed. I thought that tf would influence the score if I don't set it to constant value, but it seems that it is sufficient to override just lengthNorm. -Original Message- From: Karl Wettin [mailto:[EMAIL PROTECTED] Sent: Monday, September 15, 2008 4:56 PM To:

Re: Sorting in lucene through Document boosting

2008-09-15 Thread Karl Wettin
On 15 Sep 2008 at 14:08, Dragan Jotanovic wrote: I made simple Similarity implementation: public float tf(float arg0) { return 1f; } Why do you touch the term frequency? Is that perhaps unrelated to what's discussed in this thread? karl

Re: search with Filter

2008-09-15 Thread Erick Erickson
Filters aren't really specified per field. All they are is a bitmask, one bit per *document*. You can construct the filter any way you want, in your case by inspecting the date-time field and passing it along with your query. You can even combine several fields into one filter by twiddling the bits
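
To make the "one bit per document" idea concrete, here is a sketch using `java.util.BitSet`. Everything in it is an assumption for illustration: the per-document date and user arrays are made-up sample data standing in for values that would be read from the index, and `buildFilterBits` is a hypothetical helper, not a Lucene method.

```java
import java.util.BitSet;

class FilterBitsDemo {
    // Hypothetical helper: bit i ends up set only if document i passes
    // both the date-range check and the user check.
    static BitSet buildFilterBits(long[] docDates, String[] docUsers,
                                  long from, long to, String user) {
        BitSet dateBits = new BitSet(docDates.length);
        BitSet userBits = new BitSet(docUsers.length);
        for (int doc = 0; doc < docDates.length; doc++) {
            if (docDates[doc] >= from && docDates[doc] <= to) {
                dateBits.set(doc);
            }
            if (user.equals(docUsers[doc])) {
                userBits.set(doc);
            }
        }
        // Combine the two per-field masks into a single per-document mask.
        dateBits.and(userBits);
        return dateBits;
    }

    public static void main(String[] args) {
        // Made-up sample data: one entry per document id 0..3.
        long[] dates = {20080901L, 20080910L, 20080915L, 20080920L};
        String[] users = {"alice", "bob", "alice", "alice"};
        BitSet bits = buildFilterBits(dates, users, 20080905L, 20080916L, "alice");
        System.out.println(bits);
    }
}
```

Because the result is just a set of document ids, the same mask can be reused across many queries for as long as the underlying reader stays the same.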

search with Filter

2008-09-15 Thread Dino Korah
Hi All, I am trying to utilize Filter to see if I can get a bit more performance out of my application that searches over a 100-million-document Lucene index. On all my documents I have two fields over which I will have to scope my searches. One is a date-time field (MMDDHHMMSS) and a user-i

Re: patching lucene-1314

2008-09-15 Thread Cam Bazz
Well Hello, I made the patch inside trunk/src but I am getting failed errors. does this mean the lucene-1314 is buggy, or maybe I applied it to the wrong version? Best. joker src # pwd /root/lucene/lucene-2.3.2/src joker src # patch -p0 < ../../lucene-1314.patch patching file java/org/apache/l

Re: About The Lucene Query Syntax

2008-09-15 Thread M. Fatih Soydan
We have been using Abbyy (FineReader) Index&Search Libraries and Morphology SDK since 1999. Our search strings look like this: ** *borusan* | Soruşan* | bbrusan* | "borusan istanbul filarmo*" | "gürer aykal*" | "borusan oda orkestras*" | "borusan sanat gale*" | "zehra * *nurhan kocabıyık ilköretim*"* **

Re: About The Lucene Query Syntax

2008-09-15 Thread Erick Erickson
The unsatisfactory answer is "because that's the way it works". I suspect that the underlying issue is what happens when you try to expand phrase searches via wildcards. Wildcard searches are already plagued by "TooManyClauses" exceptions, which would only get worse with phrases. In fact, downright

Re: About The Lucene Query Syntax

2008-09-15 Thread M. Fatih Soydan
I read it, but I didn't understand why not. On Monday, 15 September 2008 at 16:56, Erick Erickson <[EMAIL PROTECTED]> wrote: > wildcards are NOT supported within double quotes, so if > you are submitting your query > "Technology Gunlugu*" > WITH the double quotes, you are searching for > that liter

Re: About The Lucene Query Syntax

2008-09-15 Thread Erick Erickson
wildcards are NOT supported within double quotes, so if you are submitting your query "Technology Gunlugu*" WITH the double quotes, you are searching for that literal phrase. Best Erick P.S. See: http://lucene.apache.org/java/docs/queryparsersyntax.html the first line under "wildcard searches"