Re: Stemming and exact phrases

2005-10-10 Thread Erik Hatcher
On Oct 10, 2005, at 1:44 AM, Anand Kishore wrote: Does stemming result in failure of exact phrase matches??? It shouldn't. Please provide a simple scenario where you're seeing such a failure. Stemming will allow you to find more than the exact phrase, but it should always match an exact

Lucene and remote index and java applet, with no java app server

2005-10-10 Thread J. David Boyd
Here's my dilemma. For years, we have supplied paper documentation to our customers. Many pages of paper. All together, it makes a 3 foot stack when printed. Also for many years, customers have been asking for docs in electronic format, so, recently, I wrote some Perl scripts that convert our m

Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread Dan Armbrust
J. David Boyd wrote: Here's my dilemma. For years, we have supplied paper documentation to our customers. Many pages of paper. All together, it makes a 3 foot stack when printed. Also for many years, customers have been asking for docs in electronic format, so, recently, I wrote some Perl scr

Lucene search is very slow

2005-10-10 Thread Harini Raghavan
Hi, I am using Lucene for search functionality in my J2EE application using JBoss as app server. The Lucene index directory size is almost 10G. The performance has been quite good until now. But after the last deploy, when the server was restarted, the Lucene search has become very slow. It t

Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread J. David Boyd
Dan Armbrust wrote: > J. David Boyd wrote: > >> Here's my dilemma. >> >> For years, we have supplied paper documentation to our customers. Many >> pages of paper. All together, it makes a 3 foot stack when printed. >> >> Also for many years, customers have been asking for docs in electronic >> f

RE: Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread Peter Kim
I'm not sure about Perl or PHP--perhaps there are some ports of Lucene that'll let you do that. But the most straightforward way is to just write a simple Java web application with a servlet that uses an IndexSearcher to execute a form-entered query and have it return results. It seems like you m

Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread Dan Armbrust
I see your words, but I hate to admit that I don't understand them in totality! When you say that the search is executed on the web server, that means that we would need to code it in Perl or some such, no? I don't see (except for a Perl or PHP script) how the search could execute on the website

Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread Sameer Shisodia
or php. here's help http://www.devx.com/Java/Article/20509/1954?pf=true rgds, sameer On 10/10/05, Dan Armbrust <[EMAIL PROTECTED]> wrote: > > > serving java/jsp applications) would be to write the necessary code to > make perl talk to java - We have done this before (for a different > purpos

Re: Custom sort with multiple fields?

2005-10-10 Thread Yonik Seeley
You can use the FieldCache to access the values of multiple fields (the same source default sorting uses). Alternately, if you want to generate a score based on a function of multiple fields rather than doing an absolute sort, you can use FunctionQuery: http://issues.apache.org/jira/browse/LUCENE-

Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread J. David Boyd
Peter Kim wrote: > I'm not sure about Perl or PHP--perhaps there are some ports of Lucene > that'll let you do that. But the most straightforward way is to just > write a simple Java web application with a servlet that uses an > IndexSearcher to execute a form-entered query and have it return > res

ParallelReader

2005-10-10 Thread John Smith
Hi I am using the ParallelReader feature from Lucene 1.9. I have 2 indexes, one that doesn’t change and the other that changes often. I delete and re-index documents from the dynamic index often. I am indexing the documents with a keyword field “id” and giving it a unique number. Th

Re: ParallelReader

2005-10-10 Thread Daniel Naber
On Monday 10 October 2005 20:24, John Smith wrote: > My understanding is ParallelReader works for situations where you have a > static index and a dynamic index. That's not correct. Quoting the documentation: It is up to you to make sure all indexes are created and modified the same way. For exam

Re: ParallelReader

2005-10-10 Thread John Smith
A while ago I had asked a question on what would be a good solution for a situation mentioned below and I was pointed in the direction of Parallel Reader. Looks like that will not work. Thank you for alerting me on this. So other than delete and reindex the document to a single index, there is

RE: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread Jon Schuster
The suggestion that others have made to make the search web based is generally the preferred route. But it is fairly straightforward to make an unsigned applet use a remote Lucene index. You wouldn't need to write the index and PDF files to the local disk; you only need to be able to open an input

Re: ParallelReader

2005-10-10 Thread John Smith
Sorry to bug people on this again and again. I might be missing something or be totally confused, but what is the use case for a ParallelReader if it is not the situation where we have an index changing frequently (meaning deletes and reindexing) and an index not changing, but has s

Re: ParallelReader

2005-10-10 Thread Erik Hatcher
The use case is when there is some data that changes frequently, but some data is static, _and_ that the volatile index can be rebuilt in the same order that the static one was built. The indexes must be "parallel" in terms of the document index order. If you delete, then you should delet

Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread J. David Boyd
Jon Schuster wrote: > The suggestion that others have made to make the search web based is > generally the preferred route. > > But it is fairly straightforward to make an unsigned applet use a remote > Lucene index. You wouldn't need to write the index and PDF files to the > local disk; you only

query across fields?

2005-10-10 Thread Marc Hadfield
hello - i am looking to perform queries efficiently across multiple fields that have their token order synchronized, ie: Field_A[100] has some relationship to Field_B[100] for example, consider two fields, one the full text of an article and the other the "type" of the token where type could

Re: query across fields?

2005-10-10 Thread Doug Cutting
Marc Hadfield wrote: I would prefer not to mix the full text and "types" in the same field as it would make the term positions inconsistent which i depend on for other queries. Why not store them in the same field using positionIncrement=0 for the types? Then they won't change positions of n

RE: Is Lucene right for me?

2005-10-10 Thread Sharma, Siddharth
Hoss Thanks for the reply. The posting was an excellent write-up and helped me visualize my problem domain and solution better. I like the idea about storing filter information in the contract index indexed by company. It might work in my case. I am not sure if I understand the BitSet solution t

Re: query across fields?

2005-10-10 Thread Marc Hadfield
Doug Cutting wrote: Why not store them in the same field using positionIncrement=0 for the types? Then they won't change positions of non-type tokens. You should distinguish the types syntactically, e.g., prefix them with a space or other character that does not occur within words. That way

RE: Is Lucene right for me?

2005-10-10 Thread Chris Hostetter
: I am not sure if I understand the BitSet solution though. Can you give me : implementation specifics around that? : Are you suggesting storing BitSet information in the document of each : cat/subcat and that the boolean value of each bit will correspond to whether : the product is blocked or not

Re: query across fields?

2005-10-10 Thread Doug Cutting
Marc Hadfield wrote: I actually mention your option in my email: In principle I could store the full text in two fields with the second field containing the types without incrementing the token index. Then, do a SpanQuery for "Johnson" and "name" with a distance of 0. The resulting match w

Hitcollectors and remotesearchables

2005-10-10 Thread Jeff Rodenburg
Doug Cutting once said, back in 2003: "The HitCollector-based search API is not meant to work remotely. To do so would involve an RPC-callback for every non-zero score, which would be extremely expensive. Also, just making HitCollector serializable would not be sufficient. You'd also need to

Re: query across fields?

2005-10-10 Thread Marc Hadfield
Thanks Doug - I'll give Span Query's a try as they can handle the 0 increment issue. My original desire to have more than one field comes from my document representation which includes multiple fields containing (the same) document text using different stemmers, as, depending on the type of que

Re: query across fields?

2005-10-10 Thread Doug Cutting
Marc Hadfield wrote: I'll give Span Query's a try as they can handle the 0 increment issue. Note that PhraseQuery can now handle this too. Doug

RE: Re: Lucene and remote index and java applet, with no java app server

2005-10-10 Thread Jon Schuster
Sorry about that, "download" was a poor word choice. By download, I meant that after the applet opens an input stream to the URL, it will need to read from the stream to get all the index data from the web server to the user's machine so the applet can perform the search. Whether the index files a

RE: Lucene search is very slow

2005-10-10 Thread Koji Sekiguchi
Is it really the Lucene part that is slow? Please take thread dumps every 15 seconds, 3 to 4 times. What do you see in them? Koji > -Original Message- > From: Harini Raghavan [mailto:[EMAIL PROTECTED] > Sent: Tuesday, October 11, 2005 12:38 AM > To: java-user@lucene.apache.org > Subject: Lucene

What is MMapDirectory?

2005-10-10 Thread Koji Sekiguchi
Hello, What is MMapDirectory? I've searched the mailing list archive, but cannot find it. I could find the following explanation in the Lucene 1.9 CHANGES.txt: 8. Add MMapDirectory, which uses nio to mmap input files. This is still somewhat slower than FSDirectory. However it uses less memory

Re: Lucene search is very slow

2005-10-10 Thread Chris Lu
Harini, Did you close the IndexReader every time your search is finished? If so, 10G data will take a long time to warm up the IndexReader. Chris -- Full-Text Search on Any Databases http://www.dbsight.net On 10/10/05, Koji Sekiguchi <[EMAIL PROTECTED]> wrote: > I

Re: Optimization

2005-10-10 Thread Erik Hatcher
Tom, Very cool! Thanks for sharing your technique, which works well for prefixed and suffixed wildcard queries. However, it doesn't address an * in the middle of a term, say W*D. Obviously your usage doesn't require better performance for a wildcard in the middle, so you've done well -