Re: Lucene index in memcache

2007-07-02 Thread Cathy Murphy
Hi Erick & Chris, thanks for your response. I have done some profiling, and it seems the response is slow when there are long queries (more than 5-6 words per query). The way I have implemented it is: I pass in the search query and Lucene returns the total number of hits, along with ids. I then fe

Re: highlighting phrase query

2007-07-02 Thread sandeep chawla
Thanks a lot, Mark. Has anyone used LUCENE-794? How stable is it? Is it widely used in industry? These are some of my questions :) Thanks, Sandeep On 03/07/07, Renaud Waldura <[EMAIL PROTECTED]> wrote: Mark: Thanks a million for this comprehensive analysis. This is going straight to my manage

RE: Pagination

2007-07-02 Thread Lee Li Bin
Hi Mark, How do I display results on the second page? I managed to display one page using your code. Regards, Lee Li Bin -----Original Message----- From: Alixandre Santana [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 03, 2007 12:55 AM To: java-user@lucene.apache.org Subject: Re: Paginat
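Showing a second page usually means passing the page number along with the query (for example as a request parameter in the JSP, which is an assumption here), running the search again, and displaying only that page's slice of the ranked hits. A minimal plain-Java sketch of the slice arithmetic, assuming 1-based page numbers; `PageSlicer` is a hypothetical helper, not part of Lucene, and in practice the list would hold the hits' document numbers (`ScoreDoc.doc`) in rank order.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Lucene): works out which slice of a ranked
 * hit list belongs on a given 1-based page.
 */
final class PageSlicer {
    private PageSlicer() {}

    /** 0-based index of the first hit on the page (inclusive). */
    static int start(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    /** Index just past the last hit on the page, clamped to the total hit count. */
    static int end(int page, int pageSize, int totalHits) {
        return Math.min(start(page, pageSize) + pageSize, totalHits);
    }

    /** Hits shown on the requested page; empty if the page is past the last hit. */
    static <T> List<T> page(List<T> rankedHits, int page, int pageSize) {
        int from = start(page, pageSize);
        if (from >= rankedHits.size()) {
            return List.of();
        }
        return rankedHits.subList(from, end(page, pageSize, rankedHits.size()));
    }

    public static void main(String[] args) {
        // Document numbers in rank order, standing in for ScoreDoc.doc values.
        List<Integer> docIds = List.of(42, 7, 19, 3, 88, 15, 60);
        System.out.println(page(docIds, 2, 3)); // [3, 88, 15]
        System.out.println(page(docIds, 3, 3)); // [60]
    }
}
```

For the last, partial page the end index is clamped to the number of hits, and a page past the end yields an empty list rather than an exception.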

RE: Pagination

2007-07-02 Thread Lee Li Bin
Hi, Thanks Mark! I have the same question as Alixandre. How do I get the content of the document instead of the document id? Thanks. Regards, Lee Li Bin -----Original Message----- From: Alixandre Santana [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 03, 2007 12:55 AM To: java-user@lucene.a

multi-term query weighting

2007-07-02 Thread Tim Sturge
I have an index with two different sources of information: one small but of high quality (call it "title"), and one large but of lower quality (call it "body"). I give boosts to certain documents based on their popularity (this is very similar to what one would do when indexing the web). The pr

RE: highlighting phrase query

2007-07-02 Thread Renaud Waldura
Mark: Thanks a million for this comprehensive analysis. This is going straight to my manager. :) --Renaud -----Original Message----- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, July 02, 2007 2:11 PM To: java-user@lucene.apache.org Subject: Re: highlighting phrase query There ha

Reusing Document Objects (was Auto Slop)

2007-07-02 Thread Walt Stoneburner
If I create a Document object, can I pass it to multiple index writers without harm? Or does the process of being handed to an IndexWriter somehow mutate the state of the Document object, say during tokenizing, in a way that would cause its reuse with a totally separate index to cause problems ...such

Re: Lucene index in memcache

2007-07-02 Thread Chris Hostetter
: Is there a way to store a Lucene index in memcache? During high traffic, search becomes very slow. :( http://people.apache.org/~hossman/#xyproblem Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" w

Re: highlighting phrase query

2007-07-02 Thread Mark Miller
There has been a lot of Highlighter discussion lately, but just to try and sum up the state of Highlighting in the Lucene world: There are four Highlighter implementations that I know of. From what I can tell, only the original Contrib Highlighter has received sustained active development by m

Re: Lucene index in memcache

2007-07-02 Thread Erick Erickson
You can always read the current index into a RAMDirectory, but I really wonder if that will make much of a difference, as your operating system should already be caching the index files for you. How big is your index? What kind of performance are you seeing? What else is running on that box? I'd do some pr

Lucene index in memcache

2007-07-02 Thread Cathy Murphy
Is there a way to store a Lucene index in memcache? During high traffic, search becomes very slow. :( -- Cathy www.nachofoto.com

Modify search results

2007-07-02 Thread Robert Mullin
I have managed to download and install Lucene. In addition, I have reached the point at which I am able to generate an index and run a search. The search returns a 'raw' list of the HTML pages in which my search term occurs. . . . chapter17, chapter18, etc. Question: how do I go about manipulat

Re: Pagination

2007-07-02 Thread Alixandre Santana
Mark, The ScoreDoc[] contains only the ID of each Lucene document. What would be the best way of getting the entire (Lucene) document? Should I do a new search with the ID retrieved by hpc.getScores(), i.e. searcher.doc(idDoc)? Thanks. Alixandre On 7/2/07, mark harwood <[EMAIL PROTECTED]> wrot

highlighting phrase query

2007-07-02 Thread sandeep chawla
Hi all, I am developing a search tool using Lucene 2.1. I have a requirement to highlight query words in the results. The Lucene 2.1 highlighter doesn't work well for highlighting phrase queries. For example, if I have a query string "lucene Java", it highlights not only occurrence

RE: Auto Slop

2007-07-02 Thread Ard Schrijvers
> I just ran into an interesting problem today, and wanted to know if it > was my understanding or Lucene that was out of whack -- right now I'm > leaning toward a fault between the chair and the keyboard. > > I attempted to do a simple phrase query using the StandardAnalyzer: > "United States"

Re: Auto Slop

2007-07-02 Thread Mark Miller
Examine your indexes and analyzers. The default slop is 0, which means no terms are allowed between the terms in the phrase; that would be an exact match. A slop of 1 is not the default; it would allow a term to move one position and still match the phrase. - Mark Walt Stoneburner wrote: I just ran in
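To make the slop semantics concrete, here is a simplified model for a two-term phrase such as "united states", assuming the tokens have already been analyzed (for example, lowercased). It is not Lucene's scorer; `TwoTermSlop` is a hypothetical name. The second term is expected exactly one position after the first, and the slop is how far it may be off from that. In this model, swapping the two terms costs 2, which agrees with the `PhraseQuery` documentation's note that reordering two words takes a slop of at least two.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (not Lucene's implementation) of slop for a two-term phrase:
 * a match needs positions p1, p2 with |(p2 - p1) - 1| <= slop.
 */
final class TwoTermSlop {
    private TwoTermSlop() {}

    static boolean matches(List<String> tokens, String first, String second, int slop) {
        for (int p1 : positionsOf(tokens, first)) {
            for (int p2 : positionsOf(tokens, second)) {
                // Exact phrase means p2 == p1 + 1; any difference counts as movement.
                if (Math.abs((p2 - p1) - 1) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    private static List<Integer> positionsOf(List<String> tokens, String term) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("united", "the", "states");
        System.out.println(matches(tokens, "united", "states", 0)); // false
        System.out.println(matches(tokens, "united", "states", 1)); // true
    }
}
```

With slop 0, "united states" matches only when the two terms are adjacent and in order, which is the exact-match behavior described above.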

Auto Slop

2007-07-02 Thread Walt Stoneburner
I just ran into an interesting problem today, and wanted to know if it was my understanding or Lucene that was out of whack -- right now I'm leaning toward a fault between the chair and the keyboard. I attempted to do a simple phrase query using the StandardAnalyzer: "United States" Against my c

Re: Exchange/PST/Mail parsing

2007-07-02 Thread Christiaan Fluit
Hello Grant (cc-ing aperture-devel), I am one of the Aperture admins, so I can tell you a bit more about Aperture's mail facilities. Short intro: Aperture is a framework for crawling, and for full-text and metadata extraction from, a growing number of sources and file formats. We try to select the best

Re: Genealogy, nicknames, Levenshtein, soundex/metaphone, etc

2007-07-02 Thread Grant Ingersoll
On Jul 2, 2007, at 8:07 AM, Darren Hartford wrote: Thank you for the link to the previous thread, a lot of information there! *Synonym use of nicknames - that sounds quite feasible. Do you specifically mean the WordNet module in the Sandbox, or something different? No, I think I was thinkin
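Treating nicknames as synonyms can be sketched as query-time expansion: look up a name in a table of equivalent names and search for all of them, for example as `SHOULD` clauses of a `BooleanQuery`. The class and table below are hypothetical illustrations; the entries are a few common English nicknames, not a curated genealogy list, and a name listed in more than one group would only match the first.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical nickname expansion: every name in a group is treated as a
 * synonym of every other name in that group.
 */
final class NicknameExpander {
    private NicknameExpander() {}

    // Illustrative groups only; a real application would load a much larger table.
    private static final List<Set<String>> GROUPS = List.of(
            Set.of("robert", "bob", "bobby"),
            Set.of("william", "bill", "billy"),
            Set.of("margaret", "peggy", "maggie"));

    /** All names equivalent to the given one (including itself), sorted. */
    static Set<String> expand(String name) {
        String key = name.toLowerCase(Locale.ROOT);
        for (Set<String> group : GROUPS) {
            if (group.contains(key)) {
                return new TreeSet<>(group);
            }
        }
        return new TreeSet<>(Set.of(key));
    }

    public static void main(String[] args) {
        System.out.println(expand("Bill"));   // [bill, billy, william]
        System.out.println(expand("Darren")); // [darren]
    }
}
```

Because each group is symmetric, a search for "William" also finds records indexed as "Bill", which a one-directional nickname-to-name map would miss.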

RE: Genealogy, nicknames, Levenshtein, soundex/metaphone, etc

2007-07-02 Thread Darren Hartford
Thank you for the link to the previous thread, a lot of information there! *Synonym use of nicknames - that sounds quite feasible. Do you specifically mean the WordNet module in the Sandbox, or something different? > -----Original Message----- > From: Grant Ingersoll [mailto:[EMAIL PROTECTED] >

Re: Exchange/PST/Mail parsing

2007-07-02 Thread Nick Burch
On Sun, 1 Jul 2007, Grant Ingersoll wrote: Anyone have any recommendations on a decent, open (doesn't have to be Apache license, but would prefer non-GPL if possible), extractor for MS Exchange and/or PST files? There has been an offer to contribute a PST parser to Apache POI. We're hoping th

Re: Pagination

2007-07-02 Thread mark harwood
The Hits class is OK but can be inefficient because it re-runs the query unnecessarily. The class below illustrates how to efficiently retrieve a particular page of results, and it lends itself to webapps where you don't want to retain server-side state (i.e. a Hits object) for each client. It would
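The stateless paging Mark describes needs only as many top hits as it takes to reach the requested page, so nothing has to be kept between requests. One way to do that is a bounded min-heap, sketched below in plain Java; `TopPageCollector` and `Hit` are hypothetical names, not Mark's class or a Lucene API. In Lucene 2.x the `collect(int doc, float score)` calls would typically come from a `HitCollector` passed to `Searcher.search(Query, HitCollector)`, and each returned `doc` number can then be loaded with `Searcher.doc(int)` to get the stored fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Stateless paging sketch: scan every matching document once, keep only the
 * best (page * pageSize) hits in a bounded min-heap, then return the slice
 * for the requested page. Hypothetical names, not a Lucene API.
 */
final class TopPageCollector {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Higher score first; equal scores fall back to the lower doc number.
    private static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt(h -> h.doc);

    private final int page;     // 1-based page number
    private final int pageSize;
    private final int capacity; // hits needed to cover pages 1..page
    // Head of this heap is the weakest hit kept so far.
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());

    TopPageCollector(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        this.page = page;
        this.pageSize = pageSize;
        this.capacity = page * pageSize;
    }

    /** Call once per matching document, in any order. */
    void collect(int doc, float score) {
        Hit hit = new Hit(doc, score);
        if (heap.size() < capacity) {
            heap.add(hit);
        } else if (BEST_FIRST.compare(hit, heap.peek()) < 0) {
            heap.poll(); // evict the weakest hit to make room
            heap.add(hit);
        }
    }

    /** Hits for the requested page, best first; empty if the page is past the last hit. */
    List<Hit> pageHits() {
        List<Hit> best = new ArrayList<>(heap);
        best.sort(BEST_FIRST);
        int from = (page - 1) * pageSize;
        if (from >= best.size()) {
            return Collections.emptyList();
        }
        return best.subList(from, Math.min(from + pageSize, best.size()));
    }

    public static void main(String[] args) {
        TopPageCollector collector = new TopPageCollector(2, 2);
        collector.collect(1, 0.5f);
        collector.collect(2, 0.9f);
        collector.collect(3, 0.1f);
        collector.collect(4, 0.7f);
        collector.collect(5, 0.3f);
        for (Hit h : collector.pageHits()) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Run as shown, page 2 contains documents 1 and 5, the third- and fourth-best hits. Memory is bounded by page × pageSize rather than by the total number of matches, so early pages stay cheap, while deep pages cost proportionally more.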

RE: Pagination

2007-07-02 Thread Lee Li Bin
Hi, I still have no idea how to get it done. Can you give me some details? The web application is in JSP, by the way. Thanks a lot. Regards, Lee Li Bin -----Original Message----- From: Chris Lu [mailto:[EMAIL PROTECTED] Sent: Saturday, June 30, 2007 2:21 AM To: java-user@lucene.apache.org Subject: Re

Re: Exchange/PST/Mail parsing

2007-07-02 Thread jm
We had to develop VB code to convert PST files to EML files. I am using mbox, which works fine for me. I am also using Aperture, but only for extracting text from non-mail files (Office documents, etc.), and that works fine too. On 7/2/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: Anyone have any recommendations on