Thanks for the input. I am looking at the suggested links now. If I make
any progress I will return to see if any of my work would be appropriate
to contribute back.
Sean
Paul Elschot wrote:
On Tuesday 06 September 2005 08:52, markharw00d wrote:
>>I believe I have heard that Span queries
Hello (redirecting to java-user@),
If you want more control over scoring and over handling hits,
use a HitCollector. Then you can break out when you have accumulated enough
results. Note that the scores passed to a HitCollector are not normalized,
unlike the ones coming from IndexSearcher's search(...) methods.
: Hi,
: I find that unit tests that modify an existing record in the Lucene
: index by removing it, modifying it, and re-adding it fail if I switch
: from an FSDirectory to a RAMDirectory.
Could you please post a full and complete unit test that demonstrates the
problem? Based on your descript
: I don't know if the developers of Lucene would agree, but from what
: I've been browsing in the ML archives, these multiple-language issues
: seem to arise quite often on the mailing list, and maybe some articles
: like "best practices", "do's and don'ts" or "Lucene Architecture in
: multiple
Hi Everyone,
I have a special scenario where I frequently want to insert duplicate
documents into the index. For example, I know that I want 400 copies of the
same document. (I use the docboost for something else, so I can't just add one
document and set the docboost to 400.)
I would like to hac
Legolas Woodland wrote:
Hi
Thank you for reading my post.
How can I have more than one spell-check suggestion?
For example, if someone enters puore,
it should return:
pore
pour
pure
poor
poer
pire
or something similar.
I really need to implement this feature.
Thank you
Have a look here:
http://toda
See the contrib/spellchecker area of Lucene's Subversion repository.
Erik
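On the "more than one suggestion" part: the contrib SpellChecker exposes suggestSimilar(String word, int numSug), which returns up to numSug candidates. The ranking idea behind it — dictionary words ordered by edit distance to the misspelling — can be sketched in plain Java. ToySuggester, the linear scan, and the tiny dictionary below are illustrative, not the contrib API (which retrieves candidates through an n-gram index first).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy sketch of spell-suggestion ranking: order dictionary words by
// Levenshtein distance to the misspelled input and keep the top N.
class ToySuggester {
    // Classic dynamic-programming edit distance.
    static int levenshtein(String s, String t) {
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i;
        for (int j = 0; j <= t.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[s.length()][t.length()];
    }

    // Return the numSug dictionary words closest to the input.
    static List<String> suggest(String word, List<String> dict, int numSug) {
        List<String> sorted = new ArrayList<String>(dict);
        final String w = word;
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return levenshtein(w, a) - levenshtein(w, b);
            }
        });
        return sorted.subList(0, Math.min(numSug, sorted.size()));
    }
}
```

With the dictionary from the question, suggest("puore", dict, 6) ranks close candidates such as "pore" and "pure" (each one edit away) ahead of the rest.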
On Sep 6, 2005, at 10:09 AM, Legolas Woodland wrote:
Have a look at your analyzer (check out my java.net article for
starters), and the "Analysis Paralysis" section of the Lucene wiki.
You will need to adjust your analyzer (and query parser perhaps) to
tokenize things as you'd like. For a quick fix, try using the
WhitespaceAnalyzer, though
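To make the difference concrete, here is a plain-Java illustration (not Lucene code) of why whitespace-only tokenization preserves tokens that a letters-only tokenizer would mangle. lettersOnly is a stand-in for letter-based tokenizing behavior, not an actual Lucene filter.

```java
import java.util.Arrays;
import java.util.List;

// Contrast whitespace splitting (keeps punctuation inside tokens, the
// way WhitespaceAnalyzer does) with a letters-only view of each token
// (roughly what a letter-based tokenizer leaves of terms like "C++").
class TokenizeDemo {
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    static String lettersOnly(String token) {
        // Stand-in for letter-based tokenization: drop non-letters.
        return token.replaceAll("[^\\p{L}]", "");
    }
}
```

Whichever analyzer you choose, remember to use the same one at query time, or the query terms won't match what was indexed.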
On Sep 6, 2005, at 7:15 AM, Hacking Bear wrote:
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
As far as your usage is concerned, it seems to be the right approach,
and I think the StandardAnalyzer does the job pretty well when it has
to deal with whatever language you want.
Olivier Jaquemet wrote:
> Gusenbauer Stefan wrote:
>
>> I think Nutch uses ngramj for language classification, but I don't know
>> how they store the language information. In our application,
>> for example, I save the language in an extra field in the document,
>> because Lucene is supporti
Surely it's best to have a specific analyzer for each language?
Would support for multiple analyzers with a single index require a
different IndexWriter for each Analyzer/language? Would you then need
to manage the disk access of these regarding locking etc., so two
IndexWriters cannot do so at
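One way to avoid multiple IndexWriters: the IndexWriter versions of this era offer an addDocument(Document, Analyzer) overload (check your version), so a single writer — and a single lock — can index mixed-language documents if you pick the analyzer per document. The lookup side of that can be sketched in plain Java; the registry class and all names below are illustrative, with the generic parameter standing in for org.apache.lucene.analysis.Analyzer.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-language analyzer selection with a fallback, so one
// IndexWriter can serve all languages via its
// addDocument(Document, Analyzer) overload. The generic type A stands
// in for org.apache.lucene.analysis.Analyzer.
class AnalyzerRegistry<A> {
    private final Map<String, A> byLanguage = new HashMap<String, A>();
    private final A fallback;

    AnalyzerRegistry(A fallback) { this.fallback = fallback; }

    void register(String language, A analyzer) {
        byLanguage.put(language, analyzer);
    }

    // Fall back to a default (e.g. StandardAnalyzer) for unknown languages.
    A forLanguage(String language) {
        A a = byLanguage.get(language);
        return a != null ? a : fallback;
    }
}
```

At index time you would look up the analyzer from the document's language field and pass it to addDocument; the same lookup should drive query parsing.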
Hi,
I find that unit tests that modify an existing record in the Lucene
index by removing it, modifying it, and re-adding it fail if I switch
from an FSDirectory to a RAMDirectory.
This code gives me a Directory that works:
FSDirectory fsDirectory =
FSDirectory.getDirectory(physicalDirectoryNam
Gusenbauer Stefan wrote:
I think Nutch uses ngramj for language classification, but I don't know
how they store the language information. In our application,
for example, I save the language in an extra field in the document,
because Lucene supports multiple fields with the same name.
James Adams wrote:
Does anyone know what approach Nutch uses?
-Original Message-
From: Hacking Bear [mailto:[EMAIL PROTECTED]
Sent: 06 September 2005 12:15
To: java-user@lucene.apache.org
Subject: Re: Multiple Language Indexing and Searching
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
>
Hi Mark,
With the change, the problem was completely solved!
Sample: (JapaneseAnalyzer)
Text: AMeetingWillBeHeldInTheCityHall
TokenStream:
[A][Meeting][Will][Be][Held][In][The][City][Hall]
Query Text: Meeting
Output: AMeetingWillBeHeldInTheCityHall
Query Text: CityHall
Output: AMeetingWillBeHe
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
>
> As far as your usage is concerned, it seems to be the right approach,
> and I think the StandardAnalyzer does the job pretty well when it has
> to deal with whatever language you want.
I should look into exactly what it does. Does this
Hi
I would like to be able to index and search on technical terms such as C++ and
C#, but I am finding that both are being reduced to just C. These terms can be
entered from a free-text box on the search interface. Is there a recommended
way of doing this?
Thanks
Daniel
Try changing TokenGroup.isDistinct().
Maybe the offset test should be >= rather than >, i.e.:
boolean isDistinct(Token token)
{
    return token.startOffset() >= endOffset;
}
I've just tried the change with the JUnit test and all
seems well still with the non CJK
Hi Chris,
Thank you for your info.
With CJKAnalyzer, the diagnostics are as follows:

token  posInc  startOfst  endOfst
[Aa]   1       0          2
[aa]   1       1          3
[aB]   1       2          4
[BC]   1       3          5
[Cc]   1       4          6
[cD]   1       5          7
[
I added some code you advised and the result is as follows:

Text: AaaBCcDdEFGgHhIiJKkLMmN

token  posInc  startOfst  endOfst
[Aaa]  1       0          3
[B]    1       3          4
[Cc]   1       4          6
[Dd]   1       6          8
[E]    1       8          9
[F]    1       9
As far as your usage is concerned, it seems to be the right approach,
and I think the StandardAnalyzer does the job pretty well when it has
to deal with whatever language you want.
Note, though, that it won't handle stop words for all languages, only the
English ones, unless specified at index tim
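For reference, StandardAnalyzer accepts a custom stop-word list at construction time, so the English default can be replaced at index time. The filtering step itself is simple; here is a plain-Java sketch (the merged multilingual stop list is illustrative, and removeStops is a stand-in, not Lucene's StopFilter).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of stop-word filtering with a custom, multilingual stop set --
// the kind of list you would hand to StandardAnalyzer's stop-word
// constructor instead of relying on its English-only default.
class StopFilterDemo {
    static List<String> removeStops(List<String> tokens, Set<String> stops) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            // Compare case-insensitively, as analyzers typically
            // lowercase before stop filtering.
            if (!stops.contains(t.toLowerCase())) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

The harder problem is choosing the right stop list per document, which is where the language-identification discussion above comes in.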
On Tuesday 06 September 2005 08:52, markharw00d wrote:
> >>I believe I have heard that Span queries provide some way to access
> document offset information for their hits somehow.
>
> See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
>
> Faithfully selecting extracts based
On Tuesday 06 September 2005 08:21, Sean O'Connor wrote:
> I believe I have heard that Span queries provide some way to access
> document offset information for their hits somehow. Does anyone know if
> this is true, and if so, how I would go about it?
>
> Alternatively (preferably actually) doe