Thanks for the input. I am looking at the suggested links now. If I make
any progress I will return to see if any of my work would be appropriate
to contribute back.
Sean
Paul Elschot wrote:
On Tuesday 06 September 2005 08:52, markharw00d wrote:
>>I believe I have heard that Span queries
Hello (redirecting to java-user@),
If you want more control over scoring and over handling hits,
use a HitCollector. Then you can break out when you have accumulated enough
results. Note that the scores passed to a HitCollector are not normalized,
unlike the ones coming from IndexSearcher's search(...) methods.
: Hi,
: I find that unit tests that modify an existing record in the Lucene
: index by removing it, modifying it, and re-adding it fail if I switch
: from an FSDirectory to a RAMDirectory.
Could you please post a full and complete unit test that demonstrates the
problem? Based on your descript
: I don't know if the developers of Lucene would agree, but from what
: I've been browsing in the ML archives, these multiple-language issues
: seem to arise quite often on the mailing list, and maybe some articles
: like "best practices", "do's and don'ts" or "Lucene Architecture in
: multiple
Hi Everyone,
I have a special scenario where I frequently want to insert duplicate
documents into the index. For example, I know that I want 400 copies of the
same document. (I use the docboost for something else, so I can't just add one
document and set the docboost to 400.)
I would like to hac
Legolas Woodland wrote:
Hi
Thank you for reading my post.
How can I have more than one spell-check suggestion?
For example, if someone enters puore,
it should return:
pore
pour
pure
poor
poer
pire
or something similar.
I really need to implement this feature.
Thank you
Have a look here:
http://toda
See the contrib/spellchecker area of Lucene's Subversion repository.
Erik
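On the "more than one suggestion" part: the contrib SpellChecker exposes suggestSimilar(String word, int numSug), which returns up to numSug candidates. The ranking idea behind it — dictionary words ordered by edit distance to the misspelling — can be sketched in plain Java. ToySuggester, the linear scan, and the tiny dictionary below are illustrative, not the contrib API (which retrieves candidates through an n-gram index first).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy sketch of spell-suggestion ranking: order dictionary words by
// Levenshtein distance to the misspelled input and keep the top N.
class ToySuggester {
    // Classic dynamic-programming edit distance.
    static int levenshtein(String s, String t) {
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i;
        for (int j = 0; j <= t.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[s.length()][t.length()];
    }

    // Return the numSug dictionary words closest to the input.
    static List<String> suggest(String word, List<String> dict, int numSug) {
        List<String> sorted = new ArrayList<String>(dict);
        final String w = word;
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return levenshtein(w, a) - levenshtein(w, b);
            }
        });
        return sorted.subList(0, Math.min(numSug, sorted.size()));
    }
}
```

With the dictionary from the question, suggest("puore", dict, 6) ranks close candidates such as "pore" and "pure" (each one edit away) ahead of the rest.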
On Sep 6, 2005, at 10:09 AM, Legolas Woodland wrote:
Have a look at your analyzer (check out my java.net article for
starters), and the "Analysis Paralysis" section of the Lucene wiki.
You will need to adjust your analyzer (and query parser perhaps) to
tokenize things as you'd like. For a quick fix, try using the
WhitespaceAnalyzer, though
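To make the difference concrete, here is a plain-Java illustration (not Lucene code) of why whitespace-only tokenization preserves tokens that a letters-only tokenizer would mangle. lettersOnly is a stand-in for letter-based tokenizing behavior, not an actual Lucene filter.

```java
import java.util.Arrays;
import java.util.List;

// Contrast whitespace splitting (keeps punctuation inside tokens, the
// way WhitespaceAnalyzer does) with a letters-only view of each token
// (roughly what a letter-based tokenizer leaves of terms like "C++").
class TokenizeDemo {
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    static String lettersOnly(String token) {
        // Stand-in for letter-based tokenization: drop non-letters.
        return token.replaceAll("[^\\p{L}]", "");
    }
}
```

Whichever analyzer you choose, remember to use the same one at query time, or the query terms won't match what was indexed.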
On Sep 6, 2005, at 7:15 AM, Hacking Bear wrote:
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
As far as your usage is concerned, it seems to be the right approach,
and I think the StandardAnalyzer does the job pretty well when it has
to deal with whatever language you want.
Olivier Jaquemet wrote:
> Gusenbauer Stefan wrote:
>
>> I think Nutch uses ngramj for language classification, but I don't know
>> how they store the language information. In our application,
>> for example, I save the language in an extra field in the document,
>> because Lucene is supporti
Surely it's best to have a specific analyzer for each language?
Would support for multiple analyzers with a single index require a
different IndexWriter for each Analyzer/language? Would you then need
to manage the disk access of these regarding locking etc., so two
IndexWriters cannot do so at
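One way to avoid multiple IndexWriters: the IndexWriter versions of this era offer an addDocument(Document, Analyzer) overload (check your version), so a single writer — and a single lock — can index mixed-language documents if you pick the analyzer per document. The lookup side of that can be sketched in plain Java; the registry class and all names below are illustrative, with the generic parameter standing in for org.apache.lucene.analysis.Analyzer.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-language analyzer selection with a fallback, so one
// IndexWriter can serve all languages via its
// addDocument(Document, Analyzer) overload. The generic type A stands
// in for org.apache.lucene.analysis.Analyzer.
class AnalyzerRegistry<A> {
    private final Map<String, A> byLanguage = new HashMap<String, A>();
    private final A fallback;

    AnalyzerRegistry(A fallback) { this.fallback = fallback; }

    void register(String language, A analyzer) {
        byLanguage.put(language, analyzer);
    }

    // Fall back to a default (e.g. StandardAnalyzer) for unknown languages.
    A forLanguage(String language) {
        A a = byLanguage.get(language);
        return a != null ? a : fallback;
    }
}
```

At index time you would look up the analyzer from the document's language field and pass it to addDocument; the same lookup should drive query parsing.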
Hi,
I find that unit tests that modify an existing record in the Lucene
index by removing it, modifying it, and re-adding it fail if I switch
from an FSDirectory to a RAMDirectory.
This code gives me a Directory that works:
FSDirectory fsDirectory =
FSDirectory.getDirectory(physicalDirectoryNam
Gusenbauer Stefan wrote:
I think Nutch uses ngramj for language classification, but I don't know
how they store the language information. In our application,
for example, I save the language in an extra field in the document,
because Lucene supports multiple fields with the same name.
James Adams wrote:
Does anyone know what approach Nutch uses?
-Original Message-
From: Hacking Bear [mailto:[EMAIL PROTECTED]
Sent: 06 September 2005 12:15
To: java-user@lucene.apache.org
Subject: Re: Multiple Language Indexing and Searching
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
>
Hi Mark,
With the change, the problem was completely solved!
Sample: (JapaneseAnalyzer)
Text: AMeetingWillBeHeldInTheCityHall
TokenStream:
[A][Meeting][Will][Be][Held][In][The][City][Hall]
Query Text: Meeting
Output: AMeetingWillBeHeldInTheCityHall
Query Text: CityHall
Output: AMeetingWillBeHe
On 9/6/05, Olivier Jaquemet <[EMAIL PROTECTED]> wrote:
>
> As far as your usage is concerned, it seems to be the right approach,
> and I think the StandardAnalyzer does the job pretty well when it has
> to deal with whatever language you want.
I should look into exactly what it does. Does this
Hi
I would like to be able to index and search on technical terms such as C++ and
C#, but I am finding that both are being reduced to just C. These terms can be
entered from a free-text box on the search interface. Is there a recommended
way of doing this?
Thanks
Daniel
Try changing TokenGroup.isDistinct().
Maybe the offset test should be >= rather than >, i.e.:
boolean isDistinct(Token token)
{
    return token.startOffset() >= endOffset;
}
I've just tried the change with the JUnit test and all
seems well still with the non CJK
Hi Chris,
Thank you for your info.
With CJKAnalyzer, the diagnostics are as follows:

token  posInc  startOfst  endOfst
[Aa]   1       0          2
[aa]   1       1          3
[aB]   1       2          4
[BC]   1       3          5
[Cc]   1       4          6
[cD]   1       5          7
[
I added some code you advised and the result is as follows:

Text: AaaBCcDdEFGgHhIiJKkLMmN

token  posInc  startOfst  endOfst
[Aaa]  1       0          3
[B]    1       3          4
[Cc]   1       4          6
[Dd]   1       6          8
[E]    1       8          9
[F]    1       9
As far as your usage is concerned, it seems to be the right approach,
and I think the StandardAnalyzer does the job pretty well when it has
to deal with whatever language you want.
Note, though, that it won't handle stop words for all languages, only the
English ones, unless specified at index tim
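For reference, StandardAnalyzer accepts a custom stop-word list at construction time, so the English default can be replaced at index time. The filtering step itself is simple; here is a plain-Java sketch (the merged multilingual stop list is illustrative, and removeStops is a stand-in, not Lucene's StopFilter).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of stop-word filtering with a custom, multilingual stop set --
// the kind of list you would hand to StandardAnalyzer's stop-word
// constructor instead of relying on its English-only default.
class StopFilterDemo {
    static List<String> removeStops(List<String> tokens, Set<String> stops) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            // Compare case-insensitively, as analyzers typically
            // lowercase before stop filtering.
            if (!stops.contains(t.toLowerCase())) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

The harder problem is choosing the right stop list per document, which is where the language-identification discussion above comes in.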
On Tuesday 06 September 2005 08:52, markharw00d wrote:
> >>I believe I have heard that Span queries provide some way to access
> document offset information for their hits somehow.
>
> See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
>
> Faithfully selecting extracts based
On Tuesday 06 September 2005 08:21, Sean O'Connor wrote:
> I believe I have heard that Span queries provide some way to access
> document offset information for their hits somehow. Does anyone know if
> this is true, and if so, how I would go about it?
>
> Alternatively (preferably actually) doe