Essentially, what I am trying to do is boost every document by a certain
factor, so that the boost is between 1.0 and 2.0. After this, we are trying
to do a search across multiple fields and have a computation based purely
on tf. Example:
if (field1)
    tf = some function
else if (field2)
    tf =
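The per-field tf idea above can be sketched in plain Java (no Lucene dependency). Note that in the classic Similarity API, tf() only receives the frequency, not the field name, so field-specific logic has to be wired in separately; the field names and formulas below are purely illustrative.

```java
// Illustrative per-field tf dispatch. "field1"/"field2" and the
// formulas are hypothetical placeholders, not Lucene API.
public class PerFieldTf {
    static float tf(String field, int freq) {
        if ("field1".equals(field)) {
            return (float) Math.sqrt(freq); // e.g. a sqrt-style tf
        } else if ("field2".equals(field)) {
            return freq > 0 ? 1.0f : 0.0f;  // e.g. binary tf: presence only
        }
        return freq;                        // fallback: raw frequency
    }

    public static void main(String[] args) {
        System.out.println(tf("field1", 4)); // 2.0
        System.out.println(tf("field2", 9)); // 1.0
    }
}
```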
: However, it does not look like upgrading is an option, so I wonder if my
: current approach of mapping a property that a client app creates to one
: field name is workable at all. Maybe I have to introduce some sort of
: mapping of client properties to a fixed number of indexable fields.
:
: ...
: Thanks Hoss. Suppose, I go ahead and modify Similarity.java from
...
: Should this work ?
it depends on your definition of "work" ... if that code is what you want
it to do, then yes: it will do what you want it to do.
: P.S. This is a very custom implementation. For the specific probl
Thanks Hoss. Suppose I go ahead and modify Similarity.java from
static {
for (int i = 0; i < 256; i++)
NORM_TABLE[i] = SmallFloat.byte315ToFloat((byte)i);
}
TO
static {
for (int i = 0; i < 256; i++)
NORM_TABLE[i] = (float) (i * 100.0 / 256.0); // cast the whole expression: i * 100.0 / 256.0 is a double and cannot be assigned to a float implicitly
}
Should this work?
Thanks
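For anyone checking the numbers, here is a self-contained comparison of the two tables in plain Java. byte315ToFloat is paraphrased from Lucene's SmallFloat (3 mantissa bits, zero-exponent point of 15) and may not match every Lucene version byte for byte; the linear table needs the cast around the whole expression, since i * 100.0 / 256.0 is a double.

```java
public class NormTables {
    // Paraphrase of SmallFloat.byte315ToFloat: 3 mantissa bits,
    // zero-exponent point 15. An illustration, not a verbatim copy.
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] defaults = new float[256];
        float[] linear = new float[256];
        for (int i = 0; i < 256; i++) {
            defaults[i] = byte315ToFloat((byte) i);
            // Cast the whole expression: the double result of
            // i * 100.0 / 256.0 cannot be assigned to a float implicitly.
            linear[i] = (float) (i * 100.0 / 256.0);
        }
        System.out.println(defaults[124]); // 1.0 under this decoding
        System.out.println(linear[255]);   // 99.609375, top of the linear table
    }
}
```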
I thought about using ulimit, but it does not scale. In the scenario that the
app has to support, client applications could create hundreds of thousands of
unique properties, which would result in this many indexable fields.
Based on previous answers, the way out of this problem while still bein
On 4/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Thanks for your reply.
We are still using Lucene v1.4.3 and I'm not sure if upgrading is an option. Is
there another way of disabling length normalization/document boosts to get rid
of those files?
Why not raise the limit of open files?
Hi All:
Has anyone compared NFS with OCFS2 in a Lucene grid installation?
The Oracle Cluster Filesystem 2 has shipped by default since Linux
kernel 2.6.16-rc1+.
OCFS2 is a cluster optimized file system used by the Oracle RAC
configuration (http://oss.oracle.com/projects/ocfs2/).
One o
: I want to modify the norms to only include values between 0 and 100.
: Currently, I have a custom implementation of the default similarity. Is it
: sufficient to override the encodeNorm and decodeNorm methods from the base
: implementation in my custom Similarity class ? Please let me know if th
I want to modify the norms to only include values between 0 and 100.
Currently, I have a custom implementation of the default Similarity. Is it
sufficient to override the encodeNorm and decodeNorm methods from the base
implementation in my custom Similarity class? Please let me know if there
are
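A minimal sketch of a linear 0-100 codec, in plain Java with hypothetical method names. Whether these can actually be plugged in as overrides depends on your Lucene version; in old releases encodeNorm/decodeNorm were static, which is presumably why this thread patches Similarity.java directly.

```java
// Hypothetical linear norm codec mapping [0, 100] onto a single byte.
// Names mirror the Similarity methods, but this is standalone demo code.
public class LinearNorms {
    static byte encodeNorm(float f) {
        if (f <= 0f) return 0;
        if (f >= 100f) return (byte) 255;
        return (byte) Math.round(f * 255.0 / 100.0);
    }

    static float decodeNorm(byte b) {
        return (b & 0xFF) * 100.0f / 255.0f;
    }

    public static void main(String[] args) {
        for (float f : new float[] {0f, 25f, 50f, 100f}) {
            byte enc = encodeNorm(f);
            // Round-tripping loses at most half a quantization step (~0.2).
            System.out.println(f + " -> " + (enc & 0xFF) + " -> " + decodeNorm(enc));
        }
    }
}
```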
On Sat, 2007-04-28 at 19:43 -0400, Erick Erickson wrote:
> You actually wouldn't have to maintain two versions. You could,
> instead, inject the accentless (stemmed) terms in your single
> index as synonyms (See Lucene In Action). This is easier
> to search and maintain
>
> But it also b
Thanks for your reply.
We are still using Lucene v1.4.3 and I'm not sure if upgrading is an option. Is
there another way of disabling length normalization/document boosts to get rid
of those files?
Thanks,
Rico
: From what I read in the Lucene docs, these .f files store the
: normalization fac
If you only knew how many times I've looked at code I've written and
wondered "What was I thinking" ...
Anyway, glad it's working for you
Erick
On 4/30/07, axel.reymonet <[EMAIL PROTECTED]> wrote:
Hello,
Thank you for your piece of advice. Indeed, my mistake was to use HashSet
instead of an ArrayList (for instance). I must have been really distracted
when I wrote my code, even more when I checked it! Anyway, thank you again,
Axel Reymonet
-----Original Message-----
From: Erick Erickson [m
The first thing I'd do is not use a HashSet when you collect your
SpanTermQuerys since the iteration order is not guaranteed. That is,
the order when putting them in is not necessarily the same as
when getting them out. So you may be searching for
"automatique climatisation" rather than "climatisa
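The ordering pitfall can be demonstrated with plain Java collections, no Lucene required; if duplicates must also be removed, LinkedHashSet keeps insertion order where HashSet makes no guarantee at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Collect query terms in a structure that preserves insertion order.
public class TermOrder {
    // Removes duplicate terms while keeping the order they were added in,
    // unlike a plain HashSet whose iteration order is unspecified.
    static List<String> dedupePreservingOrder(List<String> terms) {
        return new ArrayList<>(new LinkedHashSet<>(terms));
    }

    public static void main(String[] args) {
        List<String> ordered = new ArrayList<>();
        ordered.add("climatisation");
        ordered.add("automatique");
        // An ArrayList iterates in insertion order, so the phrase stays intact.
        System.out.println(ordered);                       // [climatisation, automatique]
        System.out.println(dedupePreservingOrder(ordered)); // [climatisation, automatique]
    }
}
```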
I believe the code Otis is referring to is here:
http://issues.apache.org/jira/browse/LUCENE-474
This is index-level analysis but could be adapted to work for just a single
document.
The implementation is optimised for speed rather than being a thorough
examination of phrase significance.
Che
On 29 Apr 2007, at 18:33, saikrishna venkata pendyala wrote:
Where does Lucene compute the term frequency vector?
{filename, function name}: DocumentWriter.java, private final void invertDocument(Document doc)
Actually the task is to replace all the term frequencies with some
constant number(
On 30 Apr 2007, at 02:05, Kun Hong wrote:
I'm not sure if you mean that it should treat all repetitive
tokens as only one token? Then you are better off using a filter
when analyzing the text you insert into the index: rather than creating
one token for each "the" in "the the the the the the" you only
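Outside the Lucene TokenFilter plumbing, the collapsing step itself is tiny; here is a plain-Java sketch (the TokenFilter wiring is omitted and would depend on your Lucene version):

```java
import java.util.ArrayList;
import java.util.List;

// Collapses consecutive repeats of a token, so "the the the the the the"
// contributes a single "the" to the stream. A custom TokenFilter could
// apply the same rule at analysis time.
public class CollapseRepeats {
    static List<String> collapse(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String prev = null;
        for (String t : tokens) {
            if (!t.equals(prev)) {
                out.add(t);
            }
            prev = t;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collapse(List.of("the", "the", "the", "cat", "the")));
        // [the, cat, the]
    }
}
```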
Hello,
I am having some issues with the SpanQuery functionality. As a matter of
fact, I index a single French file containing, for instance, "climatisation
automatique" (which means "automatic air conditioning") with the classical
FrenchAnalyzer, and when I search in this index with SpanQuery, I have
19 matches