Re: Search for docs containing only a certain word in a specified field?

2007-04-30 Thread Kun Hong
karl wettin wrote: On 30 Apr 2007, at 02.05, Kun Hong wrote: I'm not sure if you mean that it should treat all repetitive tokens as only one token? Then you are better off using a filter when analyzing the text you insert into the index: rather than creating one token for each "the" in "the the the the

Re: Modifying norms...

2007-04-30 Thread escher2k
Essentially what I am trying to do is boost every document by a certain factor, so that the boost is between 1.0 and 2.0. After this, we are trying to do a search across multiple fields and have a computation based purely on tf. Example - if (field1) tf = some function else if (field2) tf =
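
(A rough index-time sketch of the "boost every document into [1.0, 2.0]" step, not taken from the thread; it assumes a Lucene 2.x-era API, a hypothetical per-document score, quality, already normalized to [0.0, 1.0], and a field named "field1":)

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // quality: a per-document score already normalized to [0.0, 1.0]
    static void addBoostedDoc(IndexWriter writer, String text, float quality)
        throws IOException {
      Document doc = new Document();
      doc.add(new Field("field1", text, Field.Store.NO, Field.Index.TOKENIZED));
      doc.setBoost(1.0f + quality); // document boost lands in [1.0, 2.0]
      writer.addDocument(doc);
    }

Keep in mind that the document boost is multiplied into each field's norm and quantized to a single byte at index time, which is why the norm-encoding discussion in the rest of this thread matters.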

Re: Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-04-30 Thread Chris Hostetter
: However, it does not look like upgrading is an option, so I wonder if my : current approach of mapping a property that a client app creates to one : field name is workable at all. Maybe I have to introduce some sort of : mapping of client properties to a fixed number of indexable fields. : : ...

Re: Modifying norms...

2007-04-30 Thread Chris Hostetter
: Thanks Hoss. Suppose I go ahead and modify Similarity.java from ... : Should this work? it depends on your definition of "work" ... if that code does what you want it to do, then yes: it will do what you want it to do. : P.S. This is a very custom implementation. For the specific probl

Re: Modifying norms...

2007-04-30 Thread escher2k
Thanks Hoss. Suppose I go ahead and modify Similarity.java from

    static { for (int i = 0; i < 256; i++) NORM_TABLE[i] = SmallFloat.byte315ToFloat((byte) i); }

to

    static { for (int i = 0; i < 256; i++) NORM_TABLE[i] = (float) i * 100.0 / 256.0; }

Should this work? Thanks
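
(Spelled out for readability: the proposal is to change the static initializer of the NORM_TABLE array in a locally patched copy of org.apache.lucene.search.Similarity; overriding won't do here, since encodeNorm/decodeNorm are static in this Lucene version. A version of the replacement that actually compiles needs an explicit cast, because the arithmetic is done in double:)

    // Patched static initializer in a local copy of Similarity.java:
    // map the 256 possible norm bytes linearly onto [0.0, 100.0) instead of
    // decoding them with SmallFloat.byte315ToFloat.
    private static final float[] NORM_TABLE = new float[256];

    static {
      for (int i = 0; i < 256; i++) {
        NORM_TABLE[i] = (float) (i * 100.0 / 256.0);
      }
    }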

Re: Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-04-30 Thread pbm-rico
I thought about using ulimit, but it does not scale. In the scenario that the app has to support, client applications could create hundreds of thousands of unique properties, which would result in this many indexable fields. Based on previous answers, the way out of this problem while still bein
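
(A rough illustration of the "fixed number of indexable fields" idea, not something prescribed in the thread; the bucket count, the field-name prefix and the value encoding are all made up, and Field.Keyword is the 1.4-era factory for an untokenized field:)

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    static final int NUM_BUCKETS = 64; // fixed, so the per-segment file count stays bounded

    // Hash every client property name onto one of NUM_BUCKETS field names.
    static String bucketField(String property) {
      return "prop" + ((property.hashCode() & 0x7fffffff) % NUM_BUCKETS);
    }

    // Indexing: prefix the value with the property name so that properties
    // sharing a bucket stay distinguishable at query time.
    static void addProperty(Document doc, String property, String value) {
      doc.add(Field.Keyword(bucketField(property), property + "=" + value));
    }

    // Querying property "color" for the exact value "red":
    static Query propertyQuery(String property, String value) {
      return new TermQuery(new Term(bucketField(property), property + "=" + value));
    }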

Re: Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-04-30 Thread Mike Klaas
On 4/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Thanks for your reply. We are still using Lucene v1.4.3 and I'm not sure if upgrading is an option. Is there another way of disabling length normalization/document boosts to get rid of those files? Why not raise the limit of open files
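
(One more knob that does exist in 1.4.3, mentioned here only as a sketch and not taken from the thread: the compound file format packs each segment's files, including the per-field .f norm files, into a single .cfs file, which cuts the number of open file handles considerably. The index path below is a placeholder:)

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
    writer.setUseCompoundFile(true); // one .cfs per segment instead of many small files
    // ... add documents ...
    writer.optimize(); // merging down to a single segment also reduces the file count
    writer.close();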

Has anyone compared NFS versus OCFS2 in a Lucene grid installation?

2007-04-30 Thread Marcelo Ochoa
Hi All: Has anyone compared NFS versus OCFS2 in a Lucene grid installation? Oracle Cluster Filesystem 2 (OCFS2) has shipped with the Linux kernel by default since 2.6.16-rc1. OCFS2 is a cluster-optimized file system used by Oracle RAC configurations (http://oss.oracle.com/projects/ocfs2/). One o

Re: Modifying norms...

2007-04-30 Thread Chris Hostetter
: I want to modify the norms to only include values between 0 and 100. : Currently, I have a custom implementation of the default similarity. Is it : sufficient to override the encodeNorm and decodeNorm methods from the base : implementation in my custom Similarity class ? Please let me know if th

Modifying norms...

2007-04-30 Thread escher2k
I want to modify the norms to only include values between 0 and 100. Currently, I have a custom implementation of the default similarity. Is it sufficient to override the encodeNorm and decodeNorm methods from the base implementation in my custom Similarity class ? Please let me know if there are

Re: Snowball and accents filter...? (solved)

2007-04-30 Thread Andrew Green
On Sat, 2007-04-28 at 19:43 -0400, Erick Erickson wrote: > You actually wouldn't have to maintain two versions. You could, > instead, inject the accentless (stemmed) terms in your single > index as synonyms (see Lucene in Action). This is easier > to search and maintain. > > But it also b
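
(For the record, a sketch of the synonym-injection idea Erick describes; the class name is made up, and it assumes the old Token-returning TokenFilter API plus Java 6's java.text.Normalizer for the accent stripping:)

    import java.io.IOException;
    import java.text.Normalizer;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class AccentSynonymFilter extends TokenFilter {
      private Token pending; // accentless variant waiting to be emitted

      public AccentSynonymFilter(TokenStream input) {
        super(input);
      }

      public Token next() throws IOException {
        if (pending != null) {
          Token synonym = pending;
          pending = null;
          return synonym;
        }
        Token token = input.next();
        if (token == null) {
          return null;
        }
        String stripped = stripAccents(token.termText());
        if (!stripped.equals(token.termText())) {
          // Inject the accentless form at the same position as the original,
          // so both spellings match at search time.
          pending = new Token(stripped, token.startOffset(), token.endOffset());
          pending.setPositionIncrement(0);
        }
        return token;
      }

      private static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
      }
    }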

Re: Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

2007-04-30 Thread pbm-rico
Thanks for your reply. We are still using Lucene v1.4.3 and I'm not sure if upgrading is an option. Is there another way of disabling length normalization/document boosts to get rid of those files? Thanks, Rico : From what I read in the Lucene docs, these .f files store the : normalization fac

Re: Spanquery problem

2007-04-30 Thread Erick Erickson
If you only knew how many times I've looked at code I've written and wondered "What was I thinking?"... Anyway, glad it's working for you. Erick On 4/30/07, axel.reymonet <[EMAIL PROTECTED]> wrote: Hello, Thank you for your piece of advice. Indeed, my mistake was to use a HashSet instead of an

RE: Spanquery problem

2007-04-30 Thread axel.reymonet
Hello, Thank you for your piece of advice. Indeed, my mistake was to use a HashSet instead of an ArrayList (for instance). I must have been really distracted when I wrote my code, and even more so when I checked it! Anyway, thank you again, Axel Reymonet -----Original Message----- From: Erick Erickson [m

Re: Spanquery problem

2007-04-30 Thread Erick Erickson
The first thing I'd do is not use a HashSet when you collect your SpanTermQuerys, since its iteration order is not guaranteed. That is, the order when putting them in is not necessarily the same as when getting them out. So you may be searching for "automatique climatisation" rather than "climatisa
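
(To make the fix concrete, a small sketch; the field name "body" is a placeholder and the two terms are the ones from this thread:)

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    // Use an ordered List rather than a HashSet, so the clause order matches
    // the phrase order you actually want to match.
    List clauses = new ArrayList();
    clauses.add(new SpanTermQuery(new Term("body", "climatisation")));
    clauses.add(new SpanTermQuery(new Term("body", "automatique")));

    SpanQuery[] spans = (SpanQuery[]) clauses.toArray(new SpanQuery[clauses.size()]);
    SpanNearQuery query = new SpanNearQuery(spans, 0, true); // slop 0, in order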

Re: Keyphrase Extraction

2007-04-30 Thread mark harwood
I believe the code Otis is referring to is here: http://issues.apache.org/jira/browse/LUCENE-474 This is index-level analysis but could be adapted to work for just a single document. The implementation is optimised for speed rather than being a thorough examination of phrase significance. Che

Re: term frequency calculation in Lucene

2007-04-30 Thread karl wettin
On 29 Apr 2007, at 18.33, saikrishna venkata pendyala wrote: Where does Lucene compute the term frequency vector? {filename, function name}: DocumentWriter.java, private final void invertDocument(Document doc) Actually the task is to replace all term frequencies with some constant number(
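
(If the goal is only to make scoring ignore term frequency, rather than to change what invertDocument writes to the index, one alternative worth noting (not part of Karl's reply) is a custom Similarity whose tf() returns a constant; roughly:)

    import org.apache.lucene.search.DefaultSimilarity;

    public class ConstantTfSimilarity extends DefaultSimilarity {
      // Score every matching term as if it occurred exactly once,
      // no matter how often it really appears in the document.
      public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
      }
    }

    // Usage at search time: searcher.setSimilarity(new ConstantTfSimilarity());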

Re: Search for docs containing only a certain word in a specified field?

2007-04-30 Thread karl wettin
On 30 Apr 2007, at 02.05, Kun Hong wrote: I'm not sure if you mean that it should treat all repetitive tokens as only one token? Then you are better off using a filter when analyzing the text you insert into the index: rather than creating one token for each "the" in "the the the the the the" you only
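
(A sketch of the kind of filter Karl is describing; the class name is made up, and it assumes the old Token-returning TokenFilter API:)

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class UniqueTermFilter extends TokenFilter {
      private final Set seen = new HashSet();

      public UniqueTermFilter(TokenStream input) {
        super(input);
      }

      public Token next() throws IOException {
        // Emit each distinct term text only once per token stream, so
        // "the the the the the the" contributes a single "the" token.
        for (Token token = input.next(); token != null; token = input.next()) {
          if (seen.add(token.termText())) {
            return token;
          }
        }
        return null;
      }
    }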

Spanquery problem

2007-04-30 Thread axel.reymonet
Hello, I am having some issues with the SpanQuery functionality. As a matter of fact, I index a single French file containing, for instance, "climatisation automatique" (which means automatic air-conditioning) with the usual FrenchAnalyzer, and when I search this index with SpanQuery, I have