Re: Term Collection Frequency?

2004-08-04 Thread Julien Nioche
If you need the number of docs in which a given term appears, you can use the method docFreq: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term) Otherwise you can use http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/ind
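
To make the distinction in this thread concrete, here is a minimal sketch in plain Java (not the Lucene API) of the difference between document frequency, which is what docFreq reports, and collection frequency, the total number of occurrences of a term across all documents. The class name TermFrequencies, the helper methods, and the toy token lists are assumptions made up for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class TermFrequencies {

    // Document frequency: number of documents containing the term at least once.
    static int docFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    // Collection frequency: total occurrences of the term over all documents.
    static int collectionFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            for (String token : doc) {
                if (token.equals(term)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical, already-tokenized documents.
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("burgundy", "wine", "wine"),
            Arrays.asList("burgundy", "wines"),
            Arrays.asList("wine"));

        System.out.println("docFreq(wine) = " + docFreq(docs, "wine"));
        System.out.println("collectionFreq(wine) = " + collectionFreq(docs, "wine"));
    }
}
```

In a real index the second number would presumably be obtained by summing per-document frequencies for the term rather than by rescanning the text; the sketch only shows what is being counted.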

new feature not committed :-(

2004-08-02 Thread Julien Nioche
I submitted a patch a few days ago (see http://issues.apache.org/bugzilla/show_bug.cgi?id=30232). Has someone tested this feature? Is there a reason not to commit it? Julien

Re: release & migration plan

2004-07-20 Thread Julien Nioche
DocumentWriter is typically created with the ramDirectory field of IndexWriter and not the actual directory field. So getDirectory() should return this ramDirectory in order to work, which is not very intuitive (one could expect it to return the real directory). One could change the visibility of r

Re: release & migration plan

2004-07-13 Thread Julien Nioche
Of course I'd be pleased to make a draft of the javadoc and a patch file. I'll try to do it, but I can't promise to deliver it soon Julien - Original Message - From: "Doug Cutting" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Monday, July 12, 2004 7:34 PM

Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

2004-07-01 Thread Julien Nioche
The xls files did not pass. You can download them from the following URLs : http://jnioche.freesurf.fr/shortQueries.xls http://jnioche.freesurf.fr/longQueries.xls - Original Message - From: "Julien Nioche" <[EMAIL PROTECTED]> To: "Lucene Developers List"

Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

2004-07-01 Thread Julien Nioche
;~3^3.0)) ((descr:"burgundy wines"~3^4.0 descr:"burgundy wine"~3^4.0)) ((kw:"burgundy wines"~3^4.0 kw:"burgundy wine"~3^4.0)) - Original Message - From: "Julien Nioche" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAI

Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

2004-07-01 Thread Julien Nioche
s the best way to set up this value in IndexWriter? Maybe we could limit to a few possible values like : DEFAULT = 128 AVERAGE = 64 HIGH = 32 in order to avoid too low settings. Any comments or suggestions? Can anyone give feedback on this? Julien - Original Message - From: "Julien

Re: Help with scoring, coordination factor?

2004-04-30 Thread Julien Nioche
[I am moving this discussion to the dev list] > Then use this in place of BooleanQuery when you don't want coordination > scoring. I think that should do the trick. In my case it works perfectly. As we generate multilingual and semantic expansions of the original words of a query, the coordination fa

coord factor and MultiTermQuery

2004-02-02 Thread Julien Nioche
Just a question: classes implementing MultiTermQuery (i.e. WildcardQuery and FuzzyQuery) are changed into BooleanQueries by the 'rewrite()' method before a search. The default coord() method of Similarity implies that the score of this BQ is multiplied by the (ratio number of Terms found / number o
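
The effect described here can be sketched in plain Java. This is a simplified illustration, not Lucene's scoring code: it assumes the default coordination factor is the number of matching clauses divided by the total number of clauses, and the class name CoordSketch and both method names are hypothetical.

```java
public class CoordSketch {

    // Assumed default coordination factor: matching clauses / total clauses.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Scale a raw score (sum of matching clause scores) by the coordination factor.
    static float coordinatedScore(float rawScore, int overlap, int maxOverlap) {
        return rawScore * coord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        // A wildcard query rewritten into 4 term clauses, only 1 of which
        // occurs in the document being scored.
        float raw = 2.0f;
        System.out.println("coord = " + coord(1, 4));
        System.out.println("score = " + coordinatedScore(raw, 1, 4));
    }
}
```

Under that assumption, a document matching one of four expanded terms keeps only a quarter of its raw score, which is why the factor matters for queries that are expanded automatically.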

LIMO new release (v0.3)

2004-01-22 Thread Julien Nioche
There's a new release of limo available ! This new version : - includes lucene-1.3-final.jar - fixes a bug with index loading - detects when index changes and auto refreshes the information (as proposed by Jakob Flierl) - uses css for easier customisation (as proposed by E Hatcher) - escapes HTML

Re: suggestion for a CustomDirectory

2003-12-05 Thread Julien Nioche
> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Thursday, December 04, 2003 7:28 PM Subject: Re: suggestion for a CustomDirectory > Julien Nioche wrote: > > However in most cases the > > application would be faster because : > > - tree access to the Te

suggestion for a CustomDirectory

2003-12-04 Thread Julien Nioche
Here is a use case : - my Lucene application is running under W2K - I have (just) a gigabyte of RAM - my index is quite big, let's say 1.7 GB (with a .tis of 31 MB and a .tii of 479 KB) Using RAMDirectory is impossible, FSDirectory works but is quite slow. Could it be possible to create a custom Direc

Re: suggestion for a CustomDirectory

2003-12-04 Thread Julien Nioche
IL PROTECTED]> Sent: Thursday, December 04, 2003 3:56 PM Subject: Re: suggestion for a CustomDirectory > On Thursday, December 4, 2003, at 09:45 AM, Julien Nioche wrote: > > Here is a use case : > > - my Lucene application is running under W2K > > - I have (just) a gigab

Re: Directory implementation using NIO (moved from Lucene User List)

2003-07-09 Thread Julien Nioche
ith less RAM available FSDirectory should be faster? BTW modifications are to be made in the org.apache.lucene.store.InputStream class, not in Directory. Has anybody else tried it? Do you find similar results? What does it bring on a bigger index? Cheers --- J

Re: [FAQ] Finding number of occurrences of a given word in a document

2003-07-08 Thread Julien Nioche
ry much. > > -- > > - Original Message - > > DATE: Mon, 7 Jul 2003 09:32:22 > From: "Julien Nioche" <[EMAIL PROTECTED]> > To: "Lucene Developers List" <[EMAIL PROTECTED]>,<[EMAIL PROTECTED]> > Cc: > > >No, TermD

Re: [FAQ] Finding number of occurrences of a given word in a document

2003-07-07 Thread Julien Nioche
No, TermDocs operates only on Terms, not on PhraseQueries. A PhraseQuery is a query and is not stored in an index. - Original Message - From: "none none" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Monday, July 07, 2003 7:00 AM Subject: Re: [FAQ] Finding

Re: Document scoring - should I file a bug report?

2003-06-03 Thread Julien Nioche
I observed a change in the ordering of the results since I moved from 1.2 to the 1.3 RC1 version of the API ( with the extensible Scoring API) Maybe it's related? - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Monday

Re: computing size() in frequently used methods

2002-11-26 Thread Julien Nioche
d for something else > originally), but that will run a few things of this nature and print > out the timings. Not a good way to benchmark, but I think it gives an > idea. > > Otis > > > --- Julien Nioche <[EMAIL PROTECTED]> wrote: > > Hello, > > > >

computing size() in frequently used methods

2002-11-26 Thread Julien Nioche
during the execution of these methods, this kind of change must be pretty harmless... Any opinion on that? Could it have a side effect? Julien Nioche www.lingway.com

Re: Possible features for next release

2002-05-23 Thread Julien Nioche
What about the Support for Search Term Highlighting? (see Maik Schreiber's paper) It seems to have vanished from the list of features? - Original Message - From: "Peter Carlson" <[EMAIL PROTECTED]> To: "Lucene Developers List" <[EMAIL PROTECTED]> Sent: Thursday, May 23, 2002 8:29 AM Subj

Re: Call for features in next release

2002-05-22 Thread Julien Nioche
Another feature could be the ability to retrieve the number of occurrences not only for a term but also for a Phrase (see http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00101.html) - Original Message - From: <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Monday, May 20, 2

Re: Adding a TermExpansionQuery

2002-05-15 Thread Julien Nioche
Hi folks, Just a little advertising message for those who are interested in semantic expansions : http://kant.lingway.com/DemoUN is a demo of a multilingual IR system based on Lucene Please take a look at it - feedback is welcome! Julien - Original Message - From: "Peter Carlson" <

Re: number of term match in a search ...

2002-05-06 Thread Julien Nioche
Hi, We had a discussion on this topic a few months ago. http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00091.html In particular Doug's answer : http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00101.html Retrieving this information requires a few changes in the code (

Re : How does Lucene handle phrases containing words that are not indexed?

2002-02-13 Thread Julien Nioche
By the way, I was wondering if there is any Analyzer that uses the following constructor public Token(String text, int start, int end, String typ) ? Maybe it could be interesting to build an analyzer that recognizes punctuation marks and keeps them in the index as Tokens with a given Type (say fo

Re: Industry Use of Lucene?

2002-02-11 Thread Julien Nioche
Any questions or comments are welcome. You can send them to [EMAIL PROTECTED] Please take a look at our site (www.lingway.com) for more information about our activities. Thank you Julien Nioche / www.lingway.com

Re: Add ability to get list of terms from a query ++

2002-02-08 Thread Julien Nioche
Hi all, I also implemented a Highlight functionality based on Maik Schreiber's code and modified the Lucene source. I agree with Peter that it would be great to have these changes done in the Lucene core code. The only difference in my functionality is that query.getTerms() returns both TermQuery

Re: Getting word count

2001-10-19 Thread julien . nioche
e modifying the score() methods of the different Scorers? Or > >is this information already computed somewhere else? > > > >Thanks a lot for your help > > > >Julien Nioche > >