Re: patch - DEFAULT_ vars in IndexWriter non-final and DEFAULT for useCompoundFile

2005-02-25 Thread Andrzej Bialecki
Best regards, Andrzej Bialecki. Information Retrieval, Semantic Web. Embedded Unix, System Integration. http://www.sigram.com Contact: info at sigram do

ANN: Luke 0.6 - Lucene Index Toolbox

2005-02-20 Thread Andrzej Bialecki
this release, please keep nagging... ;-)

Re: URL to compare 2 Similarity's ready-- Re: Scoring benchmark evaluation. Was RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

2005-01-31 Thread Andrzej Bialecki
Folks, In the light of this discussion, I'm working slowly on a new release of Luke, which will include a BeanShell-driven Similarity designer. However, this particular module is not finished yet... given my current workload, this will take a week or two more...

Re: cvs commit: jakarta-lucene-sandbox/contributions/lucli build.xml

2005-01-24 Thread Andrzej Bialecki
ne support. Just stumbled upon this project on freshmeat: http://jline.sourceforge.net/ It's BSD-licensed, and seems to provide a feature (if not API) replacement for readline.

Re: [PROPOSAL] Lucene to search.apache.org

2005-01-17 Thread Andrzej Bialecki
Scott Ganyo wrote: Not especially creative, but "index.apache.org" looks to be available. S On Jan 17, 2005, at 3:29 AM, Erik Hatcher wrote: Looks like we should consider alternate names. Suggestions?? ir.apache.org (not Infra-Red, but Information Retrieval)

Re: ANN: Lucene benchmark tool

2004-12-22 Thread Andrzej Bialecki
CVS HEAD.

Re: ANN: Lucene benchmark tool

2004-12-18 Thread Andrzej Bialecki
Otis Gospodnetic wrote: Hi Andrzej, Can we slap ASL 2.0 on top of this and put it in the Sandbox? Yes, I'd appreciate it. This is just the very first version, which certainly could use some improvements...

Re: DefaultSimilarity 2.0?

2004-12-17 Thread Andrzej Bialecki
benchmark code: http://www.getopt.org/lb/LuceneBenchmark.java This collection has the benefit that it's relatively easy to judge the relative relevance scores, because the nature and structure of the corpus are well understood.

Re: potential new Lucene logo

2004-12-13 Thread Andrzej Bialecki
So perhaps this is a good opportunity to bring it into the new era... : [ASCII-art banner; layout lost in extraction] (courtesy of figlet(6)) Just my 0.02 P

Re: two versioning problems with Lucene

2004-12-08 Thread Andrzej Bialecki
ulting jar - not only to protect my proprietary code, but also to reduce the size of the deployment package - both for standard installations and for WebStart. -- Best regards, Andrzej Bialecki - Software Architect, System Integration Specialist CEN/IS

ANN: Lucene benchmark tool

2004-12-06 Thread Andrzej Bialecki
k? You can cut down the number of input parameters to reduce the overall time, or use the mini* document collection (but this reduces the number of documents in index). See the comments in source. Comments and patches are welcome!

Re: [PATCH]multiple wildcards ? at the end of search pattern return incorrect hits

2004-11-10 Thread Andrzej Bialecki
redefines the usual meaning of '?' wildcard, which means "exactly one or zero characters" - and that is the way it's working now. I'm not sure if this change is good, it is certainly surprising... What the original poster wanted is commonly known as '.' wil

Re: lucene.net no more?

2004-09-15 Thread Andrzej Bialecki
-- Best regards, Andrzej Bialecki - Software Architect, System Integration Specialist CEN/ISSS EC Workshop, ECIMF project chair EU FP6 E-Commerce Expert/Evaluator - FreeBSD developer (http://

Re: Binary fields and data compression

2004-08-30 Thread Andrzej Bialecki
ed fields, they are already "compressed" in a highly-optimized way, so adding another level of compression to this part wouldn't make much sense IMHO. [...] ... thus my request that any compression support be optional. Absolutely. :-)

ANN: Stempel - An algorithmic stemmer for Polish language

2004-07-15 Thread Andrzej Bialecki
corpus of contemporary Polish. Please visit the following page for more details: http://www.getopt.org/stempel/index.html Distribution package contains classes for stemming, benchmarking, and for integration with Lucene (Analyzer and TokenFilter).

ANN: Luke v. 0.5 released

2004-06-22 Thread Andrzej Bialecki
tter reflect the current functionality of the tool. Any feedback, patches for enhancements or bugfixes are welcome! If you want to provide a patch, please use "diff -bdruN" - this will help me to integrate it. Thank you!

Re: suggestions for a student project

2004-05-27 Thread Andrzej Bialecki
n an index-wide compression system, akin to a zip file. That would be useful, indeed. Another related useful addition would be to implement specific API for handling numeric fields (searching for values, ranges, and comparator operators).

Re: looking for a large test corpus for a lucene presentation

2004-04-07 Thread Andrzej Bialecki
long as you use SAX... (unless, of course, you run it on Cray or something.. :-) )

Re: N-gram layer

2004-03-13 Thread Andrzej Bialecki
languages like the Slavic family. However, you need to always know the language of the document in advance - my belief is that it's impossible to build a "universal stemmer good for any language".

Re: AW: N-gram layer and language guessing

2004-02-03 Thread Andrzej Bialecki
he guesser works with nearly perfect accuracy for texts longer than 10 words. Below that - it depends.. :-)

Re: N-gram layer

2004-02-03 Thread Andrzej Bialecki
karl wettin wrote: On Tue, 03 Feb 2004 09:27:25 +0100 Andrzej Bialecki <[EMAIL PROTECTED]> wrote: If I run the above example, I get the following: "jag heter kalle" - SV: 0.7197875 What is index 1.0 ? 1.0 - completely dissimilar language profiles 0.0 - completely similar l

Re: N-gram layer

2004-02-03 Thread Andrzej Bialecki
"vad heter du" (what's your name) the detection fails... :-) A question: what was your source for the representative hi-frequency words in various languages? Was it your training corpus or some publication?

ANN: Luke 0.45 released

2004-01-17 Thread Andrzej Bialecki
view when pressing Search. * Fix the JNLP file to require J2SE 1.3+. * By popular demand, add a single self-contained JAR to the binary distribution. * Minor restructuring to increase reuse. Screenshots have been updated, too. Enjoy!

ANN: Luke 0.4 released

2004-01-11 Thread Andrzej Bialecki
bug could result in mysterious "No Results" on the search page. Spotted by Erik Hatcher. Thank you for your comments and contributions!

Re: subclassing of IndexReader

2003-11-19 Thread Andrzej Bialecki
well, has been heavily criticized for weak theoretical foundations. See the archives of Nilsimsa mailing list for details. I have yet to find an open source alternative to it, though ...

Re: Adding lock timeouts to write.lock

2003-08-14 Thread Andrzej Bialecki
delete()) are cached, and don't report immediately that they will ultimately fail... Any ideas here?

Luke v 0.2 - Lucene Index Browser

2003-08-11 Thread Andrzej Bialecki
rsion. * Add Read-Only mode. * Fix spinbox bug (really a bug in the Thinlet toolkit - fixed there). * Allow browsing hidden directories. * Add a combobox to choose the default field for searching. * Other minor code cleanups. Thanks to all who provided their comments and suggestions!

Re: Ok to add method IndexWriter.addDocument( Analyzer, Document) ?

2003-06-26 Thread Andrzej Bialecki
zer for queries.

Re: FSDirectory patch for file renaming

2003-02-17 Thread Andrzej Bialecki
ld of course mean a gross violation of File.delete() contract, but JVM is just a program and it may contain bugs... Or maybe it's Windows that contains bugs, I don't remember... ;-) Does it behave the same way in JDK 1.3.x as in JDK 1.4.x?