Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram do
this release,
please keep nagging... ;-)
--
Best regards,
Andrzej Bialecki
Folks,
In the light of this discussion, I'm working slowly on a new release of
Luke, which will include a BeanShell-driven Similarity designer.
However, this particular module is not finished yet... given my current
workload, this will take a week or two more...
--
Best regards,
An
ne support.
Just stumbled upon this project on freshmeat:
http://jline.sourceforge.net/
It's BSD-licensed, and seems to provide a feature (if not API)
replacement for readline.
--
Best regards,
Andrzej Bialecki
Scott Ganyo wrote:
Not especially creative, but "index.apache.org" looks to be available.
S
On Jan 17, 2005, at 3:29 AM, Erik Hatcher wrote:
Looks like we should consider alternate names. Suggestions??
ir.apache.org
(not Infra-Red, but Information Retrieval)
--
Best regards,
Andrze
CVS HEAD.
--
Best regards,
Andrzej Bialecki
Otis Gospodnetic wrote:
Hi Andrzej,
Can we slap ASL 2.0 on top of this and put it in the Sandbox?
Yes, I'd appreciate it.
This is just the very first version, which certainly could use some
improvements...
--
Best regards,
Andrzej Bia
benchmark code:
http://www.getopt.org/lb/LuceneBenchmark.java
This collection has the benefit that it's relatively easy to judge the
relative relevance scores, because the nature and structure of the
corpus is well understood.
So perhaps this is a good opportunity to bring it into
the new era... :
__
___ /
__ / _ / / / ___/ _ \_ __ \ _ \
_ /___/ /_/ // /__ / __/ / / / __/
/_/\__,_/ \___/ \___//_/ /_/\___/ (courtesy of figlet(6) )
Just my 0.02 P
ulting jar - not only to protect my proprietary code, but also to
reduce the size of the deployment package - both for standard
installations and for WebStart.
--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/IS
k?
You can cut down the number of input parameters to reduce the overall
time, or use the mini* document collection (but this reduces the number
of documents in index). See the comments in source.
Comments and patches are welcome!
--
Best regar
redefines the usual meaning of '?'
wildcard, which means "exactly one or zero characters" - and that is the
way it's working now. I'm not sure if this change is good, it is
certainly surprising...
What the original poster wanted is commonly known as '.' wil
--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://
ed
fields, they are already "compressed" in a highly-optimized way, so
adding another level of compression to this part wouldn't make much
sense IMHO.
[...]
... thus my request that any compression support be optional.
Absolutely. :-)
--
Best
corpus of contemporary Polish.
Please visit the following page for more details:
http://www.getopt.org/stempel/index.html
Distribution package contains classes for stemming, benchmarking, and
for integration with Lucene (Analyzer and TokenFilter).
--
Best regards,
Andrzej Bialecki
tter reflect the current functionality of the tool.
Any feedback, patches for enhancements or bugfixes are welcome! If you
want to provide a patch, please use "diff -bdruN" - this will help me to
integrate it. Thank you!
--
Best regards,
Andrzej Bialecki
n an index-wide
compression system, akin to a zip file.
That would be useful, indeed. Another related useful addition would be
to implement specific API for handling numeric fields (searching for
values, ranges, and comparator operators).
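
Until such an API exists, a common workaround is to index numbers as fixed-width, zero-padded strings, so that lexicographic term order matches numeric order. Below is a minimal sketch in plain Java, not a proposal for the actual API. It uses no Lucene classes, and the class name, method names and WIDTH are made up for illustration:

```java
public class PaddedNumber {
    // Hypothetical fixed width; it must cover the largest value to be indexed.
    public static final int WIDTH = 10;

    // Encode a non-negative int as a zero-padded decimal string.
    public static String encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch handles non-negative values only");
        }
        return String.format("%0" + WIDTH + "d", value);
    }

    // Inclusive range check done purely by string comparison,
    // which is how terms would be compared in a lexicographic index.
    public static boolean inRange(String term, int low, int high) {
        return term.compareTo(encode(low)) >= 0
            && term.compareTo(encode(high)) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded strings sort "9" after "10", which breaks range checks.
        System.out.println("9".compareTo("10") > 0);
        // Padded strings compare in numeric order.
        System.out.println(inRange(encode(9), 5, 10));
    }
}
```

Negative numbers and floating-point values need a different encoding, which is one reason a dedicated API would be nicer than this trick.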
--
Best regards,
Andrze
long as you use SAX... (unless, of course, you
run it on Cray or something.. :-) )
--
Best regards,
Andrzej Bialecki
languages like the Slavic family.
However, you need to always know the language of the document in advance
- my belief is that it's impossible to build a "universal stemmer good
for any language".
--
Best regards,
Andrzej Bialecki
he guesser works with nearly perfect accuracy for texts
longer than 10 words. Below that - it depends.. :-)
--
Best regards,
Andrzej Bialecki
karl wettin wrote:
On Tue, 03 Feb 2004 09:27:25 +0100
Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
If I run the above example, I get the following:
"jag heter kalle"
- SV: 0.7197875
What is index 1.0 ?
1.0 - completely dissimilar language profiles
0.0 - completely similar l
"vad heter du" (what's your name) the detection
fails... :-)
A question: what was your source for the representative hi-frequency
words in various languages? Was it your training corpus or some publication?
--
Best regards,
Andrzej Bialecki
view when pressing Search.
* Fix the JNLP file to require J2SE 1.3+.
* By popular demand, add a single self-contained JAR to the binary
distribution.
* Minor restructuring to increase reuse.
Screenshots have been updated, too. Enjoy!
--
Best regards,
Andrze
bug could result in
mysterious "No Results" on the search page. Spotted by Erik Hatcher.
Thank you for your comments and contributions!
--
Best regards,
Andrzej Bialecki
well,
has been heavily criticized for weak theoretical foundations. See the
archives of Nilsimsa mailing list for details.
I have yet to find an open source alternative to it, though ...
--
Best regards,
Andrzej Bialecki
delete()) are cached, and don't report immediately that they
will ultimately fail... Any ideas here?
--
Best regards,
Andrzej Bialecki
rsion.
* Add Read-Only mode.
* Fix spinbox bug (really a bug in the Thinlet toolkit - fixed there).
* Allow browsing of hidden directories.
* Add a combobox to choose the default field for searching.
* Other minor code cleanups.
Thanks to all who provided their comments and suggestions!
--
Best regards,
An
zer for queries.
--
Best regards,
Andrzej Bialecki
ld of course mean a gross violation of File.delete() contract,
but JVM is just a program and it may contain bugs... Or maybe it's
Windows that contains bugs, I don't remember... ;-) Does it behave the
same way in JDK 1.3.x as in JDK 1.4.x?
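
One way to check this on a given JDK and OS is to compare what File.delete() reports with what File.exists() says right afterwards. A minimal sketch follows; the class and method names are made up for illustration, and it does not reproduce the original problem, it only shows the check:

```java
import java.io.File;
import java.io.IOException;

public class DeleteCheck {
    // Delete the file, then verify independently that it is really gone.
    // Returns true only if the file no longer exists afterwards.
    public static boolean deleteAndVerify(File f) {
        boolean reported = f.delete();   // what the JVM claims
        boolean gone = !f.exists();      // what the file system says now
        if (reported != gone) {
            System.err.println("delete() returned " + reported
                + " but exists() is " + !gone + " for " + f);
        }
        return gone;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("delete-check", ".tmp");
        System.out.println("deleted: " + deleteAndVerify(tmp));
    }
}
```

Running the same class under both JDK versions on the affected machine would show whether the two calls ever disagree there.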
--
Best regards,
A