[
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849114#action_12849114
]
Dawid Weiss commented on LUCENE-2341:
-
Oh, I forgot about this -- yes, you're right
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848648#action_12848648
]
Dawid Weiss commented on LUCENE-2298:
-
The dictionary's author states
[
https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848649#action_12848649
]
Dawid Weiss commented on LUCENE-2341:
-
Robert, should I wait for Stempel patch first
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848054#action_12848054
]
Dawid Weiss commented on LUCENE-2298:
-
Staszek suggested that perhaps it would
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848139#action_12848139
]
Dawid Weiss commented on LUCENE-2298:
-
I agree about classpath issues, they're a pain
[
https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848270#action_12848270
]
Dawid Weiss commented on LUCENE-2298:
-
The answer from the developer is: pick any
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss resolved LUCENE-2221.
-
Resolution: Later
I'm done with these benchmarks. The results so far indicate
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804166#action_12804166
]
Dawid Weiss edited comment on LUCENE-2221 at 1/23/10 10:57 PM
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: (was: benchmark.jar)
Micro-benchmarks for ntz and pop (BitUtils) operations
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: benchmark.jar
An updated set of benchmarks (simple loops and JRE ntz/pop).
Micro
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: (was: lucene-bitset-benchmarks.zip)
Micro-benchmarks for ntz and pop (BitUtils
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: lucene-bitset-benchmarks.zip
Updated source code for the benchmarks.
Micro
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803251#action_12803251
]
Dawid Weiss commented on LUCENE-2221:
-
Confirmed, with a simple loop it is even faster
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: benchmarks.txt
Benchmark results for array operations and iterators comparing
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: benchmark.jar
Executable Java JAR with benchmarking code for anybody that wishes
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802823#action_12802823
]
Dawid Weiss commented on LUCENE-2221:
-
I wrote a set of micro-benchmarks comparing
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: lucene-bitset-benchmarks.zip
Benchmarks, source code.
Micro-benchmarks for ntz
Hi there,
Is there anyone with access to an Intel I7-machine? I'd be curious
what the results of this benchmark are, given the new JVM intrinsics
introduced in HotSpot 1.7:
https://issues.apache.org/jira/browse/LUCENE-2221
There is an executable JAR file attached to the issue. Run with (must
be
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802900#action_12802900
]
Dawid Weiss commented on LUCENE-2221:
-
I do have a bunch of dinosaur-age computers
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802927#action_12802927
]
Dawid Weiss commented on LUCENE-2221:
-
Results from Intel I7 -- an improvement
FYI, the AMD Phenom also has the POPCNT instruction.
Don't have access to a computer with this one either. Seems like I
need to invest in hardware a bit.
I have found a person with I7 though -- the results are attached to
the JIRA issue, about 20% speedup.
Dawid
Interested in some Core i5 benchmarks?
Sure, add them to the JIRA issue if you can, please.
I just ran the benchmark locally on latest JDK 6 and it was slightly better
than the I7 results you posted which made me wonder..
Well, like I said -- they may depend on the architecture of the
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801733#action_12801733
]
Dawid Weiss commented on LUCENE-2221:
-
Yes, this would be my initial suggestion
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801858#action_12801858
]
Dawid Weiss commented on LUCENE-2221:
-
Look closely at the results above, Yonik. I
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: (was: results-popntz.txt)
Micro-benchmarks for ntz and pop (BitUtils) operations
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: results-popntz.txt
Plain ASCII results.
Micro-benchmarks for ntz and pop (BitUtils
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801888#action_12801888
]
Dawid Weiss commented on LUCENE-2221:
-
I had a suspicion this must be the case. I even
[
https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801371#action_12801371
]
Dawid Weiss commented on LUCENE-2213:
-
How about if you assert that minTargetSize
Components: Other
Reporter: Dawid Weiss
Priority: Trivial
As suggested by Yonik, I performed a suite of micro-benchmarks to investigate
the following:
* pop() (bitCount) seems to be implemented in the same way (hacker's delight)
as in the BitUtils class (SUN's standard library
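
For readers who don't have the two operations at hand, a rough sketch follows. The class and method names are made up for illustration and this is not the BitUtils source: pop() is the usual Hacker's Delight bit-parallel count, and ntz() is just the naive loop, checked against the JDK methods.

```java
// Illustrative sketch only; names are hypothetical, not Lucene's BitUtils.
final class BitOpsSketch {

    // Population count, Hacker's Delight style: sum bits in 2-, 4-, 8-bit
    // fields, then fold the byte sums together.
    static int pop(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L);
        x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;
        x = x + (x >>> 8);
        x = x + (x >>> 16);
        x = x + (x >>> 32);
        return (int) x & 0x7F;
    }

    // Number of trailing zeros as a simple loop; returns 64 for zero,
    // matching Long.numberOfTrailingZeros.
    static int ntz(long x) {
        if (x == 0) {
            return 64;
        }
        int n = 0;
        while ((x & 1L) == 0) {
            x >>>= 1;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 8L, 0xF0F0L, -1L, Long.MIN_VALUE};
        for (long v : samples) {
            System.out.println(Long.toHexString(v)
                + " pop=" + pop(v) + " (jdk " + Long.bitCount(v) + ")"
                + " ntz=" + ntz(v) + " (jdk " + Long.numberOfTrailingZeros(v) + ")");
        }
    }
}
```

Which variant is faster is exactly what the benchmarks are meant to measure, so the sketch makes no claim about that.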
[
https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2221:
Attachment: results-popntz.txt
Performance test results.
Micro-benchmarks for ntz and pop
Components: Other
Affects Versions: 3.0, 2.9.1, 2.9
Reporter: Dawid Weiss
Priority: Minor
OpenBitSet uses an internal buffer of long variables to store set bits and an
additional 'wlen' index that points
to the highest used component inside {@link #bits} buffer
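
To make the kind of problem concrete, here is a minimal sketch of how a hash computed over "used" words can differ for two sets with identical bits. Everything in it is an assumption for illustration (the class, the seed value 1 and the rotate-and-xor scheme); it is not the OpenBitSet code, only a model of a word buffer plus a wlen-like counter that may include trailing zero words.

```java
// Hypothetical model: a long[] buffer and a count of words considered in use.
final class WordSetSketch {
    final long[] bits;
    final int wlen;

    WordSetSketch(long[] bits, int wlen) {
        this.bits = bits;
        this.wlen = wlen;
    }

    // Hashes every word below wlen, starting from a non-zero seed. Trailing
    // zero words still rotate the seed, so they change the result.
    int naiveHash() {
        return hashWords(wlen);
    }

    // Skips trailing zero words first, so the result depends only on set bits.
    int trimmedHash() {
        int n = wlen;
        while (n > 0 && bits[n - 1] == 0) {
            n--;
        }
        return hashWords(n);
    }

    private int hashWords(int n) {
        long h = 1L; // assumed seed, chosen only for the example
        for (int i = n; --i >= 0;) {
            h ^= bits[i];
            h = Long.rotateLeft(h, 1);
        }
        return (int) ((h >> 32) ^ h);
    }

    public static void main(String[] args) {
        WordSetSketch a = new WordSetSketch(new long[] {5L}, 1);
        WordSetSketch b = new WordSetSketch(new long[] {5L, 0L}, 2); // same bits, one spare word
        System.out.println("naive:   " + a.naiveHash() + " vs " + b.naiveHash());
        System.out.println("trimmed: " + a.trimmedHash() + " vs " + b.trimmedHash());
    }
}
```

Trimming before hashing is only one possible way out; moving the constant out of the rotation would be another.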
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-2216:
Attachment: openbitset.patch
OpenBitSet#hashCode() may return false for identical sets
[
https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801160#action_12801160
]
Dawid Weiss commented on LUCENE-2213:
-
Not to be picky, Michael, but is long promotion
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801195#action_12801195
]
Dawid Weiss commented on LUCENE-2216:
-
Hi Yonik,
This class is not thread-safe anyway
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801221#action_12801221
]
Dawid Weiss commented on LUCENE-2216:
-
This is only true if there is happens-before
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801230#action_12801230
]
Dawid Weiss commented on LUCENE-2216:
-
This is not entirely what I had in mind (it's
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801230#action_12801230
]
Dawid Weiss edited comment on LUCENE-2216 at 1/16/10 5:26 PM
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801240#action_12801240
]
Dawid Weiss commented on LUCENE-2216:
-
uff, I started having doubts in my own
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801263#action_12801263
]
Dawid Weiss commented on LUCENE-2216:
-
Chances of this happening are really slim
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801265#action_12801265
]
Dawid Weiss commented on LUCENE-2216:
-
For what it's worth, I checked the mentioned
[
https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801269#action_12801269
]
Dawid Weiss commented on LUCENE-2216:
-
Ok, argument accepted.
OpenBitSet#hashCode
[
https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12801278#action_12801278
]
Dawid Weiss commented on LUCENE-2213:
-
What Yonik suggested is yet another alternative
Feature
Components: contrib/*
Reporter: Dawid Weiss
Priority: Minor
Attachments: synonyms.patch
It would be useful to have a filter that provides support for indexing-time
synonym expansion, especially for multi-word synonyms (with multi-word matching
[
https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-1622:
Attachment: synonyms.patch
Token filter implementing synonyms. Java 1.5 is required to compile
Apologies for the delay, guys. I tried to solve certain issues that didn't pop
up in my application (as Kirill said, the problem is indeed quite complex). I
didn't find all the answers I had been looking for, but nonetheless -- the patch
that works for my needs is in JIRA. I would be really
It'd be great to get multi-word synonyms fully working...
I agree -- this is something that seems to be useful for a wider bunch of
people.
How would you change how Lucene indexes token positions to do this correctly?
Kirill has some interesting points to this. I have a busy day today,
Hello everyone,
I'm looking for feedback and thoughts on the following problem (it's more of a
development than a user-centered problem, hope the dev list is appropriate):
- a token stream is given,
- a set of synonyms is given, where synonyms are token sequences to be matched
and token
Your synonyms will break if you try searching for phrases.
Good point, I did write that filter, but I never actually got to searching for
exact phrases in it (there was a very specific scenario and we used prefix
queries which worked quite well).
Building on your example, food place in
Well, everyone has his own requirements for the search quality. For us
it was a problem.
The topic is subjective... I don't see this as a deterioration in search
quality. Let me explain.
Your example concerns phrase queries, so somebody would have to keep adding
terms to a phrase. My
engine. So guys looking for MSU CMC really want to get Московский
Государственный Университет, факультет ВМиК and his friends.
And? How often do they extend this particular phrase with further terms? It must
be fun to have an index running concurrently on multi language synonyms, mixing
the
I'm putting together a Google Web Toolkit-based version of Luke:
http://www.inperspective.com/lucene/Luke.war
This is neat, Mark!
At first I thought: darn, how the heck is he accessing the filesystem from
JavaScript (GWT or otherwise)?! Then it became clear to me that it's actually
the
This gets even more complicated when you throw Polish in. We do have diacritics
(such as ó, ż, ź or ą)
http://www.fileformat.info/info/unicode/char/0105/index.htm
but we _also_ have things like ł (l with a stroke):
http://www.fileformat.info/info/unicode/char/0142/index.htm
I don't think
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521350
]
Dawid Weiss commented on LUCENE-871:
Funny -- I just did the same, but my compiler (Eclipse JDT) generated
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521353
]
Dawid Weiss commented on LUCENE-871:
To clarify: depending on the compiler/HotSpot you may get linear time
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dawid Weiss updated LUCENE-871:
---
Attachment: ISOLatin1AccentFilterAlt.java
A table-lookup version of ISO latin filter
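
As a rough idea of what a table-lookup approach looks like (a sketch under assumptions, not the attached ISOLatin1AccentFilterAlt.java): a char[] indexed by code point replaces the per-character switch. The class name and the handful of mappings are made up for the example, and characters that expand to two letters (such as the AE ligature) are left out.

```java
// Illustrative sketch; the mapping below covers only a few Latin-1 letters.
final class AccentTableSketch {
    private static final char[] TABLE = new char[0x100];

    static {
        // Identity mapping by default.
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = (char) i;
        }
        // A few accented lowercase letters mapped to their base letter.
        for (char c = '\u00E0'; c <= '\u00E5'; c++) {
            TABLE[c] = 'a';
        }
        for (char c = '\u00E8'; c <= '\u00EB'; c++) {
            TABLE[c] = 'e';
        }
        for (char c = '\u00F2'; c <= '\u00F6'; c++) {
            TABLE[c] = 'o';
        }
    }

    // One array read per character; anything outside the table passes through.
    static String fold(String input) {
        char[] out = new char[input.length()];
        for (int i = 0; i < out.length; i++) {
            char c = input.charAt(i);
            out[i] = c < TABLE.length ? TABLE[c] : c;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00E9 \u00E0 la carte"));
    }
}
```

The cost per character is a bounds check and an array read regardless of how the compiler treats switch statements.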
I like it too. And I'm wondering what the response to this will be -- it will
in a way show if TREC really stands up to their mission, won't it?
D.
Grant Ingersoll wrote:
How does this sound:
Dear ,
My name is Grant Ingersoll and I am a committer on the Lucene Java search
library
[
https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521201
]
Dawid Weiss commented on LUCENE-871:
Not exactly true, Mike. Switch statements are implemented as table lookups
[
http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436972 ]
Dawid Weiss commented on LUCENE-675:
First -- I think it's a good initiative. Grant, when you're thinking about the
infrastructure, it would be pretty neat
Please contact Dawid Weiss (in CC:), he had a well-advanced port,
perhaps it just needs a little polishing (Polish-ing? :) .
Yes, this project is in fact still on my list... I do have a partial
implementation of Thinlet API that emulates it in Swing. With a JGoodies
look and feel