[jira] Commented: (LUCENE-2341) explore morfologik integration

2010-03-24 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849114#action_12849114 ] Dawid Weiss commented on LUCENE-2341: Oh, I forgot about this -- yes, you're right

[jira] Commented: (LUCENE-2298) Polish Analyzer

2010-03-23 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848648#action_12848648 ] Dawid Weiss commented on LUCENE-2298: The dictionary's author states

[jira] Commented: (LUCENE-2341) explore morfologik integration

2010-03-23 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848649#action_12848649 ] Dawid Weiss commented on LUCENE-2341: Robert, should I wait for Stempel patch first

[jira] Commented: (LUCENE-2298) Polish Analyzer

2010-03-22 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848054#action_12848054 ] Dawid Weiss commented on LUCENE-2298: Staszek suggested that perhaps it would

[jira] Commented: (LUCENE-2298) Polish Analyzer

2010-03-22 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848139#action_12848139 ] Dawid Weiss commented on LUCENE-2298: I agree about classpath issues, they're a pain

[jira] Commented: (LUCENE-2298) Polish Analyzer

2010-03-22 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848270#action_12848270 ] Dawid Weiss commented on LUCENE-2298: The answer from the developer is: pick any

[jira] Resolved: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-23 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-2221. Resolution: Later. I'm done with these benchmarks. The results so far indicate

[jira] Issue Comment Edited: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-23 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804166#action_12804166 ] Dawid Weiss edited comment on LUCENE-2221 at 1/23/10 10:57 PM

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: (was: benchmark.jar) Micro-benchmarks for ntz and pop (BitUtils) operations

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: benchmark.jar An updated set of benchmarks (simple loops and JRE ntz/pop). Micro

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: (was: lucene-bitset-benchmarks.zip) Micro-benchmarks for ntz and pop (BitUtils

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: lucene-bitset-benchmarks.zip Updated source code for the benchmarks. Micro

[jira] Commented: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803251#action_12803251 ] Dawid Weiss commented on LUCENE-2221: Confirmed, with a simple loop it is even faster

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-20 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: benchmarks.txt Benchmark results for array operations and iterators comparing

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-20 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: benchmark.jar Executable Java JAR with benchmarking code for anybody that wishes

[jira] Commented: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-20 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802823#action_12802823 ] Dawid Weiss commented on LUCENE-2221: I wrote a set of micro-benchmarks comparing

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-20 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: lucene-bitset-benchmarks.zip Benchmarks, source code. Micro-benchmarks for ntz

Intel I7 benchmark request.

2010-01-20 Thread Dawid Weiss
Hi there, Is there anyone with access to an Intel I7-machine? I'd be curious what the results of this benchmark are, given the new JVM intrinsics introduced in HotSpot 1.7: https://issues.apache.org/jira/browse/LUCENE-2221 There is an executable JAR file attached to the issue. Run with (must be

[jira] Commented: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-20 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802900#action_12802900 ] Dawid Weiss commented on LUCENE-2221: I do have a bunch of dinosaur-age computers

[jira] Commented: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-20 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802927#action_12802927 ] Dawid Weiss commented on LUCENE-2221: Results from Intel I7 -- an improvement

Re: Intel I7 benchmark request.

2010-01-20 Thread Dawid Weiss
FYI, the AMD Phenom also has the POPCNT instruction. Don't have access to a computer with this one either. Seems like I need to invest in hardware a bit. I have found a person with I7 though -- the results are attached to the JIRA issue, about 20% speedup. Dawid

Re: Intel I7 benchmark request.

2010-01-20 Thread Dawid Weiss
Interested in some Core i5 benchmarks? Sure, add them to the JIRA issue if you can, please. I just ran the benchmark locally on latest JDK 6 and it was slightly better than the I7 results you posted which made me wonder.. Well, like I said -- they may depend on the architecture of the

[jira] Commented: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-18 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801733#action_12801733 ] Dawid Weiss commented on LUCENE-2221: Yes, this would be my initial suggestion

[jira] Commented: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-18 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801858#action_12801858 ] Dawid Weiss commented on LUCENE-2221: Look closely at the results above, Yonik. I

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-18 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: (was: results-popntz.txt) Micro-benchmarks for ntz and pop (BitUtils) operations

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-18 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: results-popntz.txt Plain ASCII results. Micro-benchmarks for ntz and pop (BitUtils

[jira] Commented: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-18 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801888#action_12801888 ] Dawid Weiss commented on LUCENE-2221: I had a suspicion this must be the case. I even

[jira] Commented: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-17 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801371#action_12801371 ] Dawid Weiss commented on LUCENE-2213: How about if you assert that minTargetSize

[jira] Created: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-17 Thread Dawid Weiss (JIRA)
Components: Other Reporter: Dawid Weiss Priority: Trivial As suggested by Yonik, I performed a suite of micro-benchmarks to investigate the following: * pop() (bitCount) seems to be implemented in the same way (Hacker's Delight) as in the BitUtils class (Sun's standard library
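For readers unfamiliar with the two operations benchmarked in this issue, here is a minimal Java sketch. It is not the BitUtils source: the class name PopNtzSketch and both method bodies are illustrative assumptions. pop uses the well-known divide-and-conquer bit-counting trick, ntz is a plain loop, and main prints each result next to the JRE's Long.bitCount and Long.numberOfTrailingZeros for comparison.

```java
// Hypothetical illustration only; not taken from Lucene's BitUtils.
public final class PopNtzSketch {
    private PopNtzSketch() {}

    /** Population count (number of set bits) via divide-and-conquer. */
    public static int pop(long x) {
        x = x - ((x >>> 1) & 0x5555555555555555L);                          // 2-bit sums
        x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L); // 4-bit sums
        x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;                          // 8-bit sums
        return (int) ((x * 0x0101010101010101L) >>> 56);                    // add the bytes
    }

    /** Number of trailing zero bits using a simple loop; 64 for zero. */
    public static int ntz(long x) {
        if (x == 0) {
            return 64;
        }
        int n = 0;
        while ((x & 1L) == 0) {
            x >>>= 1;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, Long.MIN_VALUE, 0xF0L};
        for (long v : samples) {
            System.out.println(Long.toHexString(v)
                + " pop=" + pop(v) + " (JRE " + Long.bitCount(v) + ")"
                + " ntz=" + ntz(v) + " (JRE " + Long.numberOfTrailingZeros(v) + ")");
        }
    }
}
```

A real micro-benchmark would also need warm-up rounds and many iterations per measurement; this sketch only checks that the hand-written and JRE versions agree.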

[jira] Updated: (LUCENE-2221) Micro-benchmarks for ntz and pop (BitUtils) operations.

2010-01-17 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2221: Attachment: results-popntz.txt Performance test results. Micro-benchmarks for ntz and pop

[jira] Created: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
Components: Other Affects Versions: 3.0, 2.9.1, 2.9 Reporter: Dawid Weiss Priority: Minor OpenBitSet uses an internal buffer of long variables to store set bits and an additional 'wlen' index that points to the highest used component inside {@link #bits} buffer
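One way two logically identical sets could hash differently is when the hash covers trailing all-zero words of the buffer, so that a larger 'wlen' changes the result. The Java sketch below assumes that failure mode; it is not the attached patch and not OpenBitSet's code. The class name TrailingZeroSafeHash, the seed constant, and the mixing step are all hypothetical.

```java
// Hypothetical illustration only; not the OpenBitSet implementation.
public final class TrailingZeroSafeHash {
    private TrailingZeroSafeHash() {}

    /**
     * Hashes the first wlen words of bits, ignoring trailing zero words,
     * so sets with the same bits but different buffer sizes hash alike.
     */
    public static int hash(long[] bits, int wlen) {
        int last = wlen;
        while (last > 0 && bits[last - 1] == 0L) {
            last--; // skip words that contribute no set bits
        }
        long h = 0x98761234L; // arbitrary seed (assumption)
        for (int i = 0; i < last; i++) {
            h ^= bits[i];
            h = (h << 1) | (h >>> 63); // rotate left by one bit
        }
        return (int) ((h >> 32) ^ h); // fold 64 bits into 32
    }

    public static void main(String[] args) {
        long[] compact = {5L};
        long[] padded = {5L, 0L, 0L};
        System.out.println(hash(compact, 1) == hash(padded, 3));
    }
}
```

Whatever approach is chosen, equals() and hashCode() have to agree on how trailing zero words are treated.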

[jira] Updated: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-2216: Attachment: openbitset.patch OpenBitSet#hashCode() may return false for identical sets

[jira] Commented: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801160#action_12801160 ] Dawid Weiss commented on LUCENE-2213: Not to be picky, Michael, but is long promotion

[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801195#action_12801195 ] Dawid Weiss commented on LUCENE-2216: Hi Yonik, This class is not thread-safe anyway

[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801221#action_12801221 ] Dawid Weiss commented on LUCENE-2216: This is only true if there is happens-before

[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801230#action_12801230 ] Dawid Weiss commented on LUCENE-2216: This is not entirely what I had in mind (it's

[jira] Issue Comment Edited: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801230#action_12801230 ] Dawid Weiss edited comment on LUCENE-2216 at 1/16/10 5:26 PM

[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801240#action_12801240 ] Dawid Weiss commented on LUCENE-2216: uff, I started having doubts in my own

[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801263#action_12801263 ] Dawid Weiss commented on LUCENE-2216: Chances of this happening are really slim

[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801265#action_12801265 ] Dawid Weiss commented on LUCENE-2216: For what it's worth, I checked the mentioned

[jira] Commented: (LUCENE-2216) OpenBitSet#hashCode() may return false for identical sets.

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801269#action_12801269 ] Dawid Weiss commented on LUCENE-2216: Ok, argument accepted. OpenBitSet#hashCode

[jira] Commented: (LUCENE-2213) Small improvements to ArrayUtil.getNextSize

2010-01-16 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801278#action_12801278 ] Dawid Weiss commented on LUCENE-2213: What Yonik suggested is yet another alternative

[jira] Created: (LUCENE-1622) Multi-word synonym filter (synonym expansion at indexing time).

2009-04-28 Thread Dawid Weiss (JIRA)
Feature Components: contrib/* Reporter: Dawid Weiss Priority: Minor Attachments: synonyms.patch It would be useful to have a filter that provides support for indexing-time synonym expansion, especially for multi-word synonyms (with multi-word matching

[jira] Updated: (LUCENE-1622) Multi-word synonym filter (synonym expansion at indexing time).

2009-04-28 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-1622: Attachment: synonyms.patch Token filter implementing synonyms. Java 1.5 is required to compile

Re: Synonym filter with support for phrases?

2009-04-28 Thread Dawid Weiss
Apologies for the delay, guys. I tried to solve certain issues that didn't pop up in my application (as Kirill said, the problem is indeed quite complex). I didn't find all the answers I had been looking for, but nonetheless -- the patch that works for my needs is in JIRA. I would be really

Re: Synonym filter with support for phrases?

2009-04-23 Thread Dawid Weiss
It'd be great to get multi-word synonyms fully working... I agree -- this is something that seems to be useful for a wider bunch of people. How would you change how Lucene indexes token positions to do this correctly? Kirill has some interesting points to this. I have a busy day today,

Synonym filter with support for phrases?

2009-04-22 Thread Dawid Weiss
Hello everyone, I'm looking for feedback and thoughts on the following problem (it's more of a development than a user-centered problem, so I hope the dev list is appropriate): - a token stream is given, - a set of synonyms is given, where synonyms are token sequences to be matched and token
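The matching half of the problem stated above (find token sequences from a synonym set inside a token stream) can be sketched in Java as a longest-match scan. This is a simplified assumption, not the filter from the later patch: it works on an in-memory list rather than a Lucene TokenStream, it ignores token positions entirely, and the class name PhraseSynonymSketch and the "start+length=synonym" output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a Lucene TokenFilter.
public final class PhraseSynonymSketch {
    private PhraseSynonymSketch() {}

    /**
     * Scans tokens left to right. At each position it tries the longest
     * candidate sequence first (up to maxLen tokens) and records a match
     * as "start+length=synonym". Matched tokens are then skipped.
     */
    public static List<String> match(List<String> tokens,
                                     Map<List<String>, String> synonyms,
                                     int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                String synonym = synonyms.get(tokens.subList(i, i + len));
                if (synonym != null) {
                    out.add(i + "+" + len + "=" + synonym);
                    matched = len;
                    break;
                }
            }
            i += Math.max(matched, 1); // always advance by at least one token
        }
        return out;
    }

    public static void main(String[] args) {
        Map<List<String>, String> synonyms = new HashMap<>();
        synonyms.put(Arrays.asList("new", "york"), "ny");
        synonyms.put(Arrays.asList("new", "york", "city"), "nyc");
        List<String> tokens = Arrays.asList("pizza", "in", "new", "york", "city");
        System.out.println(match(tokens, synonyms, 3));
    }
}
```

The hard part discussed in the thread, how to place multi-word synonyms at token positions so that phrase queries still work, is deliberately left out of this sketch.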

Re: Synonym filter with support for phrases?

2009-04-22 Thread Dawid Weiss
Your synonyms will break if you try searching for phrases. Good point, I did write that filter, but I never actually got to searching for exact phrases in it (there was a very specific scenario and we used prefix queries which worked quite well). Building on your example, food place in

Re: Synonym filter with support for phrases?

2009-04-22 Thread Dawid Weiss
Well, everyone has his own requirements for the search quality. For us it was a problem. The topic is subjective... I don't see this as a deterioration in search quality. Let me explain. Your example concerns phrase queries, so somebody would have to keep adding terms to a phrase. My

Re: Synonym filter with support for phrases?

2009-04-22 Thread Dawid Weiss
engine. So guys looking for MSU CMC really want to get Московский Государственный Университет, факультет ВМиК and his friends. And? How often do they extend this particular phrase with further terms? It must be fun to have an index running concurrently on multi language synonyms, mixing the

Re: Web-based Luke

2007-11-14 Thread Dawid Weiss
I'm putting together a Google Web Toolkit-based version of Luke: http://www.inperspective.com/lucene/Luke.war This is neat, Mark! At first I thought: darn, how the heck is he accessing the filesystem from JavaScript (GWT or otherwise)?! Then it became clear to me that it's actually the

Re: [jira] Updated: (LUCENE-1029) Illegal character replacements in ISOLatin1AccentFilter

2007-10-17 Thread Dawid Weiss
This gets even more complicated when you throw Polish in. We do have diacritics (such as ó, ż, ź or ą) http://www.fileformat.info/info/unicode/char/0105/index.htm but we _also_ have things like ł (l with a stroke): http://www.fileformat.info/info/unicode/char/0142/index.htm I don't think

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-08-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521350 ] Dawid Weiss commented on LUCENE-871: Funny -- I just did the same, but my compiler (Eclipse JDT) generated

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-08-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521353 ] Dawid Weiss commented on LUCENE-871: To clarify: depending on the compiler/ hotspot you may get linear time

[jira] Updated: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-08-21 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-871: Attachment: ISOLatin1AccentFilterAlt.java A table-lookup version of ISO latin filter
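As a rough illustration of what a table-lookup approach to accent removal can look like, here is a minimal Java sketch. It is not the attached ISOLatin1AccentFilterAlt.java: the class name AccentTableSketch is hypothetical, only a handful of Latin-1 letters are mapped, and multi-character replacements (such as a ligature expanding to two letters) are not handled.

```java
// Hypothetical illustration only; not the attachment from this issue.
public final class AccentTableSketch {
    /** Replacement for every char below 256; identity unless overridden. */
    private static final char[] TABLE = new char[256];

    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = (char) i;
        }
        map("\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5", 'A');
        map("\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5", 'a');
        map("\u00C8\u00C9\u00CA\u00CB", 'E');
        map("\u00E8\u00E9\u00EA\u00EB", 'e');
        map("\u00D2\u00D3\u00D4\u00D5\u00D6", 'O');
        map("\u00F2\u00F3\u00F4\u00F5\u00F6", 'o');
    }

    private AccentTableSketch() {}

    private static void map(String from, char to) {
        for (char c : from.toCharArray()) {
            TABLE[c] = to;
        }
    }

    /** Replaces mapped accented letters; other characters pass through. */
    public static String fold(String s) {
        char[] out = s.toCharArray();
        for (int i = 0; i < out.length; i++) {
            char c = out[i];
            if (c < TABLE.length) {
                out[i] = TABLE[c]; // one array read per character
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00E9 cr\u00E8me"));
    }
}
```

The cost per character is a bounds check and one array read, regardless of how many letters are mapped, which is the property that makes a table attractive compared with a long chain of comparisons.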

Re: TREC Collection, NIST and Lucene

2007-08-20 Thread Dawid Weiss
I like it too. And I'm wondering what the response to this will be -- it will in a way show if TREC really stands up to their mission, won't it? D. Grant Ingersoll wrote: How does this sound: Dear , My name is Grant Ingersoll and I am a committer on the Lucene Java search library

[jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow

2007-08-20 Thread Dawid Weiss (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521201 ] Dawid Weiss commented on LUCENE-871: Not exactly true, Mike. Switch statements are implemented as table lookups

[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-09-22 Thread Dawid Weiss (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12436972 ] Dawid Weiss commented on LUCENE-675: First -- I think it's a good initiative. Grant, when you're thinking about the infrastructure, it would be pretty neat

Re: Luke - in need of maintainer

2006-06-01 Thread Dawid Weiss
Please contact Dawid Weiss (in CC:), he had a well-advanced port, perhaps it just needs a little polishing (Polish-ing? :) . Yes, this project is in fact still on my list... I do have a partial implementation of Thinlet API that emulates it in Swing. With a JGoodies look and feel