[jira] Commented: (LUCENE-504) FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2009-11-03 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773402#action_12773402 ] Nadav Har'El commented on LUCENE-504: - Hi Uwe, I think that even though PriorityQueue

[jira] Commented: (LUCENE-1899) Inefficient growth of OpenBitSet

2009-09-09 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752931#action_12752931 ] Nadav Har'El commented on LUCENE-1899: -- Hi Shai, I guess you're right that if there's

[jira] Commented: (LUCENE-1899) Inefficient growth of OpenBitSet

2009-09-09 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753019#action_12753019 ] Nadav Har'El commented on LUCENE-1899: -- Yes, you're right, 12.5%. Or actually, 11

[jira] Created: (LUCENE-1899) Inefficient growth of OpenBitSet

2009-09-08 Thread Nadav Har'El (JIRA)
Reporter: Nadav Har'El Priority: Minor Hi, I found a potentially serious efficiency problem with OpenBitSet. One typical (I think) way to build a bit set is to set() the bits one by one - e.g., have a HitCollector set() the bit for each matching document. The underlying array
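The report above is cut off before it describes how the underlying array actually grows, so the following is only a generic sketch of why the growth policy matters when bits are set one by one. It does not use or reproduce OpenBitSet; the class `BitSetGrowthSketch`, the method `wordsCopied`, and both growth policies (exact-fit and roughly 1.5x) are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only: this is NOT Lucene's OpenBitSet.
public class BitSetGrowthSketch {

    /**
     * Sets bits 0..numBits-1 one at a time in a long[]-backed bit set and
     * returns how many 64-bit words were copied while growing the array.
     *
     * @param geometric if true, grow by about 1.5x; if false, grow to exactly
     *                  the size needed for the bit being set
     */
    static long wordsCopied(int numBits, boolean geometric) {
        long[] words = new long[0];
        long copied = 0;
        for (int bit = 0; bit < numBits; bit++) {
            int wordIndex = bit >>> 6;            // 64 bits per word
            if (wordIndex >= words.length) {
                int needed = wordIndex + 1;
                int newLength = geometric
                        ? Math.max(needed, words.length + (words.length >> 1))
                        : needed;
                copied += words.length;           // every old word is copied on growth
                words = Arrays.copyOf(words, newLength);
            }
            words[wordIndex] |= 1L << (bit & 63);
        }
        return copied;
    }

    public static void main(String[] args) {
        int numBits = 1 << 16;
        System.out.println("exact-fit growth, words copied: " + wordsCopied(numBits, false));
        System.out.println("geometric growth, words copied: " + wordsCopied(numBits, true));
    }
}
```

With exact-fit growth the total copying is quadratic in the number of words, while geometric growth keeps it linear; which policy the real class used at the time is not recoverable from the snippet.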

Re: Modularization

2009-04-01 Thread Nadav Har'El
: this wouldn't preclude us from offering a bloated jar containing everything under the sun) Again, I wholeheartedly agree. -- Nadav Har'El| Wednesday, Apr 1 2009, 7 Nisan 5769 IBM Haifa Research Lab

Re: Is TopDocCollector's collect() implementation correct?

2009-03-22 Thread Nadav Har'El
be final. -- Nadav Har'El|Sunday, Mar 22 2009, 26 Adar 5769 IBM Haifa Research Lab |- |Did you sleep well? No, I made a http://nadav.harel.org.il |couple of mistakes

Re: IndexWriter.rollback() logic

2009-03-18 Thread Nadav Har'El
() followed by a new open(), but a person reading this javadoc wouldn't know that. -- Nadav Har'El| Wednesday, Mar 18 2009, 22 Adar 5769 IBM Haifa Research Lab |- |Hi! I'm a signature

Re: 2.9/3.0 plan Java 1.5

2008-12-14 Thread Nadav Har'El
not be backward compatible (although, for 3.0 we may decide that this is not absolutely necessary). -- Nadav Har'El| Sunday, Dec 14 2008, 18 Kislev 5769 IBM Haifa Research Lab

[jira] Commented: (LUCENE-1470) Add TrieRangeQuery to contrib

2008-12-01 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652002#action_12652002 ] Nadav Har'El commented on LUCENE-1470: -- Hi, I just wanted to comment that a similar

Re: [jira] Created: (LUCENE-1439) Inconsistent API

2008-11-11 Thread Nadav Har'El
this API is not at all inconsistent - maybe it is just a bit redundant and a bit confusing or not documented well enough (although I don't think the latter is true). Nadav. -- Nadav Har'El|Tuesday, Nov 11 2008, 14 Heshvan 5769 IBM Haifa Research Lab

Re: Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Nadav Har'El
behavior as an option. Anyway, this is just my opinion (not backed by any hard research or experimentation), so it might be wrong. -- Nadav Har'El| Monday, Oct 13 2008, 14 Tishri 5769 IBM Haifa Research Lab

Re: draft 2.4 announcement

2008-10-05 Thread Nadav Har'El
, Nadav. -- Nadav Har'El| Sunday, Oct 5 2008, 6 Tishri 5769 IBM Haifa Research Lab |- |Anyone who quotes me in their sig is an http://nadav.harel.org.il |idiot. -- Rusty

Re: draft 2.4 announcement

2008-10-05 Thread Nadav Har'El
. Searching with a Filter is now more efficient: now the filter is applied to a document before scoring is done. Thanks, it's better I think. Maybe it even deserves its own bullet - I don't think there's too much connection between the two improvements? Thanks, Nadav. -- Nadav Har'El

[jira] Commented: (LUCENE-1382) Allow storing user data when IndexWriter.commit() is called

2008-09-12 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630561#action_12630561 ] Nadav Har'El commented on LUCENE-1382: -- Hi Mike, If you add this feature, I suggest

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread Nadav Har'El
want to develop apps with a small foot print. I agree that this is an important goal. At one point there was even talk of refactoring additional code out of the core and into a contrib (this was already done with some analyzers when Lucene became a TLP) -- Nadav Har'El

Re: Extending TopDocCollector

2008-08-13 Thread Nadav Har'El
, database data, or whatever). Does anyone disagree? Is there a reason why this change should not be done? -- Nadav Har'El| Wednesday, Aug 13 2008, 12 Av 5768 IBM Haifa Research Lab

The 2GB segment size limit

2008-06-25 Thread Nadav Har'El
are still quite common, so I think this is a problem we should solve properly. Thanks, Nadav. -- Nadav Har'El|Wednesday, Jun 25 2008, 22 Sivan 5768 [EMAIL PROTECTED] |- Phone +972-523-790466, ICQ 13349191 |Committee: A group

[jira] Commented: (LUCENE-1314) IndexReader.reopen(boolean force)

2008-06-24 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607857#action_12607857 ] Nadav Har'El commented on LUCENE-1314: -- At first glance, my opinion was that adding

Re: WebLuke - include Jetty in Lucene binary distribution?

2008-04-27 Thread Nadav Har'El
but not of J2SE, so you need to include this as well if you want to use the servlet API). And that's it. I'm sure that similar tiny Web Servers can also be found on the Web, but if there's interest, I can see about publishing mine. -- Nadav Har'El| Sunday, Apr 27

[jira] Commented: (LUCENE-954) Toggle score normalization in Hits

2008-03-16 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579258#action_12579258 ] Nadav Har'El commented on LUCENE-954: - I hate to rain on the parade, but maybe instead

Re: Unique doc ids

2008-01-23 Thread Nadav Har'El
support the necessary calls (either a deleteDocuments(Query) or a deleteDocuments(int docid) call), but I don't see why this can't be fixed without adding new concepts (like UID) to the index. Or maybe I'm missing something? -- Nadav Har'El| Wednesday, Jan 23 2008, 17

[jira] Commented: (LUCENE-997) Add search timeout support to Lucene

2007-12-15 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552174 ] Nadav Har'El commented on LUCENE-997: - I'd like to add my 2 cents on this issue. The more I use Lucene

[jira] Commented: (LUCENE-1088) PriorityQueue 'wouldBeInserted' method

2007-12-12 Thread Nadav Har'El (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550964 ] Nadav Har'El commented on LUCENE-1088: -- Michael, I agree - the most important fix was to make heap protected

Re: search quality - assessment improvements

2007-06-26 Thread Nadav Har'El
, Ronny Lempel and Aya Soffer, SIGIR 2004, http://einat.webir.org/SIGIR_2004_Trels_p10-amitay.pdf -- Nadav Har'El| Tuesday, Jun 26 2007, 10 Tammuz 5767 IBM Haifa Research Lab

Re: IndexReader.isCurrent in presence of many files

2007-05-13 Thread Nadav Har'El
be great. -- Nadav Har'El| Sunday, May 13 2007, 25 Iyyar 5767 IBM Haifa Research Lab |- |How do you get holy water? Boil the hell http://nadav.harel.org.il |out

Re: Concurrent merge

2007-02-21 Thread Nadav Har'El
require significant system resources to add benefit. See my comments above on why multiple concurrent merges might be necessary, depending on what the benefit you were aiming at. Thanks, Nadav. -- Nadav Har'El| Wednesday, Feb 21 2007, 3 Adar 5767 IBM Haifa Research

Re: determining whether a directory is on NFS?

2007-01-22 Thread Nadav Har'El
as a ... file system on stderr, it's not NFS. If the result on stdout has one line, it's NFS. It's (very) ugly, but it can work. Of course, NFS is not the only network file system out there. -- Nadav Har'El| Monday, Jan 22 2007, 3 Shevat 5767 IBM Haifa Research Lab

Re: adding explicit commits to Lucene?

2007-01-17 Thread Nadav Har'El
does make sense, instead of repeating the same code and/or functionality in both IndexReader and IndexWriter. -- Nadav Har'El|Wednesday, Jan 17 2007, 27 Tevet 5767 [EMAIL PROTECTED] |- Phone +972-523-790466, ICQ

Re: Payloads

2007-01-10 Thread Nadav Har'El
(basically, the list of categories that this document belongs to). I'm not saying this is the best way to do it, and certainly not the cleanest, but it's just one of the things that payloads enable you to do. -- Nadav Har'El|Wednesday, Jan 10 2007, 20 Tevet 5767 IBM

Re: Payloads

2007-01-03 Thread Nadav Har'El
and requires writing some sort of new Analyzer - one that will do the regular analysis that I want for the regular fields, and add the payload to the one specific field that lists the facets. Am I understanding correctly? Or am I missing a better way to do this? Thanks, Nadav. -- Nadav Har'El

Re: Controlling Hits

2006-11-26 Thread Nadav Har'El
, and recommending the TopDocs alternatives instead? -- Nadav Har'El| Sunday, Nov 26 2006, 5 Kislev 5767 IBM Haifa Research Lab |- |God created the world out of nothing, but http

[jira] Commented: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-26 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444903 ] Nadav Har'El commented on LUCENE-695: - If given a null array? Is this ever done in Lucene? Which should be fixed, the testcase or the code? I don't know

[jira] Created: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
: Store Affects Versions: 2.0.0 Reporter: Nadav Har'El Priority: Minor During a profiling session, I discovered that BufferedIndexInput.readBytes(), the function which reads a bunch of bytes from an index, is very inefficient in many cases. It is efficient for one or two

[jira] Updated: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-695?page=all ] Nadav Har'El updated LUCENE-695: Attachment: readbytes.patch The patch, which includes the change to BufferedIndexInput.readBytes(), and a new unit test for that class. Improve

[jira] Commented: (LUCENE-695) Improve BufferedIndexInput.readBytes() performance

2006-10-24 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-695?page=comments#action_12444322 ] Nadav Har'El commented on LUCENE-695: - Sorry, I didn't notice that my fix broke this unit test. Thanks for catching that. What is happening is interesting

BufferedIndexInput performance improvement

2006-10-23 Thread Nadav Har'El
comments, or knows of any reason why the existing code was so inefficient (while the code in BufferedIndexOutput makes more sense), I'd love to hear. If a committer will agree to commit this change, even better :-) When JIRA is back online, I'll put the patches there too. Thanks, Nadav Har'El

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-07-07 Thread Nadav Har'El
this capability in an application which indexed emails and attachments, and when an email document was deleted I also had to delete the attached documents (listed in a field of the email) from the index. -- Nadav Har'El| Friday, Jul 7 2006, 11 Tammuz 5766 IBM Haifa Research Lab

Proximity-enhanced boolean scoring (was: Re: Flexible index format / Payloads Cont'd)

2006-07-06 Thread Nadav Har'El
this is possible, because the ideas you raised (adding weight to Spans or spans to Scorer) will require significant changes to many of Lucene's existing query types, or duplication of these query types, something which I'd rather avoid if possible. -- Nadav Har'El| Thursday, Jul 6

[jira] Updated: (LUCENE-623) RAMDirectory.close() should have a comment about not releasing any resources

2006-07-06 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-623?page=all ] Nadav Har'El updated LUCENE-623: Attachment: ramdirectory.diff I propose a trivial patch, which does two very simple things: 1. RAMDirectory.close(), instead of being a no-op, sets files=null

Re: Flexible index format / Payloads Cont'd

2006-07-04 Thread Nadav Har'El
this with fractions like 1.5 :-) -- Nadav Har'El| Wednesday, Jul 5 2006, 9 Tammuz 5766 IBM Haifa Research Lab |- |Why do we drive on a parkway and park on http://nadav.harel.org.il

Re: Flexible index format / Payloads Cont'd

2006-06-30 Thread Nadav Har'El
. A better solution would be like you said, to create a modified version of BooleanQuery's scoring. -- Nadav Har'El| Friday, Jun 30 2006, 4 Tammuz 5766 IBM Haifa Research Lab |- |Give Yogi

[jira] Commented: (LUCENE-504) FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2006-06-29 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-504?page=comments#action_12418446 ] Nadav Har'El commented on LUCENE-504: - Hi Otis, you did not comment on my patch (fuzzyquery.patch), which I think solves your objections to Doron's previous patch. Do you

Re: Limit of QueryParser ?

2006-06-29 Thread Nadav Har'El
in a query parser expression. The default limit is 1024, but you can change it with BooleanQuery.setMaxClauseCount() Note, however, that if you really use such huge queries, they may be extremely slow. -- Nadav Har'El| Thursday, Jun 29 2006, 3 Tammuz 5766 IBM

Combining Hits and HitCollector

2006-06-27 Thread Nadav Har'El
. -- Nadav Har'El| Tuesday, Jun 27 2006, 1 Tammuz 5766 IBM Haifa Research Lab |- |Unix is user friendly - it's just picky http://nadav.harel.org.il |about its friends

Re: Combining Hits and HitCollector

2006-06-27 Thread Nadav Har'El
thought as to whether we should continue demonstrating the use of Hits (rather than TopDocs) in most Lucene examples, and whether perhaps, the Hits API should be deprecated. Nadav. -- Nadav Har'El| Tuesday, Jun 27 2006, 2 Tammuz 5766 IBM Haifa Research Lab

Re: Scoring

2006-06-15 Thread Nadav Har'El
? A Similarity? Or what? I think this is an interesting topic. -- Nadav Har'El [EMAIL PROTECTED] +972-4-829-6326 Grant Ingersoll [EMAIL PROTECTED

[jira] Updated: (LUCENE-504) FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount

2006-06-14 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-504?page=all ] Nadav Har'El updated LUCENE-504: Attachment: fuzzyquery.patch This is my proposed patch described above. FuzzyQuery produces a java.lang.NegativeArraySizeException

Re: Lucene Planning

2006-05-31 Thread Nadav Har'El
is in. A document can either be, or not be, in a category, but there is no significance in the order of these categories in a document's list. -- Nadav Har'El - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL

Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-05-09 Thread Nadav Har'El
previously) and the rest end up not finding an old document and not deleting anything. I expect this benchmark to show an even greater improvement of your approach over the naive IndexModifier. -- Nadav Har'El - To unsubscribe, e-mail

[jira] Commented: (LUCENE-554) Possible index corruption if crashing while replacing segments file

2006-05-07 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-554?page=comments#action_12378295 ] Nadav Har'El commented on LUCENE-554: - Hi Otis, sorry about lingering with this patch (I've been very busy, not to mention a daughter two weeks ago :-) I still want

[jira] Created: (LUCENE-554) Possible index corruption if crashing while replacing segments file

2006-04-23 Thread Nadav Har'El (JIRA)
Versions: 1.9 Reporter: Nadav Har'El Priority: Minor Lucene's indexing is expected to be reasonably tolerant to computer crashes or the indexing process being killed. By reasonably tolerant, I mean that it is ok to lose a few documents (those currently buffered in memory), or have

Crash tolerance in Lucene

2006-04-20 Thread Nadav Har'El
it. And of course, the "This replacement should be atomic" comment in Directory.renameFile() must be revised. If what I'm saying sounds logical, I'll open a JIRA entry and propose a patch. Is anyone aware of other crash in-tolerance issues in Lucene that I should consider working on? Thanks, Nadav. -- Nadav

[jira] Commented: (LUCENE-130) org.apache.lucene.search.Query.toString(String field) ignores it's only parameter

2006-04-10 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-130?page=comments#action_12373847 ] Nadav Har'El commented on LUCENE-130: - toString(field) works very well, if you understand what it does. Perhaps the javadoc isn't explicit enough on what it does and need

[jira] Commented: (LUCENE-322) [PATCH] Add IndexSearcher.numDocs() method

2006-04-10 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-322?page=comments#action_12373849 ] Nadav Har'El commented on LUCENE-322: - I wonder, is this change at all necessary? After all, we have the IndexSearcher().getIndexReader() function, which returns

Re: 1.9 RC1

2006-02-19 Thread Nadav Har'El
constructor or convenience function that will do the right thing for opening a potentially-existing index. -- Nadav Har'El. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]