Re: A Comparison of Open Source Search Engines

2009-07-06 Thread John Wang
mg4j is a nice project. It is missing the incremental aspects as well. The "older" paper this experiment mentioned contains lucene-mg4j comparisons. -John On Mon, Jul 6, 2009 at 2:01 PM, Earwin Burrfoot wrote: > I'd say out of these libraries only Lucene and Sphinx are worth mentioning. > > Ther

[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727882#action_12727882 ] Jason Rutherglen commented on LUCENE-1726: -- When I moved the sync block around in

[jira] Updated: (LUCENE-1522) another highlighter

2009-07-06 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-1522: --- Attachment: LUCENE-1522.patch Thank you for your advice, Michael. bq. because they test mul

[jira] Assigned: (LUCENE-1717) IndexWriter does not properly account for the RAM consumed by pending deletes

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1717: -- Assignee: Michael McCandless > IndexWriter does not properly account for the R

[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727843#action_12727843 ] Michael McCandless commented on LUCENE-1726: Yes, we should eventually see a f

[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727823#action_12727823 ] Jason Rutherglen commented on LUCENE-1726: -- Shouldn't we be seeing an exception i

[jira] Updated: (LUCENE-1727) Order of stored Fields not maintained

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1727: --- Attachment: LUCENE-1727.patch Attached patch. I moved StoredFieldsWriter up in the

[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727813#action_12727813 ] Michael McCandless commented on LUCENE-1726: The hazard is something like this

[jira] Commented: (LUCENE-1718) IndexReader.setTermInfosIndexDivisor doesn't carry over to reopened readers

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727792#action_12727792 ] Michael McCandless commented on LUCENE-1718: Thanks Tim. This should be fixed

[jira] Commented: (LUCENE-1718) IndexReader.setTermInfosIndexDivisor doesn't carry over to reopened readers

2009-07-06 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727791#action_12727791 ] Tim Smith commented on LUCENE-1718: --- perfect i had checked your last patch on LUCENE-16

[jira] Resolved: (LUCENE-1735) IndexReader.reopen() does not retain TermInfosIndexDivisor setting for newly opened segments

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1735. Resolution: Duplicate This is a dup of LUCENE-1718. > IndexReader.reopen() does n

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread eks dev
> Anybody knows other interesting open-source search engines? Minion (https://minion.dev.java.net/) - Original Message > From: Earwin Burrfoot > To: java-dev@lucene.apache.org > Sent: Monday, 6 July, 2009 23:01:52 > Subject: Re: A Comparison of Open Source Search Engines > > I'd sa

[jira] Created: (LUCENE-1735) IndexReader.reopen() does not retain TermInfosIndexDivisor setting for newly opened segments

2009-07-06 Thread Tim Smith (JIRA)
IndexReader.reopen() does not retain TermInfosIndexDivisor setting for newly opened segments Key: LUCENE-1735 URL: https://issues.apache.org/jira/browse/LUCENE-1735

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread Earwin Burrfoot
I'd say out of these libraries only Lucene and Sphinx are worth mentioning. There's also MG4J, which wasn't covered and has a nice algorithmic background. Anybody knows other interesting open-source search engines? On Tue, Jul 7, 2009 at 00:39, John Wang wrote: > Vik did a very nice job. > One th

Re: A Comparison of Open Source Search Engines

2009-07-06 Thread John Wang
Vik did a very nice job. One thing the experiment did not mention is that Lucene handles incremental updates, whereas many of the other "competitors" do not. So the indexing performance comparison is not really fair. -John On Mon, Jul 6, 2009 at 8:06 AM, Sean Owen wrote: > > http://zooie.wordpre

Re: Execute a testcase method via ant?

2009-07-06 Thread Jason Rutherglen
I'll make an issue for testing by method, it should be easier to implement than multithreading JUnit (which seems to require core ANT/JUnit work). On Mon, Jul 6, 2009 at 12:26 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > I would love to have the -Dtestmethod=XXX! > > Mike > > On T

[jira] Commented: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727740#action_12727740 ] Michael McCandless commented on LUCENE-1566: bq. I did hit the error while I d

[jira] Commented: (LUCENE-1704) org.apache.lucene.ant.HtmlDocument added Tidy config file passthrough availability

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727732#action_12727732 ] Michael McCandless commented on LUCENE-1704: OK the patch looks good -- I'll c

[jira] Resolved: (LUCENE-1704) org.apache.lucene.ant.HtmlDocument added Tidy config file passthrough availability

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1704. Resolution: Fixed Thanks Keith! > org.apache.lucene.ant.HtmlDocument added Tidy c

[jira] Commented: (LUCENE-1522) another highlighter

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727728#action_12727728 ] Michael McCandless commented on LUCENE-1522: Is it possible to decouple this i

[jira] Updated: (LUCENE-1704) org.apache.lucene.ant.HtmlDocument added Tidy config file passthrough availability

2009-07-06 Thread Keith Sprochi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Sprochi updated LUCENE-1704: -- Description: Parsing HTML documents using the org.apache.lucene.ant.HtmlDocument.Document met

[jira] Closed: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-06 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood closed LUCENE-1486. Resolution: Fixed Committed in 791579 - http://svn.apache.org/viewvc?rev=791579&view=rev > Wildc

[jira] Commented: (LUCENE-1704) org.apache.lucene.ant.HtmlDocument added Tidy config file passthrough availability

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727723#action_12727723 ] Michael McCandless commented on LUCENE-1704: There is a preview button (that s

Re: Execute a testcase method via ant?

2009-07-06 Thread Michael McCandless
I would love to have the -Dtestmethod=XXX! Mike On Tue, Jun 23, 2009 at 7:42 PM, Jason Rutherglen wrote: > More like ant test -Dtestcase=TestSort -Dtestmethod=testMultiSort > > or > > ant test -Dtestcase=TestSort.testMultiSort > > I Googled a lot for "ant junit test method" and variants.  Couldn'

[jira] Resolved: (LUCENE-1591) Enable bzip compression in benchmark

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1591. - Resolution: Fixed committed > Enable bzip compression in benchmark > --

[jira] Updated: (LUCENE-1609) Eliminate synchronization contention on initial index reading in TermInfosReader ensureIndexIsRead

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1609: --- Attachment: LUCENE-1609.patch Attached patch. This addresses this issue and LUCENE-

[jira] Updated: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1726: - Attachment: LUCENE-1726.patch * New SRMapValue is strongly typed * All tests pass {quo

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727695#action_12727695 ] Tim Smith commented on LUCENE-1721: --- That looks like it's pretty close, and is definitely

[jira] Assigned: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned LUCENE-1486: --- Assignee: Mark Harwood (was: Mark Miller) > Wildcards, ORs etc inside Phrase queries >

[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727692#action_12727692 ] Mark Miller commented on LUCENE-1486: - Please, by all means ! :) > Wildcards, ORs etc

[jira] Updated: (LUCENE-1650) Small fix in CustomScoreQuery JavaDoc

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1650: Affects Version/s: (was: 3.0) (was: 2.9) Fix Version/s:

[jira] Commented: (LUCENE-1650) Small fix in CustomScoreQuery JavaDoc

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727688#action_12727688 ] Mark Miller commented on LUCENE-1650: - bq. Not sure why you wanted me to take a peek -

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727683#action_12727683 ] Yonik Seeley commented on LUCENE-1721: -- bq. Absolutely nothing would have to have act

[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-06 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727685#action_12727685 ] Mark Harwood commented on LUCENE-1486: -- Hi Mark, Mind if I try committing this patch?

[jira] Commented: (LUCENE-1650) Small fix in CustomScoreQuery JavaDoc

2009-07-06 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727675#action_12727675 ] Yonik Seeley commented on LUCENE-1650: -- Not sure why you wanted me to take a peek - t

[jira] Commented: (LUCENE-1591) Enable bzip compression in benchmark

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727668#action_12727668 ] Michael McCandless commented on LUCENE-1591: Thanks Mark! > Enable bzip compre

[jira] Updated: (LUCENE-1591) Enable bzip compression in benchmark

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1591: Attachment: LUCENE-1591.patch Looks like this spread a little in the docmaker/contentsource breaku

[jira] Reopened: (LUCENE-1591) Enable bzip compression in benchmark

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened LUCENE-1591: - Assignee: Mark Miller Lucene Fields: [New, Patch Available] (was: [New]) some java 1.5

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727646#action_12727646 ] Tim Smith commented on LUCENE-1721: --- Absolutely nothing would have to have actually chan

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727641#action_12727641 ] Yonik Seeley commented on LUCENE-1721: -- bq. but some custom caches may not work on a

Re: addIndexesNoOptimize

2009-07-06 Thread Jason Rutherglen
> MergePolicy expects to receive SegmentInfo instances I ran into this implementing LUCENE-1589. On Mon, Jul 6, 2009 at 3:18 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > On Mon, Jul 6, 2009 at 2:18 AM, John Wang wrote: > > > Currently, addIndexesNoOptimize(Directory[] dir) is

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727639#action_12727639 ] Jason Rutherglen commented on LUCENE-1721: -- I'm still of the somewhat naive opini

A Comparison of Open Source Search Engines

2009-07-06 Thread Sean Owen
http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/ I imagine many of you already saw this -- Lucene does pretty well in this "shootout". The only area it tended to lag, it seems, is memory usage and speed in some cases.

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727623#action_12727623 ] Tim Smith commented on LUCENE-1721: --- bq. Sounds like you could perhaps use reopen() or t

[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1486: Attachment: LUCENE-1486.patch Whoops - almost let some 1.5 slip by: throw new IllegalArgumentExc

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727620#action_12727620 ] Yonik Seeley commented on LUCENE-1721: -- bq. obviously, this is rather impractical as

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Tim Smith (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727614#action_12727614 ] Tim Smith commented on LUCENE-1721: --- One thing that would be nice to see is a boolean re

[jira] Updated: (LUCENE-1650) Small fix in CustomScoreQuery JavaDoc

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1650: Attachment: LUCENE-1650.patch updated to trunk in any case. > Small fix in CustomScoreQuery JavaD

RE: small faults in new Numeric* class Javadoc

2009-07-06 Thread Uwe Schindler
Thanks, I fix. It is just copy'n'paste errors! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] > Sent: Monday, July 06, 2009 6:18 PM > To: java-dev@lucene.apach

[jira] Commented: (LUCENE-1650) Small fix in CustomScoreQuery JavaDoc

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727612#action_12727612 ] Mark Miller commented on LUCENE-1650: - No I'm not :) Yonik, could you take a peek at t

Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Michael McCandless
On Mon, Jul 6, 2009 at 11:40 AM, Uwe Schindler wrote: > Wonderful, and the tests (TestRussianStems) pass? Yup! Mike

[jira] Commented: (LUCENE-1721) IndexWriter to allow deletion by doc ids

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727610#action_12727610 ] Michael McCandless commented on LUCENE-1721: Right, a merge can commit at any

small faults in new Numeric* class Javadoc

2009-07-06 Thread Koji Sekiguchi
There seem to be trivial faults in the Javadoc. In NumericRangeQuery, "Filter" should be "Query": - * Filter f = NumericRangeQuery.newFloatRange(field, precisionStep, + * Query query = NumericRangeQuery.newFloatRange(field, precisionStep, And in NumericField, there is incorrect sample code for Nu

Re: Bug in DocInvertedPerField?

2009-07-06 Thread Shai Erera
Ok. BTW, maybe we want to ensure then that the Analyzer passed to IndexWriter is not null, since it looks to be a required argument, unless I always addDocument w/ an Analyzer. Thanks for the replies guys. Shai On Mon, Jul 6, 2009 at 5:22 PM, Yonik Seeley wrote: > On Mon, Jul 6, 2009 at 7:12 AM

[jira] Resolved: (LUCENE-1730) TrecContentSource should use a fixed encoding, rather than system dependent

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1730. - Resolution: Fixed > TrecContentSource should use a fixed encoding, rather than system dependent

[jira] Commented: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-06 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727595#action_12727595 ] Simon Willnauer commented on LUCENE-1566: - bq. Could we move the fix down into Sim

RE: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Uwe Schindler
Wonderful, and the tests (TestRussianStems) pass? Thanks, Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Monday, July 06, 2009 5:37 PM >

Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Michael McCandless
contrib/analyzers/src/test/org/apache/lucene/analysis/ru/stemsUTF8.txt looks right on OpenSolaris (unix EOLs). Mike On Mon, Jul 6, 2009 at 9:53 AM, Uwe Schindler wrote: > I fixed the encoding problem by converting the test files to UTF-8 and > changed the Reader charset parameter to UTF-8. All fil

[jira] Commented: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727579#action_12727579 ] Michael McCandless commented on LUCENE-1566: Could we move the fix down into S

[jira] Updated: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1726: --- Fix Version/s: (was: 2.9) 3.1 > IndexWriter.readerPool create

[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727567#action_12727567 ] Michael McCandless commented on LUCENE-1726: Can we make the MapValue strongly

Re: Bug in DocInvertedPerField?

2009-07-06 Thread Yonik Seeley
On Mon, Jul 6, 2009 at 7:12 AM, Shai Erera wrote: > If I want to create an IndexWriter w/o an > Analyzer, why should I be forced to do new IndexWriter(new SimpleAnalyzer() Passing an Analyzer really doesn't seem like a hardship... it's the current interface that defines analysis, and it would com

[jira] Assigned: (LUCENE-1730) TrecContentSource should use a fixed encoding, rather than system dependent

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned LUCENE-1730: --- Assignee: Mark Miller > TrecContentSource should use a fixed encoding, rather than system de

[jira] Commented: (LUCENE-1730) TrecContentSource should use a fixed encoding, rather than system dependent

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727533#action_12727533 ] Mark Miller commented on LUCENE-1730: - Okay, cool. I'll patch it in, run the tests, an

[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727531#action_12727531 ] Mark Miller commented on LUCENE-1567: - I wonder if all of this was really necessary. M

RE: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Uwe Schindler
I fixed the encoding problem by converting the test files to UTF-8 and changed the Reader charset parameter to UTF-8. All files now have eol-style native again. Could somebody check if in unix, the files only have LF (and in windows the files have CRLF, which is the state how I committed it)? The o
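
The fix described in this message (decode the test files with an explicit UTF-8 charset instead of relying on a platform or ad-hoc charset) can be illustrated with a small JDK-only sketch. This is not the code that was committed: the class name `FixedCharsetReaderSketch` and the helper `readFirstLine` are hypothetical, and `StandardCharsets` requires Java 7 or later, which postdates this thread. `BufferedReader.readLine()` accepts both LF and CRLF line endings, so the decoded text should be the same for either EOL style.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read text with a fixed charset rather than the platform default.
public class FixedCharsetReaderSketch {

    // Decodes the stream as UTF-8 and returns its first line, without the line terminator.
    static String readFirstLine(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Four Cyrillic letters written as Unicode escapes, followed by a CRLF line ending.
        byte[] utf8 = "\u0441\u0442\u0435\u043c\r\nnext".getBytes(StandardCharsets.UTF_8);
        String line = readFirstLine(new ByteArrayInputStream(utf8));
        // The line holds 4 chars even though the UTF-8 input used 8 bytes for them.
        System.out.println(line.length());
    }
}
```

Passing the charset explicitly to `InputStreamReader` is the design point here: the result no longer depends on the machine the test runs on.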

[jira] Commented: (LUCENE-1730) TrecContentSource should use a fixed encoding, rather than system dependent

2009-07-06 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727529#action_12727529 ] Shai Erera commented on LUCENE-1730: if (encoding == null) happens in setConfig and th

[jira] Commented: (LUCENE-1730) TrecContentSource should use a fixed encoding, rather than system dependent

2009-07-06 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727524#action_12727524 ] Mark Miller commented on LUCENE-1730: - I haven't patched the code in, but looking at th

Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Robert Muir
uwe I completely agree. to add the icing on the cake the entire analyzer appears to be just a duplication of the contrib/snowball Russian functionality...! On Mon, Jul 6, 2009 at 9:19 AM, Uwe Schindler wrote: > The whole russian analyzer is very strange and works against all > charset/unicode con

RE: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Uwe Schindler
The whole russian analyzer is very strange and works against all charset/unicode conventions. It defines its own "charsets" (the only valid one is UNICODE), which are all applied to standard java 16 bit chars. The test shows how this works: It opens a text file in KOI8 using the "ISO-8859-1" charset (

Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Robert Muir
Uwe, I think so too. This way it will not be prone to breakage again. On Mon, Jul 6, 2009 at 8:38 AM, Uwe Schindler wrote: > In my opinion, these files should be converted to UTF-8 and committed again > (and the Reader in the test reconfigured for UTF-8). Then they can be native > EOL style again.

RE: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Uwe Schindler
In my opinion, these files should be converted to UTF-8 and committed again (and the Reader in the test reconfigured for UTF-8). Then they can be native EOL style again. The problem is that SVN can only handle the EOL style for one-byte-per-char and UTF-8 files. I'll give it a try here (and I have a

Re: Bug in DocInvertedPerField?

2009-07-06 Thread Shai Erera
Yes they have the same field name. Can we use the default posIncr? If I want to create an IndexWriter w/o an Analyzer, why should I be forced to do new IndexWriter(new SimpleAnalyzer() /* for example */ ...), when the analyzer will never be used? It is an edge case though which I can easily reprod

Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Robert Muir
yeah, it's fixed now. On Mon, Jul 6, 2009 at 7:06 AM, Michael McCandless wrote: > Is this the native vs LF svn:eol-style that Uwe already fixed? > > Mike > > On Thu, Jul 2, 2009 at 10:03 AM, Shai Erera wrote: >> Can somebody try to revert the change and test it on Windows? >> >> On Thu, Jul 2, 2009

[jira] Assigned: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1566: -- Assignee: Michael McCandless (was: Simon Willnauer) > Large Lucene index can

[jira] Updated: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1566: --- Fix Version/s: 2.9 > Large Lucene index can hit false OOM due to Sun JRE issue > ---

[jira] Commented: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727484#action_12727484 ] Michael McCandless commented on LUCENE-1566: Yes, I'll take this. Thanks Simo

Re: [jira] Commented: (LUCENE-1707) Don't use ensureOpen() excessively in IndexReader and IndexWriter

2009-07-06 Thread Michael McCandless
Is this the native vs LF svn:eol-style that Uwe already fixed? Mike On Thu, Jul 2, 2009 at 10:03 AM, Shai Erera wrote: > Can somebody try to revert the change and test it on Windows? > > On Thu, Jul 2, 2009 at 4:44 PM, Robert Muir wrote: >> >> well then I have no idea why it doesn't fail. Except

Re: Bug in DocInvertedPerField?

2009-07-06 Thread Michael McCandless
Were the two fields that you added to the doc the same field name? In which case, the pos incr gap is in fact needed, even if the fields are pre-analyzed (have TokenStream values)? Mike On Thu, Jul 2, 2009 at 10:25 AM, Shai Erera wrote: > I hit NPE in DocInvertedPerField in the following scenari

[jira] Assigned: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1726: -- Assignee: Michael McCandless > IndexWriter.readerPool create new segmentReader

[jira] Commented: (LUCENE-1727) Order of stored Fields not maintained

2009-07-06 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727473#action_12727473 ] Michael McCandless commented on LUCENE-1727: bq. If we start guaranteeing that

Re: addIndexesNoOptimize

2009-07-06 Thread Michael McCandless
On Mon, Jul 6, 2009 at 2:18 AM, John Wang wrote: > Currently, addIndexesNoOptimize(Directory[] dir) is really really > really fast! (I duplicated my index of 15k docs 200 times and created a 3M > doc index in less than a minute) Perhaps we should handle duplicate > directory names more gracef

[jira] Closed: (LUCENE-1734) CharReader should delegate reset/mark/markSupported

2009-07-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler closed LUCENE-1734. - Resolution: Fixed Committed revision 791415. Thanks Koji! > CharReader should delegate reset/

[jira] Assigned: (LUCENE-1734) CharReader should delegate reset/mark/markSupported

2009-07-06 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-1734: - Assignee: Uwe Schindler I think this patch looks good. I will commit shortly. > CharRea