[jira] Updated: (LUCENE-1479) TrecDocMaker skips over documents when "Date" is missing from documents

2009-01-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1479: --- Attachment: LUCENE-1479.patch Thanks Mike, you're right. The compilation error is a result of a refa

[jira] Updated: (LUCENE-1479) TrecDocMaker skips over documents when "Date" is missing from documents

2009-01-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1479: --- Attachment: (was: LUCENE-1479.patch) > TrecDocMaker skips over documents when "Date" is missing

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662247#action_12662247 ] Doug Cutting commented on LUCENE-1476: -- bq. To really tighten this loop, you have to

[jira] Commented: (LUCENE-1494) Additional features for searching for value across multiple fields (many-to-one style)

2009-01-08 Thread Paul Cowan (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662244#action_12662244 ] Paul Cowan commented on LUCENE-1494: Hi Hoss, I don't disagree that an inverted inher

Re: Realtime Search

2009-01-08 Thread John Wang
We have worked on this problem on the server level as well. We have also open sourced it at: http://code.google.com/p/zoie/ wiki on the realtime aspect: http://code.google.com/p/zoie/wiki/ZoieSystem -John On Fri, Dec 26, 2008 at 12:34 PM, Robert Engels wrote: > If you move to the "either embe

[jira] Updated: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marvin Humphrey updated LUCENE-1476: Attachment: quasi_iterator_deletions.diff Here's a patch implementing BitVector.nextSetBit

Re: Realtime Search

2009-01-08 Thread Jason Rutherglen
Based on our discussions, it seems best to get realtime search going in small steps. Below are some possible steps to take. Patch #1: Expose an IndexWriter.getReader method that returns the current reader and shares the write lock Patch #2: Implement a realtime ram index class Patch #3: Implement

Re: stored fields / unicode compression

2009-01-08 Thread Robert Muir
thanks for the response, this sounds great. some way to plug in arbitrary schemes would be helpful. I've experimented with a few for my case and unicode compression gave the best bang for the buck, but i remember some of the other schemes such as arithmetic coding seemed to provide wins for reason

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662214#action_12662214 ] Jason Rutherglen commented on LUCENE-1476: -- M.M.:" I think the transactions layer

Re: stored fields / unicode compression

2009-01-08 Thread Chris Hostetter
Catching up on my holiday email, I don't think there were any replies to this question yet. The low level file formats used by Lucene are an area I don't have time/expertise to follow carefully, but if I remember correctly the consensus is/was to move more towards pure (byte[] data, int star

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread robert engels
The way we've simplified this is that every document has an OID. It simplifies updates and delete tracking (in the transaction log). On Jan 8, 2009, at 2:28 PM, Marvin Humphrey (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetab

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey
> You can do that now by implementing BitVector.nextSetBit(int tick) and using > that in TermDocs to set a nextDeletion member var instead of checking every > doc num with BitVector.get(). This seems so easy, I should take a crack at it. :) Marvin Humphrey --
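The idea quoted above (cache the next deleted doc number instead of probing the bit vector for every doc) can be sketched roughly as follows. This is only an illustration under assumptions: `SimpleBitVector` and `DeletionSkipper` are hypothetical stand-ins written for this example, not Lucene's actual `BitVector` or `TermDocs` code, and the `-1` "no more bits" return value is a convention chosen here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a deletions bit vector; NOT Lucene's BitVector.
final class SimpleBitVector {
    private final byte[] bits;
    private final int size;

    SimpleBitVector(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) >>> 3];
    }

    void set(int bit) {
        bits[bit >>> 3] |= (byte) (1 << (bit & 7));
    }

    boolean get(int bit) {
        return (bits[bit >>> 3] & (1 << (bit & 7))) != 0;
    }

    // Returns the first set bit at or after tick, or -1 if there is none
    // (the -1 sentinel is an assumption of this sketch).
    int nextSetBit(int tick) {
        for (int i = Math.max(tick, 0); i < size; i++) {
            // On a byte boundary, skip a whole byte at once if it is empty.
            if ((i & 7) == 0 && bits[i >>> 3] == 0) {
                i += 7; // the loop's i++ moves to the next byte boundary
                continue;
            }
            if (get(i)) {
                return i;
            }
        }
        return -1;
    }
}

// Hypothetical helper showing the "nextDeletion member var" pattern.
final class DeletionSkipper {
    // Walks doc numbers in ascending order and drops the deleted ones.
    static List<Integer> liveDocs(int[] sortedDocs, SimpleBitVector deletions) {
        List<Integer> live = new ArrayList<>();
        int nextDeletion = deletions.nextSetBit(0);
        for (int doc : sortedDocs) {
            // Only consult the bit vector when the cached deletion has fallen behind.
            if (nextDeletion != -1 && nextDeletion < doc) {
                nextDeletion = deletions.nextSetBit(doc);
            }
            if (doc == nextDeletion) {
                continue; // doc is deleted
            }
            live.add(doc);
        }
        return live;
    }
}
```

With sparse deletions, most iterations reduce to two integer comparisons against `nextDeletion`, and the bit vector is scanned only when the postings pass a deleted doc. Whether that is a net win over calling `get()` per doc depends on deletion density, which is what the thread is debating.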

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662143#action_12662143 ] Marvin Humphrey commented on LUCENE-1476: - Mike McCandless: > So, net/net it seem

[jira] Updated: (LUCENE-1314) IndexReader.clone

2009-01-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1314: - Attachment: LUCENE-1314.patch LUCENE-1314.patch All tests pass. IndexReader.close wa

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662110#action_12662110 ] Marvin Humphrey commented on LUCENE-1476: - Mike McCandless: > if it's sparse, you

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662107#action_12662107 ] Marvin Humphrey commented on LUCENE-1476: - Mike McCandless: > Commit is for crash

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662102#action_12662102 ] Michael McCandless commented on LUCENE-1476: {quote} > How about if we model d

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662101#action_12662101 ] Michael McCandless commented on LUCENE-1476: {quote} > If we move the deletion

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662100#action_12662100 ] Marvin Humphrey commented on LUCENE-1476: - Mike McCandless: > I'm also curious wh

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662097#action_12662097 ] Michael McCandless commented on LUCENE-1476: {quote} > It would be exposed as

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662092#action_12662092 ] Michael McCandless commented on LUCENE-1476: {quote} > If Lucene crashed for s

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662089#action_12662089 ] Michael McCandless commented on LUCENE-1476: {quote} > There's going to be a c

[jira] Assigned: (LUCENE-1479) TrecDocMaker skips over documents when "Date" is missing from documents

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1479: -- Assignee: Michael McCandless > TrecDocMaker skips over documents when "Date" i

[jira] Commented: (LUCENE-1479) TrecDocMaker skips over documents when "Date" is missing from documents

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662073#action_12662073 ] Michael McCandless commented on LUCENE-1479: Shai, it seems like a doc that ha

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662065#action_12662065 ] Marvin Humphrey commented on LUCENE-1476: - Jason Rutherglen: > I found in making

[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-08 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662044#action_12662044 ] Yonik Seeley commented on LUCENE-1482: -- It seems we should take into consideration th

[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662043#action_12662043 ] Jason Rutherglen commented on LUCENE-1314: -- I executed on Eclipse Mac OS X on a 4

[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662038#action_12662038 ] markrmil...@gmail.com edited comment on LUCENE-1483 at 1/8/09 9:15 AM: -

[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662039#action_12662039 ] Shai Erera commented on LUCENE-1482: Grant, given what I wrote below, having Lucene us

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662038#action_12662038 ] Mark Miller commented on LUCENE-1483: - It's the ORDSUBORD again (which I don't think we

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Jason Rutherglen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662033#action_12662033 ] Jason Rutherglen commented on LUCENE-1476: -- Marvin: "The whole tombstone idea aro

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662028#action_12662028 ] Mark Miller commented on LUCENE-1483: - bq. It runs legacy vs new sort and asserts that

[jira] Updated: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1483: --- Attachment: LUCENE-1483.patch Attached full patch (though you'll get failed hunks be

[jira] Resolved: (LUCENE-1497) Minor changes to SimpleHTMLFormatter

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1497. Resolution: Fixed Fix Version/s: (was: 2.4.1) Lucene Fields: [New, P

[jira] Commented: (LUCENE-1497) Minor changes to SimpleHTMLFormatter

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662020#action_12662020 ] Michael McCandless commented on LUCENE-1497: Ahh, OK, then let's leave your ap

[jira] Commented: (LUCENE-1497) Minor changes to SimpleHTMLFormatter

2009-01-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662004#action_12662004 ] Shai Erera commented on LUCENE-1497: If I understand you correctly, you propose to cha

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661998#action_12661998 ] Mark Miller commented on LUCENE-1476: - bq. I noticed that in one version of the patch

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661995#action_12661995 ] Marvin Humphrey commented on LUCENE-1476: - Mike McCandless: > For a TermQuery (on

[jira] Commented: (LUCENE-1497) Minor changes to SimpleHTMLFormatter

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661992#action_12661992 ] Michael McCandless commented on LUCENE-1497: In fact I think it may be faster

[jira] Assigned: (LUCENE-1497) Minor changes to SimpleHTMLFormatter

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1497: -- Assignee: Michael McCandless > Minor changes to SimpleHTMLFormatter >

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661982#action_12661982 ] Marvin Humphrey commented on LUCENE-1476: - How about if we model deletions-as-iter

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Marvin Humphrey (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661977#action_12661977 ] Marvin Humphrey commented on LUCENE-1476: - Paul Elschot: > How about a SegmentSea

[jira] Commented: (LUCENE-1510) InstantiatedIndexReader throws NullPointerException in norms() when used with a MultiReader

2009-01-08 Thread Robert Newson (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661956#action_12661956 ] Robert Newson commented on LUCENE-1510: --- Looks good to me. I wonder if you should ad

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661944#action_12661944 ] Paul Elschot commented on LUCENE-1476: -- bq. To minimize CPU cycles, it would theoreti

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661934#action_12661934 ] Michael McCandless commented on LUCENE-1476: {quote} > PostingList would be c

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread Michael McCandless
robert engels wrote: Then why not always write segment.del, where is incremented. This is what Lucene does today. It's "write once". Each file may be compressed or uncompressed based on the number of deletions it contains. Lucene also does this. Still, as Marvin pointed out,

[jira] Closed: (LUCENE-1510) InstantiatedIndexReader throws NullPointerException in norms() when used with a MultiReader

2009-01-08 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wettin closed LUCENE-1510. --- Resolution: Fixed Fix Version/s: 2.9 > InstantiatedIndexReader throws NullPointerException in

[jira] Commented: (LUCENE-1510) InstantiatedIndexReader throws NullPointerException in norms() when used with a MultiReader

2009-01-08 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661908#action_12661908 ] Karl Wettin commented on LUCENE-1510: - Thanks for the report Robert! I've committed a