[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629115#action_12629115 ] Michael McCandless commented on LUCENE-914: How about we change the spec for all

[jira] Commented: (LUCENE-1357) SpanScorer does not respect ConstantScoreRangeQuery setting

2008-09-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629117#action_12629117 ] Michael McCandless commented on LUCENE-1357: Mark, do you have a concrete patc

[jira] Commented: (LUCENE-1131) Add numDeletedDocs to IndexReader

2008-09-08 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629125#action_12629125 ] Shai Erera commented on LUCENE-1131: I agree with the body, that's what I had in mind.

[jira] Commented: (LUCENE-1357) SpanScorer does not respect ConstantScoreRangeQuery setting

2008-09-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629127#action_12629127 ] Mark Miller commented on LUCENE-1357: I'll put it up today... just wanted to make sure

[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629130#action_12629130 ] Paul Elschot commented on LUCENE-914: I had another look at these lines in Disjunction

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Jason Rutherglen
Hi Joaquin, Using HBase with realtime Lucene would be in line with what Google does. However, the question is whether this is completely necessary or the simplest approach. That can probably only be answered by doing a live comparison of the two! Unfortunately that would require probab

[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629137#action_12629137 ] Paul Elschot commented on LUCENE-914: Well, how about changing the TermDocs interface

[jira] Created: (LUCENE-1379) SpanScorer fails when sloppyFreq() returns 0

2008-09-08 Thread Paul Elschot (JIRA)
SpanScorer fails when sloppyFreq() returns 0 Key: LUCENE-1379 URL: https://issues.apache.org/jira/browse/LUCENE-1379 Project: Lucene - Java Issue Type: Bug Components: Search

[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629133#action_12629133 ] Doron Cohen commented on LUCENE-914: {quote} ... else what happens is undefined ... {qu

[jira] Issue Comment Edited: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629130#action_12629130 ] [EMAIL PROTECTED] edited comment on LUCENE-914 at 9/8/08 4:20 AM:

[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629142#action_12629142 ] Paul Elschot commented on LUCENE-914: See LUCENE-1379 for the SpanScorer bug when slop

[jira] Updated: (LUCENE-1379) SpanScorer fails when sloppyFreq() returns 0

2008-09-08 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-1379: Attachment: LUCENE-1379.patch The patch of 20080908 compiles, but it is untested because of

[jira] Updated: (LUCENE-1357) SpanScorer does not respect ConstantScoreRangeQuery setting

2008-09-08 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1357: Attachment: LUCENE-1357.patch > SpanScorer does not respect ConstantScoreRangeQuery setting >

[jira] Issue Comment Edited: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629133#action_12629133 ] doronc edited comment on LUCENE-914 at 9/8/08 4:38 AM: {quote}

[jira] Commented: (LUCENE-1279) RangeQuery and RangeFilter should use collation to check for range inclusion

2008-09-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629131#action_12629131 ] Michael McCandless commented on LUCENE-1279: Grant, what's the game plan on th

[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629157#action_12629157 ] Yonik Seeley commented on LUCENE-914: bq. How about we change the spec for all skipTo'

[jira] Commented: (LUCENE-914) Scorer.skipTo(current) remains on current for some scorers

2008-09-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629164#action_12629164 ] Michael McCandless commented on LUCENE-914: Since we're still having healthy dis

[jira] Updated: (LUCENE-1327) TermSpans skipTo() doesn't always move forwards

2008-09-08 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1327: Fix Version/s: (was: 2.4) We're still iterating on the approach to resolve this,

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Ning Li
Hi, We experimented with using HBase's scalable infrastructure to scale out Lucene: http://www.mail-archive.com/[EMAIL PROTECTED]/msg01143.html There is a concern about the impact of HDFS's random read performance on Lucene search performance. And we can discuss if HBase's architecture is best for scal

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Mark Miller
Ning Li wrote: I agree with Otis that the first step for Lucene is probably to support real-time search. The instantiated index in contrib seems to be something close.. Maybe we should start fleshing out what we want in realtime search on the wiki? Could it be as simple as making Instantiated

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Jason Rutherglen
InstantiatedIndex isn't quite realtime. Instead a new InstantiatedIndex is created per transaction in Ocean and managed thereafter. This however is fairly easy to build and could offer realtime in Lucene without adding the transaction logging. It would be good to find out what scope is acceptabl

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Michael McCandless
I'm also trying to make time to explore the approach of creating an IndexReader impl. that searches IndexWriter's RAM buffer. I think it's quite feasible, but it'd still have a "reopen" cost in that any buffered delete by term or query would have to be "materialized" into docIDs on reop

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Yonik Seeley
On Mon, Sep 8, 2008 at 12:33 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > I'm also trying to make time to explore the approach of creating an > IndexReader impl. that searches IndexWriter's RAM buffer. That seems like it could possibly be the best performing approach in the long run. > I t

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Karl Wettin
I need to point out that the only thing I know InstantiatedIndex to be great at is read access in the inverted index. It consumes a lot more heap than RAMDirectory and InstantiatedIndexWriter is slightly less efficient than IndexWriter. Please let me know if your experience differs from the

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Michael McCandless
Yonik Seeley wrote: I think it's quite feasible, but it'd still have a "reopen" cost in that any buffered delete by term or query would have to be "materialized" into docIDs on reopen. Though, if this somehow turns out to be a problem, in the future we could do this materializing immedi

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Ning Li
On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > But, how would you maintain a static view of an index...? > > IndexReader r1 = indexWriter.getCurrentIndex() > indexWriter.addDocument(...) > IndexReader r2 = indexWriter.getCurrentIndex() > > I assume r1 will have a view of

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Yonik Seeley
On Mon, Sep 8, 2008 at 3:56 PM, Ning Li <[EMAIL PROTECTED]> wrote: > On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> But, how would you maintain a static view of an index...? >> >> IndexReader r1 = indexWriter.getCurrentIndex() >> indexWriter.addDocument(...) >> IndexRead

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Jason Rutherglen
That sounds about correct and I don't think it matters much. By default I limit the documents stored in InstantiatedIndex to 100, so the heap size doesn't become a problem. On Mon, Sep 8, 2008 at 2:58 PM, Karl Wettin <[EMAIL PROTECTED]> wrote: > I need to point out that the only thing I know Inst

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Jason Rutherglen
Term dictionary? I'm curious how that would be solved? On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Yonik Seeley wrote: > >>> I think it's quite feasible, but, it'd still have a "reopen" cost in that >>> any buffered delete by term or query would have to be "m

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Yonik Seeley
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > Right, getCurrentIndex would return a MultiReader that includes > SegmentReader for each segment in the index, plus a "RAMReader" that > searches the RAM buffer. That RAMReader is a tiny shell class that would > basica

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Jason Rutherglen
Perhaps an interesting project would be to integrate Ocean with H2 (www.h2database.com) to take advantage of both models. I'm not sure how exactly that would work, but it seems like it would not be too difficult. Perhaps this would solve being able to perform faster hierarchical queries and perhaps

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread J. Delgado
Yes, both Marcelo and I would be interested. We looked into H2 and it looks like something similar to Oracle's ODCI can be implemented. Plus the primitive full-text implementation is based on Lucene. I say primitive because looking at the code I saw that one cannot define an Analyzer and for each

Re: [jira] Commented: (LUCENE-1313) Ocean Realtime Search

2008-09-08 Thread Chris Hostetter
: : Is there a good place to place the javadocs on the Apache website once they are more complete? generated javadocs aren't really necessary (at least not at this stage) just having javadoc comments in the code makes it a lot easier to review new contributions and patches (most people revi

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Jason Rutherglen
Cool. I mention H2 because it does have some Lucene code in it, yes. Also, according to some benchmarks, it's the fastest of the open source databases. I think it's possible to integrate realtime search for H2. I suppose there is no need to store the data in Lucene in this case? One loses the multi

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Marcelo Ochoa
Hi: Integrating Lucene in an RDBMS has two separate concerns: - Integrate it as an index to receive notification when a row changes, so that the optimizer can choose the right execution plan based on the index statistics. - Replace Lucene file system store to align database changes with Lucene changes,

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Jason Rutherglen
I am wondering if in an integrated solution, things like sorting still require the field cache? What if untokenized fields could be stored in H2, normal tokenized fields in Lucene. Then somehow make the query work properly. Yes the rowid would need to be stored. Currently Lucene range queries a

Re: Realtime Search for Social Networks Collaboration

2008-09-08 Thread Marcelo Ochoa
> I am wondering if in an integrated solution, things like sorting still > require the field cache? What if untokenized fields could be stored > in H2, normal tokenized fields in Lucene. Then somehow make the query > work properly. Yes the rowid would need to be stored. Currently > Lucene range