> deletes made through reader (by docID) are immediately visible, but
> through writer are buffered until a flush or reopen?
This is what I was thinking, IW buffers deletes, IR does not. Making
IW.deletes visible immediately by applying them to the IR makes sense
as well.
What should be the behavior
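To make the asymmetry concrete, a minimal sketch against the era's API,
assuming the proposed IndexWriter.getReader() exists (it does not yet) and
that the index already holds documents:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.RAMDirectory;

    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    IndexReader reader = writer.getReader();  // proposed API, not in core

    // Reader-side delete (by docID): applied to the shared SegmentReader
    // directly, so it is visible to searches immediately.
    reader.deleteDocument(0);

    // Writer-side delete (by Term): buffered inside IW; invisible until a
    // flush happens and the reader is reopened.
    writer.deleteDocuments(new Term("id", "7"));
    reader = reader.reopen();  // the buffered delete becomes visible here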
Jason Rutherglen wrote:
> > We'd also need to ensure when a merge kicks off, the SegmentReaders
> > used by the merging are not newly reopened but also "borrowed" from
>
> The IW merge code currently opens the SegmentReader with a 4096
> buffer size (different than the 1024 default), how will this case be
> handled?
Jason Rutherglen wrote:
> "But I think for realtime we don't want to be using IW's deletion at
all. We should do all deletes via the IndexReader. In fact if IW has
handed out a reader (via getReader()) and that reader (or a reopened
derivative) remains open we may have to block deletions via I
Just thinking out loud... haven't looked at your patch yet (one of
these days I will be back up for air)
My initial thought is that you would have a factory that produced both
the Reader and the Writer as a pair, or was at least aware of what to
go get from the Writer
Something like:
cl
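Guessing at where that "cl..." was headed, a hypothetical pair-producing
factory; every name here is invented for illustration:

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    // Hypothetical sketch: one factory owns the Directory and hands out
    // the Writer plus a Reader that knows to consult that Writer.
    public class IndexAccessFactory {
      private final Directory dir;
      private IndexWriter writer;

      public IndexAccessFactory(Directory dir) { this.dir = dir; }

      public synchronized IndexWriter getWriter() throws IOException {
        if (writer == null) {
          writer = new IndexWriter(dir, new StandardAnalyzer(),
              IndexWriter.MaxFieldLength.UNLIMITED);
        }
        return writer;
      }

      public synchronized IndexReader getReader() throws IOException {
        // A realtime variant would borrow the writer's SegmentInfos here
        // instead of opening from the Directory.
        return IndexReader.open(dir);
      }
    }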
Grant,
Do you have a proposal in mind? It would help to suggest something like
some classes and methods to help understand an alternative to what is being
discussed.
-J
On Fri, Jan 9, 2009 at 12:05 PM, Grant Ingersoll wrote:
> I realize we aren't adding read functionality to the Writer, but it
"Patch #2: Implement a realtime ram index class I think this one is
optional, or, rather an optimazation that we can swap in later
if/when necessary? Ie for starters little segments are written into
the main Directory."
John, Zoie could be of use for this patch. In addition, we may want to
impleme
I think the IW integrated IR needs a rule regarding the behavior of
IW.flush and IR.flush. There will need to be a flush lock that is
shared between the IW and IR. The lock is acquired at the beginning
of a flush and released immediately after a successful or
unsuccessful call. We will need to shar
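A sketch of that shared-lock rule; the wiring is hypothetical (the real
lock would live inside the writer and be shared with readers it hands out):

    import java.io.IOException;
    import java.util.concurrent.locks.ReentrantLock;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    // One lock shared by IW and the IR it produced. Acquired at the start
    // of either flush, released after the call succeeds or fails.
    final ReentrantLock flushLock = new ReentrantLock();

    void writerFlush(IndexWriter writer) throws IOException {
      flushLock.lock();
      try {
        writer.flush();          // IW's flush path
      } finally {
        flushLock.unlock();      // released on success or failure alike
      }
    }

    void readerFlush(IndexReader reader) throws IOException {
      flushLock.lock();
      try {
        reader.flush();          // IR's flush path (its deletes/norms)
      } finally {
        flushLock.unlock();
      }
    }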
> "But I think for realtime we don't want to be using IW's deletion at
all. We should do all deletes via the IndexReader. In fact if IW has
handed out a reader (via getReader()) and that reader (or a reopened
derivative) remains open we may have to block deletions via IW. Not
sure..."
Can't IW
I realize we aren't adding read functionality to the Writer, but it
would be coupling the Writer to the Reader nonetheless. I understand
it is brainstorming (like I said, not trying to distract from the
discussion), just saying that if the Reader and the Writer both need
access to the unde
Grant Ingersoll wrote:
We've spent a lot of time up until now getting write functionality
out of the Reader, and now we are going to add read functionality
into the Writer?
Well... we're not really adding read functionality into IW; instead,
we are asking IW to open the reader for us, exce
On Jan 9, 2009, at 8:39 AM, Michael McCandless wrote:
Jason Rutherglen wrote:
Patch #1: Expose an IndexWriter.getReader method that returns the
current reader and shares the write lock
I tentatively like this approach so far...
That reader is opened using IndexWriter's SegmentInfos instance
Jason Rutherglen wrote:
> Are you referring to the IW.pendingCommit SegmentInfos variable?
No, I'm referring to segmentInfos. (pendingCommit is the "snapshot"
of segmentInfos taken when committing...).
> When you say "flushed" you are referring to the IW.prepareCommit method?
No, I'm referring
M.M.: "That reader is opened using IndexWriter's SegmentInfos instance, so
it
can read segments & deletions that have been flushed but not
committed. It's allowed to do its own deletions & norms updating.
When reopen() is called, it grabs the writers SegmentInfos again."
Are you referring to the
Marvin Humphrey wrote:
> The goal is to improve worst-case write performance.
> ...
> In between the time when the background merge writer starts up and the time
> it finishes consolidating segment data, we assume that the primary writer
> will have modified the index.
>
> * New docs have bee
Jason Rutherglen wrote:
Patch #1: Expose an IndexWriter.getReader method that returns the
current reader and shares the write lock
I tentatively like this approach so far...
That reader is opened using IndexWriter's SegmentInfos instance, so it
can read segments & deletions that have been flushed but not committed.
This is the way MS Access worked, and everyone that wanted performance
needed to move to SQL server for the server model.
Based on our discussions, it seems best to get realtime search going in
small steps. Below are some possible steps to take.
Patch #1: Expose an IndexWriter.getReader method that returns the current
reader and shares the write lock
Patch #2: Implement a realtime ram index class
Patch #3: Implement
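For Patch #1, the intended call pattern would presumably look like the
fragment below; getReader is the proposed method, and dir/doc are assumed
to exist:

    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        IndexWriter.MaxFieldLength.UNLIMITED);
    writer.addDocument(doc);

    // Proposed: a reader over the writer's current SegmentInfos, seeing
    // segments that are flushed but not yet committed, and sharing the
    // write lock rather than taking its own.
    IndexReader reader = writer.getReader();

    // Later: a cheap refresh that grabs the writer's SegmentInfos again.
    IndexReader newer = reader.reopen();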
+1 Agreed, the initial version should use RAMDirectory in order to keep
things simple and to benchmark against other MemoryIndex like index
representations.
On Fri, Dec 26, 2008 at 10:20 AM, Doug Cutting wrote:
> Michael McCandless wrote:
>
>> So then I think we should start with approach #2 (bu
Then your comments are misdirected.
On Jan 5, 2009, at 1:19 PM, Doug Cutting wrote:
Robert Engels wrote:
Do what you like. You obviously will. This is the problem with
the Lucene managers - the problems are only the ones they see -
same with the solutions. If the solution (or questions) p
Robert Engels wrote:
Do what you like. You obviously will. This is the problem with the Lucene
managers - the problems are only the ones they see - same with the solutions.
If the solution (or questions) put them outside their comfort zone, they are
ignored or dismissed in a tone that is des
Andrzej Bialecki wrote:
No matter whether you are right or wrong, please keep a civil tone on
this public forum.
+1 Ad-hominem remarks are anti-community.
Doug
Robert Engels wrote:
You are full of **beep** *beep* ...
No matter whether you are right or wrong, please keep a civil tone on
this public forum. We are professionals here, so let's discuss and
disagree if must be - but in a professional and grown-up way. Thank you.
--
Best regards,
Andrzej
Robert,
Three exchanges ago in this thread, you made the incorrect assumption that the
motivation behind using mmap was read speed, and that memory mapping was being
waved around as some sort of magic wand:
Is there something that I am missing? I see lots of references to
using "memory mapped" files to "dramatically" improve performance.
If you move to the "either embedded, or server model", the post reopen is
trivial, as the structures can be created as the segment is written.
It is the networked shared access model that
needed), and works well for many dbs (i.e. derby)
One thing that I forgot to mention is that in our implementation the
real-time indexing took place with many "folder-based" listeners writing to
many tiny in-memory indexes partitioned by "sub-sources" with fewer
long-term and archive indexes per box. Overall distributed search across
various luc
On Fri, Dec 26, 2008 at 06:22:23AM -0500, Michael McCandless wrote:
> > 4) Allow 2 concurrent writers: one for small, fast updates, and one for
> > big background merges.
>
> Marvin can you describe more detail here?
The goal is to improve worst-case write performance.
Currently, writes
The addition of docs into tiny segments using the current data structures
seems the right way to go. Sometime back one of my engineers implemented
pseudo real-time using MultiSearcher by having an in-memory (RAM based)
"short-term" index that auto-merged into a disk-based "long term" index that
eve
Michael McCandless wrote:
So then I think we should start with approach #2 (build real-time on
top of the Lucene core) and iterate from there. Newly added docs go
into tiny segments, which IndexReader.reopen pulls in. Replaced or
deleted docs record the delete against the right SegmentReader
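Sketching the approach #2 loop under the same assumed getReader/reopen
pairing; nextDoc() and the indexing flag are invented scaffolding:

    while (indexing) {
      writer.addDocument(nextDoc());
      writer.flush();                        // writes a tiny segment
      IndexReader newer = reader.reopen();   // pulls the tiny segment in
      if (newer != reader) {                 // reopen returns self if unchanged
        reader.close();
        reader = newer;                      // searches now see the new doc
      }
    }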
to be significantly smaller (improving the write time, and the cache
efficiency).
Marvin Humphrey wrote:
> 4) Allow 2 concurrent writers: one for small, fast updates, and one for
> big background merges.
Marvin can you describe more detail here? It sounds like this is your
solution for "decoupling" segments changes due to merges from changes
from docs being indexed, fro
I think the necessary low-level changes to Lucene for real-time are
actually already well underway...
The biggest barrier is how we now ask for FieldCache values at the
Multi*Reader level. This makes reopen cost catastrophic for a large
index.
Once we succeed in making FieldCache usage within Luc
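The fix implied here is to populate FieldCache per SegmentReader rather
than per Multi*Reader, so a reopen only pays for new segments. A sketch;
the sub-reader accessor is assumed, as it was not public API at the time:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    // After a reopen, unchanged segments hit their existing cache entries;
    // only the new tiny segments trigger I/O and un-inversion.
    IndexReader[] subs = reader.getSequentialSubReaders();  // assumed accessor
    for (int i = 0; i < subs.length; i++) {
      int[] values = FieldCache.DEFAULT.getInts(subs[i], "price");
      // ... docIDs indexing 'values' are local to subs[i]
    }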
On Wed, Dec 24, 2008 at 12:02:24PM -0600, robert engels wrote:
> As I understood this discussion though, it was an attempt to remove
> the in memory 'skip to' index, to avoid the reading of this during
> index open/reopen.
No. That idea was entertained briefly and quickly discarded. There se
On Tue, Dec 23, 2008 at 11:02:56PM -0600, robert engels wrote:
> Seems doubtful you will be able to do this without increasing the
> index size dramatically. Since it will need to be stored
> "unpacked" (in order to have random access), yet the terms are
> variable length - leading to using a
On Dec 24, 2008, at 12:23 PM, Jason Rutherglen wrote:
> Also, what are the requirements? Must a document be visible to
> search within 10ms of being added?
0-5ms. Otherwise it's not realtime, it's batch indexing. The
realtime system can support small batches by encoding them into
RAMDirectories if they are of sufficient size.
> Also, what are the requirements? Must a document be visible to search
within 10ms of being added?
0-5ms. Otherwise it's not realtime, it's batch indexing. The realtime
system can support small batches by encoding them into RAMDirectories if
they are of sufficient size.
> Or must it be visibl
As I pointed out in another email, I understand the benefits of
compression (compressed disks vs. uncompressed, etc.). PFOR is
definitely a winner!
As I understood this discussion though, it was an attempt to remove
the in memory 'skip to' index, to avoid the reading of this during
index
Jason Rutherglen wrote:
2) Implement realtime search by incrementally creating and merging
readers in memory. The system would use MemoryIndex or
InstantiatedIndex to quickly (more quickly than RAMDirectory) create
indexes from added documents.
As a baseline, how fast is it to simply use RAMDirectory?
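As a concrete point of comparison, MemoryIndex (contrib) makes a single
document searchable with no Directory at all; something like:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.lucene.queryParser.QueryParser;

    StandardAnalyzer analyzer = new StandardAnalyzer();
    MemoryIndex mi = new MemoryIndex();
    mi.addField("body", "realtime search for lucene", analyzer);

    // Searchable immediately; no segment files, no reopen.
    // (QueryParser.parse throws ParseException.)
    float score = mi.search(new QueryParser("body", analyzer).parse("realtime"));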
Op Wednesday 24 December 2008 17:51:04 schreef robert engels:
> Thinking about this some more, you could use fixed length pages for
> the term index, with a page header containing a count of entries, and
> use key compression (to avoid the constant entry size).
>
> The problem with this is that you still have to decode the entries
Thinking about this some more, you could use fixed length pages for
the term index, with a page header containing a count of entries, and
use key compression (to avoid the constant entry size).
The problem with this is that you still have to decode the entries
(slowing the processing - sinc
Also, if you are thinking that accessing the "buffer" directly will
be faster than "parsing" the packed structure, I'm not so sure.
You can review the source for the various buffers, and since there is
no "struct" support in Java, you end up combining bytes to make
longs, etc. Also, a lot of
Seems doubtful you will be able to do this without increasing the
index size dramatically. Since it will need to be stored
"unpacked" (in order to have random access), yet the terms are
variable length - leading to using a maximum=minimum size for every
term.
In the end I highly doubt it
On Tue, Dec 23, 2008 at 08:36:24PM -0600, robert engels wrote:
> Is there something that I am missing?
Yes.
> I see lots of references to using "memory mapped" files to "dramatically"
> improve performance.
There have been substantial discussions about this design in JIRA,
notably LUCENE-1458
Is there something that I am missing? I see lots of references to
using "memory mapped" files to "dramatically" improve performance.
I don't think this is the case at all. At the lowest levels, it is
somewhat more efficient from a CPU standpoint, but with a decent OS
cache the IO performance
On Tue, Dec 23, 2008 at 05:51:43PM -0800, Jason Rutherglen wrote:
> Are there other implementation options?
Here's the plan for Lucy/KS:
1) Design index formats that can be memory mapped rather than slurped,
bringing the cost of opening/reopening an IndexReader down to a
negligible level
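On the Java side, the nearest existing piece is MMapDirectory (constructor
shown as in later Lucene releases); the Lucy/KS plan goes further by making
the file formats themselves mmap-friendly so almost nothing is slurped at
open:

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.MMapDirectory;

    // Let the OS page cache back the index; reopen cost is then dominated
    // by whatever must still be decoded eagerly (term index, norms).
    MMapDirectory dir = new MMapDirectory(new File("/path/to/index"));
    IndexReader reader = IndexReader.open(dir);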
> Database engine projects, written in Java (see
> http://java-source.net/open-source/database-engines) in order to blend
> IR and ORM for once and for all.
>
> -- Joaquin

I've read Jason's Wiki as well. Actually, I had to read it a number of
times to understand bits and pieces of it. I have to admit there is still
some fuzziness about the whole thing in my head - is "Ocean" something
that already works, a separate project on googlecode.com? I think so. If
so, and if you are working on getting it integrated into Lucene, would it
make it less confusing to just refer to it as "real-time search", so there
is no confusion?

If this is to be initially integrated into Lucene, why are things like
replication, crowding/field collapsing, locallucene, name service, tag
Jason Rutherglen wrote:
How does column stride fields work for StringIndex field caching?
I'm not sure -- Michael Busch is working on column-stride fields.
I have been working on the tag index which may be more suitable for
field caching and makes range queries faster. It is something that
Hi Mike,
How does column stride fields work for StringIndex field caching? I
have been working on the tag index which may be more suitable for
field caching and makes range queries faster. It is something that
would be good to integrate into core Lucene as well. It may be more
suitable for many
Jason Rutherglen wrote:
Mike,
The other issue that will occur that I addressed is the field caches.
The underlying smaller IndexReaders will need to be exposed because of
the field caching. Currently in ocean realtime search the individual
readers are searched on using a MultiSearcher in orde
Mike,
The other issue that will occur that I addressed is the field caches.
The underlying smaller IndexReaders will need to be exposed because of
the field caching. Currently in ocean realtime search the individual
readers are searched on using a MultiSearcher in order to search in
parallel and
Right, there would need to be a snapshot taken of all terms when
IndexWriter.getReader() is called.
This snapshot would 1) hold a frozen int docFreq per term, and 2) sort
the terms so TermEnum can just step through them. (We might be able
to delay this sorting until the first time someth
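A sketch of what such a snapshot might hold; the names are invented, and
the real thing would sit next to DocumentsWriter's live hashtable:

    import java.util.Arrays;
    import java.util.Map;
    import org.apache.lucene.index.Term;

    // Hypothetical snapshot taken inside getReader(): freeze docFreq per
    // term and sort the terms so a TermEnum can step through them.
    class TermSnapshot {
      final Term[] sortedTerms;   // sorted copy of the live hashtable keys
      final int[] docFreqs;       // docFreq frozen at snapshot time

      TermSnapshot(Map<Term, Integer> liveTerms) {
        sortedTerms = liveTerms.keySet().toArray(new Term[liveTerms.size()]);
        Arrays.sort(sortedTerms);            // could be deferred to first use
        docFreqs = new int[sortedTerms.length];
        for (int i = 0; i < sortedTerms.length; i++) {
          docFreqs[i] = liveTerms.get(sortedTerms[i]).intValue();
        }
      }
    }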
Hi Mike,
There would be a new sorted list or something to replace the
hashtable? Seems like an issue that is not solved.
Jason
On Tue, Sep 9, 2008 at 5:29 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> This would just tap into the live hashtable that DocumentsWriter* maintain
> for the p
>>> Even so,
>>> this may not be sufficient for some FS such as HDFS... Is it
>>> reasonable in this case to keep in memory everything including
>>> stored fields and term vectors?
>>
>> We could maybe do something like a proxy IndexInput/IndexOutput that
>> would allow updating the read buffer fro
On Tue, Sep 9, 2008 at 12:45 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> No, it would essentially be a change in the semantics that all
>> implementations would need to support.
>
> Right, which is you are allowed to open an IndexInput on a file when an
> IndexOutput
On Tue, Sep 9, 2008 at 12:41 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> OR, if all writes are append-only, perhaps we don't ever need to
>> invalidate the read buffer and would just need to remove the current
>> logic that caches the file length and then let the unde
Yonik Seeley wrote:
On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote:
On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]>
wrote:
Yeah, I think the underlying RandomAccessFile might do the right
thing, but IndexInput isn't required to see any changes on the fly
Yonik Seeley wrote:
On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Yonik Seeley wrote:
What about something like term freq? Would it need to count the
number of docs after the local maxDoc or is there a better way?
Good question...
I think we'd have to take
On Tue, Sep 9, 2008 at 11:42 AM, Ning Li <[EMAIL PROTECTED]> wrote:
> On Tue, Sep 9, 2008 at 10:02 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> Yeah, I think the underlying RandomAccessFile might do the right
>> thing, but IndexInput isn't required to see any changes on the fly
>> (and current im
On Mon, Sep 8, 2008 at 4:23 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> I thought an index reader which supports real-time search no longer
>> maintains a static view of an index?
>
> It seems advantageous to just make it really cheap to get a new view
> of the index (if you do it for every sear
On Tue, Sep 9, 2008 at 5:28 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> What about something like term freq? Would it need to count the
>> number of docs after the local maxDoc or is there a better way?
>
> Good question...
>
> I think we'd have to take a full copy o
This would just tap into the live hashtable that DocumentsWriter*
maintain for the posting lists... except the docFreq will need to be
copied away on reopen, I think.
Mike
Jason Rutherglen wrote:
Term dictionary? I'm curious how that would be solved?
On Mon, Sep 8, 2008 at 3:04 PM, Mic
Yonik Seeley wrote:
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
Right, getCurrentIndex would return a MultiReader that includes
SegmentReader for each segment in the index, plus a "RAMReader" that
searches the RAM buffer. That RAMReader is a tiny shell class
have to admit there is still some fuzziness about the whole thing in my
head - is "Ocean" something that already works, a separate project on
googlecode.com? I think so. If so, and if you are working on getting it
integrated into Lucene, would it make it less confusing to just refer to
it as "real-time search", so there is no confusion?

If this is to be initially integrated into Lucene, why are things like
replication, crowding/field collapsing, locallucene, name service, tag
index, etc. all mentioned there on the Wiki and bundled with the
description of how real-time search works and is to be implemented? I
suppose mentioning replication kind-of makes sense because the replication
approach is closely tied to real-time search - all query nodes need to

substantial changes to Lucene (I remember seeing large patches in JIRA),
which makes it hard to digest, understand, comment on, and ultimately
commit (hence the lukewarm response, I think). Bringing other
non-essential elements into discussion

difficult to process all this new stuff, at least for me. Am I the only
one who finds this hard?

That said, it sounds like we have some discussion going (Karl...), so I
look forward to understanding more! :)

Otis
--
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> Right, getCurrentIndex would return a MultiReader that includes
> SegmentReader for each segment in the index, plus a "RAMReader" that
> searches the RAM buffer. That RAMReader is a tiny shell class that would
> basica
Term dictionary? I'm curious how that would be solved?
On Mon, Sep 8, 2008 at 3:04 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Yonik Seeley wrote:
>
>>> I think it's quite feasible, but, it'd still have a "reopen" cost in that
>>> any buffered delete by term or query would have to be "materialized" into
>>> docIDs on reopen.
That sounds about correct and I don't think it matters much. I keep
the documents by default stored in InstantiatedIndex to 100. So the
heap size doesn't become a problem.
On Mon, Sep 8, 2008 at 2:58 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:
> I need to point out that the only thing I know InstantiatedIndex to be
On Mon, Sep 8, 2008 at 3:56 PM, Ning Li <[EMAIL PROTECTED]> wrote:
> On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>> But, how would you maintain a static view of an index...?
>>
>> IndexReader r1 = indexWriter.getCurrentIndex()
>> indexWriter.addDocument(...)
>> IndexReader r2 = indexWriter.getCurrentIndex()
On Mon, Sep 8, 2008 at 2:43 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> But, how would you maintain a static view of an index...?
>
> IndexReader r1 = indexWriter.getCurrentIndex()
> indexWriter.addDocument(...)
> IndexReader r2 = indexWriter.getCurrentIndex()
>
> I assume r1 will have a view of
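Spelling out the semantics being probed: each returned reader would keep a
point-in-time view (getCurrentIndex is the name floated in this thread, not
shipped API):

    IndexReader r1 = indexWriter.getCurrentIndex();  // proposed API
    int before = r1.numDocs();

    indexWriter.addDocument(doc);

    IndexReader r2 = indexWriter.getCurrentIndex();
    // Expected under the snapshot interpretation:
    assert r1.numDocs() == before;       // r1 stays frozen at its creation
    assert r2.numDocs() == before + 1;   // r2 sees the newly added document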
Yonik Seeley wrote:
I think it's quite feasible, but, it'd still have a "reopen" cost in that
any buffered delete by term or query would have to be "materialized" into
docIDs on reopen. Though, if this somehow turns out to be a problem, in the
future we could do this materializing immediately
I need to point out that the only thing I know InstantiatedIndex to be
great at is read access in the inverted index. It consumes a lot more
heap than RAMDirectory and InstantiatedIndexWriter is slightly less
efficient than IndexWriter.
Please let me know if your experience differs from the
On Mon, Sep 8, 2008 at 12:33 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
> I'd also try to make time to explore the approach of creating an
> IndexReader impl. that searches IndexWriter's RAM buffer.
That seems like it could possibly be the best performing approach in
the long run.
> I t
I'd also try to make time to explore the approach of creating an
IndexReader impl. that searches IndexWriter's RAM buffer.
I think it's quite feasible, but, it'd still have a "reopen" cost in
that any buffered delete by term or query would have to be
"materialized" into docIDs on reopen
InstantiatedIndex isn't quite realtime. Instead a new
InstantiatedIndex is created per transaction in Ocean and managed
thereafter. This however is fairly easy to build and could offer
realtime in Lucene without adding the transaction logging. It would
be good to find out what scope is acceptable
Ning Li wrote:
I agree with Otis that the first step for Lucene is probably to
support real-time
search. The instantiated index in contrib seems to be something close..
Maybe we should start fleshing out what we want in realtime search on
the wiki?
Could it be as simple as making Instantiated
Hi,
We experimented using HBase's scalable infrastructure to scale out Lucene:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg01143.html
There is the concern on the impact of HDFS's random read performance
on Lucene search performance. And we can discuss if HBase's architecture
is best for scal
Hi Joaquin,
Using HBase with realtime Lucene would be in line with what Google
does. However the question is whether or not this is completely
necessary or the most simple approach. That probably can only be
answered by doing a live comparison of the two! Unfortunately that
would require probab
BTW, quoting Marcelo Ochoa (the developer behind the Oracle/Lucene
implementation) the three minimal features a transactional DB should support
for Lucene integration are:
1) The ability to define new functions (e.g. lcontains(), lscore()) which
would allow binding queries to Lucene and obtaining docu
On Sun, Sep 7, 2008 at 2:41 AM, mark harwood <[EMAIL PROTECTED]>wrote:
>>for example joins are not possible using SOLR).
>
> It's largely *because* Lucene doesn't do joins that it can be made to scale
> out. I've replaced two large-scale database systems this year with
> distributed Lucene solutions
Interesting discussion.
>>I think we should seriously look at joining efforts with open-source Database
>>engine projects
I posted some initial dabblings here with a couple of the databases on your
list :http://markmail.org/message/3bu5klzzc5i6uhl7 but this is not really a
scalable solution
Hi Paul,
It's unfortunate the code is larger than most contribs. The libraries
can be factored out. The next patch includes OceanDatabase. The
Ocean package and class names can be removed in favor of "realtime"?
> - There is a whole package of logging in there, but there's no logging
> in luc
Hi Grant,
I think the way to integrate with SOLR and Lucene is if people who are
committers to the respective projects work with me (if they want) on
the integration which will make it fairly straightforward as it was
designed and intended to be.
Cheers,
Jason
On Sat, Sep 6, 2008 at 3:16 PM, Gra
Hello Shalin,
When I tried to integrate before it seemed fairly simple. However the
Ocean core code wasn't quite up to par yet so that needed work. It
will help to work with SOLR people directly who can figure how they
want to integrate such as yourself. Right now I'm finishing up the
OceanDatabase
Op Saturday 06 September 2008 18:53:39 schreef Shalin Shekhar Mangar:
...
>
> The features are more important than the code but it will of course
> help a lot too. I think a good starting point for us (Lucene/Solr
> folks) would be to study Ocean's source and any documentation that
> you can provide
On Sep 6, 2008, at 4:36 AM, Otis Gospodnetic wrote:
Regarding real-time search and Solr, my feeling is the focus should
be on first adding real-time search to Lucene, and then we'll figure
out how to incorporate that into Solr later.
I've read Jason's Wiki as well. Actually, I had to read it a number of
times to understand bits and pieces of it.