date:20081226

Re: ANNOUNCE: Welcome Ryan McKinley as Contrib/Documentation Committer

2008-12-26 Thread Ryan McKinley

Thanks! I look forward to getting back into this soon -- the holidays sure suck up more time than we imagine! Happy holidays to everyone. ryan On Dec 24, 2008, at 12:48 AM, Chris Hostetter wrote: I'm happy to announce that in recognition of his efforts in moving forward with creating

Re: Blob storage

2008-12-26 Thread Noble Paul നോബിള്‍ नोब्ळ्

On Fri, Dec 26, 2008 at 10:05 PM, Otis Gospodnetic wrote: > Similar thoughts here. I don't have ML thread pointers nor JIRA issue > pointers, but there has been discussion in this area before, and I believe > the thinking was that what's needed is a general interface/abstraction/API > for stor

stored fields / unicode compression

2008-12-26 Thread Robert Muir

Have there been any thoughts of using SCSU or BOCU-1 instead of UTF-8 for stored fields? Personally I don't put huge amounts of text in stored fields, but these encodings/compression schemes work extremely well on short strings like titles, etc. Removing the Unicode penalty for non-Latin text (i.e. cut in ha

Re: Realtime Search

2008-12-26 Thread Andrzej Bialecki

Robert Engels wrote: You are full of **beep** *beep* ... No matter whether you are right or wrong, please keep a civil tone on this public forum. We are professionals here, so let's discuss and disagree if must be - but in a professional and grown-up way. Thank you. -- Best regards, Andrze

Re: Realtime Search

2008-12-26 Thread Robert Engels

You are full of crap. From your own comments in Lucene 1458: "The work on streamlining the term dictionary is excellent, but perhaps we can do better still. Can we design a format that allows us rely upon the operating system's virtual memory and avoid caching in process memory altogether? Say

Re: Realtime Search

2008-12-26 Thread Marvin Humphrey

Robert, Three exchanges ago in this thread, you made the incorrect assumption that the motivation behind using mmap was read speed, and that memory mapping was being waved around as some sort of magic wand: Is there something that I am missing? I see lots of references to using "memory ma

Re: Realtime Search

2008-12-26 Thread Robert Engels

There is also the distributed model - but in that case each node is running some sort of server anyway (as in Hadoop). It seems that the distributed model would be easier to develop using Hadoop over the embedded model. -Original Message- >From: Robert Engels >Sent: Dec 26, 2008 2:34 P

Re: Realtime Search

2008-12-26 Thread Robert Engels

If you move to the "either embedded, or server model", the post reopen is trivial, as the structures can be created as the segment is written. It is the networked shared access model that causes a lot of these optimizations to be far more complex than needed. Would it maybe be simpler to move t

Re: Realtime Search

2008-12-26 Thread Robert Engels

This is what we mostly do, but we serialize the documents to a log file first, so if the server crashes before the background merge of the RAM segments into the disk segments completes, we can replay the operations on server restart. Since the serialization is a sequential write to an already open file,

Re: Realtime Search

2008-12-26 Thread J. Delgado

One thing that I forgot to mention is that in our implementation the real-time indexing took place with many "folder-based" listeners writing to many tiny in-memory indexes partitioned by "sub-sources" with fewer long-term and archive indexes per box. Overall distributed search across various luc

Re: Realtime Search

2008-12-26 Thread Marvin Humphrey

On Fri, Dec 26, 2008 at 06:22:23AM -0500, Michael McCandless wrote: > > 4) Allow 2 concurrent writers: one for small, fast updates, and one for > > big background merges. > > Marvin can you describe more detail here? The goal is to improve worst-case write performance. Currently, writes

Re: Realtime Search

2008-12-26 Thread J. Delgado

The addition of docs into tiny segments using the current data structures seems the right way to go. Sometime back one of my engineers implemented pseudo real-time using MultiSearcher by having an in-memory (RAM based) "short-term" index that auto-merged into a disk-based "long term" index that eve

Re: Realtime Search

2008-12-26 Thread Doug Cutting

Michael McCandless wrote: So then I think we should start with approach #2 (build real-time on top of the Lucene core) and iterate from there. Newly added docs go into tiny segments, which IndexReader.reopen pulls in. Replaced or deleted docs record the delete against the right SegmentReader

Re: Realtime Search

2008-12-26 Thread Robert Engels

Also, if you are really set on the mmap strategy, why not use the single file with fixed-length pages, using the header I proposed (and key compression)? You don't need any fancy partial page stuff, just waste a small amount of space at the end of pages. I think this is going to be far faster than

Re: Realtime Search

2008-12-26 Thread Robert Engels

That could very well be, but I was referencing your statement: "1) Design index formats that can be memory mapped rather than slurped, bringing the cost of opening/reopening an IndexReader down to a negligible level." The only reason to do this (or have it happen) is if you perform a bi

Re: Blob storage

2008-12-26 Thread Otis Gospodnetic

Similar thoughts here. I don't have ML thread pointers nor JIRA issue pointers, but there has been discussion in this area before, and I believe the thinking was that what's needed is a general interface/abstraction/API for storing and loading field data to an external component, be that a BDB,

Re: Blob storage

2008-12-26 Thread Grant Ingersoll

On Dec 26, 2008, at 9:07 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: On Fri, Dec 26, 2008 at 2:11 PM, Babak Farhang wrote: BTW . The license is a problem The license is a problem if Babak intends to donate it to the ASF. And it may be a problem for companies who don't allow GPL (thus

[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for multiple subreaders

2008-12-26 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659240#action_12659240 ] Michael McCandless commented on LUCENE-1483: Given how different the results

Re: Blob storage

2008-12-26 Thread Noble Paul നോബിള്‍ नोब्ळ्

On Fri, Dec 26, 2008 at 2:11 PM, Babak Farhang wrote: > Most of all, I'm trying to communicate an *idea* which itself cannot > be encumbered by any license, anyway. But if you want to incorporate > some of this code into an asf project, I'd be happy to also release it > under the apache license. H

[jira] Commented: (LUCENE-1314) IndexReader.clone

2008-12-26 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659235#action_12659235 ] Michael McCandless commented on LUCENE-1314: OK I reviewed the patch; some co

Re: Realtime Search

2008-12-26 Thread Michael McCandless

Marvin Humphrey wrote: > 4) Allow 2 concurrent writers: one for small, fast updates, and one for > big background merges. Marvin can you describe more detail here? It sounds like this is your solution for "decoupling" segments changes due to merges from changes from docs being indexed, fro

Re: Blob storage

2008-12-26 Thread Ian Holsman

Babak Farhang wrote: Most of all, I'm trying to communicate an *idea* which itself cannot be encumbered by any license, anyway. But if you want to incorporate some of this code into an asf project, I'd be happy to also release it under the apache license. Hope the license I chose for my project d

Re: Blob storage

2008-12-26 Thread Babak Farhang

Most of all, I'm trying to communicate an *idea* which itself cannot be encumbered by any license, anyway. But if you want to incorporate some of this code into an asf project, I'd be happy to also release it under the apache license. Hope the license I chose for my project doesn't get in the way o

23 matches
