RE: Realtime distributed

2009-10-11 Thread Angel, Eric
that there are plans to have Zoie use Lucene 2.9. How long would you say before it's available? Thanks, E -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Sat 10/10/2009 12:16 PM To: java-user@lucene.apache.org Subject: Re: Realtime distributed John

Re: Realtime distributed

2009-10-11 Thread John Wang
...@gmail.com] Sent: Sat 10/10/2009 12:16 PM To: java-user@lucene.apache.org Subject: Re: Realtime distributed John, Actually everyone is entitled to their technical opinion and none of the comments were misleading. Jake and yourself validated that they are true in your comments. I'm simply trying

Re: Realtime distributed

2009-10-11 Thread John Wang
: Realtime distributed John, Actually everyone is entitled to their technical opinion and none of the comments were misleading. Jake and yourself validated that they are true in your comments. I'm simply trying to create better technology as is everyone on here. The process takes time

Re: Realtime distributed

2009-10-11 Thread Jake Mannix
OK, never mind, actually - the simultaneous indexing was something done in Zoie 1.3, and was changed in 1.4 to addIndexesNoOptimize() on the RAMDirectory indexes as soon as they are big enough. It's still true that you can throw away the RAMDirectory once the disk index is reopened though. -jake
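
The flow Jake describes (buffer in RAM, fold into the disk index once the buffer is big enough, then discard the RAM copy) can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses plain Java collections rather than Lucene, and the class name TwoTierIndex, its methods, and the document-count threshold are all hypothetical and are not Zoie's actual code or API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: strings stand in for documents, lists stand in for indexes.
// Nothing here is Zoie or Lucene code; all names are hypothetical.
final class TwoTierIndex {
    private final int flushThreshold;                     // assumed "big enough" size, in documents
    private final List<String> disk = new ArrayList<>();  // stands in for the disk index
    private List<String> ram = new ArrayList<>();         // stands in for a RAMDirectory index

    TwoTierIndex(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Add a document to the RAM tier; fold the tier into "disk" once it is big enough.
    synchronized void add(String doc) {
        ram.add(doc);
        if (ram.size() >= flushThreshold) {
            disk.addAll(ram);         // loosely analogous to addIndexesNoOptimize() on the RAM index
            ram = new ArrayList<>();  // the disk view now covers these docs, so drop the RAM copy
        }
    }

    // Everything a search would see: the disk tier plus whatever is still only in RAM.
    synchronized List<String> searchableDocs() {
        List<String> all = new ArrayList<>(disk);
        all.addAll(ram);
        return all;
    }

    synchronized int ramSize() { return ram.size(); }

    synchronized int diskSize() { return disk.size(); }
}
```

In this model a document is searchable as soon as add() returns, whichever tier it currently lives in, which is the property the thread is concerned with.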

Re: Realtime distributed

2009-10-10 Thread Jason Rutherglen
John, Actually everyone is entitled to their technical opinion and none of the comments were misleading. Jake and yourself validated that they are true in your comments. I'm simply trying to create better technology as is everyone on here. The process takes time and coordination between many

Re: Realtime distributed

2009-10-09 Thread Jason Rutherglen
Jake and John, It would be interesting and enlightening to see NRT performance numbers in a variety of configurations. The best way to go about this is to post benchmarks that others may run in their environment which can then be tweaked for their unique edge cases. I wish I had more time to work

Re: Realtime distributed

2009-10-09 Thread Jake Mannix
Jason, We've been running some perf/load/stress tests lately, but on a suggestion from Ted Dunning, I've been trying to come up with a more realistic set of stress tests and indexing rates to see where NRT performs well and where it does not, instead of just indexing at maximum rate, looping

Re: Realtime distributed

2009-10-09 Thread Jason Rutherglen
The dimensions sound good. It's unclear if you're going to post a chart again, numbers, or code? There's a LUCENE-1577 Jira issue for code. On Fri, Oct 9, 2009 at 12:37 PM, Jake Mannix jake.man...@gmail.com wrote: Jason,  We've been running some perf/load/stress tests lately, but on a

Re: Realtime distributed

2009-10-09 Thread John Wang
I can provide some preliminary numbers (we will need to do some detailed analysis and post it somewhere): Dataset: Medline. Starting index: empty. Add only, no updates, for 30 min. Maximum indexing load: 1000 docs/sec. Under stress, we take indexing events (add only) and stream into both systems:

Re: Realtime distributed

2009-10-09 Thread Bradford Stephens
Hey Eric, My consulting company specializes in scalable, real-time search with distributed Lucene. I'm more than happy to chat, if you'd like! :) Cheers, Bradford On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric ean...@business.com wrote: Does anyone have any recommendations?  I've looked at

Re: Realtime distributed

2009-10-09 Thread Bradford Stephens
My deepest apologies for the spam, everyone. I slipped on my G-mail button :) On Fri, Oct 9, 2009 at 9:09 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hey Eric, My consulting company specializes in scalable, real-time search with distributed Lucene. I'm more than happy to chat, if

Re: Realtime distributed

2009-10-09 Thread Michael Masters
Hi Jake, Zoie looks like a really cool project. I'd like to learn more about the distributed part of the setup. Any way you could describe that here or on the wiki? -Mike On Thu, Oct 8, 2009 at 9:24 PM, Jake Mannix jake.man...@gmail.com wrote: On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric

Re: Realtime distributed

2009-10-09 Thread Jake Mannix
Hi Mike, Zoie itself doesn't do anything with the distributed side of things - it just plays nicely with it. Zoie, at its core, exposes a couple of primary interfaces (well, this is a slightly simplified form of them) : interface IndexReaderFactory { List getIndexReaders();
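
The snippet above is cut off, so the full interface is not reproduced here. As a rough illustration of the general shape (a factory that hands out the current list of readers), here is a minimal self-contained sketch. The names ReaderFactory and SnapshotReaderFactory, the type parameter R (standing in for Lucene's IndexReader), and the publish() method are assumptions made for illustration only and are not taken from Zoie.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the kind of interface quoted above.
// R plays the role of an index reader; it is a type parameter so the sketch needs no Lucene jar.
interface ReaderFactory<R> {
    List<R> getIndexReaders();
}

// Toy implementation: the indexing side publishes a fresh immutable snapshot of readers,
// and searchers always see a complete, consistent list.
final class SnapshotReaderFactory<R> implements ReaderFactory<R> {
    private volatile List<R> current = Collections.emptyList();

    // Called by the (hypothetical) indexing side whenever the set of readers changes.
    void publish(List<R> readers) {
        current = Collections.unmodifiableList(new ArrayList<>(readers));
    }

    @Override
    public List<R> getIndexReaders() {
        return current;
    }
}
```

A caller on the search side would take the list once per query and search across all readers in it; a snapshot obtained earlier stays unchanged when a newer one is published.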

Realtime distributed

2009-10-08 Thread Angel, Eric
Does anyone have any recommendations? I've looked at Katta, but it doesn't seem to support realtime searching. It also uses hdfs, which I've heard can be slow. I'm looking to serve 40gb of indexes and support about 1 million updates per day. Thx
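
For a sense of scale, the stated load of about 1 million updates per day averages out to roughly 11.6 updates per second. The short calculation below shows the arithmetic; the class and method names are illustrative only, and the result is a daily average that says nothing about peak rates, which the message does not give.

```java
// Back-of-the-envelope average rate for the workload described above.
final class UpdateRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400 seconds

    // Average updates per second for a given number of updates per day.
    static double averagePerSecond(long updatesPerDay) {
        return (double) updatesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double avg = averagePerSecond(1_000_000L);
        System.out.println("average updates/sec: " + avg);
    }
}
```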

Re: Realtime distributed

2009-10-08 Thread Jason Rutherglen
Eric, Katta doesn't require HDFS which would be slow to search on, though Katta can be used to copy indexes out of HDFS onto local servers. The best bet is hardware that uses SSDs because merges and update latency will greatly decrease and there won't be a synchronous IO issue as there is with

Re: Realtime distributed

2009-10-08 Thread Jake Mannix
Jason, On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today near realtime search (with or without SSDs) comes at a price, that is reduced indexing speed due to continued in-RAM merging. People typically hack something together where indexes are held in a

Re: Realtime distributed

2009-10-08 Thread Jake Mannix
On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric ean...@business.com wrote: Does anyone have any recommendations? I've looked at Katta, but it doesn't seem to support realtime searching. It also uses hdfs, which I've heard can be slow. I'm looking to serve 40gb of indexes and support about 1

Re: Realtime distributed

2009-10-08 Thread Jake Mannix
On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: There is the Zoie system which uses the RAMDir solution, Also, to clarify: zoie does not index into a RAMDir and then periodically merge that down to disk, as for one thing, this has a bad failure mode when

Re: Realtime distributed

2009-10-08 Thread John Wang
Jason: I would really appreciate it if you would stop making false statements and spreading misinformation. Everyone is entitled to his/her opinions on technologies, but deliberately spreading misleading and false information on such a distribution is just unethical, and you'll end up just discrediting