Re: Distributed Deployment Questions

2012-03-03 Thread Jay Stricks
Just wanted to say that I added a randomized 0-200 second sleep in my flume-daemon shell script, which I use to start the Flume service on the agents when new servers are launched. I expected it to help with the master crashing, but it has happened again since then, though much less frequently. O
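
For readers wondering what such a randomized start delay could look like, here is a minimal sketch. It is not the script from this thread: the function names `random_delay` and `start_with_jitter` are hypothetical, and the start command in the last comment is a placeholder. It reads two bytes from /dev/urandom rather than seeding from the clock, so agents launched in the same second should not all pick the same delay.

```shell
#!/bin/sh
# Hypothetical helper: print a random integer in [0, max] (default max: 200).
# Reads two bytes from /dev/urandom; the modulo adds a small bias, which is
# harmless for spreading out start times.
random_delay() {
  max="${1:-200}"
  n="$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')"
  echo $(( n % (max + 1) ))
}

# Hypothetical wrapper: sleep for a random delay of up to $1 seconds,
# then run the remaining arguments as the start command.
start_with_jitter() {
  jitter_max="$1"
  shift
  delay="$(random_delay "$jitter_max")"
  echo "sleeping ${delay}s before start" >&2
  sleep "$delay"
  "$@"
}

# Example call (placeholder path, not taken from the thread):
# start_with_jitter 200 /path/to/flume-daemon start
```

As the message above notes, spreading out agent start times reduced the master crashes but did not eliminate them, so this is a mitigation rather than a fix.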

Re: Distributed Deployment Questions

2012-03-03 Thread Jay Stricks
Really, really appreciate the help, Alex. 1. Max open files for root is at 65000 for all five collectors, but I'm not sure what you want me to check with respect to network latency. I actually don't have any partitions marked as swap on these machines, as far as I can tell with a 'swapon -s' com
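
The two checks mentioned here (open-file limit and swap) can be sketched as small shell helpers. The function names are hypothetical, and the sketch assumes a Linux host where /proc/swaps exists; it lists the same information that `swapon -s` prints. Note that `ulimit -n` reports the limit of the current shell, which may differ from the limit of an already running collector process.

```shell
#!/bin/sh
# Hypothetical helper: print the open-file limit of the current shell.
open_file_limit() {
  ulimit -n
}

# Hypothetical helper: count active swap devices listed in a swaps file
# (default: /proc/swaps, assumed to exist on Linux). The first line is a
# header, so only the lines after it are counted.
count_swap_devices() {
  swaps_file="${1:-/proc/swaps}"
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$swaps_file"
}

# Example calls:
# open_file_limit
# count_swap_devices
```

A count of 0 would agree with an empty `swapon -s` listing, meaning no swap is configured on that machine.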

Re: Distributed Deployment Questions

2012-03-02 Thread alo alt
Hey Jay, 1. Please check max open files, network latency, and swap. An example of your sinks or flows would be useful. 2. Here it could be that S3 nodes fall behind and you're hitting different servers on S3. 3. Flume master uses ZooKeeper; here you can tune the max open connections. In fact, when you

Distributed Deployment Questions

2012-03-02 Thread Jay Stricks
Hey folks, I'm looking for some advice on a couple of issues I'm having. My setup is Flume v0.9.4-cdh3u2, single master, six collectors (three flows, all autoCollectorSource), ~80 agents (three flows, autoE2E). 1. I have begun to have collectors fail with "ERROR connector.DirectDriver: Exiting dr