Hi all, in my environment the clients are 5 socket servers, so I wrote a custom source that spawns 5 threads, each reading from one of the servers indefinitely. The sink is HDFS (a Hive table). This works fine when I run a single flume-ng agent.
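For context, the reading side is structured roughly like the sketch below. This is a simplified stand-in, not the actual source: the class name `MultiSocketReader`, the line-based framing, the output queue, and the one-second retry delay are all assumptions made for illustration. A real Flume source would hand events to its channel processor rather than a queue.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for the custom source: one reader thread per socket server.
class MultiSocketReader {
    private final List<InetSocketAddress> servers;
    private final BlockingQueue<String> out;   // stands in for the Flume channel
    private final List<Thread> threads = new ArrayList<>();
    private volatile boolean running = false;

    MultiSocketReader(List<InetSocketAddress> servers, BlockingQueue<String> out) {
        this.servers = servers;
        this.out = out;
    }

    // Starts one daemon thread per server (5 threads for 5 servers).
    void start() {
        running = true;
        for (InetSocketAddress server : servers) {
            Thread t = new Thread(() -> readLoop(server), "reader-" + server);
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
    }

    // Asks the readers to stop. A thread blocked in a socket read only
    // notices the flag once that read returns.
    void stop() {
        running = false;
        for (Thread t : threads) {
            t.interrupt();
        }
    }

    // Reads lines from one server until stopped, reconnecting on failure.
    private void readLoop(InetSocketAddress server) {
        while (running) {
            try (Socket socket = new Socket(server.getHostString(), server.getPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while (running && (line = in.readLine()) != null) {
                    out.put(line);
                }
            } catch (IOException e) {
                // Connection failed or dropped: fall through and retry.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            try {
                Thread.sleep(1000); // assumed retry delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Because every instance of this reader opens its own connection to each server, running it in two agents at once would deliver the same data twice, which is the duplication problem described below.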
But how can I deploy this in distributed mode (on a cluster)? I am confused about the 3 tiers (agent, collector, storage) mentioned in the docs. Do they apply to my case? How can I separate my agent, collector, and storage? Apparently I can only have one agent running, because multiple agents would each read the same data from the socket servers and produce duplicates. But I want another agent to take over if one agent dies. I would also like to scale the writes to HDFS horizontally. How can I achieve all this? Thank you very much for your advice.

Chen
