OK, so after more research :) it seems that what I need is failover for the agent *source*, not failover for the sink: if one agent dies, another agent of the same kind should start running in its place. Does Flume support this scenario?

Thanks,
Chen
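P.S. For comparison, the failover I could find in the Flume NG user guide is on the sink side, configured through a sink group. A minimal sketch of what that might look like on the agent, assuming two downstream collectors (the class name com.example.MultiSocketSource and the hosts/ports are all made up):

  # agent.properties -- sketch only
  a1.sources = r1
  a1.channels = c1
  a1.sinks = k1 k2
  a1.sinkgroups = g1

  # hypothetical custom source reading the 5 socket servers
  a1.sources.r1.type = com.example.MultiSocketSource
  a1.sources.r1.channels = c1

  a1.channels.c1.type = file

  # two Avro sinks, one per collector machine
  a1.sinks.k1.type = avro
  a1.sinks.k1.channel = c1
  a1.sinks.k1.hostname = collector1.example.com
  a1.sinks.k1.port = 4141

  a1.sinks.k2.type = avro
  a1.sinks.k2.channel = c1
  a1.sinks.k2.hostname = collector2.example.com
  a1.sinks.k2.port = 4141

  # failover processor: the highest-priority sink is used first; if
  # collector1 goes down, events flow to collector2 until it recovers
  a1.sinkgroups.g1.sinks = k1 k2
  a1.sinkgroups.g1.processor.type = failover
  a1.sinkgroups.g1.processor.priority.k1 = 10
  a1.sinkgroups.g1.processor.priority.k2 = 5
  a1.sinkgroups.g1.processor.maxpenalty = 10000

That handles a dying collector; whether something equivalent exists for a dying source is exactly what I'm asking about.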
On Thu, Jan 9, 2014 at 3:12 PM, Chen Wang <[email protected]> wrote:

> After reading more docs, it seems that if I want to achieve my goal, I
> have to do the following:
> 1. Have one agent with the custom source running on one node. This
> agent reads from the 5 socket servers and sinks to some kind of sink
> (maybe another socket?).
> 2. On one (or more) other machines, set up collectors that read from
> the agent's sink in 1 and sink to HDFS.
> 3. Have a master node managing the nodes in 1 and 2.
>
> But this seems like overkill in my case: in 1, I can already sink to
> HDFS. Since data arrives at the socket servers much faster than the
> translation step can process it, I want to be able to add more nodes
> later to do the translation job. So what is the correct setup?
> Thanks,
> Chen
>
>
> On Thu, Jan 9, 2014 at 2:38 PM, Chen Wang <[email protected]> wrote:
>
>> Guys,
>> In my environment, the client side is 5 socket servers, so I wrote a
>> custom source that spawns 5 threads, each reading from one of them
>> indefinitely; the sink is HDFS (a Hive table). This works fine when
>> run as a single flume-ng agent.
>>
>> But how can I deploy this in distributed (cluster) mode? I am confused
>> about the three tiers (agent, collector, storage) mentioned in the
>> docs. Do they apply to my case? How can I separate my
>> agent/collector/storage tiers? Apparently I can only have one agent
>> running, since multiple agents would pull duplicates from the socket
>> servers. But I want another agent to take over if the running one
>> dies. I would also like horizontal scalability for writing to HDFS.
>> How can I achieve all this?
>>
>> Thank you very much for your advice.
>> Chen
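P.P.S. For concreteness, the collector side of the two-tier setup from my earlier mail (steps 1 and 2 above) might look roughly like this; the agent side is the config in the P.S. above, and the host, port and HDFS path here are placeholders:

  # collector.properties -- sketch only
  collector1.sources = r1
  collector1.channels = c1
  collector1.sinks = k1

  # receive events from the agent tier's Avro sinks
  collector1.sources.r1.type = avro
  collector1.sources.r1.bind = 0.0.0.0
  collector1.sources.r1.port = 4141
  collector1.sources.r1.channels = c1

  collector1.channels.c1.type = file

  # write to HDFS, bucketed by day
  collector1.sinks.k1.type = hdfs
  collector1.sinks.k1.channel = c1
  collector1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
  collector1.sinks.k1.hdfs.fileType = DataStream
  collector1.sinks.k1.hdfs.useLocalTimeStamp = true

Each collector would be started with something like:

  flume-ng agent -n collector1 -f collector.properties

Scaling the HDFS writes would then just mean starting more collectors like this one and listing them all in the agent's sink group (with processor.type = load_balance instead of failover, to spread load); the agent tier itself stays a single instance because of the duplicate-read problem.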
