Hi Chen,

Maybe it would be worth checking this:
http://flume.apache.org/FlumeDeveloperGuide.html#loadbalancing-rpc-client
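For example, the client side looks roughly like this (a minimal sketch along the
lines of that guide; the host names and ports are made up, and each host would
need an Avro source listening on the given port):

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    import org.apache.flume.Event;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class LoadBalancingClientSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Use the load-balancing client instead of the default single-host one.
            props.put("client.type", "default_loadbalance");
            // Two hypothetical agents, each running an Avro source on port 41414.
            props.put("hosts", "h1 h2");
            props.put("hosts.h1", "flume-node-1.example.com:41414");
            props.put("hosts.h2", "flume-node-2.example.com:41414");
            // Rotate across hosts; temporarily back off from a host that fails.
            props.put("host-selector", "round_robin");
            props.put("backoff", "true");
            props.put("maxBackoff", "10000");

            RpcClient client = RpcClientFactory.getInstance(props);
            try {
                Event event = EventBuilder.withBody("hello", StandardCharsets.UTF_8);
                client.append(event); // goes to h1 or h2; retried on the other if one is down
            } finally {
                client.close();
            }
        }
    }

With backoff enabled, a host that stops responding is taken out of rotation for
a while, so delivery continues as long as at least one agent is up.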
Regards,
Joao

On Fri, Jan 10, 2014 at 3:50 PM, Jeff Lord <[email protected]> wrote:

> Have you taken a look at the load balancing rpc client?
>
>
> On Thu, Jan 9, 2014 at 8:43 PM, Chen Wang <[email protected]> wrote:
>
>> Jeff,
>> I read that ppt at the beginning, but didn't find a solution to my use
>> case. To simplify: I have a single data source (composed of 5 socket
>> servers), and I am looking for a fault-tolerant Flume deployment that
>> reads from this one data source and sinks to HDFS, such that when one
>> node dies, another Flume node picks up and continues.
>> Thanks,
>> Chen
>>
>>
>> On Thu, Jan 9, 2014 at 7:49 PM, Jeff Lord <[email protected]> wrote:
>>
>>> Chen,
>>>
>>> Have you taken a look at this presentation on Planning and Deploying
>>> Flume from ApacheCon?
>>>
>>> http://archive.apachecon.com/na2013/presentations/27-Wednesday/Big_Data/11:45-Mastering_Sqoop_for_Data_Transfer_for_Big_Data-Arvind_Prabhakar/Arvind%20Prabhakar%20-%20Planning%20and%20Deploying%20Apache%20Flume.pdf
>>>
>>> It may have the answers you need.
>>>
>>> Best,
>>>
>>> Jeff
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:24 PM, Chen Wang <[email protected]> wrote:
>>>
>>>> Thanks Saurabh.
>>>> If that is the case, I am actually thinking about using a Storm spout
>>>> to talk to our socket servers, so that the Storm cluster takes care of
>>>> the socket-reading part. Then, on each Storm node, start a Flume agent
>>>> listening on an RPC port and writing to HDFS (with failover), and have
>>>> the Storm bolt simply send the data to that RPC port so Flume can pick
>>>> it up.
>>>> What do you think of this setup? It takes care of failover on both the
>>>> source (by Storm) and the sink (by Flume), but it looks a little
>>>> complicated to me.
>>>> Chen
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 7:18 PM, Saurabh B <[email protected]> wrote:
>>>>
>>>>> Hi Chen,
>>>>>
>>>>> I don't think Flume has a way to configure multiple sources pointing
>>>>> at the same data source. Of course you can do that, but you will end
>>>>> up with duplicate data. Flume offers failover at the sink level.
>>>>>
>>>>> On Thu, Jan 9, 2014 at 6:56 PM, Chen Wang <[email protected]> wrote:
>>>>>
>>>>>> OK, so after more research :) it seems that what I need is failover
>>>>>> for the agent source (not failover for the sink): if one agent dies,
>>>>>> another agent of the same kind starts running. Does Flume support
>>>>>> this scenario?
>>>>>> Thanks,
>>>>>> Chen
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 3:12 PM, Chen Wang <[email protected]> wrote:
>>>>>>
>>>>>>> After reading more docs, it seems that if I want to achieve my
>>>>>>> goal, I have to do the following:
>>>>>>> 1. Run one agent with the custom source on one node. This agent
>>>>>>> reads from the 5 socket servers and writes to some kind of sink
>>>>>>> (maybe another socket?).
>>>>>>> 2. On one or more other machines, set up collectors that read from
>>>>>>> the agent's sink in 1 and sink to HDFS.
>>>>>>> 3. Have a master node managing the nodes in 1 and 2.
>>>>>>>
>>>>>>> But this seems to be overkill in my case: in 1, I can already sink
>>>>>>> to HDFS. Since data arrives at the socket servers much faster than
>>>>>>> the translation part can process it, I want to be able to add more
>>>>>>> nodes later to do the translation job. So what is the correct
>>>>>>> setup?
>>>>>>> Thanks,
>>>>>>> Chen
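Sink-level failover, as Saurabh describes above, is configured with a sink
group in the agent's properties file. A minimal sketch (the agent name a1 and
the two already-defined downstream sinks k1 and k2 are assumptions):

    # Prefer k1; if it fails, events are retried on k2 until k1 recovers.
    a1.sinkgroups = g1
    a1.sinkgroups.g1.sinks = k1 k2
    a1.sinkgroups.g1.processor.type = failover
    a1.sinkgroups.g1.processor.priority.k1 = 10
    a1.sinkgroups.g1.processor.priority.k2 = 5
    a1.sinkgroups.g1.processor.maxpenalty = 10000

Note this protects the hop out of a1 (for example, to two collectors); it does
not by itself make the socket-reading source highly available.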
>>>>>>> On Thu, Jan 9, 2014 at 2:38 PM, Chen Wang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Guys,
>>>>>>>> In my environment, the client is 5 socket servers, so I wrote a
>>>>>>>> custom source that spawns 5 threads, each reading from one of them
>>>>>>>> indefinitely; the sink is HDFS (a Hive table). This works fine when
>>>>>>>> running a single flume-ng agent.
>>>>>>>>
>>>>>>>> But how can I deploy this in distributed mode (a cluster)? I am
>>>>>>>> confused about the 3 tiers (agent, collector, storage) mentioned in
>>>>>>>> the docs. Do they apply to my case? How can I separate my
>>>>>>>> agent/collector/storage? Apparently I can only have one agent
>>>>>>>> running: multiple agents would result in duplicates from the socket
>>>>>>>> servers. But I want another agent to take over if one agent dies. I
>>>>>>>> would also like horizontal scalability for writing to HDFS. How can
>>>>>>>> I achieve all this?
>>>>>>>>
>>>>>>>> Thank you very much for your advice.
>>>>>>>> Chen
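A rough properties-file sketch of the two-tier layout discussed in this thread
(the custom source class, host names, ports, and HDFS path are all assumptions):
tier 1 runs the single socket-reading agent and load-balances an Avro hop
across collectors; each tier-2 collector receives over Avro and writes to HDFS,
so more collectors can be added to scale the writes.

    # Tier 1: one agent with the custom socket-reading source,
    # load-balancing an Avro hop across two collectors.
    agent1.sources = socketSrc
    agent1.channels = c1
    agent1.sinks = k1 k2
    agent1.sinkgroups = g1

    # Hypothetical custom source class that reads the 5 socket servers.
    agent1.sources.socketSrc.type = com.example.flume.SocketReadingSource
    agent1.sources.socketSrc.channels = c1

    agent1.channels.c1.type = memory

    agent1.sinks.k1.type = avro
    agent1.sinks.k1.channel = c1
    agent1.sinks.k1.hostname = collector-1.example.com
    agent1.sinks.k1.port = 41414

    agent1.sinks.k2.type = avro
    agent1.sinks.k2.channel = c1
    agent1.sinks.k2.hostname = collector-2.example.com
    agent1.sinks.k2.port = 41414

    agent1.sinkgroups.g1.sinks = k1 k2
    agent1.sinkgroups.g1.processor.type = load_balance
    agent1.sinkgroups.g1.processor.selector = round_robin
    agent1.sinkgroups.g1.processor.backoff = true

    # Tier 2: each collector receives over Avro and writes to HDFS.
    # Adding another collector (and a k3 above) scales the HDFS writes.
    collector.sources = avroSrc
    collector.channels = c1
    collector.sinks = hdfsSink

    collector.sources.avroSrc.type = avro
    collector.sources.avroSrc.bind = 0.0.0.0
    collector.sources.avroSrc.port = 41414
    collector.sources.avroSrc.channels = c1

    collector.channels.c1.type = memory

    collector.sinks.hdfsSink.type = hdfs
    collector.sinks.hdfsSink.channel = c1
    collector.sinks.hdfsSink.hdfs.path = hdfs://namenode.example.com/flume/events
    collector.sinks.hdfsSink.hdfs.fileType = DataStream

This covers the scale-out half of the question. The tier-1 agent is still a
single process: as Saurabh notes above, the failover Flume offers is at the
sink level, so keeping that one agent running falls to external supervision.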
