I would recommend using an Interceptor for this, possibly with a modified Flume topology. If the JSON files have a large number of rows, or there is a very high number of files, go for a collection tier and add another level of agents whose interceptors perform the DB lookup and CSV generation. Something like:
Collection Agents -> Transformation Agents (writing to S3 Sinks)

You can scale out the Collection/Transformation tier agents independently based on traffic volume.

thanks

On Fri, Sep 5, 2014 at 8:23 AM, Kevin Warner <[email protected]> wrote:
> Hello All,
> We have the following configuration:
> Source -> Channel -> Sink
>
> Now, the source is pointing to a folder that has lots of JSON files. The
> channel is file-based so that there is fault tolerance, and the sink is
> putting CSV files on S3.
>
> Now, there is code written in the sink that takes the JSON events, does
> some MySQL database lookups, and generates CSV files to be put into S3.
>
> The question is: is the sink the right place for this code, or should the
> code run in the channel, since the ACID guarantees are present in the
> channel? Please advise.
>
> -Kev

--
thanks
ashish
Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
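To make the two-tier idea concrete, here is a minimal sketch of the agent configurations. Agent names, hostnames, ports, directories, the bucket path, and the interceptor class (`com.example.JsonToCsvInterceptor`) are all illustrative assumptions, not anything from the thread:

```properties
# --- Tier 1: collection agent (runs near the JSON files) ---
collector.sources = spool
collector.channels = fileCh
collector.sinks = avroOut

# Spooling directory source picks up the dropped JSON files
collector.sources.spool.type = spooldir
collector.sources.spool.spoolDir = /var/data/json-in
collector.sources.spool.channels = fileCh

# File channel for durability on the collection tier as well
collector.channels.fileCh.type = file

# Forward raw events to the transformation tier over Avro RPC
collector.sinks.avroOut.type = avro
collector.sinks.avroOut.hostname = transform-host
collector.sinks.avroOut.port = 4141
collector.sinks.avroOut.channel = fileCh

# --- Tier 2: transformation agent (DB lookup + CSV in an interceptor) ---
transformer.sources = avroIn
transformer.channels = fileCh
transformer.sinks = s3Out

transformer.sources.avroIn.type = avro
transformer.sources.avroIn.bind = 0.0.0.0
transformer.sources.avroIn.port = 4141
transformer.sources.avroIn.channels = fileCh
# Custom interceptor does the MySQL lookup and rewrites the event body as CSV
transformer.sources.avroIn.interceptors = jsonToCsv
transformer.sources.avroIn.interceptors.jsonToCsv.type = com.example.JsonToCsvInterceptor$Builder

transformer.channels.fileCh.type = file

# One common way to land files in S3 is the HDFS sink with an S3 filesystem path
transformer.sinks.s3Out.type = hdfs
transformer.sinks.s3Out.hdfs.path = s3n://mybucket/csv-out/
transformer.sinks.s3Out.channel = fileCh
```

With this shape, the sink stays a dumb writer; the lookup/transform logic lives in the interceptor, and you can add more transformation agents (behind a load-balancing sink group on the collectors) as volume grows.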
