I would recommend using an Interceptor for this, possibly with a modified Flume topology. If the JSON files have a large number of rows, or there is a very high number of files, go for a collection tier and add another level of agents whose interceptors perform the DB lookup and CSV generation. Something like:
Collection Agents -> Transformation Agents (writing to S3 Sinks)

You can scale out the Collection/Transformation tier agents independently based on traffic volume.

thanks

On Fri, Sep 5, 2014 at 8:23 AM, Kevin Warner <[email protected]> wrote:
> Hello All,
> We have the following configuration:
> Source -> Channel -> Sink
>
> Now, the source is pointing to a folder that has lots of JSON files. The
> channel is file-based so that there is fault tolerance, and the sink is
> putting CSV files on S3.
>
> Now, there is code written in the sink that takes the JSON events, does
> some MySQL database lookups, and generates CSV files to be put into S3.
>
> The question is: is the sink the right place for this code, or should the
> code run in the channel, since the ACID guarantees are present in the
> channel? Please advise.
>
> -Kev

--
thanks
ashish
Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
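To make the two-tier idea concrete, here is a minimal sketch of the agent configurations. Agent names, hostnames, ports, directories, the bucket path, and the interceptor class (`com.example.JsonToCsvInterceptor`) are all illustrative assumptions, not anything from the thread:

```properties
# --- Tier 1: collection agent (runs near the JSON files) ---
collector.sources = spool
collector.channels = fileCh
collector.sinks = avroOut

# Spooling directory source picks up the dropped JSON files
collector.sources.spool.type = spooldir
collector.sources.spool.spoolDir = /var/data/json-in
collector.sources.spool.channels = fileCh

# File channel for durability on the collection tier as well
collector.channels.fileCh.type = file

# Forward raw events to the transformation tier over Avro RPC
collector.sinks.avroOut.type = avro
collector.sinks.avroOut.hostname = transform-host
collector.sinks.avroOut.port = 4141
collector.sinks.avroOut.channel = fileCh

# --- Tier 2: transformation agent (DB lookup + CSV in an interceptor) ---
transformer.sources = avroIn
transformer.channels = fileCh
transformer.sinks = s3Out

transformer.sources.avroIn.type = avro
transformer.sources.avroIn.bind = 0.0.0.0
transformer.sources.avroIn.port = 4141
transformer.sources.avroIn.channels = fileCh
# Custom interceptor does the MySQL lookup and rewrites the event body as CSV
transformer.sources.avroIn.interceptors = jsonToCsv
transformer.sources.avroIn.interceptors.jsonToCsv.type = com.example.JsonToCsvInterceptor$Builder

transformer.channels.fileCh.type = file

# One common way to land files in S3 is the HDFS sink with an S3 filesystem path
transformer.sinks.s3Out.type = hdfs
transformer.sinks.s3Out.hdfs.path = s3n://mybucket/csv-out/
transformer.sinks.s3Out.channel = fileCh
```

With this shape, the sink stays a dumb writer; the lookup/transform logic lives in the interceptor, and you can add more transformation agents (behind a load-balancing sink group on the collectors) as volume grows.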
