Unfortunately, no. The spoolDir source was kept single-threaded so that deserializer implementations can stay simple. The approach with multiple spoolDir sources is the correct one, though they can all write to the same channel(s). So you only need a larger number of sources; they can all share the same channel(s), and you don't need more sinks unless you want to pull data out faster.
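To illustrate, a minimal config sketch along those lines might look like the following. The agent, source, channel, and sink names and the spool paths are placeholders, and a logger sink stands in for your Cassandra sink:

```properties
# Hypothetical sketch: three spoolDir sources feeding one shared channel,
# drained by a single sink. All names and paths are illustrative.
agent1.sources = src1 src2 src3
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = spooldir
agent1.sources.src1.spoolDir = /data/spool/dir1
agent1.sources.src1.channels = ch1

agent1.sources.src2.type = spooldir
agent1.sources.src2.spoolDir = /data/spool/dir2
agent1.sources.src2.channels = ch1

agent1.sources.src3.type = spooldir
agent1.sources.src3.spoolDir = /data/spool/dir3
agent1.sources.src3.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# One sink is enough unless you need to drain the channel faster;
# logger is a stand-in for the actual Cassandra sink.
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1
```

Each source runs its own file-consuming thread, so splitting the files across the three spool directories lets them be processed concurrently while everything still funnels through one channel.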
On Tue, Sep 16, 2014 at 11:26 AM, Haidang N <[email protected]> wrote:
> Since I'm not allowed to set up Flume on prod servers, I have to download
> the logs, put them in a Flume spoolDir and have a sink to consume from the
> channel and write to Cassandra. Everything is working fine.
>
> However, as I have a lot of log files in the spoolDir, and the current
> setup is only processing 1 file at a time, it's taking a while. I want to
> be able to process many files concurrently. One way I thought of is to use
> the spoolDir but distribute the files into 5-10 different directories, and
> define multiple sources/channels/sinks, but this is a bit clumsy. Is there
> a better way to achieve this?
>
> Thanks
