Hi,

You can simply read the file directly in your Spout. Here is an
implementation that reads multiple files concurrently, merging their
records by a timestamp attribute included in each input record. Of
course, you can simplify the code if you don't have a timestamp
attribute and just want to read a single file, or multiple files one
after another:

https://github.com/mjsax/aeolus/blob/master/queries/lrb/src/main/java/de/hub/cs/dbis/lrb/operators/FileReaderSpout.java
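As a rough illustration of the timestamp-ordered merge (not the code of
FileReaderSpout), here is a plain-Java sketch that runs without Storm on
the classpath. It assumes, hypothetically, that each file is already
sorted by timestamp and that every line starts with a numeric timestamp
followed by a comma; the class name TimestampMergeReader and that record
format are illustrative only. Inside a spout you would typically open
the readers in open() (e.g., via Files.newBufferedReader) and emit the
result of next() from nextTuple().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Sketch: k-way merge of several inputs, each sorted by a leading timestamp,
 * into one stream ordered by timestamp. Hypothetical record format:
 * "timestamp,rest-of-record".
 */
public class TimestampMergeReader {
    // The next pending line of one input, tagged with the reader it came from.
    private record Head(long timestamp, String line, BufferedReader source) {}

    // Min-heap on timestamp: the head with the smallest timestamp is polled first.
    private final PriorityQueue<Head> heads =
        new PriorityQueue<>((a, b) -> Long.compare(a.timestamp(), b.timestamp()));

    public TimestampMergeReader(List<BufferedReader> inputs) throws IOException {
        for (BufferedReader in : inputs) {
            advance(in); // prime the heap with the first line of every input
        }
    }

    // Reads the next line from 'in'; queues it by timestamp, or closes 'in' at EOF.
    private void advance(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line != null) {
            long ts = Long.parseLong(line.substring(0, line.indexOf(',')));
            heads.add(new Head(ts, line, in));
        } else {
            in.close();
        }
    }

    /** Returns the line with the smallest timestamp, or null once all inputs are exhausted. */
    public String next() throws IOException {
        Head h = heads.poll();
        if (h == null) {
            return null;
        }
        advance(h.source()); // refill from the input we just consumed
        return h.line();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for two timestamp-sorted files.
        List<BufferedReader> inputs = List.of(
            new BufferedReader(new StringReader("1,a\n4,b\n")),
            new BufferedReader(new StringReader("2,c\n3,d\n")));
        TimestampMergeReader reader = new TimestampMergeReader(inputs);
        for (String line = reader.next(); line != null; line = reader.next()) {
            System.out.println(line); // prints 1,a  2,c  3,d  4,b (one per line)
        }
    }
}
```

Only one line per input is held in memory at a time, so the merge scales
with the number of files rather than their size.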

Furthermore, I use a Spout wrapper to control the ingestion rate (i.e.,
the spout output rate). If you want to avoid nested/wrapped Spouts,
just merge the code of both implementations. I personally prefer the
wrapper approach because it is very flexible.

https://github.com/mjsax/aeolus/blob/master/queries/utils/src/main/java/de/hub/cs/dbis/aeolus/spouts/FixedStreamRateDriverSpout.java
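To illustrate the wrapper idea, here is a small sketch of pacing a
wrapped source to a fixed output rate. It is not the code of
FixedStreamRateDriverSpout: the Source interface is a hypothetical
stand-in for a spout's nextTuple(), and the current time is passed in as
a parameter so the logic can be tested (a real spout would pass
System.nanoTime()).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of a rate-limiting wrapper around a (hypothetical) tuple source. */
public class FixedRateDriver<T> {
    /** Hypothetical minimal source: each call emits at most one record (not Storm's API). */
    public interface Source<T> {
        void nextTuple(Consumer<T> emitter);
    }

    private final Source<T> wrapped;
    private final long intervalNanos; // time between two emits at the target rate
    private long nextEmitNanos;       // scheduled time of the next emit
    private boolean started = false;  // nanoTime can be negative, so no sentinel value

    public FixedRateDriver(Source<T> wrapped, double tuplesPerSecond) {
        this.wrapped = wrapped;
        this.intervalNanos = (long) (1_000_000_000L / tuplesPerSecond);
    }

    /** Called repeatedly (e.g., from a spout's nextTuple()); forwards only when due. */
    public void nextTuple(Consumer<T> emitter, long nowNanos) {
        if (!started) {
            nextEmitNanos = nowNanos; // the first call starts the schedule
            started = true;
        }
        // Compare via subtraction, as recommended for System.nanoTime() values.
        if (nowNanos - nextEmitNanos >= 0) {
            wrapped.nextTuple(emitter);
            // Advance by a fixed step so the long-run average matches the target rate.
            nextEmitNanos += intervalNanos;
        }
    }

    public static void main(String[] args) {
        List<Integer> emitted = new ArrayList<>();
        int[] counter = {0};
        // Hypothetical source that emits 0, 1, 2, ... on each call; target: 1000 tuples/s.
        FixedRateDriver<Integer> driver =
            new FixedRateDriver<Integer>(emit -> emit.accept(counter[0]++), 1000.0);
        // Simulate 10 ms of time, polling every 0.1 ms.
        for (long t = 0; t < 10_000_000L; t += 100_000L) {
            driver.nextTuple(emitted::add, t);
        }
        System.out.println(emitted.size() + " tuples emitted"); // prints "10 tuples emitted"
    }
}
```

Because the schedule advances by a fixed interval rather than from the
current time, a driver that falls behind (e.g., after a GC pause)
catches up by emitting on consecutive calls, so the average rate stays
at the target. For reference, 16 thousand tuples per second corresponds
to an interval of 62.5 microseconds.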

Feel free to use and/or modify both.

-Matthias


On 09/10/2015 10:18 PM, Nick R. Katsipoulakis wrote:
> Hello, 
> 
> I am currently running some experiments and in order to send data to my
> spouts, I do the following:
> 
> I spawn external processes that read the data from files (on disk) and
> send it through TCP sockets to the Spouts. I do this because (a) I
> want to control the input rate of the spouts, and (b) I can use
> previously gathered data for my experiments.
> 
> Unfortunately, when I want to maintain input rates greater than 16
> thousand tuples per second, I see that my scheme is not fast enough,
> and the input rate is capped. Do you think there is a better way to
> send (replay) previously gathered data into my topology?
> 
> Thanks,
> Nick
