Hey Matthias,

I apologize for the late response, but I was busy with some additional
changes to my code. Thank you very much for your reply and the code
snippets you provided (love the name "aeolus", by the way :-) ). The only
reason I have not implemented my file provider as a spout is that I do
not always know on which node my spout is spawned. Therefore, there might
be a setup in which the file with the data is not co-located with the
spout. Do you have a workaround for this problem?

Thanks again,
Nick

On Thu, Sep 10, 2015 at 5:15 PM, Matthias J. Sax <[email protected]> wrote:

> Hi,
>
> You can simply read the file directly in your Spout. This is an
> implementation that reads multiple files concurrently (with respect to a
> timestamp attribute that is included in the input record). Of course,
> you can simplify the code if you don't have a timestamp attribute and
> just want to read a single file or multiple files one after another:
>
>
> https://github.com/mjsax/aeolus/blob/master/queries/lrb/src/main/java/de/hub/cs/dbis/lrb/operators/FileReaderSpout.java
>
> Furthermore, I use a Spout-Wrapper for controlling the ingestion rate
> (i.e., spout output rate). If you want to get rid of
> nested/layered/wrapped Spouts, just merge the code of both
> implementations. I personally prefer the wrapper approach as it is very
> flexible...
>
>
> https://github.com/mjsax/aeolus/blob/master/queries/utils/src/main/java/de/hub/cs/dbis/aeolus/spouts/FixedStreamRateDriverSpout.java
>
> Feel free to use and/or modify both.
>
> -Matthias
>
>
> On 09/10/2015 10:18 PM, Nick R. Katsipoulakis wrote:
> > Hello,
> >
> > I am currently running some experiments and in order to send data to my
> > spouts, I do the following:
> >
> > I spawn external processes which read the data from files (on disk) and
> > send it through TCP sockets to the Spouts. I do this because
> > (a) I want to control the input rate of the spouts, and (b) so that I
> > can use previously gathered data for my experiments.
> >
> > Unfortunately, when I want to maintain input rates greater than 16
> > thousand tuples per second, I see that my scheme is not fast enough,
> > and the input rate is capped. Do you think that there is a better way to
> > send (replay) previously gathered data in my topology?
> >
> > Thanks,
> > Nick
>
>
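
For reference, the idea in the quoted reply (read the file inside the
process that emits the tuples and throttle the output rate there, instead
of going through TCP sockets) can be sketched roughly as follows. This is
only an illustration and not the aeolus code: the class name
RateLimitedReplayer and its replay() method are made up for this example,
and it deliberately has no Storm dependency. It assumes one record per
line and a fixed target rate in lines per second.

```java
import java.io.BufferedReader;
import java.util.Iterator;
import java.util.concurrent.locks.LockSupport;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Storm or aeolus): replays lines at a fixed rate. */
final class RateLimitedReplayer {
    // Target gap between two consecutive emitted lines, in nanoseconds.
    private final long intervalNanos;

    RateLimitedReplayer(long linesPerSecond) {
        if (linesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.intervalNanos = 1_000_000_000L / linesPerSecond;
    }

    /**
     * Reads every line from the reader and passes it to the sink no earlier
     * than its scheduled time. Returns the number of lines emitted.
     */
    long replay(BufferedReader in, Consumer<String> sink) {
        final long start = System.nanoTime();
        long emitted = 0;
        Iterator<String> lines = in.lines().iterator();
        while (lines.hasNext()) {
            String line = lines.next();
            // Schedule against an absolute due time so that small delays
            // do not accumulate over a long replay.
            long due = start + emitted * intervalNanos;
            long wait;
            while ((wait = due - System.nanoTime()) > 0) {
                LockSupport.parkNanos(wait);
            }
            sink.accept(line);
            emitted++;
        }
        return emitted;
    }
}
```

Inside a real spout the blocking wait would probably not be appropriate,
since nextTuple() is expected to return quickly; one option is to keep the
same "due time" bookkeeping and simply return from nextTuple() when the
next record is not due yet. The sink here stands in for whatever actually
emits the tuple.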


-- 
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD candidate
