Hi Nick,

Thanks. I like Aeolus, too ;)

If you want to make sure that a specific spout/bolt is scheduled to a
specific node, you need to provide a custom scheduler.

See here for an example:
https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
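
To give a rough idea of what such a scheduler decides, here is a minimal
sketch of the placement logic only. It is plain Java and does not use
Storm's classes: PinnedPlacementSketch, findNodeByName, place, and the
node/component names are all made up for illustration. In Storm itself the
equivalent logic would go into the schedule(Topologies, Cluster) method of a
class implementing IScheduler, registered via storm.scheduler in storm.yaml,
with the target supervisor tagged via supervisor.scheduler.meta (as in the
linked post).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a custom scheduler's decision logic.
// Nodes are modeled as id -> metadata map; this is NOT Storm's API.
final class PinnedPlacementSketch {

    // Returns the id of the first node whose "name" metadata equals
    // wantedName, or null if no node carries that tag.
    static String findNodeByName(Map<String, Map<String, String>> nodeMeta,
                                 String wantedName) {
        for (Map.Entry<String, Map<String, String>> e : nodeMeta.entrySet()) {
            if (wantedName.equals(e.getValue().get("name"))) {
                return e.getKey();
            }
        }
        return null;
    }

    // Places pinnedComponent on the tagged node and spreads all other
    // components round-robin over every node. If the tagged node is
    // missing, the pinned component is spread like the others.
    static Map<String, String> place(List<String> components,
                                     Map<String, Map<String, String>> nodeMeta,
                                     String pinnedComponent,
                                     String wantedName) {
        if (nodeMeta.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        List<String> nodeIds = new ArrayList<>(nodeMeta.keySet());
        String pinnedNode = findNodeByName(nodeMeta, wantedName);
        Map<String, String> placement = new LinkedHashMap<>();
        int next = 0;
        for (String component : components) {
            if (component.equals(pinnedComponent) && pinnedNode != null) {
                placement.put(component, pinnedNode);
            } else {
                placement.put(component, nodeIds.get(next % nodeIds.size()));
                next++;
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        // "node-b" is tagged as the machine that holds the input file.
        Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
        nodes.put("node-a", Collections.singletonMap("name", "worker"));
        nodes.put("node-b", Collections.singletonMap("name", "file-host"));
        List<String> components =
                Arrays.asList("file-spout", "parse-bolt", "count-bolt");
        System.out.println(place(components, nodes, "file-spout", "file-host"));
    }
}
```

The fallback to round-robin when the tagged node is missing is just one
choice; for your use case you may prefer to fail instead, since the spout
cannot read the file from any other node.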


-Matthias

On 09/15/2015 03:31 PM, Nick R. Katsipoulakis wrote:
> Hey Matthias, 
> 
> I apologize for the late response, but I was busy with some additional
> changes to my code. Thank you very much for your reply and the code
> snippets you provided (love the name "aeolus" by the way :-) ). The only
> reason that I have not created my File-provider as a spout is because I
> do not always know on which node my spout is spawned. Therefore, there
> might be a setting in which the file with the data is not co-located
> with the spout. Do you have any work-around for this problem?
> 
> Thanks again,
> Nick
> 
> On Thu, Sep 10, 2015 at 5:15 PM, Matthias J. Sax wrote:
> 
>     Hi,
> 
>     You can simply read the file directly in your Spout. This is an
>     implementation that reads multiple files concurrently (with respect to a
>     timestamp attribute that is included in the input record). Of course,
>     you can simplify the code if you don't have a timestamp attribute and
>     just want to read a single file or multiple files one after another:
> 
>     
> https://github.com/mjsax/aeolus/blob/master/queries/lrb/src/main/java/de/hub/cs/dbis/lrb/operators/FileReaderSpout.java
> 
>     Furthermore, I use a Spout-Wrapper for controlling the ingestion rate
>     (i.e., spout output rate). If you want to get rid of
>     nested/layered/wrapped Spouts, just merge the code of both
>     implementations. I personally prefer the wrapper approach as it is very
>     flexible...
> 
>     
> https://github.com/mjsax/aeolus/blob/master/queries/utils/src/main/java/de/hub/cs/dbis/aeolus/spouts/FixedStreamRateDriverSpout.java
> 
>     Feel free to use and/or modify both.
> 
>     -Matthias
> 
> 
>     On 09/10/2015 10:18 PM, Nick R. Katsipoulakis wrote:
>     > Hello,
>     >
>     > I am currently running some experiments and in order to send data
>     > to my spouts, I do the following:
>     >
>     > I spawn external processes which read the data from files (on disk)
>     > and send it through TCP sockets to the Spouts. I do this because
>     > (a) I want to control the input rate of the spouts, and (b) so that I
>     > can use previously gathered data for my experiments.
>     >
>     > Unfortunately, when I want to maintain input rates greater than 16
>     > thousand tuples per second, I see that my scheme is not fast enough,
>     > and the input rate is capped. Do you think that there is a better
>     > way to send (replay) previously gathered data in my topology?
>     >
>     > Thanks,
>     > Nick
> 
> 
> 
> 
> -- 
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate
