Hello Mat (again),

I followed your advice and wrote my own Scheduler, and I used some of your aeolus code too! It works just fine and makes my testing much easier (fewer terminals :-D ). However, now I see that my input rate is limited by the disk bandwidth. Matters get worse since I am working on AWS and the nodes that have the files use EBS storage.
Do you happen to have any advice on how I can avoid the disk latency and achieve higher input rates? I know that buffering is one way to go. However, I am afraid that even if I add additional threads to my code, they will be blocked every time the worker context-switches my task.

Thanks,
Nick

On Tue, Sep 15, 2015 at 10:20 AM, Nick R. Katsipoulakis <[email protected]> wrote:

> Hello again,
>
> Thank you for the link and the info. I am going to look into this in more detail.
>
> Cheers,
> Nick
>
> On Tue, Sep 15, 2015 at 9:43 AM, Matthias J. Sax <[email protected]> wrote:
>
>> Hi Nick,
>>
>> thanks. I like Aeolus, too ;)
>>
>> If you want to make sure that a specific spout/bolt is scheduled to a specific node, you need to provide a custom scheduler.
>>
>> See here for an example:
>>
>> https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
>>
>> -Matthias
>>
>> On 09/15/2015 03:31 PM, Nick R. Katsipoulakis wrote:
>> > Hey Matthias,
>> >
>> > I apologize for the late response, but I was busy with some additional changes to my code. Thank you very much for your reply and the code snippets you provided (love the name "aeolus" by the way :-) ). The only reason that I have not created my File-provider as a spout is that I do not always know on which node my spout is spawned. Therefore, there might be a setting in which the file with the data is not co-located with the spout. Do you have any work-around for this problem?
>> >
>> > Thanks again,
>> > Nick
>> >
>> > On Thu, Sep 10, 2015 at 5:15 PM, Matthias J. Sax <[email protected] <mailto:[email protected]>> wrote:
>> >
>> > Hi,
>> >
>> > You can simply read the file directly in your Spout. This is an implementation that reads multiple files concurrently (with respect to a timestamp attribute that is included in the input record). Of course, you can simplify the code if you don't have a timestamp attribute and just want to read a single file or multiple files after each other:
>> >
>> > https://github.com/mjsax/aeolus/blob/master/queries/lrb/src/main/java/de/hub/cs/dbis/lrb/operators/FileReaderSpout.java
>> >
>> > Furthermore, I use a Spout-Wrapper for controlling the ingestion rate (i.e., spout output rate). If you want to get rid of nested/layered/wrapped Spouts, just merge the code of both implementations. I personally prefer the wrapper approach as it is very flexible...
>> >
>> > https://github.com/mjsax/aeolus/blob/master/queries/utils/src/main/java/de/hub/cs/dbis/aeolus/spouts/FixedStreamRateDriverSpout.java
>> >
>> > Feel free to use and/or modify both.
>> >
>> > -Matthias
>> >
>> > On 09/10/2015 10:18 PM, Nick R. Katsipoulakis wrote:
>> > > Hello,
>> > >
>> > > I am currently running some experiments and in order to send data to my spouts, I do the following:
>> > >
>> > > I spawn external processes which read the data from files (on disk) and send them through TCP sockets to Spouts. I do this because (a) I want to control the input rate of the spouts, and (b) I can use previously gathered data for my experiments.
>> > >
>> > > Unfortunately, when I want to maintain input rates greater than 16 thousand tuples per second, I see that my scheme is not fast enough, and the input rate is capped. Do you think that there is a better way to send (replay) previously gathered data into my topology?
>> > >
>> > > Thanks,
>> > > Nick
>> >
>> > --
>> > Nikolaos Romanos Katsipoulakis,
>> > University of Pittsburgh, PhD candidate
>
> --
> Nikolaos Romanos Katsipoulakis,
> University of Pittsburgh, PhD candidate

--
Nikolaos Romanos Katsipoulakis,
University of Pittsburgh, PhD student
