RE: Using Storm to parse emails and creates batches

2015-12-01 Thread Kalogeropoulos, Andreas
'd want to use field grouping and group on a field that contains the hash. Then every tuple with an identical hash in that field would get sent to the same bolt instance. On Tue, Dec 1, 2015 at 8:23 PM, Kalogeropoulos, Andreas <andreas.kalogeropou...@emc.com> wrote: Makin
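
As a rough illustration of the routing idea in this snippet (tuples with an identical hash always land on the same bolt instance), here is a minimal plain-Python sketch. It does not use the Storm API; `task_for`, `route`, and the dict-shaped tuples are hypothetical names chosen only for this example, and the SHA-256 based modulo is an assumption standing in for whatever hashing Storm applies internally.

```python
import hashlib


def task_for(key: str, num_tasks: int) -> int:
    """Pick a task index from a stable hash of the grouping key (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then wrap to the task count.
    return int.from_bytes(digest[:8], "big") % num_tasks


def route(tuples, field, num_tasks):
    """Assign each tuple (a dict) to a task index based on one field's value."""
    assignments = {}
    for tup in tuples:
        index = task_for(tup[field], num_tasks)
        assignments.setdefault(index, []).append(tup)
    return assignments


if __name__ == "__main__":
    emails = [
        {"hash": "abc", "id": 1},
        {"hash": "xyz", "id": 2},
        {"hash": "abc", "id": 3},  # duplicate of id 1, same hash
    ]
    for task, batch in sorted(route(emails, "hash", 4).items()):
        print(task, [t["id"] for t in batch])
```

Because the task index depends only on the field value, the two tuples with hash "abc" end up in the same list, which is where deduplication logic could then run.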

RE: Using Storm to parse emails and creates batches

2015-12-01 Thread Kalogeropoulos, Andreas
want to eliminate duplicates or make sure that duplicates make it into the same XML file (third bolt)? On Tue, Dec 1, 2015 at 7:48 PM, Kalogeropoulos, Andreas <andreas.kalogeropou...@emc.com> wrote: Hello Stephen, Imagine that the spout is providing me 300 000 emails per hour. The

RE: Using Storm to parse emails and creates batches

2015-12-01 Thread Kalogeropoulos, Andreas
Hello Stephen, To make my example more realistic: the first bolt will analyze a list of 100 tuples, and the last bolt will probably wait for 10 000 lists of tuples before creating the XML. Kind Regards, Andréas Kalogéropoulos From: Kalogeropoulos, Andreas [mailto:andreas.kalogeropou

RE: Using Storm to parse emails and creates batches

2015-12-01 Thread Kalogeropoulos, Andreas
at 7:28 PM, Kalogeropoulos, Andreas <andreas.kalogeropou...@emc.com> wrote: You are right. Sorry for making you state the obvious ☺. Last question: If my spout has incoming information that I want to have in the same last bolt (the one creating the XML) for deduplication logic, what

RE: Using Storm to parse emails and creates batches

2015-12-01 Thread Kalogeropoulos, Andreas
d On Tue, Dec 1, 2015 at 5:15 PM, Kalogeropoulos, Andreas <andreas.kalogeropou...@emc.com> wrote: Hello Stephen, I think you got it correctly. Thanks a lot for the idea. If you have seen limitations, please send the disclaimers ☺. For example, how did you handle persistence of th

RE: Using Storm to parse emails and creates batches

2015-12-01 Thread Kalogeropoulos, Andreas
use that to make sure the time constraint is checked on a regular basis (for example, send a tick tuple every 1, 5, or 10 seconds) On Tue, Dec 1, 2015 at 1:42 AM, Kalogeropoulos, Andreas <andreas.kalogeropou...@emc.com> wrote: Hello, I want to use Storm to do three things:
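
The tick-tuple suggestion in this snippet can be sketched in plain Python as a buffer that flushes either when it is full or when a periodic tick finds the oldest buffered item too old. This is a simulation under stated assumptions, not Storm code: `BatchingBolt`, `on_tuple`, `on_tick`, and the explicit `now` argument are hypothetical names for illustration. In a real topology the ticks would come from Storm itself (the `topology.tick.tuple.freq.secs` setting) rather than from manual calls.

```python
class BatchingBolt:
    """Buffers items and flushes on size or on age, checked at each tick (hypothetical)."""

    def __init__(self, max_size, max_age_seconds):
        self.max_size = max_size
        self.max_age = max_age_seconds
        self.buffer = []
        self.first_arrival = None  # arrival time of the oldest buffered item
        self.flushed = []          # completed batches, oldest first

    def on_tuple(self, item, now):
        """Handle a regular data tuple; flush as soon as the size limit is reached."""
        if not self.buffer:
            self.first_arrival = now
        self.buffer.append(item)
        if len(self.buffer) >= self.max_size:
            self._flush()

    def on_tick(self, now):
        """Handle a tick; flush if the oldest buffered item has waited long enough."""
        if self.buffer and now - self.first_arrival >= self.max_age:
            self._flush()

    def _flush(self):
        # Stand-in for "write the XML file": here the batch is just recorded.
        self.flushed.append(self.buffer)
        self.buffer = []
        self.first_arrival = None


if __name__ == "__main__":
    bolt = BatchingBolt(max_size=3, max_age_seconds=10)
    bolt.on_tuple("a", now=0)
    bolt.on_tuple("b", now=1)
    bolt.on_tick(now=5)    # too early, nothing flushed
    bolt.on_tick(now=10)   # oldest item is 10 s old, partial batch flushed
    print(bolt.flushed)
```

Checking the age only on ticks keeps the per-tuple path cheap, at the cost of a flush delay of up to one tick interval.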

RE: Using Storm to parse emails and creates batches

2015-12-01 Thread Kalogeropoulos, Andreas
any I/O (i.e. contact an external storage) and the processing cost. I hope that you will find my email useful. On Mon, Nov 30, 2015 at 11:42 AM, Kalogeropoulos, Andreas <andreas.kalogeropou...@emc.com> wrote: Hello, I want to use Storm to do three things: 1. Parse email data

Using Storm to parse emails and creates batches

2015-11-30 Thread Kalogeropoulos, Andreas
Hello, I want to use Storm to do three things: 1. Parse email data (from / to / cc / subject) from an incoming SMTP source 2. Add additional information (based on sender email) 3. Create an XML based on this data, to inject in another solution Only issue, I want step 1 (and 2)
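
For readers who want to picture the three steps outside of Storm, here is a minimal standard-library Python sketch. Everything specific in it is an assumption made for illustration: the function names, the `directory` lookup used for enrichment, and the XML element names (`emails`, `email`, `sender_info`, and so on) are hypothetical, since the thread does not describe the target XML format.

```python
from email import message_from_string
from email.utils import getaddresses
import xml.etree.ElementTree as ET


def parse_headers(raw):
    """Step 1: extract from / to / cc / subject from a raw message."""
    msg = message_from_string(raw)

    def addresses(name):
        return [addr for _, addr in getaddresses(msg.get_all(name, []))]

    return {
        "from": addresses("From"),
        "to": addresses("To"),
        "cc": addresses("Cc"),
        "subject": msg.get("Subject", ""),
    }


def enrich(record, directory):
    """Step 2: add information looked up by sender address (hypothetical lookup)."""
    sender = record["from"][0] if record["from"] else ""
    return {**record, "sender_info": directory.get(sender, "unknown")}


def to_xml(records):
    """Step 3: serialize a batch of records into one XML document (made-up schema)."""
    root = ET.Element("emails")
    for rec in records:
        node = ET.SubElement(root, "email")
        ET.SubElement(node, "from").text = ", ".join(rec["from"])
        ET.SubElement(node, "to").text = ", ".join(rec["to"])
        ET.SubElement(node, "cc").text = ", ".join(rec["cc"])
        ET.SubElement(node, "subject").text = rec["subject"]
        ET.SubElement(node, "sender_info").text = rec["sender_info"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    raw = (
        "From: Alice <alice@example.com>\n"
        "To: bob@example.com, carol@example.com\n"
        "Cc: dave@example.com\n"
        "Subject: Hello\n"
        "\n"
        "body\n"
    )
    record = enrich(parse_headers(raw), {"alice@example.com": "sales"})
    print(to_xml([record]))
```

In a topology, each function would roughly correspond to one bolt, with the last one receiving a whole batch rather than a single record.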