ink that matters at all.
>
> On Thu, Oct 22, 2015 at 3:54 PM, Javier Gonzalez
> wrote:
>
>> How many workers do you have configured? Is it possible that your whole
>> topology is running within that worker?
>> On Oct 22, 2015 6:15 PM, "Dillian Murphey"
>>
How many workers do you have configured? Is it possible that your whole
topology is running within that worker?
On Oct 22, 2015 6:15 PM, "Dillian Murphey" wrote:
> We have one worker that keeps giving us some problems. First it was out
> of memory issues. We're thinking of spinning up a replace
Clojure is, AFAIK, a functional language. ;)
On Oct 20, 2015 11:42 PM, "padma priya chitturi"
wrote:
> Very nice post :) Storm is very good in terms of the capabilities it has.
> The only thing is they could have provided an API in Scala as well as Python.
> Also, when debugging, understanding cloju
Configure your cluster.xml to debug level for your own packages, set storm
debug to true, and retry.
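For the topology side, a minimal sketch (the logger level for your own packages
still goes in logback/cluster.xml; the topology name below is made up):

import backtype.storm.Config;

Config conf = new Config();
conf.setDebug(true);   // topology.debug: workers log every emitted tuple
// resubmit with this conf, e.g. StormSubmitter.submitTopology("my-topology", conf, topology);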
On Oct 20, 2015 3:37 PM, "Ankur Garg" wrote:
> Any idea ppl .
>
> Even though the application is running and my spouts and bolts are functioning,
> worker logs are stuck and nothing is getting printe
ex to build, smaller topology can be built via
>>>>>>>> code only, i.e. which bolt listens to which spout, but if you want to
>>>>>>>> go
>>>>>>>> with good design, I say just write a small wrapper to read some json
>>
With no further information, I would suggest checking if there was any
error in the submission (the output of the storm jar command).
If it did say "topology submitted", then check the worker logs and see if
there are any init errors crashing your workers.
On Oct 11, 2015 11:05 PM, "Yang Nian" wrote:
>> collector) {
>>>
>>> LOG.info("Inside the open Method for RabbitListner Spout");
>>>
>>> inputManager = (InputQueueManagerImpl) ctx
>>> .getBean(InputQueueManagerImpl.class);
>>>
>>> notificationManager = (NotificationQu
, 2015 at 1:17 PM, researcher cs
wrote:
> ok, then I change it to var for example? Or is there any specific dir
> for it?
>
> On Sat, Oct 10, 2015 at 4:42 PM, Javier Gonzalez
> wrote:
>
>> Change it. /tmp is not where you want to keep any application data.
>> On Oct
Change it. /tmp is not where you want to keep any application data.
On Oct 10, 2015 7:54 AM, "researcher cs" wrote:
> I'm new to storm and zookeeper, I just want to ask about dataDir in
> zoo.cfg.
> I wrote /tmp/zookeeper,
> but I read on a site that the data dir should not be /tmp/zookeeper, we should
> c
IIRC, only if everything you use in your spouts and bolts is serializable.
On Oct 6, 2015 11:29 PM, "Ankur Garg" wrote:
> Hi Ravi ,
>
> I was able to make an integration with Spring but the problem is that I
> have to autowire for every bolt and spout. That means that even if I
> parallelize spo
If you mean your local desktop machine, you probably need to configure your
logging correctly.
If you mean running a topology with local submitter in a dev server... Why?
:) just run a 1 node storm cluster if you want to do that
On Oct 6, 2015 2:07 PM, "Ankur Garg" wrote:
> Hi ,
>
> I am running
'em to Bolt
> 2.
>
> Please confirm if this seems logical and that it should work. I think it
> should, but I may be missing something.
>
> Thanks! :)
>
> --John
>
> On Mon, Oct 5, 2015 at 9:20 AM, Javier Gonzalez
> wrote:
>
>> If I'm read
If I'm reading this correctly, I think you're not getting the result you
want - having all tuples with a given key processed in the same bolt2
instance.
If you want to have all messages of a given key to be processed in the same
Bolt2, you need to do fields grouping from bolt1 to bolt2. By doing f
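A minimal wiring sketch, assuming the tuples carry a field named "key"
(spout/bolt class names are made up):

import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new MySpout(), 1);
builder.setBolt("bolt1", new Bolt1(), 4).shuffleGrouping("spout");
// fieldsGrouping guarantees that every tuple with the same "key" value
// is routed to the same bolt2 task
builder.setBolt("bolt2", new Bolt2(), 4).fieldsGrouping("bolt1", new Fields("key"));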
t;>> https://wassermelonemann.wordpress.com/2014/01/22/tuning-storm-topologies/
>>> Talk:
>>>
>>> http://demo.ooyala.com/player.html?width=640&height=360&embedCode=Q1eXg5NzpKqUUzBm5WTIb6bXuiWHrRMi&videoPcode=9waHc6zKpbJKt9byfS7l4O4sn7Qn
>>>
>
> --John
>
>
>
> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez
> wrote:
>
>> I would suggest sticking with a single worker per machine. It makes
>> memory allocation easier and it makes inter-component communication much
>> more efficient. Configure the ex
I would suggest sticking with a single worker per machine. It makes memory
allocation easier and it makes inter-component communication much more
efficient. Configure the executors with your parallelism hints to take
advantage of all your available CPU cores.
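For instance (the counts below are made up; the point is that the executor
parallelism, not the worker count, should add up to your cores):

import backtype.storm.Config;
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new MySpout(), 4);                            // 4 executors
builder.setBolt("parse", new ParseBolt(), 8).shuffleGrouping("spout");  // 8 executors
builder.setBolt("store", new StoreBolt(), 4).shuffleGrouping("parse");  // 4 executors -> 16 total for ~16 cores

Config conf = new Config();
conf.setNumWorkers(2);  // e.g. 2 supervisor machines -> one worker JVM per machine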
Regards,
JG
On Sat, Oct 3, 2015 at 12:
Off the top of my head, I would:
- have a Storm topology ready listening on kafka. If you have a few minutes
between kafka event and delivery of processed input to clients, I would
rather not waste time starting up the topology.
- Not implement 1000 topologies. That's at least 1000 JVMs. Is the
pr
These are all the configuration options you can set in the storm.yaml file.
Of interest to you are the drpc.* keys:
https://storm.apache.org/javadoc/apidocs/constant-values.html#backtype.storm.Config
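As an illustration only (host names and values are placeholders, and depending
on your setup these may belong in the cluster's storm.yaml rather than the
topology conf):

import java.util.Arrays;
import backtype.storm.Config;

Config conf = new Config();
conf.put(Config.DRPC_SERVERS, Arrays.asList("drpc-host-1", "drpc-host-2")); // drpc.servers
conf.put(Config.DRPC_PORT, 3772);                                           // drpc.port (default)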
Regards,
Javier
On Mon, Sep 21, 2015 at 7:42 PM, researcher cs
wrote:
> I'm new in storm, how can
You emit like this:
collector.emit(new Values(yourBeanInstanceHere));
You just need to wrap it in a Values object.
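A slightly fuller sketch (MyBean and the field name "bean" are made up):

import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// declare one output field per position in the Values
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("bean"));
}

// wherever you emit:
MyBean bean = new MyBean();
collector.emit(new Values(bean)); // MyBean must be serializable (or have a Kryo serializer registered)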
Regards,
Javier.
On Sep 16, 2015 9:37 AM, "Ankur Garg" wrote:
> Hi ,
>
> I am new to Apache Storm. To understand it I was looking at storm
> examples provided in the storm tutori
, "researcher cs" wrote:
> Sorry for my question, I'm a beginner, how can I check my supervisor and
> worker logs
>
> On Tue, Sep 15, 2015 at 8:11 AM, Javier Gonzalez
> wrote:
>
>> Check your supervisor and worker logs.
>>
>> On Mon, Sep 14, 2015 at
Check your supervisor and worker logs.
On Mon, Sep 14, 2015 at 8:14 PM, researcher cs
wrote:
> I'm new in storm and trying to submit a topology and found this
> in supervisor
>
>
>
>
> anyone can help ?
>
--
Javier González Nicolini
We use the shaded jar maven plugin. Just make sure that you mark storm as
scope provided (so that you don't get a duplicate storm jar error) and
exclude any RSA/DSA/SF signature files from the manifest folder (so that
you don't get failed signature check errors).
Regards,
Javier
On Sep 14, 2015 1:
If I am reading your code correctly, it seems you're emitting from the
spout without an id - therefore, your acking efforts are not being used. You
need to do something like:
Object id = ...; // a unique id for this message
_collector.emit(tuple, id);
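In context, a rough sketch of a spout that does this (the message source and
its methods are hypothetical):

// inside nextTuple():
MyMessage msg = source.poll();            // non-blocking read from your source
if (msg != null) {
    Object id = msg.getMessageId();       // must be unique per message
    _collector.emit(new Values(msg.getBody()), id);
}

// Storm calls back on the same spout with that id:
public void ack(Object id)  { /* fully processed, safe to forget */ }
public void fail(Object id) { /* failed or timed out, re-emit if you need to */ }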
Regards,
Javier
On Sep 8, 2015 3:19 PM, "Nick R. Katsipoulakis"
wrote:
> Hello all,
Storm itself offers nothing towards this. Where to fix it depends on how
expensive it is for you. If you can just introduce a new bolt in your
topology without a terrible penalty in resources or processing throughput,
I would do it that way. You don't have to modify anything other than the
topology
Is your message source thread safe? It is possible for four spout threads
to read the same message from the same source if the source does not
guarantee uniqueness across multiple clients.
On Aug 27, 2015 9:59 AM, "Ganesh Chandrasekaran" <
gchandraseka...@wayfair.com> wrote:
> Hi all,
>
>
>
> I am
s,
> Nithesh
>
> On Fri, Aug 21, 2015 at 9:44 PM, Javier Gonzalez
> wrote:
>
>> Hi Nithesh,
>>
>> Mind that the storm metrics are gathered by sampling, so it isn't unusual
>> that the counts are slightly off. I think the default is 5% sampling, it
>
We had issues with Spring and Storm. What we did is the following: don't do
anything in the constructor. Perhaps pass a String with the location of the
Spring configuration file. In the prepare or open method (for bolts and
spouts respectively) initialize a context with the context file location
an
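Roughly like this - a generic sketch, not our actual code (MyService and the
config file name are invented):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class MyBolt extends BaseRichBolt {
    private final String springConfigLocation;              // a plain String is serializable
    private transient ClassPathXmlApplicationContext ctx;
    private transient MyService service;

    public MyBolt(String springConfigLocation) {
        this.springConfigLocation = springConfigLocation;   // nothing else in the constructor
    }

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        // runs on the worker after deserialization, so non-serializable beans are fine here
        ctx = new ClassPathXmlApplicationContext(springConfigLocation);
        service = ctx.getBean(MyService.class);
    }

    public void execute(Tuple input) { /* use service here */ }

    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}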
Hi Nithesh,
Mind that the storm metrics are gathered by sampling, so it isn't unusual
that the counts are slightly off. I think the default is 5% sampling, it
can be tweaked to more in storm.yaml, but it will impact your performance
if you ramp it up.
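The knob, if you do decide to change it (1.0 means count every tuple, at a cost):

import backtype.storm.Config;

Config conf = new Config();
conf.put(Config.TOPOLOGY_STATS_SAMPLE_RATE, 1.0); // topology.stats.sample.rate, default 0.05 (5%)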
Regards,
Javier
Hi,
I need to coordinate between two spouts, and could use the group's insight
into this. The scenario:
- One spout (let's call it event spout) receives the incoming data stream.
- One spout is the "control" spout, which will receive messages from a
different stream, that can impact the way that
How many ackers have you got configured when you submit your topology?
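If it is zero, acking is disabled entirely. A sketch of setting it explicitly:

import backtype.storm.Config;

Config conf = new Config();
conf.setNumAckers(2); // topology.acker.executors; 0 disables acking, default is one per worker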
On Aug 17, 2015 5:57 PM, "Stuart Perks" wrote:
> Hi I am attempting to run guaranteed message processing but ACK is not
> being called. Post on stack overflow if you prefer answer there.
> http://stackoverflow.com/questions/32
Use mem min and max in child opts, so that the low-volume topologies have
little initial memory, but the high volume ones can grow accordingly.
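Something along these lines, per topology (heap sizes are placeholders):

import backtype.storm.Config;

Config conf = new Config();
// a small topology: start tiny, allow modest growth
conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xms64m -Xmx512m");
// a high-volume topology would instead use something like "-Xms1g -Xmx4g"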
On Aug 14, 2015 5:44 AM, "jinhong lu" wrote:
> Hi, I have got a storm cluster of about 20 machines, 40G mem, 24 cores.
>
> But my cluster have about 300 top
eason I think Zk solution is better.
> On Fri, Aug 14, 2015 at 12:05 PM Javier Gonzalez
> wrote:
>
>> I was thinking of using another spout as "control channel", and from that
>> spout manipulate the original spout to cause the nextTuple method to not
>> call th
>
> Thanks Again!
>
> --John
>
> On Fri, Aug 14, 2015 at 2:59 PM, Javier Gonzalez
> wrote:
>
>> You will see a detrimental effect from wiring in boltB, even if it does
>> nothing but ack. Every tuple you have processed from A has to travel to a B
>> bolt, and th
Zk to signal it. On receiving
> signal it can block and unblock appropriately the nextTuple() call
> On Fri, Aug 14, 2015 at 5:34 AM Javier Gonzalez
> wrote:
>
>> I'm trying to ensure everything is processed for coordination with an
>> external system. Therefore, on a gi
You will see a detrimental effect from wiring in boltB, even if it does
nothing but ack. Every tuple you have processed from A has to travel to a B
bolt, and the ack has to travel back.
You could try modifying the number of ackers, and playing with the number
of A and B bolts. How many workers do y
ing to stop only
> a subset of those Spouts?
>
> $ storm help deactivate
> Syntax: [storm deactivate topology-name]
>
> Deactivates the specified topology's spouts.
>
>
>
> On Thu, Aug 13, 2015 at 2:12 PM Javier Gonzalez
> wrote:
>
>> On a more br
On a broader note, can you share the strategies you've used to pause
(not emit anything else into the topology and not read anything else from
the data source) a topology's spouts?
Thanks,
Javier
On Aug 13, 2015 2:53 PM, "Javier Gonzalez" wrote:
> Hi,
>
> I
Hi,
I have a use case where I would need to stop a spout from emitting for a
period of time. I'm looking at the activate /deactivate methods, but
there's not much information apart from the API and the java base classes
have empty implementations. Can anybody shed any insight on how those work?
T
Just to make sure I'm understanding correctly: Do you have a single stream
of sequential ids or multiple streams that need to be interpolated? Do you
receive a stream of ids and emit a stream of timestamped ids?
On Aug 11, 2015 5:34 PM, "Alec Lee" wrote:
> Hello, all
>
> Here I have a question ab
but
>> there is no way to scale C (even if you add more parallelism, the
>> throughput wouldn't improve as it would have to process 2 messages in
>> serial)
>>
>> I do not think there is a cost to having more streams and so choosing the
>> second option might
Hi all,
Suppose I have a bolt A that has to send information to two bolts B and C.
Each bolt must receive different information from the original A bolt.
Which of these strategies is more efficient?
Strategy 1:
- have A declare a single output stream, with fields "forB" and "forC".
- Emit all the
Have you tried using the same collector in the thread and the bolt?
On Jul 24, 2015 4:20 PM, "Hong Jeon" wrote:
> Hi,
>
> Lets say I have a Bolt that spawns a separate thread in it that handles
> all of the emits and acks. This has worked without errors (I assume since
> emits and acks are still
Try the following:
- 1 worker per machine (to minimize inter-jvm messaging) and adjust
childopts so it takes as much memory as you can without bringing down the
machine
- as many threads as available cpu cores and no more (to avoid thread
context switching)
That should give you some reduction of t
And of course, there's then the matter of the BOLT crashing and losing the
state, unless you keep it in a separate store, and... you get the idea.
On Fri, Jul 3, 2015 at 2:20 PM, Javier Gonzalez wrote:
> You need to keep track of state in your bolt before writing. Yes, it is
> ind
You need to keep track of state in your bolt before writing. Yes, it is
indeed quite a chore. For "exactly once", particularly when coordinating
with external systems such as databases or output queues, Storm is, ah, not
exactly the best fit. Unless keeping track of processed events to avoid
duplic
I'll second this... You are free to use Kafka or anything as your data
source.
However, your use case does not suggest storm to me. If speed is not
essential and exactly once semantics are essential, perhaps you could just
use something like a transacted queue to gather client messages and a
proce
some
> research on the matter because I do not know many things about Java GC.
>
> Thank you for your time.
>
> Regards,
> Nick
>
>
> 2015-06-28 13:02 GMT-04:00 Javier Gonzalez :
>
>> Perhaps you could put explicit GC logs in the childopts so that you see
>
> Thank you again.
>
> Regards,
> Nick
>
> 2015-06-28 11:32 GMT-04:00 Javier Gonzalez :
>
>> It could be that heavy usage of an executor's machine prevents the
>> executor from communicating with nimbus, hence it appears "dead" to nimbus,
>> even
It could be that heavy usage of an executor's machine prevents the executor
from communicating with nimbus, hence it appears "dead" to nimbus, even
though it's still working. I think we saw something like this some time
during our PoC development, and it was fixed by allocating more memory to
our w
back-to-back, then start of the next flush period is same as
> the end of previous period.
>
>
>
> Thanks,
>
> Satish
>
>
>
> On Wed, Jun 24, 2015 at 4:20 PM, Javier Gonzalez
> wrote:
>
> Hi Satish,
>
> Thank you for your response.
>
> The ack at
plies that tuple has completely traversed the topology. Isn't
> that sufficient?
>
> On Tue, Jun 23, 2015 at 10:50 PM, Javier Gonzalez
> wrote:
>
>> Hi,
>>
>> Question: how would you implement a "flush" in a topology: sending a
>> special message
Hi,
Question: how would you implement a "flush" in a topology: sending a
special message to the topology that will in time return with a message
that says everything up to the flush message has finished traversing the
topology? (does that make sense?)
Regards,
Javier
- run nimbus on one machine
- run a supervisor on all four
- specify four workers when creating/configuring the topology
- submit topology to nimbus.
This should result in the topology elements being distributed among all
available servers.
If you require specific pairings (e.g. spout MUST be in se
know it's using a single instance of the session? Is there a way
> we can know from the storm UI or anywhere else?
>
> --
> Kushan Maskey
> 817.403.7500
> Precocity LLC <http://precocity.com>
> M. Miller & Associates <http://mmillerassociates.com/>
> kushan.mas...@mmille
First thing that comes to mind is: pass a Map as one of the Values within
your tuple. Name it "header". Done.
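A sketch of that (field names and header keys are just examples):

import java.util.HashMap;
import java.util.Map;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// in declareOutputFields:
declarer.declare(new Fields("header", "body"));

// when emitting:
Map<String, String> header = new HashMap<String, String>();
header.put("source", "syslog");
header.put("timestamp", String.valueOf(System.currentTimeMillis()));
collector.emit(new Values(header, body));

// in a downstream bolt:
Map<String, String> hdr = (Map<String, String>) tuple.getValueByField("header");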
On Tue, Jun 16, 2015 at 10:01 AM, Bas van de Lustgraaf <
basvdlustgr...@gmail.com> wrote:
> Hi Guys,
>
> I would like to implement the same concept of an event header as Apache
> flume us
We had a similar issue (namely, you can't pass anything non-serializable
through the Configuration or create it in the constructor).
What we did is pass in the Configuration or constructor a String with the
path to a properties file. From that configuration file, in the
bolt.prepare method you bri
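In other words, something like this (path handling and property keys are invented):

import java.io.FileInputStream;
import java.util.Map;
import java.util.Properties;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;

public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    try {
        Properties props = new Properties();
        props.load(new FileInputStream(propertiesPath));            // the String passed to the constructor
        this.connection = connect(props.getProperty("amqp.host"));  // hypothetical helper
    } catch (Exception e) {
        throw new RuntimeException("Could not initialize bolt", e);
    }
}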
Mind that the "at least once" guarantee applies only to "regular"
processing (i.e. storm will replay tuples that time out). Re-emitting when
one of the bolts fails explicitly is your responsibility (on the spout
code).
On Tue, Jun 9, 2015 at 5:49 PM, Andrew Xor
wrote:
> Regular topologies do no
ngtack Baek | Precocity, LLC*
>
> Tel/Direct: (972) 378-1030 | Mobile: (214) 477-5715
>
> *seungtackb...@precocityllc.com * |
> www.precocityllc.com
>
>
> This is the end of this message.
>
> --
>
> On Mon, Jun 8, 2015 at 6:26 PM, Javier Gonzalez
> wrote:
>
I would say, configure so that your total parallelism matches the number of
cores available (i.e. if you have a topology with X spouts, Y boltAs and Z
boltBs, make it so that X+Y+Z = cores available). And one worker per
machine, inter-JVM communications are expensive. When you have more bolts
and
Hi,
If it's a custom Spout (i.e. you wrote it) it's completely up to you to
re-emit in the event of a failed tuple. You get whatever ID you sent down
the topology when you emitted from the spout, so make sure you are able to
somehow get the message back from that ID to re-emit.
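One common sketch is to keep the in-flight messages keyed by their id (all
names below are illustrative):

import java.util.HashMap;
import java.util.Map;
import backtype.storm.tuple.Values;

private Map<Object, Values> pending = new HashMap<Object, Values>();

// in nextTuple(), when emitting:
pending.put(id, values);
collector.emit(values, id);

public void ack(Object id) {
    pending.remove(id);                 // done, forget it
}

public void fail(Object id) {
    Values values = pending.get(id);
    if (values != null) {
        collector.emit(values, id);     // naive retry; cap the retries in real code
    }
}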
Regards,
JG
On Mo
Couple of things I'd suggest to check:
1.- Perhaps your data is skewed, i.e. the hash function sends the bulk of
the messages to a single bolt? Check in the storm UI the number of executed
tuples in each bolt. If this is the case, then all the parallelism you can
set won't give you gains. You'd nee
Yes, this is the way to increase memory for storm. I would add one caveat:
this is the per worker memory allocation. So one has to dimension the
memory available in the machine and plan for *number of workers*
accordingly, e.g. a server with 64GB RAM will struggle if you start on it
ten workers wi
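To put rough numbers on it (purely illustrative): ten workers at, say, -Xmx8g
each already ask for 10 x 8 GB = 80 GB of heap, more than the 64 GB the box
has before you even count the OS, off-heap overhead and page cache.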
No objection here. I work at a big company where upgrades move fast like
glaciers ;) and even we are up to java7.
Regards,
JG
On Mon, Jun 1, 2015 at 2:37 PM, P. Taylor Goetz wrote:
> CC user@
>
> I’d like to poll the community about the possibility of dropping support
> for Java 1.6 in the Stor
You only have to make sure JAVA_HOME and your path point to the java8
installation for every storm process (nimbus and supervisor) you start.
I've had no problem using storm with Java8. The only times I've had
problems are when someone changes JAVA_HOME to java6 or 7 and then storm jar
throws the us
nk the answer to your question hinges off of this statement:
> “
> I believe the farming out of the processing to different nodes is hurting
> our performance.
> "
> What makes you believe this?
>
>
>
>
> From: Javier Gonzalez
> Reply-To: "user@storm
Isn't the cleanup method guaranteed to be called only while running as a
local topology?
On May 13, 2015 9:20 AM, "Jeffery Maass" wrote:
> Bolts which implement IBolt have a method called cleanup() which is called
> by the Storm framework.
>
> https://storm.apache.org/apidocs/backtype/storm/task/IB
> give you the capability to use multiple spouts to read from the same topic.
>
> Supun..
>
> On Sat, May 9, 2015 at 4:57 PM, Javier Gonzalez
> wrote:
>
>> Hi,
>>
>> I'm currently approaching the design of an application that will have a
>> single sour
Hi,
I'm currently approaching the design of an application that will have a
single source of data from AMPS (high speed pub-sub system like Kafka). We
are currently facing the issue that the spout is much faster than the
bolts, and I believe the farming out of the processing to different nodes
is
It can be done, with the curator api. We did it in the middle of a PoC a
month ago or so, to store some history that would be needed to detect if an
incoming event was already processed. It performed well. Unfortunately, I
can't share any code as that goes against my contract. I think what we did
w
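Not our code, but a generic sketch of the kind of Curator calls involved
(hosts, paths and eventId are placeholders):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

CuratorFramework zk = CuratorFrameworkFactory.newClient(
        "zkhost1:2181,zkhost2:2181", new ExponentialBackoffRetry(1000, 3));
zk.start();

String path = "/myapp/processed/" + eventId;
if (zk.checkExists().forPath(path) == null) {
    // note: check-then-create is racy; in real code create and catch NodeExistsException
    zk.create().creatingParentsIfNeeded().forPath(path, new byte[0]);
    // ... process the event ...
}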
Had a similar experience - too many emits would jam the spout and it would
never get around to processing the acks received from the bolts. We "fixed"
it by introducing artificial 1ms sleep in the spout processing so that
there was enough idle capacity to run the acks. I doubt that's the better
sol
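For what it's worth, the "fix" looked roughly like this inside nextTuple()
(the source and timings are illustrative):

public void nextTuple() {
    MyMessage msg = source.poll();       // hypothetical non-blocking read
    if (msg != null) {
        collector.emit(new Values(msg.getBody()), msg.getId());
    }
    try {
        Thread.sleep(1);                 // 1 ms pause so the spout thread also gets to drain acks
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}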
exists;. *
>
> See if it actually does support conditional insert/update and if you can
> use this feature.
>
> Thanks
> Parth
>
> From: Javier Gonzalez
> Reply-To: "user@storm.apache.org"
> Date: Tuesday, March 3, 2015 at 10:43 AM
> To: "user@st
should also tell you which constraint was
> violated, you can ignore the unique constraint violations and ack back so
> the spout will stop retrying.
>
> Its not clean but should work.
>
> Thanks
> Parth
>
> From: Javier Gonzalez
> Reply-To: "user@storm.apac
Hi guys,
We're looking at storm to solve a message processing scenario that needs to
be horizontally scalable for high projected volume. The use case goes like
this:
1.- receive messages from external source.
2.- generate a set of messages from this external input, based on rules.
3.- persi