I've been working on a Storm topology and recently it has become impossible
to submit my topology. It just times out. The only thing that has changed
since I was last able to successfully submit the topology is the size of
the jar/logic of the topology. It's worth mentioning that it has randomly
su…
This may work.

myStream.groupBy(new Fields("user"))
        .persistentAggregate(new MemoryMapState.Factory(),
                new Fields("some_int_1"),
                new Sum(),
                new Fields("total_some_int_1"))
        .newValuesStream()
> …non-existent field: 'some_int_2' from stream containing
> fields: <[user, total_some_int_1]>
>
> What's the purpose of newValuesStream(), then?
>
>
> 2014-07-09 19:20 GMT+02:00 Sam Goodwin:
>
>> This may work.
>>
>>
I don't think this works for persistentAggregate.
On Thu, Jul 10, 2014 at 4:30 PM, Xuehui He wrote:
> copied from:
> http://storm.incubator.apache.org/documentation/Trident-API-Overview.html
>
>
> Sometimes you want to execute multiple aggregators at the same time. This
> is called chaining and can be accomplished like this:
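The chaining example that follows on that page uses chainedAgg() and chainEnd() from the Trident API; roughly (the stream and field names are from the docs, not from this thread):

mystream.chainedAgg()
        .partitionAggregate(new Count(), new Fields("count"))
        .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
        .chainEnd()

This runs the Count and Sum aggregators over each partition of each batch and emits a single tuple per partition with the fields ["count", "sum"].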
Can you show some code? 200 seconds for 15K puts sounds like you're not
batching.
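The general shape would be something like the sketch below. BatchWriter, the batch size of 100, and the flusher callback are all illustrative names, not anything from the thread; the flusher would wrap whatever multi-put API your store exposes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch: accumulate writes and flush them in batches
// instead of issuing one remote put per tuple.
class BatchWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher; // wraps the store's multi-put
    private final List<T> buffer = new ArrayList<>();

    BatchWriter(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Buffer one item; flush automatically when the batch is full.
    void put(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered as one batch and reset the buffer.
    void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With a batch size of 100, 15K puts become ~150 round trips instead of 15K.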
On Fri, Jul 11, 2014 at 12:47 PM, Chen Wang
wrote:
> typo in previous email
> The emit method in the query bolt takes about 200 (instead of 20) seconds.
>
>
> On Fri, Jul 11, 2014 at 11:58 AM, Chen Wang wrote:
parallelismHint is definitely the way to scale up consumption of a Kafka
topic. Make sure the parallelism doesn't exceed the number of partitions,
though, and try to keep it balanced.
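For instance, with the standard TopologyBuilder API the parallelism hint is the third argument to setSpout (the spout construction itself is elided here, and "kafka-spout" is just an illustrative component id):

TopologyBuilder builder = new TopologyBuilder();
// 4 executors for the spout; keep this at or below the number of
// partitions in the Kafka topic.
builder.setSpout("kafka-spout", kafkaSpout, 4);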
Not sure how your environment is set up, but simply deploy more hosts to the
cluster and then use the rebalance option.
Are you trying to count the number of distinct actions seen per user? The
HyperLogLog algorithm was designed for exactly that purpose. If you are using
Redis, it already has a built-in HyperLogLog data structure (the PFADD and
PFCOUNT commands).
I've built a variety of different data structures and used them in Storm.
What exactly is the…
Even with Redis you'll need to maintain the sliding window yourself.
Does it need to be exact? If you want to estimate the number of distinct
users seen in a sliding window, then use the HyperLogLog data structure with
a ring buffer. It's fast, accurate, and memory-efficient. For example,
allocate 6…
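To make the bucketed-window idea concrete, here is a minimal sketch. The class and method names are made up for illustration, and each bucket is an exact HashSet only to keep the sketch short; in the real thing each bucket would be a fixed-size HyperLogLog sketch and count() would merge the sketches instead of unioning sets.

```java
import java.util.HashSet;
import java.util.Set;

// Sliding-window distinct counter: one bucket per time slice, held in a
// ring buffer. Advancing the window reuses (clears) the oldest bucket.
class SlidingDistinct {
    private final Set<String>[] buckets;
    private int current = 0;

    @SuppressWarnings("unchecked")
    SlidingDistinct(int numSlices) {
        buckets = new Set[numSlices];
        for (int i = 0; i < numSlices; i++) buckets[i] = new HashSet<>();
    }

    // Call once per time slice: the oldest bucket falls out of the window.
    void advance() {
        current = (current + 1) % buckets.length;
        buckets[current].clear();
    }

    // Record one user in the current time slice.
    void add(String user) {
        buckets[current].add(user);
    }

    // Distinct users across the whole window (union of all buckets).
    long count() {
        Set<String> union = new HashSet<>();
        for (Set<String> b : buckets) {
            union.addAll(b);
        }
        return union.size();
    }
}
```

Swapping the HashSets for HyperLogLog sketches keeps every bucket at a fixed size regardless of traffic, which is what makes the approach memory-efficient.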
Summingbird is a big data platform. Part of it attempts to provide a
functional interface to real-time data analysis:
https://github.com/twitter/summingbird
On Thu, Jul 17, 2014 at 11:59 PM, Darryl Stoflet wrote:
> There is some exploratory work underway to translate Pig jobs into Storm.
> I kno…
I'm not too sure how Postgres hll works, but I'm assuming you're going to
have to send every tuple to the Postgres DB remotely. This is very
expensive, whereas if you build your hll data structure in Storm you only
have to persist the fixed-size serialized version of the hll to the
database each tr…