Re: Future of Logback?

2013-11-11 Thread Ted Dunning
+1 to what Michael says. Drill is using logback and there has been zero pressure to move to another framework. On Mon, Nov 11, 2013 at 10:15 PM, Michael Rose wrote: > I believe when Jon says "log4j" he refers to log4j2. Log4j2 is yet another > successor to log4j, which claims to solve issues i

Re: Recommender Engines on top of Storm

2014-01-11 Thread Ted Dunning
Slope one is one of the few algorithms that performs so uniformly poorly that it is being removed from Mahout. I wouldn't recommend it for any applications. In general, recommendation isn't particularly well suited for on-line operation since there is part of the computation that substantially be

Re: Recommender Engines on top of Storm

2014-01-11 Thread Ted Dunning
You can even add many forms of business logic into the same query such as geographic constraints. > > > > 2014/1/11 Ted Dunning > >> >> ... >> Here are the links: >> >> [1] http://research.microsoft.com/apps/pubs/default.aspx?id=122779 >

Re: Recommender Engines on top of Storm

2014-01-11 Thread Ted Dunning
On Sat, Jan 11, 2014 at 1:31 PM, Klausen Schaefersinho < klaus.schaef...@gmail.com> wrote: > @Ted: Thanks for your great response. Just one little question. With > > > cooccurrence analysis and is focused on sparsification of the > cooccurrence matrix to produce an indicator matrix > > you mean t

Re: Large binary payloads with storm

2014-01-12 Thread Ted Dunning
Consider also whether you even *want* to pass large objects through your tuples. If this will cause many copies of the object with no modification or reference, you might be much better off leaving your object in a static cache and simply passing around an ID. There are many heuristics for managi
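
A minimal sketch of the "pass an ID, not the object" idea, in Python for brevity. The cache, the helper names and the tuple layout are all hypothetical, and it assumes every consumer of the tuple can reach the same cache (same process, or a copy loaded on each worker).

```python
import uuid

# Hypothetical process-local cache: short ID -> large payload.
_PAYLOAD_CACHE = {}


def put_payload(payload: bytes) -> str:
    """Store the payload once and return a small ID that refers to it."""
    key = uuid.uuid4().hex
    _PAYLOAD_CACHE[key] = payload
    return key


def get_payload(key: str) -> bytes:
    """Look the payload up again where it is actually needed."""
    return _PAYLOAD_CACHE[key]


def make_tuple(payload: bytes) -> tuple:
    """Build the small tuple that is passed around instead of the payload."""
    return (put_payload(payload), len(payload))
```

The tuple stays a few dozen bytes regardless of payload size, and only the component that really needs the bytes calls `get_payload`. Eviction is left out here on purpose.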

Re: Storm Zookeeper Client

2014-01-13 Thread Ted Dunning
That is a fine idea if the information is not expected to change very often. Zookeeper will not keep up with even a small Storm cluster that is trying to change things at a high rate. You have to multiply the change rate by the number of clients listening for changes. If this product exceeds sev
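
The multiplication above as a small worked sketch. The function name and the sample numbers are illustrative assumptions only; no threshold is implied by them.

```python
def notification_rate(changes_per_second: float, watching_clients: int) -> float:
    """Notifications per second that must be delivered: every change
    is fanned out to every client listening for changes."""
    return changes_per_second * watching_clients


# Illustrative numbers only: 50 changes/s seen by a growing number of watchers.
for clients in (10, 100, 1000):
    print(clients, "watchers ->", notification_rate(50.0, clients), "notifications/s")
```

The point of the sketch is that the fan-out, not the raw change rate, is what grows with cluster size.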

Re: Compute the top 100 million in the total 10 billion data efficiently.

2014-01-21 Thread Ted Dunning
Top what? Most frequent? Or the top 1% based on some score attached to the tuples? The latter is trivial. The former less so. If you have the score problem, you just need to use an approximate quantile algorithm like t-digest to get a continuous estimate of the 99th percentile. For the
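
A rough sketch of the score case. To keep it dependency-free it does NOT use t-digest; a plain reservoir sample stands in as the approximate quantile estimator, and all names and sizes here are assumptions for illustration.

```python
import random


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from a stream (Algorithm R)."""
    sample = []
    for n, x in enumerate(stream, start=1):
        if len(sample) < k:
            sample.append(x)
        else:
            j = rng.randrange(n)  # uniform in [0, n)
            if j < k:
                sample[j] = x
    return sample


def estimate_quantile(sample, q):
    """Estimate the q-th quantile from the sample by sorting it."""
    ordered = sorted(sample)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


rng = random.Random(42)
scores = [rng.random() for _ in range(200_000)]  # made-up scores

# Estimate the 99th percentile, then keep everything at or above it.
cutoff = estimate_quantile(reservoir_sample(scores, 10_000, rng), 0.99)
selected = [s for s in scores if s >= cutoff]
```

Because the cutoff is an estimate, the selected set is only approximately the top 1%. A real quantile sketch would give tighter error near the tail for the same memory.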

Re: Re[2]: Compute the top 100 million in the total 10 billion data efficiently.

2014-01-22 Thread Ted Dunning
On Tue, Jan 21, 2014 at 7:31 AM, wrote: > You mentioned a approximate algorithm. That's great! I will check it out > later. But, Is there a way to calculate it in a precise way? If you want to select the 1% largest numbers, then you have a few choices. If you have memory for the full set, you
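
The reply is cut off here, so the following is not necessarily one of the choices it goes on to list. It is just one common exact approach, sketched under the assumption that k items (rather than the full set) fit in memory: a bounded min-heap that always holds the k largest values seen so far.

```python
import heapq


def top_k(stream, k):
    """Return the k largest items of a stream, largest first, using O(k) memory."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest of the current top k
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the weakest member of the current top k; swap it in.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)
```

For the top 1% of N items, k would be N // 100. The standard library's `heapq.nlargest(k, stream)` does the same job in one call.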

Re: Lambdoop - We are hiring! CTO-Founder2Be based in Madrid, Spain

2014-01-29 Thread Ted Dunning
On Wed, Jan 29, 2014 at 8:24 AM, Info wrote: > Our engineering team has been hardly working on implementing our initial > product and now we are ready to launch and grow a new innovative BigData > company that will make BigData application development easier and faster. You should proofread you

Re: Netty Errors, chain reaction, topology breaks down

2014-03-04 Thread Ted Dunning
Feedback from Drill is that the next version (4.x) of Netty works better than the Storm current (3). Drill also does more explicit memory management so this might be a red herring. On Tue, Mar 4, 2014 at 2:36 AM, Richards Peter wrote: > Hi Drew, > > Good that you identified the root cause for

Re: Topology is stuck

2014-04-09 Thread Ted Dunning
In what sense do you mean when you say that reads in ZK are eventually consistent? You may get a slightly old value, but you are guaranteed to see a consistent history. That is, if a value has values (which include version numbers) v_1 ... v_n, then if you see v_i, you will never see v_j where j
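
A toy illustration of "possibly old, but never out of order". The sentence above is truncated, so this leans on ZooKeeper's documented single-system-image guarantee (a client does not see an older view after a newer one). The function name is hypothetical and the version lists are made up.

```python
def is_consistent_history(observed_versions):
    """True if the versions a single client observed never go backwards.

    Repeats (a stale read) and gaps (a missed intermediate value) are
    both allowed; only moving back to an older version is a violation.
    """
    return all(a <= b for a, b in zip(observed_versions, observed_versions[1:]))


print(is_consistent_history([1, 1, 3, 4]))  # stale read, then a skipped version
print(is_consistent_history([1, 3, 2]))     # went backwards
```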

Re: Topology is stuck

2014-04-10 Thread Ted Dunning
bug if >> ran for a week. Other correlated factors may include that the trident >> topology has to occasionally fail batches, the zookeeper cluster has to be >> under significant load from other applications beyond trident. I don't many >> much details unfortunately. >

Re: storm trident question

2014-05-12 Thread Ted Dunning
Spark streaming is a very different animal than Storm in that it does micro-batching rather than true streaming. This has positives and negatives. Average latency on record by record processing will appear to be abysmal compared to Storm. Throughput could well be much higher because of the inher

Re: Interesting Comparison

2014-05-12 Thread Ted Dunning
Anybody who has ever only paid 40K$ to IBM for anything should deserve a prize. That is just the entry fee. On Mon, May 12, 2014 at 7:46 AM, Marc Vaillant wrote: > To play devil's advocate, if you believe the stream performance gains, > then the 40k will likely pay for itself in needing to de

Re: Interesting Comparison

2014-05-13 Thread Ted Dunning
Regardless of what IBM does, there is clearly a lot that Storm is not doing for performance. This is largely by design for simplicity. For instance, Storm could use reflection and bytecode engineering to merge bolts while still allowing rearrangement and rebalancing. Likewise, when throughput gets

Re: Are Real-Time Game Servers a good use case for Storm

2014-06-08 Thread Ted Dunning
On Sun, Jun 8, 2014 at 12:12 PM, joe roberts wrote: > Also, it seems Storm uses TCP via ZeroMQ by default -Is that right? And > if so, can it be switched to use UDP or UDT instead, perhaps by replacing > ZeroMQ with Netty? > Why would you want that?

Re: Are Real-Time Game Servers a good use case for Storm

2014-06-08 Thread Ted Dunning
Why do you think that UDP is faster? On Sun, Jun 8, 2014 at 6:27 PM, joe roberts wrote: > To make it faster! > > > On 6/8/2014 8:27 PM, Ted Dunning wrote: > > > On Sun, Jun 8, 2014 at 12:12 PM, joe roberts < > carl.roberts.zap...@gmail.com> wrote: > >

Re: Are Real-Time Game Servers a good use case for Storm

2014-06-08 Thread Ted Dunning
eptable, and some cases where reliable messages are > needed (UDT), so for my particular use-cases, it is. As I understand it, > Netty offers, UDP, UDT, and TCP classes, therefore, it provides what I need. > > On 6/8/2014 11:07 PM, Ted Dunning wrote: > > > Why do you think t

Re: Are Real-Time Game Servers a good use case for Storm

2014-06-08 Thread Ted Dunning
aking measurements. JMH is your friend. On Sun, Jun 8, 2014 at 9:11 PM, Ted Dunning wrote: > > If you read the replies on the SO question, you will find lots of people > refuting the "UDP is faster" mantra. > > If you haven't already benchmarked Storm to determine th

Re: Are Real-Time Game Servers a good use case for Storm

2014-06-08 Thread Ted Dunning
ly stateless > set of servers is really the way to go. > > Michael Rose (@Xorlev <https://twitter.com/xorlev>) > Senior Platform Engineer, FullContact <http://www.fullcontact.com/> > mich...@fullcontact.com > > > On Sun, Jun 8, 2014 at 9:07 PM, Ted Dunning wrote: >

Re: Are Real-Time Game Servers a good use case for Storm

2014-06-08 Thread Ted Dunning
mitation - Actions > Command Interface (Client to Server ) * > > > > > * - - Emotable Actions (/dance) - Take blue.sword - Give blue.sword to joe > - Object identifiers for nouns - If red.goblin near a player is ID 30232 > then client sends: kill 30232 * > > > * - Voice -

Re: Are Real-Time Game Servers a good use case for Storm

2014-06-08 Thread Ted Dunning
nt to Server ) * >> >> >> >> >> * - - Emotable Actions (/dance) - Take blue.sword - Give blue.sword to >> joe - Object identifiers for nouns - If red.goblin near a player is ID >> 30232 then client sends: kill 30232 * >> >> >> * - Voice - Channel

Re: [VOTE] Storm Logo Contest - Final Round

2014-06-09 Thread Ted Dunning
+5 points Number 10 On Mon, Jun 9, 2014 at 11:38 AM, P. Taylor Goetz wrote: > This is a call to vote on selecting the winning Storm logo from the 3 > finalists. > > The three candidates are: > > * [No. 6 - Alec Bartos]( > http://storm.incubator.apache.org/2014/04/23/logo-abartos.html) > * [No

Re: [VOTE] Storm Logo Contest - Final Round

2014-06-09 Thread Ted Dunning
I think that this vote is invalid. The points add up to more than 5. One option is to reduce all by 5/8. Better option is for Binh to vote again with a correct sum, say with 3 and 2 points. On Mon, Jun 9, 2014 at 12:12 PM, Binh Nguyen Van wrote: > #9 - 5 pts. > #10 - 3 pts. > > > On Mon, Jun
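
The "reduce all by 5/8" option worked through for the ballot quoted above (5 and 3 points, total 8, budget 5). The function name is made up for this sketch, and exact fractions are used so the scaled points sum to exactly 5.

```python
from fractions import Fraction


def rescale_ballot(ballot, budget=5):
    """Scale an over-budget ballot down so its points sum to the budget."""
    total = sum(ballot.values())
    if total <= budget:
        return dict(ballot)  # already valid, leave untouched
    factor = Fraction(budget, total)  # 5/8 for a ballot totalling 8
    return {name: points * factor for name, points in ballot.items()}


scaled = rescale_ballot({"#9": 5, "#10": 3})
```

That gives 25/8 (3.125) and 15/8 (1.875) points, which is why re-voting with whole numbers such as 3 and 2 is the tidier fix.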

Re: Apache Storm vs Apache Spark

2014-06-09 Thread Ted Dunning
They are different. Storm allows tuples to be processed right away. Spark streaming requires micro batching (which may be a really short time). Spark streaming allows checkpointing of partial results in the stream supported by the framework. Storm says you should roll your own or use Trident. App

Re: Apache Storm vs Apache Spark

2014-06-09 Thread Ted Dunning
el. e.g. >> count of orders in last 1 minute, in Storm I have to write code to for >> sliding windows and state management, while Spark seems to provide >> operators to accomplish that. Tuple level operations such as enrichment, >> filters etc.. seems also doable in both.
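
For a sense of what the hand-written sliding window mentioned in the quoted question amounts to, here is a minimal "count of orders in the last minute" sketch. It is plain Python, not Storm or Spark API; the class name is hypothetical and it assumes timestamps arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamp lies in (now - window, now]."""

    def __init__(self, window_seconds=60.0):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest on the left

    def add(self, timestamp):
        """Record one event and drop anything that has aged out."""
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        """Number of events still inside the window at time `now`."""
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
```

In a topology one such counter would live per key (per customer, say) inside a bolt, which is the state management the question refers to.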

Re: [VOTE] Storm Logo Contest - Final Round

2014-06-09 Thread Ted Dunning
I love it. This is a real horse race! On Mon, Jun 9, 2014 at 2:17 PM, Adam Lewis wrote: > #10 - 5 pts. > > > On Mon, Jun 9, 2014 at 5:02 PM, joe roberts > wrote: > >> 10 = 5 pts. >> >> >> On 6/9/2014 2:38 PM, P. Taylor Goetz wrote: >> >> This is a call to vote on selecting the winning Stor

Re: Apache Storm vs Apache Spark

2014-06-09 Thread Ted Dunning
On Mon, Jun 9, 2014 at 2:27 PM, P. Taylor Goetz wrote: > There is one study that I’m aware of that claims Spark streaming is > insanely faster than Storm. I like your way of describing the two tools as starting from differing extremes with a common territory around micro-batching. As such, it

Re: Apache Storm vs Apache Spark

2014-06-09 Thread Ted Dunning
On Mon, Jun 9, 2014 at 3:48 PM, Rajiv Onat wrote: > a) I have stream of orders (keyed on customerid, source is socket) > b) I filter for those orders that is from my high value customers (I have > to make sure I have this list of high value customers available on all bolt > tasks in memory for fa

Re: Apache Storm vs Apache Spark

2014-06-09 Thread Ted Dunning
te, in Storm I have to write code to for > sliding windows and state management, while Spark seems to provide > operators to accomplish that. Tuple level operations such as enrichment, > filters etc.. seems also doable in both. > > > On Mon, Jun 9, 2014 at 12:24 PM, Ted Dunning >

Re: Extracting Performance Metrics

2014-06-16 Thread Ted Dunning
If you can afford a bit more time for insertion, consider also t-digest. Differences relative to the high dynamic range histogram system include: - HDR histograms assume an exponential distribution. t-digest handles arbitrary distributions - t-digest is much more accurate near extreme values.

Re: Extracting Performance Metrics

2014-06-16 Thread Ted Dunning
Codahale doesn't handle extreme skew on measurements well, last time I looked. For throughput, averages are great. For latency, you need very high percentiles to understand what is happening. On Mon, Jun 16, 2014 at 6:00 PM, Michael Rose wrote: > What kind of issues does Metrics have that lead
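
A small sketch of why the average hides what happens to latency. The latency numbers are invented for illustration (98% fast requests, 2% very slow ones) and the percentile helper uses the simple nearest-rank definition.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 980 fast requests, 20 slow ones.
latencies = [10] * 980 + [2000] * 20

mean_ms = statistics.mean(latencies)
p50_ms = percentile(latencies, 0.50)
p99_ms = percentile(latencies, 0.99)
print(mean_ms, p50_ms, p99_ms)
```

The mean lands near 50 ms, a latency no request actually experienced, while the median says 10 ms and the 99th percentile exposes the 2-second tail.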