Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Mayur Rustagi
Another option could be to use a sketch to get an approximate median (extendable to quantiles as well) for a large number of tasks. The sketch would give an accurate value when tasks are few, and for a larger number of tasks the benefit will be good. Regards, Mayur Rustagi Ph: +1 (650) 937 9673 http://www.sigmoid.com
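
A minimal sketch of the idea, assuming a fixed-size reservoir sample stands in for the "sketch" (the message does not name a specific algorithm). The names `ReservoirQuantileSketch`, `capacity`, and `seed` are hypothetical and are not Spark APIs. While the number of task values is at most `capacity`, the answer is exact; beyond that it is an approximation from a uniform sample.

```python
import random
import statistics


class ReservoirQuantileSketch:
    """Keeps a uniform random sample of at most `capacity` values (Algorithm R)."""

    def __init__(self, capacity=256, seed=0):
        self.capacity = capacity          # hypothetical parameter: max values retained
        self.sample = []                  # retained values
        self.count = 0                    # total values seen so far
        self._rng = random.Random(seed)   # seeded for reproducibility

    def add(self, value):
        """Offer one task metric value to the sketch."""
        self.count += 1
        if len(self.sample) < self.capacity:
            # Still room: keep every value, so results stay exact.
            self.sample.append(value)
        else:
            # Replace a random slot with probability capacity / count.
            j = self._rng.randrange(self.count)
            if j < self.capacity:
                self.sample[j] = value

    def quantile(self, q):
        """Return the value at fraction q (0 <= q <= 1) of the sorted sample."""
        if not self.sample:
            raise ValueError("empty sketch")
        ordered = sorted(self.sample)
        index = min(len(ordered) - 1, int(q * len(ordered)))
        return ordered[index]

    def median(self):
        """Median of the retained sample (exact while count <= capacity)."""
        if not self.sample:
            raise ValueError("empty sketch")
        return statistics.median(self.sample)
```

Memory stays bounded by `capacity` regardless of the number of tasks, which is the benefit for large stages; the trade-off is that the result is only approximate once the sample overflows.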

Re: [MLlib] Contributing Algorithm for Outlier Detection

2014-11-07 Thread Mayur Rustagi
We should take a vector instead, giving the user flexibility to decide data source/type. What do you mean by vector datatype exactly? Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Wed, Nov 5, 2014 at 6:45 AM

Update on Pig on Spark initiative

2014-08-27 Thread Mayur Rustagi
(Sigmoid Analytics) Not to mention the Spark and Pig communities. Regards, Mayur Rustagi

Re: [brainsotrming] Generalization of DStream, a ContinuousRDD ?

2014-08-01 Thread Mayur Rustagi
Interesting. Clickstream data would have its own window concept based on the user's session. I can imagine windows would change across streams, but wouldn't they largely be domain-specific in nature?

Google Cloud Engine adds out of the box Spark/Shark support

2014-06-26 Thread Mayur Rustagi
https://groups.google.com/forum/#!topic/gcp-hadoop-announce/EfQms8tK5cE I suspect they are using their own builds. Has anybody had a chance to look at it?

Fwd: Monitoring / Instrumenting jobs in 1.0

2014-05-31 Thread Mayur Rustagi
We have a JSON feed of the Spark application interface that we use for easier instrumentation and monitoring. Has that been considered/found relevant? It was already sent as a pull request to 0.9.0; would that work, or should we update it to 1.0.0?

Re: Better option to use Querying in Spark

2014-05-06 Thread Mayur Rustagi
use case directly. On Tue, May 6, 2014 at 11:22 AM, prabeesh k prabsma...@gmail.com wrote: Hi, I have seen three different ways to query data from Spark 1. Default SQL

Re: Spark on wikipedia dataset

2014-04-23 Thread Mayur Rustagi
Huge joins would be interesting. I do all my demos on the Wikipedia dataset for Shark. Joins are a typical pain to showcase and show off :) On Wed, Apr 23, 2014 at 10:33 AM, Ajay Nair

Re: Building Spark AMI

2014-04-11 Thread Mayur Rustagi
I am creating one fully configured and synced one. But you still need to send over configuration. Do you plan to use Chef for that? On Apr 10, 2014 6:58 PM, Jim Ancona j...@anconafamily.com wrote: Are there scripts to build the AMI used by the spark-ec2 script? Alternatively, is there a place

Re: Custom RDD

2014-03-10 Thread Mayur Rustagi
copy paste? On Mon, Mar 10, 2014 at 12:30 PM, David Thomas dt5434...@gmail.com wrote: Is there any guide available on creating a custom RDD?

Re: when run the same job, time that spark used is very diffrent from shark.

2014-03-07 Thread Mayur Rustagi
returns data back to driver. On Thu, Mar 6, 2014 at 7:39 PM, qingyang li liqingyang1...@gmail.com wrote: Hi, community, I have set up a 3-node Spark cluster using standalone mode