Another option could be to use a sketch to get an approximate median (extendable
to quantiles as well) for a large number of tasks. When tasks are few, the
sketch would give an accurate value; for a larger number of tasks, the benefit
will be significant.
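To illustrate the sketch idea, here is a minimal Python example. It uses reservoir sampling as a simple stand-in for a dedicated quantile sketch (such as Greenwald-Khanna or t-digest). The class name `ReservoirQuantileSketch` and the `capacity` and `seed` parameters are hypothetical. While fewer than `capacity` values have been seen, the reservoir holds every value, so the median is exact, which covers the few-tasks case. Beyond that, memory stays bounded and the result becomes approximate.

```python
import random
from typing import List, Optional


class ReservoirQuantileSketch:
    """Approximate quantiles from a fixed-size uniform sample (hypothetical helper).

    Until `capacity` values have been added, every value is kept, so quantiles
    are exact. After that, a uniform random sample of size `capacity` is kept,
    so memory does not grow with the number of tasks.
    """

    def __init__(self, capacity: int = 1000, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.count = 0
        self.sample: List[float] = []
        self._rng = random.Random(seed)

    def add(self, value: float) -> None:
        self.count += 1
        if len(self.sample) < self.capacity:
            self.sample.append(value)
        else:
            # Algorithm R: keep the new value with probability capacity / count,
            # overwriting a uniformly chosen slot.
            j = self._rng.randrange(self.count)
            if j < self.capacity:
                self.sample[j] = value

    def quantile(self, q: float) -> float:
        if not self.sample:
            raise ValueError("sketch is empty")
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        ordered = sorted(self.sample)
        # Simple index into the sorted sample (gives the upper median for even sizes).
        idx = min(int(q * len(ordered)), len(ordered) - 1)
        return ordered[idx]

    def median(self) -> float:
        return self.quantile(0.5)
```

For example, you could feed each task's duration into `add` and read `median()` or `quantile(0.9)` at the end. A dedicated quantile sketch would replace the random sample with a summary that has bounded rank error.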
Regards,
Mayur Rustagi
Ph: +1 (650) 937 9673
http://www.sigmoid.com
We should take a vector instead, giving the user the flexibility to decide the
data source/type.
What do you mean by vector datatype exactly?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Wed, Nov 5, 2014 at 6:45 AM
(Sigmoid Analytics)
Not to mention the Spark and Pig communities.
Regards
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
Interesting. Clickstream data would have its own window concept based on the
user's session. I can imagine windows would change across streams, but
wouldn't they largely be domain-specific in nature?
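Here is a minimal Python sketch of session-based windowing for clickstream data. It assumes events arrive as `(user_id, timestamp)` pairs and that a session ends after a period of inactivity. The function name `sessionize` and the 30-minute default `gap_seconds` are hypothetical choices, and this is plain Python for illustration, not a Spark Streaming API. The inactivity gap is exactly the kind of domain-specific parameter that would differ from stream to stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, float]  # (user_id, timestamp in seconds)


def sessionize(events: Iterable[Event], gap_seconds: float = 1800.0) -> Dict[str, List[List[float]]]:
    """Group each user's event timestamps into sessions (hypothetical helper).

    A new session starts whenever the time since that user's previous event
    exceeds gap_seconds (an assumed inactivity timeout).
    """
    by_user: Dict[str, List[float]] = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions: Dict[str, List[List[float]]] = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions: List[List[float]] = [[stamps[0]]]
        for prev, cur in zip(stamps, stamps[1:]):
            if cur - prev > gap_seconds:
                user_sessions.append([cur])  # gap too large: open a new session
            else:
                user_sessions[-1].append(cur)  # still within the current session
        sessions[user] = user_sessions
    return sessions
```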
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com
https://groups.google.com/forum/#!topic/gcp-hadoop-announce/EfQms8tK5cE
I suspect they are using their own builds. Has anybody had a chance to look
at it?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
We have a JSON feed of the Spark application interface that we use for easier
instrumentation and monitoring. Has that been considered or found relevant?
It was already sent as a pull request against 0.9.0; would that work, or should
we update it for 1.0.0?
Mayur Rustagi
Ph: +1 (760) 203 3257
use case directly.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Tue, May 6, 2014 at 11:22 AM, prabeesh k prabsma...@gmail.com wrote:
Hi,
I have seen three different ways to query data from Spark:
1. Default SQL
Huge joins would be interesting. I do all my Shark demos on the Wikipedia
dataset. Joins are typically a pain to show off :)
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Wed, Apr 23, 2014 at 10:33 AM, Ajay Nair
I am creating a fully configured, synced one, but you would still need to send
over the configuration. Do you plan to use Chef for that?
On Apr 10, 2014 6:58 PM, Jim Ancona j...@anconafamily.com wrote:
Are there scripts to build the AMI used by the spark-ec2 script?
Alternatively, is there a place
copy paste?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Mon, Mar 10, 2014 at 12:30 PM, David Thomas dt5434...@gmail.com wrote:
Is there any guide available on creating a custom RDD?
returns data
back to driver.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Thu, Mar 6, 2014 at 7:39 PM, qingyang li liqingyang1...@gmail.com wrote:
Hi community, I have set up a 3-node Spark cluster in standalone mode