Re: unsubscribe

2014-02-08 Thread Ankur Chauhan
I, for one, really liked Google Groups. Nabble is pretty painful to use. On Feb 8, 2014, at 16:27, Mayur Rustagi mayur.rust...@gmail.com wrote: Recently moved the mailing list from Google and deactivated that one :) Just send a mail to user-unsubscr...@spark.incubator.apache.org

Re: Hadoop MapReduce on Spark

2014-02-01 Thread Ankur Chauhan
I think the whole idea of the Spark API is to simplify building iterative workflows/algorithms compared to Hadoop's bloated API. I am not saying it's completely wrong or anything, although it would be clearer if you had a particular use case in mind that you wish to tackle. On Feb 1,

Re: Python API Performance

2014-02-01 Thread Ankur Chauhan
How does Julia interact with Spark? I would be interested, mainly because I find Scala syntax a little obscure, and it would be great to see actual numbers comparing Scala, Python, and Julia workloads. On Feb 1, 2014, at 16:08, Aureliano Buendia buendia...@gmail.com wrote: A much (much)

Re: Quality of documentation (rant)

2014-01-19 Thread Ankur Chauhan
Hi Ognen, I am in the same boat as you are. I actually work on a project that does this exact same thing and tried to do the aggregations with Spark, but I ended up trying again and again to do simple things like reading from S3 and failing. I have a little bit of knowledge

Re: Which of the hadoop file formats are supported by Spark ?

2014-01-18 Thread Ankur Chauhan
You may also want to consider Parquet (http://parquet.io). It is pretty efficient http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/ -- Ankur Chauhan

Re: Please help: virtualization type 'hvm' when I try to launch ec2 ssd instance

2014-01-10 Thread Ankur Chauhan
Hi, I ran into the same problem: the AMI that is used by the EC2 scripts is a paravirtualization one. For i2 and c3 nodes you need the newer HVM virtualization. I think the ec2-scripts don't support HVM instances. I ended up manually setting up Spark, or you could try using the chef recipe to

Bump: on disk storage formats

2013-12-08 Thread Ankur Chauhan
Hi all, Sorry for posting this again, but I am interested in finding out what different on-disk data formats people use for storing timeline event and analytics aggregate data. Currently I am just using gzipped, newline-delimited JSON files. I was wondering if there were any recommendations. -- Ankur
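
For readers unfamiliar with the format mentioned above, here is a minimal sketch of writing and reading gzipped, newline-delimited JSON using only the Python standard library. The file name, the event fields (`ts`, `type`), and the helper names are hypothetical and chosen purely for illustration; nothing here is taken from the poster's actual setup.

```python
import gzip
import json
import os
import tempfile


def write_events(path, events):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")


def read_events(path):
    """Lazily yield one decoded JSON object per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def demo():
    # Hypothetical timeline events; the field names are assumptions.
    events = [
        {"ts": 1386521640, "type": "play"},
        {"ts": 1386521700, "type": "pause"},
    ]
    path = os.path.join(tempfile.mkdtemp(), "events.json.gz")
    write_events(path, events)
    return list(read_events(path))


if __name__ == "__main__":
    for event in demo():
        print(event)
```

One property of this layout worth keeping in mind when comparing alternatives: a plain gzip file has to be decompressed from the start, so a single large file cannot be split across parallel readers.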

Re: Bump: on disk storage formats

2013-12-08 Thread Ankur Chauhan
wrote: LZO compression at a minimum, and using Parquet as a second step, seems like the way to go though I haven't tried either personally yet. Sent from my mobile phone On Dec 8, 2013, at 16:54, Ankur Chauhan achau...@brightcove.com wrote: Hi all, Sorry for posting this again but I am

Re: Opinions stratosphere

2013-11-22 Thread Ankur Chauhan
/java/eu/stratosphere/pact/example/wordcount/WordCount.java On Thu, Nov 21, 2013 at 3:15 PM, Ankur Chauhan achau...@brightcove.com wrote: Hi, I was just curious about https://github.com/stratosphere/stratosphere and how does spark compare to it. Anyone has any experience with it to make

Opinions stratosphere

2013-11-21 Thread Ankur Chauhan
Hi, I was just curious about https://github.com/stratosphere/stratosphere and how Spark compares to it. Does anyone have any experience with it and care to comment? -- Ankur