Re: java vs scala for Apache Spark - is there a performance difference?

2018-10-29 Thread Gourav Sengupta
I genuinely do not think that Scala for Spark requires us to be experts in Scala. There is in fact a tutorial called "Just Enough Scala for Spark" which, even with my IQ, does not take more than 40 minutes to go through. Also, the syntax of Scala is almost always similar to that of Python. Data processing

Re: java vs scala for Apache Spark - is there a performance difference?

2018-10-29 Thread kant kodali
When most people compare two different programming languages, 99% of the time it seems to boil down to syntactic sugar. As for performance, I doubt Scala is ever faster than Java, given that Scala leans on the heap more than Java does. I had also written some pointless micro-benchmarking code like (Random String

Re: java vs scala for Apache Spark - is there a performance difference?

2018-10-29 Thread Jean Georges Perrin
I did not see anything, but I am curious if you find something. I think one of the big benefits of using Java for data engineering in the context of Spark is that you do not have to train a lot of your team in Scala. Now, if you want to do data science, Java is probably not the best tool yet...

Re: dremel paper example schema

2018-10-29 Thread Debasish Das
The open-source implementation of Dremel is Parquet!

On Mon, Oct 29, 2018, 8:42 AM Gourav Sengupta wrote:
> Hi,
>
> why not just use Dremel?
>
> Regards,
> Gourav Sengupta
>
> On Mon, Oct 29, 2018 at 1:35 PM lchorbadjiev <lubomir.chorbadj...@gmail.com> wrote:
>> Hi,
>>
>> I'm trying to reproduce the
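
To make the Parquet connection concrete, here is a minimal PySpark sketch (my own illustration, not from the message; the record values follow the paper's r2 example, while the app name and output path are made up):

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("dremel-parquet-sketch").getOrCreate()

# One nested record modeled on the paper's r2; Spark infers a nested
# schema (a struct holding two long arrays) from the Row objects.
doc = Row(DocId=20, Links=Row(Backward=[10, 30], Forward=[80]))
df = spark.createDataFrame([doc])

# Spark's Parquet writer shreds nested structures into columns using the
# repetition/definition-level encoding that the Dremel paper introduced.
df.write.mode("overwrite").parquet("/tmp/dremel_example")
spark.read.parquet("/tmp/dremel_example").printSchema()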

Re: dremel paper example schema

2018-10-29 Thread Gourav Sengupta
Hi,

why not just use Dremel?

Regards,
Gourav Sengupta

On Mon, Oct 29, 2018 at 1:35 PM lchorbadjiev wrote:
> Hi,
>
> I'm trying to reproduce the example from the dremel paper
> (https://research.google.com/pubs/archive/36632.pdf) in Apache Spark using
> pyspark and I wonder if it is possible at

dremel paper example schema

2018-10-29 Thread lchorbadjiev
Hi, I'm trying to reproduce the example from the Dremel paper (https://research.google.com/pubs/archive/36632.pdf) in Apache Spark using pyspark, and I wonder if it is possible at all. Trying to follow the paper's example as closely as possible, I created this document type: from pyspark.sql.types import
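
The preview cuts off at the import, but the paper's Document schema can be sketched as a PySpark StructType along these lines (the mapping of the paper's required/optional/repeated labels to nullability and ArrayType is my assumption of the intended translation):

from pyspark.sql.types import (ArrayType, LongType, StringType,
                               StructField, StructType)

# Document schema from the Dremel paper: required -> nullable=False,
# optional -> nullable=True, repeated -> ArrayType (Spark fields have no
# repetition attribute of their own).
document_type = StructType([
    StructField("DocId", LongType(), nullable=False),
    StructField("Links", StructType([
        StructField("Backward", ArrayType(LongType()), nullable=True),
        StructField("Forward", ArrayType(LongType()), nullable=True),
    ]), nullable=True),
    StructField("Name", ArrayType(StructType([
        StructField("Language", ArrayType(StructType([
            StructField("Code", StringType(), nullable=False),
            StructField("Country", StringType(), nullable=True),
        ])), nullable=True),
        StructField("Url", StringType(), nullable=True),
    ])), nullable=True),
])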

Re: Processing Flexibility Between RDD and Dataframe API

2018-10-29 Thread Gourav Sengupta
Hi, I would recommend reading the book by Matei Zaharia. One of the main differentiating factors between Spark 1.x and subsequent releases has been optimization, and hence DataFrames; RDDs are in no way going away, because DataFrames are built on RDDs. The use of RDDs is allowed and is
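
A minimal PySpark sketch (my own illustration, not from the message) of the point that DataFrames are built on RDDs:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("df-on-rdd").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# A DataFrame exposes its underlying RDD of Row objects, so dropping down
# to the RDD API remains possible whenever it is needed.
print(df.rdd.map(lambda row: (row.id, row.value.upper())).collect())
# [(1, 'A'), (2, 'B')]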

Re: Processing Flexibility Between RDD and Dataframe API

2018-10-29 Thread Jungtaek Lim
Just 2 cents from just one of the contributors: while SQL semantics can express the various use cases data scientists encounter, I also agree that end users who are more familiar with code than with SQL can feel it is not flexible. But countless efforts have been incorporated into Spark SQL
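
To illustrate the two styles being discussed, here is my own small sketch (not from the message) of the same aggregation written in SQL and through the DataFrame API; both are handled by the same Catalyst optimizer:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sql-vs-code").getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])
df.createOrReplaceTempView("events")

# SQL flavor, familiar to analysts.
by_sql = spark.sql("SELECT key, SUM(value) AS total FROM events GROUP BY key")

# Code flavor, the same query via the DataFrame API.
by_api = df.groupBy("key").agg(F.sum("value").alias("total"))

by_sql.show()
by_api.show()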