Although it is not a bad idea to write data out partitioned and then use a
merge join when reading it back in, this currently isn't easily doable even
with RDDs, because when you read an RDD from disk the partitioning info is
lost. Re-introducing a partitioner at that point causes a shuffle.
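To illustrate, here is a minimal sketch of that behaviour (assuming a live
SparkContext sc and a throwaway path; the names are made up):

import org.apache.spark.HashPartitioner

// Write a pair RDD out partitioned by key; the partitioner itself is not persisted.
val users = sc.parallelize(Seq((1L, "a"), (2L, "b"))).partitionBy(new HashPartitioner(8))
users.saveAsObjectFile("/tmp/users")

// Reading it back gives an RDD with no partitioner, so the co-location is forgotten...
val reloaded = sc.objectFile[(Long, String)]("/tmp/users")
println(reloaded.partitioner)  // None

// ...and re-introducing one triggers a full shuffle, even though the data
// on disk is already laid out by key.
val reshuffled = reloaded.partitionBy(new HashPartitioner(8))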
Michael,
Is there any specific reason why DataFrames do not have partitioners like
RDDs? This would be very useful if one is writing a custom data source that
keeps data in partitions. While storing data, one could pre-partition the
data at the Spark level rather than at the data source.
Regards,
So suppose I have a bunch of userIds and I need to save them as Parquet in
the database. I also need to load them back and be able to do a join on
userId. My idea is to partition by the userId hashcode first and then by
userId, so that I don't have to deal with any performance issues.
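For concreteness, a rough sketch of that idea (assuming a DataFrame users
with a string userId column and an existing sqlContext; the bucket count,
column name, and path are made up):

import org.apache.spark.sql.functions.{col, udf}

// Hypothetical bucketing UDF: map each userId into one of 256 hash buckets.
val bucket = udf((userId: String) => math.abs(userId.hashCode) % 256)
val bucketed = users.withColumn("bucket", bucket(col("userId")))

// partitionBy("bucket") creates one directory per bucket, so all rows for a
// given userId land in the same directory and can be located cheaply at read time.
bucketed.write.partitionBy("bucket").parquet("/tmp/users_by_bucket")

// Loading it back for the join; the bucket column comes back as a partition column.
val reloaded = sqlContext.read.parquet("/tmp/users_by_bucket")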
Unfortunately there is not any, at least up to 1.5. I have not gone through
the new Dataset API in 1.6. There is some basic support for Parquet, like
partitionBy on a column.
If you want to partition your dataset in a certain way, you have to use an
RDD to do the partitioning and then convert that into a DataFrame before
saving it.
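A minimal sketch of that workaround, assuming a DataFrame df with a long
userId column and an existing sqlContext (the partition count and path are
illustrative):

import org.apache.spark.HashPartitioner
import org.apache.spark.sql.Row

// Drop to the RDD level, key by userId, and apply the partitioner there.
val pairs = df.rdd.map(row => (row.getAs[Long]("userId"), row))
val partitioned = pairs.partitionBy(new HashPartitioner(64)).values

// Rebuild a DataFrame with the same schema; the physical layout of the RDD's
// partitions is preserved, even though Spark SQL no longer tracks the partitioner.
val partitionedDF = sqlContext.createDataFrame(partitioned, df.schema)
partitionedDF.write.parquet("/tmp/users_partitioned")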
Hi,
How do I use a custom partitioner when I do saveAsTable on a DataFrame?
Thanks,
Swetha