Ok, thanks.

On Thu, Jun 9, 2016, 12:51 PM Jasleen Kaur <jasleenkaur1...@gmail.com> wrote:
> The github repo is https://github.com/datastax/spark-cassandra-connector
>
> The talk video and slides should be uploaded soon on the Spark Summit website.
>
> On Wednesday, June 8, 2016, Chanh Le <giaosu...@gmail.com> wrote:
>
>> Thanks, I'll look into it. Any luck getting a link to it?
>>
>> On Thu, Jun 9, 2016, 12:43 PM Jasleen Kaur <jasleenkaur1...@gmail.com> wrote:
>>
>>> Try using the DataStax package. There was a great talk at Spark Summit
>>> about it. It will take care of the boilerplate code and you can focus on
>>> real business value.
>>>
>>> On Wednesday, June 8, 2016, Chanh Le <giaosu...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>> I tested partitioning a DataFrame by columns, but the result looks
>>>> wrong to me.
>>>> I am using Spark 1.6.1 and loading data from Cassandra.
>>>> When I repartition by two fields (date, network_id) I get 200 partitions.
>>>> When I repartition by one field (date) I also get 200 partitions.
>>>> But my data covers 90 days, so if we repartition by date it should
>>>> give 90 partitions.
>>>>
>>>> val daily = sql
>>>>   .read
>>>>   .format("org.apache.spark.sql.cassandra")
>>>>   .options(Map("table" -> dailyDetailTableName, "keyspace" -> reportSpace))
>>>>   .load()
>>>>   .repartition(col("date"))
>>>>
>>>> The partition count doesn't change no matter which columns I pass to
>>>> repartition.
>>>>
>>>> Does anyone have the same problem?
>>>>
>>>> Thanks in advance.
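
For context on the behavior described in the thread: in Spark, `DataFrame.repartition(partitionExprs: Column*)` hash-partitions the data into `spark.sql.shuffle.partitions` partitions (200 by default); the partition count is not derived from the number of distinct values in the column, which would explain the constant 200. A minimal sketch of how the count can be controlled explicitly, assuming a running Spark 1.6 `sqlContext` and a `daily` DataFrame loaded as in the thread:

```scala
// Sketch only: assumes an existing sqlContext and the `daily` DataFrame
// from the thread; not a self-contained program.
import org.apache.spark.sql.functions.col

// repartition(col("date")) shuffles into spark.sql.shuffle.partitions
// buckets (default 200), regardless of how many distinct dates exist.
// To get 90 partitions for 90 days, pass the count explicitly:
val daily90 = daily.repartition(90, col("date"))

// Alternatively, lower the global default before the shuffle:
sqlContext.setConf("spark.sql.shuffle.partitions", "90")
val dailyByDate = daily.repartition(col("date"))
```

Note that even with 200 buckets, at most 90 of them will be non-empty when there are only 90 distinct dates; the hash partitioning just does not shrink the bucket count to match.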