Re: Custom Partitioning in Catalyst

2017-06-16 Thread Reynold Xin
Seems like a great idea to do? On Fri, Jun 16, 2017 at 12:03 PM, Russell Spitzer wrote: > I considered adding this to DataSource APIV2 ticket but I didn't want to > be first :P Do you think there will be any issues with opening up the > partitioning as well? > > On

Re: Custom Partitioning in Catalyst

2017-06-16 Thread Reynold Xin
Perhaps we should extend the data source API to support that. On Fri, Jun 16, 2017 at 11:37 AM, Russell Spitzer wrote: > I've been trying to make Catalyst aware of Cassandra partitioning. > There seem to be two major blockers. > > The first is that

Custom Partitioning in Catalyst

2017-06-16 Thread Russell Spitzer
I've been trying to make Catalyst aware of Cassandra partitioning. There seem to be two major blockers. The first is that DataSourceScanExec is unable to learn what the underlying partitioning should be from the BaseRelation it comes from. I'm currently able to get around this by

Re: structured streaming documentation does not match behavior

2017-06-16 Thread Shixiong(Ryan) Zhu
I created https://issues.apache.org/jira/browse/SPARK-21123. PR is welcome. On Thu, Jun 15, 2017 at 10:55 AM, Shixiong(Ryan) Zhu < shixi...@databricks.com> wrote: > Good catch. These are file source options. Could you submit a PR to fix > the doc? Thanks! > > On Thu, Jun 15, 2017 at 10:46 AM,

Re: How does MapWithStateRDD distribute the data

2017-06-16 Thread coolcoolkid
Hello, I have encountered a situation just like the one described above. I am running a Spark Streaming application with 2 executors, each with 16 cores and 10G of memory, and the input Kafka topic has 64 partitions. My code is like this: