Spark 2.0.1 release?

2016-09-16 Thread Ewan Leith
Hi all, Apologies if I've missed anything, but is there likely to be a 2.0.1 bug fix release, or does a jump to 2.1.0 with additional features seem more probable? The issues for 2.0.1 seem pretty much done here https://issues.apache.org/jira/browse/SPARK/fixforversion/12336857/?selectedTab=com

Re: Spark 2.0.1 release?

2016-09-16 Thread Ewan Leith
ek for rc. On Fri, Sep 16, 2016 at 11:16 AM, Ewan Leith mailto:ewan.le...@realitymine.com>> wrote: Hi all, Apologies if I've missed anything, but is there likely to be a 2.0.1 bug fix release, or does a jump to 2.1.0 with additional features seem more probable? The issues for 2

RE: SparkUI via proxy

2016-11-25 Thread Ewan Leith
This is more of a question for the spark user’s list, but if you look at FoxyProxy and SSH tunnels it’ll get you going. These instructions from AWS for accessing EMR are a good start http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html http://docs.aws.amazon.com
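The AWS approach linked above boils down to a SOCKS proxy over SSH. A minimal sketch of the tunnel command (the key path and master hostname below are placeholders, not values from this thread):

```shell
# Open a local SOCKS proxy on port 8157 via the cluster master node.
# ~/mykey.pem and the hostname are hypothetical; substitute your own.
ssh -i ~/mykey.pem -N -D 8157 hadoop@ec2-xx-xx-xx-xx.compute.amazonaws.com
# Then configure FoxyProxy (or the browser's SOCKS settings) to use
# localhost:8157 and browse to the Spark UI's address on the cluster.
```

The -N flag opens the tunnel without running a remote command, and -D sets up dynamic port forwarding so the browser can reach the cluster's internal addresses.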

Any way for users to help "stuck" JIRAs with pull requests for Spark 2.3 / future releases?

2017-12-21 Thread Ewan Leith
Hi all, I was wondering with the approach of Spark 2.3 if there's any way us "regular" users can help advance any of the JIRAs that could have made it into Spark 2.3 but are likely to miss now, as the pull requests are awaiting detailed review. For example: https://issues.apache.org/jira/browse/SPA

Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Ewan Leith
Hi all, We really like the ability to infer a schema from JSON contained in an RDD, but when we're using Spark Streaming on small batches of data, we sometimes find that Spark infers a more specific type than it should use, for example if the json in that small batch only contains integer value
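A minimal pure-Python sketch of the failure mode described (this is an illustration only, not Spark's actual inference code): a small micro-batch whose values happen to be all integers infers a long column, while a later batch carrying strings in the same field infers string, so the two batch schemas conflict.

```python
import json

# Toy per-field type inference, loosely mimicking what happens when
# spark.read.json infers a schema from a small sample of records.
def infer_type(value):
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

batch1 = [json.loads(s) for s in ['{"id": 1}', '{"id": 2}']]
batch2 = [json.loads(s) for s in ['{"id": "abc123"}']]

schema1 = {infer_type(r["id"]) for r in batch1}
schema2 = {infer_type(r["id"]) for r in batch2}
print(schema1, schema2)  # {'long'} {'string'} -- conflicting schemas
```

The workaround discussed later in this thread is effectively to stop a small batch from locking in an over-specific type, either by supplying an explicit schema up front or by treating leaf values as strings.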

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Ewan Leith
hat we'll probably have to adopt if we can't come up with a way to keep the inference working. Thanks, Ewan -- Original message-- From: Reynold Xin Date: Thu, 1 Oct 2015 22:12 To: Ewan Leith; Cc: dev@spark.apache.org; Subject:Re: Dataframe nested schema inference fr

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-01 Thread Ewan Leith
Exactly, that's a much better way to put it. Thanks, Ewan -- Original message-- From: Yin Huai Date: Thu, 1 Oct 2015 23:54 To: Ewan Leith; Cc: r...@databricks.com;dev@spark.apache.org; Subject:Re: Dataframe nested schema inference from Json without type conflicts Hi Ewan,

RE: Dataframe nested schema inference from Json without type conflicts

2015-10-05 Thread Ewan Leith
it currently works, does anyone think a pull request would plausibly get into the Spark main codebase? Thanks, Ewan From: Ewan Leith [mailto:ewan.le...@realitymine.com] Sent: 02 October 2015 01:57 To: yh...@databricks.com Cc: r...@databricks.com; dev@spark.apache.org Subject: Re: Dataframe

Re: Dataframe nested schema inference from Json without type conflicts

2015-10-05 Thread Ewan Leith
Thanks Yin, I'll put together a JIRA and a PR tomorrow. Ewan -- Original message-- From: Yin Huai Date: Mon, 5 Oct 2015 17:39 To: Ewan Leith; Cc: dev@spark.apache.org; Subject:Re: Dataframe nested schema inference from Json without type conflicts Hello Ewan, Adding a

RE: Dataframe nested schema inference from Json without type conflicts

2015-10-23 Thread Ewan Leith
llable = true) |-- long: string (nullable = true) |-- null: string (nullable = true) |-- string: string (nullable = true) Thanks, Ewan From: Yin Huai [mailto:yh...@databricks.com] Sent: 01 October 2015 23:54 To: Ewan Leith Cc: r...@databricks.com; dev@spark.apache.org Subject: Re: Dataframe neste

RE: Spark 1.6.1

2016-01-25 Thread Ewan Leith
Hi Brandon, It's relatively straightforward to try out different type options for this in the spark-shell, try pasting the attached code into spark-shell before you make a normal postgres JDBC connection. You can then experiment with the mappings without recompiling Spark or anything like th

Selecting column in dataframe created with incompatible schema causes AnalysisException

2016-03-02 Thread Ewan Leith
When you create a dataframe using the sqlContext.read.schema() API, if you pass in a schema that's compatible with some of the records, but incompatible with others, it seems you can't do a .select on the problematic columns, instead you get an AnalysisException error. I know loading the wrong

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Ewan Leith
I think this new issue in JIRA blocks the release unfortunately? https://issues.apache.org/jira/browse/SPARK-16664 - Persist call on data frames with more than 200 columns is wiping out the data Otherwise there'll need to be a 2.0.1 pretty much right after? Thanks, Ewan On 23 Jul 2016 03:46, Xia

Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-23 Thread Ewan Leith
n that. I will document this as a known issue in the release notes. We have other bugs that we have fixed since RC5, and we can fix those together in 2.0.1. On July 22, 2016 at 10:24:32 PM, Ewan Leith (ewan.le...@realitymine.com<mailto:ewan.le...@realitymine.com>) wrote: I think this new

Re: How to resolve the SparkExecption : Size exceeds Integer.MAX_VALUE

2016-08-15 Thread Ewan Leith
I think this is more suited to the user mailing list than the dev one, but this almost always means you need to repartition your data into smaller partitions, as one of the partitions is over 2GB. When you create your dataset, put something like .repartition(1000) at the end of the command crea
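As a back-of-the-envelope sketch of picking a partition count (the 2GB ceiling comes from Spark's Integer.MAX_VALUE-sized block limit; the dataset size and 512MB target below are hypothetical, not from this thread):

```python
import math

INT_MAX = 2**31 - 1  # ~2GB: the per-block limit behind this exception

def min_partitions(total_bytes, target_bytes=512 * 1024**2):
    # Aim for partitions well under the 2GB limit (512MB here), leaving
    # headroom so skew in the data can't push any single partition over.
    return max(1, math.ceil(total_bytes / target_bytes))

dataset_bytes = 800 * 1024**3  # hypothetical 800 GiB dataset
print(min_partitions(dataset_bytes))  # 1600
```

In practice you would pass a number like this to .repartition(n) when creating the dataset, as the reply suggests, rather than computing it exactly.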