Re: is there a way for removing hadoop from spark

2017-11-12 Thread trsell
@Jörn Spark without Hadoop is useful:
- for using Spark's programming model on a single beefy instance;
- for testing and integrating with a CI/CD pipeline. It's ugly to have tests that depend on a cluster running somewhere.

On Sun, 12 Nov 2017 at 17:17 Jörn Franke

Re: welcoming Burak and Holden as committers

2017-01-25 Thread trsell
Congratulations!

On Thu, 26 Jan 2017, 02:27 Bryan Cutler wrote:
> Congratulations Holden and Burak, well deserved!!!
>
> On Tue, Jan 24, 2017 at 10:13 AM, Reynold Xin wrote:
> > Hi all,
> > Burak and Holden have recently been elected as Apache Spark

Re: Aggregating over sorted data

2016-12-22 Thread trsell
I would love this feature.

On Thu, 22 Dec 2016, 18:45 assaf.mendelson wrote:
> It seems that this aggregation is for dataset operations only. I would
> have hoped to be able to do dataframe aggregation. Something along the line
> of: sort_df(df).agg(my_agg_func)
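The semantics being asked for amount to "order the rows within each group, then run an aggregate whose result may depend on that order". A plain-Python sketch of those semantics only, not Spark code: `sorted_group_agg` and `values_in_order` are hypothetical names, and rows are assumed to be dicts.

```python
from itertools import groupby
from operator import itemgetter


def sorted_group_agg(rows, key, order_by, agg):
    """Group `rows` (a list of dicts) by `key`, order each group by `order_by`,
    then apply `agg` to the ordered list of rows in each group.

    Hypothetical helper for illustration only; not a Spark API.
    """
    # Sorting by (key, order_by) makes equal keys adjacent, which groupby needs,
    # and leaves each group's rows in order_by order.
    ordered = sorted(rows, key=itemgetter(key, order_by))
    return {k: agg(list(group)) for k, group in groupby(ordered, key=itemgetter(key))}


def values_in_order(group):
    # An order-sensitive aggregate: the result changes if the rows are reordered.
    return [row["value"] for row in group]


if __name__ == "__main__":
    rows = [
        {"user": "a", "ts": 3, "value": "z"},
        {"user": "b", "ts": 1, "value": "x"},
        {"user": "a", "ts": 1, "value": "x"},
        {"user": "a", "ts": 2, "value": "y"},
    ]
    print(sorted_group_agg(rows, "user", "ts", values_in_order))
    # → {'a': ['x', 'y', 'z'], 'b': ['x']}
```

Any function of the ordered group (first value, last value, a running state machine) can be passed as `agg`; the point is that it sees the rows already sorted.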

Re: Can I add a new method to RDD class?

2016-12-04 Thread trsell
How does your application fetch the Spark dependency? Perhaps list your project's dependencies and check that it's using your dev build.

On Mon, 5 Dec 2016, 08:47 tenglong wrote:
> Hi,
>
> Apparently, I've already tried adding a new method to RDD,
>
> for example,
>
> class RDD

Re: Why the json file used by sparkSession.read.json must be a valid json object per line

2016-10-16 Thread trsell
Think of it as JSON Lines (jsonl) rather than a single JSON file. Point people at this if they need an official-looking spec: http://jsonlines.org/

One good reason for using this format is that you can easily split a file mid-way: every record ends at a newline, so each piece can be parsed on its own. This also makes it work well with standard Unix tools in pipes.

On Sun, 16 Oct 2016 at 16:24
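A minimal sketch of why splitting is easy, in plain Python rather than Spark: a reader handed an arbitrary byte range can resynchronise at the next newline. The convention assumed here (a record belongs to the range containing its first byte, and a reader finishes any record it starts even past its range end) is similar in spirit to how Hadoop-style line readers handle splits. `read_split` and `split_ranges` are hypothetical helpers, and the approach relies on no record containing a raw newline, which the JSON Lines format guarantees.

```python
import json


def read_split(data: bytes, start: int, end: int) -> list:
    """Parse the JSON Lines records owned by the byte range [start, end).

    Assumed convention: a record belongs to the range containing its first
    byte, and the reader finishes that record even if it runs past `end`.
    """
    pos = start
    if pos > 0 and data[pos - 1:pos] != b"\n":
        # We landed in the middle of a record owned by the previous range:
        # skip forward to the first byte after the next newline.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        line = data[pos:line_end]
        if line.strip():  # tolerate blank lines
            records.append(json.loads(line))
        pos = line_end + 1
    return records


def split_ranges(size: int, n: int) -> list:
    """Cut [0, size) into at most n byte ranges, ignoring record boundaries."""
    if size == 0:
        return []
    step = -(-size // n)  # ceiling division
    return [(i, min(i + step, size)) for i in range(0, size, step)]


if __name__ == "__main__":
    data = b'{"id": 1}\n{"id": 2, "name": "b"}\n{"id": 3}\n'
    for start, end in split_ranges(len(data), 4):
        print((start, end), read_split(data, start, end))
    # (0, 11) [{'id': 1}, {'id': 2, 'name': 'b'}]
    # (11, 22) []
    # (22, 33) []
    # (33, 43) [{'id': 3}]
```

Concatenating the per-range results yields every record exactly once, in file order, however the byte ranges are cut. A pretty-printed multi-line JSON document offers no such resync point, which is why it cannot be split this way.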

Re: critical bugs to be fixed in Spark 2.0.1?

2016-08-22 Thread trsell
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-17100

It's a blocker for upgrading. I'd be happy to try to fix it if anyone has any hints.

On Tue, 23 Aug 2016, 04:20 Robert Kruszewski wrote:
> SPARK-16991 (https://github.com/apache/spark/pull/14661)