@Jörn Spark without Hadoop is useful:
- For using Spark's programming model on a single beefy instance
- For testing and integrating with a CI/CD pipeline.
It's ugly to have tests which depend on a cluster running somewhere.
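For the single-machine case, a hedged sketch (assuming a plain Spark distribution on the PATH; `my_app.py` is a placeholder name): local mode runs driver and executors in one JVM, so no cluster and no Hadoop install is needed.

```shell
# Run a Spark app entirely in-process -- no cluster, no Hadoop.
# local[*] gives the app one worker thread per CPU core.
spark-submit --master 'local[*]' my_app.py
```

The same trick works in CI: a unit test can build its session with `.master("local[2]")` and never touch a real cluster.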
On Sun, 12 Nov 2017 at 17:17 Jörn Franke
Congratulations!
On Thu, 26 Jan 2017, 02:27 Bryan Cutler, wrote:
> Congratulations Holden and Burak, well deserved!!!
>
> On Tue, Jan 24, 2017 at 10:13 AM, Reynold Xin wrote:
>
> Hi all,
>
> Burak and Holden have recently been elected as Apache Spark
I would love this feature
On Thu, 22 Dec 2016, 18:45 assaf.mendelson, wrote:
> It seems that this aggregation is for dataset operations only. I would
> have hoped to be able to do dataframe aggregation. Something along the lines
> of: sort_df(df).agg(my_agg_func)
How does your application fetch the Spark dependency? Perhaps list your
project dependencies and check that it's using your dev build.
On Mon, 5 Dec 2016, 08:47 tenglong, wrote:
> Hi,
>
> Apparently, I've already tried adding a new method to RDD,
>
> for example,
>
> class RDD
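One way to verify which Spark the build actually resolves — a sketch, assuming an sbt project (Maven users would use `mvn dependency:tree` instead):

```shell
# In the Spark source tree: publish the dev build to the local ivy repository
./build/sbt publishLocal

# In the application project: confirm the resolved spark-core version.
# (dependencyTree is built into sbt 1.4+; older sbt needs the
# sbt-dependency-graph plugin.)
sbt dependencyTree | grep spark-core
```

If the tree still shows a released Spark version, the new RDD method was never on the classpath in the first place.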
Think of it as jsonl instead of a json file.
Point people at this if they need an official-looking spec:
http://jsonlines.org/
One good reason for using this format is that you can split mid-file easily.
This makes it work well with standard Unix tools in pipes.
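To make the split-mid-file point concrete, a small stdlib-only sketch (the payload is made up): cut a JSON Lines buffer at any newline boundary, and each chunk still parses on its own.

```python
import json

# A small JSON Lines payload: one JSON object per line.
payload = b'{"id": 1}\n{"id": 2}\n{"id": 3}\n'

# Because records never span lines, the file can be split at any newline
# boundary and each chunk parsed independently -- this is what makes the
# format friendly to block splits and Unix pipelines.
cut = payload.index(b"\n", len(payload) // 2) + 1  # an arbitrary newline boundary
chunks = [payload[:cut], payload[cut:]]

records = []
for chunk in chunks:
    for line in chunk.splitlines():
        records.append(json.loads(line))

print([r["id"] for r in records])  # -> [1, 2, 3]: the split loses no records
```

Contrast this with a single pretty-printed JSON array, where a record can straddle any byte offset and no chunk is parseable in isolation.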
On Sun, 16 Oct 2016 at 16:24
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-17100
It's a blocker for upgrading.
I'd be happy to try and fix it if anyone has any hints.
On Tue, 23 Aug 2016, 04:20 Robert Kruszewski, wrote:
> SPARK-16991 (https://github.com/apache/spark/pull/14661)