+1
On Wednesday, July 20, 2016, Krishna Sankar wrote:
> +1 (non-binding, of course)
>
> 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
> mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
> 2. Tested pyspark, mllib (iPython 4.0)
> 2.0 Spark version is
+1 (non-binding, of course)
1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
2. Tested pyspark, mllib (iPython 4.0)
2.0 Spark version is 2.0.0
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression
+1
I've run some tests with some real and some synthetic parquet data with nested
columns with and without the hive metastore on our Spark 1.5, 1.6 and 2.0
versions. I haven't seen any performance surprises, except that
Spark 2.0 now does schema inference across all files in a
@Michael,
I answered in JIRA and can repeat it here.
I think that my problem is unrelated to Hive, because I'm using the
read.parquet method.
I also attached some VisualVM snapshots to SPARK-16321 (I think I should
merge both issues)
Code profiling suggests a bottleneck when reading the parquet file.
I
I refer to Maciej Bryński's (mac...@brynski.pl) emails of 29 and 30 June
2016 to this list. He said that his benchmarking suggested that Spark 2.0
was slower than 1.6.
I'm wondering whether that was ever investigated and, if so, whether the
speed is back up.
On Wed, Jul 20, 2016 at 12:18 PM,
Marcin,
I'm not sure what you're referring to. Can you be more specific?
Cheers,
Michael
> On Jul 20, 2016, at 9:10 AM, Marcin Tustin wrote:
>
> Whatever happened with the query regarding benchmarks? Is that resolved?
>
> On Tue, Jul 19, 2016 at 10:35 PM, Reynold Xin
Whatever happened with the query regarding benchmarks? Is that resolved?
On Tue, Jul 19, 2016 at 10:35 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and
+1
SHA and MD5 sums match for all binaries. Docs look fine this time
around. Built and ran `dev/run-tests` with Java 7 on a Linux machine.
No blocker bugs on JIRA, and the only critical bug targeting 2.0.0
is SPARK-16633, which doesn't look like a release blocker. I also
checked issues which
Greetings!
We're reading input files with newAPIHadoopFile, configured with a
multiline split. Everything's fine, apart from
https://issues.apache.org/jira/browse/MAPREDUCE-6549. It looks like the
issue is fixed, but only in Hadoop 2.7.2, which means we have to download
Spark without Hadoop and