Hi All,
I've just cut the release branch for Spark 1.2, consistent with the
end of the scheduled feature window for the release. New commits to
master will need to be explicitly merged into branch-1.2 in order to
be in the release.
This begins the transition into a QA period for Spark 1.2, with
Thanks everyone, that worked. I had just been cleaning the sql project,
which wasn't enough, but a full clean of everything did the trick and it's happy now.
Just in case this helps anybody else come up with steps to reproduce: for
me the error was always in DataTypeConversions.scala, and I think it
*might*
Minor question, but when would be the right time to update the default
Spark version
https://github.com/apache/spark/blob/76386e1a23c55a58c0aeea67820aab2bac71b24b/ec2/spark_ec2.py#L42
in the EC2 script?
On Mon, Nov 3, 2014 at 3:55 AM, Patrick Wendell pwend...@gmail.com wrote:
Hey Patrick,
It's Ozgun from Citus Data. We'd like to make these benchmark results fair,
and have tried different config settings for SparkSQL over the past month.
We picked the best config settings we could find, and also contacted the
Spark users list about running TPC-H numbers.
I added the driver for precisionAt(k: Int) for the MovieLens
test cases... although I am a bit confused by the precisionAt(k: Int) code in
RankingMetrics.scala...
While cross-validating, I am really not sure how to set k...
if (labSet.nonEmpty) { val n = math.min(pred.length, k) ... }
If I
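For reference, here is a minimal sketch of exercising the RankingMetrics API (the toy data and app name below are made up). As I read the snippet quoted above, the hit count over the top min(pred.length, k) items is divided by k itself, so rankings shorter than k are penalized:

    import org.apache.spark.mllib.evaluation.RankingMetrics
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("precision-at-k").setMaster("local[*]"))

    // Each element pairs a predicted ranking with the ground-truth relevant items.
    val predictionAndLabels = sc.parallelize(Seq(
      (Array(1, 2, 3, 4, 5), Array(1, 3, 6)), // 2 of the top 5 predictions relevant
      (Array(4, 1, 2),       Array(4))        // only 3 items predicted, 1 relevant
    ))

    val metrics = new RankingMetrics(predictionAndLabels)
    println(metrics.precisionAt(5)) // mean of 2/5 and 1/5 = 0.3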
Hi,
I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but
the code fails on userFeatures.lookup(user).head
In computeRmse, MatrixFactorizationModel.predict(RDD[(Int, Int)]) is
called, and all the test cases use that API...
I can perhaps refactor my code to
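In case it helps, a hedged sketch of that refactoring using the batch RDD-based API (model and testPairs are placeholder names, not from the original code). As far as I can tell, the batch predict joins the pairs against the model's feature RDDs, so pairs whose user or product was absent from training are dropped rather than failing on an empty lookup(...).head:

    import org.apache.spark.mllib.recommendation.{MatrixFactorizationModel, Rating}
    import org.apache.spark.rdd.RDD

    // model: a trained MatrixFactorizationModel; testPairs: (user, product) pairs.
    def predictAll(model: MatrixFactorizationModel,
                   testPairs: RDD[(Int, Int)]): RDD[Rating] =
      model.predict(testPairs) // skips unseen users/products instead of throwing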
Hi everyone,
I'm running into more and more cases where too many files are opened when
spark.shuffle.consolidateFiles is turned off.
I was wondering if this is a common scenario among the rest of the
community, and if so, whether it is worth turning the setting on by
default. From
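For anyone trying this out, a one-line sketch of turning the setting on per application (it can equally go in conf/spark-defaults.conf or be passed with --conf to spark-submit):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("shuffle-consolidation-test")
      .set("spark.shuffle.consolidateFiles", "true") // defaults to false in 1.1
    val sc = new SparkContext(conf)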
In Spark 1.1, the sort-based shuffle (spark.shuffle.manager=sort) will have
better performance while creating fewer files. So I'd suggest trying that too.
Matei
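And the matching sketch for the sort-based shuffle Matei mentions; in 1.1 the default manager is still "hash", so it has to be set explicitly:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("sort-shuffle-test")
      .set("spark.shuffle.manager", "sort") // default is "hash" in 1.1
    val sc = new SparkContext(conf)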
On Nov 3, 2014, at 6:12 PM, Andrew Or and...@databricks.com wrote:
Hey Matt,
There's some prior work that compares
(BTW, this had a bug with negative hash codes in 1.1.0, so you should try
branch-1.1 for it.)
Matei
On Nov 3, 2014, at 6:28 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
In Spark 1.1, the sort-based shuffle (spark.shuffle.manager=sort) will have
better performance while creating fewer
Hey Andrew, Matei,
Thanks for responding.
For some more context, we were running into "Too many open files" issues
where we were seeing this happen immediately after the collect phase
(about 30 seconds into a run) on a decently sized dataset (about 14 million rows).
The ulimit set in the spark-env was
Was the user present in the training set? We can put a check there and
return NaN if the user is not included in the model. -Xiangrui
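A minimal sketch of the guard Xiangrui suggests (safePredict is a hypothetical helper name, not an MLlib API):

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    def safePredict(model: MatrixFactorizationModel,
                    user: Int, product: Int): Double = {
      // predict(user, product) calls userFeatures.lookup(user).head internally,
      // which throws on an empty Seq for users never seen in training.
      if (model.userFeatures.lookup(user).isEmpty ||
          model.productFeatures.lookup(product).isEmpty) Double.NaN
      else model.predict(user, product)
    }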
On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das debasish.da...@gmail.com wrote:
Hi,
I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but
the code