Thank you for the suggestions; actually this project has already been on
spark-packages for 1~2 months.
So I think what I need is some promotion :P
2015-08-25 23:51 GMT+08:00 saurfang [via Apache Spark Developers List]
ml-node+s1001551n1380...@n3.nabble.com:
This is very cool. I also have an sbt
This probably means your app is failing and the second attempt is
hitting that issue. You may fix the "directory already exists" error by setting
spark.eventLog.overwrite=true in your conf, but most probably that
will just expose the actual error in your app.
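For concreteness, the workaround above looks roughly like this when assembled into spark-submit flags; a sketch where only the property names are real Spark configuration keys, everything else is illustrative:

```python
# Sketch of the workaround described above, expressed as the --conf pairs
# one might pass to spark-submit. The property names are real Spark
# settings; the surrounding code is illustrative only.
event_log_conf = {
    "spark.eventLog.enabled": "true",
    # Lets a retried attempt overwrite the existing event-log directory
    # instead of failing with "Log directory ... already exists!"
    "spark.eventLog.overwrite": "true",
}

submit_flags = " ".join(
    "--conf {}={}".format(k, v) for k, v in sorted(event_log_conf.items())
)
# submit_flags -> "--conf spark.eventLog.enabled=true --conf spark.eventLog.overwrite=true"
```

As noted above, though, overwriting the log directory usually just surfaces whatever made the first attempt fail.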
On Tue, Aug 25, 2015 at 9:37 AM,
Final chance to fill out the survey!
http://goo.gl/forms/erct2s6KRR
I'm gonna close it to new responses tonight and send out a summary of the
results.
Nick
On Thu, Aug 20, 2015 at 2:08 PM Nicholas Chammas nicholas.cham...@gmail.com
wrote:
I'm planning to close the survey to further responses
Here is the error
yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User
class threw exception: Log directory
hdfs://Sandbox/user/spark/applicationHistory/application_1438113296105_0302
already exists!)
I am using Cloudera 5.3.2 with Spark 1.2.0.
Any help is appreciated.
On Tue, Aug 25, 2015 at 2:17 AM, andrew.row...@thomsonreuters.com wrote:
Then, if I wanted to do a build against a specific profile, I could also
pass in a -Dspark.version=1.4.1-custom-string and have the output artifacts
correctly named. The default behaviour should be the same. Child pom
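A hypothetical sketch of what such a child-pom change might look like, with a property default that -Dspark.version can override; the artifactId and default value here are illustrative, not taken from the actual Spark pom:

```xml
<!-- Hypothetical child-pom fragment: passing
     -Dspark.version=1.4.1-custom-string on the command line
     would override the default below. -->
<properties>
  <spark.version>1.4.1</spark.version>
</properties>

<artifactId>spark-custom-build</artifactId>
<version>${spark.version}</version>
```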
Please vote on releasing the following candidate as Apache Spark version
1.5.0. The vote is open until Friday, Aug 29, 2015 at 5:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.
[ ] +1 Release this package as Apache Spark 1.5.0
[ ] -1 Do not release this package because ...
Hi, Reynold and others
I agree with your comments on mid-tenured objects and GC. In fact, dealing with
mid-tenured objects is the major challenge for all Java GC implementations.
I am wondering if anyone has played with the -XX:+PrintTenuringDistribution flag
and seen what the age distribution actually looks like
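In case anyone wants to try it, one way to pass that flag through to the Spark JVMs is via the extraJavaOptions properties, e.g. in spark-defaults.conf; the property names are real Spark settings, but the exact flag combination is just a suggestion:

```
# Print survivor-space age histograms at each minor GC (HotSpot flags,
# pre-unified-logging JVMs); flag choice here is illustrative.
spark.executor.extraJavaOptions  -XX:+PrintTenuringDistribution -XX:+PrintGCDetails
spark.driver.extraJavaOptions    -XX:+PrintTenuringDistribution
```

The resulting GC log shows, per collection, how many bytes survive at each tenuring age, which is exactly the distribution being asked about.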
Is there a jira to update the SQL Hive docs? (Spark SQL and DataFrames - Spark
1.5.0 Documentation)
You can add it to Spark Packages I guess: http://spark-packages.org/
Thanks
Best Regards
On Fri, Aug 14, 2015 at 1:45 PM, pishen tsai pishe...@gmail.com wrote:
Sorry for the previous line-breaking format; resending the mail again.
I have written an sbt plugin called spark-deployer, which
I've got an interesting challenge in building Spark. For various reasons we
do a few different builds of Spark, typically with a few different profile
options (e.g. against different versions of Hadoop, some with/without Hive,
etc.). We mirror the Spark repo internally and have a buildserver that
Hello y'all,
So I've been getting kinda annoyed with how many PR tests have been
timing out. I took one of the logs from one of my PRs and started to
do some crunching on the data from the output, and here's a list of
the 5 slowest suites:
307.14s HiveSparkSubmitSuite
382.641s VersionsSuite
398s
I'd be okay skipping the HiveCompatibilitySuite for core-only changes.
They do often catch bugs in changes to catalyst or sql though. Same for
HashJoinCompatibilitySuite/VersionsSuite.
HiveSparkSubmitSuite/CliSuite should probably stay, as they do test things
like addJar that have been broken by
Anyone using HiveContext with secure Hive with Spark 1.5 and have it working?
We have a non-standard version of Hive, but it was pulling our Hive jars and it's
failing to authenticate. It could be something in our Hive version, but I'm
wondering if Spark isn't forwarding credentials properly.
Tom
There is already code in place that restricts which tests run
depending on which code is modified. However, changes inside of
Spark's core currently require running all dependent tests. If you
have some ideas about how to improve that heuristic, it would be
great.
- Patrick
On Tue, Aug 25, 2015
Thank you for the explanation. The size of the 100M dataset is ~1.4GB in memory
and each worker has 32GB of memory. There seems to be a lot of free memory
available. I wonder how Spark can hit GC issues with such a setup?
Reynold Xin r...@databricks.com
On Fri, Aug 21, 2015 at
It works for me in cluster mode.
I’m running on Hortonworks 2.2.4.12 in secure mode with Hive 0.14
I built with
./make-distribution --tgz -Phive -Phive-thriftserver -Phbase-provided -Pyarn
-Phadoop-2.6
Doug
On Aug 25, 2015, at 4:56 PM, Tom Graves tgraves...@yahoo.com.INVALID wrote:
On Fri, Aug 21, 2015 at 11:07 AM, Ulanov, Alexander alexander.ula...@hp.com
wrote:
It seems that there is a nice improvement with Tungsten enabled given that
data is persisted in memory: 2x and 3x. However, the improvement is not as
nice for Parquet; it is 1.5x. What's interesting, with
I chatted with Patrick briefly offline. It would be interesting to
know whether the scripts have some way of saying run a smaller
version of certain tests (e.g. by setting a system property that the
tests look at to decide what to run). That way, if there are no
changes under sql/, we could still
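A minimal sketch of that system-property idea; the variable name RUN_SLOW_TESTS and the helper are hypothetical, not an existing switch in the Spark build:

```python
import os

# Hypothetical gate: expensive suites consult an opt-in variable, so a
# core-only change can run a reduced version of the test set while
# sql/ changes still trigger the full suites.
def should_run_slow_tests(env=os.environ):
    return env.get("RUN_SLOW_TESTS", "false").lower() == "true"

# e.g. inside a suite:
#   if not should_run_slow_tests():
#       run only the smoke-test subset of the Hive compatibility cases
```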
This isn't really answering the question, but for what it is worth, I
manage several different branches of Spark and publish custom-named
versions regularly to an internal repository, and this is *much* easier
with SBT than with Maven. You can actually link the Spark SBT build into
an external