Hi:
In the paper "Item-Based Top-N Recommendation
Algorithms" (https://stuyresearch.googlecode.com/hg/blake/resources/10.1.1.102.4451.pdf),
there are two metrics measuring the quality of recommendations: HR (hit rate)
and ARHR (average reciprocal hit rank).
If I use ALS (implicit) for a top-N recommendation system, I want to check its
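For reference, the two metrics can be computed with a short script. This is a minimal sketch of HR and ARHR as defined in that paper; the data layout (each entry pairing a user's held-out test item with that user's ranked top-N list) is an illustrative assumption, not code from the paper.

```python
# Minimal sketch of HR and ARHR (Deshpande & Karypis).
# `results` pairs each user's hidden test item with that user's ranked
# top-N recommendation list; this data layout is an assumption.

def hit_rate(results):
    """Fraction of users whose hidden item appears in their top-N list."""
    hits = sum(1 for item, top_n in results if item in top_n)
    return hits / len(results)

def arhr(results):
    """Like HR, but each hit is weighted by 1/position (1-based rank)."""
    total = 0.0
    for item, top_n in results:
        if item in top_n:
            total += 1.0 / (top_n.index(item) + 1)
    return total / len(results)
```

For example, with three users where one hidden item is ranked first, one is missed, and one is ranked second, HR is 2/3 and ARHR is (1 + 1/2)/3 = 0.5.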
We have not tried the work-around because there are other bugs in there
that affected our set-up, though it seems it would help.
On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote:
+1 to have the work around in.
I'll be investigating from the Mesos side too.
Tim
On
FYI: Looks like the Mesos folk also have a bot to do automatic linking, but
it appears to have been provided to them somehow by ASF.
See this comment as an example:
I am running the code with @rxin's patch in standalone mode. In my case I am
registering org.apache.spark.graphx.GraphKryoRegistrator.
Recently I started to see com.esotericsoftware.kryo.KryoException:
java.io.IOException: failed to uncompress the chunk: PARSING_ERROR. Has
anyone seen this?
Hi jerryye,
Maybe if you voted up my question on Stack Overflow it would get some
traction and we would get nearer to a solution.
Thanks,
Amnon
This is kind of weird then, seems perhaps unrelated to this issue (or at least
to the way I understood it). Is the problem maybe that Mesos saw 0 MB being
freed and didn't re-offer the machine *even though there was more than 32 MB
free overall*?
Matei
On August 25, 2014 at 12:59:59 PM, Cody
Anyway it would be good if someone from the Mesos side investigates this and
proposes a solution. The 32 MB per task hack isn't completely foolproof either
(e.g. people might allocate all the RAM to their executor and thus stop being
able to launch tasks), so maybe we wait on a Mesos fix for
Thanks for working on this! It's unclear at the moment exactly how we are
going to handle this, since the end goal is to be compatible with as many
versions of Hive as possible. That said, I think it would be great to open
a PR in this case. Even if we don't merge it, that's a good way to get it
- dev list
+ user list
You should be able to query Spark SQL using JDBC, starting with the 1.1
release. There is some documentation in the repo
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#running-the-thrift-jdbc-server,
and we'll update the official docs once the
Was the original issue with Spark 1.1 (i.e. master branch) or an earlier
release?
One possibility is that your S3 bucket is in a remote Amazon region, which
would make it very slow. In my experience though saveAsTextFile has worked even
for pretty large datasets in that situation, so maybe
One other idea - when things freeze up, try to run jstack on the spark
shell process and on the executors and attach the results. It could be that
somehow you are encountering a deadlock somewhere.
On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Was the original
Hey Nicholas,
That seems promising - I prefer having a proper link to having that fairly
verbose comment though, because in some cases there will be dozens of
comments and it could get lost. I wonder if they could do something where
it posts a link instead...
- Patrick
On Mon, Aug 25, 2014 at
Hi Matei,
At least in my case, the s3 bucket is in the same region. Running count()
works and so does generating synthetic data. What I saw was that the job
would hang for over an hour with no progress but tasks would immediately
start finishing if I cached the data.
- jerry
On Mon, Aug 25,
Hi Matei,
I'm going to investigate from both the Mesos and Spark sides and will
hopefully have a good long-term solution. In the meantime, having a
workaround to start with is going to unblock folks.
Tim
On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Anyway it would be
From my perspective, there are a few benefits regarding Hive 0.13.1+. The
following are the four major reasons I can see for people asking to upgrade
to Hive 0.13.1 recently.
1. Performance and bug fix, patches. (Usual case)
2. Native support for Parquet format, no need to provide custom JARs
Hi Patrick,
Here's the process:
java -cp
/root/ephemeral-hdfs/conf/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly-1.1.1-SNAPSHOT-hadoop1.0.4.jar
-XX:MaxPermSize=128m -Djava.library.path=/root/ephemeral-hdfs/lib/native/
-Xms5g -Xmx10g
Hi,
Unless you manually patched Spark, if you have Reynold’s patch for SPARK-2878,
you also have the patch for SPARK-2893 which makes the underlying cause much
more obvious and explicit. So the below is unlikely to be related to
SPARK-2878.
Graham
On 26 Aug 2014, at 4:13 am, npanj
Hi Matei,
The original issue happened on a spark-1.0.2-bin-hadoop2 installation.
I will try the synthetic operation and see if I get the same results or not.
Amnon
On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark
Developers List] ml-node+s1001551n8000...@n3.nabble.com wrote:
My problem is that I'm not sure this workaround would solve things, given the
issue described here (where there was a lot of memory free but it didn't get
re-offered). If you think it does, it would be good to explain why it behaves
like that.
Matei
On August 25, 2014 at 2:28:18 PM, Timothy
Hi Michael,
This is great news.
Is there any initial proposal or design for caching to Tachyon that you
can share so far?
I don't think there is a JIRA ticket open to track this feature yet.
- Henry
On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust
mich...@databricks.com wrote:
What is the plan
Got it. Another thing that would help is if you spot any exceptions or failed
tasks in the web UI (http://driver:4040).
Matei
On August 25, 2014 at 3:07:41 PM, amnonkhen (amnon...@gmail.com) wrote:
Hi Matei,
The original issue happened on a spark-1.0.2-bin-hadoop2 installation.
I will try
I don't think it solves Cody's problem, which still needs more
investigation, but I believe it does solve the problem you described
earlier.
I just confirmed with Mesos folks that we no longer need the minimum
memory requirement so we'll be dropping that soon and the workaround
might not be needed
Hi Folks,
We are testing our home-made KMeans algorithm using Spark on Yarn.
Recently, we've found that the application failed frequently when doing
clustering over 300,000,000 users (each user is represented by a feature
vector and the whole data set is around 600,000,000). After digging into
I posted the fix on the JIRA ticket
(https://issues.apache.org/jira/browse/SPARK-3190). To update the user list,
this is indeed an integer overflow problem when summing up the partition sizes.
The fix is to use Longs for the sum: https://github.com/apache/spark/pull/2106.
Ankur
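The overflow described above can be reproduced outside Spark. This sketch emulates Java's 32-bit signed Int arithmetic in Python (the partition sizes are made-up numbers) to show why accumulating the sum in a Long is the right fix.

```python
# Illustrative reproduction of the SPARK-3190 class of bug: summing
# 32-bit partition sizes wraps around, while a 64-bit sum does not.
# The partition sizes below are made-up numbers.

def to_int32(x):
    """Emulate Java's 32-bit signed Int wrap-around."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

partition_sizes = [1_500_000_000, 1_500_000_000, 1_500_000_000]

int_sum = 0
for size in partition_sizes:
    int_sum = to_int32(int_sum + size)   # wraps around mid-sum

long_sum = sum(partition_sizes)          # Python ints don't overflow
```

Here `int_sum` comes out as 205032704 instead of the true total 4500000000, which is exactly the kind of silently wrong size total the switch to Longs avoids.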
Check this out:
https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc
We're hitting close to 300 open PRs. Those are the least recently updated
ones.
I think having a low number of stale (i.e. not recently updated) PRs is a
good thing to shoot for. It doesn't leave
Hi,
I've exercised the multiple options available for persist(), including RDD
replication. I have gone through the classes involved in caching/storing
the RDDs at different levels. The StorageLevel class plays a pivotal role by
recording whether to use memory or disk or to replicate the RDD on multiple
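For readers following along, StorageLevel essentially records a few flags per RDD. The stand-in below is a simplified pure-Python illustration of those flags; the field names approximate Spark's, and the real class carries more options (e.g. off-heap storage), so treat this as a sketch rather than Spark's API.

```python
from collections import namedtuple

# Simplified stand-in for Spark's StorageLevel: it records whether the
# RDD lives on disk and/or in memory, whether it is kept deserialized,
# and how many replicas to make. Field names approximate Spark's own.
StorageLevel = namedtuple(
    "StorageLevel", ["use_disk", "use_memory", "deserialized", "replication"]
)

MEMORY_ONLY = StorageLevel(False, True, True, 1)
MEMORY_AND_DISK = StorageLevel(True, True, True, 1)
MEMORY_AND_DISK_2 = StorageLevel(True, True, True, 2)  # 2x replication
DISK_ONLY = StorageLevel(True, False, False, 1)
```

In real Spark code the replicated behavior discussed here comes from calling, e.g., rdd.persist(StorageLevel.MEMORY_AND_DISK_2).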
Hey Nicholas,
In general we've been looking at these periodically (at least I have) and
asking people to close out-of-date ones, but it's true that the list has gotten
fairly large. We should probably have an expiry time of a few months and close
them automatically. I agree that it's daunting
There were no failures or exceptions.
On Tue, Aug 26, 2014 at 1:31 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Got it. Another thing that would help is if you spot any exceptions or
failed tasks in the web UI (http://driver:4040).
Matei
On August 25, 2014 at 3:07:41 PM, amnonkhen
Hey Amnon,
So just to make sure I understand - you also saw the same issue with 1.0.2?
Just asking because whether or not this regresses the 1.0.2 behavior is
important for our own bug tracking.
- Patrick
On Mon, Aug 25, 2014 at 10:22 PM, Amnon Khen amnon...@gmail.com wrote:
There were no