I want to contribute two quality measures (ARHR and HR) to MLlib for top-N recommendation systems. Is this meaningful?

2014-08-25 Thread Lizhengbing (bing, BIPA)
Hi: In the paper Item-Based Top-N Recommendation Algorithms (https://stuyresearch.googlecode.com/hg/blake/resources/10.1.1.102.4451.pdf), there are two parameters measuring the quality of recommendation: HR and ARHR. If I use ALS (Implicit) for a top-N recommendation system, I want to check it's
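
For readers unfamiliar with the two measures, here is a minimal sketch of how they are commonly computed in a leave-one-out evaluation: one item per user is held out, HR (hit rate) is the fraction of users whose held-out item appears in their top-N list, and ARHR (average reciprocal hit rank) weights each hit by 1/position. This is plain Python, not MLlib code; the function name and the dict-based inputs are hypothetical and chosen only for illustration.

```python
def hit_rate_and_arhr(recommendations, held_out):
    """Compute (HR, ARHR) for a leave-one-out top-N evaluation.

    recommendations: dict mapping user -> ranked list of recommended items
                     (best first, at most N entries). Hypothetical input shape.
    held_out:        dict mapping user -> the single item hidden from training.
    """
    n_users = len(held_out)
    if n_users == 0:
        raise ValueError("held_out must contain at least one user")

    hits = 0
    reciprocal_rank_sum = 0.0
    for user, item in held_out.items():
        ranked = recommendations.get(user, [])
        if item in ranked:
            hits += 1
            # Positions are 1-based: a hit at the top of the list counts 1.0,
            # a hit in second place counts 0.5, and so on.
            reciprocal_rank_sum += 1.0 / (ranked.index(item) + 1)

    return hits / n_users, reciprocal_rank_sum / n_users


if __name__ == "__main__":
    recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"], "u3": ["g", "h", "i"]}
    hidden = {"u1": "a", "u2": "e", "u3": "z"}
    hr, arhr = hit_rate_and_arhr(recs, hidden)
    print(f"HR={hr:.3f} ARHR={arhr:.3f}")
```

With an implicit-feedback ALS model, the `recommendations` lists would come from ranking each user's predicted scores over items not seen in training; that step is outside this sketch.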

Re: Mesos/Spark Deadlock

2014-08-25 Thread Gary Malouf
We have not tried the work-around because there are other bugs in there that affected our set-up, though it seems it would help. On Mon, Aug 25, 2014 at 12:54 AM, Timothy Chen tnac...@gmail.com wrote: +1 to have the work around in. I'll be investigating from the Mesos side too. Tim On

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Nicholas Chammas
FYI: Looks like the Mesos folk also have a bot to do automatic linking, but it appears to have been provided to them somehow by ASF. See this comment as an example:

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-25 Thread npanj
I am running the code with @rxin's patch in standalone mode. In my case I am registering org.apache.spark.graphx.GraphKryoRegistrator. Recently I started to see com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR. Has anyone seen this?

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread amnonkhen
Hi jerryye, Maybe if you voted up my question on Stack Overflow it would get some traction and we would get nearer to a solution. Thanks, Amnon

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
This is kind of weird then, seems perhaps unrelated to this issue (or at least to the way I understood it). Is the problem maybe that Mesos saw 0 MB being freed and didn't re-offer the machine *even though there was more than 32 MB free overall*? Matei On August 25, 2014 at 12:59:59 PM, Cody

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
Anyway it would be good if someone from the Mesos side investigates this and proposes a solution. The 32 MB per task hack isn't completely foolproof either (e.g. people might allocate all the RAM to their executor and thus stop being able to launch tasks), so maybe we wait on a Mesos fix for

Re: Working Formula for Hive 0.13?

2014-08-25 Thread Michael Armbrust
Thanks for working on this! It's unclear at the moment exactly how we are going to handle this, since the end goal is to be compatible with as many versions of Hive as possible. That said, I think it would be great to open a PR in this case. Even if we don't merge it, that's a good way to get it

Re: Storage Handlers in Spark SQL

2014-08-25 Thread Michael Armbrust
- dev list + user list You should be able to query Spark SQL using JDBC, starting with the 1.1 release. There is some documentation in the repo https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#running-the-thrift-jdbc-server, and we'll update the official docs once the

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Matei Zaharia
Was the original issue with Spark 1.1 (i.e. master branch) or an earlier release? One possibility is that your S3 bucket is in a remote Amazon region, which would make it very slow. In my experience though saveAsTextFile has worked even for pretty large datasets in that situation, so maybe

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
One other idea - when things freeze up, try to run jstack on the spark shell process and on the executors and attach the results. It could be that somehow you are encountering a deadlock somewhere. On Mon, Aug 25, 2014 at 1:26 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Was the original

Re: Pull requests will be automatically linked to JIRA when submitted

2014-08-25 Thread Patrick Wendell
Hey Nicholas, That seems promising - I prefer having a proper link to having that fairly verbose comment though, because in some cases there will be dozens of comments and it could get lost. I wonder if they could do something where it posts a link instead... - Patrick On Mon, Aug 25, 2014 at

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
Hi Matei, At least in my case, the s3 bucket is in the same region. Running count() works and so does generating synthetic data. What I saw was that the job would hang for over an hour with no progress but tasks would immediately start finishing if I cached the data. - jerry On Mon, Aug 25,

Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
Hi Matei, I'm going to investigate from both the Mesos and Spark sides and will hopefully have a good long-term solution. In the meantime, having a workaround to start with is going to unblock folks. Tim On Mon, Aug 25, 2014 at 1:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Anyway it would be

RE: Working Formula for Hive 0.13?

2014-08-25 Thread Andrew Lee
From my perspective, there are a few benefits to Hive 0.13.1+. The following are the 4 major reasons I can see why people have recently been asking to upgrade to Hive 0.13.1. 1. Performance improvements, bug fixes, and patches. (Usual case) 2. Native support for Parquet format, no need to provide custom JARs

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread jerryye
Hi Patrick, Here's the process: java -cp /root/ephemeral-hdfs/conf/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/assembly/target/scala-2.10/spark-assembly-1.1.1-SNAPSHOT-hadoop1.0.4.jar -XX:MaxPermSize=128m -Djava.library.path=/root/ephemeral-hdfs/lib/native/ -Xms5g -Xmx10g

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-25 Thread Graham Dennis
Hi, Unless you manually patched Spark, if you have Reynold’s patch for SPARK-2878, you also have the patch for SPARK-2893 which makes the underlying cause much more obvious and explicit. So the below is unlikely to be related to SPARK-2878. Graham On 26 Aug 2014, at 4:13 am, npanj

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread amnonkhen
Hi Matei, The original issue happened on a spark-1.0.2-bin-hadoop2 installation. I will try the synthetic operation and see if I get the same results or not. Amnon On Mon, Aug 25, 2014 at 11:26 PM, Matei Zaharia [via Apache Spark Developers List] ml-node+s1001551n8000...@n3.nabble.com wrote:

Re: Mesos/Spark Deadlock

2014-08-25 Thread Matei Zaharia
My problem is that I'm not sure this workaround would solve things, given the issue described here (where there was a lot of memory free but it didn't get re-offered). If you think it does, it would be good to explain why it behaves like that. Matei On August 25, 2014 at 2:28:18 PM, Timothy

Re: [Spark SQL] off-heap columnar store

2014-08-25 Thread Henry Saputra
Hi Michael, This is great news. Any initial proposal or design about the caching to Tachyon that you can share so far? I don't think there is a JIRA ticket open to track this feature yet. - Henry On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust mich...@databricks.com wrote: What is the plan

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Matei Zaharia
Got it. Another thing that would help is if you spot any exceptions or failed tasks in the web UI (http://driver:4040). Matei On August 25, 2014 at 3:07:41 PM, amnonkhen (amnon...@gmail.com) wrote: Hi Matei, The original issue happened on a spark-1.0.2-bin-hadoop2 installation. I will try

Re: Mesos/Spark Deadlock

2014-08-25 Thread Timothy Chen
I don't think it solves Cody's problem, which still needs more investigation, but I believe it does solve the problem you described earlier. I just confirmed with Mesos folks that we no longer need the minimum memory requirement so we'll be dropping that soon and the workaround might not be needed

too many CancelledKeyException thrown from ConnectionManager

2014-08-25 Thread yao
Hi Folks, We are testing our home-made KMeans algorithm using Spark on Yarn. Recently, we've found that the application failed frequently when doing clustering over 300,000,000 users (each user is represented by a feature vector and the whole data set is around 600,000,000). After digging into

Re: Graphx seems to be broken while Creating a large graph(6B nodes in my case)

2014-08-25 Thread Ankur Dave
I posted the fix on the JIRA ticket (https://issues.apache.org/jira/browse/SPARK-3190). To update the user list, this is indeed an integer overflow problem when summing up the partition sizes. The fix is to use Longs for the sum: https://github.com/apache/spark/pull/2106. Ankur
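
To illustrate the kind of overflow described above, here is a small sketch of what happens when partition sizes are accumulated in a 32-bit signed integer versus a 64-bit one. It is not the GraphX code from the pull request. Python integers never overflow, so the 32-bit and 64-bit accumulators are emulated with `ctypes`, and the helper names and partition sizes are made up for the example.

```python
import ctypes


def sum_sizes_int32(partition_sizes):
    """Sum sizes with an emulated 32-bit signed accumulator (like a JVM Int)."""
    total = 0
    for size in partition_sizes:
        # c_int32 does no overflow checking, so the value wraps around
        # exactly as two's-complement 32-bit arithmetic would.
        total = ctypes.c_int32(total + size).value
    return total


def sum_sizes_int64(partition_sizes):
    """Sum sizes with an emulated 64-bit signed accumulator (like a JVM Long)."""
    total = 0
    for size in partition_sizes:
        total = ctypes.c_int64(total + size).value
    return total


if __name__ == "__main__":
    # Hypothetical partition sizes: each fits in 32 bits, their sum does not.
    sizes = [1_000_000_000, 1_000_000_000, 1_000_000_000]
    print("32-bit sum:", sum_sizes_int32(sizes))
    print("64-bit sum:", sum_sizes_int64(sizes))
```

Each individual size is valid on its own, which is why this sort of bug tends to show up only on very large graphs: the wrap-around happens in the running total, not in any single partition.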

Handling stale PRs

2014-08-25 Thread Nicholas Chammas
Check this out: https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc We're hitting close to 300 open PRs. Those are the least recently updated ones. I think having a low number of stale (i.e. not recently updated) PRs is a good thing to shoot for. It doesn't leave

RDD replication in Spark

2014-08-25 Thread rapelly kartheek
Hi, I've exercised multiple options available for persist() including RDD replication. I have gone through the classes involved in caching/storing the RDDs at different levels. StorageLevel class plays a pivotal role by recording whether to use memory or disk or to replicate the RDD on multiple

Re: Handling stale PRs

2014-08-25 Thread Matei Zaharia
Hey Nicholas, In general we've been looking at these periodically (at least I have) and asking people to close out-of-date ones, but it's true that the list has gotten fairly large. We should probably have an expiry time of a few months and close them automatically. I agree that it's daunting

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Amnon Khen
There were no failures nor exceptions. On Tue, Aug 26, 2014 at 1:31 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Got it. Another thing that would help is if you spot any exceptions or failed tasks in the web UI (http://driver:4040). Matei On August 25, 2014 at 3:07:41 PM, amnonkhen

Re: saveAsTextFile to s3 on spark does not work, just hangs

2014-08-25 Thread Patrick Wendell
Hey Amnon, So just to make sure I understand - you also saw the same issue with 1.0.2? Just asking because whether or not this regresses the 1.0.2 behavior is important for our own bug tracking. - Patrick On Mon, Aug 25, 2014 at 10:22 PM, Amnon Khen amnon...@gmail.com wrote: There were no