Re: RDD order guarantees

2015-01-18 Thread Reynold Xin
Hi Ewan, Not sure if there is a JIRA ticket (there are too many and I lose track of them). I chatted briefly with Aaron about this. The way we can solve it is to create a new FileSystem implementation that overrides the listStatus method, and then in the Hadoop Conf set fs.file.impl to that. Shouldn't be
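A minimal sketch of the approach described above, in Scala; the class name and sort key are illustrative, not an actual Spark patch:

    import org.apache.hadoop.fs.{FileStatus, LocalFileSystem, Path}

    // Return directory listings in a stable, sorted order so that RDDs
    // built from the listing see a deterministic file order.
    class SortedLocalFileSystem extends LocalFileSystem {
      override def listStatus(path: Path): Array[FileStatus] =
        super.listStatus(path).sortBy(_.getPath.toString)
    }

    // Then point the file:// scheme at it, as the message suggests:
    // sc.hadoopConfiguration.set("fs.file.impl",
    //   classOf[SortedLocalFileSystem].getName)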

Re: Memory config issues

2015-01-18 Thread Alessandro Baretta
Akhil, Ah, very good point. I guess "SET spark.sql.shuffle.partitions=1024" should do it. Alex On Sun, Jan 18, 2015 at 10:29 PM, Akhil Das wrote: > It's the executor memory (spark.executor.memory), which you can set while > creating the Spark context. By default it uses 60% of the executor memo
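For reference, a sketch of applying that setting from code in a Spark 1.2-era session, assuming an existing sqlContext:

    // Raise the number of post-shuffle partitions used by GROUP BY;
    // 1024 is the value from the thread, tune to the workload.
    sqlContext.sql("SET spark.sql.shuffle.partitions=1024")
    // Equivalent programmatic form:
    sqlContext.setConf("spark.sql.shuffle.partitions", "1024")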

Re: Memory config issues

2015-01-18 Thread Akhil Das
It's the executor memory (spark.executor.memory), which you can set while creating the Spark context. By default it uses 60% of the executor memory for storage (spark.storage.memoryFraction = 0.6). Now, to see any memory usage reported, you need to cache (persist) the RDD. Regarding the OOM exception, you can increase the level of parallelism
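A minimal sketch of what this describes, with illustrative values and a hypothetical input path:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("memory-demo")
      .set("spark.executor.memory", "4g") // heap per executor (example value)
    val sc = new SparkContext(conf)

    // Storage memory only shows usage once an RDD is persisted and used.
    val lines = sc.textFile("hdfs:///data/input") // hypothetical path
    lines.cache()  // mark for in-memory storage
    lines.count()  // the first action materializes the cache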

Re: GraphX doc: triangleCount() requirement overstatement?

2015-01-18 Thread Reynold Xin
We will merge https://issues.apache.org/jira/browse/SPARK-3650 for 1.3. Thanks for the reminder! On Sun, Jan 18, 2015 at 8:34 PM, Michael Malak <michaelma...@yahoo.com.invalid> wrote: > According to: > > https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting > > "Note

Memory config issues

2015-01-18 Thread Alessandro Baretta
All, I'm getting out-of-memory exceptions in Spark SQL GROUP BY queries. I have plenty of RAM, so I should be able to brute-force my way through, but I can't quite figure out which memory option affects which process. My current memory configuration is the following: export SPARK_WORKER_MEMORY=8397
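For context, the failing workload has roughly this shape (table and column names here are hypothetical):

    // A wide aggregation like this shuffles all rows by key; with too few
    // shuffle partitions, a single reducer can outgrow available memory.
    val result = sqlContext.sql(
      "SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id")
    result.collect()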

GraphX doc: triangleCount() requirement overstatement?

2015-01-18 Thread Michael Malak
According to: https://spark.apache.org/docs/1.2.0/graphx-programming-guide.html#triangle-counting "Note that TriangleCount requires the edges to be in canonical orientation (srcId < dstId)" But isn't this overstating the requirement? Isn't the requirement really that IF there are duplicate ed
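For illustration, one way to put a graph into the documented canonical orientation before counting, as a sketch (graph is assumed to be an existing Graph[VD, ED]):

    import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

    // Re-orient every edge so srcId < dstId, repartition as triangleCount
    // requires, then run the count.
    val canonicalEdges = graph.edges.map { e =>
      if (e.srcId < e.dstId) e else Edge(e.dstId, e.srcId, e.attr)
    }
    val triangles = Graph(graph.vertices, canonicalEdges)
      .partitionBy(PartitionStrategy.RandomVertexCut)
      .triangleCount()
      .vertices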

Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-01-18 Thread Sean Owen
Agree, I think this can / should be fixed with a slightly more conservative version of https://github.com/apache/spark/pull/3938 related to SPARK-5108. On Sun, Jan 18, 2015 at 3:41 PM, Ted Yu wrote: > Please take a look at SPARK-4048 and SPARK-5108 > > Cheers > > On Sat, Jan 17, 2015 at 10:26 PM,

Re: run time exceptions in Spark 1.2.0 manual build together with OpenStack hadoop driver

2015-01-18 Thread Ted Yu
Please take a look at SPARK-4048 and SPARK-5108. Cheers On Sat, Jan 17, 2015 at 10:26 PM, Gil Vernik wrote: > Hi, > > I took the source code of Spark 1.2.0 and tried to build it together with > hadoop-openstack.jar (to allow Spark access to OpenStack Swift). > I used Hadoop 2.6.0. > > The buil
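For reference, a hypothetical build.sbt line for pulling the Swift driver into such a build (version illustrative):

    // build.sbt: add the OpenStack Swift filesystem driver for Hadoop 2.6
    libraryDependencies += "org.apache.hadoop" % "hadoop-openstack" % "2.6.0"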

Re: Semantics of LGTM

2015-01-18 Thread Reynold Xin
Maybe we should just avoid LGTM as a single token when it does not actually match Patrick's definition; anybody can still leave comments like: "The direction of the PR looks good to me." or "+1 on the direction" or "The build part looks good to me" ... On Sat, Jan 17, 2015 at 8:49 PM, Kay Ouste