Re: jackson-core-asl jar (1.8.8 vs 1.9.x) conflict with the spark-sql (version 1.x)

2014-06-28 Thread Paul Brown
Hi, Mans -- Both of those versions of Jackson are pretty ancient. Do you know which of the Spark dependencies is pulling them in? It would be good for us (the Jackson, Woodstox, etc., folks) to see if we can get people to upgrade to more recent versions of Jackson. -- Paul

Re: Distribute data from Kafka evenly on cluster

2014-06-28 Thread Mayur Rustagi
how about this? https://groups.google.com/forum/#!topic/spark-users/ntPQUZFJt4M Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Sat, Jun 28, 2014 at 10:19 AM, Tobias Pfeiffer t...@preferred.jp wrote: Hi, I have a

collect on partitions gets very slow near the last few partitions.

2014-06-28 Thread Sung Hwan Chung
I'm doing something like this: rdd.groupBy.map().collect() The workload on the final map is pretty much evenly distributed. When collect happens, say on 60 partitions, the first 55 or so partitions finish very quickly, say within 10 seconds. However, the last 5, particularly the very last one,
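A straggling final partition often points to key skew after the groupBy. A minimal sketch for checking this (assumes a SparkContext `sc` and an input `rdd`; `keyFn` is a placeholder for whatever grouping key the job uses):

```scala
// Sketch: count records per partition after the groupBy to spot skew.
// `rdd` and `keyFn` are assumptions standing in for the poster's job.
val partitionSizes = rdd
  .groupBy(keyFn)
  .mapPartitionsWithIndex { (idx, iter) => Iterator((idx, iter.size)) }
  .collect()

// The largest partitions are the likely stragglers during collect().
partitionSizes.sortBy(-_._2).take(5).foreach { case (idx, n) =>
  println(s"partition $idx holds $n groups")
}
```

If one partition holds far more groups than the rest, a different partitioner or a salted key may spread the load more evenly.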

Re: collect on partitions gets very slow near the last few partitions.

2014-06-28 Thread Sung Hwan Chung
I'm finding the following messages in the driver. Can this potentially have anything to do with these drastic slowdowns? 14/06/28 00:00:17 INFO ShuffleBlockManager: Could not find files for shuffle 8 for deleting 14/06/28 00:00:17 INFO ContextCleaner: Cleaned shuffle 8 14/06/28 00:00:17 INFO

Re: HBase 0.96+ with Spark 1.0+

2014-06-28 Thread Sean Owen
This sounds like an instance of roughly the same item as in https://issues.apache.org/jira/browse/SPARK-1949 Have a look at adding that exclude to see if it works. On Fri, Jun 27, 2014 at 10:21 PM, Stephen Boesch java...@gmail.com wrote: The present trunk is built and tested against HBase 0.94.

Re: HBase 0.96+ with Spark 1.0+

2014-06-28 Thread Stephen Boesch
Thanks Sean. I had actually already added an exclusion rule for org.mortbay.jetty - and that had not resolved it. Just in case I used your precise formulation: val excludeMortbayJetty = ExclusionRule(organization = "org.mortbay.jetty") .. ,("org.apache.spark" % "spark-core_2.10" % sparkVersion
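For readers following the thread, a minimal build.sbt sketch of the exclusion approach under discussion (the exact versions and the HBase coordinates are assumptions for illustration, not from the thread):

```scala
// build.sbt sketch: keep old Mortbay Jetty artifacts off the classpath,
// per the SPARK-1949 discussion. sparkVersion and the HBase version are
// placeholders chosen for illustration.
val sparkVersion = "1.0.0"

val excludeMortbayJetty = ExclusionRule(organization = "org.mortbay.jetty")

libraryDependencies ++= Seq(
  ("org.apache.spark" % "spark-core_2.10" % sparkVersion)
    .excludeAll(excludeMortbayJetty),
  "org.apache.hbase" % "hbase-client" % "0.96.2-hadoop2"
)
```

As the follow-up below notes, the exclusion alone did not resolve the issue for Stephen, so treat this as a starting point rather than a confirmed fix.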

Re: HBase 0.96+ with Spark 1.0+

2014-06-28 Thread Siyuan he
Hi Stephen, I am using Spark 1.0 + HBase 0.96.2. This is what I did: 1) rebuild spark using: mvn -Dhadoop.version=2.3.0 -Dprotobuf.version=2.5.0 -DskipTests clean package 2) In spark-env.sh, set SPARK_CLASSPATH = /path-to/hbase-protocol-0.96.2-hadoop2.jar Hopefully it can help. Siyuan On Sat, Jun
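The two steps above, laid out as a sketch (the jar path is the poster's placeholder and stays a placeholder here):

```shell
# Step 1: rebuild Spark against Hadoop 2.3.0 with protobuf 2.5.0,
# matching HBase 0.96's protobuf requirement.
mvn -Dhadoop.version=2.3.0 -Dprotobuf.version=2.5.0 -DskipTests clean package

# Step 2: in conf/spark-env.sh, put the HBase protocol jar on the classpath
# so its protobuf-generated classes are found first.
export SPARK_CLASSPATH=/path-to/hbase-protocol-0.96.2-hadoop2.jar
```

SPARK_CLASSPATH was the Spark 1.0-era mechanism for prepending jars; later releases moved to spark.driver.extraClassPath and spark.executor.extraClassPath.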

Re: jackson-core-asl jar (1.8.8 vs 1.9.x) conflict with the spark-sql (version 1.x)

2014-06-28 Thread M Singh
Hi Paul: Here are the dependencies in spark 1.1.0-SNAPSHOT that are pulling in org.codehaus.jackson:jackson-core-asl 1.8 and 1.9 jars. 1.9: com.twitter:parquet-hadoop:jar:1.4.3, org.apache.avro:avro:jar:1.7.6. 1.8: org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT
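For anyone hitting the same conflict in an sbt build, one common workaround is to pin a single jackson-core-asl version so both transitive paths resolve to it. A minimal sketch (the pinned version 1.9.13 is an assumption, not from the thread; verify it against your dependency tree):

```scala
// build.sbt sketch: force one version of the legacy Codehaus Jackson
// artifact across all transitive dependencies. 1.9.13 is an assumed
// choice here; pick whatever version your own dependency tree needs.
dependencyOverrides += "org.codehaus.jackson" % "jackson-core-asl" % "1.9.13"
```

Maven users can achieve the same effect by declaring the artifact in dependencyManagement, which similarly overrides transitive version choices.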

Alternative to checkpointing and materialization for truncating lineage in high iteration jobs

2014-06-28 Thread Nilesh Chakraborty
Hello, In a thread about java.lang.StackOverflowError when calling count() [1] I saw Tathagata Das share an interesting approach for truncating RDD lineage - this helps prevent StackOverflowErrors in high iteration jobs while avoiding the disk-writing performance penalty. Here's an excerpt from

Re: Alternative to checkpointing and materialization for truncating lineage in high iteration jobs

2014-06-28 Thread Baoxu Shi(Dash)
I’m facing the same situation. It would be great if someone could provide a code snippet as example. On Jun 28, 2014, at 12:36 PM, Nilesh Chakraborty nil...@nileshc.com wrote: Hello, In a thread about java.lang.StackOverflowError when calling count() [1] I saw Tathagata Das share an
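Since the thread's excerpt is truncated before the actual technique, here is a sketch of the standard checkpoint-based pattern that such jobs start from, which the thread is looking to improve on because of its disk-write cost. Everything below (`sc`, `initialRdd`, `step`, the interval of 10) is a placeholder, and this is not necessarily the exact approach Tathagata Das described:

```scala
// Generic lineage-truncation pattern for a high-iteration job.
// Without periodic truncation, the lineage graph grows with every
// iteration and can eventually cause a StackOverflowError.
sc.setCheckpointDir("/tmp/spark-checkpoints")  // assumed directory

var current = initialRdd
for (i <- 1 to numIterations) {
  current = step(current).persist()
  if (i % 10 == 0) {        // truncation interval is illustrative
    current.checkpoint()    // cut the lineage at this RDD
    current.count()         // force materialization so the checkpoint runs
  }
}
```

The count() is needed because checkpoint() is lazy: the data is only written, and the lineage only truncated, when an action materializes the RDD.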