setting partitioners with hadoop rdds

2014-01-27 Thread Imran Rashid
Hi, I'm trying to figure out how to get partitioners to work correctly with hadoop rdds, so that I can get narrow dependencies & avoid shuffling. I feel like I must be missing something obvious. I can create an RDD with a partitioner of my choosing, shuffle it and then save it out to hdfs. Bu
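
[A minimal sketch of the idea under discussion, assuming the standard pair-RDD API of the time; all names and paths below are illustrative, not the poster's actual code:

    import org.apache.spark.{HashPartitioner, SparkContext}
    import org.apache.spark.SparkContext._ // implicit pair-RDD functions

    // Give two pair RDDs the same partitioner so a join between them
    // becomes a narrow dependency instead of a shuffle dependency.
    val sc = new SparkContext("local", "partitioner-demo")
    val partitioner = new HashPartitioner(10)

    val left  = sc.parallelize(1 to 100).map(i => (i % 10, i)).partitionBy(partitioner)
    val right = sc.parallelize(1 to 100).map(i => (i % 10, i * 2)).partitionBy(partitioner)

    // Co-partitioned inputs: this join needs no further shuffle.
    val joined = left.join(right)
    joined.saveAsTextFile("hdfs:///tmp/joined") // placeholder path

The open question in the thread is what happens after reading such data back from HDFS, since a freshly loaded hadoop RDD carries no partitioner.]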

Re: Cannot get Hadoop dependencies

2014-01-27 Thread Kal El
Well, it seems that 0.20.2 is actually the latest version (2.2.0) I have the following problem: In build.sbt I have this: libraryDependencies ++= Seq(   ("org.apache.spark" %% "spark-core" % "0.8.0-incubating").     exclude("org.mortbay.jetty", "servlet-api").     exclude("commons-beanutils", "com
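
[For reference, the sbt exclude chain being quoted looks like this when completed; the second exclusion is cut off above, so the artifact name given for it here is a hypothetical completion:

    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "0.8.0-incubating").
        exclude("org.mortbay.jetty", "servlet-api").
        exclude("commons-beanutils", "commons-beanutils-core") // hypothetical completion of the truncated exclude
    )
]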

Re: What could be the cause of this Streaming error

2014-01-27 Thread Khanderao kand
Scala version changed in 0.9.0 to Scala 2.10 Are you using the same version? On Tue, Jan 28, 2014 at 11:30 AM, Ashish Rangole wrote: > Hi, > > I am seeing the following error message when I began testing my Streaming > application locally. Could it be due to a mismatch with > old spark jars som

What could be the cause of this Streaming error

2014-01-27 Thread Ashish Rangole
Hi, I am seeing the following error message when I began testing my Streaming application locally. Could it be due to a mismatch with old spark jars somewhere or is this something else? Thanks, Ashish SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/Users/my

SparkStreaming not read hadoop configuration from its sparkContext on Stand Alone mode?

2014-01-27 Thread robin_up
Hi I am trying to run a small piece of code on Spark Streaming. It sets the s3 keys on the sparkContext object, which is then passed into a sparkStreaming object. However, I got the below error -- it seems StreamingContext did not use the hadoop config on worker threads. It works ok if I run it in spark core (batch mode
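
[A hedged sketch of the setup being described, with placeholder credentials, master URL, and paths: the s3n keys go into the SparkContext's Hadoop configuration, and the StreamingContext is built on top of that context.

    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sc = new SparkContext("spark://master:7077", "s3-streaming-demo")
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    val ssc = new StreamingContext(sc, Seconds(10))
    // Per the report, this works in batch mode but the configuration may
    // not reach worker threads in streaming on a standalone cluster.
    val lines = ssc.textFileStream("s3n://my-bucket/input/")
    lines.print()
    ssc.start()
]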

Re: NoSuchMethodError: org.apache.commons.io.IOUtils.closeQuietly with cdh4 binary

2014-01-27 Thread kamatsuoka
The version of commons-io included in the Spark assembly is an old one, which doesn't have the version of closeQuietly that takes a Closeable: $ javap -cp /root/spark/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop2.0.0-mr1-cdh4.2.0.jar org.apache.commons.io.IOUtils Compil
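
[Since the Closeable overload of closeQuietly only exists in commons-io 2.0+, one workaround is to avoid it entirely; a minimal sketch of a drop-in helper:

    import java.io.{Closeable, IOException}

    // Close any Closeable quietly, without depending on which commons-io
    // version happens to win on the assembly classpath.
    def closeQuietly(c: Closeable): Unit = {
      if (c != null) {
        try c.close() catch { case _: IOException => () }
      }
    }
]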

Sporadic "IOException: Class not found" in ClosureCleaner

2014-01-27 Thread John Salvatier
I am sporadically getting this error in the process of loading a couple of data files. It's frequent but not consistent. I tried executing `ulimit -n 16000` before running the script (as recommended here), but this didn't s

Re: real world streaming code

2014-01-27 Thread dachuan
thanks, Ryan. I will study Algebird first and try to adapt TopKMonoid to a spark streaming program. On Mon, Jan 27, 2014 at 2:54 PM, Ryan Weald wrote: > Hi dachuan, > > Getting top-k up and running using spark streaming is actually very easy > using Twitter's Algebird project. I gave a presentati

Re: Cannot get Hadoop dependencies

2014-01-27 Thread Jey Kottalam
I believe that Hadoop 0.20.2 is too old for compatibility with Spark. The hadoop-client dependency is available in the 0.23.x, 1.0.x, and newer releases, but not in the 0.20.x releases. Source: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client On Mon, Jan 27, 2014 at 6:20 AM, 尹绪森

Re: s3n > 5GB

2014-01-27 Thread kamatsuoka
Thanks for the replies. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/s3n-5GB-tp943p967.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: real world streaming code

2014-01-27 Thread Ryan Weald
Hi dachuan, Getting top-k up and running using spark streaming is actually very easy using Twitter's Algebird project. I gave a presentation recently at a spark user meetup that went through an example of using algebird in a spark streaming job. You can find the video and slides here - http://isurf
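
[A rough sketch of the pattern Ryan describes, assuming Algebird's TopKMonoid as it existed around that time; treat the constructor, build, and ordering direction as illustrative and verify against your Algebird version:

    import com.twitter.algebird.TopKMonoid
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._ // pair DStream ops

    val k = 10
    // Reversed ordering so the largest counts are the ones retained.
    val monoid = new TopKMonoid[(Int, String)](k)(Ordering[(Int, String)].reverse)

    val ssc = new StreamingContext("local[2]", "topk-demo", Seconds(5))
    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)

    // Partial top-k lists combine associatively via the monoid's plus(),
    // so the fold works correctly across partitions.
    counts.foreachRDD { rdd =>
      val topK = rdd.map { case (w, c) => monoid.build((c, w)) }
                    .fold(monoid.zero)(monoid.plus)
      println(topK.items)
    }
    ssc.start()
]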

Re: What I am missing from configuration?

2014-01-27 Thread Matei Zaharia
Hi Dana, I think the problem is that your simple.sbt does not add a dependency on hadoop-client for CDH4, so you get a different version of the Hadoop library on your driver application compared to the cluster. Try adding a dependency on hadoop-client version 2.0.0-mr1-cdh4.X.X for your version
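
[For instance, a build.sbt along these lines; the CDH4 minor version is a placeholder exactly as in Matei's reply, and the Cloudera repository must be on the resolver list since CDH artifacts are not in Maven Central:

    libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-incubating"
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.X.X"

    resolvers += "Cloudera repo" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
]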

Re: s3n > 5GB

2014-01-27 Thread Ryan Weald
I have run Hadoop + spark jobs on large s3n files without an issue. That being said, if you have very large files you might want to consider using s3:// instead, as that uses an HDFS-block-format-compatible storage, which means you can more effectively split your large file between map tasks. In my e
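
[In code the difference is only the URI scheme; a small sketch with placeholder buckets and paths:

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local", "s3-demo")

    // s3n: each object is read as a single native file; very large objects
    // split less effectively across map tasks.
    val native = sc.textFile("s3n://my-bucket/big-file.txt")

    // s3: HDFS block-format storage layered over S3, so large inputs split
    // more evenly (the data must have been written in this block format).
    val blocked = sc.textFile("s3://my-bucket/big-data")
]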

Re: Purpose of the HTTP Server?

2014-01-27 Thread Mark Hamstra
Used for broadcast variables and to distribute files or jars to worker nodes. See HttpBroadcast.scala and SparkContext.scala
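
[For example, a broadcast like the one below is what gets served to the executors over HTTP; a minimal sketch, applicable when HttpBroadcast is the configured broadcast implementation:

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local", "broadcast-demo")

    // The broadcast value is fetched by executors from the driver's HTTP server.
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
    val mapped = sc.parallelize(Seq(1, 2, 1)).map(i => lookup.value(i))
    println(mapped.collect().mkString(","))
]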

updateStateByKey Question

2014-01-27 Thread Craig Vanderborgh
Hello, I'm a big fan of updateStateByKey(), have been using it for a year, and now need to push the envelope again. My question is simply this: Can I use updateStateByKey in the following way? states = events.updateStateByKey[State](firstUpdateFcn) states2 = events.updateStateByKey[State](se
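
[A minimal sketch of the chaining being asked about, with hypothetical update functions standing in for firstUpdateFcn and secondUpdateFcn:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._ // stateful pair ops

    val ssc = new StreamingContext("local[2]", "state-demo", Seconds(5))
    ssc.checkpoint("/tmp/state-demo") // stateful operators require checkpointing

    val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    // Illustrative update functions: a running sum and a running max.
    def firstUpdateFcn(values: Seq[Int], state: Option[Int]): Option[Int] =
      Some(values.sum + state.getOrElse(0))
    def secondUpdateFcn(values: Seq[Int], state: Option[Int]): Option[Int] =
      Some(math.max(values.sum, state.getOrElse(0)))

    // Two independent state streams over the same events, as in the question.
    val states  = events.updateStateByKey[Int](firstUpdateFcn _)
    val states2 = events.updateStateByKey[Int](secondUpdateFcn _)
    states.print(); states2.print()
    ssc.start()
]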

Re: Purpose of the HTTP Server?

2014-01-27 Thread Heiko Braun
Yes, I've seen the one used for the UI. But there are also the HttpServer and HttpFileServer. Those are the ones I was wondering about. /Heiko On 27 Jan 2014, at 15:18, Cheng Lian wrote: > It's used for the Web UI. By default, you may access http://localhost:8080 > to view the cluster informa

Problems while moving from 0.8.0 to 0.8.1

2014-01-27 Thread Archit Thakur
Hi, The implementation of the aggregation logic changed in 0.8.1 (Aggregator.scala). It now uses AppendOnlyMap, as compared to java.util.HashMap in the 0.8.0 release. Aggregator.scala def combineValuesByKey(iter: Iterator[_ <: Product2[K, V]]): Iterator[(K, C)] = { val combiners = new Appe
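
[In simplified form, the combiner loop being discussed looks roughly like this; a paraphrase using a plain mutable map, not the verbatim 0.8.1 AppendOnlyMap code:

    // 0.8.0 used java.util.HashMap with an explicit get/put per record;
    // 0.8.1's AppendOnlyMap instead updates entries in place through a
    // (hadValue, oldValue) => newValue callback. The logical effect is:
    def combineValuesByKey[K, V, C](iter: Iterator[(K, V)],
                                    createCombiner: V => C,
                                    mergeValue: (C, V) => C): Iterator[(K, C)] = {
      val combiners = scala.collection.mutable.HashMap.empty[K, C]
      for ((k, v) <- iter) {
        combiners(k) = combiners.get(k) match {
          case Some(c) => mergeValue(c, v)  // merge into existing combiner
          case None    => createCombiner(v) // first value seen for this key
        }
      }
      combiners.iterator
    }
]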

Re: Cannot get Hadoop dependencies

2014-01-27 Thread 尹绪森
http://www.scala-sbt.org/release/docs/Getting-Started/Library-Dependencies This document might be useful. You should make sure that you specified the package with the right URI, and that the repo is added to the resolvers. On 2014-1-27 at 9:24 PM, "Kal El" wrote: > I am having some trouble with Hadoop. I cannot build my p

Re: Purpose of the HTTP Server?

2014-01-27 Thread Cheng Lian
It's used for the Web UI. By default, you may access http://localhost:8080 to view the cluster information as well as individual job details. On Mon, Jan 27, 2014 at 10:15 PM, Heiko Braun wrote: > > > Can someone briefly explain the purpose of the HTTP server in spark? > Is it related to shi

Purpose of the HTTP Server?

2014-01-27 Thread Heiko Braun
Can someone briefly explain the purpose of the HTTP server in spark? Is it related to shipping the jars around in a cluster? Regards, Heiko

Cannot get Hadoop dependencies

2014-01-27 Thread Kal El
I am having some trouble with Hadoop. I cannot build my project with sbt. According to the documentation, I added a line like this in my build.sbt file: libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "<version>" -- my line being: libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "0.2

Re: Inaccurate Estimates from LinearRegressionWithSGD

2014-01-27 Thread Sean Owen
This fix from 8 days ago might be related: https://github.com/apache/incubator-spark/pull/459 If you are not building from HEAD, I might try again with that, or wait for the 0.9 release that will contain it. May not be the cause. On Mon, Jan 27, 2014 at 1:35 AM, herbps10 wrote: > Hello, > > I