Re: Jackson-core-asl conflict with Spark

2015-03-12 Thread Paul Brown
So... one solution would be to use a non-Jurassic version of Jackson. 2.6 will drop before too long, and 3.0 is in longer-term planning. The 1.x series is long deprecated. If you're genuinely stuck with something ancient, then you need to include the JAR that contains the class, and 1.9.13 does

Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-20 Thread Paul Brown
I would suggest checking out disk IO on the nodes in your cluster and then reading up on the limiting behaviors that accompany different kinds of EC2 storage. Depending on how things are configured for your nodes, you may have a local storage configuration that provides "bursty" IOPS where you get

Re: Parsing a large XML file using Spark

2014-11-21 Thread Paul Brown
Unfortunately, unless you impose restrictions on the XML file (e.g., where namespaces are declared, whether entity replacement is used, etc.), you really can't parse only a piece of it even if you have start/end elements grouped together. If you want to deal effectively (and scalably) with large X
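A minimal sketch of the namespace point above, using Python's standard-library parser rather than anything from the original thread. The document, the `rec` prefix, its URI, and the helper names are all hypothetical. It shows a slice cut cleanly at element boundaries that still fails to parse on its own, because the prefix it uses is declared on an ancestor outside the slice.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "rec" prefix is declared only on the root element.
DOCUMENT = (
    '<rec:records xmlns:rec="http://example.com/records">'
    '<rec:record id="1">alpha</rec:record>'
    '<rec:record id="2">beta</rec:record>'
    '</rec:records>'
)

# A slice cut exactly at start/end element boundaries, as a naive splitter might produce.
FRAGMENT = '<rec:record id="2">beta</rec:record>'


def parses_standalone(xml_text):
    """Return True if xml_text is parseable by itself as a namespace-aware document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The fragment lands here: its prefix is unbound without the root's declaration.
        return False


def record_ids(xml_text):
    """Parse the whole document and collect the id attribute of each record element."""
    root = ET.fromstring(xml_text)
    namespaces = {"rec": "http://example.com/records"}
    return [record.get("id") for record in root.findall("rec:record", namespaces)]
```

Here `parses_standalone(DOCUMENT)` succeeds while `parses_standalone(FRAGMENT)` does not, even though the fragment is well balanced. Entity declarations in a DTD would cause the same kind of dependency on content outside the slice.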

Re: Recommended pipeline automation tool? Oozie?

2014-07-10 Thread Paul Brown
We use Luigi for this purpose. (Our pipelines are typically on AWS (no EMR) backed by S3 and using combinations of Python jobs, non-Spark Java/Scala, and Spark. We run Spark jobs by connecting drivers/clients to the master, and those are what is invoked from Luigi.) — p...@mult.ifario.us | Multi

Re: jackson-core-asl jar (1.8.8 vs 1.9.x) conflict with the spark-sql (version 1.x)

2014-06-27 Thread Paul Brown
Hi, Mans -- Both of those versions of Jackson are pretty ancient. Do you know which of the Spark dependencies is pulling them in? It would be good for us (the Jackson, Woodstox, etc., folks) to see if we can get people to upgrade to more recent versions of Jackson. -- Paul — p...@mult.ifario.u

Re: Upgrading to Spark 1.0.0 causes NoSuchMethodError

2014-06-25 Thread Paul Brown
Hi, Robert -- I wonder if this is an instance of SPARK-2075: https://issues.apache.org/jira/browse/SPARK-2075 -- Paul — p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Wed, Jun 25, 2014 at 6:28 AM, Robert James wrote: > On 6/24/14, Robert James wrote: > > My app works f

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Paul Brown
a you are running with? Are they the same? > > Just off the cuff, I wonder if this is related to: > https://issues.apache.org/jira/browse/SPARK-1520 > > If it is, it could appear that certain functions are not in the jar > because they go beyond the extended zip boundary `jar tvf`

Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0

2014-06-08 Thread Paul Brown
Moving over to the dev list, as this isn't a user-scope issue. I just ran into this issue with the missing saveAsTextFile, and here's a little additional information: - Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases. - Driver built as an uberjar via Maven. - Deployed to sma

Re: missing method in my slf4j after excluding Spark ZK log4j

2014-05-12 Thread Paul Brown
Hi, Adrian -- If my memory serves, you need 1.7.7 of the various slf4j modules to avoid that issue. Best. -- Paul — p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Mon, May 12, 2014 at 7:51 AM, Adrian Mocanu wrote: > Hey guys, > > I've asked before, in Spark 0.9 - I now

Re: Packaging a spark job using maven

2014-05-12 Thread Paul Brown
Hi, Laurent -- That's the way we package our Spark jobs (i.e., with Maven). You'll need something like this: https://gist.github.com/prb/d776a47bd164f704eecb That packages separate driver (which you can run with java -jar ...) and worker JAR files. Cheers. -- Paul — p...@mult.ifario.us | Mult

Re: parson json within rdd's filter()

2014-03-13 Thread Paul Brown
ckaged in a .jar file and I execute .addJar on the > SparkContext. My expectation is that the whole jar together with that > function is available on every worker automatically. Is that not a valid > expectation? > > Ognen > > > On 3/13/14, 11:09 AM, Paul Brown wrote: >

Re: parson json within rdd's filter()

2014-03-13 Thread Paul Brown
It's trying to send ... You just need to have the jsonMatches function available on the worker side of the interaction rather than on the driver side, e.g., put it on an object CodeThatIsRemote that gets shipped with the JARs and then filter(CodeThatIsRemote.jsonMatches) and you should be off to the ra

Re: Unable to redirect Spark logs to slf4j

2014-03-05 Thread Paul Brown
Hi, Sergey -- Here's my recipe, implemented via Maven; YMMV if you need to do it via sbt, etc., but it should be equivalent: 1) Replace org.apache.spark.Logging trait with this: https://gist.github.com/prb/bc239b1616f5ac40b4e5 (supplied by Patrick during the discussion on the dev list) 2) Amend y

Re: Having Spark read a JSON file

2014-02-28 Thread Paul Brown
SON text, but they don't have to stay that way. Would > you recommend I somehow convert the files into another format, say Avro, > before handling them with Spark? > > Paul, > > When you say not to write your ser/de as inline blocks, could you provide > a simple examp