Re: classpath conflict with spark internal libraries and the spark shell.

2016-09-09 Thread Colin Kincaid Williams
My bad, Gothos on IRC pointed me to the docs: http://jhz.name/2016/01/10/spark-classpath.html Thanks Gothos! On Fri, Sep 9, 2016 at 9:23 PM, Colin Kincaid Williams wrote: > I'm using the spark shell v1.6.1. I have a classpath conflict, where I > have an external library ( not

classpath conflict with spark internal libraries and the spark shell.

2016-09-09 Thread Colin Kincaid Williams
I'm using the spark shell v1.6.1. I have a classpath conflict, where I have an external library (not OSS either :( , can't rebuild it) using httpclient-4.5.2.jar. I use spark-shell --jars file:/path/to/httpclient-4.5.2.jar. However, Spark is using httpclient-4.3 internally. Then when I try to use
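
For what it's worth, a quick way to confirm which jar actually won on the classpath, before applying whatever ordering fix the linked post describes, is to ask the spark-shell itself. This is only a diagnostic sketch; CloseableHttpClient is just an example class that exists in both httpclient 4.3 and 4.5.2:

// Paste into spark-shell: prints the jar that actually supplied the HttpClient classes.
val cls = Class.forName("org.apache.http.impl.client.CloseableHttpClient")
println(cls.getProtectionDomain.getCodeSource.getLocation)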

Re: Improving performance of a kafka spark streaming app

2016-06-22 Thread Colin Kincaid Williams
Streaming UI tab showing empty events and very different metrics than on 1.5.2 On Thu, Jun 23, 2016 at 5:06 AM, Colin Kincaid Williams wrote: > After a bit of effort I moved from a Spark cluster running 1.5.2, to a > Yarn cluster running 1.6.1 jars. I'm still setting the maxRPP. The

Re: Improving performance of a kafka spark streaming app

2016-06-22 Thread Colin Kincaid Williams
sible my issues were related to running on the Spark 1.5.2 cluster. Also is the missing event count in the completed batches a bug? Should I file an issue? On Tue, Jun 21, 2016 at 9:04 PM, Colin Kincaid Williams wrote: > Thanks @Cody, I will try that out. In the interm, I tried to validate > my

Re: Improving performance of a kafka spark streaming app

2016-06-21 Thread Colin Kincaid Williams
ion and just measure what your read > performance is by doing something like > > createDirectStream(...).foreach(_.println) > > not take() or print() > > On Tue, Jun 21, 2016 at 3:19 PM, Colin Kincaid Williams > wrote: >> @Cody I was able to bring my processing ti
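
A minimal sketch of the baseline read test suggested above, assuming the Spark 1.3+ direct-stream API; the app name, topic name, and batch interval are placeholders (the broker list mirrors the one used elsewhere in this thread):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-read-baseline")
val ssc = new StreamingContext(conf, Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("your_topic"))

// Consume every record on the executors (here simply discarding it) rather than
// calling take() or print(), so the driver is not the bottleneck in the measurement.
stream.foreachRDD(rdd => rdd.foreach(_ => ()))

ssc.start()
ssc.awaitTermination()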

Re: Improving performance of a kafka spark streaming app

2016-06-21 Thread Colin Kincaid Williams
looking for advice regarding # Kafka Topic Partitions / Streaming Duration / maxRatePerPartition / any other spark settings or code changes that I should make to try to get a better consumption rate. Thanks for all the help so far, this is the first Spark application I have written. On Mon, Jun 2
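
For reference, the two settings most often raised in this thread can be set on the SparkConf. The values below are purely illustrative; the right numbers depend on the partition count and the per-record HBase work:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kafka-hbase-streaming")
  // Cap the records pulled per Kafka partition per second. With 66 partitions and,
  // say, a 5-second batch, 500 here would admit roughly 165,000 records per batch.
  .set("spark.streaming.kafka.maxRatePerPartition", "500")
  // Available from Spark 1.5 onward: let Spark adjust the ingest rate automatically.
  .set("spark.streaming.backpressure.enabled", "true")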

Re: Improving performance of a kafka spark streaming app

2016-06-20 Thread Colin Kincaid Williams
ocessing time is > 1.16 seconds, you're always going to be falling behind. That would > explain why you've built up an hour of scheduling delay after eight > hours of running. > > On Sat, Jun 18, 2016 at 4:40 PM, Colin Kincaid Williams > wrote: >> Hi Mich again,

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
c? >> >> HTH >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> &g

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
I'm attaching a picture from the streaming UI. On Sat, Jun 18, 2016 at 7:59 PM, Colin Kincaid Williams wrote: > There are 25 nodes in the spark cluster. > > On Sat, Jun 18, 2016 at 7:53 PM, Mich Talebzadeh > wrote: >> how many nodes are in your cluster?

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
eh > > > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > > > http://talebzadehmich.wordpress.com > > > > > On 18 June 2016 at 20:40, Colin Kincaid Williams wrote: >> >> I updated my app to Spark 1

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/lib/* \ /home/colin.williams/kafka-hbase.jar "FromTable" "ToTable" "broker1:9092,broker2:9092" On Tue, May 3, 2016 at 8:20 PM, Colin Kincaid Williams wrote: > Thanks Cody, I can see that the partitions are well distributed... > Then

Re: Improving performance of a kafka spark streaming app

2016-05-03 Thread Colin Kincaid Williams
tributing across partitions evenly). > > On Tue, May 3, 2016 at 1:44 PM, Colin Kincaid Williams wrote: >> Thanks again Cody. Regarding the details 66 kafka partitions on 3 >> kafka servers, likely 8 core systems with 10 disks each. Maybe the >> issue with the receiver was the large n

Re: Improving performance of a kafka spark streaming app

2016-05-03 Thread Colin Kincaid Williams
t; > Really though, I'd try to start with spark 1.6 and direct streams, or > even just kafkacat, as a baseline. > > > > On Mon, May 2, 2016 at 7:01 PM, Colin Kincaid Williams wrote: >> Hello again. I searched for "backport kafka" in the list archives but

Re: Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
ing with 1.3. If you're stuck > on 1.2, I believe there have been some attempts to backport it, search > the mailing list archives. > > On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid Williams > wrote: >> I've written an application to get content from a kafka topic w

Re: Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
spark 1.2, or is upgrading possible? The > kafka direct stream is available starting with 1.3. If you're stuck > on 1.2, I believe there have been some attempts to backport it, search > the mailing list archives. > > On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid Williams > wrot

Re: Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
me extent. > > David Krieg | Enterprise Software Engineer > Early Warning > Direct: 480.426.2171 | Fax: 480.483.4628 | Mobile: 859.227.6173 > > > -Original Message- > From: Colin Kincaid Williams [mailto:disc...@uw.edu] > Sent: Monday, May 02, 2016 10:55 AM &g

Improving performance of a kafka spark streaming app

2016-05-02 Thread Colin Kincaid Williams
I've written an application to get content from a kafka topic with 1.7 billion entries, get the protobuf serialized entries, and insert into hbase. Currently the environment that I'm running in is Spark 1.2. With 8 executors and 2 cores, and 2 jobs, I'm only getting between 0-2500 writes / second
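
One change commonly suggested for this kind of workload, independent of the direct-stream upgrade discussed elsewhere in the thread, is to batch the HBase puts per partition rather than writing one record at a time. A hedged sketch using the HBase 0.98-era HTable API; writeToHBase, rowKeyFor, and the column family and qualifier names are hypothetical stand-ins, not the poster's actual code:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.dstream.DStream
import scala.collection.JavaConverters._

// `stream` would be the Kafka DStream of (key, serialized-protobuf-value) pairs;
// rowKeyFor stands in for deriving the HBase row key from a deserialized record.
def writeToHBase(stream: DStream[(Array[Byte], Array[Byte])], tableName: String,
                 rowKeyFor: Array[Byte] => Array[Byte]): Unit = {
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // One HTable and one buffered batch per partition, not one round trip per record.
      val table = new HTable(HBaseConfiguration.create(), tableName)
      table.setAutoFlush(false, false)
      val puts = records.map { case (_, value) =>
        new Put(rowKeyFor(value))
          .add(Bytes.toBytes("cf"), Bytes.toBytes("payload"), value)
      }.toList
      table.put(puts.asJava)
      table.flushCommits()
      table.close()
    }
  }
}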

Running out of memory locally launching multiple spark jobs using spark yarn / submit from shell script.

2016-01-17 Thread Colin Kincaid Williams
I launch around 30-60 of these jobs defined like start-job.sh in the background from a wrapper script. I wait about 30 seconds between launches, then the wrapper monitors yarn to determine when to launch more. There is a limit defined at around 60 jobs, but even if I set it to 30, I run out of memo

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
he info in one place. > > On Tue, Feb 24, 2015 at 12:36 PM, Colin Kincaid Williams > wrote: > >> Looks like in my tired state, I didn't mention spark the whole time. >> However, it might be implied by the application log above. Spark log >> aggregation appears to b

Re: How to get yarn logs to display in the spark or yarn history-server?

2015-02-24 Thread Colin Kincaid Williams
; /opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver > > > It may be slightly different for you if the resource manager and the > history server are not on the same machine. > > Hope it will work for you as well! > Christophe. > > On 24/02/2015 06:31, Colin Kinca

How to get yarn logs to display in the spark or yarn history-server?

2015-02-23 Thread Colin Kincaid Williams
Hi, I have been trying to get my yarn logs to display in the spark history-server or yarn history-server. I can see the log information with yarn logs -applicationId application_1424740955620_0009: 15/02/23 22:15:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to us3sm2hbqa04r07-comp-pr