My bad, Gothos on IRC pointed me to the docs:
http://jhz.name/2016/01/10/spark-classpath.html
Thanks, Gothos!
On Fri, Sep 9, 2016 at 9:23 PM, Colin Kincaid Williams wrote:
I'm using the spark-shell v1.6.1. I have a classpath conflict, where I
have an external library (not OSS either :( , can't rebuild it)
using httpclient-4.5.2.jar. I use spark-shell --jars
file:/path/to/httpclient-4.5.2.jar
However Spark is using httpclient-4.3 internally. Then when I try to use…
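One workaround to try, assuming shading the jar is off the table: Spark 1.6's experimental user-classpath-first flags, which make the driver and executors prefer the `--jars` version of httpclient over Spark's bundled 4.3. A sketch, not tested against this exact setup:

```shell
# Experimental in Spark 1.6: prefer user-supplied jars over Spark's own
# classpath on both the driver and the executors.
spark-shell \
  --jars file:/path/to/httpclient-4.5.2.jar \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true
```

These flags can surface other conflicts (Spark's own dependencies now lose ties too), so they are worth trying in isolation first.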
Streaming UI tab showing empty events and very different metrics than on 1.5.2
On Thu, Jun 23, 2016 at 5:06 AM, Colin Kincaid Williams wrote:
> After a bit of effort I moved from a Spark cluster running 1.5.2 to a
> Yarn cluster running 1.6.1 jars. I'm still setting the maxRPP. The
…possible my issues were related to running on the Spark
1.5.2 cluster. Also, is the missing event count in the completed
batches a bug? Should I file an issue?
On Tue, Jun 21, 2016 at 9:04 PM, Colin Kincaid Williams wrote:
> Thanks @Cody, I will try that out. In the interim, I tried to validate
> my…
> …and just measure what your read
> performance is by doing something like
>
> createDirectStream(...).foreach(_.println)
>
> not take() or print()
>
> On Tue, Jun 21, 2016 at 3:19 PM, Colin Kincaid Williams
> wrote:
>> @Cody I was able to bring my processing ti…
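Cody's baseline suggestion above (read everything, skip the hbase write, avoid take()/print()) fleshed out as a hedged sketch. It assumes Spark 1.6 with the Kafka 0.8 direct-stream artifact (spark-streaming-kafka); the app name, topic, and broker list are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ReadBaseline {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("read-baseline"), Seconds(5))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))
    // count() forces a full read of every partition in each batch,
    // unlike take()/print(), which only pull a few records to the driver.
    stream.foreachRDD(rdd => println(s"read ${rdd.count()} records"))
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Comparing this job's batch processing time against the real job's isolates whether Kafka reads or the HBase writes are the bottleneck.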
…looking for advice regarding # Kafka Topic Partitions / Streaming
Duration / maxRatePerPartition / any other Spark settings or code
changes that I should make to try to get a better consumption rate.
Thanks for all the help so far; this is the first Spark application I
have written.
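On the settings question: with the direct stream, the per-batch consumption ceiling is roughly partitions × spark.streaming.kafka.maxRatePerPartition × batch interval. A hedged sketch of that arithmetic (66 partitions is mentioned elsewhere in the thread; the rate and interval values are illustrative, not the poster's):

```shell
# Per-batch consumption ceiling for the Kafka direct stream.
# spark.streaming.kafka.maxRatePerPartition caps records/sec/partition;
# spark.streaming.backpressure.enabled (Spark 1.5+) adjusts the rate
# dynamically instead of using a fixed cap.
partitions=66               # from the thread
max_rate_per_partition=1000 # illustrative
batch_interval_s=5          # illustrative
echo "ceiling: $(( partitions * max_rate_per_partition * batch_interval_s )) records/batch"
```

If that ceiling sits below the topic's ingest rate, the stream falls behind regardless of any other tuning.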
On Mon, Jun 2…
> …processing time is
> 1.16 seconds, you're always going to be falling behind. That would
> explain why you've built up an hour of scheduling delay after eight
> hours of running.
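The arithmetic in that quote checks out. A quick sketch, assuming a 1 s batch interval (the interval itself isn't preserved in this snippet, only the 1.16 s processing time): each batch adds ~0.16 s of delay, which compounds to roughly an hour over eight hours.

```shell
# Scheduling delay grows linearly when processing time exceeds the
# batch interval. The 1 s interval is an assumption; 1.16 s is from
# the thread.
awk 'BEGIN {
  interval = 1.0; processing = 1.16; hours = 8
  batches = hours * 3600 / interval
  delay_h = batches * (processing - interval) / 3600
  printf "delay after %dh: %.2f hours\n", hours, delay_h
}'
```
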
>
> On Sat, Jun 18, 2016 at 4:40 PM, Colin Kincaid Williams
> wrote:
>> Hi Mich again,
>> …?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
I'm attaching a picture from the streaming UI.
On Sat, Jun 18, 2016 at 7:59 PM, Colin Kincaid Williams wrote:
> There are 25 nodes in the spark cluster.
>
> On Sat, Jun 18, 2016 at 7:53 PM, Mich Talebzadeh
> wrote:
>> how many nodes are in your cluster?
>>
> On 18 June 2016 at 20:40, Colin Kincaid Williams wrote:
>>
>> I updated my app to Spark 1…
…-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/lib/* \
/home/colin.williams/kafka-hbase.jar "FromTable" "ToTable" \
"broker1:9092,broker2:9092"
On Tue, May 3, 2016 at 8:20 PM, Colin Kincaid Williams wrote:
> Thanks Cody, I can see that the partitions are well distributed...
> Then
> …distributing across partitions evenly).
>
> On Tue, May 3, 2016 at 1:44 PM, Colin Kincaid Williams wrote:
>> Thanks again Cody. Regarding the details: 66 Kafka partitions on 3
>> Kafka servers, likely 8-core systems with 10 disks each. Maybe the
>> issue with the receiver was the large n…
> Really though, I'd try to start with spark 1.6 and direct streams, or
> even just kafkacat, as a baseline.
>
>
>
> On Mon, May 2, 2016 at 7:01 PM, Colin Kincaid Williams wrote:
>> Hello again. I searched for "backport kafka" in the list archives but…
…spark 1.2, or is upgrading possible? The
> kafka direct stream is available starting with 1.3. If you're stuck
> on 1.2, I believe there have been some attempts to backport it, search
> the mailing list archives.
>
> On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid Williams
> wrote:
> …me extent.
>
> David Krieg | Enterprise Software Engineer
> Early Warning
> Direct: 480.426.2171 | Fax: 480.483.4628 | Mobile: 859.227.6173
>
>
> -----Original Message-----
> From: Colin Kincaid Williams [mailto:disc...@uw.edu]
> Sent: Monday, May 02, 2016 10:55 AM
I've written an application to get content from a kafka topic with 1.7
billion entries, get the protobuf serialized entries, and insert into
HBase. Currently the environment that I'm running in is Spark 1.2.
With 8 executors and 2 cores, and 2 jobs, I'm only getting between
0-2500 writes / second.
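For scale, a back-of-the-envelope on those numbers. It assumes the 2500 writes/s peak is sustained, which the reported 0-2500 range makes optimistic:

```shell
# Time to insert 1.7 billion entries at the peak reported write rate.
total=1700000000   # entries, from the message above
rate=2500          # writes/sec, peak reported
secs=$(( total / rate ))
echo "~$(( secs / 86400 )) days ($secs seconds) at a sustained 2500 writes/s"
```

At that rate the backfill takes on the order of a week, which is why the later messages focus on raising the consumption rate rather than adding more jobs.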
I launch around 30-60 of these jobs defined like start-job.sh in the
background from a wrapper script. I wait about 30 seconds between launches,
then the wrapper monitors yarn to determine when to launch more. There is a
limit defined at around 60 jobs, but even if I set it to 30, I run out of
memory…
…the info in one place.
>
> On Tue, Feb 24, 2015 at 12:36 PM, Colin Kincaid Williams
> wrote:
>
>> Looks like in my tired state, I didn't mention Spark the whole time.
>> However, it might be implied by the application log above. Spark log
>> aggregation appears to b…
> /opt/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
>
>
> It may be slightly different for you if the resource manager and the
> history server are not on the same machine.
>
> Hope it will work for you as well!
> Christophe.
>
> On 24/02/2015 06:31, Colin Kincaid Williams wrote:
Hi,
I have been trying to get my yarn logs to display in the spark
history-server or yarn history-server. I can see the log information with:
yarn logs -applicationId application_1424740955620_0009
15/02/23 22:15:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing
over to us3sm2hbqa04r07-comp-pr…
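For the logs to show up in the history servers at all, YARN log aggregation has to be enabled. A sketch of the usual knobs (property names are standard YARN/Spark configuration; the event-log path is illustrative):

```shell
# yarn-site.xml: let `yarn logs` and the history servers read container
# logs after the application finishes.
#   yarn.log-aggregation-enable = true
# spark-defaults.conf: have Spark write event logs for its history server.
#   spark.eventLog.enabled = true
#   spark.eventLog.dir     = hdfs:///spark-history   # illustrative path
yarn logs -applicationId application_1424740955620_0009
```
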