Re: Generalised Spark-HBase integration

2015-07-28 Thread Michal Haris
e HBase apis.

> We have tried to cover any possible way to use HBase with Spark. Let us
> know if we missed anything; if we did we will add it.
>
> On Tue, Jul 28, 2015 at 12:12 PM, Michal Haris wrote:
>
>> Hi Ted, yes, cloudera blog and your code was my starting point
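
Since the reply above mentions trying to cover every way of using HBase from Spark, here is, for reference, a minimal sketch of the plain write path that needs nothing beyond the stock HBase MapReduce output format: Spark's saveAsNewAPIHadoopDataset with TableOutputFormat. This is not code from the hbase-spark module or from spark-on-hbase; the table "my_table", column family "cf" and qualifier "col" are placeholders, and it assumes the HBase 1.x client API.

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.Put
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable
  import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
  import org.apache.hadoop.hbase.util.Bytes
  import org.apache.hadoop.mapreduce.Job
  import org.apache.spark.rdd.RDD

  object HBaseWriteSketch {
    // Writes (rowKey, value) pairs into the placeholder table "my_table",
    // column family "cf", qualifier "col".
    def writePairs(pairs: RDD[(String, String)]): Unit = {
      val conf = HBaseConfiguration.create()
      conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")

      val job = Job.getInstance(conf)
      job.setOutputKeyClass(classOf[ImmutableBytesWritable])
      job.setOutputValueClass(classOf[Put])
      job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

      val puts = pairs.map { case (key, value) =>
        val put = new Put(Bytes.toBytes(key))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
        (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
      }
      puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
    }
  }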

Re: Generalised Spark-HBase integration

2015-07-28 Thread Michal Haris
scanner.advance) numCells += 1
> [INFO]    ^
> [ERROR] one error found
>
> FYI
>
> On Tue, Jul 28, 2015 at 8:59 AM, Michal Haris wrote:
>
>> Hi all, last couple of months I've been working on a large graph
>> analytics and along the way have written from scra
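
For context only, a self-contained version of the kind of cell-counting loop the compiler output above points at, written against the HBase client CellScanner API; this is my reconstruction, not the actual code from the repository, and it makes no claim about what caused the reported error.

  import org.apache.hadoop.hbase.client.Result

  object CellCount {
    // Count the cells in a Result with the CellScanner API.
    def countCells(result: Result): Long = {
      val scanner = result.cellScanner()
      var numCells = 0L
      while (scanner.advance()) numCells += 1
      numCells
    }
  }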

Re: Generalised Spark-HBase integration

2015-07-28 Thread Michal Haris
op/
>
> Let me know if you have any questions, also let me know if you want to
> connect to join efforts.
>
> Ted Malaska
>
> On Tue, Jul 28, 2015 at 11:59 AM, Michal Haris wrote:
>
>> Hi all, last couple of months I've been working on a large graph

Generalised Spark-HBase integration

2015-07-28 Thread Michal Haris
into an (almost) spark module, which works with the latest spark and the new
hbase api, so... sharing!: https://github.com/michal-harish/spark-on-hbase

--
Michal Haris
Technical Architect
direct line: +44 (0) 207 749 0229
www.visualdna.com | t: +44 (0) 207 734 7033
31 Old Nichol Street, London E2 7HR
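
For readers landing on this thread later: the linked project defines its own API, but the baseline way to get an HBase table into Spark with nothing beyond the stock TableInputFormat looks roughly like the sketch below (the table name "my_table" is a placeholder; this is not the spark-on-hbase API itself).

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.hbase.client.Result
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable
  import org.apache.hadoop.hbase.mapreduce.TableInputFormat
  import org.apache.spark.{SparkConf, SparkContext}

  object HBaseScanSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("hbase-scan-sketch"))

      // Point the stock TableInputFormat at the (placeholder) table.
      val hbaseConf = HBaseConfiguration.create()
      hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

      // One Spark partition per HBase region, keyed by row key.
      val rdd = sc.newAPIHadoopRDD(
        hbaseConf,
        classOf[TableInputFormat],
        classOf[ImmutableBytesWritable],
        classOf[Result])

      // Trivial action: total number of cells across the table.
      val totalCells = rdd.map { case (_, result) => result.rawCells().length.toLong }
                          .fold(0L)(_ + _)
      println(s"total cells: $totalCells")
      sc.stop()
    }
  }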

Re: 1.4.0 classpath issue with spark-submit

2015-07-25 Thread Michal Haris
try adding that jar in SPARK_CLASSPATH (it's deprecated though) in
> spark-env.sh file.
>
> Thanks
> Best Regards
>
> On Tue, Jul 21, 2015 at 7:34 PM, Michal Haris wrote:
>
>> I have a spark program that uses dataframes to query hive and I run it
>> both as a s

1.4.0 classpath issue with spark-submit

2015-07-21 Thread Michal Haris
aths that are passed along with the --driver-class-path option are missing.
When I switch to an older 1.4.0-SNAPSHOT on the driver, everything works. I
observe the issue with 1.4.1. Are there any known obvious changes to how
spark-submit handles configuration that I have missed?
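
A small driver-side probe, offered as a suggestion rather than something from the thread, that can help narrow this kind of issue down: print what Spark recorded for spark.driver.extraClassPath and the classpath the driver JVM actually started with, then compare against what was passed on the spark-submit command line.

  import org.apache.spark.{SparkConf, SparkContext}

  // Submit roughly as (paths are placeholders):
  //   spark-submit --class ClasspathProbe --driver-class-path /path/to/extra.jar probe.jar
  object ClasspathProbe {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("classpath-probe"))
      // What spark-submit translated --driver-class-path into, if anything.
      println("spark.driver.extraClassPath = " +
        sc.getConf.getOption("spark.driver.extraClassPath").getOrElse("<not set>"))
      // What the driver JVM is actually running with.
      println("java.class.path = " + System.getProperty("java.class.path"))
      sc.stop()
    }
  }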

Re: Including additional scala libraries in sparkR

2015-07-14 Thread Michal Haris
be used by SparkR
> RDD API for further processing.
>
> You can use --jars to include your scala library to be accessed by the JVM
> backend.
>
> From: Michal Haris [michal.ha...@visualdna.com]
> Sent: Sunday, July 12, 2015 6:39 PM
> To:
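
To make the --jars suggestion concrete: a tiny Scala object along the lines of the sketch below (the package, object and method names are made up for illustration) can be shipped with --jars when launching the sparkR shell and then reached from R through the private JVM-backend helpers such as SparkR:::callJStatic, with the caveat that those helpers are internal and unsupported.

  package com.example.sparkr

  // Hypothetical helper reachable from SparkR once its jar is on the classpath,
  // e.g. started as:  bin/sparkR --jars sparkr-helpers.jar
  // and called from R roughly as:
  //   SparkR:::callJStatic("com.example.sparkr.Helpers", "describe", "hello")
  object Helpers {
    // Keep entry points static-style (object methods) and stick to simple
    // argument types so the R-to-JVM serialization stays straightforward.
    def describe(input: String): String = s"Helpers saw: $input"
  }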

Including additional scala libraries in sparkR

2015-07-12 Thread Michal Haris
R. Is there a way to include and invoke additional scala objects and RDDs
within sparkR shell/job? Something similar to additional jars and init script
in normal spark submit/shell..

Re: large volume spark job spends most of the time in AppendOnlyMap.changeValue

2015-05-11 Thread Michal Haris
> curious to know where AppendOnlyMap.changeValue is being called from.
>
> On Fri, May 8, 2015 at 1:26 PM, Michal Haris wrote:
>
>> +dev
>> On 6 May 2015 10:45, "Michal Haris" wrote:
>>
>> > Just wanted to check if somebody has seen similar beha
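
For context on where changeValue tends to show up (my own summary, not a finding from this thread): in the 1.x shuffle path, map-side combining funnels every record through an append-only hash-map update, so key-aggregating jobs shaped like the toy example below typically spend a lot of CPU there on large inputs.

  import org.apache.spark.{SparkConf, SparkContext}

  object ChangeValueHotPath {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("change-value-hot-path"))

      // Placeholder data; the job in the thread reads ~0.5 TB, this only mimics the shape.
      val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, 1L))

      // reduceByKey combines values per key on the map side before the shuffle;
      // each incoming record becomes one hash-map update (an AppendOnlyMap
      // changeValue call) in the 1.x code path.
      val counts = pairs.reduceByKey(_ + _)
      println(counts.count())

      sc.stop()
    }
  }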

Re: large volume spark job spends most of the time in AppendOnlyMap.changeValue

2015-05-08 Thread Michal Haris
+dev
On 6 May 2015 10:45, "Michal Haris" wrote:

> Just wanted to check if somebody has seen similar behaviour or knows what
> we might be doing wrong. We have a relatively complex spark application
> which processes half a terabyte of data at various stages. We have profiled

large volume spark job spends most of the time in AppendOnlyMap.changeValue

2015-05-06 Thread Michal Haris
ld be appreciated.