Re: Spark submit hbase issue

2021-04-14 Thread Mich Talebzadeh
Try adding the hbase-site.xml file to %SPARK_HOME%\conf and see if it works. HTH. View my LinkedIn profile. *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other
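On a Linux install, the suggested fix is a one-line copy; this is a sketch assuming HBASE_HOME and SPARK_HOME are set (on Windows, as in the reply, the target is %SPARK_HOME%\conf):

```shell
# Make the HBase client configuration visible to Spark by copying
# hbase-site.xml into Spark's conf directory (paths are assumptions).
cp "$HBASE_HOME/conf/hbase-site.xml" "$SPARK_HOME/conf/"
```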

Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Sean Owen
RDDs are still relevant in a few ways - there is no Dataset in Python for example, so RDD is still the 'typed' API. They still underpin DataFrames. And of course it's still there because there's probably still a lot of code out there that uses it. Occasionally it's still useful to drop into that

Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Jacek Laskowski
Hi Marco, IMHO RDD is only for very sophisticated use cases that very few Spark devs would be capable of. I consider RDD API a sort of Spark assembler and most Spark devs should stick to Dataset API. Speaking of HBase, see

Re: Spark to HBase Fast Bulk Upload

2016-09-19 Thread Kabeer Ahmed
Hi, Without using Spark there are a couple of options. You can refer to the link: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/. The gist is that you convert the data into HFiles and use the bulk upload option to get the data quickly into HBase. HTH Kabeer. On
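The HFile flow from that Cloudera post can be sketched with the stock HBase MapReduce tools; the table name, column mapping, and paths below are placeholders, not values from the thread:

```shell
# Step 1: generate HFiles from TSV input instead of writing through the API
# (-Dimporttsv.bulk.output switches ImportTsv to HFile output).
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  mytable /data/input.tsv

# Step 2: move the generated HFiles into the table's regions in bulk,
# bypassing the write path entirely.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable
```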

RE: Spark with HBase Error - Py4JJavaError

2016-07-08 Thread Puneet Tripathi
Hi Ram, Thanks very much it worked. Puneet From: ram kumar [mailto:ramkumarro...@gmail.com] Sent: Thursday, July 07, 2016 6:51 PM To: Puneet Tripathi Cc: user@spark.apache.org Subject: Re: Spark with HBase Error - Py4JJavaError Hi Puneet, Have you tried appending --jars $SPARK_HOME/lib/spark

Re: Spark with HBase Error - Py4JJavaError

2016-07-07 Thread ram kumar
Hi Puneet, Have you tried appending --jars $SPARK_HOME/lib/spark-examples-*.jar to the execution command? Ram On Thu, Jul 7, 2016 at 5:19 PM, Puneet Tripathi < puneet.tripa...@dunnhumby.com> wrote: > Guys, Please can anyone help on the issue below? > > > > Puneet > > > > *From:* Puneet
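For reference, a minimal sketch of where the --jars flag goes on the submit command; the script name and master are assumptions, and the exact jar path depends on your Spark distribution:

```shell
# Ship the Spark examples jar (which bundles the HBase input-format helpers
# in older Spark releases) to the driver and executors via --jars.
spark-submit \
  --master yarn \
  --jars "$SPARK_HOME"/lib/spark-examples-*.jar \
  my_hbase_job.py
```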

RE: Spark with HBase Error - Py4JJavaError

2016-07-07 Thread Puneet Tripathi
Guys, please can anyone help with the issue below? Puneet From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com] Sent: Thursday, July 07, 2016 12:42 PM To: user@spark.apache.org Subject: Spark with HBase Error - Py4JJavaError Hi, We are running HBase in fully distributed mode. I tried to

Re: Spark and HBase RDD join/get

2016-01-14 Thread Ted Yu
For #1, yes it is possible. You can find some example in hbase-spark module of hbase where hbase as DataSource is provided. e.g. https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala Cheers On Thu, Jan 14, 2016 at 5:04 AM,

Re: Spark and HBase RDD join/get

2016-01-14 Thread Kristoffer Sjögren
Thanks Ted! On Thu, Jan 14, 2016 at 4:49 PM, Ted Yu wrote: > For #1, yes it is possible. > > You can find some example in hbase-spark module of hbase where hbase as > DataSource is provided. > e.g. > >

Re: Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Ruslan Dautkhanov
Try Phoenix from Cloudera parcel distribution https://blog.cloudera.com/blog/2015/11/new-apache-phoenix-4-5-2-package-from-cloudera-labs/ They may have better Kerberos support .. On Tue, Dec 8, 2015 at 12:01 AM Akhilesh Pathodia < pathodia.akhil...@gmail.com> wrote: > Yes, its a kerberized

Re: Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Akhilesh Pathodia
Yes, it's a kerberized cluster and the ticket was generated using the kinit command before running the Spark job. That's why Spark on HBase worked, but when Phoenix is used to get the connection to HBase, it does not pass the authentication to all nodes. Probably it is not handled in Phoenix version 4.3 or

Re: Spark on hbase using Phoenix in secure cluster

2015-12-07 Thread Ruslan Dautkhanov
That error is not directly related to Spark or HBase: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] Is this a kerberized cluster? You likely don't have a good (non-expired)

Re: spark to hbase

2015-10-27 Thread Deng Ching-Mallete
Hi, It would be more efficient if you configure the table and flush the commits by partition instead of per element in the RDD. The latter works fine because you only have 4 elements, but it won't bode well for large data sets IMO.. Thanks, Deng On Tue, Oct 27, 2015 at 5:22 PM, jinhong lu
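The per-partition pattern Deng describes can be sketched in PySpark with the happybase client; the hostname, table, and column names are assumptions, `rdd` stands for an RDD of (row key, value) byte strings, and happybase is just one of several HBase clients you could use here:

```python
import happybase

def write_partition(rows):
    # One connection and one buffered batch per partition,
    # instead of a put + flush per element of the RDD.
    conn = happybase.Connection('hbase-master')   # hostname is an assumption
    table = conn.table('mytable')                 # table name is an assumption
    with table.batch(batch_size=500) as batch:    # flushed in chunks, once more at exit
        for key, value in rows:
            batch.put(key, {b'cf:col': value})
    conn.close()

rdd.foreachPartition(write_partition)
```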

Re: spark to hbase

2015-10-27 Thread Ted Yu
Jinghong: The hadmin variable is not used. You can omit that line. Which HBase release are you using? As Deng said, don't flush per row. Cheers > On Oct 27, 2015, at 3:21 AM, Deng Ching-Mallete wrote: > > Hi, > > It would be more efficient if you configure the table and

Re: spark to hbase

2015-10-27 Thread Ted Yu
Jinghong: In one of the earlier threads on storing data to HBase, it was found that the htrace jar was not on the classpath, leading to write failures. Can you check whether you are facing the same problem? Cheers On Tue, Oct 27, 2015 at 5:11 AM, Ted Yu wrote: > Jinghong: > Hadmin

Re: spark to hbase

2015-10-27 Thread jinhong lu
Hi Ted, thanks for your help. I checked the jar; it is on the classpath. Now the problem is: 1. The following code runs fine, and it puts the result into HBase: val res = lines.map(pairFunction).groupByKey().flatMap(pairFlatMapFunction).aggregateByKey(new TrainFeature())(seqOp,

Re: spark to hbase

2015-10-27 Thread Ted Yu
For #2, have you checked task log(s) to see if there was some clue ? You may want to use foreachPartition to reduce the number of flushes. In the future, please remove color coding - it is not easy to read. Cheers On Tue, Oct 27, 2015 at 6:53 PM, jinhong lu wrote: > Hi,

Re: spark to hbase

2015-10-27 Thread jinhong lu
I wrote a demo, but there is still no response, no error, no log. My HBase is 0.98, Hadoop 2.3, Spark 1.4, and I run in yarn-client mode. Any idea? Thanks. package com.lujinhong.sparkdemo import org.apache.spark._ import org.apache.spark.rdd.NewHadoopRDD import org.apache.hadoop.conf.Configuration;

Re: spark to hbase

2015-10-27 Thread Fengdong Yu
Also, please move the HBase-related code out of the Scala object; this will resolve the serialization issue and avoid opening connections repeatedly. And remember to close the table after the final flush. > On Oct 28, 2015, at 10:13 AM, Ted Yu wrote: > > For #2, have you checked task

Re: Spark and HBase join issue

2015-03-14 Thread Ted Yu
The 4.1 GB table has 3 regions. This means that there would be at least 2 nodes which don't carry its region. Can you split this table into 12 (or more) regions ? BTW what's the value for spark.yarn.executor.memoryOverhead ? Cheers On Sat, Mar 14, 2015 at 10:52 AM, francexo83
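Ted's point can be put numerically: with TableInputFormat, each HBase region becomes one Spark partition, so the region count caps scan parallelism. The sketch below just restates the 4.1 GB / 3-region case from the thread; the executor count of 5 is an assumption for illustration:

```python
def scan_parallelism(table_gb, regions, executors):
    """Each HBase region maps to one Spark partition, so at most
    min(regions, executors) executors scan the table concurrently."""
    per_region_gb = table_gb / regions
    busy_executors = min(regions, executors)
    return per_region_gb, busy_executors

# 3 regions: only 3 of (say) 5 executors do any scanning, ~1.37 GB each.
print(scan_parallelism(4.1, 3, 5))
# 12 regions: all 5 executors stay busy, ~0.34 GB per region.
print(scan_parallelism(4.1, 12, 5))
```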

Re: Spark with HBase

2014-12-15 Thread Aniket Bhatnagar
In case you are still looking for help, there has been multiple discussions in this mailing list that you can try searching for. Or you can simply use https://github.com/unicredit/hbase-rdd :-) Thanks, Aniket On Wed Dec 03 2014 at 16:11:47 Ted Yu yuzhih...@gmail.com wrote: Which hbase release

Re: Spark with HBase

2014-12-03 Thread Akhil Das
You could go through these to start with http://www.vidyasource.com/blog/Programming/Scala/Java/Data/Hadoop/Analytics/2014/01/25/lighting-a-spark-with-hbase http://stackoverflow.com/questions/25189527/how-to-process-a-range-of-hbase-rows-using-spark Thanks Best Regards On Wed, Dec 3, 2014 at

Re: Spark with HBase

2014-12-03 Thread Ted Yu
Which HBase release are you running? If it is 0.98, take a look at: https://issues.apache.org/jira/browse/SPARK-1297 Thanks On Dec 2, 2014, at 10:21 PM, Jai jaidishhari...@gmail.com wrote: I am trying to use Apache Spark with a pseudo-distributed Hadoop HBase cluster and I am looking for

Re: spark 1.1.0 - hbase 0.98.6-hadoop2 version - py4j.protocol.Py4JJavaError java.lang.ClassNotFoundException

2014-10-04 Thread Nick Pentreath
forgot to copy user list On Sat, Oct 4, 2014 at 3:12 PM, Nick Pentreath nick.pentre...@gmail.com wrote: what version did you put in the pom.xml? it does seem to be in Maven central: http://search.maven.org/#artifactdetails%7Corg.apache.hbase%7Chbase%7C0.98.6-hadoop2%7Cpom dependency

Re: Spark with HBase

2014-08-07 Thread Akhil Das
You can download and compile spark against your existing hadoop version. Here's a quick start https://spark.apache.org/docs/latest/cluster-overview.html#cluster-manager-types You can also read a bit here http://docs.sigmoidanalytics.com/index.php/Installing_Spark_andSetting_Up_Your_Cluster ( the

Re: Spark with HBase

2014-08-07 Thread chutium
These two posts should be good for setting up a Spark+HBase environment and using the results of an HBase table scan as an RDD. Setup: http://www.abcn.net/2014/07/lighting-spark-with-hbase-full-edition.html Some samples: http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html -- View

RE: Spark with HBase

2014-07-04 Thread N . Venkata Naga Ravi
Hi, Any update on the solution? We are still facing this issue... We were able to connect to HBase with independent code, but are getting the issue with the Spark integration. Thx, Ravi From: nvn_r...@hotmail.com To: u...@spark.incubator.apache.org; user@spark.apache.org Subject: RE: Spark with HBase

RE: Spark with HBase

2014-06-29 Thread N . Venkata Naga Ravi
+user@spark.apache.org From: nvn_r...@hotmail.com To: u...@spark.incubator.apache.org Subject: Spark with HBase Date: Sun, 29 Jun 2014 15:28:43 +0530 I am using the following versions: spark-1.0.0-bin-hadoop2 hbase-0.96.1.1-hadoop2 When executing the HBase test, I am facing

Re: Spark on HBase vs. Spark on HDFS

2014-05-23 Thread Mayur Rustagi
Also, I am unsure whether Spark on HBase leverages locality. When you cache and process data, do you see NODE_LOCAL tasks in the process list? Spark on HDFS leverages locality quite well; it can really boost performance by 3-4x in my experience. If you are loading all your data from HBase to Spark then you are

Re: Spark on HBase vs. Spark on HDFS

2014-05-22 Thread Nick Pentreath
Hi In my opinion, running HBase for immutable data is generally overkill in particular if you are using Shark anyway to cache and analyse the data and provide the speed. HBase is designed for random-access data patterns and high throughput R/W activities. If you are only ever writing immutable

Re: Spark and HBase

2014-04-26 Thread Nicholas Chammas
Thank you for sharing. Phoenix for realtime queries and Spark for more complex batch processing seems like a potentially good combo. I wonder if Spark's future will include support for the same kinds of workloads that Phoenix is being built for. This little

Re: Spark and HBase

2014-04-25 Thread Josh Mahonin
Phoenix generally presents itself as an endpoint using JDBC, which in my testing seems to play nicely using JdbcRDD. However, a few days ago a patch was made against Phoenix to implement support via PIG using a custom Hadoop InputFormat, which means now it has Spark support too. Here's a code

Re: Spark and HBase

2014-04-25 Thread Nicholas Chammas
Josh, is there a specific use pattern you think is served well by Phoenix + Spark? Just curious. On Fri, Apr 25, 2014 at 3:17 PM, Josh Mahonin jmaho...@filetrek.com wrote: Phoenix generally presents itself as an endpoint using JDBC, which in my testing seems to play nicely using JdbcRDD.

Re: Spark and HBase

2014-04-08 Thread Bin Wang
Hi Flavio, I am attending the 2014 Apache Conf, where I heard of a project called Apache Phoenix, which fully leverages HBase and is supposed to be 1000x faster than Hive. And it is not memory-bounded, whereas memory sets up a limit for Spark. It is still in the incubating group and

Re: Spark and HBase

2014-04-08 Thread Christopher Nguyen
Flavio, the two are best at two orthogonal use cases, HBase on the transactional side, and Spark on the analytic side. Spark is not intended for row-based random-access updates, while far more flexible and efficient in dataset-scale aggregations and general computations. So yes, you can easily

Re: Spark and HBase

2014-04-08 Thread Flavio Pompermaier
Thanks for the quick reply Bin. Phoenix is something I'm going to try for sure, but it seems somewhat redundant if I can use Spark. Probably, as you said, since Phoenix uses a dedicated data structure within each HBase table, it has more effective memory usage, but if I need to deserialize data stored in a

Re: Spark and HBase

2014-04-08 Thread Nicholas Chammas
Just took a quick look at the overview here (http://phoenix.incubator.apache.org/) and the quick start guide here (http://phoenix.incubator.apache.org/Phoenix-in-15-minutes-or-less.html). It looks like Apache Phoenix aims to provide flexible SQL access to data, both for transactional and analytic