Cannot connect to HBase using MapReduce

2020-12-08 Thread Christian Rivasseau
Hello, I am trying to set up a connection to HBase using Hadoop 2. I have a working deployment of HBase where the master is up and running and sees its RegionServers. I have a working (not managed) ZooKeeper where zkCli.sh get /hbase/hbaseid works from any machine in the cluster. Trying to set up the
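
For reference, a minimal driver sketch for wiring a MapReduce job to an HBase table (HBase 1.x client API). The table name, column family, and quorum hosts below are placeholders, not values from the thread:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class HBaseMapReduceDriver {

      // Trivial mapper: emits (row key, 1) for every row it sees.
      static class RowCountMapper extends TableMapper<ImmutableBytesWritable, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
            throws IOException, InterruptedException {
          context.write(rowKey, ONE);
        }
      }

      public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath; the resulting settings are
        // serialized into the job configuration and shipped to every task.
        Configuration conf = HBaseConfiguration.create();
        // With an external (not managed) ZooKeeper, the quorum can also be set
        // explicitly; hosts here are placeholders.
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");

        Job job = Job.getInstance(conf, "hbase-mr-example");
        job.setJarByClass(HBaseMapReduceDriver.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // usual setting for full-table MR scans

        TableMapReduceUtil.initTableMapperJob("my_table", scan, RowCountMapper.class,
            ImmutableBytesWritable.class, IntWritable.class, job);
        TableMapReduceUtil.addDependencyJars(job); // ship the HBase client jars with the job

        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }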

Trying to confirm my understanding of HBase and MapReduce behavior.

2020-06-05 Thread Brian Hsu
I'm trying to do some processing on my HBase dataset at our company, but I'm pretty new to HBase and the Hadoop ecosystem. I would like to get some feedback from this community, to see if my understanding of HBase and the MapReduce operation on it is correct. Some background here: 1. We have

Re: RpcRetryingCaller error accessing HBase from MapReduce job

2017-01-25 Thread Ted Yu
The hbase conf directory should be on the classpath. Cheers On Wed, Jan 25, 2017 at 9:04 AM, Hernán Blanco wrote: > Good eye, that could be a problem. hbase-site.xml doesn't seem to be > included in the classpath... It should appear in this line of the map > log,

Re: RpcRetryingCaller error accessing HBase from MapReduce job

2017-01-25 Thread Hernán Blanco
Good eye, that could be a problem. hbase-site.xml doesn't seem to be included in the classpath... It should appear in this line of the map log, right? 2017-01-25 17:50:33,951 INFO [main] org.apache.zookeeper.ZooKeeper: Client

Re: RpcRetryingCaller error accessing HBase from MapReduce job

2017-01-25 Thread Ted Yu
From the new log snippet you posted, the HBase client tried to connect to 192.168.0.24, which is not listed in the quorum. Can you check the classpath for the map task? Looks like the effective hbase-site.xml wasn't on the classpath. On Wed, Jan 25, 2017 at 7:21 AM, Hernán Blanco
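
A small sketch of one way to verify that diagnosis on the submitting side: load hbase-site.xml into the configuration explicitly and print the quorum the map tasks will actually receive (the /etc/hbase/conf path is an assumption about the install layout):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class QuorumCheck {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // If hbase-site.xml is not on the submitting JVM's classpath, add it
        // explicitly; whatever ends up in this Configuration is written to
        // job.xml and shipped to the map tasks, so the tasks no longer depend
        // on having the file on their own classpath.
        conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml")); // path is an assumption

        // If this prints localhost, or a host outside the intended ensemble,
        // the effective hbase-site.xml was not the one you think it is.
        System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
        System.out.println("hbase.zookeeper.property.clientPort = "
            + conf.get("hbase.zookeeper.property.clientPort"));
      }
    }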

Re: RpcRetryingCaller error accessing HBase from MapReduce job

2017-01-25 Thread Hernán Blanco
Hello Ted, Thank you for your reply. Indeed, I wasn't sure if that property existed, and in fact hbase-default.xml doesn't include it. I just followed the advice from some webpage (I can't recall which) that stated that you can include *any* property from the zoo.cfg native config file for

Re: RpcRetryingCaller error accessing HBase from MapReduce job

2017-01-25 Thread Ted Yu
bq. hbase.zookeeper.property.server.7 I searched the 1.2 codebase but didn't find a config parameter in that form. http://hbase.apache.org/book.html doesn't mention it either. May I ask where you obtained that config? For hbase.zookeeper.quorum, do you have ZooKeeper running on the 12 nodes

RpcRetryingCaller error accessing HBase from MapReduce job

2017-01-25 Thread Hernán Blanco
Hi all, I'm running HBase 1.2.4 on a distributed setup with 12 virtual machines on the same local network. The "main" node (node26.example.com) runs the HMaster, while the other 11 machines run RegionServers. No backup HMaster. This cluster also runs Hadoop 2.7.2 smoothly. Both HBase shell and

Re: HBase chain MapReduce job with broadcasting smaller tables to all Mappers

2014-07-07 Thread Arun Allamsetty
code. I store this object in a temporary table in HBase. MapReduce Job 2: This is where I am having problems. I now need to read this summary object such that it is available in each mapper so that when I read data from a third (different) table, I can use this summary object to perform
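
One common way to make such a summary available to every mapper of the second job, sketched below against the HBase 1.x client API: fetch it once per task in setup() with a Get against the temporary table. The table, row key, and column names are hypothetical, not taken from the thread:

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    public class SummaryAwareMapper extends TableMapper<Text, Text> {

      private byte[] summaryBytes; // the serialized summary written by MapReduce job 1
      private Connection connection;

      @Override
      protected void setup(Context context) throws IOException {
        // Runs once per map task: fetch the summary produced by the first job.
        connection = ConnectionFactory.createConnection(context.getConfiguration());
        try (Table summaryTable = connection.getTable(TableName.valueOf("summary_tmp"))) {
          Result r = summaryTable.get(new Get(Bytes.toBytes("summary")));
          summaryBytes = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("payload"));
        }
        // Deserialize summaryBytes into the in-memory summary object here.
      }

      @Override
      protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
          throws IOException, InterruptedException {
        // Use the summary alongside each row read from the third table.
        context.write(new Text(Bytes.toString(rowKey.get())),
            new Text("summary bytes: " + (summaryBytes == null ? 0 : summaryBytes.length)));
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        if (connection != null) {
          connection.close();
        }
      }
    }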

HBase chain MapReduce job with broadcasting smaller tables to all Mappers

2014-07-03 Thread Arun Allamsetty
from two tables with no common row keys and create a summary out of them in the reducer. The output of the reducer is a Java Object containing the summary which has been serialized to byte code. I store this object in a temporary table in HBase. MapReduce Job 2: This is where I am having problems. I

Re: HBase chain MapReduce job with broadcasting smaller tables to all Mappers

2014-07-03 Thread Ted Yu
containing the summary which has been serialized to byte code. I store this object in a temporary table in HBase. MapReduce Job 2: This is where I am having problems. I now need to read this summary object such that it is available in each mapper so that when I read data from a third (different

Re: Increment Counters in HBase during MapReduce

2012-06-24 Thread David Koch
Hello J-D, I have a similar requirement to the one presented by the original poster, i.e. updating a totals count without having to push the entire data set through the Mapper again. Are you advising against calling incrementColumnValue on a mapper's HTable instance because the operation is not

Re: Increment Counters in HBase during MapReduce

2012-06-24 Thread Michael Segel
There are a couple of issues, and I'm sure others will point them out. If you turn off speculative execution on the job, you don't get duplicate tasks running in parallel. You could create a table to store your aggregations on a per-job basis, where your row ID could incorporate your job ID.

Re: Increment Counters in HBase during MapReduce

2012-06-19 Thread Michael Segel
Sure, why not? You can always open a connection to the counter table in your Mapper.setup() method and then increment the counters within the Mapper.map() method. Your update of the counter is an artifact and not the output of the Mapper.map() method. On Jun 18, 2012, at 7:49 PM, Sid Kumar
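
A sketch of that pattern against the HBase 1.x client API, with a hypothetical job_counters table and column family c. As the rest of the thread points out, increments are not idempotent, so failed or speculative task attempts can double-count; the driver should at least disable speculative execution for the map tasks:

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.NullWritable;

    public class CountingMapper extends TableMapper<NullWritable, NullWritable> {

      private Connection connection;
      private Table counters;

      @Override
      protected void setup(Context context) throws IOException {
        // In the driver, also set mapreduce.map.speculative=false so duplicate
        // speculative attempts do not double-increment.
        connection = ConnectionFactory.createConnection(context.getConfiguration());
        counters = connection.getTable(TableName.valueOf("job_counters")); // hypothetical table
      }

      @Override
      protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
          throws IOException, InterruptedException {
        // Row key incorporates the job ID so each run aggregates separately.
        byte[] counterRow = Bytes.toBytes("rows_seen-" + context.getJobID());
        counters.incrementColumnValue(counterRow, Bytes.toBytes("c"), Bytes.toBytes("count"), 1L);
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        if (counters != null) counters.close();
        if (connection != null) connection.close();
      }
    }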

Re: Increment Counters in HBase during MapReduce

2012-06-19 Thread Jean-Daniel Cryans
This question was answered here already: http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/%3caanlktinnw2d7dmcyfu3ptv1hu_i3xqk_1pdsgd5nt...@mail.gmail.com%3E Counters are not idempotent, this can be hard to manage. J-D On Mon, Jun 18, 2012 at 5:49 PM, Sid Kumar sqlsid...@gmail.com

Re: Increment Counters in HBase during MapReduce

2012-06-19 Thread Sid Kumar
Thanks for the info. It seems safer to do the aggregations in the MR code. Can you think of any better alternative? Sid On Tue, Jun 19, 2012 at 9:55 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: This question was answered here already:

Re: Increment Counters in HBase during MapReduce

2012-06-19 Thread Amandeep Khurana
As the thread JD pointed out suggests, the best approach if you want to avoid aggregations later on is to aggregate in an MR job and output to a file with the ad id and the number of impressions found for that ad. Run a separate client application, likely single-threaded if the number of ads is not
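
The aggregation half of that suggestion could look roughly like the reducer below (types and field names are assumptions); the resulting (ad id, total) output file is then applied to HBase by a separate, single-threaded client:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sums impression counts per ad id; the output file (adId TAB total) is then
    // read and applied to HBase by a separate, single-threaded client application.
    public class AdImpressionReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text adId, Iterable<LongWritable> counts, Context context)
          throws IOException, InterruptedException {
        long total = 0;
        for (LongWritable c : counts) {
          total += c.get();
        }
        context.write(adId, new LongWritable(total));
      }
    }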

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Tim Robertson
it if it comes up. Whatever you do, we have found Ganglia absolutely critical to understand what is happening on the cluster, and we use Puppet [2] so we can quickly test different setups. Cheers, Tim [1] http://gbif.blogspot.dk/2012/05/optimizing-hbase-mapreduce-scans-for.html [2] E.g. https://github.com

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Michael Segel
to the data and not import data to the computation. If I were to segregate HBase and MapReduce clusters, then when using MapReduce on HBase data would I not have to transfer large amounts of data from HBase/HDFS cluster to MapReduce/HDFS cluster? Cloudera on their best practice page (http

Re: Shared Cluster between HBase and MapReduce

2012-06-06 Thread Atif Khan
Thanks for the confirmation. There is also a good, detailed discussion thread on this issue at http://apache-hbase.679495.n3.nabble.com/Shared-HDFS-for-HBase-and-MapReduce-td4018856.html . Michael

Shared Cluster between HBase and MapReduce

2012-06-05 Thread Atif Khan
behind Hadoop is to export computation to the data and not import data to the computation. If I were to segregate HBase and MapReduce clusters, then when using MapReduce on HBase data would I not have to transfer large amounts of data from HBase/HDFS cluster to MapReduce/HDFS cluster? Cloudera

Re: Shared Cluster between HBase and MapReduce

2012-06-05 Thread Paul Mackles
as the two when sharing the same HDFS cluster could lead to performance problems. I am not sure if this is entirely true given the fact that the main concept behind Hadoop is to export computation to the data and not import data to the computation. If I were to segregate HBase and MapReduce clusters

Re: Shared Cluster between HBase and MapReduce

2012-06-05 Thread Amandeep Khurana
Atif, These are general recommendations and definitely change based on the access patterns and the way you will be using HBase and MapReduce. In general, if you are building a latency sensitive application on top of HBase, running a MapReduce job at the same time will impact performance due

HBase to MapReduce Scans missing rows

2012-05-30 Thread Whitney Sorenson
We have been using HBase Scans to feed MapReduce jobs for over a year now. However, on close inspection, we have seen instances where some block of rows are inexplicably missing. We thought that this may happen during region splits or with jobs with many mappers, but we have seen, for example,

Re: HBase to MapReduce Scans missing rows

2012-05-30 Thread Stack
On Wed, May 30, 2012 at 9:37 AM, Whitney Sorenson wsoren...@hubspot.com wrote: We have been using HBase Scans to feed MapReduce jobs for over a year now. However, on close inspection, we have seen instances where some block of rows are inexplicably missing. We thought that this may happen

Re: HBase to MapReduce Scans missing rows

2012-05-30 Thread Whitney Sorenson
HBase Version: 0.90.4-cdh3u2, r (HBase version and svn revision). HBase Compiled: Thu Oct 13 20:32:26 PDT 2011, jenkins (when HBase version was compiled and by whom). Hadoop Version: 0.20.2-cdh3u2, r95a824e4005b2a94fe1c11f1ef9db4c672ba43cb (Hadoop version and svn revision). Hadoop Compiled

Re: HBase to MapReduce Scans missing rows

2012-05-30 Thread Stack
On Wed, May 30, 2012 at 10:45 AM, Whitney Sorenson wsoren...@hubspot.com wrote: HBase Version: 0.90.4-cdh3u2, r (HBase version and svn revision). HBase Compiled: Thu Oct 13 20:32:26 PDT 2011, jenkins (when HBase version was compiled and by whom). Hadoop Version: 0.20.2-cdh3u2,

HBase and MapReduce

2012-05-23 Thread Hemant Bhanawat
I have a couple of questions related to MapReduce over HBase. 1. HBase guarantees data locality between store files and the RegionServer only if the server stays up for a long time. If there have been too many region movements or the server has been recycled recently, there is a high probability that store file blocks

Re: HBase and MapReduce

2012-05-23 Thread Dave Revell
the MR processing goes through the RegionServer, it may impact the RegionServer's performance for other random operations? Yes, absolutely. Some people use separate HBase clusters for MapReduce versus real-time traffic for this reason. You can also try to limit the rate of data consumption by your
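
The snippet is cut off, but as an illustration of scan-side settings commonly used to soften the impact of MapReduce reads on RegionServers serving real-time traffic (illustrative values, not advice quoted from the thread):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ThrottledMrScan {
      public static Scan build() {
        Scan scan = new Scan();
        scan.setCacheBlocks(false);         // keep MR reads from evicting hot real-time data from the block cache
        scan.setCaching(100);               // modest rows per RPC; very large values raise per-call RegionServer load
        scan.addFamily(Bytes.toBytes("d")); // hypothetical family: read only the data the job needs
        return scan;
      }
    }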