Hello,
I am trying to set up a connection to HBase using Hadoop 2.
I have a working deployment of HBase where the master is up and running
and sees its RegionServers.
I have a working (not HBase-managed) ZooKeeper, where zkCli.sh get /hbase/hbaseid
works from any machine in the cluster.
Trying to set up the
I'm trying to do some processing on my HBase dataset at our company, but I'm
pretty new to HBase and the Hadoop ecosystem.
I would like to get some feedback from this community, to see if my
understanding of HBase and the MapReduce operations on it is correct.
Some background here:
1. We have
The hbase conf directory should be on the classpath.
Cheers
On Wed, Jan 25, 2017 at 9:04 AM, Hernán Blanco wrote:
Good eye, that could be a problem. hbase-site.xml doesn't seem to be
included in the classpath... It should appear in this line of the map
log, right?
2017-01-25 17:50:33,951 INFO [main] org.apache.zookeeper.ZooKeeper: Client
From the new log snippet you posted, the HBase client tried to connect
to 192.168.0.24 whose node is not listed in the quorum.
Can you check the classpath for the map task? Looks like the effective
hbase-site.xml wasn't on the classpath.
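A quick way to check this from inside the task itself (plain JDK, nothing HBase-specific assumed): ask the context classloader whether hbase-site.xml is visible. If this prints null in the task logs, the file is not on the task classpath and the client will fall back to default quorum settings.

```java
// Classpath sanity check for a map task: locate hbase-site.xml (if any).
public class ClasspathCheck {
    public static String findHbaseSite() {
        java.net.URL url = Thread.currentThread().getContextClassLoader()
                .getResource("hbase-site.xml");
        return url == null ? null : url.toString();
    }

    public static void main(String[] args) {
        // null here means the effective hbase-site.xml is not on the classpath.
        System.out.println("hbase-site.xml => " + findHbaseSite());
    }
}
```

You can drop the findHbaseSite() call into Mapper.setup() temporarily and read the result in the task logs.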
On Wed, Jan 25, 2017 at 7:21 AM, Hernán Blanco wrote:
Hello Ted,
Thank you for your reply. Indeed, I wasn't sure that property
existed, and in fact hbase-default.xml doesn't include it. I just
followed the advice from some webpage (I can't recall which) that
stated that you can include *any* property from the native zoo.cfg
config file for
bq. hbase.zookeeper.property.server.7
I searched the 1.2 codebase but didn't find a config parameter in the above form.
http://hbase.apache.org/book.html didn't mention it either.
May I ask where you obtained such config ?
For hbase.zookeeper.quorum, do you have zookeeper running on the 12 nodes?
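For reference, the documented way to pass ZooKeeper settings through hbase-site.xml is the hbase.zookeeper.property.* prefix; clientPort is a documented example of that form. Whether an arbitrary zoo.cfg key (such as server.N) is actually honored is exactly what is in question in this thread, so the fragment below shows only the documented case:

```xml
<!-- hbase-site.xml: zoo.cfg keys can be prefixed with
     hbase.zookeeper.property. -- clientPort is a documented example. -->
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```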
Hi all,
I'm running HBase 1.2.4 on a distributed setup with 12 virtual machines
on the same local network. The "main" node (node26.example.com) runs the
HMaster, while the other 11 machines run RegionServers. No backup
HMaster. This cluster also runs Hadoop 2.7.2 smoothly.
Both HBase shell and
code.
MapReduce Job 1: I read data from two tables with no common row keys and
create a summary out of them in the reducer. The output of the reducer is a
Java object containing the summary, which has been serialized to a byte array.
I store this object in a temporary table in HBase.
MapReduce Job 2: This is where I am having problems. I now need to read
this summary object such that it is available in each mapper, so that when I
read data from a third (different) table, I can use this summary object
to perform
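The serialize-then-store step can be sketched like this (the Summary class and its fields are illustrative, not from the thread): anything stored in an HBase cell must be a byte[], and plain Java serialization is one way to produce it. The same codec is then usable from Mapper.setup() in Job 2 to rebuild the object from the cell value.

```java
import java.io.*;

// Round-trip a summary object to the byte[] that would live in an HBase cell.
public class SummaryCodec {
    // Hypothetical summary object; the poster's actual class is not shown.
    public static class Summary implements Serializable {
        private static final long serialVersionUID = 1L;
        public long rowCount;
        public double total;
        public Summary(long rowCount, double total) {
            this.rowCount = rowCount;
            this.total = total;
        }
    }

    // Serialize the summary into the byte[] to store in the temporary table.
    public static byte[] toBytes(Summary s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(s);
        }
        return bos.toByteArray();
    }

    // Deserialize the byte[] read back from the cell in Mapper.setup().
    public static Summary fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return (Summary) ois.readObject();
        }
    }
}
```

Reading the cell once in setup() and keeping the deserialized object in a field makes it available to every map() call without re-fetching.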
Hello J-D
I have a similar requirement to the one presented by the original poster, i.e.
updating a totals count without having to push the entire data set through
the Mapper again.
Are you advising against calling incrementColumnValue on a mapper's HTable
instance because the operation is not idempotent?
There are a couple of issues and I'm sure others will point them out.
If you turn off speculative execution on the job, you don't get duplicate tasks
running in parallel.
You could create a table to store your aggregations on a per job basis where
your row-id could incorporate your job-id.
Sure, why not?
You can always open a connection to the counter table in your Mapper.setup()
method and then increment the counters within the Mapper.map() method.
Your update of the counter is an artifact and not the output of the
Mapper.map() method.
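The setup()/map() pattern described above can be sketched as follows. The CounterTable interface stands in for an HBase client table handle (the in-memory stub lets the sketch run without a cluster); all names here are illustrative, not from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Pattern: open a handle to the counter table once in setup(), increment
// from map() as a side effect; the increment is not the map output.
public class CountingMapper {
    public interface CounterTable {
        void increment(String row, String qualifier, long amount);
    }

    // In-memory stand-in for the real HBase counter table.
    public static class InMemoryCounterTable implements CounterTable {
        public final Map<String, Long> counts = new HashMap<>();
        @Override
        public void increment(String row, String qualifier, long amount) {
            counts.merge(row + "/" + qualifier, amount, Long::sum);
        }
    }

    private CounterTable counters;
    private final String jobId;

    public CountingMapper(String jobId) { this.jobId = jobId; }

    // Mapper.setup(): open the connection once per task, not per record.
    public void setup(CounterTable table) { this.counters = table; }

    // Mapper.map(): the row id incorporates the job id, as suggested above,
    // so a re-run writes to fresh rows instead of double-counting old ones.
    public void map(String adId) {
        counters.increment(jobId + ":" + adId, "impressions", 1L);
    }
}
```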
On Jun 18, 2012, at 7:49 PM, Sid Kumar wrote:
This question was answered here already:
http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/%3caanlktinnw2d7dmcyfu3ptv1hu_i3xqk_1pdsgd5nt...@mail.gmail.com%3E
Counters are not idempotent, this can be hard to manage.
J-D
On Mon, Jun 18, 2012 at 5:49 PM, Sid Kumar sqlsid...@gmail.com wrote:
Thanks for the info. It seems safer to do the aggregations in the MR code.
Can you think of a better alternative?
Sid
On Tue, Jun 19, 2012 at 9:55 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:
This question was answered here already:
As the thread JD pointed out suggests, the best approach if you
want to avoid aggregations later on is to aggregate in an MR job and
output to a file with the ad id and the number of impressions found for
that ad. Run a separate client application, likely single threaded if
the number of ads is not
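A minimal sketch of that aggregation step (the ad-id input format is an assumption, not from the thread): sum impressions per ad id, then have the client write the absolute totals. Writing an absolute total with a Put is idempotent, so a re-run after a failure produces the same final value, unlike replayed increments, which double-count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Aggregate impression events into absolute per-ad totals, which a
// single-threaded client can then write idempotently (Put, not Increment).
public class ImpressionAggregator {
    public static Map<String, Long> aggregate(List<String> adImpressions) {
        Map<String, Long> totals = new HashMap<>();
        for (String adId : adImpressions) {
            totals.merge(adId, 1L, Long::sum);
        }
        return totals;
    }
}
```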
it if it
comes up. Whatever you do, we have found Ganglia absolutely critical to
understand what is happening on the cluster, and we use Puppet [2] so we
can quickly test different setups.
Cheers,
Tim
[1]
http://gbif.blogspot.dk/2012/05/optimizing-hbase-mapreduce-scans-for.html
[2] E.g. https://github.com
to the data and not import data to the computation. If I were to segregate
HBase and MapReduce clusters, then when using MapReduce on HBase data,
would I not have to transfer large amounts of data from the HBase/HDFS
cluster to the MapReduce/HDFS cluster?
Cloudera, on their best practices page
(http
Thanks for the confirmation. There is also a good, detailed discussion thread
on this issue at
http://apache-hbase.679495.n3.nabble.com/Shared-HDFS-for-HBase-and-MapReduce-td4018856.html
Michael
as the two, when sharing the same HDFS cluster, could lead to performance
problems. I am not sure if this is entirely true, given that the main
concept behind Hadoop is to export computation to the data and not import
data to the computation. If I were to segregate HBase and MapReduce clusters
Atif,
These are general recommendations, and they definitely change based on your
access patterns and the way you will be using HBase and MapReduce. In general,
if you are building a latency-sensitive application on top of HBase, running a
MapReduce job at the same time will impact performance due
We have been using HBase Scans to feed MapReduce jobs for over a year
now. However, on close inspection, we have seen instances where some
blocks of rows are inexplicably missing.
We thought that this may happen during region splits or with jobs with
many mappers, but we have seen, for example,
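One way to make "missing blocks of rows" concrete (a generic sketch, not from the thread): collect the row keys a scan actually returned and diff them against the expected key set; anything expected but not scanned is a gap. HBase also ships a RowCounter MapReduce job (org.apache.hadoop.hbase.mapreduce.RowCounter) that can cross-check total row counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;

// Diff the row keys a Scan returned against the keys we expect; the
// result, in key order, shows any contiguous runs of missing rows.
public class ScanGapFinder {
    public static List<String> missingKeys(SortedSet<String> expected, Set<String> scanned) {
        List<String> missing = new ArrayList<>();
        for (String key : expected) {
            if (!scanned.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```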
On Wed, May 30, 2012 at 9:37 AM, Whitney Sorenson wsoren...@hubspot.com wrote:
HBase Version: 0.90.4-cdh3u2, r (HBase version and svn revision)
HBase Compiled: Thu Oct 13 20:32:26 PDT 2011, jenkins (when HBase version was compiled and by whom)
Hadoop Version: 0.20.2-cdh3u2, r95a824e4005b2a94fe1c11f1ef9db4c672ba43cb (Hadoop version and svn revision)
Hadoop Compiled:
On Wed, May 30, 2012 at 10:45 AM, Whitney Sorenson
wsoren...@hubspot.com wrote:
I have a couple of questions related to MapReduce over HBase:
1. HBase guarantees data locality of store files to the RegionServer only if
it stays up for long enough. If there are too many region movements or the
server has been recycled recently, there is a high probability that store file
blocks
the MR processing goes through the RegionServer, it may
impact the RegionServer performance for other random operations?
Yes, absolutely. Some people use separate HBase clusters for MapReduce
versus real-time traffic for this reason. You can also try to limit the
rate of data consumption by your
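One generic way to implement that rate limiting (a sketch, nothing HBase-specific): throttle the scan loop with a simple token bucket, acquiring one token per row or per batch, so the MapReduce job cannot saturate the RegionServer.

```java
// Simple token bucket: a scan loop calls tryAcquire() once per row (or
// per batch) and backs off briefly when it returns false, keeping the
// job's read rate under ratePerSecond.
public class TokenBucket {
    private final double ratePerSecond;
    private final double capacity;
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(double ratePerSecond, double capacity) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    // Add tokens for the time elapsed since the last refill, up to capacity.
    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * ratePerSecond);
        lastRefillNanos = now;
    }

    // Returns true if a token was available (proceed with the next read);
    // false means the caller should sleep briefly and retry.
    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```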