Make sure guava.jar is in your classpath.
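For example, something along these lines (the jar name and paths are just
placeholders; use whatever ships in your HBase lib/ directory):

  export HADOOP_CLASSPATH=/path/to/hbase/lib/guava-<version>.jar:$HADOOP_CLASSPATH
  hadoop jar /path/to/hbase/hbase-<version>.jar importtsv ...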
On Thu, Nov 17, 2011 at 12:23 PM, Denis Kreis wrote:
> Hi,
>
> i'm getting this error when trying to use the importtsv tool with
> hadoop-0.20.205.0 and hbase-0.92.0
>
> hadoop jar ../../hbase-0.92.0-SNAPSHOT/hbase-0.92.0-SNAPSHOT.jar importtsv
> Excepti
This issue is a common pitfall for those new to HBase and I think it could be
a good thing to have in the HBase book. Once someone realizes that you can
store multiple values for the same cell, each with a timestamp, there can be
a natural tendency to think "hey, I can store a one-to-many using multi
Before you close your laptop and take it home, do you gracefully stop your
local HBase instance? When I do this I'm able to start up from home without
a problem. But when I forget, all goes to crap.
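For a local instance that usually just means something like this (assuming a
plain tarball install):

  $HBASE_HOME/bin/stop-hbase.sh    # before closing the laptop
  $HBASE_HOME/bin/start-hbase.sh   # later, from home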
On Fri, Aug 26, 2011 at 9:15 AM, Mark wrote:
> Ok so I'm not the only one. Any solutions?
>
>
> O
Hey Garrett,
I'm not sure about a config setting but in Pig we changed TestHBaseStorage
to delete all rows of tables instead of truncating them. This was faster since
the tables are typically small in tests. See Dmitriy's note in the
deleteAllRows method here:
http://svn.apache.org/repos/asf/pig/t
Yes, you can do this via the thrift API:
http://yannramin.com/2008/07/19/using-facebook-thrift-with-python-and-hbase/
Alternatively you can use Pig's HBaseStorage (r/w), or HBase's ImportTsv
(w).
On Sun, Aug 7, 2011 at 5:35 AM, Varadharajan Mukundan
wrote:
> Greetings,
>
> Currently I'm using H
What do you see when you do this from the ZK client:
get /hbase/root-region-server
I suspect a client somewhere registered itself in ZK. Maybe fixing the IP of
the root region server in ZK will do the trick.
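For example, with the CLI that ships with ZooKeeper (host/port are
placeholders for your quorum):

  zkCli.sh -server zkhost:2181
  get /hbase/root-region-server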
On Fri, Jul 15, 2011 at 10:58 AM, Jason Chuong <
jason.chu...@cbsinteractive.com> wrote
AFAIK, sqoop can only write to HBase at this point.
We use Pig's HBaseStorage class to read from HBase and transform data for
import into other systems, which has worked well for us.
On Thu, Jun 23, 2011 at 11:38 AM, Vishal Kapoor
wrote:
> thought it was only SQL-to-Hadoop
> can it also dump from
on once it's released.
>
> Please let us know how it goes and if there's anything we can help with!
>
Will do, thanks!
>
> Gary
>
>
>
> On Mon, Jun 20, 2011 at 10:06 AM, Bill Graham wrote:
>
>> Thanks Dean, that sounds similar to the approach w
I believe.
> Check this: http://search-hadoop.com/?q=acl&fc_project=HBase&fc_type=jira
>
>
> Otis
>
> --
> We're hiring HBase / Hadoop / Hive / Mahout engineers with interest in Big
> Data Mining and Analytics
>
> http://blog.sematext.com/2011
irst token of the Row Id
> > i.e. "CUST123|someOtherRowQualifier". For all
> > customer queries, we add
> > their customerId as the row prefix and, of course, ensure
> > that they are
> > authorized within our app.
> >
> > On Mon, Jun 13, 2011 at 5:31 PM,
The output is in hex, or hexadecimal, format. It's basically a way to
represent binary data (which HBase stores) as text (which the shell
produces). See http://en.wikipedia.org/wiki/Hexadecimal
On Thu, Jun 16, 2011 at 8:26 AM, hbaser wrote:
>
> OK, but where did these "hex" come from?
>
> What is
Try removing the spaces in the column list, i.e. commas only.
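Something like this (the column names are just examples):

  hadoop jar hadoop_sws/hbase-0.90.0/hbase-0.90.0.jar importtsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1,cf1:col2 <tablename> <hdfs-input-dir>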
On Tue, Jun 14, 2011 at 11:29 PM, James Ram wrote:
> Hi,
>
> I'm having trouble with using the importtsv tool.
> I ran the following command:
>
> hadoop jar hadoop_sws/hbase-0.90.0/hbase-0.90.0.jar importtsv
> -Dimporttsv.columns=HB
Also, you might want to look at HBASE-3880, which is committed but not
released yet. It allows you to specify a custom Mapper class when running
ImportTsv. It seems like a similar patch to make the input format pluggable
would be needed in your case though.
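If I recall correctly the patch exposes the mapper via a property roughly like
the one below (treat the property name and the mapper class as assumptions and
check the JIRA):

  # com.example.MyTsvMapper is a hypothetical mapper class
  hadoop jar hbase-<version>.jar importtsv \
    -Dimporttsv.mapper.class=com.example.MyTsvMapper \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1 <tablename> <hdfs-input-dir>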
On Tue, Jun 14, 2011 at 9:53 AM, Todd L
Hello there,
We have a number of different groups within our organization who will soon
be working within the same HBase cluster and we're trying to set up some
best practices to keep things organized. Since there are no HBase ACLs and
no concept of multiple databases in the cluster, we're lookin
You can load the HDFS files into HBase. Check out importtsv to generate
HFiles and completebulkload to load them into a table:
http://hbase.apache.org/bulk-loads.html
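Roughly like this (paths, table and column names are placeholders):

  # step 1: generate HFiles instead of doing Puts
  hadoop jar /path/to/hbase-<version>.jar importtsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1 \
    -Dimporttsv.bulk.output=/tmp/hfiles <tablename> <hdfs-input-dir>
  # step 2: move the generated HFiles into the table
  hadoop jar /path/to/hbase-<version>.jar completebulkload /tmp/hfiles <tablename>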
On Mon, Jun 6, 2011 at 9:38 PM, James Ram wrote:
> Hi,
>
> I too have the same situation. The data in HDFS should be mapped to
You can do this in a few lines of Pig, check out the HBaseStorage class.
You'll need to know the names of your column families, but besides that it
could be done fairly generically.
On Mon, Jun 6, 2011 at 3:57 PM, Jack Levin wrote:
> Hello, does anyone have any tools you could share that would
One alternative is to calculate some stats during compactions and
store them somewhere for retrieval. The metrics wouldn't be up to date of
course, since they'd be stats from the last compaction time. I think that
would still be useful info to have, but it's different than what's being
requ
We use Pig to join HBase tables using HBaseStorage which has worked well. If
you're using HBase >= 0.89 you'll need to build from the trunk or the Pig
0.8 branch.
On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> > The Hive-HBase integration allows you to c
ser@hbase.apache.org
> > >>> Cc: billgra...@gmail.com
> > >>> Date: Tuesday, May 24, 2011, 1:48 PM
> > >>> I don't think so.
> > >>>
> > >>> On Tue, May 24, 2011 at 1:45 PM, Himanish Kushary <
> himan...@gmail.com
As well as http://www.lilyproject.org/lily/about/playground/hbaserowlog.html
I'd like to hear if anyone has had good or bad experiences using either of
these techniques, as we'll soon have a need to implement update
notifications as well.
On Tue, May 24, 2011 at 11:31 AM, Ted Yu wrote:
> Tak
Is there anything meaningful in the RS logs? I've seen situations like this
where an RS is failing to start due to issues reading the WAL. If this is the
case, the RS log will name the problematic WAL; in my experience that file is
zero-length, so I delete it from HDFS and things start up.
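Something along these lines, assuming the default /hbase root dir
(double-check the file really is zero-length before removing it):

  hadoop fs -ls /hbase/.logs/<regionserver-dir>
  hadoop fs -rm /hbase/.logs/<regionserver-dir>/<zero-length-hlog>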
On Mon, May 23
le.close();
> }
> Keric can remove the HConnectionManager.deleteAllConnections() call so that
> more than one STORE command can be used at the same time.
>
> But I am not sure how HConnectionManager.deleteAllConnections() itself can
> be triggered by a PIG command.
>
> Cheers
>
>
ence the second table
never gets written to.
If this is the case, then the question becomes just how do you write to two
tables in one MR job using TableOutputFormat?
1- http://search-hadoop.com/m/IsdwtMF2pV/HTable+reuse/v=plain
On Fri, May 20, 2011 at 2:29 PM, Bill Graham wrote:
> Yes, th
Yes, that's what it seems. I've opened a Pig JIRA for it:
https://issues.apache.org/jira/browse/PIG-2085
On Thu, May 19, 2011 at 1:31 PM, Jean-Daniel Cryans wrote:
> Your attachement didn't make it, it rarely does on the mailing lists.
> I suggest you use a gist.github or a pastebin.
>
> Regardi
HBase will always need to store the column name in each cell that uses it.
The only way to reduce the size taken by storing repeated column names
(besides using compression) is to instead store a small pointer to a lookup
table that holds the column name. Check out OpenTSDB, which does something
si
One big reason is that there will be updates in the memory store that aren't
yet written to HFiles. You'll miss these.
On Fri, May 6, 2011 at 12:27 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> Is there an issue open or any particular reason that an MR job needs to
> access
> the HB
I had this issue and had to add the HBase conf dir to HADOOP_CLASSPATH
in conf/hadoop-env.sh on each of the nodes in the cluster so they
could find Zookeeper.
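i.e. something like this in conf/hadoop-env.sh (the path is an example):

  export HADOOP_CLASSPATH=/etc/hbase/conf:$HADOOP_CLASSPATH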
On Sun, Apr 24, 2011 at 1:04 PM, Dmitriy Ryaboy wrote:
> I suspect the problem here is that you don't have your hbase config
> directory
We've been using pig to read bulk data from hdfs, transform it and
load it into HBase using the HBaseStorage class, which has worked well
for us. If you try it out you'll want to build from the 0.9.0 branch
(being cut as we speak, I believe) or the trunk. There's an open pig
JIRA with a patch to dis
Agreed. I've seen similar issues upon startup where, for whatever
reason, an hlog (often empty) can't be read, which hangs the startup
process. Manually deleting it from HDFS clears the issue.
On Tue, Apr 12, 2011 at 10:01 AM, Jinsong Hu wrote:
> You probably should stop all master/regionserv
ay since all you want are the
> keys.
>
> On Tue, Mar 29, 2011 at 2:15 PM, Bill Graham wrote:
>>
>> 1. use Pig to read in our datasets, join/filter/transform/etc before
>> writing the output back to HDFS with N reducers ordered by key, where
>> N is the number of
g.
> Assuming reducer output file is SequenceFile, steps 2 and 3 can be
> automated.
>
> On Tue, Mar 29, 2011 at 2:15 PM, Bill Graham wrote:
>>
>> I've been thinking about this topic lately so I'll fork from another
>> discussion to ask if anyone has a good app
Hi,
We have an empty table that is somehow in a bad state that I'm unable
to disable or drop. We're running 0.90.0 on CDH3b2. Is there a way
that I can manually remove this table from HBase without making a mess
of things?
The table has 2 CFs and it's empty. When I do a scan I get this:
org.apac
I've been thinking about this topic lately so I'll fork from another
discussion to ask if anyone has a good approach to determining keys
for pre-splitting from a known dataset. We have a key scenario similar
to what Ted describes below.
We periodically run MR jobs to transform and bulk load data f
Back to the issue of keeping a count, I've often wondered whether this
would be easy to do without much cost at compaction time. It of course
wouldn't be a true real-time total but something like a
compactedRowCount. It could be a useful metric to expose via JMX to
get a feel for growth over time.
On
On Mon, Mar 14, 2011 at 8:54 PM, Stack wrote:
> On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham wrote:
>> Anyway, it's been about a week and all regions for the table are still
>> on 1 node. I see messages like this in the logs every 5 minutes:
>>
>&
I hope I'm not hijacking the thread but I'm seeing what I think is a
similar issue. About a week ago I loaded a bunch of data into a newly
created table. It took about an hour and resulted in 12 regions being
created on a single node. (Afterwards I remembered a conversation with
JD where he describ
You could also do this with MR easily using Pig's HBaseStorage and
either an inner join or an outer join with a filter on null, depending
on whether you want matches or misses, respectively.
On Fri, Mar 11, 2011 at 4:25 PM, Usman Waheed wrote:
> I suggest it to be ROWCOL because you have many columns
For example:
>
> rs=$(cat ${HBASE_CONF_DIR}/regionservers | xargs)
>
> if ( whiptail --yesno "Do you want to shutdown the cluster with the
> following regionserver $rs\n[y/n]" 10 40 )
> then
>     # proceed with the shutdown
> else
>     # exit
> fi
>
>
Actually, how far behind replication is w.r.t. edit logs is different
than how out of sync they are, but you get the idea.
On Thu, Mar 3, 2011 at 9:07 AM, Bill Graham wrote:
> One more question for the FAQ:
>
> 6. Is it possible for an admin to tell just how out of sync the two
>
One more question for the FAQ:
6. Is it possible for an admin to tell just how out of sync the two
clusters are? Something like Seconds_Behind_Master in MySQL's SHOW
SLAVE STATUS?
On Wed, Mar 2, 2011 at 9:32 PM, Jean-Daniel Cryans wrote:
> Although, I would add that this feature is still experi
The first region starts with an empty byte[] and the last region ends
with one. Those in between have non-empty byte[]s to specify their
boundaries.
On Thu, Mar 3, 2011 at 7:18 AM, Weishung Chung wrote:
> Thanks, Stack!
>
> Got a few more questions.
>
> Does every region start with an empty byte[
cause stopping the master is done by starting a VM and
running oahh.master.HMaster stop with the remote configs. This seems
to happen regardless of whether there's a match on a local pid file
with a local master process.
On Wed, Mar 2, 2011 at 8:57 PM, Stack wrote:
> On Wed, Mar 2, 2011 at
Hi,
We had a troubling experience today that I wanted to share. Our dev
cluster got completely shut down by a developer by mistake, without
said developer even realizing it. Here's how...
We have multiple sets of HBase configs checked into SVN that
developers can checkout and point their HBASE_CO
This happens often on apache lists when sending emails as HTML. Try
sending as plain text instead.
On Thu, Feb 24, 2011 at 10:35 AM, Mark Kerzner wrote:
>
> I keep getting this error:
>
> Delivery to the following recipient failed permanently:
>
> user@hbase.apache.org
>
> Technical details of
Just to follow up, this appears to be a bug. I've created a JIRA.
https://issues.apache.org/jira/browse/HBASE-3550
On Fri, Feb 18, 2011 at 10:57 AM, Bill Graham wrote:
> Hi,
>
> I'm unable to get ColumnPrefixFilter working when I use it in a
> FilterList and I'm wond
Hi,
I'm unable to get ColumnPrefixFilter working when I use it in a
FilterList and I'm wondering if this is a bug or a mis-usage on my
part. If I set ColumnPrefixFilter directly on the Scan object all
works fine. The following code shows an example of scanning a table
with a column descriptor 'inf
Thanks Stack, this is really helpful.
On Fri, Jan 28, 2011 at 2:06 PM, Stack wrote:
> On Fri, Jan 28, 2011 at 1:15 PM, Bill Graham wrote:
>> I also don't have a solid understanding of the responsibilities of
>> master, but it seems like its job is really about
I also don't have a solid understanding of the responsibilities of
master, but it seems like its job is really about managing regions
(i.e., coordinating splits and compactions, etc.) and updating ROOT
and META. Is that correct?
On Fri, Jan 28, 2011 at 9:31 AM, Weishung Chung wrote:
> Great, th
Hello all,
I'm working on a patch to HBaseStorage to support additional
functionality with respect to how columns are specified and how HBase
data is converted into Pig data structures. If you use Pig to read
HBase data, please take a look at this JIRA and provide feedback if
you have it:
https:/
I can't say from experience, but here's a thread that implies that
shorter column names are better.
http://search-hadoop.com/m/oWZQd161GI22
On Tue, Jan 25, 2011 at 11:14 PM, JinChao Wen wrote:
> Hi all,
>
> If there are lots of very long column family names and column names in my
> table, is the
Thanks for the comments. Attached is the log file from the master
after the restart. The last error message was repeated every second.
See comments below.
On Tue, Jan 25, 2011 at 7:20 PM, Stack wrote:
> On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham wrote:
>> Hi,
>>
>> A
A.
On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham wrote:
> Hi,
>
> A developer on our team created a table today and something failed and
> we fell back into the dire scenario we were in earlier this week. When
> I got on the scene, 2 of our 4 region servers had crashed. When I brought them
&
Hi,
A developer on our team created a table today and something failed and
we fell back into the dire scenario we were in earlier this week. When
I got on the scene, 2 of our 4 region servers had crashed. When I brought them
back up, they wouldn't come online and the master was scrolling
messages like tho
This wiki shows how to build the lzo jar:
http://wiki.apache.org/hadoop/UsingLzoCompression
You'll get that exception if the jar is not found in lib/.
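Once it's built, drop the jar (and the native libs) where HBase can find them,
roughly like this (names and paths vary by build):

  cp hadoop-gpl-compression-*.jar $HBASE_HOME/lib/
  cp -r build/native/* $HBASE_HOME/lib/native/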
On Tue, Jan 25, 2011 at 10:38 AM, Peter Haidinyak wrote:
> Hi
> I am using HBase version .89.20100924+28 and Hadoop version 0.20.2+737
>
the new table without writing an
MR job? If it's easy to save the data I will, but I can survive
without it.
More comments below.
thanks,
Bill
On Sun, Jan 23, 2011 at 11:16 AM, Stack wrote:
> On Sat, Jan 22, 2011 at 10:27 AM, Bill Graham wrote:
>> Hi,
>>
>> Last n
Hi,
Last night while experimenting with getting lzo set up I managed to
somehow lose all .META. data and all my tables. My regions still exist
in HDFS, but the shell tells me I have no tables. At this point I'm
pretty sure I need to reinstall HBase clean-slate on HDFS, hence
losing all data, but I
hen so does the row key.
>
> I hope this helps,
> -ryan
>
> On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham wrote:
>> I follow the tombstone/compact/delete cycle of the column values, but
>> I'm still unclear of the row key life cycle.
>>
>> Is it that the bytes t
d they disappear
> for good. (Barring other backups of course)
>
> Because of our variable length storage model, we don't store rows in
> particular blocks and rewrite said blocks, so notions of rows
> 'existing' or not, don't even apply to HBase as they do to RDBMS
>
If you use some combination of delete requests and leave a row without
any column data, will the row/rowkey still exist? I'm thinking of the
use case where you want to prune all old data, including row keys,
from a table.
On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson wrote:
> There are 3 kinds of
is to happen. Very strange. I moved that
directory out of the way and the master started fine.
I can debug how the classpath gets set, but any ideas?
On Mon, Jan 10, 2011 at 10:29 PM, Stack wrote:
> For sure you removed the old hadoop from hbase/lib?
>
> On Mon, Jan 10, 2011 at 10:12 P
directory with the same version from your HDFS install.
> -Todd
>
> On Mon, Jan 10, 2011 at 9:45 PM, Bill Graham wrote:
>>
>> Hi,
>>
>> Today I upgraded from Hadoop 0.20.1 to CDH3b2 0.20.2 to get the append
>> functionality that HBase requires and now I can
Hi,
Today I upgraded from Hadoop 0.20.1 to CDH3b2 0.20.2 to get the append
functionality that HBase requires and now I can't start HBase. Hadoop
and HDFS seem to be working just fine, but when I start up the HBase
master, I get this error in the NNs:
2011-01-10 21:20:36,134 ERROR
org.apache.hadoo
Take a look at Oozie or Azkaban:
http://www.quora.com/What-are-the-differences-advantages-disadvantages-of-Azkaban-vs-Oozie
On Wed, Jan 5, 2011 at 9:35 AM, Peter Veentjer wrote:
> He Guys,
>
> although it isn't completely related to HBase. Is there support for
> scheduling map reduce jobs?
>
> E
Hi Andrew,
Just to make sure I'm clear, are you saying that HBase 0.90.0 is
incompatible with CDH3b3 due to the security changes?
We're just getting going with HBase and have been running 0.90.0rc1 on
an un-patched version of Hadoop in dev. We were planning on upgrading
to CDH3b3 to get the sync