Re: importtsv bulk load error

2011-11-17 Thread Bill Graham
Make sure guava.jar is in your classpath. On Thu, Nov 17, 2011 at 12:23 PM, Denis Kreis wrote: > Hi, > > i'm getting this error when trying to use the importtsv tool with > hadoop-0.20.205.0 and hbase-0.92.0 > > hadoop jar ../../hbase-0.92.0-SNAPSHOT/hbase-0.92.0-SNAPSHOT.jar importtsv > Excepti
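
A minimal sketch of the usual fix, assuming the guava jar ships under $HBASE_HOME/lib (jar version, table name and paths below are placeholders):

    # make guava visible to the hadoop launcher before running importtsv
    export HADOOP_CLASSPATH=$(ls $HBASE_HOME/lib/guava-*.jar):$HADOOP_CLASSPATH
    hadoop jar $HBASE_HOME/hbase-0.92.0-SNAPSHOT.jar importtsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 mytable /input/data.tsv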

Re: Versioning

2011-08-26 Thread Bill Graham
This issue is a common pitfall for those new to HBase and I think it could be a good thing to have in the HBase book. Once someone realizes that you can store multiple values for the same cell, each with a timestamp, there can be a natural tendency to think "hey, I can store a one-to-many using multi
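
The behavior is easy to see from the HBase shell; a minimal sketch with a hypothetical table 't1' whose family 'f1' is configured to keep enough versions:

    put 't1', 'row1', 'f1:c1', 'value-one'
    put 't1', 'row1', 'f1:c1', 'value-two'
    # returns up to 3 timestamped versions of the same cell
    get 't1', 'row1', {COLUMN => 'f1:c1', VERSIONS => 3}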

Re: Development

2011-08-26 Thread Bill Graham
Before you close your laptop and take it home, do you gracefully stop your local HBase instance? When I do this I'm able to start up from home without a problem. But when I forget, all goes to crap. On Fri, Aug 26, 2011 at 9:15 AM, Mark wrote: > Ok so I'm not the only one. Any solutions? > > > O

Re: mini-hbase configuration for tests

2011-08-15 Thread Bill Graham
Hey Garrett, I'm not sure about a config setting but in Pig we changed TestHBaseStorage to delete all rows of tables instead of truncating them. This was faster since the tables are typically small in tests. See Dmitriy's note in the deleteAllRows method here: http://svn.apache.org/repos/asf/pig/t

Re: Reg. support for using HBase as a source and sink for a Map-Reduce streaming job

2011-08-07 Thread Bill Graham
Yes, you can do this via the thrift API: http://yannramin.com/2008/07/19/using-facebook-thrift-with-python-and-hbase/ Alternatively you can use Pig's HBaseStorage (r/w), or HBase's ImportTsv (w). On Sun, Aug 7, 2011 at 5:35 AM, Varadharajan Mukundan wrote: > Greetings, > > Currently I'm using H

Re: hbase crash after restart

2011-07-15 Thread Bill Graham
What do you see when you do this from the ZK client: get /hbase/root-region-server I suspect a client somewhere registered itself in ZK. Maybe fixing the IP of the root region server in ZK will do the trick. On Fri, Jul 15, 2011 at 10:58 AM, Jason Chuong < jason.chu...@cbsinteractive.com> wrote
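
A sketch of the check with ZooKeeper's CLI (quorum host and port are assumptions; adjust for your setup):

    $ZOOKEEPER_HOME/bin/zkCli.sh -server zkhost:2181
    # inside the client: prints the server registered as root region server
    get /hbase/root-region-server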

Re: export data from HBase to mysql

2011-06-23 Thread Bill Graham
AFAIK, sqoop can only write to HBase at this point. We use Pig's HBaseStorage class to read from HBase and transform data for import into other systems, which has worked well for us. On Thu, Jun 23, 2011 at 11:38 AM, Vishal Kapoor wrote: > thought it was only SQL-to-Hadoop > can it also dump from

Re: any multitenancy suggestions for HBase?

2011-06-20 Thread Bill Graham
on once it's released. > > Please let us know how it goes and if there's anything we can help with! > Will do, thanks! > > Gary > > > > On Mon, Jun 20, 2011 at 10:06 AM, Bill Graham wrote: > >> Thanks Dean, that sounds similar to the approach w

Re: any multitenancy suggestions for HBase?

2011-06-20 Thread Bill Graham
I believe. > Check this: http://search-hadoop.com/?q=acl&fc_project=HBase&fc_type=jira > > > Otis > > -- > We're hiring HBase / Hadoop / Hive / Mahout engineers with interest in Big > Data Mining and Analytics > > http://blog.sematext.com/2011

Re: any multitenancy suggestions for HBase?

2011-06-16 Thread Bill Graham
irst token of the Row Id > > i.e. "CUST123|someOtherRowQualifier". For all > > customer queries, we add > > their customerId as the row prefix and, of course, ensure > > that they are > > authorized within our app. > > > > On Mon, Jun 13, 2011 at 5:31 PM,

Re: Problem of output

2011-06-16 Thread Bill Graham
The output is in hex, or hexadecimal, format. It's basically a way to represent binary data (which HBase stores) as text (which the shell produces). See http://en.wikipedia.org/wiki/Hexadecimal On Thu, Jun 16, 2011 at 8:26 AM, hbaser wrote: > > OK, but where did these "hex" come from? > > What is
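
A quick way to see the same encoding outside HBase, with output along these lines:

    # each byte of the string printed as its hexadecimal value
    printf 'row1' | xxd
    # 00000000: 726f 7731                                row1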

Re: Difficulty using importtsv tool

2011-06-14 Thread Bill Graham
Try removing the spaces in the column list, i.e. commas only. On Tue, Jun 14, 2011 at 11:29 PM, James Ram wrote: > Hi, > > I'm having trouble with using the importtsv tool. > I ran the following command: > > hadoop jar hadoop_sws/hbase-0.90.0/hbase-0.90.0.jar importtsv > -Dimporttsv.columns=HB
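
i.e. something like this (column names here are placeholders):

    hadoop jar hadoop_sws/hbase-0.90.0/hbase-0.90.0.jar importtsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 mytable /input/dir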

Re: Best way to Import data from Cassandra to HBase

2011-06-14 Thread Bill Graham
Also, you might want to look at HBASE-3880, which is committed but not released yet. It allows you to specify a custom Mapper class when running ImportTsv. It seems like a similar patch to make the input format pluggable would be needed in your case though. On Tue, Jun 14, 2011 at 9:53 AM, Todd L
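
With that patch the mapper is passed as a job property; a sketch assuming the importtsv.mapper.class key and a hypothetical mapper class:

    hadoop jar $HBASE_HOME/hbase.jar importtsv \
      -Dimporttsv.mapper.class=com.example.MyTsvMapper \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 mytable /input/dir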

any multitenancy suggestions for HBase?

2011-06-13 Thread Bill Graham
Hello there, We have a number of different groups within our organization who will soon be working within the same HBase cluster and we're trying to set up some best practices to keep things organized. Since there are no HBase ACLs and no concept of multiple databases in the cluster, we're lookin

Re: Reading a Hdfs file using HBase

2011-06-06 Thread Bill Graham
You can load the HDFS files into HBase. Check out importtsv to generate HFiles and completebulkload to load them into a table: http://hbase.apache.org/bulk-loads.html On Mon, Jun 6, 2011 at 9:38 PM, James Ram wrote: > Hi, > > I too have the same situation. The data in HDFS should be mapped to
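
The two-step flow looks roughly like this (table, family and paths are placeholders):

    # step 1: generate HFiles instead of writing puts
    hadoop jar $HBASE_HOME/hbase.jar importtsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
      -Dimporttsv.bulk.output=/tmp/hfiles mytable /input/dir
    # step 2: move the generated HFiles into the live table
    hadoop jar $HBASE_HOME/hbase.jar completebulkload /tmp/hfiles mytable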

Re: exporting from hbase as text (tsv)

2011-06-06 Thread Bill Graham
You can do this in a few lines of Pig, check out the HBaseStorage class. You'll need to know the names of your column families, but besides that it could be done fairly generically. On Mon, Jun 6, 2011 at 3:57 PM, Jack Levin wrote: > Hello, does anyone have any tools you could share that would
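
A minimal Pig sketch (table, family and column names are assumptions; PigStorage emits tab-delimited text by default):

    raw = LOAD 'hbase://mytable'
          USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
            'cf:col1 cf:col2', '-loadKey') AS (rowkey, col1, col2);
    STORE raw INTO '/export/mytable' USING PigStorage();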

Re: feature request (count)

2011-06-03 Thread Bill Graham
One alternative is to calculate some stats during compactions and store them somewhere for retrieval. The metrics wouldn't be up to date of course, since they'd be stats from the last compaction time. I think that would still be useful info to have, but it's different than what's being requ

Re: How to efficiently join HBase tables?

2011-05-31 Thread Bill Graham
We use Pig to join HBase tables using HBaseStorage which has worked well. If you're using HBase >= 0.89 you'll need to build from the trunk or the Pig 0.8 branch. On Tue, May 31, 2011 at 5:18 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > > The Hive-HBase integration allows you to c
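
A rough sketch of such a join (table, family and column names assumed):

    a = LOAD 'hbase://table_a' USING
          org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:v', '-loadKey') AS (id, v);
    b = LOAD 'hbase://table_b' USING
          org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:w', '-loadKey') AS (id, w);
    -- inner join on the row key
    j = JOIN a BY id, b BY id;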

Re: Any trigger like facility for HBase tables

2011-05-26 Thread Bill Graham
ser@hbase.apache.org > > >>> Cc: billgra...@gmail.com > > >>> Date: Tuesday, May 24, 2011, 1:48 PM > > >>> I don't think so. > > >>> > > >>> On Tue, May 24, 2011 at 1:45 PM, Himanish Kushary < > himan...@gmail.com

Re: Any trigger like facility for HBase tables

2011-05-24 Thread Bill Graham
As well as http://www.lilyproject.org/lily/about/playground/hbaserowlog.html I'd like to hear if anyone has had good or bad experiences using either of these techniques, as we'll soon have a need to implement update notifications as well. On Tue, May 24, 2011 at 11:31 AM, Ted Yu wrote: > Tak

Re: HBase Not Starting after improper shutdown

2011-05-23 Thread Bill Graham
Is there anything meaningful in the RS logs? I've seen situations like this where a RS is failing to start due to issues reading the WAL. If that's the case, the logs name the problematic WAL (a zero-length file, in my experience), which can be deleted from HDFS so things start up. On Mon, May 23
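
A sketch of the manual check, assuming the default /hbase root (server and file names are placeholders):

    # look for a zero-length WAL under the failing region server's log dir
    hadoop fs -ls /hbase/.logs/regionserver1,60020,1295000000000
    # if one has size 0, remove it and restart the region server
    hadoop fs -rm /hbase/.logs/regionserver1,60020,1295000000000/bad-wal-file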

Re: IO Error when using multiple HBaseStorage in PIG

2011-05-21 Thread Bill Graham
le.close(); > } > Keric can remove the HConnectionManager.deleteAllConnections() call so that > more than one STORE commands can be used at the same time. > > But I am not sure how HConnectionManager.deleteAllConnections() itself can > be triggered by a PIG command. > > Cheers > >

Re: IO Error when using multiple HBaseStorage in PIG

2011-05-20 Thread Bill Graham
ence the second table never gets written to. If this is the case, then the question becomes just how do you write to two tables in one MR job using TableOutputFormat? 1- http://search-hadoop.com/m/IsdwtMF2pV/HTable+reuse/v=plain On Fri, May 20, 2011 at 2:29 PM, Bill Graham wrote: > Yes, th

Re: IO Error when using multiple HBaseStorage in PIG

2011-05-20 Thread Bill Graham
Yes, that's what it seems. I've opened a Pig JIRA for it: https://issues.apache.org/jira/browse/PIG-2085 On Thu, May 19, 2011 at 1:31 PM, Jean-Daniel Cryans wrote: > Your attachement didn't make it, it rarely does on the mailing lists. > I suggest you use a gist.github or a pastebin. > > Regardi

Re: any static column name behavior in hbase? (ie. not storing column name per row)

2011-05-11 Thread Bill Graham
HBase will always need to store the column name in each cell that uses it. The only way to reduce the size taken by storing repeated column names (besides using compression) is to instead store a small pointer to a lookup table that holds the column name. Check out OpenTSDB, which does something si

Re: MapReduce job reading directly from the HBase files in HDFS

2011-05-06 Thread Bill Graham
One big reason is that there will be updates in the memory store that aren't yet written to HFiles. You'll miss these. On Fri, May 6, 2011 at 12:27 PM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > Is there an issue open or any particular reason that an MR job needs to > access > the HB

Re: What is the recommended way to get pig 0.8 to talk with CDH3u0 HBase

2011-04-24 Thread Bill Graham
I had this issue and had to add the HBase conf dir to HADOOP_CLASSPATH in conf/hadoop-env.sh on each of the nodes in the cluster so they could find Zookeeper. On Sun, Apr 24, 2011 at 1:04 PM, Dmitriy Ryaboy wrote: > I suspect the problem here is that you don't have your hbase config > directory
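
i.e. something along these lines in conf/hadoop-env.sh on each node (the conf path is an assumption):

    # let MR tasks pick up hbase-site.xml, including the ZK quorum settings
    export HADOOP_CLASSPATH=/etc/hbase/conf:$HADOOP_CLASSPATH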

Re: HBase - Map Reduce - Client Question

2011-04-19 Thread Bill Graham
We've been using pig to read bulk data from hdfs, transform it and load it into HBase using the HBaseStorage class, which has worked well for us. If you try it out you'll want to build from the 0.9.0 branch (being cut as we speak I believe) or the trunk. There's an open pig JIRA with a patch to dis
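
The write side looks roughly like this (file, table and column names assumed; the first field becomes the row key):

    data = LOAD '/input/data.tsv' AS (rowkey, col1, col2);
    -- remaining fields map, in order, to the listed columns
    STORE data INTO 'hbase://mytable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col1 cf:col2');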

Re: HBase is not ready for Primetime

2011-04-12 Thread Bill Graham
Agreed. I've seen similar issues upon startup where, for whatever reason, an hlog (often empty) can't be read, which hangs the startup process. Manually deleting it from HDFS clears the issue. On Tue, Apr 12, 2011 at 10:01 AM, Jinsong Hu wrote: > You probably should stop all master/regionserv

Re: Tips on pre-splitting

2011-03-29 Thread Bill Graham
ay since all you want are the > keys. > > On Tue, Mar 29, 2011 at 2:15 PM, Bill Graham wrote: >> >> 1. use Pig to read in our datasets, join/filter/transform/etc before >> writing the output back to HDFS with N reducers ordered by key, where >> N is the number of

Re: Tips on pre-splitting

2011-03-29 Thread Bill Graham
g. > Assuming reducer output file is SequenceFile, steps 2 and 3 can be > automated. > > On Tue, Mar 29, 2011 at 2:15 PM, Bill Graham wrote: >> >> I've been thinking about this topic lately so I'll fork from another >> discussion to ask if anyone has a good app

region in a bad state - how to manually fix

2011-03-29 Thread Bill Graham
Hi, We have an empty table that is somehow in a bad state that I'm unable to disable or drop. We're running 0.90.0 on CDH3b2. Is there a way that I can manually remove this table from HBase without making a mess of things? The table has 2 CFs and it's empty. When I do a scan I get this: org.apac

Tips on pre-splitting

2011-03-29 Thread Bill Graham
I've been thinking about this topic lately so I'll fork from another discussion to ask if anyone has a good approach to determining keys for pre-splitting from a known dataset. We have a key scenario similar to what Ted describes below. We periodically run MR jobs to transform and bulk load data f
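
Once split points are chosen, the table can be created pre-split; a sketch from the HBase shell (SPLITS support at create time depends on your version, and the keys below are made up):

    create 'mytable', 'cf', SPLITS => ['key1000', 'key2000', 'key3000']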

Re: Row Counters

2011-03-16 Thread Bill Graham
Back to the issue of keeping a count, I've often wondered whether this would be easy to do without much cost at compaction time. It of course wouldn't be a true real-time total but something like a compactedRowCount. It could be a useful metric to expose via JMX to get a feel for growth over time. On

Re: Data is always written to one node

2011-03-14 Thread Bill Graham
On Mon, Mar 14, 2011 at 8:54 PM, Stack wrote: > On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham wrote: >> Anyway, it's been about a week and all regions for the table are still >> on 1 node. I see messages like this in the logs every 5 minutes: >> >&

Re: Data is always written to one node

2011-03-14 Thread Bill Graham
I hope I'm not hijacking the thread but I'm seeing what I think is a similar issue. About a week ago I loaded a bunch of data into a newly created table. It took about an hour and resulted in 12 regions being created on a single node. (Afterwards I remembered a conversation with JD where he describ

Re: intersection of row ids

2011-03-11 Thread Bill Graham
You could also do this with MR easily using Pig's HBaseStorage and either an inner join or an outer join with a filter on null, depending on if you want matches or misses, respectively. On Fri, Mar 11, 2011 at 4:25 PM, Usman Waheed wrote: > I suggest it to be ROWCOL because you have many columns
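
For misses, the outer-join form looks roughly like this (relations a and b stand for two HBaseStorage loads keyed on row id; all names assumed):

    j = JOIN a BY id LEFT OUTER, b BY id;
    -- keep rows of a whose id has no match in b
    misses = FILTER j BY b::id IS NULL;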

Re: stop-hbase.sh bug or feature?

2011-03-04 Thread Bill Graham
For example: > > rs=$(cat ${HBASE_CONF_DIR}/regionservers | xargs) > > if ( whiptail --yesno "Do you want to shutdown the cluster with the > following regionserver $rs\n[y/n]" 10 40 ) > then >        # proceed with the shutdown > else >        # exit > fi > >

Re: Questions about HBase Cluster Replication

2011-03-03 Thread Bill Graham
Actually, how far behind replication is w.r.t. edit logs is different than how out of sync they are, but you get the idea. On Thu, Mar 3, 2011 at 9:07 AM, Bill Graham wrote: > One more question for the FAQ: > > 6. Is it possible for an admin to tell just how out of sync the two >

Re: Questions about HBase Cluster Replication

2011-03-03 Thread Bill Graham
One more question for the FAQ: 6. Is it possible for an admin to tell just how out of sync the two clusters are? Something like Seconds_Behind_Master in MySQL's SHOW SLAVE STATUS? On Wed, Mar 2, 2011 at 9:32 PM, Jean-Daniel Cryans wrote: > Although, I would add that this feature is still experi

Re: min, max

2011-03-03 Thread Bill Graham
The first region starts with an empty byte[] and the last region ends with one. Those in between have non-empty byte[]s to specify their boundaries. On Thu, Mar 3, 2011 at 7:18 AM, Weishung Chung wrote: > Thanks, Stack! > > Got a few more questions. > > Does every region start with an empty byte[

Re: stop-hbase.sh bug or feature?

2011-03-03 Thread Bill Graham
cause stopping the master is done by starting a VM and running oahh.master.HMaster stop with the remote configs. This seems to happen regardless of whether there's a match on a local pid file with a local master process. On Wed, Mar 2, 2011 at 8:57 PM, Stack wrote: > On Wed, Mar 2, 2011 at

stop-hbase.sh bug or feature?

2011-03-02 Thread Bill Graham
Hi, We had a troubling experience today that I wanted to share. Our dev cluster got completely shut down by a developer by mistake, without said developer even realizing it. Here's how... We have multiple sets of HBase configs checked into SVN that developers can checkout and point their HBASE_CO

Re: Posting with email to HBase User bounces today

2011-02-24 Thread Bill Graham
This happens often on apache lists when sending emails as HTML. Try sending as plain text instead. On Thu, Feb 24, 2011 at 10:35 AM, Mark Kerzner wrote: > > I keep getting this error: > > Delivery to the following recipient failed permanently: > >    user@hbase.apache.org > > Technical details of

Re: FilterList not working as expected

2011-02-18 Thread Bill Graham
Just to follow up, this appears to be a bug. I've created a JIRA. https://issues.apache.org/jira/browse/HBASE-3550 On Fri, Feb 18, 2011 at 10:57 AM, Bill Graham wrote: > Hi, > > I'm unable to get ColumnPrefixFilter working when I use it in a > FilterList and I'm wond

FilterList not working as expected

2011-02-18 Thread Bill Graham
Hi, I'm unable to get ColumnPrefixFilter working when I use it in a FilterList and I'm wondering if this is a bug or a misuse on my part. If I set ColumnPrefixFilter directly on the Scan object all works fine. The following code shows an example of scanning a table with a column descriptor 'inf

Re: multiple masters

2011-01-28 Thread Bill Graham
Thanks Stack, this is really helpful. On Fri, Jan 28, 2011 at 2:06 PM, Stack wrote: > On Fri, Jan 28, 2011 at 1:15 PM, Bill Graham wrote: >> I also don't have a solid understanding of the responsibilities of >> master, but it seems like it's job is really about

Re: multiple masters

2011-01-28 Thread Bill Graham
I also don't have a solid understanding of the responsibilities of master, but it seems like its job is really about managing regions (i.e., coordinating splits and compactions, etc.) and updating ROOT and META. Is that correct? On Fri, Jan 28, 2011 at 9:31 AM, Weishung Chung wrote: > Great, th

HBaseStorage feature review

2011-01-27 Thread Bill Graham
Hello all, I'm working on a patch to HBaseStorage to support additional functionality with respect to how columns are specified and how HBase data is converted into Pig data structures. If you use Pig to read HBase data, please take a look at this JIRA and provide feedback if you have it: https:/

Re: question about long column family name and column name

2011-01-26 Thread Bill Graham
I can't say from experience, but here's a thread that implies that shorter column names are better. http://search-hadoop.com/m/oWZQd161GI22 On Tue, Jan 25, 2011 at 11:14 PM, JinChao Wen wrote: > Hi all, > > If there are lots of very long column family name and column name in my > table, is the

Re: Region is not online: -ROOT-,,0

2011-01-25 Thread Bill Graham
Thanks for the comments. Attached is the log file from the master after the restart. The last error message was repeated every second. See comments below. On Tue, Jan 25, 2011 at 7:20 PM, Stack wrote: > On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham wrote: >> Hi, >> >> A

Re: Region is not online: -ROOT-,,0

2011-01-25 Thread Bill Graham
A. On Tue, Jan 25, 2011 at 3:27 PM, Bill Graham wrote: > Hi, > > A developer on our team created a table today and something failed and > we fell back into the dire scenario we were in earlier this week. When > I got on the scene 2 of our 4 regions had crashed. When I brought them &

Region is not online: -ROOT-,,0

2011-01-25 Thread Bill Graham
Hi, A developer on our team created a table today and something failed and we fell back into the dire scenario we were in earlier this week. When I got on the scene 2 of our 4 regions had crashed. When I brought them back up, they wouldn't come online and the master was scrolling messages like tho

Re: LZO Codec not found

2011-01-25 Thread Bill Graham
This wiki shows how to build the lzo jar: http://wiki.apache.org/hadoop/UsingLzoCompression You'll get that exception if the jar is not found in lib/. On Tue, Jan 25, 2011 at 10:38 AM, Peter Haidinyak wrote: > Hi >        I am using HBase version .89.20100924+28 and Hadoop version 0.20.2+737 >

Re: Lost .META., lost tables

2011-01-23 Thread Bill Graham
the new table without writing a MR job? If it's easy to save the data I will, but I can survive without it. More comments below. thanks, Bill On Sun, Jan 23, 2011 at 11:16 AM, Stack wrote: > On Sat, Jan 22, 2011 at 10:27 AM, Bill Graham wrote: >> Hi, >> >> Last n

Lost .META., lost tables

2011-01-22 Thread Bill Graham
Hi, Last night while experimenting with getting lzo set up I managed to somehow lose all .META. data and all my tables. My regions still exist in HDFS, but the shell tells me I have no tables. At this point I'm pretty sure I need to reinstall HBase clean-slate on HDFS, hence losing all data, but I

Re: delete using server's timestamp

2011-01-21 Thread Bill Graham
hen so does the row key. > > I hope this helps, > -ryan > > On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham wrote: >> I follow the tombstone/compact/delete cycle of the column values, but >> I'm still unclear of the row key life cycle. >> >> Is it that the bytes t

Re: delete using server's timestamp

2011-01-21 Thread Bill Graham
d they disappear > for good. (Barring other backups of course) > > Because of our variable length storage model, we don't store rows in > particular blocks and rewrite said blocks, so notions of rows > 'existing' or not, don't even apply to HBase as they do to RDBMS >

Re: delete using server's timestamp

2011-01-21 Thread Bill Graham
If you use some combination of delete requests and leave a row without any column data, will the row/rowkey still exist? I'm thinking of the use case where you want to prune all old data, including row keys, from a table. On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson wrote: > There are 3 kinds of
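
In practice the row disappears once its last cell is deleted, which is easy to confirm from the shell (table and row names hypothetical):

    deleteall 't1', 'row1'
    # the scan no longer returns 'row1'; no empty row is left behind
    scan 't1'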

Re: HBase fails to start - DataXceiver Version Mismatch

2011-01-10 Thread Bill Graham
is to happen. Very strange. I moved that directory out of the way and the master started fine. I can debug how the classpath gets set, but any ideas? On Mon, Jan 10, 2011 at 10:29 PM, Stack wrote: > For sure you removed the old hadoop from hbase/lib? > > On Mon, Jan 10, 2011 at 10:12 P

Re: HBase fails to start - DataXceiver Version Mismatch

2011-01-10 Thread Bill Graham
directory with the same version from your HDFS install. > -Todd > > On Mon, Jan 10, 2011 at 9:45 PM, Bill Graham wrote: >> >> Hi, >> >> Today I upgraded from Hadoop 0.20.1 to CHD3b2 0.20.2 to get the append >> functionality that HBase requires and now I can
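
The usual fix, sketched (jar names vary by release, so treat these paths as placeholders):

    # swap the hadoop jar bundled with hbase for the one the cluster actually runs
    rm $HBASE_HOME/lib/hadoop-core-*.jar
    cp $HADOOP_HOME/hadoop-core-*.jar $HBASE_HOME/lib/
    # then restart the master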

HBase fails to start - DataXceiver Version Mismatch

2011-01-10 Thread Bill Graham
Hi, Today I upgraded from Hadoop 0.20.1 to CHD3b2 0.20.2 to get the append functionality that HBase requires and now I can't start HBase. Hadoop and HDFS seem to be working just fine, but when I start up the HBase master, I get this error in the NNs: 2011-01-10 21:20:36,134 ERROR org.apache.hadoo

Re: Scheduling map/reduce jobs

2011-01-05 Thread Bill Graham
Take a look at Oozie or Azkaban: http://www.quora.com/What-are-the-differences-advantages-disadvantages-of-Azkaban-vs-Oozie On Wed, Jan 5, 2011 at 9:35 AM, Peter Veentjer wrote: > He Guys, > > although it isn't completely related to HBase. Is there support for > scheduling map reduce jobs? > > E

Re: provide a 0.20-append tarball?

2010-12-21 Thread Bill Graham
Hi Andrew, Just to make sure I'm clear, are you saying that HBase 0.90.0 is incompatible with CDH3b3 due to the security changes? We're just getting going with HBase and have been running 0.90.0rc1 on an un-patched version of Hadoop in dev. We were planning on upgrading to CDH3b3 to get the sync