Re: Best practices in ID generation?

2009-07-08 Thread Bryan Duxbury
Not necessarily in context of hbase, but Rapleaf uses UUIDs/GUIDs, since they are crazy fast to generate and have no dependencies on external resources. In the context of hbase, a benefit of UUIDs is that they will be randomly distributed over your whole table, instead of consistently
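The snippet is truncated, but the point about random key distribution can be sketched without any HBase dependency. A minimal, hypothetical illustration using only `java.util` — the class and method names here are invented for this example, not from the thread:

```java
import java.util.TreeSet;
import java.util.UUID;

public class UuidRowKeys {
    // Generate a random (type 4) UUID to use as a row key. Because the
    // bits are random, keys land all over the sorted keyspace, so writes
    // spread across regions instead of piling onto one.
    static String newRowKey() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // HBase stores rows sorted by key; a TreeSet stands in for that here.
        TreeSet<String> sortedKeys = new TreeSet<>();
        for (int i = 0; i < 1000; i++) {
            sortedKeys.add(newRowKey());
        }
        // Insertion order and sorted order are unrelated, which is the
        // "randomly distributed over your whole table" effect.
        System.out.println("first key in sort order: " + sortedKeys.first());
    }
}
```

The trade-off, implied by the thread, is that random keys also make range scans over related rows impossible.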

Rapleaf hosting Hadoop tech discussion and presentations

2008-10-08 Thread Bryan Duxbury
Hello all, We're hosting a tech discussion on Hadoop at our Rapleaf office on October 21st at 6:45PM. This is an exclusive event and there are only a limited number of spots available. More details and registration at http://rapleafhadoop.eventbrite.com/ Topics include: - The Collector

Re: hbase as an embedded.

2008-06-23 Thread Bryan Duxbury
Certainly, it would be possible to make a version of HBase that could be embedded. At the moment, however, there would be an immense number of things that would have to change for embedding to work. -Bryan On Jun 23, 2008, at 6:43 AM, Shiraz Memon wrote: Hi, Thanks for your answer. Yes

Re: Migrate file system

2008-06-04 Thread Bryan Duxbury
This is a non-critical message. Your HBase should be fine otherwise. -Bryan On Jun 4, 2008, at 6:39 AM, Rohana Rajapakse wrote: Thanks a lot Daniel. Now it goes much further and starts hbase, but I am unable to create tables. I have noticed a warning in the log file that says: 2008-06-04

Re: querying by column value rather than key value

2008-05-29 Thread Bryan Duxbury
regions of the table - could this be accomplished? On Thu, May 29, 2008 at 5:42 PM, Bryan Duxbury [EMAIL PROTECTED] wrote: In a regular hbase table, there is no most efficient way to do this. The only operation you have available is a table scan. If you find yourself looking for rows
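A query by column value in plain HBase is just a full scan with a client-side check, as the reply says. A toy sketch of that shape, using a `TreeMap` as a stand-in for a sorted HBase table — the model and names are assumptions for illustration, not HBase API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ScanByValue {
    // "Query by column value" without an index: visit every row in key
    // order and keep the ones whose cell matches. This is the table-scan
    // cost the thread warns about.
    static List<String> rowsWhere(TreeMap<String, Map<String, String>> table,
                                  String column, String wanted) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            if (wanted.equals(row.getValue().get(column))) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }
}
```

If this pattern dominates your workload, the later "Secondary indexes" threads in this archive describe keeping a second table keyed by the value instead.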

Re: [ANN] hbase-0.1.2 release candidate 1 (releases are zero-based)

2008-05-13 Thread Bryan Duxbury
I downloaded the release and ran the unit tests successfully. +1 -Bryan On May 7, 2008, at 5:03 PM, stack wrote: The second 0.1.2 release candidate is available for download: http://people.apache.org/~stack/hbase-0.1.2-candidate-1/ Please take this release candidate for a spin. Report back

Re: [hbase-user] Secure, authenticated remote hbase api access

2008-05-13 Thread Bryan Duxbury
I know that there was talk on the Thrift mailing list of a SSL wrapper for the socket transport. If that exists, and it's in Java, it would be an easy addition and give you the secure side of things. As far as authentication: as you're probably aware, at this point, we don't really have

Re: Blog post about when to use HBase

2008-05-13 Thread Bryan Duxbury
the data into HDFS and processing it with MapReduce. Thanks, Naama On Wed, Mar 12, 2008 at 12:15 AM, Bryan Duxbury [EMAIL PROTECTED] wrote: I've written up a blog post discussing when I think it's appropriate to use HBase in response to some of the questions people usually ask. You can

Re: Scanner

2008-05-11 Thread Bryan Duxbury
The way to do parallel scanning is with a map/reduce job and TableInputFormat. This does all the work of parallelizing the scan, as well as whatever work you were doing. -Bryan On May 10, 2008, at 1:49 PM, Daniel Leffel wrote: Is there a parallel scanner (I didn't see it in the documents)?
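TableInputFormat parallelizes by handing each map task one region's key range. The same idea can be sketched with plain threads and a sorted map — a toy model under stated assumptions (invented class name, a `TreeMap` standing in for the table, hard-coded "region" boundaries), not the actual TableInputFormat code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScan {
    // Scan each key range on its own thread and sum the row counts.
    // TableInputFormat does the analogous thing: one map task per region.
    static int scanInParallel(TreeMap<String, String> table, List<String> splits)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(splits.size());
        List<Future<Integer>> counts = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            String start = splits.get(i);
            String end = (i + 1 < splits.size()) ? splits.get(i + 1) : null;
            counts.add(pool.submit(() -> {
                // Each worker touches only its own half-open key range.
                SortedMap<String, String> range =
                        (end == null) ? table.tailMap(start) : table.subMap(start, end);
                return range.size();
            }));
        }
        int total = 0;
        for (Future<Integer> f : counts) total += f.get();
        pool.shutdown();
        return total;
    }
}
```

The map function body is where "whatever work you were doing" per row would go.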

Re: Is HBase suitable for ...

2008-04-28 Thread Bryan Duxbury
My replies and questions inline. On Apr 28, 2008, at 2:57 PM, Max Grigoriev wrote: Hi there, I'm making research to find right solution for our needs. We need persistent layer for groups of social network. These groups will have big amount of data ( ~100 GB) - users profiles, their

Re: Secondary indexes

2008-04-22 Thread Bryan Duxbury
This doesn't have to be all that complicated. Why not keep another HBase table as the index? The keys would be the column values. There'd be a single matchingRows column family, and the qualifiers and values would be the rows that match that column. Then, when you want to scan in column
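The scheme described — an index table whose row keys are the column values and whose qualifiers are the matching data rows — can be modeled in a few lines. A sketch with invented names, using `TreeMap`s as stand-ins for the two sorted HBase tables:

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SecondaryIndex {
    // Toy data table and toy index table, both sorted by key like HBase.
    static final TreeMap<String, String> data = new TreeMap<>();           // row -> indexed column value
    static final TreeMap<String, TreeSet<String>> index = new TreeMap<>(); // value -> matching rows

    // On every write, update the index table too: the index row key is
    // the column value, and each matching data row becomes a qualifier
    // under the thread's "matchingRows" family.
    static void put(String row, String value) {
        data.put(row, value);
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(row);
    }

    // A lookup by value is now one get on the index table instead of a
    // full scan of the data table.
    static Set<String> rowsWithValue(String value) {
        return index.getOrDefault(value, new TreeSet<>());
    }
}
```

Note the sketch sidesteps what the real scheme has to handle: keeping the two tables consistent when a write to one of them fails.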

Re: Secondary indexes

2008-04-22 Thread Bryan Duxbury
. Whereas if we have one SortedMap per Region, then I can quickly narrow down to (hopefully) a few regions based on key prefix. Though other usage / table loading patterns would surely benefit from this approach... On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury [EMAIL PROTECTED] wrote: This doesn't

Re: Regions Offline

2008-04-17 Thread Bryan Duxbury
For starters, what version of HBase/Hadoop are you using? There was a pretty bad bug in 0.16 that would cause regions to go offline and all sorts of other unexpected behavior. -Bryan On Apr 28, 2008, at 5:00 PM, David Alves wrote: Hi My system is quite simple: - two (one

Re: ID Service with HBase?

2008-04-16 Thread Bryan Duxbury
HBASE-493 was created, and seems similar. It's a write-if-not-modified-since. I would guess that you probably don't want to use HBase to maintain a distributed auto-increment. You need to think of some other approach that produces unique ids across concurrent access, like a hash or GUID or
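The "hash" option mentioned here can be sketched as deriving the id from the record's own content, so no central counter is ever consulted. A minimal illustration with invented names — the thread names the approach but not this code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentIds {
    // Derive an id from the record's content: two writers hashing the
    // same record get the same id, and different records collide only
    // with negligible probability. Safe under concurrent access because
    // there is no shared state to coordinate.
    static String idFor(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is required on every JVM
        }
    }
}
```

The GUID alternative is `java.util.UUID.randomUUID()`; the difference is that a content hash is reproducible while a GUID is not.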

Re: Is the latest version of Hbase support multiple updates on same row at the same time?

2008-04-15 Thread Bryan Duxbury
Yes. Take a look at the BatchUpdate class in TRUNK. -Bryan On Apr 15, 2008, at 12:56 AM, Zhou wrote: Hi, Currently, I'm using the HBase version inside Hadoop 0.16.0 package I access HBase with a multi-threaded application. It appears that only one update of a row could be in progress at a

Re: Save lists relating to records

2008-04-08 Thread Bryan Duxbury
I think the approach of using a column family for the list and a column for each element is the way to go. It seems to be the most HBase-y way to lay the schema out. You can of course use multiple tables if you want, but we have no joins of any kind implemented in HBase, so it'd be up to
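The layout suggested — one column family for the list, one column per element — can be modeled with a single row's cell map. A toy sketch; the `items` family name and zero-padded qualifiers are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ListAsFamily {
    // One row's cells, qualifier -> value, kept sorted as HBase keeps
    // them. The whole list lives in a hypothetical "items" family, one
    // qualifier per element: items:00000, items:00001, ...
    static final TreeMap<String, String> row = new TreeMap<>();

    static void putItem(int position, String value) {
        // Zero-pad the position so lexicographic order matches list order.
        row.put(String.format("items:%05d", position), value);
    }

    // Reading the list back is one fetch of the family's cells, already
    // in order -- no join against a second table needed, which matters
    // because HBase has no joins.
    static List<String> items() {
        // ';' is the character after ':', so this half-open range covers
        // exactly the keys that start with "items:".
        return new ArrayList<>(row.subMap("items:", "items;").values());
    }
}
```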

Re: region offline after DELETE

2008-04-07 Thread Bryan Duxbury
Looks like an HQL bug to me. Would you mind filing a ticket with the contents of this email? -Bryan On Apr 7, 2008, at 9:59 AM, Michaela Buergle wrote: Hi there, I have made first contact with region offline. As I've seen that topic mentioned in some of the devel-threads I thought I'd

Re: Lost tables

2008-03-27 Thread Bryan Duxbury
The pattern of events as you list them is the correct way to bring up and down an HBase cluster. Is this being run on a single node or multiple machines? What command are you using to start HBase? (bin/start-hbase.sh is what I use) Is there anything interesting in the HBase logs for either

Re: Getting to work with HBase

2008-03-03 Thread Bryan Duxbury
, 2008 at 6:00 PM, Bryan Duxbury [EMAIL PROTECTED] wrote: If you can use the Java API, you should use that. It has the most functionality and the least overhead. HQL is only meant to be used for administrative purposes, like managing tables. Aside from the fact that there's no good way to hook

Re: how to - get last inserted row id

2008-02-25 Thread Bryan Duxbury
There is no such thing as last inserted row id in HBase, unless you can produce it from your own application. There's no autoincrement-style magic ID in HBase, just the row keys you inserted with. Also, the rows are already sorted by the key you used to write the rows in the first place.
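The workaround the reply implies — produce the ordering yourself — amounts to encoding a sequence into the row key, after which "last inserted" is simply the last key in sort order. A sketch with invented names, a `TreeMap` standing in for the sorted table:

```java
import java.util.TreeMap;

public class LastRow {
    // HBase keeps rows sorted by key, so "the last row" only means
    // something if the key itself encodes insertion order.
    static final TreeMap<String, String> table = new TreeMap<>();

    static void insert(long sequence, String value) {
        // Zero-pad so lexicographic order matches numeric order
        // (19 digits covers any positive long).
        table.put(String.format("%019d", sequence), value);
    }

    static String lastInserted() {
        return table.lastEntry().getValue();
    }
}
```

Note this is the opposite trade from the UUID-key advice elsewhere in this archive: sequential keys make "last row" cheap but concentrate all new writes on one region.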

Re: HQL plan for Hbase 0.2

2008-02-10 Thread Bryan Duxbury
= webserver; /* or hadoop */ qualifier = access_log; /* or task_tracker_log */ resultTable = access_log_table; /* or task_tracker_log_table */ } On 2/10/08, Bryan Duxbury [EMAIL PROTECTED] wrote: I added a few comments. -Bryan On Feb 9, 2008, at 1:39 PM, stack