Not necessarily in the context of HBase, but Rapleaf uses UUIDs/GUIDs, since they are crazy fast to generate and have no dependencies on external resources.
In the context of HBase, a benefit of UUIDs is that they will be randomly distributed over your whole table, instead of consistently hitting a single region the way sequential keys do.
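(For illustration, a minimal sketch in Java - nothing here beyond the standard library, and the string key format is just one option:)

import java.util.UUID;

// Generating the key is a purely local operation: no counter, no
// coordination with any external resource.
String rowKey = UUID.randomUUID().toString();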
Hello all,
We're hosting a tech discussion on Hadoop at our Rapleaf office on
October 21st at 6:45PM. This is an exclusive event and there are only
a limited number of spots available. More details and registration at
http://rapleafhadoop.eventbrite.com/
Topics include:
- The Collector
Certainly, it would be possible to make a version of HBase that could
be embedded. At the moment, however, there would be an immense number
of things that would have to change for embedding to work.
-Bryan
On Jun 23, 2008, at 6:43 AM, Shiraz Memon wrote:
Hi,
Thanks for your answer.
Yes
This is a non-critical message. Your HBase should be fine otherwise.
-Bryan
On Jun 4, 2008, at 6:39 AM, Rohana Rajapakse wrote:
Thanks a lot Daniel. Now it goes much further and starts HBase, but I am unable to create tables. I have noticed a warning in the log file that says:
2008-06-04
regions of the table - could this be accomplished?
On Thu, May 29, 2008 at 5:42 PM, Bryan Duxbury [EMAIL PROTECTED]
wrote:
In a regular HBase table, there is no especially efficient way to do this. The only operation you have available is a table scan.
If you find yourself looking for rows
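(For reference, a bare full-table scan looks roughly like this with the current client API - the 0.x-era API differed, and the table/family/qualifier names here are made up:)

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class FullScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("mytable"));
         ResultScanner scanner = table.getScanner(new Scan())) {
      for (Result r : scanner) {
        // Inspect each row; scanning is the only generic way to "find" rows.
        byte[] v = r.getValue(Bytes.toBytes("info"), Bytes.toBytes("status"));
      }
    }
  }
}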
I downloaded the release and ran the unit tests successfully. +1
-Bryan
On May 7, 2008, at 5:03 PM, stack wrote:
The second 0.1.2 release candidate is available for download:
http://people.apache.org/~stack/hbase-0.1.2-candidate-1/
Please take this release candidate for a spin. Report back
I know that there was talk on the Thrift mailing list of an SSL wrapper for the socket transport. If that exists, and it's in Java, it would be an easy addition and give you the secure side of things.
As far as authentication: as you're probably aware, at this point, we don't really have
the data into HDFS and processing it with MapReduce.
Thanks, Naama
On Wed, Mar 12, 2008 at 12:15 AM, Bryan Duxbury [EMAIL PROTECTED]
wrote:
I've written up a blog post discussing when I think it's appropriate to use HBase in response to some of the questions people usually ask. You can
The way to do parallel scanning is with a map/reduce job and TableInputFormat. This does all the work of parallelizing the scan, as well as distributing whatever processing you were doing.
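A rough sketch of the shape of such a job, using the current mapreduce package (in 2008 this lived under mapred instead; the class and table names are made up):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// TableInputFormat splits the scan by region, so each map task
// reads one region's worth of rows in parallel.
public class RowCountMapper extends TableMapper<Text, IntWritable> {
  @Override
  public void map(ImmutableBytesWritable rowKey, Result columns, Context context)
      throws IOException, InterruptedException {
    context.write(new Text("rows"), new IntWritable(1)); // your per-row work goes here
  }
}

// Job setup, elsewhere:
//   Job job = Job.getInstance(conf, "parallel-scan");
//   TableMapReduceUtil.initTableMapperJob("mytable", new Scan(),
//       RowCountMapper.class, Text.class, IntWritable.class, job);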
-Bryan
On May 10, 2008, at 1:49 PM, Daniel Leffel wrote:
Is there a parallel scanner (I didn't see it in the documents)?
My replies and questions inline.
On Apr 28, 2008, at 2:57 PM, Max Grigoriev wrote:
Hi there,
I'm doing research to find the right solution for our needs.
We need a persistence layer for groups in a social network.
These groups will have a large amount of data (~100 GB) - user profiles, their
This doesn't have to be all that complicated.
Why not keep another HBase table as the index? The keys would be the
column values. There'd be a single matchingRows column family, and
the qualifiers and values would be the rows that match that column.
Then, when you want to scan in column
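(A rough sketch of maintaining such an index, using the current client API for brevity - indexTable and dataRowKey are made-up names, and the matchingRows family follows the scheme above:)

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Alongside each data write, add an index row keyed by the column value;
// the qualifier (and value) record which data row matched.
void writeIndexEntry(Table indexTable, byte[] columnValue, byte[] dataRowKey)
    throws IOException {
  Put p = new Put(columnValue);
  p.addColumn(Bytes.toBytes("matchingRows"), dataRowKey, dataRowKey);
  indexTable.put(p);
}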
Whereas if we have one SortedMap per Region, then I can quickly narrow
down to (hopefully) a few regions based on key prefix.
Though other usage / table loading patterns would surely benefit from
this approach...
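(In today's client API the prefix narrowing can be expressed directly on the scan - a sketch, with a made-up prefix:)

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Only the regions whose key range overlaps the prefix are touched.
Scan scan = new Scan();
scan.setRowPrefixFilter(Bytes.toBytes("group123/"));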
On Tue, Apr 22, 2008 at 12:31 PM, Bryan Duxbury [EMAIL PROTECTED]
wrote:
This doesn't
For starters, what version of HBase/Hadoop are you using? There was a
pretty bad bug in 0.16 that would cause regions to go offline and all
sorts of other unexpected behavior.
-Bryan
On Apr 28, 2008, at 5:00 PM, David Alves wrote:
Hi
My system is quite simple:
- two (one
HBASE-493 was created, and seems similar. It's a write-if-not-
modified-since.
I would guess that you probably don't want to use HBase to maintain a distributed auto-increment. You need to think of some other approach that produces unique ids across concurrent access, like a hash or GUID or
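(For example, hashing something already unique to the record sidesteps coordination entirely - a sketch, where the email field is hypothetical:)

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Any writer can compute the same id without talking to anyone else.
static String rowIdFor(String email) throws NoSuchAlgorithmException {
  MessageDigest md = MessageDigest.getInstance("SHA-1");
  byte[] digest = md.digest(email.getBytes(StandardCharsets.UTF_8));
  StringBuilder sb = new StringBuilder();
  for (byte b : digest) sb.append(String.format("%02x", b));
  return sb.toString();
}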
Yes. Take a look at the BatchUpdate class in TRUNK.
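(In today's client API the equivalent is a multi-column Put, which is applied atomically to its row - a sketch with made-up names, assuming table came from the connection as usual:)

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// All columns in the Put commit together; concurrent updates to the
// same row are serialized by the region server.
Put p = new Put(Bytes.toBytes("row1"));
p.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("zhou"));
p.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("beijing"));
table.put(p);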
-Bryan
On Apr 15, 2008, at 12:56 AM, Zhou wrote:
Hi,
Currently, I'm using the HBase version inside the Hadoop 0.16.0 package.
I access HBase with a multi-threaded application.
It appears that only one update of a row can be in progress at a
I think the approach of using a column family for the list and a
column for each element is the way to go. It seems to be the most
HBase-y way to lay the schema out.
You can of course use multiple tables if you want, but we have no
joins of any kind implemented in HBase, so it'd be up to
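(Reading the whole list back is then a single Get - a sketch against the current API, with made-up row and family names:)

import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// The family map holds one entry per list element, keyed by qualifier.
Result r = table.get(new Get(Bytes.toBytes("list-owner-row")));
NavigableMap<byte[], byte[]> elements = r.getFamilyMap(Bytes.toBytes("items"));
for (Map.Entry<byte[], byte[]> e : elements.entrySet()) {
  // e.getKey() is the element's qualifier, e.getValue() the element itself
}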
Looks like an HQL bug to me. Would you mind filing a ticket with the
contents of this email?
-Bryan
On Apr 7, 2008, at 9:59 AM, Michaela Buergle wrote:
Hi there,
I have made first contact with the "region offline" problem. As I've seen that topic mentioned in some of the devel threads I thought I'd
The sequence of events as you list it is the correct way to bring an HBase cluster up and down.
Is this being run on a single node or multiple machines? What command
are you using to start HBase? (bin/start-hbase.sh is what I use) Is
there anything interesting in the HBase logs for either
, 2008 at 6:00 PM, Bryan Duxbury [EMAIL PROTECTED]
wrote:
If you can use the Java API, you should use that. It has the most
functionality and the least overhead.
HQL is only meant to be used for administrative purposes, like
managing tables. Aside from the fact that there's no good way to hook
There is no such thing as a last-inserted row id in HBase, unless you can produce it from your own application. There's no autoincrement-style magic ID in HBase, just the row keys you inserted with.
Also, the rows are already sorted by the key you used to write them in the first place.
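(One consequence worth knowing: keys sort as bytes, so numeric keys are usually zero-padded so byte order matches numeric order - a tiny sketch, key format made up:)

// "row-000042" sorts between "row-000041" and "row-000043";
// an unpadded "row-42" would sort after "row-100".
String rowKey = String.format("row-%06d", 42);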
= webserver;                    /* or hadoop */
qualifier = access_log;         /* or task_tracker_log */
resultTable = access_log_table; /* or task_tracker_log_table */
}
On 2/10/08, Bryan Duxbury [EMAIL PROTECTED] wrote:
I added a few comments.
-Bryan
On Feb 9, 2008, at 1:39 PM, stack