Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-15 Thread Ryan Rawson
, 2010 at 4:11 PM, Michael Segel wrote: > > > >> Date: Mon, 15 Mar 2010 08:15:10 +0100 >> Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible >> column         names/keys? >> From: timrobertson...@gmail.com >> To: hbase-user@hadoop.apache.o

RE: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-15 Thread Michael Segel
> Date: Mon, 15 Mar 2010 08:15:10 +0100 > Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column > names/keys? > From: timrobertson...@gmail.com > To: hbase-user@hadoop.apache.org > > Sure, understood. UUID aims to be globally uniq

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-15 Thread tsuna
On Mon, Mar 15, 2010 at 12:21 AM, Tim Robertson wrote: > How do you use incrementColumnValue to generate a row key please? You need a "special" row to act as a counter. This row will typically contain only a single cell, which stores the counter. I like to use the row key { 0 } (a byte array
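
A rough sketch of the counter-row idea described above, not tsuna's actual code. FakeTable is a hypothetical in-memory stand-in so the example runs without a cluster; it only mimics the shape of an atomic incrementColumnValue(row, family, qualifier, amount) call. The family and qualifier names are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class RowKeyCounterSketch {
    /** Hypothetical in-memory stand-in for a table; NOT the HBase client. */
    static final class FakeTable {
        private final ConcurrentHashMap<String, AtomicLong> cells = new ConcurrentHashMap<>();

        /** Atomically adds amount to the addressed cell and returns the new value. */
        long incrementColumnValue(byte[] row, byte[] family, byte[] qualifier, long amount) {
            String cell = Arrays.toString(row) + "/" + Arrays.toString(family)
                    + "/" + Arrays.toString(qualifier);
            return cells.computeIfAbsent(cell, k -> new AtomicLong()).addAndGet(amount);
        }
    }

    // The "special" counter row from the message: row key { 0 }.
    static final byte[] COUNTER_ROW = {0};
    // Assumed names, chosen only for this sketch.
    static final byte[] FAMILY = {'c'};
    static final byte[] QUALIFIER = {'n'};

    /** Bumps the counter by one and encodes the result as an 8-byte big-endian row key. */
    static byte[] nextRowKey(FakeTable table) {
        long id = table.incrementColumnValue(COUNTER_ROW, FAMILY, QUALIFIER, 1L);
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        System.out.println(Arrays.toString(nextRowKey(table)));
        System.out.println(Arrays.toString(nextRowKey(table)));
    }
}
```

Because the generated keys are 8 bytes long, they cannot collide with the 1-byte counter row itself. Sequential keys do mean that new rows all land at the end of the key space, which may or may not suit a given workload.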

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-15 Thread Ryan Rawson
>> >> Look, the benefits of using the UUID definitely outweigh wrapping your >> own >> >> solution in 8 bytes, even in memory caches. >> >> (Are you only storing values that are 16 bytes in length, or something >> much >> >> larger?) >

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-15 Thread Tim Robertson
length, or something > much > >> larger?) > > > > > > The values are much much larger (100s - 1000s bytes) but they aren't > going > > into any in-memory structures. > > > > > > > >> > Date: Sun, 14 Mar 2010 19:09:48 +0100 > >

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-15 Thread Ryan Rawson
in length? Definitely >> not >> > > 'overkill' if all you want the key to do is to guarantee uniqueness. >> > > >> > > Very easy to generate and extremely easy to use. You can even hash it >> and >> > > create version 5 UUIDs. >

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-15 Thread Tim Robertson
tremely easy to use. You can even hash it > and > > > create version 5 UUIDs. > > > > > > I don't understand why you'd want to try and generate an 8 byte (you > said 8 > > > character, assuming you meant Latin-1 character set), when you have a >

RE: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-14 Thread Michael Segel
g your own solution in 8 bytes, even in memory caches. (Are you only storing values that are 16 bytes in length, or something much larger?) > Date: Sun, 14 Mar 2010 19:09:48 +0100 > Subject: Re: UUID as key wuz: RE: worth choosing the shortest possible column > names/keys? >

Re: UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-14 Thread Tim Robertson
-1 character set), when you have a > standard way of doing it already. 8 byte vs 16 byte? C'mon, really? > > JMHO > > -Mike > > > Date: Sat, 13 Mar 2010 09:01:38 +0100 > > Subject: Re: worth choosing the shortest possible column names/keys? > > From: timro

UUID as key wuz: RE: worth choosing the shortest possible column names/keys?

2010-03-14 Thread Michael Segel
nd why you'd want to try and generate an 8 byte (you said 8 character, assuming you meant Latin-1 character set), when you have a standard way of doing it already. 8 byte vs 16 byte? C'mon, really? JMHO -Mike > Date: Sat, 13 Mar 2010 09:01:38 +0100 > Subject: Re: worth choosin
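
For reference, a small sketch of the sizes being argued about, using only java.util.UUID from the JDK. It shows that a UUID fits in 16 bytes when stored in binary form, whereas its usual text form takes 36 characters. The class and method names are made up for this illustration and are not from the thread.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidKeySketch {
    /** Packs a UUID into its 16-byte binary form (two big-endian longs). */
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds the UUID from a 16-byte key produced by toBytes. */
    static UUID fromBytes(byte[] key) {
        ByteBuffer buf = ByteBuffer.wrap(key);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id + " -> " + toBytes(id).length + " bytes binary, "
                + id.toString().length() + " characters as text");
    }
}
```

So the choice is between a 16-byte binary UUID and an 8-byte long; storing the UUID as a string would more than double the key size again.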

Re: worth choosing the shortest possible column names/keys?

2010-03-14 Thread Tim Robertson
Thanks Stack and others from me also. On Sun, Mar 14, 2010 at 10:22 AM, TuX RaceR wrote: > Thank you guys for your answers. I'll map descriptive names to short name > too ;) > Cheers > TuX > > > Lars Francke wrote: > >> Will I save a lot of space (especially if I have many small columns)? >>> >

Re: worth choosing the shortest possible column names/keys?

2010-03-14 Thread TuX RaceR
Thank you guys for your answers. I'll map descriptive names to short name too ;) Cheers TuX Lars Francke wrote: Will I save a lot of space (especially if I have many small columns)? I don't have any hard numbers for you but I tested it and I remember that on a dataset of about 10-20 GB I

Re: worth choosing the shortest possible column names/keys?

2010-03-13 Thread Stack
Have you looked at the murmurhash implementation that is in HBase, Tim? It has good characteristics -- faster than Jenkins, and it produces a 32-bit or 64-bit hash. See http://sites.google.com/site/murmurhash/. Conversion to Java was done by Andrzej. Way cheaper than UUID'ing and much smaller. St.Ack On Sat, Ma

Re: worth choosing the shortest possible column names/keys?

2010-03-13 Thread Tim Robertson
Along similar lines... (sorry for hijacking thread) I assume that this is even more applicable for key choice given the way keys participate in indexes? I have been using UUID, but it is way overkill for my needs. What are others using? Is there a convenient way of doing (e.g.) 8 characters strin

Re: worth choosing the shortest possible column names/keys?

2010-03-12 Thread Kay Kay
Some of our current experiences go along similar lines, where we saw ~20-30% RAM savings by using abbreviations in the key space. But the biggest advantage actually came from defining the right schema and column families, as per the query pattern of the jobs. We keep the column families

Re: worth choosing the shortest possible column names/keys?

2010-03-12 Thread Lars Francke
> Will I save a lot of space (especially if I have many small columns)? I don't have any hard numbers for you but I tested it and I remember that on a dataset of about 10-20 GB I could save about 200-500 MB (this was with compression enabled) just by not using descriptive string qualifiers that wer

worth choosing the shortest possible column names/keys?

2010-03-12 Thread TuX RaceR
Hello HBase Users List, In the SQL world, you can choose column names that clearly describe a field (i.e. long names). I believe it is different in HBase. Is it worth choosing the shortest possible column names and keys, i.e.: c1234:fn:John,ln:Doe instead of customer_1234:FirstName:John,LastName:
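
To make the question concrete, here is a minimal sketch that counts the key bytes saved per cell with the two naming schemes from the example. It assumes, as the replies in this thread suggest, that the row key and the qualifier are stored with every cell. The mapping table and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class ShortNameSketch {
    // Hypothetical mapping from descriptive names to short qualifiers.
    static final Map<String, String> SHORT = new LinkedHashMap<>();
    static {
        SHORT.put("FirstName", "fn");
        SHORT.put("LastName", "ln");
    }

    /** Looks up the short qualifier; fails loudly on an unmapped name. */
    static String shortName(String descriptive) {
        String s = SHORT.get(descriptive);
        if (s == null) {
            throw new IllegalArgumentException("no short name for " + descriptive);
        }
        return s;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes saved in one cell's key by using the short row key and short qualifier. */
    static int bytesSavedPerCell(String longRow, String shortRow, String descriptive) {
        return utf8Length(longRow) + utf8Length(descriptive)
                - utf8Length(shortRow) - utf8Length(shortName(descriptive));
    }

    public static void main(String[] args) {
        for (String name : SHORT.keySet()) {
            System.out.println(name + ": "
                    + bytesSavedPerCell("customer_1234", "c1234", name) + " bytes saved per cell");
        }
    }
}
```

With these names the saving is 15 bytes for the FirstName cell and 14 for LastName, before compression. Whether that adds up to much depends on how many cells there are and how small the values are.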