Re: HBase Developer's Pow-wow.

2012-09-10 Thread Matt Corgan
Jacques - I'll be there tomorrow. Look forward to talking. Some comments before then: - How to maintain consistency (maybe this is unimportant?) Not unimportant at all. In fact, I picture the whole secondary index conversation as a lower level goal of supporting consistent cross-region updates

RE: HBase Developer's Pow-wow.

2012-09-10 Thread Ramkrishna.S.Vasudevan
Hi Yes, a separate index table along with the main table and the master should ensure that the regions of both tables are collocated during assignments. The regions in index table can be same as that of the main table in the sense that both should have the same start and endkeys. Different ind

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Andrew Purtell
Regarding this: On Mon, Sep 10, 2012 at 12:13 PM, Matt Corgan wrote: > 1) Per-region or Per-table [...] > 1) > - Per-region: the index entries are stored on the same machine as the > primary rows > - Per-table: each index is stored in a separate table, requiring > cross-server consistency LarsH

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Jacques
On Mon, Sep 10, 2012 at 6:20 PM, Matt Corgan wrote: > ... snipping lots of helpful use cases... It seems like portions of what you discussed would probably be nominally impacted by indexes while others would be very impacted. Also seems like compound-qualifier indexing would potentially be of i

Re: Hbase Assignments in trunk.

2012-09-10 Thread Enis Söztutar
+1 on rethinking the assignment + splitting code paths, and using zk as a transactional database. Just my 2 cents w/o spending a lot of time on the details, but maybe we should stop keeping master and RS in memory metadata, but keep region-assignments in zk, and HM and RS just keep a consistent in-

Re: checkAndPut fails when using lock - HBASE-6725

2012-09-10 Thread lars hofhansl
Client controlled locks are (somewhat) broken in HBase. For example these locks will not survive a split or a move of a region to a different region server. We had a thread about this a while ago. My comment then was to deprecate client driven locks altogether. As for this specific issue, shoul

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Matt Corgan
One sparse use case for us is rate limit detection. We store user events in an Event table whose primary key is a unique timestamp (sharded to avoid hotspotting) and which has eventType and ipAddress columns. We manually keep a separate table (the index, also sharded) called EventByDateIpType wit

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Devaraj Das
Guys, if you want to join the Pow-Wow over phone, here are the details: Phone: 1 (605) 475-6700 Access code: 232-8385 See you all at Hortonworks. On Wed, Aug 29, 2012 at 9:21 PM, Ramkrishna.S.Vasudevan wrote: > > I would be interested on this may be for the Secondary index related > discussion.

Re: Please welcome our newest committer, the Mighty Gregory Chanan!

2012-09-10 Thread Gary Helmling
Welcome, Gregory! Congrats. On Mon, Sep 10, 2012 at 3:32 PM, lars hofhansl wrote: > Welcome Gregory! > > > > > - Original Message - > From: Stack > To: HBase Dev List > Cc: > Sent: Friday, September 7, 2012 2:30 PM > Subject: Please welcome our newest committer, the Mighty Gregory Cha

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Jacques
> > All of my use-cases would require Per-table indexes. Per-region is easier > to keep consistent at write-time, but it seems useless to me for the large > tables that hbase is designed for (because you have to hit every region for > each read). > Can you expound on use cases? The pros and cons

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Matt Corgan
Can indexing be boiled down to these questions to start? 1) Per-region or Per-table 2) Sync or Async 3) Client-managed or Server-managed 4) Schema or Schema-less Definitions: 1) - Per-region: the index entries are stored on the same machine as the primary rows - Per-table: each index is stored i

Re: HBase Developer's Pow-wow.

2012-09-10 Thread lars hofhansl
I'm back from the woods (and yes, I'm already reading the dev list, sigh) :) I'll be back at work tomorrow, but I might have to tie some other knots first. Let's see. I'd also be interested to join the talk about 2ndary indexing. In addition I can talk a bit about - the profiling I did, and may

Re: Please welcome our newest committer, the Mighty Gregory Chanan!

2012-09-10 Thread lars hofhansl
Welcome Gregory! - Original Message - From: Stack To: HBase Dev List Cc: Sent: Friday, September 7, 2012 2:30 PM Subject: Please welcome our newest committer, the Mighty Gregory Chanan! ... or Mr Fine Tooth Comb as I like to call him. In the HBase code base Gregory has scaled heigh

Re: Hbase Assignments in trunk.

2012-09-10 Thread lars hofhansl
I've been saying a while ago that we should require ZK 3.4.x for 0.96+. Distributed consensus without a "transaction" option always rang a bit weird to me. Maybe switch in 0.98+? -- Lars - Original Message - From: n keywal To: dev@hbase.apache.org Cc: Sent: Thursday, September 6, 20

checkAndPut fails when using lock - HBASE-6725

2012-09-10 Thread Nicolas Thiébaud
Hello devs, When calling checkAndPut concurrently with a previously held lock, several client threads may successfully mutate the value. This is due to checkAndMutate that reuses the lock provided by the client (even if the lock isn't on the mutated row) although several requests may be racing wit

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Jacques
> > > The use cases considered, at least over here at TM, all come down to > range scanning over values (e.g. WHERE INTEGER($value) < 50). So we > need a mapping such that a scan over the index returns either lists of > pointers to row:family:qualifier, or the value itself embedded in the > index,

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Jacques
See below On Mon, Sep 10, 2012 at 10:51 AM, Ted Yu wrote: > Jacques: > Thanks for your sharing. > > bq. row-level sharding as opposed to term > > Please elaborate on the above a little more: what is term sharding ? > If an index is basically a value (or term) pointing back to a row, there are t

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Andrew Purtell
On Mon, Sep 10, 2012 at 12:03 AM, Jacques wrote: >- How important is indexing column qualifiers themselves (similar to >Cassandra where people frequently utilize column qualifiers as "values" >with no actual values stored)? It would be good to have a secondary indexing option that can

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Andrew Purtell
Hi Jacques, > Does family level indexing make sense or is the real need for qualifier > level indexing? The use cases considered, at least over here at TM, all come down to range scanning over values (e.g. WHERE INTEGER($value) < 50). So we need a mapping such that a scan over the index returns ei

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Ted Yu
Jacques: Thanks for your sharing. bq. row-level sharding as opposed to term Please elaborate on the above a little more: what is term sharding ? bq. for what I will call a "local shadow family" I like this idea. User may request more than one index. Currently HBase is not so good at serving hig

Jenkins build is back to normal : HBase-0.92 #575

2012-09-10 Thread Apache Jenkins Server

Build failed in Jenkins: HBase-TRUNK-on-Hadoop-2.0.0 #168

2012-09-10 Thread Apache Jenkins Server
See Changes: [nkeywal] HBASE-6746 Impacts of HBASE-6435 vs. HDFS 2.0 trunk [jmhsieh] HBASE-5631 ADDENDUM (extra comments) -- [...truncated 12925 lines...] Forking command line: /bin/s

Build failed in Jenkins: HBase-TRUNK #3320

2012-09-10 Thread Apache Jenkins Server
See Changes: [nkeywal] HBASE-6746 Impacts of HBASE-6435 vs. HDFS 2.0 trunk -- [...truncated 2631 lines...] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.566 sec Running org.apach

Build failed in Jenkins: HBase-0.92 #574

2012-09-10 Thread Apache Jenkins Server
See -- [...truncated 1801 lines...] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.529 sec Running org.apache.hadoop.hbase.regionserver.wal.TestWALActionsListener Tests run: 1, Failures: 0,

Re: Please welcome our newest committer, the Mighty Gregory Chanan!

2012-09-10 Thread Lars George
Awesome, congrats Greg! On Sep 7, 2012, at 11:30 PM, Stack wrote: > ... or Mr Fine Tooth Comb as I like to call him. > > In the HBase code base Gregory has scaled heights -- check out how > much work he has done in hbase already -- but mostly he has been down > plumbing the depths converting al

Re: HBase Developer's Pow-wow.

2012-09-10 Thread Jacques
more food for thought on secondary indexing... *Additional questions*: - How important is indexing column qualifiers themselves (similar to Cassandra where people frequently utilize column qualifiers as "values" with no actual values stored)? - How important is indexing cell timestamp