Scan vs Put vs Get

2012-06-27 Thread Jean-Marc Spaggiari
Hi, I have a small piece of code, for testing, which puts 1B lines into an existing table, gets 3000 lines, and scans 1. The table has one family and one column. Everything is done randomly. Puts use a random key (24 bytes), fixed family and column names, and random content (24 byte...
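The test data described above can be generated without any HBase dependency. A minimal sketch follows. `RandomTestData` and its methods are hypothetical names, and the fixed seed is an assumption added so that runs can be repeated. In the actual test each key/value pair would be written with a Put under the fixed family and column.

```java
import java.util.Random;

// Hypothetical generator for the random test data described in the thread:
// 24-byte random row keys and 24-byte random cell values.
class RandomTestData {
    static final int KEY_LENGTH = 24;   // row key size stated in the thread
    static final int VALUE_LENGTH = 24; // value size stated in the thread

    private final Random random;

    RandomTestData(long seed) {
        // Fixed seed (an assumption) so two runs produce the same keys and values.
        this.random = new Random(seed);
    }

    // Returns a new random 24-byte row key.
    byte[] nextKey() {
        byte[] key = new byte[KEY_LENGTH];
        random.nextBytes(key);
        return key;
    }

    // Returns a new random 24-byte cell value.
    byte[] nextValue() {
        byte[] value = new byte[VALUE_LENGTH];
        random.nextBytes(value);
        return value;
    }
}
```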

RE: Scan vs Put vs Get

2012-06-27 Thread Anoop Sam John
...@spaggiari.org] Sent: Thursday, June 28, 2012 5:04 AM To: user Subject: Scan vs Put vs Get Hi, I have a small piece of code, for testing, which is putting 1B lines in an existing table, getting 3000 lines and scanning 1. The table is one family, one column. Everything is done randomly. Put...

Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 5:04 AM To: user Subject: Scan vs Put vs Get Hi, I have a small piece of code, for testing, which is putting 1B lines in an existing table, getting 3000 lines and scannin...

RE: Scan vs Put vs Get

2012-06-28 Thread Ramkrishna.S.Vasudevan
...June 28, 2012 2:00 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Hi Jean-Marc, Interesting :-) In addition to Anoop's questions: What's the HBase version you're using? Is it repeatable, I mean if you try twice the...

Re: Scan vs Put vs Get

2012-06-28 Thread Jean-Marc Spaggiari
// co BatchExample-9-Dump Print all results. } 2012/6/28, Ramkrishna.S.Vasudevan: Hi, You can also check the cache hit and cache miss statistics that appear on the UI? In your random scan, how many regions are scanned, whereas in gets may be many due to...
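The hit and miss counters mentioned here are easier to compare between the scan run and the Get run as a single hit ratio, hits / (hits + misses). A minimal sketch; `CacheStats` is a hypothetical helper, and the counter values would come from the region server UI or metrics, not from this code.

```java
// Hypothetical helper: turns block cache hit/miss counters into a hit ratio.
class CacheStats {
    // Returns hits / (hits + misses), or 0.0 when there were no requests at all.
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        if (total == 0) {
            return 0.0; // avoid dividing by zero before any request was served
        }
        return (double) hits / total;
    }
}
```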

RE: Scan vs Put vs Get

2012-06-28 Thread Ramkrishna.S.Vasudevan
Sent: Thursday, June 28, 2012 4:44 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Wow. First, thanks a lot all for jumping into this. Let me try to reply to everyone in a single post. "How many Gets you batch together in one call" I tr...
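For context on the batching question: a list of Gets is usually split into batches, and each batch is sent in one call (for example `HTable#get(List<Get>)`) rather than one RPC per row. The sketch below only does the splitting; `GetBatcher` and the batch size are hypothetical, since the thread does not say which batch size was used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits row keys into batches of at most batchSize,
// so each batch can be sent as one multi-get call instead of one call per key.
class GetBatcher {
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

With the 3000 random keys from the test and a batch size of 700, this yields four batches of 700 and a final batch of 200.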

Re: Scan vs Put vs Get

2012-06-28 Thread Jean-Marc Spaggiari
...[mailto:jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 4:44 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Wow. First, thanks a lot all for jumping into this. Let me try to reply to everyone in a single post...

RE: Scan vs Put vs Get

2012-06-28 Thread Anoop Sam John
...type for your CF with HColumnDescriptor#setBloomFilterType(). You can check with type BloomType.ROW. -Anoop- From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 5:42 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get Oh! I ne...
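A row Bloom filter (BloomType.ROW) answers "is this row key possibly in this store file?" A "no" is always correct, so a Get can skip that file; a "yes" may be a false positive. The following is a simplified model of that idea, not HBase's implementation. The class name, the bit-array size, the number of hash functions, and the double-hashing scheme are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

// Simplified row Bloom filter model (NOT HBase's implementation).
// mightContain() == false: the key was never added, so a reader could skip the file.
// mightContain() == true: the key may be present (false positives are possible).
class RowBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Records a row key by setting numHashes bits.
    void add(byte[] rowKey) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(rowKey, i));
        }
    }

    // Returns false as soon as one of the key's bits is unset.
    boolean mightContain(byte[] rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: position i is (h1 + i * h2) mod numBits.
    private int index(byte[] rowKey, int i) {
        int h1 = Arrays.hashCode(rowKey);
        int h2 = fnv1a(rowKey) | 1; // forced odd so the step is non-zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a hash, used as the second hash function.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

In HBase itself the filter is enabled per column family through HColumnDescriptor#setBloomFilterType() with BloomType.ROW, as suggested here; the model only shows why a negative answer lets a Get avoid reading a file.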

Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
...lue)results[i]).isEmptyColumn()) System.out.println("Result[" + i + "]: " + results[i]); // co BatchExample-9-Dump Print all results. } 2012/6/28, Ramkrishna.S.Vasudevan: Hi, You can also check the cac...

Re: Scan vs Put vs Get

2012-06-28 Thread Jean-Marc Spaggiari
...tor#setBloomFilterType(). You can check with type BloomType.ROW. -Anoop- From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Thursday, June 28, 2012 5:42 PM To: user@hbase.apache.org Subject: Re: Scan vs Put vs Get...

Re: Scan vs Put vs Get

2012-06-28 Thread Jean-Marc Spaggiari
...true. This will enable the usage of bloom globally. Now you need to set the bloom type for your CF with HColumnDescriptor#setBloomFilterType(). You can check with type BloomType.ROW. -Anoop-...

Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
It seems you are getting the blocks from the cache. You can also check once with Blooms. You can enable the usage of bloom using the config param "io.storefile.bloom.enabled" set to true. This will enable the usage of bloom globally...

Re: Scan vs Put vs Get

2012-06-28 Thread Jean-Marc Spaggiari
Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I mean, bad that I did not figure that out. Thanks for pointing that out. That definitely explains the difference in performance. I have activated the Bloom filters with this code: HBaseAdmin admin = new HBaseAdmin(config); HTable table =...
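Calling setFilter a second time on the same Scan replaces the first filter instead of combining the two, which is why KeyOnlyFilter silently overwrote RandomRowFilter. In HBase the usual way to apply both is a FilterList with FilterList.Operator.MUST_PASS_ALL. To keep the example runnable without HBase, the sketch below uses hypothetical stand-in types (`TableRow`, `MiniFilter`, `MiniScan`, `MiniFilterList`) that only model this replace-versus-combine behavior; they are not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical row model: a key and a value.
final class TableRow {
    final String key;
    final String value;

    TableRow(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

// Hypothetical filter: returns the (possibly rewritten) row, or null to drop it.
// A simplification of HBase's Filter interface, for illustration only.
interface MiniFilter extends UnaryOperator<TableRow> {}

// Hypothetical stand-in for a scan that holds a single filter.
class MiniScan {
    private MiniFilter filter = row -> row; // default: pass every row through

    // Like Scan#setFilter: the new filter replaces the previous one.
    void setFilter(MiniFilter filter) {
        this.filter = filter;
    }

    List<TableRow> run(List<TableRow> rows) {
        List<TableRow> out = new ArrayList<>();
        for (TableRow row : rows) {
            TableRow kept = filter.apply(row);
            if (kept != null) {
                out.add(kept);
            }
        }
        return out;
    }
}

// Models FilterList with MUST_PASS_ALL: filters run in order,
// and the row is dropped as soon as any filter drops it.
class MiniFilterList {
    static MiniFilter mustPassAll(MiniFilter... filters) {
        return row -> {
            TableRow current = row;
            for (MiniFilter f : filters) {
                current = f.apply(current);
                if (current == null) {
                    return null;
                }
            }
            return current;
        };
    }
}
```

With a row-dropping filter set first and a key-only filter (one that blanks values but keeps every row) set second, only the second takes effect and every row comes back; wrapping both in the MUST_PASS_ALL stand-in applies the row dropping and the value stripping together.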

Re: Scan vs Put vs Get

2012-06-28 Thread N Keywal
For the filter list, my guess is that you're filtering out all rows because RandomRowFilter#chance is not initialized (it should be something like RandomRowFilter rrf = new RandomRowFilter(0.5f);). But note that this test will never be comparable to the test with a list of gets. You can make it as slo...
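The diagnosis can be illustrated with a simplified model of random row sampling: each row is kept when a uniformly drawn float in [0, 1) is below `chance`. A Java float field that is never assigned holds 0.0f, and no draw is ever below 0, so every row is filtered out. This mirrors the idea behind RandomRowFilter but is not its source code; `RowSampler` and the fixed seed are hypothetical.

```java
import java.util.Random;

// Simplified model of random row sampling (not HBase's RandomRowFilter source).
// A row is kept when random.nextFloat() < chance; nextFloat() returns a value in [0, 1).
class RowSampler {
    private final float chance;  // an unassigned float field would default to 0.0f
    private final Random random;

    RowSampler(float chance, long seed) {
        this.chance = chance;
        this.random = new Random(seed); // fixed seed only to make runs repeatable
    }

    // Decides whether the current row survives the sampling.
    boolean keepRow() {
        return random.nextFloat() < chance;
    }

    // Counts how many of totalRows survive.
    int countKept(int totalRows) {
        int kept = 0;
        for (int i = 0; i < totalRows; i++) {
            if (keepRow()) {
                kept++;
            }
        }
        return kept;
    }
}
```

A chance of 0 keeps nothing, a chance of 1 keeps everything, and values in between keep roughly that fraction of the rows scanned.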

Re: Scan vs Put vs Get

2012-06-28 Thread Jean-Marc Spaggiari
Oh, sorry. You're right. You already said that and I forgot to update it. It's working fine when I add this parameter. And as you are saying, I can get the response time I want by playing with the chance... I get 34758 lines/second with 0.99 as the chance, and only 7564 lines/second with 0.09