RE: Essential column family performance

2013-04-09 Thread Anoop Sam John
Good finding Lars & team :) -Anoop-

Re: Efficient way to use different storage medium

2013-04-09 Thread Ted Yu
This one is under active discussion: HDFS-4672 Support tiered storage policies. Cheers

Re: Efficient way to use different storage medium

2013-04-09 Thread ramkrishna vasudevan
Hi, interesting topic, and we have JIRAs already raised for such a feature, but the work is still in progress: https://issues.apache.org/jira/browse/HBASE-6572 https://issues.apache.org/jira/browse/HDFS-2832 Regards, Ram

Re: Essential column family performance

2013-04-09 Thread lars hofhansl
That part did not show up in the profiling session. It was just the unnecessary seek that slowed it all down. -- Lars

Re: Essential column family performance

2013-04-09 Thread Ted Yu
Looking at populateFromJoinedHeap():

KeyValue kv = populateResult(results, this.joinedHeap, limit,
    joinedContinuationRow.getBuffer(), joinedContinuationRow.getRowOffset(),
    joinedContinuationRow.getRowLength(), metric);
...
Collections.sort(results, comparator);

Re: Essential column family performance

2013-04-09 Thread Ted Yu
bq. with only 1 rows that would all fit in the memstore. This aspect should be enhanced in the test. Cheers

Re: Essential column family performance

2013-04-09 Thread Lars Hofhansl
Also the unittest tests with only 1 rows that would all fit in the memstore. Seek vs reseek should make little difference for the memstore. We tested with 1m and 10m rows, and flushed the memstore and compacted the store. Will do some more verification later tonight. -- Lars

Re: Essential column family performance

2013-04-09 Thread Lars H
Your slow scanner performance seems to vary as well. How come? Slow is with the feature off. I don't see how reseek can be slower than seek in any scenario. -- Lars

Re: Too many open files (java.net.SocketException)

2013-04-09 Thread Ted
You might also want to check what file-max is: more /proc/sys/fs/file-max. I just checked on my Fedora and Ubuntu systems and it appears they default to 785130 and 2452636 respectively, so you might not want to accidentally decrease those numbers.
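A quick way to compare the system-wide ceiling with the per-process limit, as a sketch for Linux (what the RegionServer actually inherits depends on the launching user's limits, not on your interactive shell):

```shell
# System-wide ceiling on open file descriptors (Linux)
cat /proc/sys/fs/file-max

# Per-process soft limit for the current shell; the RegionServer's own
# limit is whatever the account that launched it was granted
ulimit -Sn
```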

Re: Essential column family performance

2013-04-09 Thread Ted Yu
I tried using reseek() as suggested, along with my patch from HBASE-8306 (30% selection rate, random distribution and FAST_DIFF encoding on both column families). I got uneven results: 2013-04-09 16:59:01,324 INFO [main] regionserver.TestJoinedScanners(167): Slow scanner finished in 7.529083 seco

Re: Essential column family performance

2013-04-09 Thread lars hofhansl
We did some tests here. I ran this through the profiler against a local RegionServer and found that the part causing the slowdown is a seek called here:

boolean mayHaveData =
    (nextJoinedKv != null && nextJoinedKv.matchingRow(currentRow, offset, length))
    ||

Re: Hbase question

2013-04-09 Thread Ted Yu
Gary has provided nice summary of things to watch out for. One more thing I want to mention is that care should be taken w.r.t. coordinating the progress of the thread pool and normal region operations. There're already many threads running in the region server JVM. Adding one more thread pool may
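Ted's caution above — many threads already run in the region server JVM — suggests sharing one bounded pool across all coprocessor instances rather than creating a pool per region. A minimal sketch in plain Java of that pattern (the class name and pool size are illustrative assumptions, not HBase API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: one bounded, daemon-thread pool shared by every
// coprocessor instance in the JVM, so N regions do not create N pools.
final class SharedUpdatePool {
    private static final int POOL_SIZE = 4; // assumption: tune for your workload

    private static final ExecutorService POOL =
        Executors.newFixedThreadPool(POOL_SIZE, new ThreadFactory() {
            private final AtomicInteger n = new AtomicInteger();
            @Override public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "coproc-update-" + n.incrementAndGet());
                t.setDaemon(true); // don't keep the region server JVM alive
                return t;
            }
        });

    private SharedUpdatePool() {}

    static ExecutorService get() {
        return POOL;
    }
}
```

A RegionObserver hook could then submit its asynchronous structure update via SharedUpdatePool.get().submit(...), and all region-level instances would share the same fixed set of threads.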

Re: HBase tasks

2013-04-09 Thread Pavel Hančar
Hi, thanks for the answer. Yes I meant in-memory column family. But please, does it matter if I have two column families in separate tables or not? Or is it somehow stupid to have a table with only one CF? I have one column family with pictures and the other with two columns. In the first there ar

Re: Hbase question

2013-04-09 Thread Gary Helmling
Hi Rami, One thing to note for RegionObservers is that each table region gets its own instance of each configured coprocessor. So if your cluster has N regions per region server, with your RegionObserver loaded on all tables, then each region server will have N instances of your coprocessor. Yo

Re: Hbase question

2013-04-09 Thread Ted Yu
Rami: Can you tell us what coprocessor hook you plan to use? Thanks

RE: Hbase question

2013-04-09 Thread Rami Mankevich
First of all - thanks for the quick response. Basically the threads I want to open are for my own internal structure updates and I guess have no relation to HBase internal structures. All I want is to initiate some asynchronous structure updates as part of coprocessor execution in order not

Re: ANN: HBase Refcard available

2013-04-09 Thread Otis Gospodnetic
Hi Ted, Uh, it's been a while. CCing Alex, maybe he'll know? Otis -- HBASE Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

Re: ANN: HBase Refcard available

2013-04-09 Thread Doug Meil
You beat me to it! :-) I just realized that right when I hit enter on my previous email.

Re: ANN: HBase Refcard available

2013-04-09 Thread Doug Meil
I've been trying to keep pointers to stuff like that in the appendix in "other info about hbase": http://hbase.apache.org/book.html#other.info

Re: ANN: HBase Refcard available

2013-04-09 Thread Ted Yu
Otis: Thanks for your effort. Can you tell me which HBase version the refcard targets ? On Wed, Aug 8, 2012 at 4:14 PM, Otis Gospodnetic wrote: > Hi, > > We wrote an HBase Refcard and published it via DZone. Here is our very > brief announcement: > http://blog.sematext.com/2012/08/06/announci

Re: ANN: HBase Refcard available

2013-04-09 Thread Otis Gospodnetic
Hi Stack (cleaning your inbox? ;)) Looks like Doug did it a while back - https://issues.apache.org/jira/browse/HBASE-6574 ? Otis

Re: ANN: HBase Refcard available

2013-04-09 Thread Stack
Make a patch for the reference guide that points to this Otis? Or just tell me where to insert? Thanks, St.Ack

Re: Interactions between max versions and filters

2013-04-09 Thread Christophe Taton
Hi, Thanks for all your answers, that was very helpful. It appears we were using the non-intended behavior of the ColumnPaginationFilter. I now understand that:
- max versions applies post-filtering.
- ColumnPaginationFilter forces max-versions to 1 (and so does ColumnCountGetFilter).
From some

Re: Efficient way to use different storage medium

2013-04-09 Thread Stack
What kind of workload do you intend to run on these machines? Do you have enough

Re: Too many open files (java.net.SocketException)

2013-04-09 Thread Andrew Purtell
Correct, nproc has nothing to do with file table issues. I typically do something like this when setting up a node:

echo "@hadoop soft nofile 65536" >> /etc/security/limits.conf
echo "@hadoop hard nofile 65536" >> /etc/security/limits.conf

where all accounts launching Hadoop daemons are in the '

Re: Too many open files (java.net.SocketException)

2013-04-09 Thread Jean-Marc Spaggiari
But there was not any trace looking like "OutOfMemoryError". Would hitting the nproc limit have resulted in that, rather than a SocketException? Anyway, I have increased it to 32768. I will see if I face that again. Thanks, JM

Re: Too many open files (java.net.SocketException)

2013-04-09 Thread Ted Yu
According to http://hbase.apache.org/book.html#ulimit , you should increase the nproc setting. Cheers

Too many open files (java.net.SocketException)

2013-04-09 Thread Jean-Marc Spaggiari
Hi, I just faced an issue this morning on one of my RS. Here is an extract of the logs:

2013-04-09 11:05:33,164 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/entry_proposed/ae4a5d72d4613728ddbcc5a64262371b/.tmp/ed6a0154ef714cd88faf26061cf248d3 : java.net.SocketException: To
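When diagnosing this, it helps to see how many descriptors the process actually holds and compare that against its limit. A sketch for Linux, using the current shell's PID as a stand-in (for the RegionServer you would substitute the PID reported by jps):

```shell
# Stand-in PID; for the RS use something like: pid=$(jps | awk '/HRegionServer/{print $1}')
pid=$$

# Number of file descriptors the process currently holds
ls /proc/"$pid"/fd | wc -l

# The nofile limit the process is actually running with
grep 'Max open files' /proc/"$pid"/limits
```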

java.io.FileNotFoundException: File does not exist: /usr/lib/zookeeper/zookeeper-3.4.3-cdh4.0.1.jar

2013-04-09 Thread Dhanasekaran Anbalagan
Hi Guys, I am trying to execute the sample PerformanceEvaluation test program on HBase, but it does not work properly: hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 100 It says: 13/04/09 08:50:43 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:j

Re: Efficient way to use different storage medium

2013-04-09 Thread Ted
Please take a look at https://issues.apache.org/jira/browse/HBASE-7404 , where the bucket cache can be configured as a secondary cache to utilize the speed of your SSD device. Cheers
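For reference, a hedged sketch of the hbase-site.xml settings introduced by HBASE-7404 for a file-backed bucket cache (the property names should be checked against the documentation for your HBase version, and the SSD path and size are illustrative assumptions):

```xml
<!-- Back the L2 bucket cache with a file on the SSD (path is illustrative) -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/mnt/ssd/hbase-bucketcache.data</value>
</property>
<!-- Bucket cache size (value is illustrative; size it to your SSD) -->
<property>
  <name>hbase.bucketcache.size</name>
  <value>8192</value>
</property>
```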

Re: Efficient way to use different storage medium

2013-04-09 Thread Jean-Marc Spaggiari
Hi Bing, If you mount all your drives into HDFS, some blocks are going to be on SSD and some on regular drives. So some reads are going to be fast, and others are going to be slow. On a single machine, I don't think you can specify which table will be on which drive since the blocks are goin

Efficient way to use different storage medium

2013-04-09 Thread Bing Jiang
hi, There are some physical machines, each of which contains a large SSD (2T) and a general disk (4T), and we want to build our HDFS and HBase environment on them. If we use all the storage (6T) each machine provides, I want to know whether it is an efficient way to take advantage of the SSD, or provide different

HBaseStorage. Inconsistent result.

2013-04-09 Thread Eugene Morozov
Hello everyone. I have the following script:

pages = LOAD 'hbase://mmpages' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('t:d', '-loadKey');
pages2 = FOREACH pages GENERATE $0;
pages3 = DISTINCT pages2;
g_pages = GROUP pages3 all PARALLEL 1;
s_pages = FOREACH g_pages GENERATE 'count', COUNT(