Re: [Programmatic cluster monitoring] How to use the HBase monitoring APIs

2012-10-02 Thread techbuddy
Thanks for clarifying the usage part, Stack! As for programmatic monitoring, I was trying to figure out how to extend the already available metrics capture mechanism (available for Region server and Master server processes) to dump some *custom* metrics into a file (using the

Re: DoubleColumnInterpreter for my coprocessor

2012-10-02 Thread J Mohamed Zahoor
After some digging, it turned out to be a mix-up with the jar file I loaded. It works flawlessly in both cases now. ./Zahoor On 26-Sep-2012, at 3:35 PM, Julian Wissmann julian.wissm...@sdace.de wrote: DoubleColumnInterpreter

Re: HBase vs. HDFS

2012-10-02 Thread Andrew Purtell
On Tue, Oct 2, 2012 at 9:05 AM, lars hofhansl lhofha...@yahoo.com wrote: You probably executed 120k next() RPCs against your server, unless you enabled scanner caching. (On a related note, we should probably not default this to 1, but something more sensible, like 10 or 100). We use 100. --
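
To put rough numbers on the scanner-caching point above, here is a minimal arithmetic sketch, not HBase code. It assumes the scan covered about 120k rows (inferred from the "120k next() RPCs" remark) and that each next() RPC returns at most `caching` rows; it ignores scanner open/close calls and any final empty fetch. The class and method names are hypothetical.

```java
/** Rough estimate of how many next() RPCs a full scan issues. */
final class ScanRpcEstimate {

    // Assumption: each next() RPC returns at most `caching` rows.
    // Scanner open/close calls and a final empty fetch are not counted.
    static long nextCalls(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 120_000L; // assumed row count, inferred from the thread
        for (int caching : new int[] {1, 10, 100}) {
            System.out.println("caching=" + caching + " -> "
                    + nextCalls(rows, caching) + " next() RPCs");
        }
    }
}
```

Under these assumptions, moving from a caching value of 1 to 100 cuts the round trips from 120,000 to 1,200, which is consistent with the later report that setting a cache size "helped a great deal".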

Re: HBase vs. HDFS

2012-10-02 Thread Doug Meil
Hi there, Another thing to consider on top of the scan-caching is that HBase is doing more in the process of scanning the table. See... http://hbase.apache.org/book.html#conceptual.view http://hbase.apache.org/book.html#regions.arch ... Specifically, processing the KeyValues,

Re: HBase vs. HDFS

2012-10-02 Thread gordoslocos
Thank you all! Setting a cache size helped a great deal. It's still slower though. I think the overhead of processing the data from the table might be the cause. I guess if HBase adds a layer of indirection on top of HDFS then it makes sense that it'd be slower, right? On

Re: HBase vs. HDFS

2012-10-02 Thread Doug Meil
If you take HBase out of it and think of it from the standpoint of 2 programs, one of which opens a file and writes the output to another file, and the other one which actually processes each row and then writes out results, the 2nd one is going to be slower because it's doing more, ceteris

Re: long garbage collecting pause

2012-10-02 Thread Damien Hardy
Hello 2012/10/2 Marcos Ortiz mlor...@uci.cu Another thing that I'm seeing is that one of your main processes is compaction, so you can optimize all this by increasing the size of your regions (by default the size of a region is 256 MB), but you will have in your hands a split/compaction storm

Re: long garbage collecting pause

2012-10-02 Thread Michael Segel
You really don't want to go to 20GB. Without knowing the number of regions... going beyond 1-2 GB may cause more headaches than it's worth. Sorry, but I tend to be very cautious when it comes to tuning. -Mike On Oct 2, 2012, at 9:20 AM, Damien Hardy dha...@viadeoteam.com wrote: Hello

HBase table row key design question.

2012-10-02 Thread Jason Huang
Hello, I am designing an HBase table for users and hope to get some suggestions for my row key design. Thanks... This user table will have columns which include user information such as names, birthday, gender, address, phone number, etc... The first time a user comes to us we will ask all these

HBase User Group in Paris

2012-10-02 Thread n keywal
Hi all, I was wondering how many HBase users there are in Paris (France...). Would you guys be interested in participating in a Paris-based user group? The idea would be to share HBase practices, with something like a meet-up per quarter. Reply to me directly or on the list, as you prefer.

RE: HAcid: multi-row transactions in HBase

2012-10-02 Thread de Souza Medeiros Andre
Hi Lars, That's an interesting observation, I hadn't thought before about scans in HAcid. Your suggestion for a solution is really close to what I would do: implement HAcidScan as an HBase Scan that filters according to the cache column. Thanks, I will add this feature to the to-do list. --

Re: HBase User Group in Paris

2012-10-02 Thread Bertrand Dechoux
For information, there is a Hadoop User Group France in Paris. https://twitter.com/hugfrance You might want to get in touch. HBase is clearly on topic. Regards Bertrand On Tue, Oct 2, 2012 at 4:32 PM, n keywal nkey...@gmail.com wrote: Hi all, I was wondering how many HBase users there

Re: HBase User Group in Paris

2012-10-02 Thread Bertrand Dechoux
And we are looking for speakers. Talks do not need to be formal/theoretical presentations; feedback on your own experience is welcome too. You can submit proposals on the website: http://hugfrance.fr/ Regards Bertrand On Tue, Oct 2, 2012 at 4:56 PM, Bertrand Dechoux decho...@gmail.com wrote:

Re: long garbage collecting pause

2012-10-02 Thread Greg Ross
Thanks for the suggestions. I was attempting to tune the GC via mapred.child.java.opts in the job's Oozie config instead of in hbase-env.sh. I think this is why my efforts were to no avail. It was likely having no effect on the read/write performance. Is there any way of specifying job-specific

Re: long garbage collecting pause

2012-10-02 Thread Marcos Ortiz
On 02/10/2012 11:32, Greg Ross wrote: Thanks for the suggestions. I was attempting to tune the GC via mapred.child.java.opts in the job's Oozie config instead of in hbase-env.sh. I think this is why my efforts were to no avail. It was likely having no effect on the read/write performance.

Re: HBase User Group in Paris

2012-10-02 Thread Adrien Mogenet
I'm in! On Tue, Oct 2, 2012 at 5:21 PM, Bertrand Dechoux decho...@gmail.com wrote: And we are looking for speakers. Talks do not need to be formal/theoretical presentations; feedback on your own experience is welcome too. You can submit proposals on the website: http://hugfrance.fr/

Re: HBase table row key design question.

2012-10-02 Thread Doug Meil
Hi there, while this isn't an answer to some of the specific design questions, this chapter in the RefGuide can be helpful for general design: http://hbase.apache.org/book.html#schema On 10/2/12 10:28 AM, Jason Huang jason.hu...@icare.com wrote: Hello, I am designing an HBase table for

Re: HBase table row key design question.

2012-10-02 Thread Jason Huang
Thanks Mohammad. The issue with the phone number is that it tends to change over time, and we think name and DOB are more reliable. SSN is more likely to be unique, but the issue is that we can't force the user to provide it. Basically we have limited information that can be used. thanks, Jason On Tue, Oct 2,

Re: Problem with recreation of a phantom table

2012-10-02 Thread yuzhihong
Can you try using hbck? In the future, don't remove anything before using hbck. Thanks On Oct 2, 2012, at 3:55 PM, Shumin Wu shumin...@gmail.com wrote: Hi, I am using HBase 0.92 and got stuck with deletion/recreation of a phantom table. The table became phantom because hbase server

HBase: small WAL transactions Q

2012-10-02 Thread Alex Baranau
Hello, Maybe a silly question. Data in the WAL is written in small transactions. One transaction is a set of KeyValues for a specific (single) row. As we want each written transaction to be durable we write them into the WAL one-by-one (ideally with FS sync() calls, etc. on each write). Which is very

Re: HBase: small WAL transactions Q

2012-10-02 Thread lars hofhansl
This is an interesting observation. I have not thought about HBASE-5229 in terms of a performance improvement. Currently HRegion.mutateRowsWithLocks actually acquires locks on all rows first (since the contract here is a transaction), so (currently) you would get unnecessarily reduced

Re: HBase: small WAL transactions Q

2012-10-02 Thread Alex Baranau
Currently HRegion.mutateRowsWithLocks actually acquires locks on all rows first (since the contract here is a transaction), so (currently) you would get unnecessarily reduced concurrency using that API for changes that do not need to be atomic. Right, it's about unnecessarily reduced

Re: HBase: small WAL transactions Q

2012-10-02 Thread Ted Yu
That person should have been Lars, I think. On Tue, Oct 2, 2012 at 7:04 PM, Alex Baranau alex.barano...@gmail.com wrote: Currently HRegion.mutateRowsWithLocks actually acquires locks on all rows first (since the contract here is a transaction), so (currently) you would get unnecessarily

Re: [Programmatic cluster monitoring] How to use the HBase monitoring APIs

2012-10-02 Thread Otis Gospodnetic
Hi, Have a look at https://github.com/sematext/HBaseMetricsContext + http://blog.sematext.com/2011/07/31/extending-hadoop-metrics/ -- this may lead you in the right direction if you really really need to do this although I'm not sure if that stuff is outdated now. If you are just trying to

Re: HBase: small WAL transactions Q

2012-10-02 Thread lars hofhansl
Heh, yes. See HDFS-744 and HBASE-5954. And re: doMiniBatchMutation in HRegion, it does write multiple Puts (even for different row keys) into a single WALEdit. -- Lars From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Sent: Tuesday, October 2,

Re: [Programmatic cluster monitoring] How to use the HBase monitoring APIs

2012-10-02 Thread techbuddy
Thanks for that link, Otis! This indeed allows completely overriding the default monitoring by HBase; however, what we are really looking at is capturing some additional metrics over and above what the monitoring is already generating. So, we figured a way to achieve that through co-processors as

Re: [Programmatic cluster monitoring] How to use the HBase monitoring APIs

2012-10-02 Thread Stack
On Mon, Oct 1, 2012 at 10:52 PM, techbuddy techbuddy...@gmail.com wrote: As for programmatic monitoring, I was trying to figure out how to extend the already available metrics capture mechanism (available for Region server and Master server processes) to dump some *custom* metrics into a file

RE: Column Qualifier space requirements

2012-10-02 Thread Anoop Sam John
It means that in order to save space I need to use the smallest Column Qualifier (and sometimes it makes sense)... Yes. However, why is the Column Family (byte array) repeated for each KeyValue? Is it physically repeated for each cell? Yes, the CF byte[] is also physically stored in every cell (every KV). At
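
To make the space argument above concrete, here is a rough size sketch rather than an HBase API call. It assumes the classic uncompressed KeyValue layout as I understand it: key length (4 bytes), value length (4), row length (2), row, family length (1), family, qualifier, timestamp (8), key type (1), value. It ignores compression, data block encoding and tags. The class name, method names and sample row/column names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Back-of-the-envelope size of one cell in the assumed KeyValue layout. */
final class KeyValueSizeSketch {

    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1)
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Row, family and qualifier bytes are stored in every cell under this layout.
    static long cellSize(String row, String family, String qualifier, int valueLength) {
        return FIXED_OVERHEAD + utf8Length(row) + utf8Length(family)
                + utf8Length(qualifier) + valueLength;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical names; only the lengths matter for the estimate.
        long verbose = cellSize("user123", "info", "name", 10);
        long compact = cellSize("user123", "i", "n", 10);
        System.out.println("verbose names: " + verbose + " bytes per cell");
        System.out.println("compact names: " + compact + " bytes per cell");
    }
}
```

With these sample lengths the per-cell difference is 6 bytes (45 versus 39), which only matters when multiplied across many cells; that is why short family and qualifier names "sometimes make sense".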