Re: Wrong input split locations after enabling reverse DNS

2013-01-28 Thread Robert Dyer
Just to follow up here, I did manage to test a patch on TableInputFormatBase.java and it resolved my issue. I filed https://issues.apache.org/jira/browse/HBASE-7693 and will attach the patch as soon as my Git updates. On Mon, Dec 17, 2012 at 8:45 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:

Re: Storing images in Hbase

2013-01-28 Thread Adrien Mogenet
Could HCatalog be an option? On Jan 26, 2013 at 21:56, Jack Levin magn...@gmail.com wrote: AFAIK, namenode would not like tracking 20 billion small files :) -jack On Sat, Jan 26, 2013 at 6:00 PM, S Ahmed sahmed1...@gmail.com wrote: That's pretty amazing. What I am confused is, why

New htable slow

2013-01-28 Thread Lsshiu
Hi, In my 0.90.6 HBase environment, each time I create a new HTable it is slow. Puts are slow too. Can I turn on some kind of trace to learn the exact time spent in each function call? Thanks.

Re: New htable slow

2013-01-28 Thread Mohammad Tariq
Hi there, Do you have too many HTable instances open simultaneously? It's not advisable to do so. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com

Indexing Hbase Data

2013-01-28 Thread Mohammad Tariq
Hello list, I would like some suggestions on HBase data indexing. What would you prefer? I have never faced such a requirement till now. This is the first time there is a need for indexing, so I thought of getting some expert comments and suggestions. Thank you so much for your

Re: Pre-split Region Boundaries

2013-01-28 Thread Amit Sela
We are pre-splitting our tables before bulk loading too, but we don't use the RegionSplitter. We split manually (we did some testing and found the optimal split points) by putting a new HRegionInfo into the .META. table, assigning that region (HBaseAdmin.assign(region name)) and after you finish

Re: Indexing Hbase Data

2013-01-28 Thread Viral Bajaria
When you say indexing, are you referring to indexing the column qualifiers or the values that you are storing in the qualifiers? Regarding indexing, I remember someone recommended this on the mailing list before: https://github.com/ykulbak/ihbase/wiki but it seems the development on that is

Re: Indexing Hbase Data

2013-01-28 Thread Mohammad Tariq
Hello Viral, Thank you so much for the quick response. The intention is to index the values. I'll have a look at ihbase. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com

Re: New htable slow

2013-01-28 Thread Lsshiu
Hi Tariq, Thanks for the tip, but I only opened a few HTable instances. The total number of regions was quite large (more than 7 ) though.

Re: Indexing Hbase Data

2013-01-28 Thread ramkrishna vasudevan
As a POC, just try to load the data into another table whose rowkey is the original row's value. Scan the index table first and then get the main table's row key. This should help at first; later you can improve it by using coprocessors. Regards Ram On Mon, Jan 28, 2013 at
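Ram's two-step lookup can be sketched with in-memory sorted maps standing in for the two HBase tables. This is only a simulation of the pattern; all class, table, and key names here are made up, and a real implementation would write to and read from actual HBase tables:

```java
import java.util.TreeMap;

// Simulates the index-table POC: the "index" table's rowkey is the
// original row's value, and its cell stores the main table's rowkey.
public class ValueIndexSketch {
    // "main" table: rowkey -> value
    static TreeMap<String, String> main = new TreeMap<>();
    // "index" table: value -> main rowkey
    static TreeMap<String, String> index = new TreeMap<>();

    static void put(String rowkey, String value) {
        main.put(rowkey, value);
        index.put(value, rowkey); // second write keeps the index current
    }

    // Lookup by value: hit the index first, then fetch the main row.
    static String findRowByValue(String value) {
        return index.get(value);
    }

    public static void main(String[] args) {
        put("row1", "alpha");
        put("row2", "beta");
        System.out.println(findRowByValue("beta")); // prints row2
    }
}
```

A coprocessor (as Ram hints) could later maintain the second write automatically on every put to the main table.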

Re: New htable slow

2013-01-28 Thread Mohammad Tariq
Use HTablePool instead and see if it gives you better performance. Creating an HTable instance is a fairly expensive operation that takes a few seconds to complete. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Jan 28, 2013 at 5:35 PM, Lsshiu lss...@gmail.com

Re: Indexing Hbase Data

2013-01-28 Thread Mohammad Tariq
Thank you for the valuable reply, sir. Actually I tried that and it works fine, but we need faster results. I was thinking of creating an index and keeping it loaded in memory at all times so that fetches are faster. Is there any OOTB feature available in co-proc? Warm Regards, Tariq

Re: New htable slow

2013-01-28 Thread Lsshiu
Hi Tariq, Thanks for the update, I'll try it. Btw, will the put operation show any performance difference between HTable and HTablePool?

Re: Indexing Hbase Data

2013-01-28 Thread Jean-Marc Spaggiari
Hi Mohammad, I don't really see how you can get faster results than indexing the content as the row key in another table. Access is direct after that. What do you mean by faster results? Building the index? Or reading through it? JM 2013/1/28, Mohammad Tariq donta...@gmail.com: Thank you for

Re: How to get coprocessor list by client API

2013-01-28 Thread Jean-Marc Spaggiari
Hi Kyle, If you are not running a production cluster, you might think about getting the latest 0.94.4 source code, applying HBASE-7654 and deploying it. That way you can use getCoprocessors, which will return the list you are looking for... JM 2013/1/28, Kyle Lin kylelin2...@gmail.com:

Re: New htable slow

2013-01-28 Thread Mohammad Tariq
It does. These excerpts from HBase: The Definitive Guide might help you further: HTablePool: Instead of creating an HTable instance for every request from your client application, it makes much more sense to create one initially and subsequently reuse them. The primary reason for doing so
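The reuse idea behind HTablePool can be illustrated with a generic object pool: pay the expensive construction cost once and hand the same instances back out. This is a minimal, made-up sketch of the pattern, not HBase's actual HTablePool implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative pool: borrow() reuses idle instances instead of
// constructing a new (expensive) one for every request.
public class PoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    int created = 0; // counts how many expensive constructions happened

    public PoolSketch(Supplier<T> factory) { this.factory = factory; }

    public synchronized T borrow() {
        if (idle.isEmpty()) { created++; return factory.get(); }
        return idle.pop();
    }

    public synchronized void release(T t) { idle.push(t); }
}
```

With HTable standing in for T, repeated borrow/release cycles would amortize the multi-second setup cost Tariq mentions across many requests.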

Re: Short-circuit reads

2013-01-28 Thread Jean-Marc Spaggiari
Thanks J-D. I found it with JConsole in hadoop/HBase/RegionServerStatistics/Attributes. JM 2013/1/27, Jean-Daniel Cryans jdcry...@apache.org: It's in the region server metrics and also published through JMX. J-D On Sun, Jan 27, 2013 at 2:55 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org

Re: New htable slow

2013-01-28 Thread Lsshiu
Hi Tariq, I'll try that and update the result, thanks.

how to model data based on time bucket

2013-01-28 Thread Oleg Ruchovets
Hi, I have the following row data structure:

event_id | time
event1   | 10:07
event2   | 10:10
event3   | 10:12
event4   | 10:20
event5   | 10:23
event6   | 10:25

The number of records is 50-100 million. Question: I need to find the group of events starting from eventX that fall within a time window

Re: HBase vs Hadoop memory configuration.

2013-01-28 Thread Kevin O'dell
JM, You would control those through hadoop-env.sh using JOBTRACKER_OPTS, TASKTRACKER_OPTS and then setting -Xmx for the desired heap. On Sun, Jan 27, 2013 at 11:33 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: From the UI: 15790 files and directories, 11292 blocks = 27082
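A hedged sketch of the hadoop-env.sh settings Kevin refers to; stock Apache Hadoop 1.x prefixes these variables with HADOOP_, so verify the exact names (and pick heap sizes) for your own distribution:

```shell
# hadoop-env.sh -- illustrative heap settings (variable names may differ
# by distribution; stock Apache Hadoop 1.x uses the HADOOP_ prefix)
export HADOOP_JOBTRACKER_OPTS="-Xmx2g ${HADOOP_JOBTRACKER_OPTS}"
export HADOOP_TASKTRACKER_OPTS="-Xmx1g ${HADOOP_TASKTRACKER_OPTS}"
```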

Re: Indexing Hbase Data

2013-01-28 Thread Mohammad Tariq
Hello Jean, Actually it's to read the values faster. The problem goes like this : I have a table that has just 2 columns : 1- Stores some clause. 2- Stores all possible aliases for the original clause. These clauses are again

Re: how to model data based on time bucket

2013-01-28 Thread Rodrigo Ribeiro
You can use another table as an index, with a rowkey like '{time}:{event_id}', and then scan the range [10:07, 10:15).
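Rodrigo's '{time}:{event_id}' index and the half-open range scan can be simulated with a sorted map, since HBase also keeps rows lexicographically sorted and a Scan's stop row is exclusive. The key layout and ':' separator are taken from his message; everything else is an illustrative assumption:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Simulates the time-index table and the [start, stop) range scan.
public class TimeIndexSketch {
    static String indexKey(String time, String eventId) {
        return time + ":" + eventId;
    }

    // Emulates Scan(startRow, stopRow): start inclusive, stop exclusive.
    static SortedMap<String, String> scan(TreeMap<String, String> index,
                                          String start, String stop) {
        return index.subMap(start, stop);
    }
}
```

For the data in Oleg's message, scanning ["10:07", "10:15") would return event1 and event2 but not event4.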

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Vandana Ayyalasomayajula
Hi Viral, Try adding -Psecurity and then compiling. Thanks Vandana Sent from my iPhone On Jan 28, 2013, at 3:05 AM, Viral Bajaria viral.baja...@gmail.com wrote: Hi, Is anyone running hbase 0.94.4 against hadoop 0.23.5? If yes, how did you end up compiling hbase for hadoop 0.23? I

Re: how to model data based on time bucket

2013-01-28 Thread Oleg Ruchovets
Hi Rodrigo. Can you please explain your solution in more detail? You said I would have another table. How many tables will I have? Will I have 2 tables? What will be the schema of the tables? Let me explain what I am trying to achieve: I have ~50 million records like {time|event}. I want to

Re: how to model data based on time bucket

2013-01-28 Thread Michel Segel
A tough one, in that if your events are keyed on time alone, you will hit a hot spot on write. Reads, not so much... TSDB would be a good start... You may not need 'buckets', just a time stamp and set up start and stop key values. Sent from a remote device. Please excuse any typos... Mike

Re: how to model data based on time bucket

2013-01-28 Thread Oleg Ruchovets
Yes, I agree that using only a timestamp will cause hotspotting. I can create pre-splitting for the regions. I saw the TSDB video, presentation and data model; I think it is not suitable for my case. I searched Google a lot and, to my surprise, there isn't any post about such a classic problem.

Re: how to model data based on time bucket

2013-01-28 Thread Rodrigo Ribeiro
In the approach that I mentioned, you would need a table to retrieve the time of a certain event (if this information can be retrieved in another way, you may ignore this table). It would be like you posted:

event_id | time
event1   | 10:07
event2   | 10:10
event3   | 10:12
event4   | 10:20

And

Re: What's the maximum number of regions per region server?

2013-01-28 Thread Kevin O'dell
Hi James, How did the nodes crash? I am asking because it would be good to know where it hurts. As for your 6500 regions per region server, that is an order of magnitude higher than we like to see. With that many regions you are going to run into a few issues: 1.) Small flushes due to memstore

Re: how to model data based on time bucket

2013-01-28 Thread Oleg Ruchovets
Yes. This is a very interesting approach. Is it possible to read the main key from one table and scan a range from another using map/reduce? I don't want to read from a single client. I use HBase version 0.94.2.21. Thanks Oleg. On Mon, Jan 28, 2013 at 6:27 PM, Rodrigo Ribeiro rodrigui...@jusbrasil.com.br

Re: HBase 0.92 warnings about SLF4J bindings

2013-01-28 Thread Kavish Ahuja
First of all, clear the HDFS folder which you created while installing Hadoop... it's the same folder which contains the ZooKeeper files. Then delete the file slf4j-log4j12-1.5.8.jar. If you don't want to delete it, then simply move it outside the HBase folder to some other place. /home/ahuja/hbase/lib

Re: how to model data based on time bucket

2013-01-28 Thread Rodrigo Ribeiro
Yes, it's possible, Check this solution: http://stackoverflow.com/questions/11353911/extending-hadoops-tableinputformat-to-scan-with-a-prefix-used-for-distribution On Mon, Jan 28, 2013 at 2:07 PM, Oleg Ruchovets oruchov...@gmail.comwrote: Yes. This is very interesting approach. Is it

Re: how to model data based on time bucket

2013-01-28 Thread Oleg Ruchovets
I think I didn't explain it correctly. I want to read from 2 tables in the context of 1 mapreduce job. I mean I want to read one key from the main table and scan a range from another in the same mapreduce job. I only found MultiTableOutputFormat and there is no MultiTableInputFormat. Is there any workaround to

Re: Storing images in Hbase

2013-01-28 Thread Jack Levin
I've never tried it. HBase worked out nicely for this task; caching and all is a bonus for files. -jack On Mon, Jan 28, 2013 at 2:01 AM, Adrien Mogenet adrien.moge...@gmail.com wrote: Could HCatalog be an option? On Jan 26, 2013 at 21:56, Jack Levin magn...@gmail.com wrote: AFAIK, namenode

Re: Storing images in Hbase

2013-01-28 Thread Andrew Purtell
If I were to design a large object store on HBase, I would do the following: Under a threshold, store the object data into HBase. Over the threshold, store metadata for the object only into HBase and the object data itself into a file in HDFS. The threshold could be a fixed byte size like 100 MB,
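The size-threshold routing Andrew describes can be reduced to a single decision function. This is a hedged sketch under his fixed 100 MB example; the class, enum, and constant names are all illustrative, and a real system would still need the HDFS write path and the HBase metadata row:

```java
// Routes an object to a storage tier by size: small objects go into
// HBase directly; large ones go into HDFS, with only metadata in HBase.
public class BlobRouter {
    enum Tier { HBASE, HDFS }

    static final long THRESHOLD = 100L * 1024 * 1024; // 100 MB, per the message

    static Tier route(long objectSizeBytes) {
        return objectSizeBytes < THRESHOLD ? Tier.HBASE : Tier.HDFS;
    }
}
```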

Re: LRU stats

2013-01-28 Thread Jean-Daniel Cryans
IIRC when a file closes it will evict its own blocks since they won't be used after that. J-D On Sun, Jan 27, 2013 at 1:04 AM, Varun Sharma va...@pinterest.com wrote: Since i am using only 10 % of allocated cache, I think EvictionThread never ran - hence, I see the value 0. What's mysterious

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Viral Bajaria
Thanks Vandana for the reply. I tried that but no luck; it still throws the same error. I thought there might have been a typo and you meant -D and not -P, but neither of them worked. I verified that the hadoop-auth code base does not have the KerberosUtil class anymore. So I am guessing there is some, but I

Re: Tables vs CFs vs Cs

2013-01-28 Thread Andrew Purtell
IPv6 can support up to 281,474,976,710,656 networks. Assuming you only want to group by networks, that is already a potentially very large keyspace. The *minimum* number of distinct addresses a V6 network can contain (the smallest advertisable prefix is /48) is 1,208,925,819,614,629,174,706,176.
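The two constants in Andrew's message are powers of two: a 48-bit routing-prefix space gives 2^48 networks, and each /48 contains 2^(128-48) = 2^80 addresses. A quick check of that arithmetic (the class name is made up):

```java
import java.math.BigInteger;

// Verifies the IPv6 keyspace arithmetic quoted in the message.
public class V6Math {
    static BigInteger networks() {
        return BigInteger.valueOf(2).pow(48);   // number of /48 networks
    }

    static BigInteger addressesPer48() {
        return BigInteger.valueOf(2).pow(80);   // addresses in one /48
    }
}
```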

Re: read from multiple hbase table using mapreduce

2013-01-28 Thread Jean-Marc Spaggiari
Hi Oleg, the mapreduce job will allow you to scan the first table per row, but in your job you will have to use the client API to access the 2nd table. You can use the setup and cleanup methods to open and close the 2nd table you want to access. JM 2013/1/28, Oleg Ruchovets oruchov...@gmail.com:

Re: Storing images in Hbase

2013-01-28 Thread yiyu jia
Hi Jack, thank you for sharing! Hello Andrew, you mentioned an interesting topic, which is caching. My question is: why do I need a cache between HBase and HDFS if I have a cache configured between HBase and its caller application? Let's say I have a web application which uses HBase as its data source at

Re: Storing images in Hbase

2013-01-28 Thread Andrew Purtell
You bring up a very common consideration, I think. For static content, such as images, a cache can help offload read load from the datastore. This fits into this conversation. For dynamic content, external caching may not be helpful as you say, although the blockcache within HBase will

Re: Storing images in Hbase

2013-01-28 Thread yiyu jia
Hi Andy, Thanks a lot for sharing. Yes, I am not talking about static content caching, which might be called an internal CDN today. I am asking about techniques for configuring caches at different layers while avoiding duplicate caching across those layers. Thanks and regards, Yiyu

Re: Storing images in Hbase

2013-01-28 Thread Andrew Purtell
In that case, then hypothetically speaking, you could disable HBase blockcache on the table containing static content and rely on an external reverse proxy tier, and enable HBase blockcache on the tables that you are using as part of generation of dynamic content. On Mon, Jan 28, 2013 at 1:44

Re: Tables vs CFs vs Cs

2013-01-28 Thread Asaf Mesika
I would go with using the row-key, on one table.

= Row Key Structure =
[group-depth][groupA][groupB][groupC][groupD]
group-depth: 1..4, encoded as 1 byte
each group (A-D): encoded as 1 byte, not as a string

Examples:
1 192
2 192 168
3 192 168 1
4 192 168 1 10

Column Qualifier: c - stands for counters Column
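Asaf's depth-prefixed key could be built as below. This is a sketch under the assumption that each IPv4 octet becomes one raw byte after a one-byte depth prefix, which is how I read "encoded as 1 byte and not as a string"; the class and method names are made up:

```java
// Encodes a rowkey as [depth][octet1]..[octetDepth], one byte each,
// so e.g. depth 2 of 192.168.1.10 -> {2, 0xC0, 0xA8}.
public class GroupKey {
    static byte[] encode(String ip, int depth) {
        String[] octets = ip.split("\\.");
        byte[] key = new byte[1 + depth];
        key[0] = (byte) depth;
        for (int i = 0; i < depth; i++) {
            // 0..255 fits in one (signed) Java byte after the cast
            key[1 + i] = (byte) Integer.parseInt(octets[i]);
        }
        return key;
    }
}
```

Because the depth byte leads the key, all rows of one depth sort together, so a scan over a single depth is a simple prefix scan.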

Max storefile size in CDH3u4

2013-01-28 Thread Lashing
Hi, my region count is running high. Can someone tell me what's the max storefile size in CDH3u4? Thanks.

Re: Max storefile size in CDH3u4

2013-01-28 Thread Kevin O'dell
What are you currently using? Also, what is your current region per node count? On Jan 28, 2013 6:50 PM, Lashing lss...@gmail.com wrote: Hi I'm running high in region number, can someone tell me what's the max storefile size in CDH3u4, thanks.

Re: Max storefile size in CDH3u4

2013-01-28 Thread Bryan Beaudreault
4GB On Mon, Jan 28, 2013 at 6:49 PM, Lashing lss...@gmail.com wrote: Hi I'm running high in region number, can someone tell me what's the max storefile size in CDH3u4, thanks.

Re: Max storefile size in CDH3u4

2013-01-28 Thread Lsshiu
Thanks. So I have to upgrade to CDH4 in order to reduce the region number? Bryan Beaudreault bbeaudrea...@hubspot.com: 4GB

Re: Max storefile size in CDH3u4

2013-01-28 Thread Lsshiu
3GB. More than one thousand. Kevin O'dell kevin.od...@cloudera.com: What are you currently using? Also, what is your current region per node count?

Re: Max storefile size in CDH3u4

2013-01-28 Thread Kevin O'dell
Lsshiu, That is quite high. Also, you are right at the cusp of the recommended region size for HBase 0.90. If you can make the upgrade, I would recommend upgrading to CDH4 (0.92) so that you can take advantage of HFilev2 and use 10 - 20GB region sizes. If not, you can go between 4 - 10GB on 0.90,

RegionServer crashes silently under heavy RowKey scans

2013-01-28 Thread Jim Abramson
Hi, We are testing HBase for some read-heavy batch operations, and encountering frequent, silent RegionServer crashes. The application does many thousands of very selective row scans on a dataset containing several hundred million rows (less than 200GB overall), via thrift. We have

Re: Find the tablename in Observer

2013-01-28 Thread Ted Yu
void prePut(final ObserverContext<RegionCoprocessorEnvironment> c, final Put put, final WALEdit edit, final boolean writeToWAL)

c.getEnvironment().getRegion().getRegionInfo().getTableName()

Cheers On Mon, Jan 28, 2013 at 4:56 PM, Rajgopal Vaithiyanathan

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Vandana Ayyalasomayajula
Maybe that's the issue. Try downloading the source from the 0.94 branch and use the maven command with -Psecurity and -Dhadoop.profile=23. That should work. Thanks Vandana On Jan 28, 2013, at 11:48 AM, Viral Bajaria wrote: Thanks Vandana for reply. I tried that but no luck. It still throws the
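Putting Vandana's two flags together, the build invocation would look something like the following; the profile and property names should be verified against the 0.94 branch's pom.xml:

```shell
# Illustrative: build HBase 0.94 against Hadoop 0.23 with security enabled
mvn clean install -DskipTests -Psecurity -Dhadoop.profile=23
```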

Re: Find the tablename in Observer

2013-01-28 Thread Rajgopal Vaithiyanathan
Great, thanks. Is there any way that I can get it before prePut()? Like from the constructor or from the start() method? I followed the code of CoprocessorEnvironment and didn't seem to get anything out of it. On Mon, Jan 28, 2013 at 5:09 PM, Ted Yu yuzhih...@gmail.com wrote: void

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Ted Yu
I tried compiling the tip of 0.94 with (and without) -Psecurity. In both cases I got:

[ERROR] /Users/tyu/94-hbase/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java:[41,53] cannot find symbol
[ERROR] symbol : class KerberosUtil
[ERROR] location: package

Re: Find the tablename in Observer

2013-01-28 Thread Ted Yu
start() method of which class? If you use Eclipse, you can navigate through the classes and find out the answer - that was what I did :-) You can also place a breakpoint in the following method: public void prePut(final ObserverContext<RegionCoprocessorEnvironment> c,

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Stack
The below seems like a good suggestion by Vandana. I will say that focus is on support for hadoop 1 and 2. There has not been much call for us to support 0.23.x If you can figure what needs fixing, we could try adding the fix to 0.94 (In trunk a patch to add a compatibility module for

Re: RegionServer crashes silently under heavy RowKey scans

2013-01-28 Thread Stack
On Mon, Jan 28, 2013 at 12:14 PM, Jim Abramson j...@magnetic.com wrote: Hi, We are testing HBase for some read-heavy batch operations, and encountering frequent, silent RegionServer crashes. 'Silent' is interesting. Which files did you check? .log and the .out? Nothing in the latter?

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Viral Bajaria
Tried all of it. I think I will have to defer this to the hadoop mailing list because it seems there is a missing class in the hadoop 0.23 branches; not sure if that is intentional. The class exists in trunk and the hadoop 2.0 branches, though the surprising part is that it does not exist in 0.23. Does

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Viral Bajaria
Just closing the loop here; it might help someone else to hand-patch their build process before I get the patches into the hadoop branch. No changes are required for hbase. I backported the latest version of KerberosUtil from the hadoop 2.0 branch, recompiled hadoop-common/hadoop-auth and then installed

Re: hbase 0.94.4 with hadoop 0.23.5

2013-01-28 Thread Stack
On Mon, Jan 28, 2013 at 6:26 PM, Viral Bajaria viral.baja...@gmail.comwrote: Just closing the loop here, it might help someone else to hand patch their build process before I get the patches in the hadoop branch, no changes required for hbase. I backported the latest version of KerberosUtil

RE: Find the tablename in Observer

2013-01-28 Thread Anoop Sam John
Will the CoprocessorEnvironment reference in the start() method be an instanceof RegionCoprocessorEnvironment too? No. It will be a reference to RegionEnvironment. This is not a public class so you won't be able to do the casting. As I read your need, you want to get the table name just once and

Re: Find the tablename in Observer

2013-01-28 Thread Gary Helmling
Will the CoprocessorEnvironment reference in the start() method be an instanceof RegionCoprocessorEnvironment too? No. It will be a reference to RegionEnvironment. This is not a public class so you won't be able to do the casting. Since RegionEnvironment implements RegionCoprocessorEnvironment,

RE: Find the tablename in Observer

2013-01-28 Thread Anoop Sam John
Oh sorry... I hadn't checked the interface... We were doing it in postOpen()... Thanks Gary for correcting me... :) -Anoop- From: Gary Helmling [ghelml...@gmail.com] Sent: Tuesday, January 29, 2013 11:29 AM To: user@hbase.apache.org Subject: Re: Find the tablename

WARN CleanerChore - Error while cleaning the logs

2013-01-28 Thread Mesika, Asaf
Hi, Recently, after upgrading to 0.94.3, my unit test which uses the HBase mini cluster keeps throwing this warning. Why does it want to delete a table folder? Can someone elaborate on this exception? My test itself sets up two tables, of which only one is used. The one in the errors is