Just to follow up here, I did manage to test a patch on
TableInputFormatBase.java and it resolved my issue.
I filed https://issues.apache.org/jira/browse/HBASE-7693 and will attach
the patch as soon as my Git updates.
On Mon, Dec 17, 2012 at 8:45 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
Could HCatalog be an option ?
On 26 Jan 2013 21:56, Jack Levin magn...@gmail.com wrote:
AFAIK, namenode would not like tracking 20 billion small files :)
-jack
On Sat, Jan 26, 2013 at 6:00 PM, S Ahmed sahmed1...@gmail.com wrote:
That's pretty amazing.
What I am confused about is, why
Hi
In my 0.90.6 HBase environment, each time I create a new HTable it is
slow. Puts are slow too. Can I turn on some kind of trace to know the exact
time spent in each function call? Thanks.
Hi there,
Do you have too many HTable instances opened simultaneously? It's not
advisable to do so.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Jan 28, 2013 at 5:00 PM, Lsshiu lss...@gmail.com wrote:
Hi
In my 0.90.6 HBase environment, each time I
Hello list,
I would like to have some suggestions on HBase data indexing. What
would you prefer? I have never faced such a requirement till now. This is
the first time there is a need for indexing, so I thought of getting some
expert comments and suggestions.
Thank you so much for your
We are pre-splitting our tables before bulk loading too, but we don't use
the RegionSplitter.
We split manually (we did some testing and found the optimal split points)
by putting a new HRegionInfo into the .META. table, assigning that region
(HBaseAdmin.assign(region name)) and after you finish
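For reference, a minimal sketch of pre-splitting at predetermined split points without editing .META. by hand, using HBaseAdmin.createTable with explicit split keys (the table name, family, and split points below are illustrative, not from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplit {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("bulk_target");
        desc.addFamily(new HColumnDescriptor("f"));
        // Optimal split points found by prior testing, as described above.
        byte[][] splits = { Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t") };
        admin.createTable(desc, splits);
        admin.close();
    }
}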
When you say indexing, are you referring to indexing the column qualifiers
or the values that you are storing in the qualifier?
Regarding indexing, I remember someone had recommended this on the mailing
list before: https://github.com/ykulbak/ihbase/wiki but it seems the
development on that is
Hello Viral,
Thank you so much for the quick response. The intention is to index the
values. I'll have a look at ihbase.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Jan 28, 2013 at 5:22 PM, Viral Bajaria viral.baja...@gmail.com wrote:
When you say indexing,
Hi Tariq,
Thanks for the tip, but I only opened a few HTable instances. The total
number of regions was quite large (more than 7 ) though.
Hi there,
Do you have too many HTable instances opened simultaneously? It's not
advisable to do so.
Warm Regards,
Tariq
As a POC, just try to load the data into another table whose rowkey
is the original row's value.
Scan the index table first and then get the main table row key.
At first this should help; later you can improve it by using
coprocessors.
Regards
Ram
On Mon, Jan 28, 2013 at
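A minimal sketch of that index-table lookup, assuming hypothetical tables "main" and "main_idx", where the index row key is the indexed value and a cell in family "f" holds the main-table row key (all names are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class IndexLookup {
    // Look up a row in "main" by an indexed value, going through "main_idx" first.
    public static Result lookupByValue(Configuration conf, byte[] value) throws Exception {
        HTable index = new HTable(conf, "main_idx");
        HTable main = new HTable(conf, "main");
        try {
            // The index row key is the value itself; its cell holds the main row key.
            Result idxRow = index.get(new Get(value));
            byte[] mainKey = idxRow.getValue(Bytes.toBytes("f"), Bytes.toBytes("key"));
            return mainKey == null ? null : main.get(new Get(mainKey));
        } finally {
            index.close();
            main.close();
        }
    }
}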
Use HTablePool instead and see if it gives you better performance.
Creating an HTable instance is a fairly expensive operation that takes a
few seconds to complete.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Jan 28, 2013 at 5:35 PM, Lsshiu lss...@gmail.com
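A short sketch of the HTablePool suggestion, against the 0.90-era API the poster is using (HTablePool was deprecated in later releases in favor of a shared connection; the table name and pool size here are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PooledPut {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTablePool pool = new HTablePool(conf, 10); // keep up to 10 instances per table
        HTableInterface table = pool.getTable("mytable");
        try {
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            table.put(put);
        } finally {
            pool.putTable(table); // return the instance to the pool (0.90 API)
        }
    }
}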
Thank you for the valuable reply sir. Actually I tried that and it works
fine, but we need faster results. I was thinking of creating an index and
having it loaded in memory at all times, so that fetches are faster. Is
there any OOTB feature available in co-proc?
Warm Regards,
Tariq
Hi Tariq,
Thanks for the update, I'll try it. Btw, does a put operation perform any
differently with HTable versus HTablePool?
Use HTablePool instead and see if it gives you better performance.
Creating an HTable instance is a fairly expensive operation that takes a
few seconds to
Hi Mohammad,
I don't really see how you can get faster results than indexing the
content as the row key in another table. Access is direct after that.
What do you mean by faster results? Building the index? Or reading
through it?
JM
2013/1/28, Mohammad Tariq donta...@gmail.com:
Thank you for
Hi Kyle,
If you are not running a production cluster, you might think about
getting the latest 0.94.4 source code, applying HBASE-7654 and deploying it.
That way you can use getCoprocessors, which will send you the list
you are looking for...
JM
2013/1/28, Kyle Lin kylelin2...@gmail.com:
It does. These excerpts from the HBase Definitive Guide might help:
HTablePool:
Instead of creating an HTable instance for every request from your client
application, it
makes much more sense to create one initially and subsequently reuse them.
The primary reason for doing so
Thanks J-D.
I found it with JConsole in hadoop/HBase/RegionServerStatistics/Attributes.
JM
2013/1/27, Jean-Daniel Cryans jdcry...@apache.org:
It's in the region server metrics and also published through JMX.
J-D
On Sun, Jan 27, 2013 at 2:55 PM, Jean-Marc Spaggiari
jean-m...@spaggiari.org
Hi Tariq,
I'll try that and update the result, thanks.
It does. These excerpts from the HBase Definitive Guide might help:
HTablePool:
Instead of creating an HTable instance for every request from your client
application, it
makes much more sense to create one
Hi,
I have the following row data structure:
event_id | time
=========
event1 | 10:07
event2 | 10:10
event3 | 10:12
event4 | 10:20
event5 | 10:23
event6 | 10:25
The number of records is 50-100 million.
Question:
I need to find the group of events starting from eventX that fall within a time
window
JM,
You would control those through hadoop-env.sh using JOBTRACKER_OPTS and
TASKTRACKER_OPTS, setting -Xmx for the desired heap.
On Sun, Jan 27, 2013 at 11:33 AM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
From the UI:
15790 files and directories, 11292 blocks = 27082
Hello Jean,
Actually it's to read the values faster. The problem goes like this:
I have a table that has just 2 columns:
1- Stores some clause.
2- Stores all possible aliases for the original clause.
These clauses are again
You can use another table as an index, using a rowkey like
'{time}:{event_id}', and then scan in the range [10:07, 10:15).
On Mon, Jan 28, 2013 at 10:06 AM, Oleg Ruchovets oruchov...@gmail.com wrote:
Hi,
I have the following row data structure:
event_id | time
=========
event1 | 10:07
event2 |
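A sketch of that suggestion, assuming a hypothetical index table "events_by_time" keyed as '{time}:{event_id}'; the scan range is half-open, so the stop key is excluded:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeWindowScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable index = new HTable(conf, "events_by_time");
        // [10:07, 10:15): start key inclusive, stop key exclusive.
        Scan scan = new Scan(Bytes.toBytes("10:07"), Bytes.toBytes("10:15"));
        ResultScanner scanner = index.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow())); // e.g. "10:07:event1"
            }
        } finally {
            scanner.close();
            index.close();
        }
    }
}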
Hi Viral,
Try adding -Psecurity and then compiling.
Thanks
Vandana
Sent from my iPhone
On Jan 28, 2013, at 3:05 AM, Viral Bajaria viral.baja...@gmail.com wrote:
Hi,
Is anyone running HBase 0.94.4 against Hadoop 0.23.5? If yes, how did you
end up compiling HBase for Hadoop 0.23?
I
Hi Rodrigo,
Can you please explain your solution in more detail? You said that I will
have another table. How many tables will I have? Will I have 2 tables? What
will be the schema of the tables?
Let me explain what I am trying to achieve:
I have ~50 million records like {time|event}. I want to
A tough one, in that if your events are keyed on time alone, you will hit a hot
spot on write. Reads, not so much...
TSDB would be a good start ...
You may not need 'buckets', just a time stamp with start and stop
key values set up accordingly.
Sent from a remote device. Please excuse any typos...
Mike
Yes,
I agree that using only a timestamp will cause a hotspot. I can pre-split
the regions.
I saw the TSDB video and presentation and their data model. I think it is not
suitable for my case.
I looked through Google a lot and to my surprise there isn't any post about
such a classic problem.
In the approach that I mentioned, you would need a table to retrieve the
time of a certain event (if this information can be retrieved in another way,
you may ignore this table). It would be like you posted:
event_id | time
=========
event1 | 10:07
event2 | 10:10
event3 | 10:12
event4 | 10:20
And
Hi James,
How did the nodes crash? I am asking because it would be good to know
where it hurts. As to your 6500 regions per region server, that is an
order of magnitude higher than we like to see. With that many regions you
are going to run into a few issues:
1.) Small flushes due to memstore
Yes.
This is a very interesting approach.
Is it possible to read by key from the main table and scan a range from
another using map/reduce? I don't want to read from a single client. I use
HBase version 0.94.2.21.
Thanks
Oleg.
On Mon, Jan 28, 2013 at 6:27 PM, Rodrigo Ribeiro
rodrigui...@jusbrasil.com.br
First of all, clear the HDFS folder which you created while installing
Hadoop... it's the same folder which contains the zookeeper files.
Then delete the file slf4j-log4j12-1.5.8.jar from /home/ahuja/hbase/lib.
If you don't want to delete it, simply move it outside the hbase folder to
some other place.
Yes, it's possible,
Check this solution:
http://stackoverflow.com/questions/11353911/extending-hadoops-tableinputformat-to-scan-with-a-prefix-used-for-distribution
On Mon, Jan 28, 2013 at 2:07 PM, Oleg Ruchovets oruchov...@gmail.com wrote:
Yes.
This is a very interesting approach.
Is it
I think I didn't explain it correctly.
I want to read from 2 tables in the context of 1 mapreduce job.
I mean I want to read one key from the main table and scan a range from
another in the same mapreduce job. I only found MultiTableOutputFormat and
there is no MultiTableInputFormat. Is there any workaround to
I've never tried it; HBase worked out nicely for this task, and caching
and all is a bonus for files.
-jack
On Mon, Jan 28, 2013 at 2:01 AM, Adrien Mogenet
adrien.moge...@gmail.com wrote:
Could HCatalog be an option ?
On 26 Jan 2013 21:56, Jack Levin magn...@gmail.com wrote:
AFAIK, namenode
If I were to design a large object store on HBase, I would do the
following: Under a threshold, store the object data into HBase. Over the
threshold, store metadata for the object only into HBase and the object
data itself into a file in HDFS. The threshold could be a fixed byte size
like 100 MB,
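A hedged sketch of that threshold design; the table name "objects", family "f", qualifiers, and the HDFS path layout are all illustrative choices, not a fixed recipe:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ObjectStore {
    private static final long THRESHOLD = 100L * 1024 * 1024; // e.g. 100 MB

    public static void store(Configuration conf, byte[] key, byte[] data) throws IOException {
        HTable table = new HTable(conf, "objects");
        try {
            Put put = new Put(key);
            if (data.length < THRESHOLD) {
                // Small object: inline the bytes in HBase.
                put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), data);
            } else {
                // Large object: write the bytes to HDFS, keep only a pointer in HBase.
                Path path = new Path("/objects/" + Bytes.toStringBinary(key));
                FileSystem fs = FileSystem.get(conf);
                FSDataOutputStream out = fs.create(path);
                try { out.write(data); } finally { out.close(); }
                put.add(Bytes.toBytes("f"), Bytes.toBytes("hdfs.path"),
                        Bytes.toBytes(path.toString()));
            }
            table.put(put);
        } finally {
            table.close();
        }
    }
}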
IIRC when a file closes it will evict its own blocks since they won't
be used after that.
J-D
On Sun, Jan 27, 2013 at 1:04 AM, Varun Sharma va...@pinterest.com wrote:
Since I am using only 10% of the allocated cache, I think the EvictionThread
never ran - hence I see the value 0. What's mysterious
Thanks Vandana for the reply. I tried that but no luck. It still throws the
same error. I thought there might have been a typo and you meant -D and not
-P, but neither of them worked.
I verified that the hadoop-auth code base does not have the KerberosUtil class
anymore. So I am guessing there is some, but I
IPv6 can support up to 281,474,976,710,656 networks. Assuming you only want
to group by networks, that is already a potentially very large keyspace.
The *minimum* number of distinct addresses a V6 network can contain (the
smallest advertisable prefix is /48) is 1,208,925,819,614,629,174,706,176.
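(Both figures follow from the 128-bit address space: 2^48 = 281,474,976,710,656 possible /48 prefixes, and 2^(128-48) = 2^80 = 1,208,925,819,614,629,174,706,176 addresses within each /48.)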
Hi Oleg,
The mapreduce job will allow you to scan the first table row by row, but
in your job you will have to use the client API to access the 2nd
table.
You can use the setup and cleanup methods to create or open the 2nd
table you want to access.
JM
2013/1/28, Oleg Ruchovets oruchov...@gmail.com:
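A sketch of that pattern: a TableMapper over the first table that opens the second table (hypothetically named "index") once in setup() and closes it in cleanup(), using the plain client API inside map():

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class TwoTableMapper extends TableMapper<ImmutableBytesWritable, NullWritable> {
    private HTable second;

    @Override
    protected void setup(Context context) throws IOException {
        // Open the second table once per task, not once per row.
        second = new HTable(context.getConfiguration(), "index");
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Use the current row of the first table to drive a range scan on the second.
        Scan scan = new Scan(row.get()); // narrow the range as your keys require
        ResultScanner scanner = second.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process matching rows from the second table here
            }
        } finally {
            scanner.close();
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        second.close();
    }
}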
Hi Jack,
thank you for sharing!
Hello Andrew,
You mentioned an interesting topic, which is caching. My question is why I
need a cache between HBase and HDFS if I have a cache configured between
HBase and its caller application.
Let's say I have a web application which uses HBase as its data source at
You bring up a very common consideration I think.
For static content, such as images, a cache can help offload read load
from the datastore. This fits into this conversation.
For dynamic content, external caching may not be helpful as you
say, although the blockcache within HBase will
Hi Andy,
Thanks a lot for sharing. Yes, I am not talking about static content
caching, which may be called an internal CDN today.
I am asking about techniques for configuring caches on different layers,
with a concern for avoiding duplicate caching across layers.
thanks and regards,
Yiyu
In that case, hypothetically speaking, you could disable the HBase
blockcache on the table containing static content and rely on an external
reverse proxy tier, and enable HBase blockcache on the tables that you are
using as part of generation of dynamic content.
On Mon, Jan 28, 2013 at 1:44
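A sketch of toggling the blockcache per column family via HBaseAdmin ("static_content" and family "f" are placeholder names); the table must be disabled while its schema is modified:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DisableBlockCache {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HColumnDescriptor family = new HColumnDescriptor("f");
        family.setBlockCacheEnabled(false); // rely on the external reverse proxy instead
        admin.disableTable("static_content");
        admin.modifyColumn("static_content", family);
        admin.enableTable("static_content");
        admin.close();
    }
}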
I would go on using the row-key, on one table.
= Row Key Structure =
[group-depth][A group][B group][C group][D group]
group-depth: 1..4, encoded as 1 byte
A-D group: each encoded as 1 byte and not as a string
Examples:
[1][192]
[2][192][168]
[3][192][168][1]
[4][192][168][1][10]
Column Qualifier: c - stands for counters
Column
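A small sketch of building such a key, assuming the layout above (one depth byte followed by one byte per group; the class name is illustrative):

public class IpRowKey {
    // e.g. rowKey(192, 168, 1) -> {3, (byte)192, (byte)168, 1}
    public static byte[] rowKey(int... groups) {
        byte[] key = new byte[1 + groups.length];
        key[0] = (byte) groups.length;     // group-depth, 1..4
        for (int i = 0; i < groups.length; i++) {
            key[i + 1] = (byte) groups[i]; // each group as one byte, not a string
        }
        return key;
    }
}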
Hi
I'm running high on region count. Can someone tell me what's the max
storefile size in CDH3u4? Thanks.
What are you currently using? Also, what is your current region per node
count?
On Jan 28, 2013 6:50 PM, Lashing lss...@gmail.com wrote:
Hi
I'm running high on region count. Can someone tell me what's the max
storefile size in CDH3u4? Thanks.
4GB
On Mon, Jan 28, 2013 at 6:49 PM, Lashing lss...@gmail.com wrote:
Hi
I'm running high on region count. Can someone tell me what's the max
storefile size in CDH3u4? Thanks.
Thanks.
So I have to upgrade to CDH4 in order to reduce the region number?
Bryan Beaudreault bbeaudrea...@hubspot.com
4GB
On Mon, Jan 28, 2013 at 6:49 PM, Lashing lss...@gmail.com wrote:
Hi
I'm running high on region count. Can someone tell me what's the max
storefile size in
3GB
More than one thousand.
Kevin O'dell kevin.od...@cloudera.com
What are you currently using? Also, what is your current region per node
count?
On Jan 28, 2013 6:50 PM, Lashing lss...@gmail.com wrote:
Hi
I'm running high on region count. Can someone tell me what's the max
Lsshiu,
That is quite high. Also, you are right on the cusp of the recommended
region size for HBase .90. If you can make the upgrade I would recommend
upgrading to CDH4 (.92) so that you can take advantage of HFile v2 and use 10
- 20GB region sizes. If not, you can go between 4 - 10GB on .90,
Hi,
We are testing HBase for some read-heavy batch operations, and encountering
frequent, silent RegionServer crashes. The application does many thousands of
very selective row scans on a dataset containing several hundred million rows
(less than 200GB overall), via thrift.
We have
void prePut(final ObserverContext<RegionCoprocessorEnvironment> c,
    final Put put, final WALEdit edit, final boolean writeToWAL)

c.getEnvironment().getRegion().getRegionInfo().getTableName()
Cheers
On Mon, Jan 28, 2013 at 4:56 PM, Rajgopal Vaithiyanathan
Maybe that's the issue. Try downloading the source from the 0.94 branch and
use the maven command with -Psecurity and -Dhadoop.profile=23.
That should work.
Thanks
Vandana
On Jan 28, 2013, at 11:48 AM, Viral Bajaria wrote:
Thanks Vandana for the reply. I tried that but no luck. It still throws the
Great, thanks.
Is there any way I can get it before prePut()?
Like from the constructor or from the start() method? I followed the
code of CoprocessorEnvironment and didn't seem to get anything out of it.
On Mon, Jan 28, 2013 at 5:09 PM, Ted Yu yuzhih...@gmail.com wrote:
void
I tried compiling tip of 0.94 with (and without) -Psecurity.
In both cases I got:
[ERROR]
/Users/tyu/94-hbase/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java:[41,53]
cannot find symbol
[ERROR] symbol : class KerberosUtil
[ERROR] location: package
The start() method of which class?
If you use Eclipse, you can navigate through the classes and find out the
answer - that was what I did :-)
You can also place a breakpoint in the following method :
public void prePut(final ObserverContext<RegionCoprocessorEnvironment> c,
The below seems like a good suggestion by Vandana.
I will say that the focus is on support for hadoop 1 and 2. There has not
been much call for us to support 0.23.x. If you can figure out what needs
fixing, we could try adding the fix to 0.94 (in trunk a patch to add a
compatibility module for
On Mon, Jan 28, 2013 at 12:14 PM, Jim Abramson j...@magnetic.com wrote:
Hi,
We are testing HBase for some read-heavy batch operations, and
encountering frequent, silent RegionServer crashes.
'Silent' is interesting. Which files did you check? The .log and the .out?
Nothing in the latter?
I tried all of it. I think I will have to defer this to the hadoop mailing
list because it seems there is a missing class in the hadoop 0.23 branches;
not sure if that is intentional. The class exists in trunk and the hadoop
2.0 branches, though the surprising part is that it does not exist in 0.23.
Does
Just closing the loop here; it might help someone else to hand-patch their
build process before I get the patches into the hadoop branch. No changes
required for hbase.
I backported the latest version of KerberosUtil from the hadoop 2.0 branch and
recompiled hadoop-common/hadoop-auth and then installed
On Mon, Jan 28, 2013 at 6:26 PM, Viral Bajaria viral.baja...@gmail.com wrote:
Just closing the loop here; it might help someone else to hand-patch their
build process before I get the patches into the hadoop branch. No changes
required for hbase.
I backported the latest version of KerberosUtil
Will the CoprocessorEnvironment reference in the start() method be an
instance of RegionCoprocessorEnvironment too?
No. It will be a reference to RegionEnvironment. This is not a public class
so you won't be able to do the casting.
As I read your need, you want to get the table name just once and
Will the CoprocessorEnvironment reference in the start() method be an
instance of RegionCoprocessorEnvironment too?
No. It will be a reference to RegionEnvironment. This is not a public class
so you won't be able to do the casting.
Since RegionEnvironment implements RegionCoprocessorEnvironment,
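A sketch tying this together: since RegionEnvironment implements the public RegionCoprocessorEnvironment interface, the table name can be cached once in start() with an instanceof guard (the class name here is illustrative):

import java.io.IOException;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

public class TableNameObserver extends BaseRegionObserver {
    private byte[] tableName;

    @Override
    public void start(CoprocessorEnvironment e) throws IOException {
        if (e instanceof RegionCoprocessorEnvironment) {
            // Cache the table name once instead of fetching it on every prePut().
            tableName = ((RegionCoprocessorEnvironment) e)
                    .getRegion().getRegionInfo().getTableName();
        }
    }
}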
Oh sorry...
I hadn't checked the interface... We were doing it in postOpen()...
Thanks Gary for correcting me... :)
-Anoop-
From: Gary Helmling [ghelml...@gmail.com]
Sent: Tuesday, January 29, 2013 11:29 AM
To: user@hbase.apache.org
Subject: Re: Find the tablename
Hi,
Recently, after upgrading to 0.94.3, my unit test which uses the HBase mini
cluster keeps throwing this warning.
Why does it want to delete a table folder?
Can someone elaborate on this exception?
My test itself sets up two tables, of which only one is used. The one in
the errors is