Hi All,
I am new to the NoSQL world, and I need help/suggestions to design an HBase schema for the requirement below.
It is a report generation application using Hadoop. I want to store a particular user's report history in HBase. The user's email id will be used to track all of his previously run reports.
Dear Jean-Daniel,
The issue is solved. I think HBase: The Definitive Guide does not give a sufficient description of pseudo-distributed mode.
Thanks so much!
Bing
On Tue, Feb 14, 2012 at 7:27 AM, Jean-Daniel Cryans jdcry...@apache.org wrote:
Is zookeeper running properly?
hi,
Well no, I can't figure out what the problem is, but I saw that someone else had the same problem (see the email: LeaseException despite high hbase.regionserver.lease.period).
What I can tell is the following:
Last week the problem was consistent.
1. I updated hbase.regionserver.lease.period=30
Hi,
You can set the max versions for that table to Integer.MAX_VALUE, so that records are identified uniquely by the timestamp (milliseconds) at which they were inserted. In HBase, each and every cell in the table is indexed, so if you have a large number of columns, you can store them as a
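For reference, a minimal sketch of what that looks like with the 0.92-era Java admin API (the table and family names here are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor table = new HTableDescriptor("reports");  // hypothetical table
    HColumnDescriptor family = new HColumnDescriptor("app");   // hypothetical family
    family.setMaxVersions(Integer.MAX_VALUE); // keep every timestamped version
    table.addFamily(family);
    admin.createTable(table);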
Hi All,
I'm investigating performance and scalability improvements for one of our solutions. I'm currently in a phase where I'm trying to understand whether HBase (+MapReduce) could provide the scalability we need.
This is the current situation:
- assume a daily inflow of 10 GB of data (20+ million
Hi,
Lars' blog (1) mentions that data locality for the region servers is lost when the HBase cluster is restarted. It's also mentioned at the end that work is ongoing in HBase to assign regions to RSs taking data locality into consideration. The blog entry is 18 months old, so I would like to know if
Hi,
On Tue, Feb 14, 2012 at 7:13 AM, Praveen Sripati
praveensrip...@gmail.com wrote:
Lars' blog (1) mentions that data locality for the region servers is lost when the HBase cluster is restarted. It's also mentioned at the end that work is ongoing in HBase to assign regions to RSs taking data locality
AFAIK it is possible; just make sure the regionservers can see the hadoop jar (which is true by default). Actually, you can call anything from these methods ;)
On Tue, Feb 14, 2012 at 9:15 AM, NNever nnever...@gmail.com wrote:
As we know, in HBase coprocessor methods such as prePut we can operate
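For example, a rough sketch of an observer that reaches out to HDFS from prePut (0.92-era coprocessor API; the class name is hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

    public class HdfsTouchingObserver extends BaseRegionObserver {
      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
          Put put, WALEdit edit, boolean writeToWAL) throws IOException {
        // Anything on the regionserver classpath is callable here,
        // e.g. talking to HDFS directly:
        FileSystem fs = FileSystem.get(HBaseConfiguration.create());
        // ... use fs as needed
      }
    }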
Region allocation is kept across a restart (https://issues.apache.org/jira/browse/HBASE-2896). This is also present in the CDH3 code.
Nevertheless, if you have a server that did not start correctly, you will have regions that move off it, and locality will not remain (even after you start
Thank you Doug..
One more question: if a particular region is found by looking at the range it handles, how is the search performed within that region to find the requested rowKey? Is it by linear search, binary search, or some other algorithm? Or, for every row in that region, is there any hash
Why don't you prefix the columns with an execution date (in reverse order, so the last execution comes first)?
That is:
email id (row key) - (columns) appName:reportName, appName:executionDate_startDate, appName:executionDate_endDate, appName:executionDate_status
So all executions for a specific
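A rough sketch of a write under that layout (0.92-era client API; emailId, executionDate, status, and the table name are all hypothetical):

    import java.util.Date;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    HTable table = new HTable(HBaseConfiguration.create(), "reports"); // hypothetical
    // Reverse the date so the most recent execution sorts first; for current
    // epoch millis this stays a fixed 19 digits, so string order holds.
    long reverseDate = Long.MAX_VALUE - executionDate.getTime();
    Put put = new Put(Bytes.toBytes(emailId)); // row key = email id
    put.add(Bytes.toBytes("appName"),
        Bytes.toBytes(reverseDate + "_status"),
        Bytes.toBytes(status));
    table.put(put);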
Thanks Sanel.
I tried to use
FileSystem fs = FileSystem.get(HBaseConfiguration.create());
fs.delete(new Path(...))
in the coprocessor's preDelete method.
There is no exception, but the target-path file is still not deleted after that code runs.
I don't know why...
It's late at night here now. I'll try
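In case it helps with debugging, here is a minimal standalone sketch of the same delete (the path is hypothetical). Note that fs.delete returns a boolean worth checking, and the two-argument form makes recursion explicit:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path target = new Path("/some/target/file"); // hypothetical path
    boolean deleted = fs.delete(target, false);  // false = do not recurse
    if (!deleted) {
      System.err.println("delete() returned false for " + target
          + " (exists? " + fs.exists(target) + ")");
    }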
Hi,
sorry for a very late reply on this topic, but I was busy and I promised to report back.
I implemented your suggested hack :) It is actually only a few lines of code: one for getting the machine's hostname and one for retrieving the destination of the get request. Then I set up two counters,
Hi all,
I've been trying to run a battery of tests to really understand our cluster's
performance, and I'm employing PerformanceEvaluation to do that (picking up
where Tim Robertson left off, elsewhere on the list). I'm seeing two strange
things that I hope someone can help with:
1) With a
Keys are stored in sorted order; it's basically a binary search.
On 2/14/12 9:31 AM, Vamshi Krishna vamshi2...@gmail.com wrote:
Thank you Doug..
One more question: if a particular region is found by looking at the range it handles, how is the search performed within that region to find
On Tue, Feb 14, 2012 at 6:35 AM, NNever nnever...@gmail.com wrote:
Thanks Sanel.
I tried to use
FileSystem fs = FileSystem.get(HBaseConfiguration.create());
fs.delete(new Path(...))
in the coprocessor's preDelete method.
There is no exception, but the target-path file is still not deleted after
I say "basically" because inside a Region there are Stores, and for each Store there are StoreFiles. For more info, see:
http://hbase.apache.org/book.html#regions.arch
On 2/14/12 11:06 AM, Doug Meil doug.m...@explorysmedical.com wrote:
Keys are stored in sorted order; it's basically a
On Tue, Feb 14, 2012 at 7:56 AM, Oliver Meyn (GBIF) om...@gbif.org wrote:
1) With a command line like 'hbase
org.apache.hadoop.hbase.PerformanceEvaluation randomWrite 10' I see 100
mappers spawned, rather than the expected 10. I expect 10 because that's
what the usage text implies, and
Hi all, here's a (not-so) hypothetical question... How does a given column family name or qualifier impact storage? Take a long family and qualifier like this:
my-descriptive-but-long-column-family-name:my-descriptive-but-long-qualifier
vs. a short column family and qualifier (e.g. f:q).
We are assuming the longer cf/qual would be written to HDFS billions of times and would be wasteful. Is that a correct assumption?
Yes, also that's covered a bit in: http://hbase.apache.org/book.html#keysize
Does the answer change if you use Snappy compression?
Any compression will make it
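As a back-of-the-envelope illustration (the qualifier names reuse the made-up ones above, and the cell count is arbitrary):

    import org.apache.hadoop.hbase.util.Bytes;

    // Family and qualifier bytes are repeated in every KeyValue, so per cell:
    byte[] longQual  = Bytes.toBytes("my-descriptive-but-long-qualifier"); // 33 bytes
    byte[] shortQual = Bytes.toBytes("q");                                 // 1 byte
    long cells = 1000000000L; // "billions of times"
    long extra = cells * (longQual.length - shortQual.length);
    // ~30 GB of extra qualifier bytes before any compression
    System.out.println(extra / (1024L * 1024 * 1024) + " GB");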
Hi there,
I am pretty new to HBase and I am trying to understand the best practice for doing a scan based on two/multiple partial scans of the row key.
For example, I have a row key like: orderId-timeStamp-item. The orderId has nothing to do with the timeStamp, and I have a requirement to scan rows
James,
Are your orderIds ordered? You say a range of orderIds, which implies that they are (i.e. they're sequential numbers like 001, 002, etc., not hashes or random values). If so, then a single scan can hit the rows for multiple contiguous orderIds (you'd set the start and stop rows based on a prefix
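Something like this, assuming the orderId-timeStamp-item layout from the original question (the ids and table name are made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    HTable table = new HTable(HBaseConfiguration.create(), "orders"); // hypothetical
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("001-"));  // first orderId in the range
    scan.setStopRow(Bytes.toBytes("005-~"));  // just past the last one (stop row is exclusive)
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process each row for orderIds 001..005
      }
    } finally {
      scanner.close();
    }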
Hey Christopher,
Thanks for reporting back. One thing about this is that unless you have contention at your top-of-rack switches, issuing a get on the local node or on a remote one shouldn't be very different. What is going to make a big difference is whether you have to hit disk or not.
J-D
On Tue, Feb
Also see here...
http://hbase.apache.org/book.html#keyvalue
Compression will make it better on disk, but it will inflate over the wire.
On 2/14/12 12:40 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
We are assuming the longer cf/qual would be written to HDFS billions of
times and
On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk mikael.sit...@gmail.com wrote:
hi,
Well no, I can't figure out what the problem is, but I saw that someone else had the same problem (see the email: LeaseException despite high hbase.regionserver.lease.period)
What I can tell is the following:
Last
And what would be missing? It's all open source, so this is the moment where you can forever leave a trace in HBase :)
J-D
On Tue, Feb 14, 2012 at 12:35 AM, Bing Li lbl...@gmail.com wrote:
Dear Jean-Daniel,
The issue is solved. I think the book in the HBase the Definitive Guide
does not give
To stress what J-D just said, the HBase book/Ref Guide (i.e., the online book that is part of HBase) is open source, and the best source of its material (especially the Troubleshooting chapter) is user experience.
Minor clarification: HBase: The Definitive Guide is a great book from O'Reilly, but
Please see answer inline
Thanks
Mikael.S
On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans jdcry...@apache.org wrote:
On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk mikael.sit...@gmail.com
wrote:
hi,
Well no, I can't figure out what the problem is, but I saw that someone else had the
Thanks Todd!
I checked disk bandwidth by first running hdparm on the disk (this shows me a read b/w of around 56Mbps), and then running iftop while the benchmarks run (this shows me that reads are only around 10-15Mbps, but that could definitely be because random seeks are a bottleneck).
The iostat output
Thank you, Ian! Yes, the orderIds are ordered.
I might try a timestamp filter, but it still doesn't provide the early-out feature, and I'm not sure how it would perform. Do you think it might be worth having a custom filter to do the two partial scans?
Thanks again.
James
On Wed, Feb 15, 2012 at
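If the custom-filter route is worth exploring, here is a hedged sketch of the early-out part (0.92-era Filter API, which also requires Writable serialization; the class name is hypothetical and the code is untested):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.hbase.filter.FilterBase;
    import org.apache.hadoop.hbase.util.Bytes;

    public class StopAfterPrefixFilter extends FilterBase {
      private byte[] stopPrefix; // first row-key prefix past the range of interest
      private boolean done = false;

      public StopAfterPrefixFilter() {} // needed for Writable deserialization
      public StopAfterPrefixFilter(byte[] stopPrefix) { this.stopPrefix = stopPrefix; }

      @Override
      public boolean filterRowKey(byte[] buf, int offset, int length) {
        int len = Math.min(length, stopPrefix.length);
        if (Bytes.compareTo(buf, offset, len, stopPrefix, 0, stopPrefix.length) >= 0) {
          done = true; // row key is at or past the stop prefix
        }
        return done;   // true means: skip this row
      }

      @Override
      public boolean filterAllRemaining() {
        return done;   // tells the scanner to stop early
      }

      public void write(DataOutput out) throws IOException {
        Bytes.writeByteArray(out, stopPrefix);
      }

      public void readFields(DataInput in) throws IOException {
        stopPrefix = Bytes.readByteArray(in);
      }
    }

The filter jar would also need to be on the regionserver classpath, same as a coprocessor.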
It works. Thanks Stack and Sanel~
2012/2/15 Stack st...@duboce.net
On Tue, Feb 14, 2012 at 6:35 AM, NNever nnever...@gmail.com wrote:
Thanks Sanel.
I tried to use
FileSystem fs = FileSystem.get(HBaseConfiguration.create());
fs.delete(new Path(...))
in the coprocessor's preDelete
Yep, definitely bound on seeks - see the 100% util, and the r/s 100.
The bandwidth provided by random IO from a disk is going to be much
smaller than the sequential IO you see from hdparm
-Todd
On Tue, Feb 14, 2012 at 3:06 PM, Bharath Ravi bharathra...@gmail.com wrote:
Thanks Todd!
I check
On Tue, Feb 14, 2012 at 8:14 AM, Stack st...@duboce.net wrote:
2) With that same randomWrite command line above, I would expect a resulting
table with 10 * (1024 * 1024) rows (so 10485760 = roughly 10M rows).
Instead what I'm seeing is that the randomWrite job reports writing that
many
Hi St.Ack,
I don't wanna be a pain in the neck, but any progress on this?
Cheers,
Ulrich
On Tue, Feb 7, 2012 at 8:40 PM, Stack st...@duboce.net wrote:
This is my fault. I'm working on it. Will update the list when done.
Sorry it's taking me so long.
St.Ack
On Tue, Feb 7, 2012 at 9:11 AM,