Dears,
I'm new to HBase. I just checked out hbase trunk rev-792389 and tested its
performance by means of org.apache.hadoop.hbase.PerformanceEvaluation
(detailed testing results are listed below). It's strange that the scan
speed is as slow as randomRead. I haven't changed any configuration
parameters.
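For reference, a PerformanceEvaluation run of this kind is usually launched
along these lines; the flags and row count below are illustrative, so check
the usage string printed by your own build:

    $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=50000 scan 1
    $ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=50000 randomRead 1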
Hi,
I'm new to HBase MapReduce and want to do the following:
- create daily statistics with SQL queries against a SQL database
- store the statistic results in HBase
- run a daily MapReduce job on those results to compute monthly statistics
I stored this data in the HBase table 'route_conversion_statistics'.
My
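A rough skeleton of such a monthly-aggregation job against the 0.20 mapreduce
API might look like the following sketch. The column family "stats", the
qualifier "conversions", and the row-key layout "YYYY-MM-DD|routeId" are all
assumptions for illustration, not anything stated above:

    // Hypothetical sketch: aggregate daily rows into monthly sums with a TableMapper.
    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MonthlyStats {
      static class DailyMapper extends TableMapper<Text, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
            throws IOException, InterruptedException {
          // Month is the first 7 characters of the (assumed) key "YYYY-MM-DD|routeId".
          String month = Bytes.toString(row.get()).substring(0, 7);
          byte[] v = value.getValue(Bytes.toBytes("stats"), Bytes.toBytes("conversions"));
          if (v != null) {
            ctx.write(new Text(month), new LongWritable(Bytes.toLong(v)));
          }
        }
      }

      static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text month, Iterable<LongWritable> vals, Context ctx)
            throws IOException, InterruptedException {
          long sum = 0;
          for (LongWritable v : vals) sum += v.get();
          ctx.write(month, new LongWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new HBaseConfiguration(), "monthly-stats");
        job.setJarByClass(MonthlyStats.class);
        TableMapReduceUtil.initTableMapperJob("route_conversion_statistics",
            new Scan(), DailyMapper.class, Text.class, LongWritable.class, job);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }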
Hi All,
I am testing 0.20.0-alpha, r785472, and am running into an issue I can't seem
to figure out. I am accessing HBase from PHP via Thrift. The PHP script is
pulling data from our pgsql server and dumping it into HBase. HBase is running
on a 6-node Hadoop cluster (0.20.0-plus4681, r767961).
Even,
The scan probably warmed the cache here. Do the same experiment with a
fresh HBase for the scan and the random reads.
J-D
On Thu, Jul 9, 2009 at 5:14 AM, Qingyan(Evan) Liu wrote:
> Dears,
>
> I'm new to HBase. I just checked out hbase trunk rev-792389 and tested its
> performance by means of
Dear J-D,
Here are another two tests. I changed the order of the tests, and before each
test I restarted both HBase and Hadoop. All tests used 50,000 rows of 1 KB each.
(1) randomWrite-randomRead-randomRead-scan-scan-randomRead
7117ms-15966ms-16678ms-10429ms-10730ms-15641ms
(2) randomWrite-scan-scan-ran
Not every test is created equal; different tests are testing different
things, and different environments/setups/configurations can yield
different results.
I posted the utility (HBench) I used to generate the statistics from
those slides up in a jira. You can grab it and try it out to see wh
First, I recommend upgrading to the latest HBase 0.19 release, 0.19.3.
You have a few choices, but in short you want to use filters.
http://hadoop.apache.org/hbase/docs/r0.19.3/api/org/apache/hadoop/hbase/filter/package-summary.html
Specifically, you should look at the RegExpRowFilter:
http://
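A filtered scan against the 0.19 client API might look roughly like this
sketch. The table name, column family, and row-key pattern are made up for
illustration, and the exact getScanner signature should be checked against
the 0.19.3 javadoc:

    // Hedged sketch: scan only rows whose key matches a regular expression (HBase 0.19).
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Scanner;
    import org.apache.hadoop.hbase.filter.RegExpRowFilter;
    import org.apache.hadoop.hbase.io.RowResult;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FilteredScan {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "mytable"); // name hypothetical
        // Keep only rows whose key starts with "2009-07" (pattern hypothetical).
        RegExpRowFilter filter = new RegExpRowFilter("^2009-07.*");
        Scanner scanner = table.getScanner(
            new byte[][] { Bytes.toBytes("data:") }, // column family hypothetical
            HConstants.EMPTY_START_ROW, filter);
        try {
          for (RowResult row : scanner) {
            System.out.println(Bytes.toString(row.getRow()));
          }
        } finally {
          scanner.close();
        }
      }
    }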
Hi Again,
Since the tests mentioned below, I have finally figured out how to build
and run from the trunk. I have re-created my hbase install from svn,
configured it, updated my thrift client library, and my current import
has been through more than 5 region splits without failing.
Next step, wri
Of course, as luck would have it... I spoke too soon. I am still
suffering from that region split problem, but it doesn't seem to happen
on every region split.
I do know for sure that with the final split, the new daughter regions
were re-assigned to the original parent's server. It made it throu
My recommendation would be to not use thrift for bulk imports.
Travis Hegner wrote:
Of course, as luck would have it... I spoke too soon. I am still
suffering from that region split problem, but it doesn't seem to happen
on every region split.
I do know for sure that with the final split, the
I am not extremely Java-savvy quite yet... is there an alternative way
to access HBase from PHP? I have read about the REST libraries, but
haven't tried them yet. Are they sufficient for bulk import? Or is a
bulk import something that simply must be done from Java, without
exception?
Thanks for t
Hello,
I'm trying to use HBase from another language (R), which has its
own classloader (call it RCL).
RCL takes its own classpath (in which I've included all HBase JARs and
the conf folder), and I did the following:
cfg <- .jnew("org/apache/hadoop/hbase/HBaseConfiguration") ## Create a new Java object
You need to include everything in lib/*.jar; this includes the hadoop jar...
The problem might be that RCL isn't registering itself, so the recursive
class loaders aren't using RCL?
On Jul 9, 2009 2:15 PM, "Saptarshi Guha" wrote:
Hello,
I'm trying to use HBase from another language (R), which
It's not that it must be done from Java, it's just that the other
interfaces add a great deal of overhead and also do not let you do the
same kind of batching that helps significantly with performance (see the
sketch below).
If you don't care about the time it takes, then you could stick with
Thrift. Try to throttl
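By way of illustration, the batching in question looks something like this
with the 0.20 Java client; the table and column names here are invented:

    // Hedged sketch: client-side write buffering with the 0.20 Java API.
    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedImport {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable"); // name hypothetical
        table.setAutoFlush(false);                  // don't ship each Put in its own RPC
        table.setWriteBufferSize(12 * 1024 * 1024); // ~12 MB buffer; size hypothetical
        for (int i = 0; i < 100000; i++) {
          Put p = new Put(Bytes.toBytes("row-" + i));
          p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
          table.put(p);                             // buffered, flushed in batches
        }
        table.flushCommits();                       // push whatever is left
      }
    }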
Hello,
Thanks for the tip. I have added all the jar files in HBASE_HOME and
HBASE_HOME/lib, and HBASE_HOME/conf is in the classpath.
Going through the code of Configuration.java and
HBaseConfiguration.java, the latter is a simple subclass, and
setClassLoader replaces the classloader with a user-suppl
I am having an issue after deleting a row in the HBase 0.20 alpha. After
deleting the row using the Delete object, I cannot put a row back that uses
the same key as the deleted row. No exceptions occur in my code.
E.g.
HBaseConfiguration config = new HBaseConfiguration();
HTable table = new HTabl
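A hedged reconstruction of the reported sequence on the 0.20 API; the table,
row, and column names below are invented, since the original snippet is cut
off:

    // Sketch of the reported delete-then-put sequence (all names hypothetical).
    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeleteThenPut {
      public static void main(String[] args) throws IOException {
        HBaseConfiguration config = new HBaseConfiguration();
        HTable table = new HTable(config, "test_table");
        table.delete(new Delete(Bytes.toBytes("row1")));
        Put p = new Put(Bytes.toBytes("row1"));  // same key as the deleted row
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        table.put(p);  // expected to make the row visible again;
                       // the reported bug is that it never reappears
      }
    }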
The HBase code calls either Class.forName or uses the implied system
classloader when it refers to other classes. Maybe there is something there?
Perhaps it's using the default Java classloader (which doesn't have your
classpath)?
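One thing worth trying, sketched here as an assumption rather than a known
fix: point the thread's context classloader at RCL before any HBase class is
touched, so that reflective lookups resolve through it:

    // Hedged sketch: install R's classloader as the context classloader.
    public final class ContextLoaderFix {
      // Call this (via the R/Java bridge) before creating any HBase objects.
      public static void install(ClassLoader rcl) {
        Thread.currentThread().setContextClassLoader(rcl);
      }
    }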
On Jul 9, 2009 2:38 PM, "Saptarshi Guha" wrote:
Hello,
Thanks for the
Can you try with the latest trunk? Many bugs, including delete bugs, were
fixed.
On Jul 9, 2009 2:43 PM, "Bryan Keller" wrote:
I am having an issue after deleting a row in the HBase 0.20 alpha. After
deleting the row using the Delete object, I cannot put a row back that uses
the same key as the
I'm using DBInputFormat to upload data into an HBase table. The query takes a
while to run, meaning the split is taking a while. I upped the timeout like
so:

  <property>
    <name>mapred.task.timeout</name>
    <value>180</value>
  </property>

This kept my map tasks from being killed. I have 50 map tasks and 9
reduce tasks on a 5
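The same override can also be set programmatically on the old mapred API; the
value below is hypothetical, and the property takes milliseconds:

    // Hedged sketch: raising the task timeout from job-submission code.
    import org.apache.hadoop.mapred.JobConf;

    public class TimeoutConfig {
      public static JobConf withLongTimeout() {
        JobConf conf = new JobConf();
        conf.setLong("mapred.task.timeout", 30 * 60 * 1000L); // 30 minutes (hypothetical)
        return conf;
      }
    }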
The other 9 show status as "initializing" for a long time, as the percentage
of one task continues to increase.
llpind wrote:
>
> I'm using DBInputFormat to upload data into HBase table. The query takes
> a while to run, meaning the split is taking a while. I upped the timeout
> like so:
>
I can't get the trunk to run.
It looks like the way ZooKeeper starts has changed, and it tries to map my
DHCP-assigned IP address to the list of quorum servers in the
hbase-default.xml file
(which is only "localhost" by default) and complains it can't be
found. I tried adding my IP there but still can'
Hi Bryan,
For the latest trunk, are you using your own zoo.cfg, or overwriting the
options from hbase-default.xml?
-n
On Thu, Jul 9, 2009 at 4:20 PM, Bryan Keller wrote:
> I can't get the trunk to run.
> It looks like the way ZooKeeper starts has changed, and it tries to map my
> DHCP-assigned IP
I'm not changing anything, just using the default hbase-default.xml as-is
(which appears to be set up for standalone mode), and not creating a zoo.cfg.
The only thing I am changing is JAVA_HOME in hbase-env.sh.
On Thu, Jul 9, 2009 at 4:36 PM, Nitay wrote:
> Hi Bryan,
>
> For the latest trunk,
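If the problem is indeed the quorum lookup, one hedged workaround is to pin
the quorum in hbase-site.xml rather than editing hbase-default.xml; the
property name below matches what trunk ships, though the right value on a
DHCP machine may be its current hostname rather than localhost:

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>localhost</value>
    </property>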
Hi,
It looks like HTablePool is designed to have one instance of HTablePool per
table.
I am confused by the static map inside the HTablePool class. If we can
instantiate one HTablePool per table, what's the use of the map?
Furthermore, the map is static and there is no way to add multiple tables to
i
Thanks a lot, JG!
I've just run svn update and tested the new code, which calls
setScannerCaching(30). Scan performance is now very high: 5460ms at
offset 0 for 1 rows.
So the conclusion is clear: switching on prefetch greatly boosts
scan speed.
Thank you all, kind guys.
sincerely,
Evan
On large map-reduce runs with small rows, I set scanner caching to
1000-3000 rows. This seemingly minor change allows me to reach 4.5m
row reads/sec (~ 40 bytes per row). Without that, single row fetch is
stupid slow.
I don't think we can set a reasonable value here for 2 reasons:
- for those wh
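For concreteness, scanner caching on the 0.20 client looks roughly like the
sketch below; the table name is invented and 1000 is just the figure Ryan
quotes above:

    // Hedged sketch: fetch 1000 rows per RPC instead of one.
    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachedScan {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "mytable"); // name hypothetical
        Scan scan = new Scan();
        scan.setCaching(1000); // rows shipped per round-trip
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // process each row here
          }
        } finally {
          scanner.close();
        }
      }
    }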
Hi all,
1. In this configuration property:

  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>3</value>
    <description>If more than this number of HStoreFiles in any one HStore
    (one HStoreFile is written per flush of memcache) then a compaction
    is run to rewrite all HStoreFiles files as one. Larger numbers
re: #2: in fact we don't know that... I know that I can run 200-400
regions on a regionserver with a heap size of 4-5gb. More, even. I
bet I could have 1000 regions open on 4gb ram. Each region is ~1mb
of always-in-memory data, so there we go.
As for compactions, they are fairly fast, 0-30s or so d
Hi Ryan,
Thanks.
If your region size is about 250MB, then 400 regions can store 100GB of data
on each regionserver.
Now, if you have 100TB of data, then you need 1000 regionservers.
We are not Google or Yahoo, who have so many nodes.
Schubert
On Fri, Jul 10, 2009 at 12:29 PM, Ryan Rawson wrote:
> re:
That size is not memory-resident, so the total data size is not an
issue. The index size is what limits you with RAM, and it's about 1 MB
per region (256MB region).
-ryan
On Thu, Jul 9, 2009 at 9:51 PM, zsongbo wrote:
> Hi Ryan,
>
> Thanks.
>
> If your regionsize is about 250MB, than 400 regions
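Working out the arithmetic from the numbers in this thread: at ~1 MB of index
per 256 MB region, 400 regions cost ~400 MB of heap while fronting ~100 GB on
disk, and 1000 regions cost ~1 GB of heap for ~250 GB on disk; that is how a
4 GB regionserver heap can serve far more data than fits in RAM.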
Something checked in yesterday (7/9/09) caused the startup problem for me. I
rolled back (svn update -r {20090709}), and HBase started up. The delete
problem I was having is fixed. So I'll use that rev (r792422) for now.
Thanks,
Bryan
On Thu, Jul 9, 2009 at 4:41 PM, Bryan Keller wrote: