Ran some tests and it seems that single-use Scanner requests are not that
bad after all. I guess the important part is to set row caching to 1 and
correctly close every scanner afterwards.
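A minimal sketch of that single-use scanner pattern, using the 0.90-era HBase client API (the class and method names below are placeholders, not from the original mails):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class FirstRowAtOrAfter {
  // Returns the first row whose key is >= startRow, or null if none exists.
  public static Result firstRowAtOrAfter(HTable table, byte[] startRow)
      throws IOException {
    Scan scan = new Scan(startRow);
    scan.setCaching(1); // fetch exactly one row per RPC
    ResultScanner scanner = table.getScanner(scan);
    try {
      return scanner.next();
    } finally {
      scanner.close(); // always release the server-side scanner lease
    }
  }
}
```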
On Mon, Aug 27, 2012 at 4:33 PM, Ferdy Galema wrote:
I want to do a lot of random reads, but I need to get the first row after
the requested key. I know I can make a scanner every time (with a specified
startrow) and close it after a single result is fetched, but this seems
like a lot of overhead.
Something like HTable's getRowOrBefore method, but then
Hi,
Perhaps you want to take a look at MultipleInputs. I'm not sure if it works
for TableInputFormat, but at least you can use it for inspiration.
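For a plain HDFS job, MultipleInputs is wired up roughly like this (the paths and mapper classes are placeholders; whether TableInputFormat can be plugged in the same way is exactly the open question above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class TwoInputsJob {
  // Placeholder mappers; each input path gets its own map class.
  static class MapperA extends Mapper<LongWritable, Text, Text, Text> {}
  static class MapperB extends Mapper<LongWritable, Text, Text, Text> {}

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "two-inputs");
    job.setJarByClass(TwoInputsJob.class);
    // Each input gets its own InputFormat and Mapper.
    MultipleInputs.addInputPath(job, new Path("/data/a"),
        TextInputFormat.class, MapperA.class);
    MultipleInputs.addInputPath(job, new Path("/data/b"),
        TextInputFormat.class, MapperB.class);
    // ...set output key/value classes and output path, then
    // job.waitForCompletion(true);
  }
}
```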
Ferdy.
On Mon, Aug 6, 2012 at 3:02 PM, Amlan Roy wrote:
> Hi,
>
> If TableMapper and TableMapReduceUtil.initTableMapperJob() does not support
> mul
s set to 1241. The remaining size would still be
> higher than zero and so would the countdown (its value would be 1). So
> it's gonna try to get the nextScanner. If you have just one region it
> would stop there.
>
> But that would be the case if you have 1 region and did not
I was experiencing aborted scans on certain conditions. In these cases I
was simply missing so many rows that only a fraction was inputted, without
warning. After lots of testing I was able to pinpoint and reproduce the
error when scanning over a single region, single column family, single
store fi
Some mapred jobs running scans on our HBase could not succeed because of
the dreaded LeaseException or ScannerTimeoutException, even with
hbase.client.scanner.caching set to 1 and long timeout properties. Mind you
that no row is ever bigger than 5MB (sure it's bigger than most use cases
but still i
Does HBASE-3584 also allow buffering of the mutations? (With the 0.90
branch it is only possible to buffer Put operations).
On Fri, Jul 6, 2012 at 1:50 AM, lars hofhansl wrote:
> I'll let the Cloudera folks speak, but I had assumed CDH4 would include
> HBase 0.94.
>
> -- Lars
>
>
>
Sure:
https://issues.apache.org/jira/browse/HBASE-5607
@Marcos: Thanks, those are useful links.
On Tue, Mar 20, 2012 at 6:27 PM, Stack wrote:
> 2012/3/20 Ferdy Galema :
> > A nice solution server-side would be to dynamically adjust the
> > scanner-caching value when the response
at org.apache.hadoop.hbase.ipc.ByteBufferOutputStream.<init>(ByteBufferOutputStream.java:44)
at org.apache.hadoop.hbase.ipc.ByteBufferOutputStream.<init>(ByteBufferOutputStream.java:37)
Ferdy.
2012/3/16 Ferdy Galema
> CPU resources never was a problem, munin shows there is enough idle time.
n my
> > experience it's not very likely.
> >
> > Setting swappiness to 0 just means it's not going to page anything out
> > until it really needs to do it, meaning it's possible to swap. The
> > only way to guarantee no swapping whatsoever is g
Correction: 2300300300 is the current heap size for regionservers (I misread
another process).
On Tue, Mar 6, 2012 at 11:37 AM, Ferdy Galema wrote:
> Thanks for the replies. That is a lot of useful information. I admit that
> I'm still running a bit behind when it comes to truly under
ess size.
> > This seems to occur most often in lightly loaded installations. It
> might
> > be interesting to have a look around your processes with pmap or some
> tool
> > like that. In many situations, you'll probably be able to account for
> each
> > Java heap
Hi,
I'm running regionservers with 2GB heap and following tuning options:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewRatio=16
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
-XX:MaxGCPauseMillis=100
A regionserver aborted (YouAreDeadException) and this was printed in
As far as I can tell there is not yet a built-in mechanism you can use
for this. You could implement your own InputFormat, something like
MultiTableInputFormat. If you need different map functions for the two
tables, perhaps something similar to Hadoop's MultipleInputs should do
the trick.
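A skeleton of such a MultiTableInputFormat might look like this (every name here is made up for illustration; in a real implementation each delegate would need its own Configuration carrying its table name):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Sketch: keep one TableInputFormat per table and concatenate their splits.
public class MultiTableInputFormat
    extends InputFormat<ImmutableBytesWritable, Result> {

  // Each delegate must be configured with its own table name before use.
  private final List<TableInputFormat> delegates =
      new ArrayList<TableInputFormat>();

  @Override
  public List<InputSplit> getSplits(JobContext context)
      throws IOException, InterruptedException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (TableInputFormat delegate : delegates) {
      splits.addAll(delegate.getSplits(context)); // one split set per table
    }
    return splits;
  }

  @Override
  public RecordReader<ImmutableBytesWritable, Result> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    // A table split carries its table name, so any configured delegate
    // can build the reader for it.
    return delegates.get(0).createRecordReader(split, context);
  }
}
```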
On
You can use the merge tool to combine adjacent regions. It requires a
bit of manual work because you need to specify the regions by hand. The
cluster also needs to be offline (I recommend keeping zookeeper running
though). Check if merging succeeded with the hbck tool.
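The invocation looks roughly like this (the table and region names below are placeholders; copy the exact encoded region names from the .META. table):

```shell
# With HBase stopped (ZooKeeper still up), merge two adjacent regions:
hbase org.apache.hadoop.hbase.util.Merge \
    mytable \
    "mytable,,1299980236021.d1f3aexample" \
    "mytable,rowkey5000,1299980236021.a7c2bexample"

# Verify the table is consistent before restarting the cluster:
hbase hbck
```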
There are some jira issu
2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /
127.0.0.1:60020 could not be reached after 1 tries, giving up.
This means the regionserver could not be reached. Check the regionserver
logs to see why. Perhaps it failed to start? Is the HDFS fully functional?
Ferdy.
On 05
Not out of the box. I use the following resource for packaging lzo with
the cloudera release:
https://github.com/toddlipcon/hadoop-lzo-packager
On 05/18/2011 08:35 AM, Pete Haidinyak wrote:
Does the Cloudera VM have LZO data compression available? If not,
since it's a 32-bit system what's the be
A rowcounter is a scan job, so you should use
hbase.client.scanner.caching for better scan performance. (Depending on
your value sizes, set to 1000 or something like that).
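Assuming the RowCounter job accepts generic -D options (it does in recent versions; older ones may need the property in hbase-site.xml instead), the caching can be raised per run like this; the table name and the value 1000 are placeholders:

```shell
# Run the RowCounter MR job with a larger scanner cache, so each RPC
# fetches 1000 rows instead of the default:
hbase org.apache.hadoop.hbase.mapreduce.RowCounter \
    -Dhbase.client.scanner.caching=1000 \
    mytable
```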
For us, 1 zookeeper is able to manage our 15-node cluster perfectly fine.
On 05/11/2011 02:40 PM, byambajargal wrote:
Hel
With HBase 0.90.1 (CDH3B4), whenever a regionserver is shutdown because
of an error, I'll restart it using start-hbase.sh. After that the node
is up and running again, but the hbck tool still lists a dead server.
(Though it does list the correct amount of live servers.) Is this a bug?
Ferdy
Thanks. This seems very useful. Just to add, in terms of
compression/decompression speed, we're having very good performance with
the lzf codec. It is Apache licensed and pure Java code with no external
dependencies. See https://github.com/ning/compress/
Ferdy
On 04/05/2011 12:55 AM, Jean-Daniel Cryans wrote:
I feel like I'm not understanding your need correctly, could you
explain what you think HBase should be doing in order to give you a
better life?
Thx,
J-D
On Mon, Mar 21, 2011 at 5:22 PM, Ferdy Galema wrote:
These methods are certainly helpful, whenever I ever nee
min.html#createTable(org.apache.hadoop.hbase.HTableDescriptor,
byte[][])
- use the bulk loader: http://hbase.apache.org/bulk-loads.html
J-D
On Fri, Mar 18, 2011 at 5:46 AM, Ferdy Galema wrote:
On second thought, removing the obsolete regionfolders was easily done
by hand. This way I can merge regions with the merge tool.
However, I'm still bothered by the (performance) issues I ran into. Any
advice would be helpful.
On 03/18/2011 11:06 AM, Ferdy Galema wrote:
After exporting a table of about 30M rows (each row has about 500
columns, totalling 400GB of data), there were several issues when trying
to import it again on an empty HBase. (HBase version is 0.90.1-CDH3B4,
deployed on 15 nodes. LZO is enabled.)
The reason for this export/import is to both
=LZO+Compression
St.Ack
On Wed, Mar 16, 2011 at 10:28 AM, Ferdy Galema wrote:
We upgraded to Hadoop 0.20.1 and Hbase 0.90.1 (both CDH3B4). We are
using 64bit machines.
Starting goes great, only right after the first compaction we get this
error:
Uncaught exception in service thread regionserver60020.compactor
java.lang.AbstractMethodError:
com.hadoop.compression.lzo.Lz
Using Hbase 0.20.5 with Hadoop CDH2 0.20.1+169.89 I noticed something
very strange.
When overwriting a certain column in a column family with 1 VERSIONS,
and removing that value later (for example after several minutes) the
older value still shows when listing all the KeyValues of the row.
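A rough reproduction of the scenario from the HBase shell (the table and column names are made up; the point is that with VERSIONS => 1 an overwritten cell is deleted, yet the older value may reappear until a major compaction runs):

```shell
hbase shell <<'EOF'
create 't1', {NAME => 'f', VERSIONS => 1}
put 't1', 'row1', 'f:q', 'old'
put 't1', 'row1', 'f:q', 'new'
delete 't1', 'row1', 'f:q'
# Ask for all versions of the cell; an 'old' value showing up here
# would demonstrate the reported behaviour.
get 't1', 'row1', {COLUMN => 'f:q', VERSIONS => 3}
EOF
```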
Al