Hi Alex,

We once had a class called Historian that would log the splits etc. It was 
abandoned as it was never implemented right. Currently we only have the logs, 
and I personally agree with you that this is not good for the average (or 
even advanced) user. In the future we may be able to use Coprocessors to write 
region split info somewhere, but for now there is a void.

I 100% agree with you that this info should be in a metric, as it is vital 
information about what the system is doing. All the core devs are used to 
reading the logs directly, which I think is OK for them but madness otherwise. 
Please have a look whether there is already a JIRA for this and, if not, create 
a new one so that we can improve this for everyone. I am strongly for this sort 
of addition and will help you get it committed if you can provide a patch.

Lars

On Nov 22, 2010, at 9:38, Alex Baranau <alex.barano...@gmail.com> wrote:

> Hello Lars,
> 
>> But ideally you would do a post
>> mortem on the master and slave logs for Hadoop and HBase, since that
>> would give you better insight into the events. For example, when did
>> the system start to flush, when did it start compacting, when did
>> HDFS start to go slow?
> 
> I wonder, does it make sense to expose these events somehow in the HBase
> web interface in an easily accessible way: e.g. a list of time points of splits
> (for each regionserver), compaction/flush events (or rates), etc.? This info
> seems valuable for many users and, most importantly, I believe it can
> affect their configuration-related decisions, so showing the data in the web
> interface instead of making users dig into the logs makes sense and moves HBase
> a bit towards "easy to install & use". Thoughts?
> 
> Btw, I found that via JMX we expose only flush- and compaction-related data,
> nothing related to splits. Could you give a hint why? Also, only time
> and size are exposed; a count (number of actions) would probably be good to
> expose too, so that one can see the flush/compaction/split(?) rate and judge
> whether some configuration value is set properly (e.g.
> hbase.hregion.memstore.flush.size).
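> 
> For reference, this untested sketch is roughly how I dump what a region server
> currently exposes over JMX, using plain javax.management. The MBean object name
> below is a guess on my part (check jconsole for the exact one), and the host and
> port are made up:
> 
> import javax.management.MBeanAttributeInfo;
> import javax.management.MBeanServerConnection;
> import javax.management.ObjectName;
> import javax.management.remote.JMXConnectorFactory;
> import javax.management.remote.JMXServiceURL;
> 
> public class DumpRegionServerMetrics {
>   public static void main(String[] args) throws Exception {
>     // Assumes the region server JVM exposes remote JMX, e.g. on port 10102.
>     JMXServiceURL url = new JMXServiceURL(
>         "service:jmx:rmi:///jndi/rmi://regionserver-host:10102/jmxrmi");
>     MBeanServerConnection mbs =
>         JMXConnectorFactory.connect(url).getMBeanServerConnection();
>     // The object name is an assumption -- look it up in jconsole.
>     ObjectName name = new ObjectName(
>         "hadoop:service=RegionServer,name=RegionServerStatistics");
>     // Print every attribute the bean exposes, whatever it happens to be.
>     for (MBeanAttributeInfo attr : mbs.getMBeanInfo(name).getAttributes()) {
>       System.out.println(attr.getName() + " = "
>           + mbs.getAttribute(name, attr.getName()));
>     }
>   }
> }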
> 
> Thanks,
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> 
> On Fri, Nov 19, 2010 at 5:16 PM, Lars George <lars.geo...@gmail.com> wrote:
> 
>> Hi Henning,
>> 
>> And what you have seen is often difficult to explain. What I
>> listed are the obvious contenders. But ideally you would do a post
>> mortem on the master and slave logs for Hadoop and HBase, since that
>> would give you better insight into the events. For example, when did
>> the system start to flush, when did it start compacting, when did
>> HDFS start to go slow? And so on. One thing that I would highly
>> recommend (if you haven't done so already) is getting graphing going.
>> Use the built-in Ganglia support and you may be able to at least
>> determine the overall load on the system and various metrics of Hadoop
>> and HBase.
>> 
>> Did you use normal Puts, or did you set the client to cache Puts and write
>> them in batches? See HTable.setWriteBufferSize() and
>> HTable.setAutoFlush() for details (but please note that you then do
>> need to call HTable.flushCommits() in the close() method of your
>> mapper class). That will speed up writing data a lot.
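>> 
>> Roughly like this, as an untested sketch against the 0.20-era client API
>> (the table and column names are made up):
>> 
>> import java.io.IOException;
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.hbase.client.HTable;
>> import org.apache.hadoop.hbase.client.Put;
>> import org.apache.hadoop.hbase.util.Bytes;
>> 
>> public class BufferedPuts {
>>   public static void main(String[] args) throws IOException {
>>     HTable table = new HTable(new HBaseConfiguration(), "mytable");
>>     table.setAutoFlush(false);                  // buffer Puts on the client
>>     table.setWriteBufferSize(12 * 1024 * 1024); // ~12 MB write buffer
>>     for (int i = 0; i < 500000; i++) {
>>       Put put = new Put(Bytes.toBytes("row-" + i));
>>       put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
>>           Bytes.toBytes("value-" + i));
>>       table.put(put);      // queued client-side, sent in batches
>>     }
>>     table.flushCommits();  // in a mapper this belongs in close()
>>     table.close();
>>   }
>> }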
>> 
>> Lars
>> 
>> On Fri, Nov 19, 2010 at 3:43 PM, Henning Blohm <henning.bl...@zfabrik.de>
>> wrote:
>>> Hi Lars,
>>> 
>>> thanks. Yes, this is just the first test setup. Eventually the data
>>> load will be significantly higher.
>>> 
>>> At the moment (looking at the master after the run) the number of
>>> regions is well distributed (684, 685, and 685 regions). The overall
>>> HDFS usage is ~700G (the replication factor is 3, btw).
>>> 
>>> I will want to upgrade as soon as that makes sense. It seems there is no
>>> release after 0.20.6 yet - that's why we are still on that one.
>>> 
>>> When I do that run again, I will check the master UI and see how things
>>> develop there. As for the current run: I do not expect
>>> to get stable numbers early in the run. What looked suspicious was that
>>> things got gradually worse until well into 30 hours after
>>> the start of the run, and then even improved again. That is unexpected load
>>> behavior for me (I would have expected early changes but then
>>> some stable behavior up to the end).
>>> 
>>> Thanks,
>>> Henning
>>> 
>>> On Friday, 19.11.2010, at 15:21 +0100, Lars George wrote:
>>> 
>>>> Hi Henning,
>>>> 
>>>> Could you look at the Master UI while doing the import? The issue with
>>>> a cold bulk import is that you are hitting one region server
>>>> initially, and while it is filling up its in-memory structures all is
>>>> nice and dandy. Then you start to tax the server as it has to flush
>>>> data out, and it becomes slower at responding to the mappers still
>>>> hammering it. Only after a while do the regions become large enough
>>>> that they get split and the load starts to spread across 2 machines, then
>>>> 3. Eventually you have enough regions to handle your data and you will
>>>> see an average of the performance you could expect from a loaded
>>>> cluster. For that reason we have added a bulk loading feature that
>>>> helps build the region files externally and then swap them in.
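>>>> 
>>>> As a rough, untested sketch of such a job setup, assuming the 0.90-era
>>>> bulk load API (HFileOutputFormat.configureIncrementalLoad); the table
>>>> "mytable", the column family "cf" and the tab-separated input lines are
>>>> all made up for the example:
>>>> 
>>>> import java.io.IOException;
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.fs.Path;
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.client.HTable;
>>>> import org.apache.hadoop.hbase.client.Put;
>>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>>> import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>> import org.apache.hadoop.io.LongWritable;
>>>> import org.apache.hadoop.io.Text;
>>>> import org.apache.hadoop.mapreduce.Job;
>>>> import org.apache.hadoop.mapreduce.Mapper;
>>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>>> 
>>>> public class BulkLoadPrepare {
>>>> 
>>>>   // Example mapper: turns "rowkey<TAB>value" text lines into Puts.
>>>>   public static class PutMapper
>>>>       extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>>>>     protected void map(LongWritable key, Text line, Context ctx)
>>>>         throws IOException, InterruptedException {
>>>>       String[] parts = line.toString().split("\t", 2);
>>>>       byte[] row = Bytes.toBytes(parts[0]);
>>>>       Put put = new Put(row);
>>>>       put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),
>>>>           Bytes.toBytes(parts[1]));
>>>>       ctx.write(new ImmutableBytesWritable(row), put);
>>>>     }
>>>>   }
>>>> 
>>>>   public static void main(String[] args) throws Exception {
>>>>     Configuration conf = HBaseConfiguration.create();
>>>>     Job job = new Job(conf, "prepare-hfiles");
>>>>     job.setJarByClass(BulkLoadPrepare.class);
>>>>     job.setMapperClass(PutMapper.class);
>>>>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>>>     job.setMapOutputValueClass(Put.class);
>>>>     FileInputFormat.addInputPath(job, new Path(args[0]));
>>>>     FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>>>     // Sets reducer, partitioner and output format so the generated
>>>>     // HFiles line up with the existing region boundaries of the table.
>>>>     HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "mytable"));
>>>>     job.waitForCompletion(true);
>>>>     // Afterwards the files under args[1] are swapped in with the bulk
>>>>     // load tool (completebulkload in 0.90).
>>>>   }
>>>> }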
>>>> 
>>>> When you check the UI you can actually see this behavior as the
>>>> operations-per-second (ops) are bound to one server initially. Well,
>>>> could be two as one of them has to also serve META. If that is the
>>>> same machine then you are penalized twice.
>>>> 
>>>> In addition you start to run into minor compaction load while HBase
>>>> tries to do housekeeping during your load.
>>>> 
>>>> With 0.89 you could pre-split the regions into what you see eventually
>>>> when your job is complete. Please use the UI to check and let us know
>>>> how many regions you end up with in total (out of interest mainly). If
>>>> you do that before the import, then the load is spread right from
>>>> the start.
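>>>> 
>>>> An untested sketch of creating such a pre-split table with the 0.89/0.90
>>>> admin API; the table name, family name and split points are made up, and
>>>> the split points should of course match your actual key distribution:
>>>> 
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.HColumnDescriptor;
>>>> import org.apache.hadoop.hbase.HTableDescriptor;
>>>> import org.apache.hadoop.hbase.client.HBaseAdmin;
>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>> 
>>>> public class PreSplitTable {
>>>>   public static void main(String[] args) throws Exception {
>>>>     Configuration conf = HBaseConfiguration.create();
>>>>     HTableDescriptor desc = new HTableDescriptor("mytable");
>>>>     desc.addFamily(new HColumnDescriptor("cf"));
>>>>     // One region boundary per expected chunk of the key space.
>>>>     byte[][] splits = new byte[][] {
>>>>       Bytes.toBytes("100000000"),
>>>>       Bytes.toBytes("200000000"),
>>>>       Bytes.toBytes("300000000"),
>>>>       Bytes.toBytes("400000000"),
>>>>     };
>>>>     new HBaseAdmin(conf).createTable(desc, splits);
>>>>   }
>>>> }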
>>>> 
>>>> In general 0.89 is much better performance-wise when it comes to bulk
>>>> loads, so you may want to try it out as well. The 0.90 RC is up, so a
>>>> release is imminent, which saves you from having to upgrade again soon.
>>>> Also, 0.90 is the first release with Hadoop's append fix, so you do not
>>>> lose any data from wonky server behavior.
>>>> 
>>>> And to wrap this up, 3 data nodes is not too great. If you ask anyone
>>>> with a serious production-type setup you will see 10 machines or more,
>>>> I'd say 20-30 machines and up. Some would say "use MySQL for
>>>> this little data", but that is not fair given that we do not know what
>>>> your targets are. Bottom line is, you will see issues (like slowness)
>>>> with 3 nodes that 8 or 10 nodes will never show.
>>>> 
>>>> HTH,
>>>> Lars
>>>> 
>>>> 
>>>> On Fri, Nov 19, 2010 at 2:09 PM, Henning Blohm <henning.bl...@zfabrik.de> wrote:
>>>>> We have a Hadoop 0.20.2 + HBase 0.20.6 setup with three data nodes
>>>>> (12GB, 1.5TB each) and one master node (24GB, 1.5TB). We store a
>>>>> relatively simple table in HBase (1 column family, 5 columns, row key
>>>>> about 100 chars).
>>>>> 
>>>>> In order to better understand the load behavior, I wanted to put 5*10^8
>>>>> rows into that table. I wrote an M/R job that uses a split input format
>>>>> to split the 5*10^8 logical row keys (essentially just counting from 0
>>>>> to 5*10^8-1) into 1000 chunks of 500000 keys each, and then let the map
>>>>> tasks do the actual job of writing the corresponding rows (with some
>>>>> random column values) into HBase.
>>>>> 
>>>>> So there are 1000 map tasks and no reducer. Each task writes 500000 rows
>>>>> into HBase. We have 6 mapper slots, i.e. 24 mappers running in parallel.
>>>>> 
>>>>> The whole job runs for approx. 48 hours. Initially the map tasks need
>>>>> around 30 min. each. After a while things take longer and longer,
>>>>> eventually reaching > 2h. It tops out around the 850th task, after which
>>>>> things speed up again, improving to about 48 min. towards the end, until
>>>>> the job completes.
>>>>> 
>>>>> These are all dedicated machines and nothing else is running. The map
>>>>> tasks have 200m heap, and when checking with vmstat in between I cannot
>>>>> observe any swapping.
>>>>> 
>>>>> Also, on the master it seems that heap utilization is not at the limit,
>>>>> and there is no swapping either. All Hadoop and HBase processes have
>>>>> 1G heap.
>>>>> 
>>>>> Any idea what would cause the strong variation (or degradation) of
>>>>> write performance?
>>>>> Is there a way to find out where the time is being lost?
>>>>> 
>>>>> Thanks,
>>>>> Henning
>>>>> 
>>>>> 
>>> 
>> 
