Hi Henning,

And yes, what you have seen is often difficult to explain. What I
listed are the obvious contenders. But ideally you would do a post
mortem on the master and slave logs for Hadoop and HBase, since that
would give you better insight into the sequence of events. For
example, when did the system start to flush, when did it start
compacting, when did HDFS start to slow down? And so on. One thing
that I would highly recommend (if you haven't done so already) is
getting graphing going. Use the built-in Ganglia support and you may
be able to at least determine the overall load on the system and
various metrics of Hadoop and HBase.
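
For example, a minimal hadoop-metrics.properties in the HBase conf
directory could look like the sketch below - note that the gmond host
and port are placeholders you would replace with your own:

  # conf/hadoop-metrics.properties
  # "your-gmond-host:8649" is a placeholder for your Ganglia receiver
  hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
  hbase.period=10
  hbase.servers=your-gmond-host:8649
  jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext
  jvm.period=10
  jvm.servers=your-gmond-host:8649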

Did you use normal Puts or did you set the table up to cache Puts and
write them in bulk? See HTable.setWriteBufferSize() and
HTable.setAutoFlush() for details (but please note that you then do
need to call HTable.flushCommits() in the close() method of your
mapper class). That will help a lot in speeding up the writes.
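
To make that concrete, here is a minimal sketch - the table and
column names are made up, and I am assuming the plain 0.20-era
mapred API:

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class BufferedImportMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

    private HTable table;

    @Override
    public void configure(JobConf job) {
      try {
        table = new HTable(new HBaseConfiguration(), "testtable");
        table.setAutoFlush(false);                  // buffer Puts client-side
        table.setWriteBufferSize(12 * 1024 * 1024); // e.g. flush roughly every 12 MB
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }

    public void map(LongWritable key, Text value,
        OutputCollector<NullWritable, NullWritable> output, Reporter reporter)
        throws IOException {
      // the row key and values here stand in for your real ones
      Put put = new Put(Bytes.toBytes(value.toString()));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("data"));
      table.put(put); // lands in the write buffer, not yet on the wire
    }

    @Override
    public void close() throws IOException {
      table.flushCommits(); // push any Puts still sitting in the buffer
    }
  }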

Lars

On Fri, Nov 19, 2010 at 3:43 PM, Henning Blohm <henning.bl...@zfabrik.de> wrote:
> Hi Lars,
>
>  thanks. Yes, this is just the first test setup. Eventually the data
> load will be significantly higher.
>
> At the moment (looking at the master after the run) the number of
> regions is evenly distributed (684, 685, 685 regions). The overall
> HDFS use is ~700G (the replication factor is 3, btw).
>
> I will want to upgrade as soon as that makes sense. It seems there is
> no "release" after 0.20.6 yet - that's why we are still with that one.
>
> When I do that run again, I will check the master UI and see how things
> develop there. As for the current run: I do not expect
> to get stable numbers early in the run. What looked suspicious was that
> things got gradually worse until well into 30 hours after
> the start of the run and then even got better. That load behavior was
> unexpected for me (I would have expected early changes but then
> stable behavior through to the end).
>
> Thanks,
>  Henning
>
> On Fri, Nov 19, 2010 at 15:21 +0100, Lars George wrote:
>
>> Hi Henning,
>>
>> Could you look at the Master UI while doing the import? The issue with
>> a cold bulk import is that you are hitting one region server
>> initially, and while it is filling up its in-memory structures all is
>> nice and dandy. Then you start to tax the server as it has to flush
>> data out, and it becomes slower to respond to the mappers that are
>> still hammering it. Only after a while do the regions become large
>> enough to get split, and the load starts to spread across 2 machines,
>> then 3. Eventually you have enough regions to handle your data, and
>> you will see an average of the performance you could expect from a
>> loaded cluster. For that reason we have added a bulk loading feature
>> that helps build the region files externally and then swap them in.
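>>
>> As a rough sketch of that path - the class and tool names below are
>> from the 0.89/0.90 line (org.apache.hadoop.hbase.mapreduce and
>> org.apache.hadoop.hbase.client), so treat them as assumptions for
>> your version. The job is configured to write HFiles instead of
>> issuing Puts:
>>
>>   Job job = new Job(conf, "bulk-import");
>>   job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>   job.setMapOutputValueClass(KeyValue.class);
>>   HFileOutputFormat.configureIncrementalLoad(job,
>>       new HTable(conf, "testtable")); // "testtable" is a placeholder
>>
>> and afterwards the finished files are swapped into the table with the
>> bundled tool:
>>
>>   $ hadoop jar hbase-<version>.jar completebulkload /bulk-output testtable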
>>
>> When you check the UI you can actually see this behavior, as the
>> operations per second (ops) are bound to one server initially. Well,
>> it could be two, as one of them also has to serve META. If that is the
>> same machine then you are penalized twice.
>>
>> In addition, you start to run into minor compaction load while HBase
>> tries to do its housekeeping during your import.
>>
>> With 0.89 you could pre-split the table into the regions you would
>> eventually see when your job is complete. Please use the UI to check
>> and let us know how many regions you end up with in total (out of
>> interest mainly). Had that been done before the import, the load would
>> have been spread right from the start.
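>>
>> A sketch of that pre-split, using the 0.89/0.90 client API - the
>> table and family names are made up, and the split points assume
>> zero-padded decimal row keys, which may not match your actual key
>> format:
>>
>>   HBaseAdmin admin = new HBaseAdmin(conf);
>>   HTableDescriptor desc = new HTableDescriptor("testtable");
>>   desc.addFamily(new HColumnDescriptor("cf"));
>>   byte[][] splits = new byte[99][];
>>   for (int i = 0; i < 99; i++) {
>>     // 100 regions over the key space 000000000 ... 499999999
>>     splits[i] = Bytes.toBytes(String.format("%09d", (i + 1) * 5000000));
>>   }
>>   admin.createTable(desc, splits); // table starts with 100 regions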
>>
>> In general, 0.89 is much better performance-wise when it comes to bulk
>> loads, so you may want to try it out as well. The 0.90 RC is up, so a
>> release is imminent, and moving now saves you from having to upgrade
>> again soon. Also, 0.90 is the first release with Hadoop's append fix,
>> so you do not lose any data from wonky server behavior.
>>
>> And to wrap this up, 3 data nodes is not too great. If you ask anyone
>> with a serious production-type setup you will see 10 machines and
>> more, I'd say 20-30 machines and up. Some would say "use MySQL for
>> this little data", but that is not fair given that we do not know what
>> your targets are. The bottom line is, you will see issues (like
>> slowness) with 3 nodes that 8 or 10 nodes would never show.
>>
>> HTH,
>> Lars
>>
>>
>> On Fri, Nov 19, 2010 at 2:09 PM, Henning Blohm <henning.bl...@zfabrik.de> 
>> wrote:
>> > We have a Hadoop 0.20.2 + HBase 0.20.6 setup with three data nodes
>> > (12GB, 1.5TB each) and one master node (24GB, 1.5TB). We store a
>> > relatively simple table in HBase (1 column family, 5 columns, row
>> > key about 100 chars).
>> >
>> > In order to better understand the load behavior, I wanted to put
>> > 5*10^8 rows into that table. I wrote an M/R job that uses a split
>> > input format to divide the 5*10^8 logical row keys (essentially just
>> > counting from 0 to 5*10^8 - 1) into 1000 chunks of 500000 keys each,
>> > and then lets the map tasks do the actual job of writing the
>> > corresponding rows (with some random column values) into HBase.
>> >
>> > So there are 1000 map tasks and no reducer. Each task writes 500000
>> > rows into HBase. We have 6 mapper slots, i.e. 24 mappers running in
>> > parallel.
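>> >
>> > (In other words, map task i covers the key range
>> > [i * 500000, (i + 1) * 500000) for i = 0 ... 999.)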
>> >
>> > The whole job runs for approx. 48 hours. Initially the map tasks need
>> > around 30 min. each. After a while things take longer and longer,
>> > eventually reaching > 2h. It tops out around the 850th task, after
>> > which things speed up again, improving to about 48 min. at the end,
>> > until the job completes.
>> >
>> > These are all dedicated machines and there is nothing else running.
>> > The map tasks have 200m heap, and when checking with vmstat in
>> > between I cannot observe any swapping.
>> >
>> > Also, on the master it seems that heap utilization is not at the
>> > limit, and there is no swapping either. All Hadoop and HBase
>> > processes have 1G heap.
>> >
>> > Any idea what would cause the strong variation (or degradation) in
>> > write performance?
>> > Is there a way to find out where the time gets lost?
>> >
>> > Thanks,
>> >  Henning
>> >
>> >
>
