Yeah, turning off the WAL would have been my next suggestion. Apart from that, Ganglia is really easy to set up - you might want to consider getting used to it now :)
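For reference, disabling the WAL is done per Put via Put.setWriteToWAL(false). A minimal sketch against the 0.20.x client API (the table, family and column names below are made up):

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NoWalPut {
      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "testtable");
        Put put = new Put(Bytes.toBytes("row-000000001"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
            Bytes.toBytes("some value"));
        // Skip the write-ahead log for this edit: faster bulk writes, but
        // edits still only in the memstore are gone if the region server dies.
        put.setWriteToWAL(false);
        table.put(put);
        table.close();
      }
    }

The obvious trade-off: edits that have not been flushed are lost on a region server crash, so this really only suits loads you can re-run from scratch.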
On Fri, Nov 19, 2010 at 4:29 PM, Henning Blohm <henning.bl...@zfabrik.de> wrote:
> Hi Lars,
>
> we do not have anything like Ganglia up. Unfortunately.
>
> I use regular puts with autoflush turned off, with a buffer of 4 MB
> (could be bigger, right?). We write to the WAL.
>
> I flush every 1000 recs.
>
> I will try again - maybe over the weekend - and see if I can find out
> more.
>
> Thanks,
> Henning
>
> On Friday, 19.11.2010, at 16:16 +0100, Lars George wrote:
>
>> Hi Henning,
>>
>> And yes, what you have seen is often difficult to explain. What I
>> listed are the obvious contenders. But ideally you would do a post
>> mortem on the master and slave logs for Hadoop and HBase, since that
>> would give you better insight into the events. For example, when did
>> the system start to flush, when did it start compacting, when did
>> HDFS start to go slow? And so on. One thing that I would highly
>> recommend (if you haven't done so already) is getting graphing going.
>> Use the built-in Ganglia support and you may at least be able to
>> determine the overall load on the system and various metrics of
>> Hadoop and HBase.
>>
>> Did you use normal Puts or did you set it to cache Puts and write
>> them in bulk? See HTable.setWriteBufferSize() and
>> HTable.setAutoFlush() for details (but please note that you then do
>> need to call HTable.flushCommits() in the close() method of the
>> mapper class). That helps a lot in speeding up writes.
>>
>> Lars
>>
>> On Fri, Nov 19, 2010 at 3:43 PM, Henning Blohm <henning.bl...@zfabrik.de> wrote:
>> > Hi Lars,
>> >
>> > thanks. Yes, this is just the first test setup. Eventually the data
>> > load will be significantly higher.
>> >
>> > At the moment (looking at the master after the run) the number of
>> > regions is well-distributed (684, 685, 685 regions). The overall
>> > HDFS use is ~700 G (replication factor is 3, btw).
>> >
>> > I will want to upgrade as soon as that makes sense. It seems there
>> > is no "release" after 0.20.6 - that's why we are still with that
>> > one.
>> >
>> > When I do that run again, I will check the master UI and see how
>> > things develop there. As for the current run: I do not expect to
>> > get stable numbers early in the run. What looked suspicious was
>> > that things got gradually worse until well into 30 hours after the
>> > start of the run and then even got better. An unexpected load
>> > behavior for me (I would have expected early changes but then some
>> > stable behavior up to the end).
>> >
>> > Thanks,
>> > Henning
>> >
>> > On Friday, 19.11.2010, at 15:21 +0100, Lars George wrote:
>> >
>> >> Hi Henning,
>> >>
>> >> Could you look at the master UI while doing the import? The issue
>> >> with a cold bulk import is that you are hitting one region server
>> >> initially, and while it is filling up its in-memory structures all
>> >> is nice and dandy. Then you start to tax the server as it has to
>> >> flush data out, and it becomes slower at responding to the mappers
>> >> still hammering it. Only after a while do the regions become large
>> >> enough that they get split and load starts to spread across 2
>> >> machines, then 3. Eventually you have enough regions to handle
>> >> your data and you will see an average of the performance you could
>> >> expect from a loaded cluster. For that reason we have added a bulk
>> >> loading feature that helps build the region files externally and
>> >> then swaps them in.
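The buffered-write pattern Lars describes above - setAutoFlush(false), a write buffer, and flushCommits() in close() - would look roughly like this in an old-style (mapred API) mapper. A sketch only, with made-up table, family and column names:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class BufferedPutMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

      private HTable table;

      public void configure(JobConf job) {
        try {
          table = new HTable(new HBaseConfiguration(job), "testtable");
          table.setAutoFlush(false);                 // buffer Puts client-side
          table.setWriteBufferSize(4 * 1024 * 1024); // 4 MB, as in Henning's setup
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }

      public void map(LongWritable key, Text row,
          OutputCollector<NullWritable, NullWritable> out, Reporter rep)
          throws IOException {
        Put put = new Put(Bytes.toBytes(row.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
            Bytes.toBytes("value"));
        table.put(put); // queued in the write buffer, not sent per call
      }

      public void close() throws IOException {
        table.flushCommits(); // push whatever is still buffered
        table.close();
      }
    }

Without the flushCommits() in close(), rows still sitting in the client-side buffer when the task ends may never make it to the server.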
>> >>
>> >> When you check the UI you can actually see this behavior, as the
>> >> operations per second (ops) are bound to one server initially.
>> >> Well, it could be two, as one of them also has to serve META. If
>> >> that is the same machine then you are penalized twice.
>> >>
>> >> In addition you start to run into minor compaction load while
>> >> HBase tries to do housekeeping during your load.
>> >>
>> >> With 0.89 you could pre-split the regions into what you see
>> >> eventually when your job is complete. Please use the UI to check
>> >> and let us know how many regions you end up with in total (out of
>> >> interest mainly). If you had that done before the import then the
>> >> load is split right from the start.
>> >>
>> >> In general 0.89 is much better performance-wise when it comes to
>> >> bulk loads, so you may want to try it out as well. The 0.90 RC is
>> >> up, so a release is imminent and saves you from having to upgrade
>> >> again soon. Also, 0.90 is the first release with Hadoop's append
>> >> fix, so you do not lose any data from wonky server behavior.
>> >>
>> >> And to wrap this up, 3 data nodes is not too great. If you ask
>> >> anyone with a serious production-type setup you will see 10
>> >> machines and more, I'd say 20-30 machines and up. Some would say
>> >> "use MySQL for this little data", but that is not fair given that
>> >> we do not know what your targets are. Bottom line is, you will see
>> >> issues (like slowness) with 3 nodes that 8 or 10 nodes will never
>> >> show.
>> >>
>> >> HTH,
>> >> Lars
>> >>
>> >> On Fri, Nov 19, 2010 at 2:09 PM, Henning Blohm <henning.bl...@zfabrik.de> wrote:
>> >> > We have a Hadoop 0.20.2 + HBase 0.20.6 setup with three data
>> >> > nodes (12 GB, 1.5 TB each) and one master node (24 GB, 1.5 TB).
>> >> > We store a relatively simple table in HBase (1 column family,
>> >> > 5 columns, row key about 100 chars).
>> >> >
>> >> > In order to better understand the load behavior, I wanted to put
>> >> > 5*10^8 rows into that table. I wrote an M/R job that uses a split
>> >> > InputFormat to split the 5*10^8 logical row keys (essentially
>> >> > just counting from 0 to 5*10^8-1) into 1000 chunks of 500000 keys
>> >> > each and then lets the map do the actual job of writing the
>> >> > corresponding rows (with some random column values) into HBase.
>> >> >
>> >> > So there are 1000 map tasks and no reducer. Each task writes
>> >> > 500000 rows into HBase. We have 6 mapper slots, i.e. 24 mappers
>> >> > running in parallel.
>> >> >
>> >> > The whole job runs for approx. 48 hours. Initially the map tasks
>> >> > need around 30 min. each. After a while things take longer and
>> >> > longer, eventually reaching > 2 h. It peaks around the 850th
>> >> > task, after which things speed up again, improving to about
>> >> > 48 min. in the end, until completed.
>> >> >
>> >> > These are all dedicated machines and there is nothing else
>> >> > running. The map tasks have 200 MB of heap, and when checking
>> >> > with vmstat in between I cannot observe any swapping.
>> >> >
>> >> > Also, on the master it seems that heap utilization is not at the
>> >> > limit, and there is no swapping either. All Hadoop and HBase
>> >> > processes have 1 GB of heap.
>> >> >
>> >> > Any idea what would cause the strong variation (or degradation)
>> >> > of write performance? Is there a way of finding out where the
>> >> > time gets lost?
>> >> >
>> >> > Thanks,
>> >> > Henning
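And since pre-splitting came up above: with the 0.89/0.90 client API you can create the table with explicit split keys so the load spreads across all region servers from the first Put. A rough sketch - the zero-padded numeric split points are an assumption about the row key format, not something stated in the thread:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create(); // 0.89/0.90 style
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("testtable");
        desc.addFamily(new HColumnDescriptor("cf"));

        // Pre-create, say, 100 regions so splits do not have to happen
        // one server at a time during the import.
        int numRegions = 100;
        long totalRows = 500000000L; // 5*10^8 rows, as in the test
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
          // Evenly spaced, zero-padded row counters as split points
          // (assumes keys sort like zero-padded decimal numbers).
          splits[i - 1] = Bytes.toBytes(
              String.format("%09d", i * (totalRows / numRegions)));
        }
        admin.createTable(desc, splits);
      }
    }

A sensible number of regions to pre-create is whatever a full load ends up with - around 2000 in Henning's case (684 + 685 + 685), though even a coarser pre-split avoids the worst of the single-server hot-spotting.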