mortbay, huge files and the ulimit

Björn-Elmar Macek Wed, 29 Aug 2012 06:53:56 -0700

Hi there,

I am currently running a job that self-joins a 63 GB CSV file on 20 physically distinct nodes with 15 GB of RAM each.

While the mapping works just fine and is cheap, the reducer does the main work: it holds a hashmap of elements to join with and finds the join tuples for every incoming key-value pair.
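
The reducer logic described above amounts to a streaming in-memory hash join. Below is a minimal, Hadoop-free sketch of that idea, not the actual job code. It assumes each record arrives as a (join key, value) pair, that every new value is paired with all values already buffered under the same key, and it collects the tuples in a list where a real reducer would write them to its output. `InMemorySelfJoin`, `accept` and `bufferedCount` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a streaming self-join: each incoming (key, value)
 * pair is joined against the values already buffered under the same key,
 * and is then buffered itself.
 */
public class InMemorySelfJoin {
    // Values seen so far, grouped by join key. This is the structure that grows with the input.
    private final Map<String, List<String>> buffered = new HashMap<>();

    /** Joins one incoming pair against the buffer and returns the tuples it produces. */
    public List<String[]> accept(String key, String value) {
        List<String[]> out = new ArrayList<>();
        List<String> matches = buffered.get(key);
        if (matches != null) {
            for (String other : matches) {
                out.add(new String[] {key, other, value});
            }
        }
        buffered.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        return out;
    }

    /** Total number of values currently held in memory. */
    public int bufferedCount() {
        int n = 0;
        for (List<String> values : buffered.values()) {
            n += values.size();
        }
        return n;
    }

    public static void main(String[] args) {
        InMemorySelfJoin join = new InMemorySelfJoin();
        join.accept("a", "1");
        join.accept("b", "2");
        for (String[] tuple : join.accept("a", "3")) {
            System.out.println(String.join(",", tuple));
        }
        System.out.println("buffered=" + join.bufferedCount());
    }
}
```

Running `main` prints the single tuple `a,1,3` followed by `buffered=3`. The point of the sketch is that the map only ever grows unless something explicitly removes entries from it.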

The job works perfectly on small files of about 2 GB, but starts to get "unstable" as the file size grows. This becomes evident from the TaskTracker's logs, which say:


ERROR org.mortbay.log: /mapOutput
java.lang.IllegalStateException: Committed
    at org.mortbay.jetty.Response.resetBuffer(Response.java:1023)
    at org.mortbay.jetty.Response.sendError(Response.java:240)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3945)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

While this is no problem at the beginning of the reduce phase, where it happens only rarely and on a few nodes, it becomes critical as the progress rises. The reason for this (as far as I know from reading articles) is memory or file-handle problems. I addressed the memory problem by continuously purging outdated elements from the map every 5 million processed key-value pairs. I also set mapred.child.ulimit to 100000000 (ulimit in the shell tells me it is 400000000).
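
The periodic purge can be sketched as follows. The message does not say what makes an element "outdated", so this sketch assumes, purely for illustration, that each buffered element carries a timestamp and becomes outdated once it lags the newest timestamp seen by more than a fixed window. `PurgingBuffer`, `purgeInterval` and `window` are hypothetical names; in the job described here the interval would be 5,000,000.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: buffer timestamped elements per key and, every
 * purgeInterval processed pairs, drop elements older than (newest - window).
 */
public class PurgingBuffer {
    private final Map<String, List<Long>> buffered = new HashMap<>();
    private final long purgeInterval; // e.g. 5_000_000 processed pairs
    private final long window;        // assumed staleness threshold, not from the original job
    private long processed = 0;
    private long newest = Long.MIN_VALUE;

    public PurgingBuffer(long purgeInterval, long window) {
        if (purgeInterval <= 0) {
            throw new IllegalArgumentException("purgeInterval must be positive");
        }
        this.purgeInterval = purgeInterval;
        this.window = window;
    }

    /** Buffers one element and purges once every purgeInterval calls. */
    public void add(String key, long timestamp) {
        buffered.computeIfAbsent(key, k -> new ArrayList<>()).add(timestamp);
        newest = Math.max(newest, timestamp);
        processed++;
        if (processed % purgeInterval == 0) {
            purge();
        }
    }

    /** Removes outdated elements and drops keys whose lists become empty. */
    private void purge() {
        long cutoff = newest - window;
        Iterator<Map.Entry<String, List<Long>>> it = buffered.entrySet().iterator();
        while (it.hasNext()) {
            List<Long> timestamps = it.next().getValue();
            timestamps.removeIf(t -> t < cutoff);
            if (timestamps.isEmpty()) {
                it.remove();
            }
        }
    }

    /** Total number of elements currently held in memory. */
    public int size() {
        int n = 0;
        for (List<Long> timestamps : buffered.values()) {
            n += timestamps.size();
        }
        return n;
    }

    public static void main(String[] args) {
        PurgingBuffer buf = new PurgingBuffer(3, 10);
        buf.add("a", 1);
        buf.add("b", 2);
        buf.add("c", 20); // third call triggers a purge with cutoff 20 - 10 = 10
        System.out.println(buf.size());
    }
}
```

With an interval of 3 and a window of 10, the third `add` triggers a purge that keeps only the element with timestamp 20, so `main` prints `1`. Memory is still bounded only by whatever accumulates between two purges.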

Anyway, I am still running into those mortbay errors, and I am starting to wonder whether Hadoop can manage the job with this algorithm at all. By pure naive math it should: I explicitly assigned 10 GB of memory to each JVM on each node and set mapred.child.java.opts to "-Xmx10240m -XX:+UseCompressedOops -XX:-UseGCOverheadLimit" (it's a 64-bit environment, and large data structures cause the GC to throw exceptions). Naively, this makes 18 slave machines with 10 GB each, resulting in an overall memory of 180 GB, about three times as much as needed... I would think. So if the Partitioner distributes the data roughly equally across all nodes, I should not run into any errors, should I?
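
The naive memory math above, written out. The figures are taken from this message. The sketch assumes one reducer JVM per slave and an in-memory footprint no larger than the raw CSV, which is optimistic because Java strings and `HashMap` entries usually occupy more memory than the bytes they were parsed from. It also checks `mapred.child.ulimit` against the heap: in Hadoop 1.x that property is documented as a limit on virtual memory in KB (not on file handles), and it must be at least the `-Xmx` value. `MemoryBudget` and its methods are hypothetical helpers.

```java
import java.util.Locale;

/**
 * Hypothetical back-of-the-envelope memory budget for the job described
 * in the message; all inputs are the figures quoted in the text.
 */
public class MemoryBudget {

    /** Aggregate reducer heap in GB, assuming one reducer JVM per slave. */
    public static long aggregateHeapGb(int slaves, long heapGbPerSlave) {
        return slaves * heapGbPerSlave;
    }

    /** Converts an -Xmx value given in MB into KB, the unit of mapred.child.ulimit. */
    public static long xmxKb(long xmxMb) {
        return xmxMb * 1024;
    }

    public static void main(String[] args) {
        double inputGb = 63.0;                   // size of the raw CSV
        long totalGb = aggregateHeapGb(18, 10);  // 18 slaves x 10 GB heap = 180 GB
        double ratio = totalGb / inputGb;        // about 2.86, slightly under 3x
        double perNodeGb = inputGb / 18;         // 3.5 GB of input per slave if evenly partitioned

        System.out.println(String.format(Locale.ROOT,
                "total=%d GB, ratio=%.2f, perNode=%.1f GB", totalGb, ratio, perNodeGb));

        long ulimitKb = 100_000_000L;            // mapred.child.ulimit as set in the message
        long heapKb = xmxKb(10240);              // -Xmx10240m = 10,485,760 KB
        System.out.println("ulimit >= heap: " + (ulimitKb >= heapKb));
    }
}
```

With these inputs it reports 180 GB in total, a ratio of about 2.86, 3.5 GB of input per node, and `true` for the ulimit check, since 100000000 KB is roughly 95 GB.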


Can anybody help me with this issue?

Best regards,
Elmar
