Hi,

Can you paste the output of the "df -h" command here?
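
If it helps while gathering that, here is a minimal Python sketch (not anything Hadoop ships) that reports how full the filesystem behind a directory is and compares it to a threshold. The 90.0 default is taken from the threshold in the quoted log. The function names are made up for illustration, and the percentage may not match the NodeManager's own calculation exactly.

```python
import shutil
from typing import Tuple

# Assumption: 90.0 mirrors the threshold reported in the NodeManager log below;
# change it if the disk health checker is configured differently.
THRESHOLD_PERCENT = 90.0


def percent_used(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def check_dir(path: str, threshold: float = THRESHOLD_PERCENT) -> Tuple[float, bool]:
    """Return (percent used, over-threshold flag) for the filesystem holding path."""
    # disk_usage covers the whole filesystem, so data written by anything
    # else on the same disk counts toward the percentage as well.
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct, pct > threshold
```

For example, calling check_dir("/l/ssd/achutest/localstore/yarn-nm") on each node while the job is running would show how close that disk is to the limit.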

Regards
Sidharth

On Wednesday, April 12, 2017, Albert Chu <ch...@llnl.gov> wrote:

> Hi,
>
> I have a cluster where we have a parallel networked file system for our
> major data storage and our nodes have ~750G of local SSD space.  To
> speed up things, we configure yarn.nodemanager.local-dirs to use the
> local SSD for local caching.
>
> Recently, I've been trying to run a terasort of 2 terabytes of data over
> 8 nodes with Hadoop 2.7.3.  That gives about 6000 gigs of local SSD space
> for caching (8 x 750G), or 5400 gigs once Hadoop applies its 90%
> disk-full threshold.
>
> I always get disk-full errors such as the following when running:
>
> 2017-04-11 12:31:44,062 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection:
> Directory /l/ssd/achutest/localstore/yarn-nm error, used space above
> threshold of 90.0%, removing from list of valid directories
> 2017-04-11 12:31:44,063 INFO 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService:
> Disk(s) failed: 1/1 local-dirs are bad:
> /l/ssd/achutest/localstore/yarn-nm;
> 2017-04-11 12:31:44,063 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService:
> Most of the disks failed. 1/1 local-dirs are bad:
> /l/ssd/achutest/localstore/yarn-nm;
>
> What I don't understand is how I am getting disk-full errors.  Within
> terasort, I should have at most 2000 gigs of mapped intermediate data
> and at most 2000 gigs of merged data in reducers.  Even assuming some
> overhead from Hadoop, I should have more than enough space for this
> benchmark to complete, given that maps and reducers are spread evenly
> across nodes.
>
> So my assumption is something else is being cached in local-dirs that
> I'm not accounting for.  Is there any other data I should consider when
> coming up with my estimates?
>
> One guess I had: is it possible that spilled data from reducer merges is
> not deleted until a reducer completes?  If so, the total amount of
> merged data in reducers could exceed 2000 gigs at some point.
>
> Al
>
> --
> Albert Chu
> ch...@llnl.gov
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>

-- 
Regards
Sidharth Kumar | Mob: +91 8197 555 599 | LinkedIn
<https://www.linkedin.com/in/sidharthkumar2792/>
