Hi Lewis and Tejas, it's me again.
I have set the hadoop.tmp.dir property, but Nutch is still consuming too
much space for me.
Is it possible to reduce the space Nutch uses in my tmp folder with some
property of the fetcher process? I always get an exception because the hard
disk is full. My crawldb is only 150 MB, no more, but my tmp folder keeps
growing without control until it reaches 60 GB, and the crawl fails at that
point.
Any help would be appreciated.
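I have been reading that compressing the intermediate map output may shrink the data Hadoop spills under hadoop.tmp.dir, at the cost of some CPU. If I understand correctly, this property (the Hadoop 0.20/1.x name, which is what Nutch 1.5.1 bundles; to be verified for other versions) would do it:

```xml
<!-- Compress intermediate map output to reduce spill size under
     hadoop.tmp.dir. Property name is from Hadoop 0.20/1.x; newer
     Hadoop releases call it mapreduce.map.output.compress. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
```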




----- Original Message -----
From: "Eyeris Rodriguez Rueda" <eru...@uci.cu>
To: user@nutch.apache.org
Sent: Friday, February 8, 2013 10:45:52
Subject: Re: Could not find any valid local directory for output/file.out

Thanks a lot, Lewis and Tejas, you have been very helpful.
It works fine now; I have pointed it to another partition and all is OK.
Problem solved.





----- Original Message -----
From: "Tejas Patil" <tejas.patil...@gmail.com>
To: user@nutch.apache.org
Sent: Thursday, February 7, 2013 16:32:33
Subject: Re: Could not find any valid local directory for output/file.out

On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:

> Thanks to all for your replies.
> If I want to change the default location for Hadoop jobs (/tmp), where can
> I do that? My nutch-site.xml does not include anything pointing to /tmp.
>
Add this property to nutch-site.xml with an appropriate value:

<property>
  <name>hadoop.tmp.dir</name>
  <value>XXXXXXXXXX</value>
</property>
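For example, to point it at another partition (the path below is only illustrative; use any directory on a partition with enough free space):

```xml
<!-- Example only: /data/hadoop-tmp stands in for a directory on a
     partition with plenty of free space. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop-tmp</value>
</property>
```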



> So I have read about Nutch and Hadoop, but I'm not sure I understand it
> fully. Is it possible to use Nutch 1.5.1 in distributed mode?

Yes.


> In that case, what do I need to do? I would really appreciate your answer
> because I can't find good documentation on this topic.
>
For distributed mode, Nutch is run from runtime/deploy. The conf files
should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
So modify runtime/local/conf/nutch-site.xml to set http.agent.name
properly. I am assuming that the Hadoop setup is in place and the Hadoop
environment variables are exported. Now run the Nutch commands from
runtime/deploy.
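As a sketch, assuming the Hadoop daemons are running and the hadoop binaries are on your PATH, a deploy-mode run looks something like this (seed directory, depth, topN, and the Solr URL are all example values):

```shell
# Run from the deploy runtime so the Nutch job jar is submitted to Hadoop.
cd $NUTCH_HOME/runtime/deploy

# Put the seed URL list on HDFS first (paths are illustrative).
hadoop fs -put urls urls

# One-shot crawl as in Nutch 1.5.x: 3 rounds, top 1000 URLs per round,
# indexing results into Solr at the end.
bin/nutch crawl urls -dir crawl -depth 3 -topN 1000 \
  -solr http://localhost:8983/solr
```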

Thanks,
Tejas Patil

>
>
>
> ----- Original Message -----
> From: "Tejas Patil" <tejas.patil...@gmail.com>
> To: user@nutch.apache.org
> Sent: Thursday, February 7, 2013 14:04:26
> Subject: Re: Could not find any valid local directory for output/file.out
>
> Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
> Hadoop to store the temporary data a job requires. If you don't override
> hadoop.tmp.dir in any config file, it will use /tmp by default. In your
> case, /tmp doesn't have enough space left, so it is better to override
> that property and point it to another location with ample space.
>
> Thanks,
> Tejas Patil
>
>
> On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:
>
> > Thanks, Lewis, for your answer.
> > My doubt is why /tmp keeps growing while the crawl process is running,
> > and why Nutch uses that folder. I'm using Nutch 1.5.1 in single (local)
> > mode and my nutch-site.xml does not set the hadoop.tmp.dir property. I
> > need to reduce the space used by that folder because I only have 40 GB
> > for the Nutch machine and 50 GB for the Solr machine. Any advice or
> > explanation would be appreciated.
> > Thanks for your time.
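To see where the space is going while the crawl runs, something like this may help (/tmp is the Hadoop default; substitute your hadoop.tmp.dir value if you changed it):

```shell
# Free space on the filesystem holding the Hadoop temp area.
df -h /tmp

# Size of Hadoop's per-user temp directories, if any exist yet.
du -sh /tmp/hadoop-* 2>/dev/null || true
```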
> >
> >
> >
> > ----- Original Message -----
> > From: "Lewis John Mcgibbney" <lewis.mcgibb...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Thursday, February 7, 2013 13:06:11
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > Hi,
> >
> >
> >
> https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
> >
> > On Thursday, February 7, 2013, Eyeris Rodriguez Rueda <eru...@uci.cu>
> > wrote:
> > > Hi all.
> > > I have a problem when I run a crawl for a few hours or days. I'm using
> > > Nutch 1.5.1 and Solr 3.6, but the crawl process fails and I don't know
> > > how to fix this problem. I am interested in running a crawl of 10
> > > cycles or more without limits, but I have a problem with hard disk
> > > space: I have detected that /etc/tmp has 29 GB used, which is not good
> > > for me. Can anybody help me or give some advice on configuring Nutch
> > > so that at least one crawl process completes without problems?
> > >
> > > Here are some details of my environment:
> > > Ram 2 GB
> > > CPU:QuadCore(but im using only 2 cores)
> > > Hard Disk:40 GB
> > > Threads:50
> > > db.fetch.interval.default=2 days
> > >
> > >
> > >
> > > This is part of my log file when Nutch fails:
> > >
> > > ****************************************************************
> > > 2013-02-06 18:45:25,961 INFO  fetcher.Fetcher - fetching
> > http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
> > > 2013-02-06 18:45:25,964 INFO  fetcher.Fetcher - fetching
> > http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
> > > 2013-02-06 18:45:25,977 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=49
> > > 2013-02-06 18:45:26,109 INFO  fetcher.Fetcher - -activeThreads=49,
> > spinWaiting=39, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:26,180 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=48
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=47
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=46
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=44
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=45
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=40
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=39
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=38
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=37
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=36
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=35
> > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=34
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=33
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=32
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=31
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=30
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=29
> > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=28
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=27
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=26
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=25
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=24
> > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=23
> > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=22
> > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=21
> > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=20
> > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=19
> > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=18
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=41
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=17
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=15
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=13
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=12
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=42
> > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=43
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=9
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=10
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=11
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=14
> > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=16
> > > 2013-02-06 18:45:26,404 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=8
> > > 2013-02-06 18:45:26,630 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=7
> > > 2013-02-06 18:45:27,069 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=6
> > > 2013-02-06 18:45:27,110 INFO  fetcher.Fetcher - -activeThreads=6,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:27,129 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=5
> > > 2013-02-06 18:45:28,110 INFO  fetcher.Fetcher - -activeThreads=5,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:28,502 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=4
> > > 2013-02-06 18:45:29,111 INFO  fetcher.Fetcher - -activeThreads=4,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:30,123 INFO  fetcher.Fetcher - -activeThreads=4,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:31,127 INFO  fetcher.Fetcher - -activeThreads=4,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:31,187 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=3
> > > 2013-02-06 18:45:32,171 INFO  fetcher.Fetcher - -activeThreads=3,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:32,206 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=2
> > > 2013-02-06 18:45:33,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:34,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:34,205 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=1
> > > 2013-02-06 18:45:34,457 INFO  fetcher.Fetcher - -finishing thread
> > FetcherThread, activeThreads=0
> > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0,
> > spinWaiting=0, fetchQueues.totalSize=0
> > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0
> > > 2013-02-06 18:45:35,742 WARN  mapred.LocalJobRunner - job_local_0015
> > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
> > >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> > >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> > >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> > >         at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
> > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
> > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
> > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
> > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> >
> > --
> > *Lewis*
> >
>
