Hadoop stores temporary files there, such as shuffled map output data, so you 
need it! But you can rm -rf it after a complete crawl cycle. Do not clear it 
while a job is running, or the job will miss its temp files.
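Between cycles, a small guard script can make that cleanup safer. This is a minimal sketch, not an official Nutch tool: the directory path is hypothetical, and the LocalJobRunner process check only makes sense for Hadoop's local (single-machine) mode.

```shell
#!/bin/sh
# clean_hadoop_tmp DIR -- remove Hadoop temp data between crawl cycles.
# Refuses to run if a local Hadoop job (LocalJobRunner) appears active.
clean_hadoop_tmp() {
  dir="$1"
  # The [L] bracket trick keeps pgrep from matching this script itself.
  if pgrep -f '[L]ocalJobRunner' > /dev/null 2>&1; then
    echo "a job seems to be running; not cleaning $dir" >&2
    return 1
  fi
  # ${dir:?} aborts if the argument is empty, guarding against "rm -rf /*".
  rm -rf -- "${dir:?}"/* "${dir:?}"/.[!.]* 2>/dev/null
  return 0
}

# Example, assuming hadoop.tmp.dir was pointed at /srv/hadoop-tmp:
# clean_hadoop_tmp /srv/hadoop-tmp
```

Running it from cron is fine as long as the schedule cannot overlap a crawl cycle.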
 
-----Original message-----
> From:Eyeris Rodriguez Rueda <eru...@uci.cu>
> Sent: Fri 08-Feb-2013 20:53
> To: user@nutch.apache.org
> Subject: Re: Could not find any valid local directory for output/file.out
> 
> I'm using Ubuntu Server 12.04 only for Nutch; I have assigned 40 GB to it. 
> Is /tmp needed for the Nutch crawl process, or can I set up a crontab to 
> delete the /tmp contents without causing problems for the crawl?
> 
> 
> 
> 
> ----- Original message -----
> From: "Tejas Patil" <tejas.patil...@gmail.com>
> To: user@nutch.apache.org
> Sent: Friday, February 8, 2013 14:33:25
> Subject: Re: Could not find any valid local directory for output/file.out
> 
> I don't think there is any such property. Maybe it's time for you to clean up
> /tmp :)
> 
> Thanks,
> Tejas Patil
> 
> 
> On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <eru...@uci.cu>wrote:
> 
> > Hi Lewis and Tejas again.
> > I have set the hadoop.tmp.dir property but Nutch is still consuming too
> > much space. Is it possible to reduce the space Nutch uses in my tmp folder
> > with some property of the fetcher process? I always get an exception
> > because the hard disk is full. My crawldb is only 150 MB, no more, but my
> > tmp folder keeps growing without control until it reaches 60 GB, and fails
> > at that point. Please, any help.
> >
> >
> >
> >
> > ----- Original message -----
> > From: "Eyeris Rodriguez Rueda" <eru...@uci.cu>
> > To: user@nutch.apache.org
> > Sent: Friday, February 8, 2013 10:45:52
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > Thanks a lot, Lewis and Tejas; you have been very helpful.
> > It works fine now: I have pointed it to another partition and it is OK.
> > Problem solved.
> >
> >
> >
> >
> >
> > ----- Original message -----
> > From: "Tejas Patil" <tejas.patil...@gmail.com>
> > To: user@nutch.apache.org
> > Sent: Thursday, February 7, 2013 16:32:33
> > Subject: Re: Could not find any valid local directory for output/file.out
> >
> > On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:
> >
> > > Thanks to all for your replies.
> > > If I want to change the default location for the Hadoop job (/tmp),
> > > where can I do that? My nutch-site.xml contains nothing pointing to /tmp.
> > >
> > Add this property to nutch-site.xml with an appropriate value:
> >
> > <property>
> >   <name>hadoop.tmp.dir</name>
> >   <value>XXXXXXXXXX</value>
> > </property>
> >
> >
> >
> > > So I have read about Nutch and Hadoop but I'm not sure I understand it
> > > all. Is it possible to use Nutch 1.5.1 in distributed mode?
> >
> > yes
> >
> >
> > > In that case, what do I need to do? I really appreciate your answer,
> > > because I can't find good documentation on this topic.
> > >
> > For distributed mode, Nutch is called from runtime/deploy. The conf files
> > should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
> > So modify runtime/local/conf/nutch-site.xml to set http.agent.name
> > properly. I am assuming that the Hadoop setup is in place and the Hadoop
> > variables are exported. Now, run the nutch commands from runtime/deploy.
> >
> > Thanks,
> > Tejas Patil
> >
> > >
> > >
> > >
> > > ----- Original message -----
> > > From: "Tejas Patil" <tejas.patil...@gmail.com>
> > > To: user@nutch.apache.org
> > > Sent: Thursday, February 7, 2013 14:04:26
> > > Subject: Re: Could not find any valid local directory for output/file.out
> > >
> > > Nutch jobs are executed by Hadoop. "/tmp" is the default location used
> > > by Hadoop to store temporary data required for a job. If you don't
> > > override hadoop.tmp.dir in any config file, it will use /tmp. In your
> > > case, /tmp doesn't have ample space left, so it is better to override
> > > that property and point it to some other location with ample space.
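[A quick way to confirm where the temp data is going, before and after the
override. This is only a sketch: the hadoop-* name is Hadoop's usual per-user
temp directory pattern under /tmp, assumed here.]

```shell
#!/bin/sh
# Free space on the partition holding /tmp:
df -h /tmp
# Size of Hadoop's per-user temp directories under /tmp, if any exist yet:
du -sh /tmp/hadoop-* 2>/dev/null || echo "no hadoop temp dirs under /tmp"
```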
> > >
> > > Thanks,
> > > Tejas Patil
> > >
> > >
> > > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:
> > >
> > > > Thanks, Lewis, for your answer.
> > > > My doubt is why /tmp keeps growing while the crawl process runs, and
> > > > why Nutch uses that folder. I'm using Nutch 1.5.1 in single (local)
> > > > mode and my nutch-site.xml does not have the hadoop.tmp.dir property.
> > > > I need to reduce the space used by that folder because I only have
> > > > 40 GB for the Nutch machine and 50 GB for the Solr machine. Any advice
> > > > or explanation would be appreciated.
> > > > Thanks for your time.
> > > >
> > > >
> > > >
> > > > ----- Original message -----
> > > > From: "Lewis John Mcgibbney" <lewis.mcgibb...@gmail.com>
> > > > To: user@nutch.apache.org
> > > > Sent: Thursday, February 7, 2013 13:06:11
> > > > Subject: Re: Could not find any valid local directory for output/file.out
> > > >
> > > > Hi,
> > > >
> > > >
> > > >
> > > > https://wiki.apache.org/nutch/NutchGotchas#DiskErrorException_while_fetching
> > > >
> > > > On Thursday, February 7, 2013, Eyeris Rodriguez Rueda <eru...@uci.cu>
> > > > wrote:
> > > > > Hi all.
> > > > > I have a problem when I crawl for a few hours or days. I'm using
> > > > > Nutch 1.5.1 and Solr 3.6, but the crawl process fails and I don't
> > > > > know how to fix this problem. I am interested in running an unlimited
> > > > > crawl of 10 cycles or more, but I have a problem with hard disk
> > > > > space: I have detected that /tmp has 29 GB used, which is not good
> > > > > for me. Can anybody help me or give some advice on configuring Nutch
> > > > > to complete at least one crawl process without problems?
> > > > >
> > > > > Here are some features of my environment:
> > > > > Ram 2 GB
> > > > > CPU:QuadCore(but im using only 2 cores)
> > > > > Hard Disk:40 GB
> > > > > Threads:50
> > > > > db.fetch.interval.default=2 days
> > > > >
> > > > >
> > > > >
> > > > > This is part of my log file when Nutch fails:
> > > > >
> > > > > ****************************************************************
> > > > > 2013-02-06 18:45:25,961 INFO  fetcher.Fetcher - fetching
> > > > http://bibliodoc.uci.cu/TD/TD_03349_10.pdf
> > > > > 2013-02-06 18:45:25,964 INFO  fetcher.Fetcher - fetching
> > > > http://bibliodoc.uci.cu/TD/TD_0442_07.pdf
> > > > > 2013-02-06 18:45:25,977 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=49
> > > > > 2013-02-06 18:45:26,109 INFO  fetcher.Fetcher - -activeThreads=49,
> > > > spinWaiting=39, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:26,180 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=48
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=47
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=46
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=44
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=45
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=40
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=39
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=38
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=37
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=36
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=35
> > > > > 2013-02-06 18:45:26,332 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=34
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=33
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=32
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=31
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=30
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=29
> > > > > 2013-02-06 18:45:26,333 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=28
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=27
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=26
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=25
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=24
> > > > > 2013-02-06 18:45:26,334 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=23
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=22
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=21
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=20
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=19
> > > > > 2013-02-06 18:45:26,335 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=18
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=41
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=17
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=15
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=13
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=12
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=42
> > > > > 2013-02-06 18:45:26,331 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=43
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=9
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=10
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=11
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=14
> > > > > 2013-02-06 18:45:26,336 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=16
> > > > > 2013-02-06 18:45:26,404 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=8
> > > > > 2013-02-06 18:45:26,630 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=7
> > > > > 2013-02-06 18:45:27,069 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=6
> > > > > 2013-02-06 18:45:27,110 INFO  fetcher.Fetcher - -activeThreads=6,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:27,129 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=5
> > > > > 2013-02-06 18:45:28,110 INFO  fetcher.Fetcher - -activeThreads=5,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:28,502 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=4
> > > > > 2013-02-06 18:45:29,111 INFO  fetcher.Fetcher - -activeThreads=4,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:30,123 INFO  fetcher.Fetcher - -activeThreads=4,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:31,127 INFO  fetcher.Fetcher - -activeThreads=4,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:31,187 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=3
> > > > > 2013-02-06 18:45:32,171 INFO  fetcher.Fetcher - -activeThreads=3,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:32,206 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=2
> > > > > 2013-02-06 18:45:33,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:34,173 INFO  fetcher.Fetcher - -activeThreads=2,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:34,205 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=1
> > > > > 2013-02-06 18:45:34,457 INFO  fetcher.Fetcher - -finishing thread
> > > > FetcherThread, activeThreads=0
> > > > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0,
> > > > spinWaiting=0, fetchQueues.totalSize=0
> > > > > 2013-02-06 18:45:35,174 INFO  fetcher.Fetcher - -activeThreads=0
> > > > > 2013-02-06 18:45:35,742 WARN  mapred.LocalJobRunner - job_local_0015
> > > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
> > > > >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> > > > >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> > > > >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> > > > >         at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:69)
> > > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1640)
> > > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1323)
> > > > >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
> > > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > > > >         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > > > >
> > > >
> > > > --
> > > > *Lewis*
> > > >
> > >
> >
> 
