The /tmp directory is not cleaned up automatically, IIRC. You're safe to empty it as long as
you don't have a job running ;)
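
Before emptying anything, it can help to see how much space the temporary directory actually holds. The following is only a minimal sketch in Python: the function names `dir_size_bytes` and `report` are made up for illustration, and the default path `/tmp` is an assumption that should be replaced by whatever hadoop.tmp.dir points to on your machine. It only reads sizes and deletes nothing.

```python
import os
import shutil


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                pass
    return total


def report(root="/tmp"):
    """Describe how much data sits under root and how much disk space is free."""
    usage = shutil.disk_usage(root)  # totals for the filesystem holding root
    gib = 1024 ** 3
    return "{}: {:.2f} GiB in files, {:.2f} GiB free on this filesystem".format(
        root, dir_size_bytes(root) / gib, usage.free / gib
    )


if __name__ == "__main__":
    # Hypothetical default; pass your own hadoop.tmp.dir value instead.
    print(report())
```

Running it before and after a crawl step shows which step makes the directory grow.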
-----Original message-----
> From:Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
> Sent: Fri 08-Feb-2013 20:48
> To: user@nutch.apache.org
> Subject: Re: Could not find any valid local directory for output/file.out
>
> +1
> This is a ridiculous size for /tmp given a crawldb of minimal size.
> There is clearly something wrong.
>
> On Friday, February 8, 2013, Tejas Patil <tejas.patil...@gmail.com> wrote:
> > I don't think there is any such property. Maybe it's time for you to clean up
> > /tmp :)
> >
> > Thanks,
> > Tejas Patil
> >
> >
> > On Fri, Feb 8, 2013 at 11:16 AM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:
> >
> >> Hi Lewis and Tejas, again.
> >> I have set the hadoop.tmp.dir property, but Nutch is still consuming too
> >> much space for me.
> >> Is it possible to reduce the space Nutch uses in my tmp folder with some
> >> property of the fetcher process? I always get an exception because the hard
> >> disk is full. My crawldb is only 150 MB, no more, but my tmp folder
> >> keeps growing without control up to 60 GB, and the crawl fails at that point.
> >> Please, any help is appreciated.
> >>
> >>
> >>
> >>
> >> ----- Original message -----
> >> From: "Eyeris Rodriguez Rueda" <eru...@uci.cu>
> >> To: user@nutch.apache.org
> >> Sent: Friday, February 8, 2013 10:45:52
> >> Subject: Re: Could not find any valid local directory for output/file.out
> >>
> >> Thanks a lot, Lewis and Tejas, you have been very helpful.
> >> It works now: I pointed it to another partition and it is OK.
> >> Problem solved.
> >>
> >>
> >>
> >>
> >>
> >> ----- Original message -----
> >> From: "Tejas Patil" <tejas.patil...@gmail.com>
> >> To: user@nutch.apache.org
> >> Sent: Thursday, February 7, 2013 16:32:33
> >> Subject: Re: Could not find any valid local directory for output/file.out
> >>
> >> On Thu, Feb 7, 2013 at 12:47 PM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:
> >>
> >> > Thanks to all for your replies.
> >> > If I want to change the default location for Hadoop jobs (/tmp), where can I
> >> > do that? My nutch-site.xml does not include anything pointing to /tmp.
> >> >
> >> >
> >> Add this property to nutch-site.xml with an appropriate value:
> >>
> >> <property>
> >> <name>hadoop.tmp.dir</name>
> >> <value>XXXXXXXXXX</value>
> >> </property>
> >>
> >>
> >>
> >> > So I have read about Nutch and Hadoop, but I'm not sure I understand them
> >> > at all. Is it possible to use Nutch 1.5.1 in distributed mode?
> >>
> >> yes
> >>
> >>
> >> > In that case, what do I need to do? I would really appreciate your answer
> >> > because I can't find good documentation on this topic.
> >> >
> >> For distributed mode, Nutch is called from runtime/deploy. The conf files
> >> should be modified in runtime/local/conf, not in $NUTCH_HOME/conf.
> >> So modify runtime/local/conf/nutch-site.xml to set http.agent.name
> >> properly. I am assuming that the Hadoop setup is in place and the Hadoop
> >> variables are exported. Now, run the nutch commands from runtime/deploy.
> >>
> >> Thanks,
> >> Tejas Patil
> >>
> >> >
> >> >
> >> >
> >> > ----- Original message -----
> >> > From: "Tejas Patil" <tejas.patil...@gmail.com>
> >> > To: user@nutch.apache.org
> >> > Sent: Thursday, February 7, 2013 14:04:26
> >> > Subject: Re: Could not find any valid local directory for output/file.out
> >> >
> >> > Nutch jobs are executed by Hadoop. "/tmp" is the default location used by
> >> > Hadoop to store temporary data required for a job. If you don't override
> >> > hadoop.tmp.dir in any config file, it will use /tmp by default. In your
> >> > case, /tmp doesn't have ample space left, so it is better to override that
> >> > property and point it to some other location which has ample space.
> >> >
> >> > Thanks,
> >> > Tejas Patil
> >> >
> >> >
> >> > On Thu, Feb 7, 2013 at 10:38 AM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:
> >> >
> >> > > Thanks, Lewis, for your answer.
> >> > > My question is why /tmp keeps growing while the crawl process is running,
> >> > > and why Nutch uses that folder. I'm using Nutch 1.5.1 in single mode and my
> >> > > nutch-site.xml does not have the hadoop.tmp.dir property. I need to reduce
> >> > > the space used by that folder because I only have 40 GB for the Nutch
> >> > > machine and 50 GB for the Solr machine. Please some advice or expla
>
> --
> *Lewis*
>