Hi Folks,
Just wanted to make folk aware of some work Continuum Analytics have been
doing on bringing Nutch to the Python community.
https://github.com/ContinuumIO/nutchpy
Continuum are the folks behind most of the scientific Python stuff you've
ever used. If you've used Python before, then
Yep, it's awesome work funded by the DARPA Memex project and our team. Cc'ing
Andy Terrel for awareness. Thanks, Lewis!
Sent from my iPhone
On Jan 9, 2015, at 6:04 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com
wrote:
Hi Folks,
Just wanted to make folk aware of some work Continuum
Thanks, I got this error while installing
[INFO] Scanning for projects...
[INFO]
[INFO]
[INFO] Building seqreader-app 1.0-SNAPSHOT
[INFO]
[INFO]
Sent from my iPhone
Begin forwarded message:
From: Shadi Saleh propat...@gmail.com
Date: January 9, 2015 at 6:28:11 PM PST
To: user user@nutch.apache.org
Subject: Re: nutchpy
Reply-To: user@nutch.apache.org
Dear all,
I added the following to the file
nutchpy/seqreader-app/pom.xml:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<compilerVersion>1.5</compilerVersion>
<source>1.5</source>
Hi user@ dev@,
This thread is a VOTE for releasing Apache Nutch 2.3.
Quite incredibly we addressed 143 issues as per the release report
http://s.apache.org/nutch_2.3
The release candidate comprises the following components.
* A staging repository [0] containing various Maven artifacts
* A
Do you have enough memory? 50 threads, PDFs, and an older Tika version
will get you in trouble. That PDFBox version eats memory! Try upgrading to the
latest PDFBox; you can drop the jars in manually and reference them in Tika's
plugin.xml.
M
-Original message-
From: Paul Rogers
Thanks Markus
I will try that and see if it fixes things.
The server has 24GB of memory but only about 1GB free without the Nutch
process running!!
Are the PDFBox files in Tika 1.6 (PDFBox 1.8.6) likely to have fixed this
or should I go for 1.8.8 on the PDFBox site?
Thanks again
P
On 9
Hi Guys
I am using Nutch 1.8 to fetch PDF documents from an HTTP server. The jobs
had been running OK until recently, when I started getting the following
error:
-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
fetching
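A status line like the one quoted above can be checked mechanically when scanning a fetch log. The following is only a minimal sketch in Python: the helper names `parse_status` and `all_threads_waiting` are hypothetical, and treating "spinWaiting equals activeThreads while the queue is non-empty" as a sign that no thread is making progress is an assumption, not something the thread confirms.

```python
import re

# Matches fetcher status lines of the shape quoted in the message above.
STATUS_RE = re.compile(
    r"-activeThreads=(\d+), spinWaiting=(\d+), fetchQueues\.totalSize=(\d+)"
)


def parse_status(line):
    """Return (active, spinning, queued) as ints, or None if the line does not match."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def all_threads_waiting(line):
    """True when every active thread is spin-waiting while URLs remain queued.

    Assumption: this condition is used here as a rough stall indicator only.
    """
    status = parse_status(line)
    if status is None:
        return False
    active, spinning, queued = status
    return active > 0 and spinning == active and queued > 0


line = "-activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500"
print(parse_status(line))         # (50, 50, 2500)
print(all_threads_waiting(line))  # True
```

Running this over each line of a fetch log would flag the stretches where the fetcher reports all 50 threads waiting, which is the situation described in the message.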
Hi Markus
Rebooting the server frees up 23GB of memory.
Have installed PDFBox 1.8.8 and am running fetch again. Will update you on
results.
Thanks
P
On 9 January 2015 at 14:11, Paul Rogers paul.roge...@gmail.com wrote:
Thanks Markus
I will try that and see if it fixes things.
The