I guess I have to read up on segments. I don't know what they are
yet. Looking at the mail archive of this list
( http://www.mail-archive.com/[email protected]/msg01607.html ), I found:

./bin/nutch readseg -dump crawl/segments/XXX/ dump_folder -nofetch -nogenerate -noparse -noparsedata -noparsetext

That command will dump all the HTML source code into one file. That's a
good start. However, the desired result in my case is a new field in the
schema containing the source code, like:

<htmlsource></htmlsource>

Is that even possible?

Best,
Stephan

On 09.05.2012 18:05, Lewis John Mcgibbney wrote:
> Which segments are you trying to generate from? Do you maybe need to
> include them individually, or use a wildcard?

--
stephan
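
One possible workaround, sketched here rather than taken from the thread, is to post-process the dump file and build one document per page with an `htmlsource` field. The sketch assumes a dump layout in which every record starts with a `Recno::` line, carries a `URL::` line, and puts the raw page after a bare `Content:` line. That layout should be checked against the actual dump before relying on it. The function name `split_dump` and the `id` key are hypothetical.

```python
import re


def split_dump(dump_text):
    """Turn readseg dump text into a list of {"id", "htmlsource"} dicts.

    Assumed layout (verify against your own dump): each record begins with
    a "Recno::" line, has a "URL::" line, and the raw page follows a line
    that consists only of "Content:".
    """
    docs = []
    # Split on the assumed record delimiter, dropping the delimiter line.
    for record in re.split(r"(?m)^Recno::.*\n", dump_text):
        url_match = re.search(r"(?m)^URL::\s*(\S+)", record)
        if not url_match:
            continue  # text before the first record, or a malformed record
        # Take everything after the first bare "Content:" line as the page.
        # "Content::" (double colon) does not match this pattern.
        parts = re.split(r"(?m)^Content:[ \t]*\n", record, maxsplit=1)
        if len(parts) < 2:
            continue  # record without raw content
        docs.append({"id": url_match.group(1), "htmlsource": parts[1].strip()})
    return docs
```

The resulting dictionaries could then be sent to the index by whatever client is in use, provided the schema declares a matching `htmlsource` field.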

