Hello guys,

I was able to get Nutch 1.4 running in the most basic of setups: local
mode, with default options for the most part. While I am getting some results
in Solr, it's not picking up all the prices and variations from the pages.

Previously, I learned that Nutch can capture all of this information, that
the export is base64-encoded, and that it comes in under a field called
"binaryContent".
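
To make the goal concrete, here is a rough sketch of what I would like to do
with the data once I have it. It assumes (my assumption, not something I have
confirmed for 1.4) that each exported record is a JSON object carrying a
base64 string under "binaryContent". The function name and the sample record
are made up for illustration only.

```python
import base64
import json


def decode_binary_content(record: dict, field: str = "binaryContent") -> bytes:
    """Return the raw bytes stored base64-encoded under `field`."""
    return base64.b64decode(record[field])


# Hypothetical record: the layout is an assumption, not actual Nutch output.
sample_json = json.dumps({
    "url": "http://example.com/",
    "binaryContent": base64.b64encode(b"<html></html>").decode("ascii"),
})

sample = json.loads(sample_json)
# Decode back to the original page markup (assuming UTF-8 content).
html = decode_binary_content(sample).decode("utf-8")
print(html)
```

With the raw markup decoded like this, I could parse out the prices and
variations myself.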

So I need to know how to get the binaryContent (base64) output out of
Nutch. I ran bin/nutch to look for a suitable command, but it only gives me
the following list, and I don't see anything in it that would do this:

readdb
mergedb
readlinkdb
inject
generate
freegen
fetch
parse
readseg
mergesegs
updatedb
invertlinks
mergelinkdb
index
dedup
dump
commoncrawldump
solrindex
solrdedup
solrclean
clean
parsechecker
indexchecker
filterchecker
normalizerchecker
domainstats
protocolstats
crawlcomplete
webgraph
linkrank
scoreupdater
nodedumper
plugin
junit
startserver
webapp
warc
updatehostdb
readhostdb
sitemap
CLASSNAME


If any of you could let me know how this is done in 1.4, it would be
highly appreciated.

Thank you!!

Eric
