Is there a way to preserve the HTML that Nutch 1.7 crawls? Specifically, instead of having the crawled page normalized into one long string value and assigned to the ‘content’ key (when viewed as JSON), I’d like the original markup itself to be indexed.
Thanks, Mark

