Hey Jamshaid,
We cannot see any screenshot attached. Could you upload it somewhere
and share the URL?


On Thu, Jun 13, 2013 at 11:25 PM, Jamshaid Ashraf <jamshaid...@gmail.com> wrote:

> Hi,
>
> Thanks for the prompt reply!
>
> I have set a breakpoint on the following line in the plugin code, but I get
> a "source not found" screen when debugging it in Eclipse. Please see the
> attached screenshot.
>
> String content = new String(page.getContent().array());
>
> What might cause this to happen and how can I fix it?
>
> Regards,
> Jamshaid
>
>
> On Thu, Jun 13, 2013 at 8:34 PM, feng lu <amuseme...@gmail.com> wrote:
>
>> Hi
>>
>> I checked the ParseFilter interface in Nutch 2.x. Its filter method looks
>> like this:
>>
>> Parse filter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags,
>> DocumentFragment doc);
>>
>> Inside this method you can get the raw content of the HTML page with:
>>
>> String content = new String(page.getContent().array());
>>
>> and get the parsed text with the parse.getText() method.
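
One caveat on the line above, as a rough sketch rather than the definitive
Nutch way: assuming page.getContent() returns a java.nio.ByteBuffer (as the
.array() call suggests), array() hands back the whole backing array, and
new String(bytes) uses the platform default charset. The helper below is
hypothetical (RawContent and decode are not part of Nutch); it decodes only
the readable part of the buffer with an explicit charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

/** Hypothetical helper for illustration; not part of the Nutch API. */
final class RawContent {
    private RawContent() {}

    /**
     * Decodes only the readable bytes (position..limit) of the buffer.
     * duplicate() shares the bytes but keeps its own position, so the
     * caller's buffer is left untouched.
     */
    static String decode(ByteBuffer content, Charset charset) {
        return charset.decode(content.duplicate()).toString();
    }
}
```

Inside filter() this could be called as
RawContent.decode(page.getContent(), StandardCharsets.UTF_8), and the
resulting string passed to Jsoup.parse(String). UTF-8 is an assumption here;
the page's real encoding may differ.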
>>
>> On Thu, Jun 13, 2013 at 11:10 PM, Jamshaid Ashraf <jamshaid...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I'm using a Nutch 2.2 ParseFilter plugin and I need to extract custom
>> > information from the raw HTML of the parsed page (preferably using
>> > JSoup), but I still couldn't find out how to get the raw HTML in the
>> > overridden filter() method. All the examples I have found use the
>> > Nutch 1.x API and don't work with the new Nutch 2.x API.
>> >
>> >
>> > Thanks in advance!
>> >
>> > Regards,
>> > Jamshaid
>> >
>>
>>
>>
>> --
>> Don't Grow Old, Grow Up... :-)
>>
>
>
