Hi
In our case it is really in the segment, and ends up in the index. Are there
any known issues with parse filters? In that filter we do set the Parse object
as class attribute but we reset it with the new Parse object right after
filter() is called.
I also cannot think of the custom Tika Con
Hi Markus,
sounds somewhat similar to NUTCH-1252 but that was rather trivial
and easy to reproduce.
Sebastian
2012/11/30 Markus Jelsma :
> Hi,
>
> We've got an issue where one in a few thousand records partially contains
> another record's ParseMeta data. To be specific, record A ends up with t
- Original message -
From: "Markus Jelsma"
To: user@nutch.apache.org
Sent: Friday, 30 November 2012 11:28:23
Subject: RE: Access crawled content or parsed data of previous crawled url
-Original message-
> From:Jorge Luis Betancourt Gonzalez
> Sent: Fri 30-Nov-2012 17:2
Hi,
We've got an issue where one in a few thousand records partially contains
another record's ParseMeta data. To be specific, record A ends up with the
ParseMeta data of record B that is added by one of our custom parse plugins.
I'm unsure as to where the problem really is because the parse pl
-Original message-
> From:Jorge Luis Betancourt Gonzalez
> Sent: Fri 30-Nov-2012 17:22
> To: user@nutch.apache.org
> Subject: Re: Access crawled content or parsed data of previous crawled url
>
>
>
>
> I have been thinking about this a lot since yesterday and I realize that I really
> don't
- Original message -
From: "Markus Jelsma"
To: user@nutch.apache.org
Sent: Thursday, 29 November 2012 17:13:32
Subject: RE: Access crawled content or parsed data of previous crawled url
-Original message-
> From:Jorge Luis Betancourt Gonzalez
> Sent: Thu 29-Nov-2012 22:51
Thanks Markus this helps a lot!
- Original message -
From: "Markus Jelsma"
To: user@nutch.apache.org
Sent: Friday, 30 November 2012 10:50:55
Subject: RE: Fetch content inside nutch parse
See how the indexchecker fetches URLs:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org
See how the indexchecker fetches URLs:
http://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java?view=markup
-Original message-
> From:Jorge Luis Betancourt Gonzalez
> Sent: Fri 30-Nov-2012 16:46
> To: user@nutch.apache.org
> Subject: Fetch