Seems to me it's just the breadcrumb of the page popping up in Solr's highlighter snippet?
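
If it is the breadcrumb, the indexing-filter suggestion quoted further down amounts to stripping that repeated string from the content before it reaches Solr. Below is a minimal, language-neutral sketch of the cleaning step only. It is not Nutch code: in Nutch the same logic would have to be written in Java inside an indexing filter plugin. The names `CATEGORIES`, `BREADCRUMB_RE` and `strip_breadcrumb` are hypothetical, and the pattern assumes the breadcrumb looks exactly like the sample quoted in this thread and is identical on every page.

```python
import re

# Assumption: the category list is the one quoted in this thread and is the
# same on every page of the site.
CATEGORIES = "Home NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE Politica Video Live Score"

# Optional leading date such as "Mercoledì Apr 04", then the stray
# 'parent">' fragment, then the category list itself.
BREADCRUMB_RE = re.compile(
    r'(?:\w+\s+\w{3}\s+\d{1,2}\s+)?parent"?\s*>\s*' + re.escape(CATEGORIES) + r'\s*'
)


def strip_breadcrumb(content: str) -> str:
    """Remove the repeated breadcrumb and collapse leftover whitespace."""
    cleaned = BREADCRUMB_RE.sub(" ", content)
    return " ".join(cleaned.split())


sample = (
    'Mercoledì Apr 04 parent"> Home NEWSLOT/VLT SCOMMESSE ONLINE LOTTERIE '
    'Politica Video Live Score Article text about lotteries.'
)
result = strip_breadcrumb(sample)
```

With the breadcrumb gone from the indexed content, a query for a category word such as 'ONLINE' would only match pages whose own text contains it.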

On Thu, 5 Apr 2012 22:02:31 +0100, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
I can't see any of your attachments as they're not permitted on list.

Can you provide a URL?

On Thu, Apr 5, 2012 at 9:56 PM, alessio crisantemi <
alessio.crisant...@gmail.com> wrote:

Dear Lewis, thank you for your fast reply.
But that is exactly my problem! I don't understand which field creates
this line.

I see a date (e.g. "Mercoledì Apr 04") followed by the word "parent" and then, after ">", the names of the categories ("Home NEWSLOT/VLT SCOMMESSE
ONLINE LOTTERIE Politica Video Live Score").

Do you know which field of the default Nutch configuration generates the 'parent'
line?

As you can see in the attachment, this line is in the content field,
between 'str' tags.
..
suggestions?
tx
a.

On 5 April 2012 22:45, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:

> Hi Alessio,
>
> You need to determine in which field the unwanted content exists. Once
> you've done this you could write an indexing filter to remove this from
> your document prior to indexing.
>
> Lewis
>
> On Thu, Apr 5, 2012 at 9:41 PM, alessio crisantemi <
> alessio.crisant...@gmail.com> wrote:
>
> >
> >
> > ---------- Forwarded message ----------
> > From: alessio crisantemi <alessio.crisant...@gmail.com>
> > Date: 5 April 2012 22:32
> > Subject: request about snippets
> > To: user@nutch.apache.org
> >
> >
> > Dear all,
> > I configured Nutch (1.4) to work with Solr (1.4.1), and I can crawl and
> > index my website successfully.
> >
> > My only problem is with the search results.
> > In every result, the snippet contains a line with a string listing all
> > the categories of my website. I attached a screenshot to explain
> > (here, the unwanted line is "Mercoledì Apr 04 parent"> Home NEWSLOT/VLT
> > SCOMMESSE ONLINE LOTTERIE Politica Video Live Score").
> >
> > This is a problem because Solr reads the same line for every page, so
> > when my query matches a word in this line (e.g. 'ONLINE'), every
> > document in my Solr index is returned as a result.
> >
> > How can I skip this line during crawling? Is it possible to exclude it?
> > Thank you in advance,
> > alessio
> >
> >
>
>
> --
> *Lewis*
>


--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
