I will then try deactivating the parse-metatags plugin...

Btw, do you or anyone else know exactly what modifications are required on the
Apache Solr side to get the metatags working?

Regards



----- Original Message -----
From: Ing. Eyeris Rodriguez Rueda <[email protected]>
To: ML mail <[email protected]>; [email protected]
Cc: 
Sent: Friday, May 11, 2012 10:38 PM
Subject: Re: Indexing HTML metatags from Nutch into Solr

Hi.
I only have the index-metatags plugin in my nutch-site.xml and it works
successfully. I also tried parse-metatags without a positive result and
finally stopped using it.
Also make sure that the schema in Nutch is the same as the one in Solr.

If your index is not big, you can erase the folders of your Solr index and
Nutch data:
nutch (crawldb, linkdb, segments)
solr (index, spellchecker).
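For the three clean-up steps asked about in this thread, here is a minimal Python sketch of the erase step only. It assumes a hypothetical layout where the Nutch crawl data lives under `crawl/` and the Solr 3.x data directory is `solr/data`; adjust both paths to your own install, stop Solr before deleting its index, and then re-run the crawl and solrindex yourself. It defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path

# Hypothetical locations: adjust to your own Nutch crawl dir and Solr data dir.
NUTCH_CRAWL_DIR = Path("crawl")
SOLR_DATA_DIR = Path("solr/data")

# Folders named in the thread: Nutch (crawldb, linkdb, segments), Solr (index, spellchecker).
TARGETS = [
    NUTCH_CRAWL_DIR / "crawldb",
    NUTCH_CRAWL_DIR / "linkdb",
    NUTCH_CRAWL_DIR / "segments",
    SOLR_DATA_DIR / "index",
    SOLR_DATA_DIR / "spellchecker",
]


def clean(targets, dry_run=True):
    """Delete each target that exists as a directory; with dry_run=True only report it."""
    removed = []
    for path in targets:
        if path.is_dir():
            if not dry_run:
                shutil.rmtree(path)
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Dry run: list what would be deleted. Pass dry_run=False to actually delete.
    for path in clean(TARGETS, dry_run=True):
        print("would remove", path)
```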




**************************************************************************

----- Original Message -----
From: "ML mail" <[email protected]>
To: "Ing. Eyeris Rodriguez Rueda" <[email protected]>, [email protected]
Sent: Friday, May 11, 2012 19:37:13
Subject: Re: Indexing HTML metatags from Nutch into Solr

Hi,

Actually I have already done all that, as I followed the Nutch Wiki for this 
purpose: http://wiki.apache.org/nutch/IndexMetatags

Your suggestion of cleaning my segments as well as the Solr index and then
re-indexing is a good idea. Could you help me with the commands to achieve
these 3 steps?

Many thanks!



----- Original Message -----
From: Ing. Eyeris Rodriguez Rueda <[email protected]>
To: [email protected]; ML mail <[email protected]>
Cc:
Sent: Friday, May 11, 2012 7:55 PM
Subject: Re: Indexing HTML metatags from Nutch into Solr

Hello, I am using the index-metatags plugin (I suppose that you have the
index-metatags plugin in Nutch's plugins folder).
First you need to include something like this in the plugin.includes property
of nutch-site.xml:
|index-(basic|anchor|metatags|more)|
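Since plugin.includes is a regular expression over plugin ids, the fragment above enables index-basic, index-anchor, index-metatags and index-more. To check which ids a given value selects (for example, whether parse-metatags is still active after deactivating it), here is a small Python sketch. It assumes the pattern is matched against the whole plugin id, and the full value below is hypothetical: only the index-(...) alternative comes from this thread.

```python
import re

# Hypothetical full plugin.includes value; only the index-(...) part is from the thread.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)"
    "|index-(basic|anchor|metatags|more)"
    "|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def enabled(plugin_id, includes=PLUGIN_INCLUDES):
    """Return True if plugin_id matches the include pattern as a whole string."""
    return re.fullmatch(includes, plugin_id) is not None


for pid in ("index-metatags", "parse-metatags", "parse-html"):
    print(pid, enabled(pid))
```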
You also need to list the meta tag names that you want to index (in the same
file):
<property>
    <name>metatags.names</name>
    <value>category;keywords;author;comments;description;subject;last_modified</value>
    <description>For plugin index-metatags: Indicate here the name of the
    html meta tag that should be
    parsed. Use a semicolon separated list if you want multiple
    tags, or use '*' to index all.
    Example: description;keywords;role
</description>
</property>
(I only have these:
category;keywords;author;comments;description;subject;last_modified.)
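To see which meta tags a page actually offers for a given metatags.names value, here is a rough Python illustration. It is not the plugin's own code: it only follows the property description above (semicolon-separated names, '*' for all), and matching tag names case-insensitively is an assumption of this sketch.

```python
from html.parser import HTMLParser


def parse_metatags_names(value):
    """Split a metatags.names value on ';'; '*' (index all) is returned as None."""
    value = value.strip()
    if value == "*":
        return None
    return [name.strip() for name in value.split(";") if name.strip()]


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs, keyed by lowercased name."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name is not None and content is not None:
            self.tags[name.lower()] = content


def select_metatags(html, names_value):
    """Return the meta tags in `html` requested by a metatags.names value."""
    collector = MetaTagCollector()
    collector.feed(html)
    wanted = parse_metatags_names(names_value)
    if wanted is None:
        return dict(collector.tags)
    # Assumption: names are compared case-insensitively.
    return {n: collector.tags[n.lower()] for n in wanted if n.lower() in collector.tags}
```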
After that, you have to configure your solrindex-mapping.xml like this:
<field dest="subject" source="subject" />
<field dest="description" source="description" />
<field dest="comments" source="comments" />
<field dest="author" source="author"/>
<field dest="keywords" source="keywords" />
<field dest="category" source="category" />
<field dest="lastModified" source="lastModified"/>
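The field lines above go inside the `<mapping><fields>` wrapper that also appears in the solrindex-mapping.xml from the original question. If you have many tags, here is a small Python sketch that writes that structure from (source, dest) pairs. The helper name `build_mapping` is hypothetical, and any other elements your mapping file needs must still be added by hand.

```python
import xml.etree.ElementTree as ET


def build_mapping(pairs):
    """Build a <mapping><fields>...</fields></mapping> document from (source, dest) pairs."""
    mapping = ET.Element("mapping")
    fields = ET.SubElement(mapping, "fields")
    for source, dest in pairs:
        ET.SubElement(fields, "field", dest=dest, source=source)
    return ET.tostring(mapping, encoding="unicode")


print(build_mapping([("subject", "subject"), ("keywords", "keywords")]))
```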

I suggest cleaning your segments and Solr index and reindexing.
I think that will solve your problem.

****************************************************************************************

----- Original Message -----
From: "ML mail" <[email protected]>
To: [email protected]
Sent: Friday, May 11, 2012 6:40:36
Subject: Indexing HTML metatags from Nutch into Solr

Hello,

I am using Nutch 1.4 with Solr 3.6.0 and would like to get the HTML keywords
and description metatags indexed into Solr. On the Nutch side I have followed
http://wiki.apache.org/nutch/IndexMetatags to get Nutch parsing and
extracting the metatags (using the index-metatags and parse-metatags plugins),
but now when I run solrindex they simply don't get indexed.

In Solr I am using the schema.xml provided by Nutch and have added the 
following fields for the metatags:

        <!-- fields for the metatags plugin -->
        <field name="metatag.description" type="text" stored="true" indexed="true"/>
        <field name="metatag.keywords" type="text" stored="true" indexed="true"/>

and have created a solrindex-mapping.xml file as follows:

<mapping>
<fields>
<field dest="description" source="metatag.description"/>
<field dest="keywords" source="metatag.keywords"/>
</fields>
</mapping>
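In the excerpts above, the mapping sends values to Solr fields named description and keywords, while the schema excerpt only adds metatag.description and metatag.keywords. Unless the rest of schema.xml already declares description and keywords, Solr has no field to receive them, so a quick cross-check is worth running. Here is a minimal Python sketch; it deliberately ignores `<dynamicField>` and `<copyField>` rules, so treat its output as a hint, not a verdict.

```python
import xml.etree.ElementTree as ET


def missing_solr_fields(mapping_xml, schema_xml):
    """Return mapping `dest` names that schema.xml does not declare as a <field>.

    Simplification: <dynamicField> and <copyField> rules are ignored.
    """
    mapping = ET.fromstring(mapping_xml)
    schema = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in schema.iter("field")}
    dests = [f.get("dest") for f in mapping.iter("field")]
    return [d for d in dests if d not in declared]
```

With real files, pass the text of each, e.g. `Path("solrindex-mapping.xml").read_text()` and the text of the schema.xml your Solr core actually loads (both paths depend on your install).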

The rest is pretty much a default install of Solr. So my question is: why
can't I see the metatags indexed in Solr? Did I maybe forget to configure
something in Solr?

Any suggestions are welcome.

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci