Thanks Lewis.
Well, I am running Nutch on Hadoop and using Solr to index.
In the Solr console, I observe that it indexed only 5 of the 6 URLs
specified in the seed list.
Looking at the code, the error is thrown at the second line of the following
function (SolrDeleteDuplicates.java). The BOOST_FIELD value comes back from
Solr as a String, but the function casts it to Float, hence the CCE:
public void readSolrDocument(SolrDocument doc) {
    id = (String)doc.getFieldValue(SolrConstants.ID_FIELD);
    // Second line: throws ClassCastException when the boost value is a String
    boost = (Float)doc.getFieldValue(SolrConstants.BOOST_FIELD);
    Date buffer = (Date)doc.getFieldValue(SolrConstants.TIMESTAMP_FIELD);
    tstamp = buffer.getTime();
}
And the specific error in the task is as follows:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Float
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecord.readSolrDocument(SolrDeleteDuplicates.java:128)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:271)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
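For illustration, here is a minimal sketch of a more tolerant conversion, assuming the boost value may come back from Solr either as a Number or as a numeric String. The `toFloat` helper and the `BoostFieldDemo` class are hypothetical, and a plain `Map` stands in for `SolrDocument` so the example runs without SolrJ; the key "boost" is assumed to be the value of `SolrConstants.BOOST_FIELD`. This is a workaround sketch only: if the schema declares the boost field as a string type, correcting that field type is probably the cleaner fix.

```java
import java.util.HashMap;
import java.util.Map;

public class BoostFieldDemo {

    // Hypothetical helper: converts a stored field value to a float whether
    // Solr returned it as a Number (float field) or as a String (string field).
    static float toFloat(Object value, float defaultValue) {
        if (value == null) {
            return defaultValue;            // field missing from the document
        }
        if (value instanceof Number) {
            return ((Number) value).floatValue();
        }
        // Assumption: a string-typed boost field holds a parseable number such as "1.0"
        return Float.parseFloat(value.toString().trim());
    }

    public static void main(String[] args) {
        // A Map stands in for SolrDocument; "boost" is an assumed field name.
        Map<String, Object> doc = new HashMap<>();
        doc.put("boost", "1.0");            // boost stored as a string, as in the failing case

        // The direct cast used in readSolrDocument reproduces the error.
        try {
            Float boost = (Float) doc.get("boost");
            System.out.println("cast succeeded: " + boost);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }

        // The defensive conversion accepts either representation.
        System.out.println("converted from String: " + toFloat(doc.get("boost"), 1.0f));
        doc.put("boost", 2.5f);             // boost stored as a float
        System.out.println("converted from Float: " + toFloat(doc.get("boost"), 1.0f));
    }
}
```

Checking `instanceof Number` first keeps the normal float-field path unchanged and only falls back to parsing when the value arrives as text.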
Any help is appreciated.
Regards,
Som Shekhar Sharma
On Thu, Jul 26, 2012 at 5:21 PM, Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> On Thu, Jul 26, 2012 at 4:00 AM, shekhar sharma <[email protected]> wrote:
> > Hello,
> > I am getting a ClassCastException while indexing the pages using Solr.
>
> I don't think this is the case at all. I think you're getting the CCE when
> using solrdedup. These are two completely different tools.
>
> > SolrIndexer: finished at 2012-07-26 08:02:55, elapsed: 00:00:33
>
> > As you can see, UrlNormalizing and UrlFiltering are both false, whereas
> > when crawling with Nutch they are true.
>
> Please see
> http://wiki.apache.org/nutch/bin/nutch%20solrindex
> If you wish to provide these params via the CLI, they need to be
> explicitly defined.
>
> >
> > I am using Nutch trunk (1.6) and Solr trunk (5.0) with schema-solr4.xml
> > (which came with the Nutch source; I renamed it to schema.xml), copied to
> > the example/solr/collections1/conf folder.
>
> Please check that all your fields are correctly defined within the
> schema. I must admit I have not personally tried this schema with Solr
> trunk 5.x, and I haven't heard of or seen it tried to date, so the
> default configuration may not work flawlessly.
>
> Best
> Lewis