It would be interesting to find out what exactly causes such a huge speed difference. For me the speedup is on the order of 10x... crazy!
On Wed, Feb 15, 2012 at 9:35 PM, Markus Jelsma <[email protected]> wrote:

> > You're both correct, after changing the type for tstamp and lastModified
> > from long to date, no error anymore.
> >
> > Next thing I need to do is setup cygwin/svn to be able to get fresh
> > svn/trunch code...it's so cool to be up-to-date. Nutch-1.4 is just
> > ridiculously faster than 1.2 :-)
>
> Is it faster? I read such a thing before somewhere on the list but i really
> don't know why it would be faster. Must be a case of bad settings in 1.2 i
> guess.
>
> > Thanks!!
> >
> > Remi
> >
> > On Wed, Feb 15, 2012 at 9:14 PM, Markus Jelsma <[email protected]> wrote:
> > > That was likely an old schema. In trunk (or was it already in 1.4) it is
> > > of type date.
> > > http://svn.apache.org/viewvc/nutch/trunk/conf/schema.xml?view=markup
> > >
> > > > Remi, I had a similar problem but for a custom field that I was trying
> > > > to post to Solr (via solrindex) as a type="date" in the schema.xml.
> > > > Turns out my date string was formatted incorrectly (it was missing the
> > > > trailing Z). From the error message it appears that perhaps the field
> > > > into which this field is going in is set as long or int. If you set it
> > > > to type="date" it should take it (and you can do Solr's date arithmetic
> > > > on it.
> > > >
> > > > On Feb 15, 2012, at 11:01 AM, remi tassing wrote:
> > > > > Awesome!
> > > > >
> > > > > Pushing this to Solr gives me an error (solrindex):
> > > > > SEVERE: java.lang.NumberFormatException: For input string:
> > > > > "2012-02-08T14:40:09.416Z"
> > > > >
> > > > > at java.lang.NumberFormatException.forInputString(Unknown Source)
> > > > >
> > > > > But I'll try to figure this out on my own
> > > > >
> > > > > I really appreciate your help!
> > > > >
> > > > > Remi
> > > > >
> > > > > On Wed, Feb 15, 2012 at 8:18 PM, Markus Jelsma <[email protected]> wrote:
> > > > >> sure, use the indexchecker tool.
> > > > >>
> > > > >>> Is it any quick way to see the impact of index-more? I deleted the
> > > > >>> parse related folders in the segment and re-parsed it but when I
> > > > >>> readseg there is no.difference....
> > > > >>>
> > > > >>> On Wednesday, February 15, 2012, Lewis John Mcgibbney <[email protected]> wrote:
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> On Wed, Feb 15, 2012 at 4:00 PM, remi tassing <[email protected]> wrote:
> > > > >>>>> tstamp shows a string of digits like 20020123123212
> > > > >>>>
> > > > >>>> This is OK. yyyy-mm-dd-hh-mm-ssZ It is however hellishly old !
> > > > >>>>
> > > > >>>>> Never heard of the plugin "index-more" and it's poorly
> > > > >>>>> documented.
> > > > >>>>
> > > > >>>> Well it's been included in 1.2 onwards so I'm very surprised @
> > > > >>>> that. If you feel like it then please feel free to add
> > > > >>>> documentation, this is always something we are after and would be
> > > > >>>> a great help to the community.
> > > > >>>>
> > > > >>>>> After adding this to plugins.include, I'll need to run solrindex
> > > > >>>>> or is it necessary to re-parse or recrawl (I think this less
> > > > >>>>> likely IMO)?
> > > > >>>>
> > > > >>>> If you wish to have the fields we are able to extract with
> > > > >>>> index-more e.g.
> > > > >>>>
> > > > >>>> <!-- fields for index-more plugin -->
> > > > >>>> <field name="type" type="string" stored="true" indexed="true" multiValued="true"/>
> > > > >>>> <field name="contentLength" type="long" stored="true" indexed="false"/>
> > > > >>>> <field name="lastModified" type="long" stored="true" indexed="true"/>
> > > > >>>> <field name="date" type="string" stored="true" indexed="true"/>
> > > > >>>>
> > > > >>>> then you'll need to add the plugin, I would rebuild the project if
> > > > >>>> it is possible but this is not essential, then index your content.
> > > > >>>> And yes I would expect the parsers need to be re-run to extract
> > > > >>>> the lastModified value from pages.
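For anyone landing on this thread with the same NumberFormatException: the quoted messages point to two separate pitfalls. A field declared as long cannot take an ISO 8601 string, and a Solr date field expects UTC with a trailing Z. Below is a minimal Python sketch for checking a value before posting it. The names to_solr_date and looks_like_solr_date are hypothetical helpers made up for illustration, not part of Nutch or Solr, and the sketch assumes that naive datetimes are already in UTC.

```python
import re
from datetime import datetime, timezone

# Shape a Solr date field accepts: ISO 8601 in UTC, optional fractional
# seconds, and a mandatory trailing "Z".
SOLR_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z")


def to_solr_date(dt):
    """Render a datetime as a UTC string with milliseconds and a trailing Z."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def looks_like_solr_date(value):
    """Return True if the whole string has the expected shape."""
    return SOLR_DATE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    stamp = to_solr_date(datetime(2012, 2, 8, 14, 40, 9, 416000))
    print(stamp)
    print(looks_like_solr_date(stamp))
    # Same value without the trailing Z fails the shape check.
    print(looks_like_solr_date(stamp[:-1]))
```

This only checks the shape of the string; it does not check that the month, day, or time values are in range, and it says nothing about how the field is declared in schema.xml, which still has to be type="date" as discussed above.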

