Re: Dump all urls from merged index

2011-06-08 Thread MilleBii
I don't know what you call large, but it's around 21GB currently. By the way, thanks for the automaton filter; it worked great and is much faster now. I actually gained a 4x speedup in the generate phase instead of losing time by adding regexes. 2011/6/8 Julien Nioche lists.digitalpeb...@gmail.com or you can

Re: [RESULT] [VOTE] Apache Nutch 1.3 Release Candidate #3

2011-06-08 Thread Julien Nioche
Thanks, Chris, for doing the releases, organising the votes, spreading the word, etc., and thanks to all contributors and users. On 8 June 2011 05:01, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, This VOTE has passed with the following tallies: +1 Nutch PMC Chris

Re: nutch NoClassDefFound

2011-06-08 Thread lewis john mcgibbney
Hi, I suggest that before you go any further with this, you read as much of the wiki [1] as you can; in particular, I would start here [2] [3]. After this, try looking through some of the source to understand what parameters are required to run the various commands. The reason for this

Re: [RESULT] [VOTE] Apache Nutch 1.3 Release Candidate #3

2011-06-08 Thread Markus Jelsma
Great! Thanks! Can you also add version 1.4 to Jira? Hi Folks, This VOTE has passed with the following tallies: +1 Nutch PMC Chris Mattmann Markus Jelsma Julien Nioche Lewis John McGibbney I'll go ahead and push the release to the mirrors and release the Maven repo to Central and

Updates to Nutch Wiki

2011-06-08 Thread lewis john mcgibbney
Hi everyone, I was wondering if anyone familiar with the topics would be interested in sending me material for the following pages [1] [2]. The linked pages appear to be nonexistent in our wiki, and it would be nice to get some material on these topics if they are important and required!

Re: searcher.dir not working

2011-06-08 Thread lewis john mcgibbney
Hi abhayd, In short... yes. Although you have correctly specified an absolute path, you need to drop the trailing /crawldb/current/part-0 from it. A good resource for this stuff can usually be found on the mailing lists. On Wed, Jun 8, 2011 at 8:03 AM, abhayd ajdabhol...@hotmail.com wrote: hi I am using

Re: Nutch Plugin: add several fields at once

2011-06-08 Thread jasimop
This is still an open issue for me, and I have not found a solution for it. Just to be sure: is it possible to add several fields to the index from within one plugin? How do you pass data from the parsing stage to the indexing stage? Is there any plugin I could look at to get an idea? As described in my last post, putting

Re: Nutch Plugin: add several fields at once

2011-06-08 Thread MilleBii
Of course it is possible to add multiple fields; I have that running daily. Have a look at the more plugin to see how it works. Here is an example. In the filter method: String content = "some data"; doc.add("myfield", content); and you need to configure the field in