Http Post authentication?

2012-07-05 Thread 12rad
Hi, does Nutch 1.4 support POST-based authentication that depends on cookies? If not, how do I work around sites that need this authentication? Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Http-Post-authentication-tp3993343.html Sent from the Nutch

Re: Nutch 1.4 with Solr 3.6 - compatible?

2012-07-05 Thread 12rad
Thanks! That fixed my problem:) -- View this message in context: http://lucene.472066.n3.nabble.com/Nutch-1-4-with-Solr-3-6-compatible-tp3992890p3993327.html Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Nutch Author, Publication, and Religion Detection

2012-07-05 Thread Lewis John Mcgibbney
Sounds great, glad you got something. Lewis On Thu, Jul 5, 2012 at 6:04 PM, JAB wrote: > > Thanks for the advice. Currently I'm looking at a simplified GATE Gazetteer > approach. My customer isn't clear on what he wants and the requirements I > came up with may be overkill. > > -- > View this mes

Re: NutchField

2012-07-05 Thread Lewis John Mcgibbney
How do your schema and the accompanying solr-mapping.xml look? These need to be spot on, or you can expect confusing results at times. hth On Thu, Jul 5, 2012 at 7:58 PM, Jim Chandler wrote: > Markus, > > Thanks for the speedy reply. > > I attempted your suggestion and my error changed now I'm gett

Re: NutchField

2012-07-05 Thread Jim Chandler
Markus, Thanks for the speedy reply. I attempted your suggestion and my error changed; now I'm getting: org.apache.solr.common.SolrException: [doc=null] missing required field: id I am relatively new at this and any help is much appreciated. Thanks On Thu, Jul 5, 2012 at 11:41 AM, Markus Jelsm

Re: Nutch Author, Publication, and Religion Detection

2012-07-05 Thread JAB
Thanks for the advice. Currently I'm looking at a simplified GATE Gazetteer approach. My customer isn't clear on what he wants and the requirements I came up with may be overkill. -- View this message in context: http://lucene.472066.n3.nabble.com/Re-Nutch-Author-Publication-and-Religion-Detect

RE: NutchField

2012-07-05 Thread Markus Jelsma
Hello, The index-more plugin might run after your custom plugin. You can configure the order in which plugins are run. Please consult the indexingfilter.order directive's description in conf/nutch-default.xml. Cheers, -Original message- > From:Jim Chandler > Sent: Thu 05-Jul-2012
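
For illustration, the indexingfilter.order directive mentioned above could be overridden in conf/nutch-site.xml roughly as follows. This is only a sketch: the first two class names are the basic and index-more indexing filters as they appear in the property's own description, while com.example.nutch.MyCustomIndexingFilter is a hypothetical placeholder for the custom plugin's filter class. The order shown assumes the custom filter needs to see the fields that index-more adds.

```xml
<!-- conf/nutch-site.xml: overrides the default from conf/nutch-default.xml -->
<property>
  <name>indexingfilter.order</name>
  <!-- Filters are applied in the order listed, separated by spaces.
       The last class name is a hypothetical placeholder for the custom
       plugin; replace it with the real fully qualified class name. -->
  <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter com.example.nutch.MyCustomIndexingFilter</value>
</property>
```

According to the property's description, when the value is non-empty only the named filters are loaded, so every indexing filter that should still run needs to be listed. It is worth checking the description in your own nutch-default.xml, since this may differ between versions.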

NutchField

2012-07-05 Thread Jim Chandler
Greetings All, I'm trying to access NutchFields that have been written to the NutchDocument earlier by the index-more plugin. When I use NutchDocument.getFields() all that is returned is the segment and digest fields. I know that index-more adds date, type, content-length. Could someone p

RE: Adaptive scheduling, but different

2012-07-05 Thread Markus Jelsma
URI consistency is not under our control. Perhaps we should attempt to identify these pages first. Thanks -Original message- > From:Lewis John Mcgibbney > Sent: Thu 05-Jul-2012 10:56 > To: user@nutch.apache.org > Subject: Re: Adaptive scheduling, but different > > Hi Markus, > This

Re: Adaptive scheduling, but different

2012-07-05 Thread Lewis John Mcgibbney
Hi Markus, This is a tricky one. I have personally had terrible headaches with similar problems where an update to a piece of legislation completely changes its URL, which makes the task of provenance hellishly complex... We addressed this by ensuring that legislation URIs stay consistent regardl

RE: Adaptive scheduling, but different

2012-07-05 Thread Markus Jelsma
Any ideas? -Original message- > From:Markus Jelsma > Sent: Mon 02-Jul-2012 23:05 > To: user@nutch.apache.org > Subject: Adaptive scheduling, but different > > Hi, > > We use an adaptive scheduler for our crawl, this works fine for most cases > but a specific type of page is crawled