Re: Adaptive generate.max.count

2011-11-04 Thread Ferdy Galema
Hi Markus, I was wondering what you exactly mean with dynamic. Is it different per fetch cycle but for all queues or do you mean a different value for different queues. (For example, when type is HOST, hostA will have a different generate max count than hostB). Ferdy. On 11/04/2011 12:32 AM

Re: The old search page?

2011-11-04 Thread Lewis John Mcgibbney
Hi John, Please read the latest tutorial on the Nutch wiki for a comprehensive account of where the project has been moving over the last year or so. Indexing > Solr; Parsing (mostly) > Tika; basically delegating as much to more mature and technically superior projects rather than trying to support

[jira] [Commented] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-04 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143966#comment-13143966 ] Lewis John McGibbney commented on NUTCH-1070: - Is this issue to be closed off

Re: Adaptive generate.max.count

2011-11-04 Thread Markus Jelsma
On Friday 04 November 2011 13:39:25 Ferdy Galema wrote: > Hi Markus, > > I was wondering what you exactly mean with dynamic. Is it different per > fetch cycle but for all queues or do you mean a different value for > different queues. (For example, when type is HOST, hostA will have a > differen

[jira] [Commented] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-04 Thread Radim Kolar (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143984#comment-13143984 ] Radim Kolar commented on NUTCH-1070: i closed it because i removed my patches, i will

[jira] [Commented] (NUTCH-1194) CrawlDB lock should be released earlier

2011-11-04 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143993#comment-13143993 ] Markus Jelsma commented on NUTCH-1194: -- The comment above mine was removed by the use

[jira] [Updated] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Radim Kolar (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar updated NUTCH-1098: --- Attachment: (was: patch-with-utf8-encoding.diff) > better url-normalizer basic >

Re: Adaptive generate.max.count

2011-11-04 Thread Ferdy Galema
Using an adaptive setting is a pretty daunting task. Perhaps a nice start would be creating a mechanism that allows exceptional queue settings set *by hand*? A resource file would fit the purpose. Later on it could be replaced by automatic settings. On 11/04/2011 01:56 PM, Markus Jelsma w
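
The hand-set override idea can be pictured roughly as follows. This is a minimal sketch, not Nutch code: the resource file layout (one `queue-id max-count` pair per line, `#` for comments), the function names, the host names, and the default of 100 are all assumptions made up for illustration.

```python
from typing import Dict, Iterable

DEFAULT_MAX_COUNT = 100  # hypothetical global generate.max.count value


def parse_overrides(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'queue_id max_count' pairs, skipping blank lines and '#' comments."""
    overrides: Dict[str, int] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        queue_id, count = line.split()
        overrides[queue_id.lower()] = int(count)
    return overrides


def max_count_for(queue_id: str, overrides: Dict[str, int],
                  default: int = DEFAULT_MAX_COUNT) -> int:
    """Return the hand-set limit for a queue, else the global default."""
    return overrides.get(queue_id.lower(), default)


# Hypothetical resource file contents (queue id is the host when the type is HOST).
example_file = [
    "# queue id            max count",
    "hosta.example.org 500",
    "hostb.example.org 20   # slow host, keep the queue short",
]
overrides = parse_overrides(example_file)
```

With these made-up values, `max_count_for("hostA.example.org", overrides)` yields the hand-set limit for that host, while any host not listed in the file falls back to the global default. An adaptive mechanism could later populate the same mapping automatically.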

[jira] [Updated] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1098: - Attachment: patch-with-utf8-encoding.diff Restored original patch. > better url-

[jira] [Resolved] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Radim Kolar (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar resolved NUTCH-1098. Resolution: Invalid Attached patch was in improper format. > better url-normalizer

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Radim Kolar (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144020#comment-13144020 ] Radim Kolar commented on NUTCH-1098: By removing my patch i also withdraw permission w

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144025#comment-13144025 ] Ferdy Galema commented on NUTCH-1098: - Radim, that's funny. I don't believe that is po

The old search page?

2011-11-04 Thread John Whelan
I’ve been ‘out of it’ for a while. It used to be that Nutch had a localized HTML search page that featured these guys. Did 1.3 bring this forward in some form that I cannot find (maybe involving an XSL on search results?), or has this just

[jira] [Updated] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-04 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1196: Patch Info: Patch Available > Update job should impose an upper limit on the number of inlinks

[jira] [Commented] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-04 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144084#comment-13144084 ] Lewis John McGibbney commented on NUTCH-1070: - Thanks for your comments Radim.

[jira] [Updated] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-04 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1196: Attachment: NUTCH-1196.patch Patch done. It applies the db.update.max.inlinks just like Nutch trunk
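
As a rough illustration of what an upper limit on inlinks means, here is a minimal sketch. It is not the attached patch: only the property name `db.update.max.inlinks` comes from the thread, while the function, the limit of 3, and the choice to keep the first inlinks seen are assumptions for the example.

```python
from typing import Dict, Iterable, List, Tuple

MAX_INLINKS = 3  # stand-in for a db.update.max.inlinks value; number chosen for the demo


def collect_inlinks(edges: Iterable[Tuple[str, str]],
                    max_inlinks: int = MAX_INLINKS) -> Dict[str, List[str]]:
    """Group (source, target) edges by target, keeping at most max_inlinks per target."""
    inlinks: Dict[str, List[str]] = {}
    for source, target in edges:
        kept = inlinks.setdefault(target, [])
        if len(kept) < max_inlinks:  # assumption: extra inlinks are simply dropped
            kept.append(source)
    return inlinks
```

The point of such a cap is that a very popular page cannot accumulate an unbounded list of inlinks during the update step; which inlinks survive once the limit is reached is a design choice this sketch does not try to settle.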

[jira] [Updated] (NUTCH-1198) Less verbose logging when unmapped mimetypes are trying to be parsed.

2011-11-04 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1198: Attachment: NUTCH-1198.patch > Less verbose logging when unmapped mimetypes are trying to be pa

[jira] [Created] (NUTCH-1198) Less verbose logging when unmapped mimetypes are trying to be parsed.

2011-11-04 Thread Ferdy Galema (Created) (JIRA)
Less verbose logging when unmapped mimetypes are trying to be parsed. - Key: NUTCH-1198 URL: https://issues.apache.org/jira/browse/NUTCH-1198 Project: Nutch Issue Type: Impr

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Chris A. Mattmann (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144099#comment-13144099 ] Chris A. Mattmann commented on NUTCH-1098: -- Guys: let's change the tone of this i

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144132#comment-13144132 ] Ferdy Galema commented on NUTCH-1098: - Like I said before, I'm up for converting space

[jira] [Closed] (NUTCH-1070) Run nutch under native windows (no cygwin)

2011-11-04 Thread Radim Kolar (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Radim Kolar closed NUTCH-1070. -- > Run nutch under native windows (no cygwin) > -- > >

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Radim Kolar (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144160#comment-13144160 ] Radim Kolar commented on NUTCH-1098: If you are so clever and hard working then stop u

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Chris A. Mattmann (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144168#comment-13144168 ] Chris A. Mattmann commented on NUTCH-1098: -- {quote} You simply need months to dis

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Radim Kolar (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144183#comment-13144183 ] Radim Kolar commented on NUTCH-1098: Remove my patch from this ticket. I hold copyrigh

[jira] [Updated] (NUTCH-1189) add commented out default settings to gora.properties files

2011-11-04 Thread Lewis John McGibbney (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1189: Attachment: NUTCH-1189-v2.patch 2nd edition added to acknowledge some pointers from

[jira] [Commented] (NUTCH-1189) add commented out default settings to gora.properties files

2011-11-04 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144199#comment-13144199 ] Lewis John McGibbney commented on NUTCH-1189: - In addition, there is scope to

[jira] [Commented] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-04 Thread Andrzej Bialecki (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144226#comment-13144226 ] Andrzej Bialecki commented on NUTCH-1196: -- Very nicely done and useful patch! A

[jira] [Commented] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-04 Thread Ferdy Galema (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144274#comment-13144274 ] Ferdy Galema commented on NUTCH-1196: - Thanks Andrzej. When I have the chance I will i

[jira] [Updated] (NUTCH-1197) Add statically configured field values to solrindex-mapping.xml

2011-11-04 Thread Chris A. Mattmann (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1197: - Fix Version/s: (was: 1.4) 1.5 - push out. > Add s

[VOTE] Apache Nutch 1.4 release rc #1

2011-11-04 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Nutch 1.4 release is available at: http://people.apache.org/~mattmann/apache-nutch-1.4/rc1/ The release candidate is a zip and tar.gz archive of the sources in: http://svn.apache.org/repos/asf/nutch/tags/release-1.4/ And a binary build suitable for deploymen

[jira] [Commented] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Chris A. Mattmann (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144586#comment-13144586 ] Chris A. Mattmann commented on NUTCH-1098: -- I was told the following from an expe

[jira] [Closed] (NUTCH-1098) better url-normalizer basic

2011-11-04 Thread Markus Jelsma (Closed) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-1098. Assignee: (was: Markus Jelsma) > better url-normalizer basic > ---

Nutch Maven artifacts now published as polled/nightly SNAPSHOTS

2011-11-04 Thread Mattmann, Chris A (388J)
Hey Guys, I modified the Jenkins jobs that Lewis set up to now: * poll SCM hourly for changes to Nutch * publish Maven snapshots (1.5-SNAPSHOT) and above of Nutch to repository.apache.org Cheers, Chris ++ Chris Mattmann, Ph.D. Sen