[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002820#comment-15002820 ] Hudson commented on NUTCH-2165: --- FAILURE: Integrated in Nutch-trunk #3308 (See

[jira] [Updated] (NUTCH-2168) Parse-tika fails to retrieve parser

2015-11-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2168: --- Attachment: NUTCH-2168.patch Attached patch: use constructor of TikaConfig which passes the

Build failed in Jenkins: Nutch-trunk #3308

2015-11-12 Thread Apache Jenkins Server
See Changes: [joyce] NUTCH-2165 - Fix FileDumper hard coded part-# folder -- [...truncated 14561 lines...] test: [echo] Testing plugin: urlnormalizer-basic [junit] Running

[jira] [Created] (NUTCH-2168) Parse-tika fails to retrieve parser

2015-11-12 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2168: -- Summary: Parse-tika fails to retrieve parser Key: NUTCH-2168 URL: https://issues.apache.org/jira/browse/NUTCH-2168 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-2130) copyField rawcontent creates error within schema.xml

2015-11-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2130: --- Attachment: NUTCH-2130.patch Patch attached: - field specification of "rawcontent" is added

[jira] [Updated] (NUTCH-2169) Integrate index-html into Nutch build

2015-11-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2169: --- Attachment: NUTCH-2169.patch Patch to integrate index-html into ant build and javadoc. Also

[jira] [Commented] (NUTCH-2130) copyField rawcontent creates error within schema.xml

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003551#comment-15003551 ] Lewis John McGibbney commented on NUTCH-2130: - +1 Seb please commit Sir > copyField

[jira] [Commented] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002072#comment-15002072 ] Markus Jelsma commented on NUTCH-2120: -- I'm fine with removing it, we're using Hadoop's MapWritable

[jira] [Updated] (NUTCH-2130) copyField rawcontent creates error within schema.xml

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2130: Fix Version/s: (was: 2.4) 2.3.1 > copyField rawcontent

[jira] [Updated] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2160: Issue Type: Improvement (was: Bug) > Upgrade Selenium Java to 2.48.2 >

[jira] [Closed] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2120. --- Committed revision 1714068 > Remove MapWritable from trunk codebase >

[jira] [Resolved] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2160. - Resolution: Fixed Committed revision 1714071 > Upgrade Selenium Java to 2.48.2 >

[jira] [Resolved] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2120. - Resolution: Fixed Fix Version/s: (was: 1.12) 1.11 >

[jira] [Commented] (NUTCH-2160) Upgrade Selenium Java to 2.48.2

2015-11-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002285#comment-15002285 ] Hudson commented on NUTCH-2160: --- SUCCESS: Integrated in Nutch-trunk #3306 (See

[jira] [Commented] (NUTCH-2120) Remove MapWritable from trunk codebase

2015-11-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002286#comment-15002286 ] Hudson commented on NUTCH-2120: --- SUCCESS: Integrated in Nutch-trunk #3306 (See

Re: [jira] [Commented] (NUTCH-2166) Add reverse URL format to dump tool

2015-11-12 Thread Mattmann, Chris A (3980)
We’ll run into file length issues. Giuseppe had the same problem, and so did the USC students who used it, hence the solution we have now. I think nested directory structures are probably the best bet, with the layout made configurable.
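A minimal sketch of the nested-directory idea, for illustration only. It is not the code discussed in NUTCH-2166 and does not reproduce Nutch's TableUtil. The helper name `nested_dump_path`, the 255-character per-component limit, and the hash-suffix truncation are all assumptions made for this example.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed per-component name limit; many filesystems use 255, but this is
# a placeholder and would be a configuration option in practice.
MAX_COMPONENT = 255


def nested_dump_path(url: str, max_component: int = MAX_COMPONENT) -> PurePosixPath:
    """Hypothetical helper: map a URL to a nested relative path.

    The host labels are reversed (www.example.com -> com/example/www) and
    each URL path segment becomes its own directory level, so no single
    file or directory name has to hold the whole URL.
    """
    parts = urlsplit(url)
    host_dirs = list(reversed(parts.hostname.split("."))) if parts.hostname else []
    path_dirs = [seg for seg in parts.path.split("/") if seg]

    safe = []
    for comp in host_dirs + path_dirs:
        if len(comp) > max_component:
            # Truncate an over-long component and append a SHA-1 digest so
            # that distinct long names remain distinct after truncation.
            digest = hashlib.sha1(comp.encode("utf-8")).hexdigest()
            comp = comp[: max_component - len(digest) - 1] + "_" + digest
        safe.append(comp)
    return PurePosixPath(*safe)


print(nested_dump_path("http://www.example.com/a/b/page.html"))
```

The sketch ignores query strings and does not sanitize segments such as `..`; a real dump tool would need to handle both, and the total path length would still need its own check.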

[jira] [Commented] (NUTCH-2166) Add reverse URL format to dump tool

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002328#comment-15002328 ] Michael Joyce commented on NUTCH-2166: -- Small change in dump format. Instead of making a bajillion

[jira] [Resolved] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2167. -- Resolution: Fixed TableUtil copied over in r1714078 and tests copied over in r1714079 >

[jira] [Commented] (NUTCH-2167) Backport TableUtil from 2.x for URL reversing

2015-11-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002399#comment-15002399 ] Hudson commented on NUTCH-2167: --- SUCCESS: Integrated in Nutch-trunk #3307 (See

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002598#comment-15002598 ] Lewis John McGibbney commented on NUTCH-2165: - +1 [~mjoyce] verified on small sample crawl

[jira] [Comment Edited] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002598#comment-15002598 ] Lewis John McGibbney edited comment on NUTCH-2165 at 11/12/15 6:39 PM:

[jira] [Commented] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002604#comment-15002604 ] Michael Joyce commented on NUTCH-2165: -- Thanks [~lewismc], I'll merge shortly > FileDumper Util hard

[jira] [Resolved] (NUTCH-2165) FileDumper Util hard codes part-# folder name

2015-11-12 Thread Michael Joyce (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Joyce resolved NUTCH-2165. -- Resolution: Fixed Committed in r1714104 > FileDumper Util hard codes part-# folder name >