[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961.patch Updated patch. ExtractorRepository was missing. > Expose Tika's boilerpi

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: 1.12 > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Affects Version/s: 1.11 > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Description: Tika 0.8 comes with the Boilerpipe content handler which can be used to extract boilerp

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Description: Tika 0.8 comes with the Boilerpipe content handler which can be used to extract boilerp

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2016-02-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961.patch Patch for trunk. > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2015-12-08 Thread Vincent Slot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent Slot updated NUTCH-961: Attachment: NUTCH-961-1.11-1.patch Modified the NUTCH-961 patch for 1.11 > Expose Tika's boilerpipe su

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2015-04-01 Thread Alexander Kingson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Kingson updated NUTCH-961: Attachment: nutch-2.x-boilerpipe.patch > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2013-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: 1.8 > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2013-06-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961-1.8-1.patch Updated patch for trunk. Estimator code has been removed. Parser s

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2013-03-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-961: Fix Version/s: 2.2 > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2013-03-06 Thread Roland (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roland updated NUTCH-961: Attachment: NUTCH-961-2.1-v2.patch - now with working config options - cleanup (removed unused useBoilerpipeEstima

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2013-03-06 Thread Roland (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roland updated NUTCH-961: Attachment: NUTCH-961-2.1-v1.patch Status: - ported - compiles - yields same results as stock 2.1 if disabled (tik

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2012-04-03 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: (was: 1.5) 1.6 20120304-push-1.6 > Expose Tik

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-11-22 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961-1.5-1.patch Here's a working patch we use in production. This includes a nasty

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-09-28 Thread Markus Jelsma (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: (was: 1.4) (was: 2.0) 1.5 Assig

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-07-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961-1.4-dombuilder-1.patch With BP enabled you can get a java.util.EmptyStackExce

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Patch Info: [Patch Available] > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961-1.3-3.patch Patch to include markup from Tika. Anchors are now detected but l

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-10 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Fix Version/s: 1.4 > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-02 Thread Gabriele Kahlout (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriele Kahlout updated NUTCH-961: Attachment: NUTCH-961v2.patch Cleaned-up patch. To reproduce: {code} export NUTCH_HOME=`pwd`"

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-02 Thread Gabriele Kahlout (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriele Kahlout updated NUTCH-961: Attachment: (was: NUTCH-961v2.patch) > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-02 Thread Gabriele Kahlout (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriele Kahlout updated NUTCH-961: Attachment: NUTCH-961v2.patch Tested the patch against a checkout of the 1.3 branch at revision 11

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-06-02 Thread Gabriele Kahlout (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriele Kahlout updated NUTCH-961: Attachment: (was: NUTCH-961-1.3-tikaparser1.patch) > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-05-11 Thread Gabriele Kahlout (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriele Kahlout updated NUTCH-961: Attachment: NUTCH-961-1.3-tikaparser1.patch Modified to include necessary changes to parse-plu

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-05-11 Thread Gabriele Kahlout (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriele Kahlout updated NUTCH-961: Attachment: NUTCH-961-1.3-tikaparser1.patch Same as NUTCH-961-1.3-tikaparser.patch by Markus b

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: BoilerpipeExtractorRepository.java Here's the correct file. > Expose Tika's boilerpipe

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: (was: BoilerpipeExtractorRepository.java) > Expose Tika's boilerpipe support

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-04-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961-1.3-tikaparser.patch BoilerpipeExtractorRepository.java Here's

[jira] Updated: (NUTCH-961) Expose Tika's boilerpipe support

2011-01-24 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-961: Fix Version/s: (was: 1.3) Tika 0.8 has some issues with PDF parsing, it would be better to use t