[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Updated patch. ExtractorRepository was missing.
> Expose Tika's boilerpi
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: 1.12
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Affects Version/s: 1.11
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
boilerp
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Description:
Tika 0.8 comes with the Boilerpipe content handler which can be used to extract
boilerp
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961.patch
Patch for trunk.
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vincent Slot updated NUTCH-961:
Attachment: NUTCH-961-1.11-1.patch
Modified the NUTCH-961 patch for 1.11
> Expose Tika's boilerpipe su
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Kingson updated NUTCH-961:
Attachment: nutch-2.x-boilerpipe.patch
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: 1.8
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961-1.8-1.patch
Updated patch for trunk. Estimator code has been removed. Parser s
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-961:
Fix Version/s: 2.2
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roland updated NUTCH-961:
Attachment: NUTCH-961-2.1-v2.patch
- now with working config options
- cleanup (removed unused useBoilerpipeEstima
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roland updated NUTCH-961:
Attachment: NUTCH-961-2.1-v1.patch
Status:
- ported
- compiles
- yields same results as stock 2.1 if disabled (tik
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: (was: 1.5)
1.6
20120304-push-1.6
> Expose Tik
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961-1.5-1.patch
Here's a working patch we use in production. This includes a nasty
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: (was: 1.4)
(was: 2.0)
1.5
Assig
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961-1.4-dombuilder-1.patch
With BP enabled you can get a java.util.EmptyStackExce
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Patch Info: [Patch Available]
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961-1.3-3.patch
Patch to include markup from Tika. Anchors are now detected but l
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Fix Version/s: 1.4
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriele Kahlout updated NUTCH-961:
Attachment: NUTCH-961v2.patch
cleaned up patch.
To reproduce:
{code}
export NUTCH_HOME=`pwd`"
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriele Kahlout updated NUTCH-961:
Attachment: (was: NUTCH-961v2.patch)
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriele Kahlout updated NUTCH-961:
Attachment: NUTCH-961v2.patch
Tested the patch against a checkout of 1.3 branch at revision 11
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriele Kahlout updated NUTCH-961:
Attachment: (was: NUTCH-961-1.3-tikaparser1.patch)
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriele Kahlout updated NUTCH-961:
Attachment: NUTCH-961-1.3-tikaparser1.patch
Modified to include necessary changes to parse-plu
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabriele Kahlout updated NUTCH-961:
Attachment: NUTCH-961-1.3-tikaparser1.patch
Same as NUTCH-961-1.3-tikaparser.patch by Markus b
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: BoilerpipeExtractorRepository.java
Here's the correct file.
> Expose Tika's boilerpipe
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: (was: BoilerpipeExtractorRepository.java)
> Expose Tika's boilerpipe support
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-961:
Attachment: NUTCH-961-1.3-tikaparser.patch
BoilerpipeExtractorRepository.java
Here's
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-961:
Fix Version/s: (was: 1.3)
Tika 0.8 has some issues with PDF parsing, it would be better to use t