[
https://issues.apache.org/jira/browse/NUTCH-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836191#comment-17836191
]
Tim Allison commented on NUTCH-3040:
:cry-sob: This is great news!
> Upg
[
https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834532#comment-17834532
]
Tim Allison commented on NUTCH-2937:
I really, really, really wish we didn'
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827510#comment-17827510
]
Tim Allison edited comment on NUTCH-3026 at 3/15/24 2:1
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-3026.
Resolution: Won't Fix
Lost support for working on this issue.
> Allow statusOnly op
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825440#comment-17825440
]
Tim Allison commented on NUTCH-3026:
I should close out the PR and this issue.
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794972#comment-17794972
]
Tim Allison commented on NUTCH-3026:
Anyone have any time for feedback, even if
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787372#comment-17787372
]
Tim Allison commented on NUTCH-3026:
The above PR is a WIP for discussion. Le
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3026:
---
Description:
This issue follows on from discussion here:
https://lists.apache.org/thread
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3026:
---
Description:
This issue follows on from discussion here:
https://lists.apache.org/thread
[
https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3026:
---
Issue Type: New Feature (was: Task)
> Allow statusOnly option for Indexing
Tim Allison created NUTCH-3026:
--
Summary: Allow statusOnly option for IndexingJob
Key: NUTCH-3026
URL: https://issues.apache.org/jira/browse/NUTCH-3026
Project: Nutch
Issue Type: Task
[
https://issues.apache.org/jira/browse/NUTCH-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-3020.
Fix Version/s: 1.20
Resolution: Fixed
> ParseSegment should check for protocol's f
[
https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783352#comment-17783352
]
Tim Allison commented on NUTCH-3019:
{noformat}
[junit] Tests run: 7, Fail
[
https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-3019.
Fix Version/s: 1.20
Resolution: Fixed
> Upgrade to Apache Tika 2.
[
https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783254#comment-17783254
]
Tim Allison edited comment on NUTCH-3019 at 11/6/23 3:4
[
https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783252#comment-17783252
]
Tim Allison edited comment on NUTCH-3019 at 11/6/23 3:3
[
https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783252#comment-17783252
]
Tim Allison commented on NUTCH-3019:
ParserStatus
failed=84
suc
Tim Allison created NUTCH-3021:
--
Summary: Improve http-protocol to identify truncated content
Key: NUTCH-3021
URL: https://issues.apache.org/jira/browse/NUTCH-3021
Project: Nutch
Issue Type
Tim Allison created NUTCH-3020:
--
Summary: ParseSegment should check for protocol's flags for
truncation
Key: NUTCH-3020
URL: https://issues.apache.org/jira/browse/NUTCH-3020
Project:
[
https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3018:
---
Description:
It looks like it takes between 2x and 4x of the time to initialize the remote
[
https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781485#comment-17781485
]
Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:5
[
https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781485#comment-17781485
]
Tim Allison commented on NUTCH-3018:
On further reflection, what the above mean
[
https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781483#comment-17781483
]
Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:4
[
https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781483#comment-17781483
]
Tim Allison commented on NUTCH-3018:
It looks like we cannot create more web dri
[
https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3018:
---
Description:
It looks like it takes between 2x and 4x of the time to initialize the remote
[
https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781482#comment-17781482
]
Tim Allison commented on NUTCH-3019:
Separately, I noticed that logging from
Tim Allison created NUTCH-3019:
--
Summary: Upgrade to Apache Tika 2.9.1
Key: NUTCH-3019
URL: https://issues.apache.org/jira/browse/NUTCH-3019
Project: Nutch
Issue Type: Task
Reporter
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2959.
Resolution: Fixed
> Upgrade to Apache Tika 2.
Tim Allison created NUTCH-3018:
--
Summary: Consider pooling remote webdrivers for Selenium?
Key: NUTCH-3018
URL: https://issues.apache.org/jira/browse/NUTCH-3018
Project: Nutch
Issue Type: Task
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771476#comment-17771476
]
Tim Allison commented on NUTCH-2959:
If you and the Nutch team are ok with the
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771170#comment-17771170
]
Tim Allison edited comment on NUTCH-2959 at 10/2/23 3:5
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771170#comment-17771170
]
Tim Allison commented on NUTCH-2959:
I've continued to stub my toes on
Sorry for two emails...
Migrating javax->jakarta has been quite a chore on Tika because of
dependencies. Given back-compat issues with hadoop, is this even on the
horizon for Nutch?
On Thu, Sep 28, 2023 at 9:29 AM Tim Allison wrote:
> Y, I'd like to get a working Tika version i
Y, I'd like to get a working Tika version in a release fairly soon. Not
sure how much effort a release is?
On Thu, Sep 28, 2023 at 8:29 AM Sebastian Nagel wrote:
> Hi Lewis,
>
> thanks!
>
> I'd put on top of the list
>
> * release 1.20
>
> Since the release of 1.19 more than one year has elapse
[
https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770059#comment-17770059
]
Tim Allison commented on NUTCH-3006:
An alternative approach would be for Tik
Tim Allison created NUTCH-3005:
--
Summary: Upgrade selenium as needed
Key: NUTCH-3005
URL: https://issues.apache.org/jira/browse/NUTCH-3005
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-3004.
Resolution: Fixed
> Avoid NPE in HttpResponse
Tim Allison created NUTCH-3004:
--
Summary: Avoid NPE in HttpResponse
Key: NUTCH-3004
URL: https://issues.apache.org/jira/browse/NUTCH-3004
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766832#comment-17766832
]
Tim Allison commented on NUTCH-2937:
As [~snagel] pointed out on the PR for N
Tim Allison created NUTCH-3003:
--
Summary: Consider integration testing in a Dockerized mini-hadoop
cluster via testcontainers?
Key: NUTCH-3003
URL: https://issues.apache.org/jira/browse/NUTCH-3003
[
https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2978.
Fix Version/s: 1.20
Resolution: Fixed
Many thanks [~markus17] for all of the work on this
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-2959:
---
Summary: Upgrade to Apache Tika 2.9.0 (was: Upgrade to Apache Tika 2.4.1)
> Upgrade to Apache T
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765306#comment-17765306
]
Tim Allison commented on NUTCH-2959:
Currently working on this to bump to Tika 2
[
https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2998.
Fix Version/s: 1.20
Resolution: Fixed
> Remove the Any23 plu
[
https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-3000.
Fix Version/s: 1.20
Resolution: Fixed
> protocol-selenium returns only the body,strips
[
https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-3001.
Fix Version/s: 1.20
Resolution: Fixed
> protocol-selenium requires Content-Type hea
[
https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764741#comment-17764741
]
Tim Allison commented on NUTCH-2998:
Sorry, I botched the title in the PR: h
All,
I opened https://issues.apache.org/jira/browse/NUTCH-2998 a few weeks
ago. Any23 was moved to the attic in June. Unless there are objections, I
propose removing it from Nutch before the next release.
Any objections?
Best,
Tim
[
https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764705#comment-17764705
]
Tim Allison commented on NUTCH-2978:
I haven't tested in hadoop. I'
[
https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3001:
---
Description:
It looks like the selenium protocol requires that there be a content-type
header
[
https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3001:
---
Priority: Minor (was: Major)
> protocol-selenium requires Content-Type hea
[
https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764698#comment-17764698
]
Tim Allison commented on NUTCH-3001:
Or is the notion that if the selenium prot
[
https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3001:
---
Description:
It looks like the selenium protocol requires that there be a content-type
header
[
https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-3001:
---
Description:
It looks like the selenium protocol requires that there be content-type.
The logic
Tim Allison created NUTCH-3001:
--
Summary: protocol-selenium requires Content-Type header
Key: NUTCH-3001
URL: https://issues.apache.org/jira/browse/NUTCH-3001
Project: Nutch
Issue Type: Bug
Tim Allison created NUTCH-3000:
--
Summary: protocol-selenium returns only the body,strips off the
element
Key: NUTCH-3000
URL: https://issues.apache.org/jira/browse/NUTCH-3000
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764376#comment-17764376
]
Tim Allison commented on NUTCH-2998:
I don't want to make such a drast
[
https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760926#comment-17760926
]
Tim Allison commented on NUTCH-2978:
K, I think https://github.com/apache/nutch/
[
https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2999.
Resolution: Fixed
Updated PR should have fixed that issue. Would be nice to add testcontainers
[
https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reopened NUTCH-2999:
The applied PR breaks the lucene-based indexers.
> Update Lucene version to latest
[
https://issues.apache.org/jira/browse/NUTCH-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2961.
Resolution: Fixed
I confirmed we can simply remove those dependencies. I fixed this as part of
[
https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2999.
Fix Version/s: 1.20
Resolution: Fixed
Thank you [~markus17] for the review!
> Upd
[
https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760512#comment-17760512
]
Tim Allison commented on NUTCH-2999:
This PR also takes care of NUTCH-2961
>
[
https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760511#comment-17760511
]
Tim Allison commented on NUTCH-2999:
https://github.com/apache/nutch/pull
Tim Allison created NUTCH-2999:
--
Summary: Update Lucene version to latest 8.x
Key: NUTCH-2999
URL: https://issues.apache.org/jira/browse/NUTCH-2999
Project: Nutch
Issue Type: Task
[
https://issues.apache.org/jira/browse/NUTCH-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760508#comment-17760508
]
Tim Allison commented on NUTCH-2961:
It looks like neither mahout nor lucene
Tim Allison created NUTCH-2998:
--
Summary: Remove the Any23 plugin
Key: NUTCH-2998
URL: https://issues.apache.org/jira/browse/NUTCH-2998
Project: Nutch
Issue Type: Task
Components
[
https://issues.apache.org/jira/browse/NUTCH-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2989.
Resolution: Fixed
Fellow Nutch devs, please let me know if I botched any of our processes in
[
https://issues.apache.org/jira/browse/NUTCH-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reassigned NUTCH-2989:
--
Assignee: Tim Allison
> Can't have username/pw AND https on elastic
Thank you, all! I’m thrilled to join the team!
On Thu, Jul 20, 2023 at 9:42 AM Julien Nioche wrote:
> What a fantastic addition to the Nutch team! Congrats to Tim
>
> On Thu, 20 Jul 2023 at 10:20, Sebastian Nagel wrote:
>
>> Dear all,
>>
>> It is my pleasure to
[
https://issues.apache.org/jira/browse/NUTCH-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-2994:
---
Description:
Over on NUTCH-2920, we added an indexer for OpenSearch 1.x. We should do this
for 2.x
Tim Allison created NUTCH-2994:
--
Summary: Implement an indexer for OpenSearch 2.x
Key: NUTCH-2994
URL: https://issues.apache.org/jira/browse/NUTCH-2994
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725842#comment-17725842
]
Tim Allison commented on NUTCH-2959:
tika-server would be cleaner? Could
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725807#comment-17725807
]
Tim Allison commented on NUTCH-2959:
Separately, I'm wondering if it would
[
https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725805#comment-17725805
]
Tim Allison commented on NUTCH-2959:
I just opened a PR to upgrade Tika to 2.8.
Tim Allison created NUTCH-2989:
--
Summary: Can't have username/pw AND https on elastic-indexer?!
Key: NUTCH-2989
URL: https://issues.apache.org/jira/browse/NUTCH-2989
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved NUTCH-2988.
Resolution: Duplicate
Duplicate. Sorry!
> Elasticsearch 7.13.2 compatible with ASL
[
https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217
]
Tim Allison edited comment on NUTCH-2927 at 3/1/23 5:2
[
https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217
]
Tim Allison commented on NUTCH-2927:
Over on NUTCH-2920 , I stumbled into
[
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695152#comment-17695152
]
Tim Allison commented on NUTCH-2920:
Current proposal is to go with the high l
[
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695148#comment-17695148
]
Tim Allison commented on NUTCH-2920:
Well, that was a funny notion...
Turns
[
https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695096#comment-17695096
]
Tim Allison commented on NUTCH-2920:
My initial PR was a simple copy+paste wi
[
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694744#comment-17694744
]
Tim Allison commented on NUTCH-2988:
If you open the 7.13.2 jar file, there
[
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-2988:
---
Attachment: LICENSE.txt
> Elasticsearch 7.13.2 compatible with ASL
[
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-2988:
---
Description:
In the latest release of at least the 1.x branch of Nutch, the elasticsearch
high
[
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694739#comment-17694739
]
Tim Allison commented on NUTCH-2988:
Y, k.
https://www.elastic.co/guid
[
https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated NUTCH-2988:
---
Description:
In the latest release of at least the 1.x branch of Nutch, the elasticsearch
high
Tim Allison created NUTCH-2988:
--
Summary: Elasticsearch 7.13.2 compatible with ASL 2.0?
Key: NUTCH-2988
URL: https://issues.apache.org/jira/browse/NUTCH-2988
Project: Nutch
Issue Type: Task
[
https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939516#comment-16939516
]
Tim Allison edited comment on NUTCH-2457 at 9/27/19 2:5
[
https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939516#comment-16939516
]
Tim Allison commented on NUTCH-2457:
W00t! Default is to parse embedded, right
[
https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939478#comment-16939478
]
Tim Allison commented on NUTCH-2457:
The issue is that the AutoDetectPa
[
https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939473#comment-16939473
]
Tim Allison commented on NUTCH-2457:
Let me take a look at the code again...it
[
https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542898#comment-16542898
]
Tim Allison commented on NUTCH-2586:
Is this better handled at the Tika level.
[
https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482879#comment-16482879
]
Tim Allison edited comment on NUTCH-2578 at 5/21/18 6:3
[
https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482879#comment-16482879
]
Tim Allison edited comment on NUTCH-2578 at 5/21/18 6:3
[
https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482879#comment-16482879
]
Tim Allison edited comment on NUTCH-2578 at 5/21/18 6:3
[
https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482879#comment-16482879
]
Tim Allison commented on NUTCH-2578:
Based on [~wastl-nagel]'s observation,
[
https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1622#comment-1622
]
Tim Allison commented on NUTCH-2457:
Before Tika 1.15 (I think...might have been
[
https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1620#comment-1620
]
Tim Allison edited comment on NUTCH-2457 at 11/16/17 4:2
[
https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1620#comment-1620
]
Tim Allison commented on NUTCH-2457:
So, in lieu of a PR...please, please, please