[jira] [Commented] (NUTCH-3040) Upgrade to Hadoop 3.4.0

2024-04-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836191#comment-17836191 ] Tim Allison commented on NUTCH-3040: :cry-sob: This is great news! > Upg

[jira] [Commented] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2024-04-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17834532#comment-17834532 ] Tim Allison commented on NUTCH-2937: I really, really, really wish we didn'

[jira] [Comment Edited] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827510#comment-17827510 ] Tim Allison edited comment on NUTCH-3026 at 3/15/24 2:1

[jira] [Resolved] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3026. Resolution: Won't Fix Lost support for working on this issue. > Allow statusOnly op

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2024-03-11 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825440#comment-17825440 ] Tim Allison commented on NUTCH-3026: I should close out the PR and this issue.

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-12-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794972#comment-17794972 ] Tim Allison commented on NUTCH-3026: Anyone have any time for feedback, even if

[jira] [Commented] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787372#comment-17787372 ] Tim Allison commented on NUTCH-3026: The above PR is a WIP for discussion. Le

[jira] [Updated] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3026: --- Description: This issue follows on from discussion here: https://lists.apache.org/thread

[jira] [Updated] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3026: --- Issue Type: New Feature (was: Task) > Allow statusOnly option for Indexing

[jira] [Created] (NUTCH-3026) Allow statusOnly option for IndexingJob

2023-11-17 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3026: -- Summary: Allow statusOnly option for IndexingJob Key: NUTCH-3026 URL: https://issues.apache.org/jira/browse/NUTCH-3026 Project: Nutch Issue Type: Task

[jira] [Resolved] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3020. Fix Version/s: 1.20 Resolution: Fixed > ParseSegment should check for protocol's f

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783352#comment-17783352 ] Tim Allison commented on NUTCH-3019: {noformat} [junit] Tests run: 7, Fail

[jira] [Resolved] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3019. Fix Version/s: 1.20 Resolution: Fixed > Upgrade to Apache Tika 2.

[jira] [Comment Edited] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783254#comment-17783254 ] Tim Allison edited comment on NUTCH-3019 at 11/6/23 3:4

[jira] [Comment Edited] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783252#comment-17783252 ] Tim Allison edited comment on NUTCH-3019 at 11/6/23 3:3

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-11-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783252#comment-17783252 ] Tim Allison commented on NUTCH-3019: ParserStatus         failed=84         suc

[jira] [Created] (NUTCH-3021) Improve http-protocol to identify truncated content

2023-11-01 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3021: -- Summary: Improve http-protocol to identify truncated content Key: NUTCH-3021 URL: https://issues.apache.org/jira/browse/NUTCH-3021 Project: Nutch Issue Type

[jira] [Created] (NUTCH-3020) ParseSegment should check for protocol's flags for truncation

2023-11-01 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3020: -- Summary: ParseSegment should check for protocol's flags for truncation Key: NUTCH-3020 URL: https://issues.apache.org/jira/browse/NUTCH-3020 Project:
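NUTCH-3020 proposes that ParseSegment honor truncation flags set by the protocol layer. Below is a minimal sketch of what such a check might look like. It assumes content metadata is available as a plain `Map<String, String>`. The class `TruncationCheck` and the key `x-truncated` are hypothetical stand-ins, not Nutch's actual API or metadata names.

```java
import java.util.Map;

// Hypothetical truncation check; the metadata keys are illustrative, not Nutch constants.
final class TruncationCheck {
    static final String CONTENT_LENGTH = "Content-Length";
    static final String PROTOCOL_TRUNCATED_FLAG = "x-truncated"; // hypothetical key

    private TruncationCheck() {}

    static boolean isTruncated(Map<String, String> metadata, byte[] body) {
        // 1) Trust an explicit flag if the protocol layer set one.
        //    Boolean.parseBoolean(null) is false, so a missing key is harmless.
        if (Boolean.parseBoolean(metadata.get(PROTOCOL_TRUNCATED_FLAG))) {
            return true;
        }
        // 2) Otherwise compare the declared length with the bytes actually received.
        String declared = metadata.get(CONTENT_LENGTH);
        if (declared == null) {
            return false; // nothing to compare against
        }
        try {
            long expected = Long.parseLong(declared.trim());
            return body.length < expected;
        } catch (NumberFormatException e) {
            return false; // malformed header: cannot tell either way
        }
    }
}
```

The length comparison on its own is fragile. If the protocol decodes a compressed response, the declared Content-Length describes the encoded body rather than the stored bytes. That is one reason an explicit flag from the protocol is attractive.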

[jira] [Updated] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3018: --- Description: It looks like it takes between 2x and 4x of the time to initialize the remote
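The NUTCH-3018 description observes that initializing the remote webdriver is comparatively expensive. That motivates reusing drivers instead of creating one per fetch. Below is a minimal bounded-pool sketch. It assumes a generic `Supplier<T>` factory stands in for whatever constructs the remote driver. `ResourcePool` is a hypothetical name, and no Selenium API is used or implied.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical bounded pool: at most `capacity` instances are ever created,
// and released instances are reused instead of being torn down per fetch.
class ResourcePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int capacity;
    private int created = 0; // guarded by `this`

    ResourcePool(int capacity, Supplier<T> factory) {
        this.capacity = capacity;
        this.factory = factory;
        this.idle = new ArrayBlockingQueue<>(capacity);
    }

    // Reuse an idle instance if one exists; otherwise create one while under
    // capacity; otherwise block until another caller releases an instance.
    T acquire() throws InterruptedException {
        T item = idle.poll();
        if (item != null) {
            return item;
        }
        boolean mayCreate;
        synchronized (this) {
            mayCreate = created < capacity;
            if (mayCreate) {
                created++; // reserve a slot before the (slow) factory call
            }
        }
        if (mayCreate) {
            try {
                return factory.get();
            } catch (RuntimeException e) {
                synchronized (this) {
                    created--; // give the slot back if creation failed
                }
                throw e;
            }
        }
        return idle.take();
    }

    // Never more than `capacity` instances exist, so offer() always succeeds.
    void release(T item) {
        idle.offer(item);
    }

    synchronized int createdCount() {
        return created;
    }
}
```

The slot is reserved before calling the factory, so slow driver startup does not hold the lock. The reservation is rolled back if creation fails. A real pool would also need to detect and discard drivers whose remote session has died.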

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781485#comment-17781485 ] Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:5

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781485#comment-17781485 ] Tim Allison commented on NUTCH-3018: On further reflection, what the above mean

[jira] [Comment Edited] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781483#comment-17781483 ] Tim Allison edited comment on NUTCH-3018 at 10/31/23 6:4

[jira] [Commented] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781483#comment-17781483 ] Tim Allison commented on NUTCH-3018: It looks like we cannot create more web dri

[jira] [Commented] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781482#comment-17781482 ] Tim Allison commented on NUTCH-3019: Separately, I noticed that logging from

[jira] [Created] (NUTCH-3019) Upgrade to Apache Tika 2.9.1

2023-10-31 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3019: -- Summary: Upgrade to Apache Tika 2.9.1 Key: NUTCH-3019 URL: https://issues.apache.org/jira/browse/NUTCH-3019 Project: Nutch Issue Type: Task Reporter

[jira] [Resolved] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2959. Resolution: Fixed > Upgrade to Apache Tika 2.

[jira] [Created] (NUTCH-3018) Consider pooling remote webdrivers for Selenium?

2023-10-31 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3018: -- Summary: Consider pooling remote webdrivers for Selenium? Key: NUTCH-3018 URL: https://issues.apache.org/jira/browse/NUTCH-3018 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771476#comment-17771476 ] Tim Allison commented on NUTCH-2959: If you and the Nutch team are ok with the

[jira] [Comment Edited] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771170#comment-17771170 ] Tim Allison edited comment on NUTCH-2959 at 10/2/23 3:5

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-10-02 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17771170#comment-17771170 ] Tim Allison commented on NUTCH-2959: I've continued to stub my toes on

Re: Establishing a Nutch development roadmap

2023-09-28 Thread Tim Allison
Sorry for two emails... Migrating javax->jakarta has been quite a chore on Tika because of dependencies. Given back-compat issues with hadoop, is this even on the horizon for Nutch? On Thu, Sep 28, 2023 at 9:29 AM Tim Allison wrote: > Y, I'd like to get a working Tika version i

Re: Establishing a Nutch development roadmap

2023-09-28 Thread Tim Allison
Y, I'd like to get a working Tika version in a release fairly soon. Not sure how much effort a release is? On Thu, Sep 28, 2023 at 8:29 AM Sebastian Nagel wrote: > Hi Lewis, > > thanks! > > I'd put on top of the list > > * release 1.20 > > Since the release of 1.19 more than one year has elapse

[jira] [Commented] (NUTCH-3006) Downgrade Tika dependency to 2.2.1 (core and parse-tika)

2023-09-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770059#comment-17770059 ] Tim Allison commented on NUTCH-3006: An alternative approach would be for Tik

[jira] [Created] (NUTCH-3005) Upgrade selenium as needed

2023-09-26 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3005: -- Summary: Upgrade selenium as needed Key: NUTCH-3005 URL: https://issues.apache.org/jira/browse/NUTCH-3005 Project: Nutch Issue Type: Improvement

[jira] [Resolved] (NUTCH-3004) Avoid NPE in HttpResponse

2023-09-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3004. Resolution: Fixed > Avoid NPE in HttpResponse > - > >

[jira] [Created] (NUTCH-3004) Avoid NPE in HttpResponse

2023-09-25 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3004: -- Summary: Avoid NPE in HttpResponse Key: NUTCH-3004 URL: https://issues.apache.org/jira/browse/NUTCH-3004 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2937) parse-tika: review dependency exclusions and avoid dependency conflicts in distributed mode

2023-09-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766832#comment-17766832 ] Tim Allison commented on NUTCH-2937: As [~snagel] pointed out on the PR for N

[jira] [Created] (NUTCH-3003) Consider integration testing in a Dockerized mini-hadoop cluster via testcontainers?

2023-09-19 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3003: -- Summary: Consider integration testing in a Dockerized mini-hadoop cluster via testcontainers? Key: NUTCH-3003 URL: https://issues.apache.org/jira/browse/NUTCH-3003

[jira] [Resolved] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-09-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2978. Fix Version/s: 1.20 Resolution: Fixed Many thanks [~markus17] for all of the work on this

[jira] [Updated] (NUTCH-2959) Upgrade to Apache Tika 2.9.0

2023-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2959: --- Summary: Upgrade to Apache Tika 2.9.0 (was: Upgrade to Apache Tika 2.4.1) > Upgrade to Apache T

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765306#comment-17765306 ] Tim Allison commented on NUTCH-2959: Currently working on this to bump to Tika 2

[jira] [Resolved] (NUTCH-2998) Remove the Any23 plugin

2023-09-14 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2998. Fix Version/s: 1.20 Resolution: Fixed > Remove the Any23 plu

[jira] [Resolved] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3000. Fix Version/s: 1.20 Resolution: Fixed > protocol-selenium returns only the body,strips

[jira] [Resolved] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-3001. Fix Version/s: 1.20 Resolution: Fixed > protocol-selenium requires Content-Type hea

[jira] [Commented] (NUTCH-2998) Remove the Any23 plugin

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764741#comment-17764741 ] Tim Allison commented on NUTCH-2998: Sorry, I botched the title in the PR: h

[DISCUSS] Removing Any23 from Nutch?

2023-09-13 Thread Tim Allison
All, I opened https://issues.apache.org/jira/browse/NUTCH-2998 a few weeks ago. Any23 was moved to the attic in June. Unless there are objections, I propose removing it from Nutch before the next release. Any objections? Best, Tim

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764705#comment-17764705 ] Tim Allison commented on NUTCH-2978: I haven't tested in hadoop. I'

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be a content-type header
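NUTCH-3001 reports that protocol-selenium requires a Content-Type header. One lenient alternative is to look the header up case-insensitively and sniff the bytes when it is missing. The sketch below does this, assuming response headers are available as a `Map<String, String>`. `ContentTypeResolver` is a hypothetical helper. The sniffing uses the JDK's `URLConnection.guessContentTypeFromStream`, which recognizes only a handful of magic numbers. In Nutch, Tika-based MIME detection would be the more natural fallback.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;
import java.util.Map;

// Hypothetical helper: resolve a content type without assuming the header exists.
final class ContentTypeResolver {
    static final String FALLBACK = "application/octet-stream";

    private ContentTypeResolver() {}

    static String resolve(Map<String, String> headers, byte[] body) {
        // HTTP header names are case-insensitive, so scan instead of get("Content-Type").
        for (Map.Entry<String, String> e : headers.entrySet()) {
            String value = e.getValue();
            if ("content-type".equalsIgnoreCase(e.getKey()) && value != null && !value.isBlank()) {
                return value.trim();
            }
        }
        // No usable header: sniff the leading bytes with the JDK's simple magic-number check.
        try {
            String guessed = URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(body));
            return guessed != null ? guessed : FALLBACK;
        } catch (IOException ex) {
            return FALLBACK;
        }
    }
}
```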

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Priority: Minor (was: Major) > protocol-selenium requires Content-Type hea

[jira] [Commented] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764698#comment-17764698 ] Tim Allison commented on NUTCH-3001: Or is the notion that if the selenium prot

[jira] [Updated] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-3001: --- Description: It looks like the selenium protocol requires that there be content-type. The logic

[jira] [Created] (NUTCH-3001) protocol-selenium requires Content-Type header

2023-09-13 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3001: -- Summary: protocol-selenium requires Content-Type header Key: NUTCH-3001 URL: https://issues.apache.org/jira/browse/NUTCH-3001 Project: Nutch Issue Type: Bug

[jira] [Created] (NUTCH-3000) protocol-selenium returns only the body,strips off the element

2023-09-13 Thread Tim Allison (Jira)
Tim Allison created NUTCH-3000: -- Summary: protocol-selenium returns only the body,strips off the element Key: NUTCH-3000 URL: https://issues.apache.org/jira/browse/NUTCH-3000 Project: Nutch

[jira] [Commented] (NUTCH-2998) Remove the Any23 plugin

2023-09-12 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764376#comment-17764376 ] Tim Allison commented on NUTCH-2998: I don't want to make such a drast

[jira] [Commented] (NUTCH-2978) Move to slf4j2 and remove log4j1 and reload4j

2023-08-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760926#comment-17760926 ] Tim Allison commented on NUTCH-2978: K, I think https://github.com/apache/nutch/

[jira] [Resolved] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2999. Resolution: Fixed Updated PR should have fixed that issue. Would be nice to add testcontainers

[jira] [Reopened] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened NUTCH-2999: The applied PR breaks the lucene-based indexers. > Update Lucene version to latest

[jira] [Resolved] (NUTCH-2961) Upgrade dependencies of parsefilter-naivebayes

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2961. Resolution: Fixed I confirmed we can simply remove those dependencies. I fixed this as part of

[jira] [Resolved] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2999. Fix Version/s: 1.20 Resolution: Fixed Thank you [~markus17] for the review! > Upd

[jira] [Commented] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760512#comment-17760512 ] Tim Allison commented on NUTCH-2999: This PR also takes care of NUTCH-2961 >

[jira] [Commented] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760511#comment-17760511 ] Tim Allison commented on NUTCH-2999: https://github.com/apache/nutch/pull

[jira] [Created] (NUTCH-2999) Update Lucene version to latest 8.x

2023-08-30 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2999: -- Summary: Update Lucene version to latest 8.x Key: NUTCH-2999 URL: https://issues.apache.org/jira/browse/NUTCH-2999 Project: Nutch Issue Type: Task

[jira] [Commented] (NUTCH-2961) Upgrade dependencies of parsefilter-naivebayes

2023-08-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760508#comment-17760508 ] Tim Allison commented on NUTCH-2961: It looks like neither mahout nor lucene

[jira] [Created] (NUTCH-2998) Remove the Any23 plugin

2023-08-28 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2998: -- Summary: Remove the Any23 plugin Key: NUTCH-2998 URL: https://issues.apache.org/jira/browse/NUTCH-2998 Project: Nutch Issue Type: Task Components

[jira] [Resolved] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!

2023-08-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2989. Resolution: Fixed Fellow Nutch devs, please let me know if I botched any of our processes in

[jira] [Assigned] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!

2023-08-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned NUTCH-2989: -- Assignee: Tim Allison > Can't have username/pw AND https on elastic

Re: [ANNOUNCE] New Nutch committer and PMC - Tim Allison

2023-07-20 Thread Tim Allison
Thank you, all! I’m thrilled to join the team! On Thu, Jul 20, 2023 at 9:42 AM Julien Nioche wrote: > What a fantastic addition to the Nutch team! Congrats to Tim > > On Thu, 20 Jul 2023 at 10:20, Sebastian Nagel wrote: > >> Dear all, >> >> It is my pleasure to

[jira] [Updated] (NUTCH-2994) Implement an indexer for OpenSearch 2.x

2023-06-08 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2994: --- Description: Over on NUTCH-2920, we added an indexer for OpenSearch 1.x. We should do this for 2.x

[jira] [Created] (NUTCH-2994) Implement an indexer for OpenSearch 2.x

2023-06-08 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2994: -- Summary: Implement an indexer for OpenSearch 2.x Key: NUTCH-2994 URL: https://issues.apache.org/jira/browse/NUTCH-2994 Project: Nutch Issue Type: Improvement

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725842#comment-17725842 ] Tim Allison commented on NUTCH-2959: tika-server would be cleaner?  Could

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725807#comment-17725807 ] Tim Allison commented on NUTCH-2959: Separately, I'm wondering if it would

[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2023-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725805#comment-17725805 ] Tim Allison commented on NUTCH-2959: I just opened a PR to upgrade Tika to 2.8.

[jira] [Created] (NUTCH-2989) Can't have username/pw AND https on elastic-indexer?!

2023-03-01 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2989: -- Summary: Can't have username/pw AND https on elastic-indexer?! Key: NUTCH-2989 URL: https://issues.apache.org/jira/browse/NUTCH-2989 Project: Nutch Issue

[jira] [Resolved] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved NUTCH-2988. Resolution: Duplicate Duplicate. Sorry! > Elasticsearch 7.13.2 compatible with ASL

[jira] [Comment Edited] (NUTCH-2927) indexer-elastic: use Java API client

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217 ] Tim Allison edited comment on NUTCH-2927 at 3/1/23 5:2

[jira] [Commented] (NUTCH-2927) indexer-elastic: use Java API client

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695217#comment-17695217 ] Tim Allison commented on NUTCH-2927: Over on NUTCH-2920 , I stumbled into

[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695152#comment-17695152 ] Tim Allison commented on NUTCH-2920: Current proposal is to go with the high l

[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695148#comment-17695148 ] Tim Allison commented on NUTCH-2920: Well, that was a funny notion... Turns

[jira] [Commented] (NUTCH-2920) Implement a indexer-opensearch plugin

2023-03-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695096#comment-17695096 ] Tim Allison commented on NUTCH-2920: My initial PR was a simple copy+paste wi

[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694744#comment-17694744 ] Tim Allison commented on NUTCH-2988: If you open the 7.13.2 jar file, there'

[jira] [Updated] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2988: --- Attachment: LICENSE.txt > Elasticsearch 7.13.2 compatible with ASL

[jira] [Updated] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated NUTCH-2988: --- Description: In the latest release of at least the 1.x branch of Nutch, the elasticsearch high

[jira] [Commented] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694739#comment-17694739 ] Tim Allison commented on NUTCH-2988: Y, k. https://www.elastic.co/guid

[jira] [Created] (NUTCH-2988) Elasticsearch 7.13.2 compatible with ASL 2.0?

2023-02-28 Thread Tim Allison (Jira)
Tim Allison created NUTCH-2988: -- Summary: Elasticsearch 7.13.2 compatible with ASL 2.0? Key: NUTCH-2988 URL: https://issues.apache.org/jira/browse/NUTCH-2988 Project: Nutch Issue Type: Task

[jira] [Comment Edited] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939516#comment-16939516 ] Tim Allison edited comment on NUTCH-2457 at 9/27/19 2:5

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939516#comment-16939516 ] Tim Allison commented on NUTCH-2457: W00t! Default is to parse embedded, right

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939478#comment-16939478 ] Tim Allison commented on NUTCH-2457: The issue is that the AutoDetectPa

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2019-09-27 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16939473#comment-16939473 ] Tim Allison commented on NUTCH-2457: Let me take a look at the code again...it

[jira] [Commented] (NUTCH-2586) Add a fallback mechanism for missing meta tags

2018-07-13 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542898#comment-16542898 ] Tim Allison commented on NUTCH-2586: Is this better handled at the Tika level.

[jira] [Comment Edited] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content

2018-05-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482879#comment-16482879 ] Tim Allison edited comment on NUTCH-2578 at 5/21/18 6:3

[jira] [Commented] (NUTCH-2578) Avoid lock by MimeUtil in constructor of protocol.Content

2018-05-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482879#comment-16482879 ] Tim Allison commented on NUTCH-2578: Based on [~wastl-nagel]'s observation,

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1622#comment-1622 ] Tim Allison commented on NUTCH-2457: Before Tika 1.15 (I think...might have been

[jira] [Comment Edited] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1620#comment-1620 ] Tim Allison edited comment on NUTCH-2457 at 11/16/17 4:2

[jira] [Commented] (NUTCH-2457) Embedded documents likely not correctly parsed by Tika

2017-11-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1620#comment-1620 ] Tim Allison commented on NUTCH-2457: So, in lieu of a PR...please, please, please
