[
https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2218.
--
Resolution: Fixed
[~lewismc], This got merged. I added an example to the option you raised as
[
https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152731#comment-15152731
]
Michael Joyce commented on NUTCH-2218:
--
Sorry for any confusion here folks. Changes were merged in
[
https://issues.apache.org/jira/browse/NUTCH-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2218:
-
Issue Type: Improvement (was: Bug)
> Switch CrawlCompletion arg parsing to Commons CLI
>
Michael Joyce created NUTCH-2187:
Summary: Change FileDumper SHAs to all uppercase
Key: NUTCH-2187
URL: https://issues.apache.org/jira/browse/NUTCH-2187
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2187.
--
Resolution: Duplicate
Going to just resolve this in NUTCH-2182. Thought that patch had already
[
https://issues.apache.org/jira/browse/NUTCH-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2182.
--
Resolution: Fixed
Resolved in r1720466
> Make reverseUrlDirs file dumper option hash the URL
[
https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048889#comment-15048889
]
Michael Joyce commented on NUTCH-2180:
--
Thanks for the patch [~hmanjuna], will scope shortly
>
[
https://issues.apache.org/jira/browse/NUTCH-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce reassigned NUTCH-2180:
Assignee: Michael Joyce
> FileDumper dumps data, but breaks midway on corrupt segments
>
[
https://issues.apache.org/jira/browse/NUTCH-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2182:
-
Attachment: NUTCH-2182_joyce_8Dec2015.patch
Patch Attached
> Make reverseUrlDirs file dumper
Michael Joyce created NUTCH-2182:
Summary: Make reverseUrlDirs file dumper option hash the URL for
consistency
Key: NUTCH-2182
URL: https://issues.apache.org/jira/browse/NUTCH-2182
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023388#comment-15023388
]
Michael Joyce commented on NUTCH-2158:
--
+1 on this. Looks good to me
> Upgrade to Tika 1.11
>
Michael Joyce created NUTCH-2173:
Summary: String.join in FileDumper breaks the build
Key: NUTCH-2173
URL: https://issues.apache.org/jira/browse/NUTCH-2173
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2173 started by Michael Joyce.
> String.join in FileDumper breaks the build
>
[
https://issues.apache.org/jira/browse/NUTCH-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2173.
--
Resolution: Fixed
Resolved in r1715046
> String.join in FileDumper breaks the build
>
[
https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2166.
--
Resolution: Fixed
Committed in r1714908
> Add reverse URL format to dump tool
>
[
https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004191#comment-15004191
]
Michael Joyce commented on NUTCH-2166:
--
Output from a small example run. I don't know that I'm
[
https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002328#comment-15002328
]
Michael Joyce commented on NUTCH-2166:
--
Small change in dump format. Instead of making a bajillion
[
https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2167.
--
Resolution: Fixed
TableUtil copied over in r1714078 and tests copied over in r1714079
>
[
https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002604#comment-15002604
]
Michael Joyce commented on NUTCH-2165:
--
Thanks [~lewismc], I'll merge shortly
> FileDumper Util hard
[
https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2165.
--
Resolution: Fixed
Committed in r1714104
> FileDumper Util hard codes part-# folder name
>
[
https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2155.
--
Resolution: Fixed
Latest patch committed in r1713885
> Create a "crawl completeness" utility
>
[
https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-2150.
--
Resolution: Fixed
Resolved in r1713892
> Add ProtocolStatus Utility
>
[
https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000841#comment-15000841
]
Michael Joyce commented on NUTCH-2167:
--
Hi folks,
All looks good and tests run fine after moving
[
https://issues.apache.org/jira/browse/NUTCH-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2167 started by Michael Joyce.
> Backport TableUtil from 2.x for URL reversing
>
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-1911 started by Michael Joyce.
> Improve DomainStatistics tool command line parsing
>
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce resolved NUTCH-1911.
--
Resolution: Fixed
Resolved in r1713890
> Improve DomainStatistics tool command line parsing
>
[
https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2150 started by Michael Joyce.
> Add ProtocolStatus Utility
> --
>
>
[
https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2155 started by Michael Joyce.
> Create a "crawl completeness" utility
> -
Michael Joyce created NUTCH-2166:
Summary: Add reverse URL format to dump tool
Key: NUTCH-2166
URL: https://issues.apache.org/jira/browse/NUTCH-2166
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2166 started by Michael Joyce.
> Add reverse URL format to dump tool
> ---
>
>
Michael Joyce created NUTCH-2165:
Summary: FileDumper Util hard codes part-# folder name
Key: NUTCH-2165
URL: https://issues.apache.org/jira/browse/NUTCH-2165
Project: Nutch
Issue Type: Bug
Michael Joyce created NUTCH-2167:
Summary: Backport TableUtil from 2.x for URL reversing
Key: NUTCH-2167
URL: https://issues.apache.org/jira/browse/NUTCH-2167
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-2165 started by Michael Joyce.
> FileDumper Util hard codes part-# folder name
>
[
https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce reassigned NUTCH-2165:
Assignee: Michael Joyce
> FileDumper Util hard codes part-# folder name
>
[
https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000910#comment-15000910
]
Michael Joyce commented on NUTCH-2165:
--
Oh aye
> FileDumper Util hard codes part-# folder name
>
[
https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2165:
-
Attachment: NUTCH-2165_joyce_11Nov2015.patch
Patch attached
> FileDumper Util hard codes part-#
[
https://issues.apache.org/jira/browse/NUTCH-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000923#comment-15000923
]
Michael Joyce commented on NUTCH-2165:
--
Note, the diff looks massive here. This is really just adding
[
https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce reassigned NUTCH-2150:
Assignee: Michael Joyce (was: Chris A. Mattmann)
> Add ProtocolStatus Utility
>
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce reassigned NUTCH-1911:
Assignee: Michael Joyce (was: Chris A. Mattmann)
> Improve DomainStatistics tool command
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1911:
-
Summary: Improve DomainStatistics tool command line parsing (was: Imeprove
DomainStatistics tool
[
https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce reassigned NUTCH-2155:
Assignee: Michael Joyce (was: Chris A. Mattmann)
> Create a "crawl completeness" utility
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1911:
-
Fix Version/s: 1.10
> Improve DomainStatistics tool command line parsing
>
[
https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2150:
-
Attachment: NUTCH-2015_joyce_9Nov2015.patch
Patch attached to clean up help formatting and drop
[
https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2155:
-
Attachment: NUTCH-2155_joyce_9Nov2015.patch
Patch attached to address "current" requirements in
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1911:
-
Attachment: NUTCH-1911_joyce_9Nov2015.patch
Attach more recent patch to include removal of
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1911:
-
Fix Version/s: (was: 1.10)
1.11
> Improve DomainStatistics tool command
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1911:
-
Attachment: NUTCH-1911_joyce_9Nov2015.patch
Going to resubmit the attached patch to get these
[
https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985431#comment-14985431
]
Michael Joyce commented on NUTCH-2155:
--
+1 sounds good to me [~sebastien0], I will update it in a
[
https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985427#comment-14985427
]
Michael Joyce commented on NUTCH-2150:
--
Yes, will address in a patch shortly.
> Add ProtocolStatus
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985436#comment-14985436
]
Michael Joyce commented on NUTCH-1911:
--
Hrm odd, I want to throw some commons-cli at a few of the
Michael Joyce created NUTCH-2155:
Summary: Create a "crawl completeness" utility
Key: NUTCH-2155
URL: https://issues.apache.org/jira/browse/NUTCH-2155
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979196#comment-14979196
]
Michael Joyce commented on NUTCH-2155:
--
Should have a first patch up shortly for review folks
>
Michael Joyce created NUTCH-2150:
Summary: Add ProtocolStatus Utility
Key: NUTCH-2150
URL: https://issues.apache.org/jira/browse/NUTCH-2150
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977036#comment-14977036
]
Michael Joyce commented on NUTCH-2150:
--
Hi folks,
PR is up for this. You can run the util with
[
https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959659#comment-14959659
]
Michael Joyce commented on NUTCH-2141:
--
Cool makes sense. Do you have any examples? I'd like to poke
[
https://issues.apache.org/jira/browse/NUTCH-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959345#comment-14959345
]
Michael Joyce commented on NUTCH-2141:
--
This was actually brought up in NUTCH-2108. There's also an
[
https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947002#comment-14947002
]
Michael Joyce commented on NUTCH-2129:
--
Fixed the unnecessary init that [~jnioche] caught. Thanks
Michael Joyce created NUTCH-2133:
Summary: Transfer Selenium Documentation to Wiki
Key: NUTCH-2133
URL: https://issues.apache.org/jira/browse/NUTCH-2133
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14945766#comment-14945766
]
Michael Joyce commented on NUTCH-2129:
--
Hey folks, updated PR with the metadata approach for HTTP and
[
https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939939#comment-14939939
]
Michael Joyce commented on NUTCH-2129:
--
Thanks Julien. I figured there would probably be a few
[
https://issues.apache.org/jira/browse/NUTCH-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940036#comment-14940036
]
Michael Joyce commented on NUTCH-2108:
--
Good stuff [~asitang], glad to see the workaround proved
Michael Joyce created NUTCH-2129:
Summary: Track Protocol Status in Crawl Datum
Key: NUTCH-2129
URL: https://issues.apache.org/jira/browse/NUTCH-2129
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939124#comment-14939124
]
Michael Joyce commented on NUTCH-2129:
--
Hi folks,
Initial pull request up to address this. Note that
Michael Joyce created NUTCH-2115:
Summary: Add total counts to dump stats
Key: NUTCH-2115
URL: https://issues.apache.org/jira/browse/NUTCH-2115
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14905156#comment-14905156
]
Michael Joyce commented on NUTCH-2115:
--
Cheers [~lewismc], thanks for the quick merge!
> Add total
[
https://issues.apache.org/jira/browse/NUTCH-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720279#comment-14720279
]
Michael Joyce commented on NUTCH-2077:
--
Hey folks, updated tika to 1.10. If there was
Michael Joyce created NUTCH-2088:
Summary: Add Optional Execution to Interactive Selenium Handlers
Key: NUTCH-2088
URL: https://issues.apache.org/jira/browse/NUTCH-2088
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703153#comment-14703153
]
Michael Joyce commented on NUTCH-2082:
--
FYI, this is a duplicate of NUTCH-2077 I
[
https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701707#comment-14701707
]
Michael Joyce commented on NUTCH-2049:
--
Great stuff Lewis. Builds and runs cleanly
[
https://issues.apache.org/jira/browse/NUTCH-2049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694210#comment-14694210
]
Michael Joyce commented on NUTCH-2049:
--
Hey [~lewismc],
Tried your patch here. Seems
[
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646423#comment-14646423
]
Michael Joyce commented on NUTCH-2062:
--
Hi folks,
Is there something I need to do to
[
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646488#comment-14646488
]
Michael Joyce edited comment on NUTCH-2062 at 7/29/15 5:50 PM:
[
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646488#comment-14646488
]
Michael Joyce commented on NUTCH-2062:
--
Cheers Chris, responded on the PR.
Also,
[
https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2048:
-
Attachment: NUTCH-2048_Joyce_20150727.patch
Updated the patch to set the sync attribute on
[
https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641210#comment-14641210
]
Michael Joyce commented on NUTCH-1936:
--
Ah this is absolutely awesome Lewis. Great
[
https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639462#comment-14639462
]
Michael Joyce commented on NUTCH-2048:
--
Alright, hopefully this one is a bit more on
[
https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2048:
-
Attachment: NUTCH-2048_Joyce_20150723_2.patch
Patch #2 up. Explanation to follow shortly
[
https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639396#comment-14639396
]
Michael Joyce commented on NUTCH-2048:
--
Ah I clearly didn't pay enough attention to
[
https://issues.apache.org/jira/browse/NUTCH-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2048:
-
Attachment: NUTCH-2048_Joyce_20150723.patch
Quick patch up for this.
parse-tika: fix
[
https://issues.apache.org/jira/browse/NUTCH-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2063:
-
Labels: memex (was: )
Add -mimeStats flag to FileDumper tool
[
https://issues.apache.org/jira/browse/NUTCH-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2004:
-
Labels: memex (was: )
ParseChecker does not handle redirects
[
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636958#comment-14636958
]
Michael Joyce commented on NUTCH-2062:
--
Cheers [~lewismc], let me see what I can do
[
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635389#comment-14635389
]
Michael Joyce commented on NUTCH-2062:
--
Hi folks,
Just wanted to elaborate a bit on
[
https://issues.apache.org/jira/browse/NUTCH-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635706#comment-14635706
]
Michael Joyce commented on NUTCH-2063:
--
Hey [~lewismc], threw a patch up for this.
[
https://issues.apache.org/jira/browse/NUTCH-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-2063:
-
Attachment: nutch-2063-joyce-21July2015.patch
Add -mimeStats flag to FileDumper tool
Michael Joyce created NUTCH-2062:
Summary: Add Plugin for interacting with Selenium WebDriver
Key: NUTCH-2062
URL: https://issues.apache.org/jira/browse/NUTCH-2062
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633731#comment-14633731
]
Michael Joyce commented on NUTCH-2062:
--
Hi folks,
I have a work-in-progress locally
[
https://issues.apache.org/jira/browse/NUTCH-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599958#comment-14599958
]
Michael Joyce commented on NUTCH-1504:
--
This is great stuff [~lewismc], we definitely
[
https://issues.apache.org/jira/browse/NUTCH-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596832#comment-14596832
]
Michael Joyce commented on NUTCH-2045:
--
+1 this is great
index-basic incorrect
Michael Joyce created NUTCH-2004:
Summary: ParseChecker does not handle redirects
Key: NUTCH-2004
URL: https://issues.apache.org/jira/browse/NUTCH-2004
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520028#comment-14520028
]
Michael Joyce commented on NUTCH-2004:
--
Hi folks, will try to get a patch thrown up
[
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503746#comment-14503746
]
Michael Joyce commented on NUTCH-1934:
--
Hey [~lewismc],
Patch applied clean to
[
https://issues.apache.org/jira/browse/NUTCH-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503727#comment-14503727
]
Michael Joyce commented on NUTCH-1934:
--
One sec Lewis and I'll take a quick scope.
[
https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503446#comment-14503446
]
Michael Joyce commented on NUTCH-1987:
--
Hi folks, PR has been updated with the
[
https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14501674#comment-14501674
]
Michael Joyce commented on NUTCH-1987:
--
Hey Chris,
Will do. I'll try to take a poke
[
https://issues.apache.org/jira/browse/NUTCH-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1986:
-
Labels: memex (was: )
Clarify Elastic Search Indexer Plugin Settings
[
https://issues.apache.org/jira/browse/NUTCH-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498689#comment-14498689
]
Michael Joyce commented on NUTCH-1911:
--
Hey folks,
Here's what the output from this
[
https://issues.apache.org/jira/browse/NUTCH-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1988:
-
Labels: memex (was: )
Make nested output directory dump optional
[
https://issues.apache.org/jira/browse/NUTCH-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498573#comment-14498573
]
Michael Joyce commented on NUTCH-1906:
--
Hi folks,
I'll throw a patch up shortly for
[
https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Joyce updated NUTCH-1987:
-
Labels: memex (was: )
Make bin/crawl indexer agnostic
---