Welcome!!
-Original message-
From: Sujen Shah
Sent: Wednesday 16th September 2015 0:58
To: dev@nutch.apache.org
Cc: u...@nutch.apache.org
Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Sujen Shah
Hi Everyone,
I would like to thank the members of the Apache Nutch PMC for bringing m
[
https://issues.apache.org/jira/browse/NUTCH-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney reassigned NUTCH-1572:
---
Assignee: Lewis John McGibbney
> Nutch 2.x should use o.a.g.mem.store.MemStor
[
https://issues.apache.org/jira/browse/NUTCH-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1572:
Fix Version/s: (was: 2.4)
2.3.1
> Nutch 2.x should use o.a.g.
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746866#comment-14746866
]
Hudson commented on NUTCH-1679:
---
SUCCESS: Integrated in Nutch-nutchgora #1535 (See
[https:/
Lewis John McGibbney created NUTCH-2101:
---
Summary: Upgrade Nutch 2.X to Hadoop 2.4.0
Key: NUTCH-2101
URL: https://issues.apache.org/jira/browse/NUTCH-2101
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-2009.
-
Resolution: Duplicate
These MongoDB issues have been resolved in Gora 0.6.1 and on
[
https://issues.apache.org/jira/browse/NUTCH-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-2080.
-
Resolution: Invalid
This has to do with ivy/ivy.xml configuration and should be fi
[
https://issues.apache.org/jira/browse/NUTCH-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-2029.
-
Resolution: Fixed
This issue has been resolved as it was fixed over in GORA-423. W
[
https://issues.apache.org/jira/browse/NUTCH-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-1922.
-
Resolution: Duplicate
This issue is a clone of NUTCH-1679 for which I just committ
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved NUTCH-1679.
-
Resolution: Fixed
Committed @revision 1703331 in 2.X HEAD
> UpdateDb using batchI
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1679:
Attachment: NUTCH-1679_4.patch
Patch which sorts out some trivial formatting and als
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746807#comment-14746807
]
Lewis John McGibbney commented on NUTCH-1679:
-
I've tested this with Nutch 2.X
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kim Whitehall closed NUTCH-2100.
Resolution: Invalid
The command was used incorrectly. There is no bug.
> Nutch dump command doesnt
Hi Everyone,
I would like to thank the members of the Apache Nutch PMC for bringing me
on board and giving me the opportunity to become a member and committer.
I am a Graduate student at the University of Southern California, majoring
in Computer Science. I have been working with Chris Mattmann an
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746210#comment-14746210
]
Kim Whitehall commented on NUTCH-2100:
--
LOL! how dumb of me! yeap, it works. Of all t
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746141#comment-14746141
]
Chris A. Mattmann commented on NUTCH-2100:
--
Kim I think that the directory expect
[
https://issues.apache.org/jira/browse/NUTCH-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann reassigned NUTCH-2100:
Assignee: Chris A. Mattmann
> Nutch dump command doesnt dump anything
> --
Kim Whitehall created NUTCH-2100:
Summary: Nutch dump command doesnt dump anything
Key: NUTCH-2100
URL: https://issues.apache.org/jira/browse/NUTCH-2100
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746062#comment-14746062
]
Sebastian Nagel commented on NUTCH-1932:
Correct, it was about 404 pages not about
Dear all,
on behalf of the Nutch PMC it is my pleasure to announce
that Sujen Shah has been voted in as committer and member
of the Nutch PMC. Sujen, would you mind to introduce
yourself to the Nutch community and tell in just a few
words about your interests and your plans regarding Nutch?
Cong
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746034#comment-14746034
]
Markus Jelsma commented on NUTCH-1932:
--
Hello Sebastian. I am not sure about that bei
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1932:
---
Attachment: NUTCH-1932-add.patch
> Automatically remove orphaned pages
> -
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746004#comment-14746004
]
Sebastian Nagel commented on NUTCH-1932:
Hi Markus, understood.
- didn't we have t
Github user prernasatija closed the pull request at:
https://github.com/apache/nutch/pull/57
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "AdvancedAjaxInteraction" page has been changed by MichaelJoyce:
https://wiki.apache.org/nutch/AdvancedAjaxInteraction?action=diff&rev1=4&rev2=5
Comment:
Updates regarding available
[
https://issues.apache.org/jira/browse/NUTCH-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745755#comment-14745755
]
Hudson commented on NUTCH-2093:
---
SUCCESS: Integrated in Nutch-trunk #3271 (See
[https://bui
[
https://issues.apache.org/jira/browse/NUTCH-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745663#comment-14745663
]
ASF GitHub Bot commented on NUTCH-2099:
---
GitHub user sujen1412 opened a pull request
GitHub user sujen1412 opened a pull request:
https://github.com/apache/nutch/pull/59
Fix for NUTCH-2099 Contributed by Sujen Shah
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sujen1412/nutch NUTCH-2099
Alternatively you can r
Sujen Shah created NUTCH-2099:
-
Summary: Refactoring the REST endpoints for integration with webui
Key: NUTCH-2099
URL: https://issues.apache.org/jira/browse/NUTCH-2099
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aron Ahmadia updated NUTCH-2098:
Attachment: 0001-Default-SeedURL-constructor.patch
> Add null SeedUrl constructor
>
Aron Ahmadia created NUTCH-2098:
---
Summary: Add null SeedUrl constructor
Key: NUTCH-2098
URL: https://issues.apache.org/jira/browse/NUTCH-2098
Project: Nutch
Issue Type: Bug
Components
Github user jnioche commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39509421
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import java.io.ByteArr
Github user jorgelbg commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39509273
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import java.io.ByteAr
Github user jorgelbg commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39509063
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import java.io.ByteAr
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745330#comment-14745330
]
Nadeem Douba edited comment on NUTCH-2097 at 9/15/15 12:23 PM:
-
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745330#comment-14745330
]
Nadeem Douba edited comment on NUTCH-2097 at 9/15/15 12:22 PM:
-
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745330#comment-14745330
]
Nadeem Douba commented on NUTCH-2097:
-
Re: maven migration
Would building each tool i
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745322#comment-14745322
]
Markus Jelsma commented on NUTCH-2097:
--
Yes, having them as separate mapper and reduc
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745313#comment-14745313
]
Nadeem Douba commented on NUTCH-2097:
-
I'm not entirely married to the package structu
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Description: Orphan scoring filter that determines whether a page has
become orphaned, e.g. it has
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
First proper working patch. Tests pass
> Automatically remove orph
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745216#comment-14745216
]
Markus Jelsma commented on NUTCH-1932:
--
Hey Sebastian, i fixed the location, it is al
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
> Automatically remove orphaned pages
> -
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745205#comment-14745205
]
Sebastian Nagel commented on NUTCH-1932:
Hi Markus, that looks quite simple
- do w
Github user jnioche commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39492479
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import java.io.ByteArr
Github user jnioche commented on a diff in the pull request:
https://github.com/apache/nutch/pull/55#discussion_r39492460
--- Diff: src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java ---
@@ -0,0 +1,337 @@
+package org.apache.nutch.tools;
+
+import java.io.ByteArr
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
Eeh, patch with the scoring filter itself. Apparently it is possible
[
https://issues.apache.org/jira/browse/NUTCH-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1932:
-
Attachment: NUTCH-1932.patch
New and much simpler patch. This relies on a scoring filter to mark
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745061#comment-14745061
]
Sebastian Nagel commented on NUTCH-2097:
Yes, looks promising.
- maven could simpl
[
https://issues.apache.org/jira/browse/NUTCH-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744983#comment-14744983
]
Lewis John McGibbney commented on NUTCH-2097:
-
Hi [~markus17] thanks for initi