[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600788#comment-14600788
]
Sebastian Nagel commented on NUTCH-2038:
Great, thanks! Ideally the model is loade
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600419#comment-14600419
]
Asitang Mishra edited comment on NUTCH-2038 at 6/25/15 12:19 AM:
---
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600419#comment-14600419
]
Asitang Mishra commented on NUTCH-2038:
---
" maybe rename the plugin to parsefilte
Hi Folks,
Before too long, Hadoop will be at 3.X for stable official releases.
I wanted to solicit the dev@ community to see what difficulties if any
people have had running Nutch trunk on Hadoop 2.X.
Hadoop 2.X is supported on Nutch 2.X but getting the patches all correct is
literally a PIT
[
https://issues.apache.org/jira/browse/NUTCH-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600327#comment-14600327
]
Sujen Shah edited comment on NUTCH-2047 at 6/24/15 11:09 PM:
-
[
https://issues.apache.org/jira/browse/NUTCH-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600334#comment-14600334
]
Sebastian Nagel commented on NUTCH-1625:
Is this really only legacy code and what'
[
https://issues.apache.org/jira/browse/NUTCH-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sujen Shah updated NUTCH-2047:
--
Attachment: part-0
This file is a dump of the top 1000 URLs.
The model file contained information r
Sujen Shah created NUTCH-2047:
-
Summary: Improvements to the relevance scoring plugin
Key: NUTCH-2047
URL: https://issues.apache.org/jira/browse/NUTCH-2047
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600290#comment-14600290
]
Sebastian Nagel commented on NUTCH-1692:
+1
> SegmentReader broken in distributed
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600269#comment-14600269
]
Sebastian Nagel commented on NUTCH-2038:
Hi [~asitang], the latest pull request #3
I am sorry, by getting rid I meant moving git requests to a separate list. But
because both are accepted, this is probably not going to happen. Due to the
flood of mail, I normally ignore git mail completely, but not Jira updates.
If Lewis' mail client is friendly, he can filter git mail to a
Sorry I wasn't clear. I'm *not* fine with getting rid of Github.
I was simply proposing for the mail spam to be moved to a different
list. But to me, JIRA/SVN is no different than Github comments and
pull requests and so forth. To each their own :) The ASF fully supports
Git and Github integration
I am fine with getting rid of Github e-mail, not Jira, Jenkins or other ASF
infra stuff. The git requests are not in our svn format anyway. If someone is
serious about their patch and wants it in the regular releases, then please be
so polite as not to make it a bit harder for us ;)
-Original
Hey Lewis,
Yeah, to be honest, this is no different than ReviewBoard, JIRA, etc.
At least it's not as bad as Spark :/ I did a review of Asitang's patch
and it took each one of my comments and sent a mail. B/c of Apache's
requirement that things happen "on the list", we have to have the mails
replicate
Well, either disable it or have people send fewer requests. On the other hand,
adding patches and Jira comments also gets you e-mail.
-Original message-
From: Lewis John Mcgibbney
Sent: Wednesday 24th June 2015 21:47
To: dev@nutch.apache.org
Subject: Github Spam
Hi Folks,
The Github spam
Hi Folks,
The Github spam is killing me.
Seems to go to - nu...@noreply.github.com
Basically every commit someone pushes (there have been loads recently) is
sending me a new email over and above the digest emails I get.
I am sure this must be pissing other people off. Is there a better way for
us t
[
https://issues.apache.org/jira/browse/NUTCH-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599958#comment-14599958
]
Michael Joyce commented on NUTCH-1504:
--
This is great stuff [~lewismc], we definitely
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599933#comment-14599933
]
Luis Lopez commented on NUTCH-2046:
---
I used just -skipInject instead of the actual path
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luis Lopez updated NUTCH-2046:
--
Attachment: crawl.patch
The crawl script skips the initial injection if we use -skipInject instead of
t
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599840#comment-14599840
]
Julien Nioche commented on NUTCH-2046:
--
re-script : what about a positive parameter i
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Asitang Mishra updated NUTCH-2038:
--
Description:
An HTML parse filter that will filter out the outlinks in two stages.
Classify the
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Asitang Mishra updated NUTCH-2038:
--
Description:
An HTML parse filter that will filter out the outlinks in two stages.
One: Classify
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599798#comment-14599798
]
ASF GitHub Bot commented on NUTCH-2038:
---
GitHub user asitang opened a pull request:
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/36
NUTCH-2038
Made aesthetic changes suggested by Chris Mattmann. Removed dependencies
from the main ivy.xml and added them to the plugin's ivy.xml.
You can merge this pull request into a Git repository by r
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599795#comment-14599795
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user asitang closed the pull request
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/35
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabl
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599750#comment-14599750
]
Lewis John McGibbney commented on NUTCH-2046:
-
Hi [~betolink], this is a nice
[
https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-2046:
Fix Version/s: 1.11
> The crawl script should be able to skip an initial injection.
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599748#comment-14599748
]
Chris A. Mattmann commented on NUTCH-2038:
--
yeah you got it Seb, we can do accept
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599661#comment-14599661
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599660#comment-14599660
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599659#comment-14599659
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165638
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165623
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165581
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599657#comment-14599657
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165500
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165528
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599655#comment-14599655
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599656#comment-14599656
]
Asitang Mishra commented on NUTCH-2038:
---
I still have to transfer the external mahou
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599646#comment-14599646
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599647#comment-14599647
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599648#comment-14599648
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165388
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165405
--- Diff: ivy/ivy.xml ---
@@ -100,6 +104,8 @@
+
--- End diff --
also should
Luis Lopez created NUTCH-2046:
-
Summary: The crawl script should be able to skip an initial
injection.
Key: NUTCH-2046
URL: https://issues.apache.org/jira/browse/NUTCH-2046
Project: Nutch
Issue
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165338
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599644#comment-14599644
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165265
--- Diff: conf/nutch-default.xml ---
@@ -1208,6 +1208,28 @@
+ htmlparsefilter.naivebayes.trainfile
+
+ Set the name of th
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599645#comment-14599645
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user chrismattmann commented on a di
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165299
--- Diff: conf/nutch-default.xml ---
@@ -1258,6 +1280,7 @@
+
--- End diff --
extraneous not needed.
---
If your projec
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599625#comment-14599625
]
ASF GitHub Bot commented on NUTCH-2038:
---
GitHub user asitang opened a pull request:
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599622#comment-14599622
]
ASF GitHub Bot commented on NUTCH-2038:
---
Github user asitang closed the pull request
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/34
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabl
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/35
NUTCH-2038
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you can review and apply these changes as the p
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599610#comment-14599610
]
Sebastian Nagel commented on NUTCH-2038:
Jaccard similarity sounds more like a sco
[
https://issues.apache.org/jira/browse/NUTCH-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599210#comment-14599210
]
Sebastian Nagel commented on NUTCH-2038:
Yes, it's possible to implement it in Htm