Github user asfgit closed the pull request at:
https://github.com/apache/nutch/pull/42
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enable
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/42
NUTCH-2038
minor changes and suggestions by Sebastian.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/41
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/41
NUTCH-2038
--added specific IOException messages
--added files:
conf/naivebayes-train.txt.template
conf/naivebayes-wordlist.txt.template
You can merge this pull request into a Git reposit
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/40
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/40
NUTCH-2038
added all the jars in plugin.xml
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you can review
Github user asfgit closed the pull request at:
https://github.com/apache/nutch/pull/39
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/39
NUTCH-2038
Removed the TODO comments
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you can review and app
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/38
Github user asitang commented on a diff in the pull request:
https://github.com/apache/nutch/pull/38#discussion_r33433136
--- Diff:
src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java
---
@@ -0,0 +1,204 @@
+/**
+ *
Github user asitang commented on a diff in the pull request:
https://github.com/apache/nutch/pull/38#discussion_r33433090
--- Diff:
src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java
---
@@ -0,0 +1,204 @@
+/**
+ *
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/38#discussion_r33432911
--- Diff:
src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java
---
@@ -0,0 +1,204 @@
+/**
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/38#discussion_r33432889
--- Diff:
src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java
---
@@ -0,0 +1,204 @@
+/**
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/36
NUTCH-2038
Made aesthetic changes suggested by Chris Mattmann. Removed dependencies
from the main ivy.xml and added it to plugin's ivy.xml.
You can merge this pull request into a Git repository by r
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/35
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165638
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165623
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165581
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165500
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165528
--- Diff:
src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java
---
@@ -0,0 +1,214 @@
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165388
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165405
--- Diff: ivy/ivy.xml ---
@@ -100,6 +104,8 @@
+
--- End diff --
also should
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165338
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165265
--- Diff: conf/nutch-default.xml ---
@@ -1208,6 +1208,28 @@
+ htmlparsefilter.naivebayes.trainfile
+
+ Set the name of th
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/35#discussion_r33165299
--- Diff: conf/nutch-default.xml ---
@@ -1258,6 +1280,7 @@
+
--- End diff --
extraneous not needed.
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/34
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/35
NUTCH-2038
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you can review and apply these changes as the p
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/34#discussion_r32870349
--- Diff: src/java/org/apache/nutch/parse/ParseSegment.java ---
@@ -140,6 +177,37 @@ public void map(WritableComparable key, Content
content,
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/34#discussion_r32870336
--- Diff: src/java/org/apache/nutch/parse/ParseSegment.java ---
@@ -69,6 +77,35 @@ public void configure(JobConf job) {
setConf(job);
thi
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/34#discussion_r32869857
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/34#discussion_r32869870
--- Diff: src/java/org/apache/nutch/net/URLFilters.java ---
@@ -41,4 +42,28 @@ public String filter(String urlString) throws
URLFilterException {
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/34#discussion_r32869372
--- Diff: conf/nutch-default.xml ---
@@ -1259,6 +1259,34 @@
+ urlfilter.model.trainfile
+
+ Set the name of the file to b
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/34#discussion_r32869350
--- Diff: conf/nutch-default.xml ---
@@ -1259,6 +1259,34 @@
+ urlfilter.model.trainfile
+
+ Set the name of the file to b
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/34#discussion_r32869303
--- Diff: conf/nutch-default.xml ---
@@ -1259,6 +1259,34 @@
+ urlfilter.model.trainfile
+
+ Set the name of the file to b
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/34
NUTCH-2038
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you can review and apply these changes as the p
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/32
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32798921
--- Diff:
src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java
---
@@ -0,0 +1,234 @@
+/**
+ * Licensed to the A
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32798910
--- Diff:
src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java
---
@@ -0,0 +1,234 @@
+/**
--- End diff --
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32798896
--- Diff: src/java/org/apache/nutch/parse/ParseSegment.java ---
@@ -56,6 +57,14 @@
private ParseUtil parseUtil;
private boolean skipTru
Github user chrismattmann commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32798873
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32741673
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user asitang commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32741196
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user asitang commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32742390
--- Diff: conf/nutch-default.xml ---
@@ -1136,6 +1136,28 @@
+ parser.modelfilter.trainfile
+ tweets-train.tsv
+
--- End dif
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/32
GitHub user asitang opened a pull request:
https://github.com/apache/nutch/pull/32
Nutch 2038
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you can review and apply these changes as the p
Github user asitang closed the pull request at:
https://github.com/apache/nutch/pull/31
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702482
--- Diff: src/java/org/apache/nutch/parse/ModelURLFilterAbstract.java ---
@@ -0,0 +1,12 @@
+package org.apache.nutch.parse;
--- End diff --
We n
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702463
--- Diff: ivy/ivy.xml ---
@@ -78,7 +78,11 @@
-
+
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702499
--- Diff: conf/nutch-default.xml ---
@@ -1136,6 +1136,28 @@
+ parser.modelfilter.trainfile
+ tweets-train.tsv
+
--- End dif
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702537
--- Diff: src/java/org/apache/nutch/net/URLFilters.java ---
@@ -41,4 +41,24 @@ public String filter(String urlString) throws
URLFilterException {
}
GitHub user asitang reopened a pull request:
https://github.com/apache/nutch/pull/32
Nutch 2038
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/asitang/nutch NUTCH-2038
Alternatively you can review and apply these changes as the
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702634
--- Diff: src/java/org/apache/nutch/parse/ParseSegment.java ---
@@ -56,6 +57,14 @@
private ParseUtil parseUtil;
private boolean skipTruncated
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702839
--- Diff:
src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java
---
@@ -0,0 +1,234 @@
+/**
+ * Licensed to the Apache
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702649
--- Diff: src/java/org/apache/nutch/parse/ParseSegment.java ---
@@ -140,6 +161,29 @@ public void map(WritableComparable key, Content
content,
LOG
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702733
--- Diff:
src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/ModelURLFilter.java
---
@@ -0,0 +1,158 @@
+/**
+ * Licensed to the Apach
Github user lewismc commented on a diff in the pull request:
https://github.com/apache/nutch/pull/32#discussion_r32702851
--- Diff:
src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java
---
@@ -0,0 +1,234 @@
+/**
+ * Licensed to the Apache