[GitHub] nutch pull request: NUTCH-2038

2015-06-30 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/42 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enable

[GitHub] nutch pull request: NUTCH-2038

2015-06-29 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/42 NUTCH-2038 minor changes and suggestions by Sebastian. You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2038 Alternatively you

[GitHub] nutch pull request: NUTCH-2038

2015-06-29 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/41

[GitHub] nutch pull request: NUTCH-2038

2015-06-29 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/41 NUTCH-2038 --added specific IOException messages --added files: conf/naivebayes-train.txt.template conf/naivebayes-wordlist.txt.template You can merge this pull request into a Git reposit

[GitHub] nutch pull request: NUTCH-2038

2015-06-29 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/40

[GitHub] nutch pull request: NUTCH-2038

2015-06-29 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/40 NUTCH-2038 added all the jars in plugin.xml You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2038 Alternatively you can review

[GitHub] nutch pull request: NUTCH-2038

2015-06-28 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/39

[GitHub] nutch pull request: NUTCH-2038

2015-06-28 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/39 NUTCH-2038 Removed the TODO comments You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2038 Alternatively you can review and app

[GitHub] nutch pull request: NUTCH-2038

2015-06-28 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/38

[GitHub] nutch pull request: NUTCH-2038

2015-06-28 Thread asitang
Github user asitang commented on a diff in the pull request: https://github.com/apache/nutch/pull/38#discussion_r33433136 --- Diff: src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java --- @@ -0,0 +1,204 @@ +/** + *

[GitHub] nutch pull request: NUTCH-2038

2015-06-28 Thread asitang
Github user asitang commented on a diff in the pull request: https://github.com/apache/nutch/pull/38#discussion_r33433090 --- Diff: src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java --- @@ -0,0 +1,204 @@ +/** + *

[GitHub] nutch pull request: NUTCH-2038

2015-06-28 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/38#discussion_r33432911 --- Diff: src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java --- @@ -0,0 +1,204 @@ +/**

[GitHub] nutch pull request: NUTCH-2038

2015-06-28 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/38#discussion_r33432889 --- Diff: src/plugin/parsefilter-naivebayes/src/java/org/apache/nutch/parsefilter/naivebayes/NaiveBayesParseFilter.java --- @@ -0,0 +1,204 @@ +/**

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/36 NUTCH-2038 Made aesthetic changes suggested by Chris Mattmann. Removed dependencies from the main ivy.xml and added it to plugin's ivy.xml. You can merge this pull request into a Git repository by r

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/35

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165638 --- Diff: src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java --- @@ -0,0 +1,214 @@

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165623 --- Diff: src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java --- @@ -0,0 +1,214 @@

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165581 --- Diff: src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java --- @@ -0,0 +1,214 @@

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165500 --- Diff: src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java --- @@ -0,0 +1,214 @@

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165528 --- Diff: src/plugin/htmlparsefilter-naivebayes/src/java/org/apache/nutch/htmlparsefilter/naivebayes/NaiveBayesHTMLParseFilter.java --- @@ -0,0 +1,214 @@

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165388 --- Diff: ivy/ivy.xml --- @@ -78,7 +78,11 @@ - +

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165405 --- Diff: ivy/ivy.xml --- @@ -100,6 +104,8 @@ + --- End diff -- also should

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165338 --- Diff: ivy/ivy.xml --- @@ -78,7 +78,11 @@ - +

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165265 --- Diff: conf/nutch-default.xml --- @@ -1208,6 +1208,28 @@ + htmlparsefilter.naivebayes.trainfile + + Set the name of th

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/35#discussion_r33165299 --- Diff: conf/nutch-default.xml --- @@ -1258,6 +1280,7 @@ + --- End diff -- extraneous not needed.

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/34

[GitHub] nutch pull request: NUTCH-2038

2015-06-24 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/35 NUTCH-2038 You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2038 Alternatively you can review and apply these changes as the p

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/34#discussion_r32870349 --- Diff: src/java/org/apache/nutch/parse/ParseSegment.java --- @@ -140,6 +177,37 @@ public void map(WritableComparable key, Content content,

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/34#discussion_r32870336 --- Diff: src/java/org/apache/nutch/parse/ParseSegment.java --- @@ -69,6 +77,35 @@ public void configure(JobConf job) { setConf(job); thi

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/34#discussion_r32869857 --- Diff: ivy/ivy.xml --- @@ -78,7 +78,11 @@ - +

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/34#discussion_r32869870 --- Diff: src/java/org/apache/nutch/net/URLFilters.java --- @@ -41,4 +42,28 @@ public String filter(String urlString) throws URLFilterException {

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/34#discussion_r32869372 --- Diff: conf/nutch-default.xml --- @@ -1259,6 +1259,34 @@ + urlfilter.model.trainfile + + Set the name of the file to b

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/34#discussion_r32869350 --- Diff: conf/nutch-default.xml --- @@ -1259,6 +1259,34 @@ + urlfilter.model.trainfile + + Set the name of the file to b

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/34#discussion_r32869303 --- Diff: conf/nutch-default.xml --- @@ -1259,6 +1259,34 @@ + urlfilter.model.trainfile + + Set the name of the file to b

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/34 NUTCH-2038 You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2038 Alternatively you can review and apply these changes as the p

[GitHub] nutch pull request: NUTCH-2038

2015-06-19 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/32

[GitHub] nutch pull request: NUTCH-2038

2015-06-18 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32798921 --- Diff: src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java --- @@ -0,0 +1,234 @@ +/** + * Licensed to the A

[GitHub] nutch pull request: NUTCH-2038

2015-06-18 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32798910 --- Diff: src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java --- @@ -0,0 +1,234 @@ +/** --- End diff --

[GitHub] nutch pull request: NUTCH-2038

2015-06-18 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32798896 --- Diff: src/java/org/apache/nutch/parse/ParseSegment.java --- @@ -56,6 +57,14 @@ private ParseUtil parseUtil; private boolean skipTru

[GitHub] nutch pull request: NUTCH-2038

2015-06-18 Thread chrismattmann
Github user chrismattmann commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32798873 --- Diff: ivy/ivy.xml --- @@ -78,7 +78,11 @@ - +

[GitHub] nutch pull request: NUTCH-2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32741673 --- Diff: ivy/ivy.xml --- @@ -78,7 +78,11 @@ - +

[GitHub] nutch pull request: NUTCH-2038

2015-06-18 Thread asitang
Github user asitang commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32741196 --- Diff: ivy/ivy.xml --- @@ -78,7 +78,11 @@ - +

[GitHub] nutch pull request: NUTCH-2038

2015-06-18 Thread asitang
Github user asitang commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32742390 --- Diff: conf/nutch-default.xml --- @@ -1136,6 +1136,28 @@ + parser.modelfilter.trainfile + tweets-train.tsv + --- End dif

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/32

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread asitang
GitHub user asitang opened a pull request: https://github.com/apache/nutch/pull/32 Nutch 2038 You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2038 Alternatively you can review and apply these changes as the p

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread asitang
Github user asitang closed the pull request at: https://github.com/apache/nutch/pull/31

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702482 --- Diff: src/java/org/apache/nutch/parse/ModelURLFilterAbstract.java --- @@ -0,0 +1,12 @@ +package org.apache.nutch.parse; --- End diff -- We n

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702463 --- Diff: ivy/ivy.xml --- @@ -78,7 +78,11 @@ - +

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702499 --- Diff: conf/nutch-default.xml --- @@ -1136,6 +1136,28 @@ + parser.modelfilter.trainfile + tweets-train.tsv + --- End dif

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702537 --- Diff: src/java/org/apache/nutch/net/URLFilters.java --- @@ -41,4 +41,24 @@ public String filter(String urlString) throws URLFilterException { }

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread asitang
GitHub user asitang reopened a pull request: https://github.com/apache/nutch/pull/32 Nutch 2038 You can merge this pull request into a Git repository by running: $ git pull https://github.com/asitang/nutch NUTCH-2038 Alternatively you can review and apply these changes as the

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702634 --- Diff: src/java/org/apache/nutch/parse/ParseSegment.java --- @@ -56,6 +57,14 @@ private ParseUtil parseUtil; private boolean skipTruncated

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702839 --- Diff: src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java --- @@ -0,0 +1,234 @@ +/** + * Licensed to the Apache

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702649 --- Diff: src/java/org/apache/nutch/parse/ParseSegment.java --- @@ -140,6 +161,29 @@ public void map(WritableComparable key, Content content, LOG

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702733 --- Diff: src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/ModelURLFilter.java --- @@ -0,0 +1,158 @@ +/** + * Licensed to the Apach

[GitHub] nutch pull request: Nutch 2038

2015-06-18 Thread lewismc
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/32#discussion_r32702851 --- Diff: src/plugin/urlfilter-model/src/java/org/apache/nutch/urlfilter/model/NBClassifier.java --- @@ -0,0 +1,234 @@ +/** + * Licensed to the Apache