[jira] [Updated] (NUTCH-1469) Upgrade commons-net dependency

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1469: --- Labels: ftp ftpclient (was: ) > Upgrade commons-net dependency > --

[jira] [Commented] (NUTCH-1469) Upgrade commons-net dependency

2013-05-17 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661078#comment-13661078 ] Tejas Patil commented on NUTCH-1469: Hi Lewis, I checked that merely updating the dep

[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655807#comment-13655807 ] Tejas Patil commented on NUTCH-1545: I don't fully understand the significance of batch

[jira] [Resolved] (NUTCH-1418) error parsing robots rules- can't decode path: /wiki/Wikipedia%3Mediation_Committee/

2013-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1418. Resolution: Fixed Fix Version/s: 2.2 After the robots handling has been delegated to crawler

[jira] [Updated] (NUTCH-1325) HostDB for Nutch

2013-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1325: --- Attachment: NUTCH-1325.trunk.v2.path Hi [~markus17], The initial patch is good. This feature would be

[jira] [Updated] (NUTCH-1053) Parsing of RSS feeds fails

2013-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1053: --- Attachment: NUTCH-1053.trunk.patch A tiny change in ivy file for feeds plugin fixes the problem. Atta

[jira] [Commented] (NUTCH-1243) Junit jar removed from lib

2013-05-10 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654755#comment-13654755 ] Tejas Patil commented on NUTCH-1243: Hi [~jnioche], looks like you have fixed the [ivy

[jira] [Comment Edited] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-05-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652805#comment-13652805 ] Tejas Patil edited comment on NUTCH-1031 at 5/9/13 7:52 AM: I

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-05-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652805#comment-13652805 ] Tejas Patil commented on NUTCH-1031: I had forgotten to add the crawler-commons dependency in

[jira] [Closed] (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implment

2013-05-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-427. - Resolution: Won't Fix Patch uses JCIFS which is licensed under LGPL. So it cannot be included in Nutch di

[jira] [Updated] (NUTCH-1249) Resolve all issues flagged up by adding javac -Xlint arguement

2013-05-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1249: --- Attachment: NUTCH-1249.trunk.patch Here is a mega patch for trunk which addresses all warnings (there

[jira] [Updated] (NUTCH-1513) Support Robots.txt for Ftp urls

2013-05-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1513: --- Attachment: NUTCH-1513.2.x.v2.patch NUTCH-1513.trunk.v2.patch Attached the patches fo

[jira] [Resolved] (NUTCH-1277) Fix [fallthrough] javac warnings

2013-05-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1277. Resolution: Fixed Fix Version/s: 2.2 As the change was trivial, went ahead and committed to

[jira] [Assigned] (NUTCH-1277) Fix [fallthrough] javac warnings

2013-05-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reassigned NUTCH-1277: -- Assignee: Tejas Patil > Fix [fallthrough] javac warnings >

[jira] [Updated] (NUTCH-1277) Fix [fallthrough] javac warnings

2013-05-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1277: --- Attachment: NUTCH-1277.2.x.patch NUTCH-1277.trunk.patch The options that we have are:

[jira] [Commented] (NUTCH-1513) Support Robots.txt for Ftp urls

2013-05-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649026#comment-13649026 ] Tejas Patil commented on NUTCH-1513: One thing that I forgot to mention: The change pi

[jira] [Updated] (NUTCH-1513) Support Robots.txt for Ftp urls

2013-05-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1513: --- Attachment: NUTCH-1513.trunk.patch Patch for trunk. Please let me know your comments.

[jira] [Comment Edited] (NUTCH-1039) Fetcher fails for pages without content-length header

2013-05-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648787#comment-13648787 ] Tejas Patil edited comment on NUTCH-1039 at 5/3/13 8:36 PM: I

[jira] [Resolved] (NUTCH-1039) Fetcher fails for pages without content-length header

2013-05-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1039. Resolution: Cannot Reproduce I feel that this item won't make any progress unless we get some real u

[jira] [Commented] (NUTCH-1039) Fetcher fails for pages without content-length header

2013-05-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648785#comment-13648785 ] Tejas Patil commented on NUTCH-1039: I am not able to reproduce this issue: {noformat

[jira] [Commented] (NUTCH-649) Log list of files found but not crawled.

2013-05-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648752#comment-13648752 ] Tejas Patil commented on NUTCH-649: --- Hi [~lewismc], The method where I need to introduce

[jira] [Resolved] (NUTCH-1514) Phase out the deprecated configuration properties (if possible)

2013-05-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1514. Resolution: Fixed Assignee: Tejas Patil Committed to trunk (rev 1478939) and 2.x (rev 1478937

[jira] [Resolved] (NUTCH-1334) NPE in FetcherOutputFormat

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1334. Resolution: Fixed As the patch was trivial (null checks), I went ahead and committed to trunk @ re

[jira] [Commented] (NUTCH-1566) bin/nutch to allow whitespace in paths

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645985#comment-13645985 ] Tejas Patil commented on NUTCH-1566: Apart from normal testing over a linux box, this

[jira] [Closed] (NUTCH-1329) parser not extract outlinks to external web sites

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-1329. -- Resolution: Cannot Reproduce Closing for now by marking it "cannot reproduce" > parser

[jira] [Resolved] (NUTCH-1549) Fix deprecated use of Tika MimeType API in o.a.n.util.MimeUtil

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1549. Resolution: Fixed Assignee: Tejas Patil Ported the change from 2.x to trunk. Committed @ rev1

[jira] [Resolved] (NUTCH-213) checkstyle

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-213. --- Resolution: Won't Fix Closing this one as we don't need this. > checkstyle >

[jira] [Commented] (NUTCH-1549) Fix deprecated use of Tika MimeType API in o.a.n.util.MimeUtil

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645884#comment-13645884 ] Tejas Patil commented on NUTCH-1549: This one got fixed for 2.x with NUTCH-1273.

[jira] [Resolved] (NUTCH-1273) Fix [deprecation] javac warnings

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1273. Resolution: Fixed Committed to 2.x at revision 1477792 > Fix [deprecation] javac w

[jira] [Commented] (NUTCH-649) Log list of files found but not crawled.

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645802#comment-13645802 ] Tejas Patil commented on NUTCH-649: --- Hi [~lewismc], Now that's an awesome idea... certainl

[jira] [Commented] (NUTCH-1273) Fix [deprecation] javac warnings

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645794#comment-13645794 ] Tejas Patil commented on NUTCH-1273: Hi [~lewismc], Yup. I went through the new API ov

[jira] [Resolved] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1529. Resolution: Won't Fix There was a similar jira related to mongodb which led to a discussion that w

[jira] [Commented] (NUTCH-1053) Parsing of RSS feeds fails

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645497#comment-13645497 ] Tejas Patil commented on NUTCH-1053: I have figured out the issue here but can't figure

[jira] [Comment Edited] (NUTCH-1273) Fix [deprecation] javac warnings

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645434#comment-13645434 ] Tejas Patil edited comment on NUTCH-1273 at 4/30/13 11:19 AM: --

[jira] [Updated] (NUTCH-1273) Fix [deprecation] javac warnings

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1273: --- Attachment: NUTCH-1249.2.x.v2.patch > Fix [deprecation] javac warnings >

[jira] [Commented] (NUTCH-1273) Fix [deprecation] javac warnings

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645434#comment-13645434 ] Tejas Patil commented on NUTCH-1273: Hey [~lewismc], I still see deprecation warnings

[jira] [Updated] (NUTCH-1543) Display consistent usage of DBUpdaterJob with 1.X

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1543: --- Attachment: NUTCH-1543.v2.patch Hi [~amuseme], The patch will kill the current behavior wherein if no

[jira] [Resolved] (NUTCH-802) Problems managing outlinks with large url length

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-802. --- Resolution: Won't Fix Agree with Markus and Lewis. Hence marking this one as won't fix. If someone wis

[jira] [Updated] (NUTCH-1514) Phase out the deprecated configuration properties (if possible)

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1514: --- Attachment: NUTCH-1514.2.x.patch Here is a corresponding patch for 2.x. Unless there are any objectio

[jira] [Closed] (NUTCH-449) Format of junit output should be configurable

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-449. - Resolution: Implemented I have verified that current trunk and 2.x build files already have this change.

[jira] [Commented] (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implm

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645403#comment-13645403 ] Tejas Patil commented on NUTCH-427: --- As [~ab] mentioned earlier "This plugin uses an LGPL

[jira] [Commented] (NUTCH-213) checkstyle

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645400#comment-13645400 ] Tejas Patil commented on NUTCH-213: --- IMHO, I don't think that we are in dire need to have

[jira] [Closed] (NUTCH-342) Nutch commands log to nutch/logs/hadoop.logs by default

2013-04-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-342. - Resolution: Won't Fix I agree with Lewis wrt closing this issue as won't fix. > Nutch com

[jira] [Closed] (NUTCH-1455) RobotRulesParser to match multi-word user-agent names

2013-04-29 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-1455. -- Resolution: Fixed Fix Version/s: 2.2 Assignee: Tejas Patil We have migrated to CC for r

[jira] [Resolved] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-04-29 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1031. Resolution: Fixed Fix Version/s: 2.2 Thanks Lewis :) Changes committed to 2.x (revision 147

[jira] [Commented] (NUTCH-1329) parser not extract outlinks to external web sites

2013-04-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644272#comment-13644272 ] Tejas Patil commented on NUTCH-1329: Should we close this one ? I had tried to reprod

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2013-04-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644269#comment-13644269 ] Tejas Patil commented on NUTCH-1314: Hi Lewis, I tried to test both the patches. NUTCH

[jira] [Updated] (NUTCH-649) Log list of files found but not crawled.

2013-04-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-649: -- Attachment: NUTCH-649.trunk.patch NUTCH-649.2.x.patch patches for trunk and 2.x

[jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-04-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1031: --- Attachment: NUTCH-1031-2.x.v1.patch Patch for 2.x. If there are no objections, would commit in coming

[jira] [Closed] (NUTCH-346) Improve readability of logs/hadoop.log

2013-04-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-346. - Resolution: Fixed Pushed to svn. (trunk: rev 1476859, 2.x: 1476861) > Improve readability

[jira] [Closed] (NUTCH-1528) Port nutch-mongodb-indexer to Nutch

2013-04-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed NUTCH-1528. -- Resolution: Won't Fix > Port nutch-mongodb-indexer to Nutch > --- >

[jira] [Commented] (NUTCH-1528) Port nutch-mongodb-indexer to Nutch

2013-04-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643879#comment-13643879 ] Tejas Patil commented on NUTCH-1528: I think that [~jnioche] and [~wastl-nagel] might

[jira] [Commented] (NUTCH-346) Improve readability of logs/hadoop.log

2013-04-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643863#comment-13643863 ] Tejas Patil commented on NUTCH-346: --- I think that this will be a good addition as current

[jira] [Commented] (NUTCH-1528) Port nutch-mongodb-indexer to Nutch

2013-04-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643858#comment-13643858 ] Tejas Patil commented on NUTCH-1528: As this change ain't going to the repo, should we

[jira] [Resolved] (NUTCH-829) duplicate hadoop temp files

2013-04-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-829. --- Resolution: Fixed Thanks Lewis for pointing that out. Committed @ revision 1476702 >

[jira] [Updated] (NUTCH-829) duplicate hadoop temp files

2013-04-27 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-829: -- Attachment: NUTCH-829.v2.patch Hi Lewis, There was one more place in Generator where this change could h

[jira] [Resolved] (NUTCH-1565) Proper downloads page for Nutch

2013-04-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1565. Resolution: Fixed Changes pushed to SVN @ revision 1475631. Here is the [new downloads page|http:/

[jira] [Updated] (NUTCH-1565) Proper downloads page for Nutch

2013-04-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1565: --- Attachment: NUTCH-1565.v2.patch downloads.html So far I could fix the errors, but cou

[jira] [Commented] (NUTCH-1565) Proper downloads page for Nutch

2013-04-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641380#comment-13641380 ] Tejas Patil commented on NUTCH-1565: No problem bro.. I am close enough to get around

[jira] [Commented] (NUTCH-1565) Proper downloads page for Nutch

2013-04-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641349#comment-13641349 ] Tejas Patil commented on NUTCH-1565: Hi Lewis, I tried to build the docs using steps g

[jira] [Commented] (NUTCH-1447) Nutch 2.x with Cloudera CDH 4 get Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2013-04-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641243#comment-13641243 ] Tejas Patil commented on NUTCH-1447: I agree with you [~lewismc]. There are and will b

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-04-05 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624194#comment-13624194 ] Tejas Patil commented on NUTCH-1031: I have removed the @author tag and ported the che

[jira] [Comment Edited] (NUTCH-1544) Nutch crawls only first site from seed list

2013-03-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600295#comment-13600295 ] Tejas Patil edited comment on NUTCH-1544 at 3/12/13 6:58 PM: -

[jira] [Commented] (NUTCH-1544) Nutch crawls only first site from seed list

2013-03-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600295#comment-13600295 ] Tejas Patil commented on NUTCH-1544: I don't know what seeds you had and what urls you

[jira] [Commented] (NUTCH-1542) adddays param for generator not present in 2.x

2013-03-10 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598482#comment-13598482 ] Tejas Patil commented on NUTCH-1542: Committed @revision 1454974 > ad

[jira] [Resolved] (NUTCH-1542) adddays param for generator not present in 2.x

2013-03-10 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1542. Resolution: Fixed > adddays param for generator not present in 2.x > --

[jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-03-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1031: --- Attachment: NUTCH-1031-trunk.v5.patch Thanks Lewis :) I have corrected the usage message.

[jira] [Updated] (NUTCH-1542) adddays param for generator not present in 2.x

2013-03-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1542: --- Attachment: NUTCH-1542.v2.patch Updated the patch as per NUTCH-1393 > adddays param

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-03-08 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597460#comment-13597460 ] Tejas Patil commented on NUTCH-1031: @Dev: Can anyone kindly review the patch ?

[jira] [Updated] (NUTCH-1542) adddays param for generator not present in 2.x

2013-03-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1542: --- Attachment: NUTCH-1542.patch Patch for changes in GeneratorJob and the crawl script.

[jira] [Created] (NUTCH-1542) -adddays param for generator not present in 2.x

2013-03-06 Thread Tejas Patil (JIRA)
Tejas Patil created NUTCH-1542: -- Summary: -adddays param for generator not present in 2.x Key: NUTCH-1542 URL: https://issues.apache.org/jira/browse/NUTCH-1542 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1542) adddays param for generator not present in 2.x

2013-03-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1542: --- Summary: adddays param for generator not present in 2.x (was: -adddays param for generator not prese

[jira] [Commented] (NUTCH-842) AutoGenerate WebPage code

2013-03-05 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594442#comment-13594442 ] Tejas Patil commented on NUTCH-842: --- Hi Lewis, Can you kindly upload the changes that you

[jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-03-05 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1031: --- Attachment: NUTCH-1031-trunk.v4.patch Hey Lewis, Thanks for pointing that out :) I have updated the p

[jira] [Commented] (NUTCH-1454) parsing chm failed

2013-03-05 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594411#comment-13594411 ] Tejas Patil commented on NUTCH-1454: A few observations about this issue: 1. Nutch is ge

[jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-03-05 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated NUTCH-1031: --- Attachment: NUTCH-1031-trunk.v3.patch Hi [~wastl-nagel], I have done the suggested changes. @[~amuse

[jira] [Commented] (NUTCH-1447) Nutch 2.x with Cloudera CDH 4 get Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2013-03-04 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593076#comment-13593076 ] Tejas Patil commented on NUTCH-1447: As per discussion over user group [0], I agree th

[jira] [Commented] (NUTCH-1447) Nutch 2.x with Cloudera CDH 4 get Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2013-03-04 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593073#comment-13593073 ] Tejas Patil commented on NUTCH-1447: This error is possibly due to code refactoring du

[jira] [Commented] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590243#comment-13590243 ] Tejas Patil commented on NUTCH-1529: @Lufeng The earlier patch had some references to

[jira] [Commented] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589977#comment-13589977 ] Tejas Patil commented on NUTCH-1529: @Lewis, I am a rookie in terms of mongodb so pref

[jira] [Commented] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13589946#comment-13589946 ] Tejas Patil commented on NUTCH-1529: There is no harm in adding such support. Mongodb

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-02-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585467#comment-13585467 ] Tejas Patil commented on NUTCH-1031: Hi Sebastian, Thanks for your time and suggesting

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-02-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583664#comment-13583664 ] Tejas Patil commented on NUTCH-1031: @Dev: I am planning to commit this change in comi

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-02-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583662#comment-13583662 ] Tejas Patil commented on NUTCH-1031: Hi Lewis, I should have checked on the main page

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-02-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583013#comment-13583013 ] Tejas Patil commented on NUTCH-1031: Hey Ken, A gentle reminder for releasing CC.

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-02-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583011#comment-13583011 ] Tejas Patil commented on NUTCH-1047: Hi Julien, One small change in Java class will b

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-02-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582169#comment-13582169 ] Tejas Patil commented on NUTCH-1047: Hey Julien, One question: Why is this change not

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-02-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582163#comment-13582163 ] Tejas Patil commented on NUTCH-1047: Hey Julien, While running the solrclean command,

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-02-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581964#comment-13581964 ] Tejas Patil commented on NUTCH-1047: Hi Julien, The crawl command (with solr option)

[jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2013-02-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570050#comment-13570050 ] Tejas Patil commented on NUTCH-1521: Hi Lufeng, In 2.x, some classes are given differe

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-29 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13565202#comment-13565202 ] Tejas Patil commented on NUTCH-1047: Hi Julien, As you suggested, I tried to run solr

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564883#comment-13564883 ] Tejas Patil commented on NUTCH-1465: Hi Sebastian, So we are looking at 2 things here

[jira] [Comment Edited] (NUTCH-1465) Support sitemaps in Nutch

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564601#comment-13564601 ] Tejas Patil edited comment on NUTCH-1465 at 1/28/13 8:02 PM: -

[jira] [Commented] (NUTCH-1465) Support sitemaps in Nutch

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564601#comment-13564601 ] Tejas Patil commented on NUTCH-1465: Hi Sebastian, By (“for a given host, sitemaps ar

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564252#comment-13564252 ] Tejas Patil commented on NUTCH-1047: Hi Julien, The solrindex command and crawl scrip

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564187#comment-13564187 ] Tejas Patil commented on NUTCH-1047: Hi Julien, After the reply from @lufeng, I was able

[jira] [Assigned] (NUTCH-1465) Support sitemaps in Nutch

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reassigned NUTCH-1465: -- Assignee: Tejas Patil > Support sitemaps in Nutch > - > >

[jira] [Resolved] (NUTCH-1042) Fetcher.max.crawl.delay property not taken into account correctly when set to -1

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1042. Resolution: Fixed The fix for NUTCH-1284 takes care of this. > Fetcher.max.crawl.d

[jira] [Resolved] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved NUTCH-1284. Resolution: Fixed > Add site fetcher.max.crawl.delay as log output by default. > --

[jira] [Commented] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.

2013-01-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564107#comment-13564107 ] Tejas Patil commented on NUTCH-1284: Committed @revision 1439289 in trunk Committed @r
