OOM error during parsing with nekohtml

2007-07-16 Thread Shailendra Mudgal
Hi all, we are getting an OOM exception while processing http://www.fotofinity.com/cgi-bin/homepages.cgi . We have also applied the NUTCH-497 patch to our source code, but the error actually occurs in the parse method. Does anybody have any idea about this? Here is the complete sta

RE: OOM error during parsing with nekohtml

2007-07-16 Thread Tsengtan A Shuy
I successfully ran the whole-web crawl on my new Ubuntu OS, and I am ready to fix the bug. I need someone to point me to the most up-to-date source code and give me a bug assignment. Thank you in advance!! Adam Shuy, President ePacific Web Design & Hosting Professional Web/Software developer T

[jira] Created: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread JIRA
Next fetch time is set incorrectly -- Key: NUTCH-515 URL: https://issues.apache.org/jira/browse/NUTCH-515 Project: Nutch Issue Type: Bug Components: fetcher Affects Versions: 1.0.0 Re

[jira] Updated: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-515: Attachment: NUTCH-515.patch Patch for the issue. Replaces wrong usages of old option ("db.default.f

[jira] Commented: (NUTCH-439) Top Level Domains Indexing / Scoring

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512930 ] Doğacan Güney commented on NUTCH-439: - A big +1 from me. Though, it may be useful to break this patch into multipl

Re: OOM error during parsing with nekohtml

2007-07-16 Thread Kai_testing Middleton
You could try looking at these two discussions: http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06571.html http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06571.html --Kai - Original Message From: Tsengtan A Shuy <[EMAIL PROTECTED]> To: nutch-dev@lucene.apache.org;

RE: OOM error during parsing with nekohtml

2007-07-16 Thread Tsengtan A Shuy
Thank you for the info. The OOM exception in your previous email indicates that your JVM is running out of heap memory: either too many objects have been instantiated, or there are memory leaks in the source code. Hope this helps! Cheers!! Adam Shuy, President ePacific Web Design & Hos
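Before hunting for a leak, it can help to confirm how much heap the parsing JVM actually has. The sketch below is illustrative only (the class name `HeapReport` is hypothetical, not part of Nutch); it uses the standard `java.lang.Runtime` methods to report used, reserved, and maximum heap. The maximum is normally governed by the JVM's `-Xmx` option.

```java
/** Hypothetical helper: reports JVM heap figures as a first check after an OutOfMemoryError. */
public class HeapReport {
    public static String report() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        long max = rt.maxMemory();     // upper bound the JVM will try to use (Long.MAX_VALUE if unlimited)
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // unused portion of the reserved heap
        long used = total - free;
        return String.format("used=%dMB total=%dMB max=%dMB", used / mb, total / mb, max / mb);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

If `max` is small relative to the pages being parsed, raising `-Xmx` is the cheap experiment; if the OOM persists on a single URL regardless of heap size, the input itself is the more likely cause.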

[jira] Commented: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513019 ] Andrzej Bialecki commented on NUTCH-515: - +1 - sorry for the mess up ... > Next fetch time is set incorrectl

[jira] Commented: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513040 ] Doğacan Güney commented on NUTCH-515: - With more than a hundred config options, and with the way we use Hadoop's

[jira] Commented: (NUTCH-506) Nutch should delegate compression to Hadoop

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513044 ] Doğacan Güney commented on NUTCH-506: - If there are no objections, I am going to commit this one. Just to get mor

Re: OOM error during parsing with nekohtml

2007-07-16 Thread Shailendra Mudgal
Hi all, thanks for your suggestions. I am running parse on a single URL (http://www.fotofinity.com/cgi-bin/homepages.cgi). For other URLs, parse works perfectly. We are getting this error because of the HTML of the page. The page contains many anchor tags which are not closed properly. Hence ne
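To check whether a suspect page really has many unclosed anchors, a rough count of `<a ...>` start tags versus `</a>` end tags can be taken before handing it to the parser. This is a hypothetical diagnostic (`UnclosedAnchorCheck` is not part of Nutch or NekoHTML) and a naive regex scan: it does not skip comments, scripts, or CDATA, and it says nothing about how NekoHTML itself balances the tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: counts <a> start tags that lack a matching </a>, via a naive regex scan. */
public class UnclosedAnchorCheck {
    // "<a" followed by whitespace+attributes or directly by ">", so tags like <abbr> are not counted.
    private static final Pattern OPEN = Pattern.compile("<a(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);
    private static final Pattern CLOSE = Pattern.compile("</a\\s*>", Pattern.CASE_INSENSITIVE);

    private static int count(Pattern p, String html) {
        Matcher m = p.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Returns start tags minus end tags, floored at zero. */
    public static int unclosedAnchors(String html) {
        return Math.max(0, count(OPEN, html) - count(CLOSE, html));
    }

    public static void main(String[] args) {
        String html = "<a href=\"/1\">one <a href=\"/2\">two <a href=\"/3\">three</a>";
        System.out.println(unclosedAnchors(html)); // 2
    }
}
```

A large count on the failing URL and a near-zero count on pages that parse fine would support the diagnosis in this message; it is a triage aid, not a fix.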

[jira] Resolved: (NUTCH-515) Next fetch time is set incorrectly

2007-07-16 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney resolved NUTCH-515. - Resolution: Fixed Assignee: Doğacan Güney Patch committed in rev. 556824. > Next fetch time

Re: OOM error during parsing with nekohtml

2007-07-16 Thread Doğacan Güney
Hi, On 7/17/07, Shailendra Mudgal <[EMAIL PROTECTED]> wrote: Hi all, Thanks for your suggestions. I am running parse on a single url ( http://www.fotofinity.com/cgi-bin/homepages.cgi). For other urls, parse works perfectly. we are getting this error because of the html of the page. The page co