[Nutch-dev] [jira] Commented: (NUTCH-526) Use a combiner in LinkDbMerger to improve the performance as in LinkDb

2007-07-24 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515219 ] Doğacan Güney commented on NUTCH-526: - Do you have some numbers on how much faster it is? Or what the reduction in inp

[Nutch-dev] [jira] Created: (NUTCH-526) Use a combiner in LinkDbMerger to improve the performance as in LinkDb

2007-07-24 Thread Emmanuel Joke (JIRA)
Use a combiner in LinkDbMerger to improve the performance as in LinkDb - Key: NUTCH-526 URL: https://issues.apache.org/jira/browse/NUTCH-526 Project: Nutch Issue Type: Improv

[Nutch-dev] [jira] Updated: (NUTCH-526) Use a combiner in LinkDbMerger to improve the performance as in LinkDb

2007-07-24 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Joke updated NUTCH-526: Attachment: NUTCH-526.patch patch provided > Use a combiner in LinkDbMerger to improve the performan

[Nutch-dev] [jira] Updated: (NUTCH-25) needs 'character encoding' detector

2007-07-24 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cook updated NUTCH-25: --- Attachment: EncodingDetector.java I cleaned up EncodingDetector a little; here's a functionally identical, but

[Nutch-dev] [jira] Updated: (NUTCH-25) needs 'character encoding' detector

2007-07-24 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cook updated NUTCH-25: --- Attachment: (was: EncodingDetector.java) > needs 'character encoding' detector > --

[Nutch-dev] [jira] Commented: (NUTCH-524) Generate Problem with Single Node

2007-07-24 Thread Ian Holsman (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515041 ] Ian Holsman commented on NUTCH-524: --- Hi Dogacan. We need this setting as we have the situation where we have a sing

[Nutch-dev] [jira] Updated: (NUTCH-25) needs 'character encoding' detector

2007-07-24 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cook updated NUTCH-25: --- Attachment: EncodingDetector.java patch > needs 'character encoding' detector > ---

[Nutch-dev] [jira] Commented: (NUTCH-25) needs 'character encoding' detector

2007-07-24 Thread Doug Cook (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515026 ] Doug Cook commented on NUTCH-25: OK, I've got more data, and a proposed solution. I created a test set with a number o

[Nutch-dev] [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515008 ] Doğacan Güney commented on NUTCH-525: - Both patches look good to me, +1. > DeleteDuplicates generates ArrayIndexO

[Nutch-dev] [jira] Updated: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread Vishal Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Shah updated NUTCH-525: -- Attachment: RededupUnitTest.patch I have modified the existing junit test for DeleteDuplicates to test f

[Nutch-dev] [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread Vishal Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514910 ] Vishal Shah commented on NUTCH-525: --- Hi, I'll add a unit test. For the undelete thing, the need could arise i

[Nutch-dev] [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514915 ] Doğacan Güney commented on NUTCH-525: - OK, I can see why undelete is useful. But I still think that we should make

[Nutch-dev] [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514914 ] Andrzej Bialecki commented on NUTCH-525: - +1 for adding undeleteAll(). When DDRecordReader was created, this

[Nutch-dev] [jira] Commented: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514903 ] Doğacan Güney commented on NUTCH-525: - Nice patch. Could you also add a unit test? It is enough if you add a test

[Nutch-dev] [jira] Updated: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread Vishal Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Shah updated NUTCH-525: -- Attachment: deleteDups.patch Patch for the bug attached here. > DeleteDuplicates generates ArrayIndexOu

[Nutch-dev] [jira] Created: (NUTCH-525) DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment

2007-07-24 Thread Vishal Shah (JIRA)
DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun dedup on a segment - Key: NUTCH-525 URL: https://issues.apache.org/jira/browse/NUTCH-525