[
https://issues.apache.org/jira/browse/NUTCH-526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515219
]
Doğacan Güney commented on NUTCH-526:
-
Do you have some numbers on how much faster it is? Or what the reduction in
inp
Use a combiner in LinkDbMerger to improve the performance as in LinkDb
-
Key: NUTCH-526
URL: https://issues.apache.org/jira/browse/NUTCH-526
Project: Nutch
Issue Type: Improv
[
https://issues.apache.org/jira/browse/NUTCH-526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Emmanuel Joke updated NUTCH-526:
Attachment: NUTCH-526.patch
patch provided
> Use a combiner in LinkDbMerger to improve the performan
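NUTCH-526 asks for the same combiner optimization that LinkDb already uses. The plain-Java sketch below shows roughly why a combiner helps. It is not the attached patch and does not use the Hadoop API. `CombinerSketch`, `Pair` and `combine` are hypothetical names. The sketch merges map-output records that share a URL key before the shuffle, so fewer records are sent to the reducers. It assumes inlink lists can simply be concatenated; the real LinkDb merge logic may also cap or deduplicate inlinks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of map-side combining; not Nutch or Hadoop code. */
public class CombinerSketch {

    /** One map output record: target URL -> inlinks pointing at it. */
    record Pair(String url, List<String> inlinks) {}

    /**
     * Groups records with the same URL key and concatenates their inlink
     * lists, as a combiner would do on one mapper's output before the shuffle.
     * Assumption: plain concatenation, no capping or deduplication.
     */
    static List<Pair> combine(List<Pair> mapOutput) {
        // LinkedHashMap keeps keys in first-seen order for predictable output.
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Pair p : mapOutput) {
            merged.computeIfAbsent(p.url(), k -> new ArrayList<>()).addAll(p.inlinks());
        }
        List<Pair> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : merged.entrySet()) {
            out.add(new Pair(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = List.of(
            new Pair("http://a.example/", List.of("http://x.example/")),
            new Pair("http://b.example/", List.of("http://y.example/")),
            new Pair("http://a.example/", List.of("http://z.example/")));
        List<Pair> combined = combine(mapOutput);
        // Fewer records here means less data shuffled to the reducers.
        System.out.println(mapOutput.size() + " records before, " + combined.size() + " after");
    }
}
```

Running `main` prints `3 records before, 2 after`: two of the three records share a key, so the combiner emits a single record for them. Actual savings depend on how often the same URL appears within one mapper's output. That is why numbers from a real crawl would be the useful measure.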
[
https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doug Cook updated NUTCH-25:
---
Attachment: EncodingDetector.java
I cleaned up EncodingDetector a little; here's a functionally identical, but
[
https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doug Cook updated NUTCH-25:
---
Attachment: (was: EncodingDetector.java)
> needs 'character encoding' detector
> --
[
https://issues.apache.org/jira/browse/NUTCH-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515041
]
Ian Holsman commented on NUTCH-524:
---
Hi Dogacan.
we need this setting as we have the situation where we have a sing
[
https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doug Cook updated NUTCH-25:
---
Attachment: EncodingDetector.java
patch
> needs 'character encoding' detector
> ---
[
https://issues.apache.org/jira/browse/NUTCH-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515026
]
Doug Cook commented on NUTCH-25:
OK, I've got more data, and a proposed solution.
I created a test set with a number o
[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515008
]
Doğacan Güney commented on NUTCH-525:
-
Both patches look good to me, +1.
> DeleteDuplicates generates ArrayIndexO
[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vishal Shah updated NUTCH-525:
--
Attachment: RededupUnitTest.patch
I have modified the existing JUnit test for DeleteDuplicates to test f
[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514910
]
Vishal Shah commented on NUTCH-525:
---
Hi,
I'll add a unit test.
For the undelete thing, the need could arise i
[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514915
]
Doğacan Güney commented on NUTCH-525:
-
OK, I can see why undelete is useful. But I still think that we should make
[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514914
]
Andrzej Bialecki commented on NUTCH-525:
-
+1 for adding undeleteAll(). When DDRecordReader was created, this
[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514903
]
Doğacan Güney commented on NUTCH-525:
-
Nice patch. Could you also add a unit test? It is enough if you add a test
[
https://issues.apache.org/jira/browse/NUTCH-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vishal Shah updated NUTCH-525:
--
Attachment: deleteDups.patch
Patch for the bug attached here.
> DeleteDuplicates generates ArrayIndexOu
DeleteDuplicates generates ArrayIndexOutOfBoundsException when trying to rerun
dedup on a segment
-
Key: NUTCH-525
URL: https://issues.apache.org/jira/browse/NUTCH-525