[jira] [Updated] (NUTCH-882) Design a Host table in GORA

2012-04-20 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-882: Attachment: NUTCH-882-v3.txt NUTCH-882-v3.txt New version of patch. (On behalf of Math

[jira] [Updated] (NUTCH-1340) Increase scalability by only removing markers when they actually exist for DbUpdaterReducer

2012-04-18 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1340: Attachment: NUTCH-1340-v1.txt > Increase scalability by only removing markers when they actuall

[jira] [Updated] (NUTCH-1314) Impose a limit on the length of outlink target urls

2012-03-16 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1314: Attachment: NUTCH-1314.patch > Impose a limit on the length of outlink target urls > --

[jira] [Updated] (NUTCH-1312) Nutchgora to send HTTP-accept header

2012-03-16 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1312: Attachment: NUTCH-1312.patch > Nutchgora to send HTTP-accept header > -

[jira] [Updated] (NUTCH-841) Nutch 2.0 webapp

2012-03-08 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-841: Priority: Major (was: Blocker) > Nutch 2.0 webapp > > > Key: NUT

[jira] [Updated] (NUTCH-1302) nutchgora job failures should be noticed by submitter

2012-03-06 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1302: Attachment: NUTCH-1302.patch > nutchgora job failures should be noticed by submitter >

[jira] [Updated] (NUTCH-1289) In distributed mode URL's are not partitioned

2012-03-05 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1289: Attachment: NUTCH-1289-v2.patch Done with patch v2. It fixes the problem as described above. It als

[jira] [Updated] (NUTCH-1295) nutchgora restlet dependencies failing when remote repos is down

2012-03-02 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1295: Attachment: NUTCH-1295.patch > nutchgora restlet dependencies failing when remote repos is down

[jira] [Updated] (NUTCH-965) Skip parsing for truncated documents

2012-02-22 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-965: Attachment: NUTCH-965-v3-trunk.txt NUTCH-965-v3-nutchgora.txt > Skip parsing for t

[jira] [Updated] (NUTCH-1286) Refactoring/reimplementing crawling API (NutchApp)

2012-02-20 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1286: Useful wiki http://wiki.apache.org/nutch/NutchAdministrationUserInterface > Refactori

[jira] [Updated] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain

2012-02-16 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1280: Attachment: NUTCH-1280.txt > language-identifier should have option to use detected value by Ti

[jira] [Updated] (NUTCH-1280) language-identifier should have option to use detected value by Tika even when uncertain

2012-02-16 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1280: Priority: Minor (was: Major) > language-identifier should have option to use detected value by

[jira] [Updated] (NUTCH-1279) Check if limit has been reached in GeneraterReducer must be the first check performance-wise.

2012-02-15 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1279: Attachment: NUTCH-1279.txt Attached patch and committed. > Check if limit has been

[jira] [Updated] (NUTCH-1263) FetcherJob must put 'fetchTime' on input

2012-01-31 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1263: Attachment: NUTCH-1263.patch > FetcherJob must put 'fetchTime' on input > -

[jira] [Updated] (NUTCH-1255) Change ivy.xml of all plugins to remove "nutch.root" property

2012-01-23 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1255: Attachment: NUTCH-1255-trunk.patch NUTCH-1255.patch > Change ivy.xml of all plu

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2-incubating in ivy/ivy.xml

2012-01-20 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: (was: NUTCH-1205-v3.patch) > Upgrade gora modules to 0.2-incubating in ivy/ivy.

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2-incubating in ivy/ivy.xml

2012-01-20 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: NUTCH-1205-v3.patch Reattaching because of "grant license to ASF".. >

[jira] [Updated] (NUTCH-1205) Upgrade gora modules to 0.2-incubating in ivy/ivy.xml

2012-01-20 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1205: Attachment: NUTCH-1205-v3.patch There, this patch fixes the two "createDataStore" compile errors. (

[jira] [Updated] (NUTCH-1189) add commented out default settings to gora.properties files

2011-11-14 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1189: Attachment: NUTCH-1189-v3.patch Hi Lewis, I took the liberty of adding the HBaseStore documentatio

[jira] [Updated] (NUTCH-1148) Nutchgora job jar functionality is broken: PluginManifestParser cannot load plugins from system classloader.

2011-11-14 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1148: Summary: Nutchgora job jar functionality is broken: PluginManifestParser cannot load plugins from

[jira] [Updated] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-14 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1196: Attachment: NUTCH-1196-v2.patch New version of patch available. Changes from previous one: bq. "re

[jira] [Updated] (NUTCH-1198) Less verbose logging when unmapped mimetypes are trying to be parsed.

2011-11-04 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1198: Attachment: NUTCH-1198.patch > Less verbose logging when unmapped mimetypes are trying to be pa

[jira] [Updated] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-04 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1196: Attachment: NUTCH-1196.patch Patch done. It applies the db.update.max.inlinks just like Nutch trunk

[jira] [Updated] (NUTCH-1196) Update job should impose an upper limit on the number of inlinks (nutchgora)

2011-11-04 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1196: Patch Info: Patch Available > Update job should impose an upper limit on the number of inlinks

[jira] [Updated] (NUTCH-1192) Add '/runtime' to svn ignore

2011-11-02 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1192: Fix Version/s: (was: 1.5) 1.4 > Add '/runtime' to svn ignore > -

[jira] [Updated] (NUTCH-1191) Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse

2011-11-01 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1191: Component/s: fetcher > Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse > ---

[jira] [Updated] (NUTCH-1191) Port NUTCH-1102 to nutchgora - consistent use of fetcher.parse

2011-11-01 Thread Ferdy Galema (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1191: Attachment: NUTCH-1191.patch Patch replaces all references with 'parse' argument to the 'fetcher.pa