[jira] [Updated] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-03-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1949: Assignee: Lewis John McGibbney (was: Giuseppe Totaro) > Dump out the Nutch data into the C

[jira] [Updated] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-03-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1949: Component/s: tool, storage, linkdb, crawldb

[jira] [Updated] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-03-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-1949: Fix Version/s: 1.10 > Dump out the Nutch data into the Common Crawl format

[jira] [Updated] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-02-27 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giuseppe Totaro updated NUTCH-1949: Attachment: CommonCrawlDataDumper_v02.pdf, CommonCrawlDataDumper.xlsx Hi all, I

[jira] [Updated] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-02-25 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giuseppe Totaro updated NUTCH-1949: Attachment: CommonCrawlDataDumper.pdf You can find in attachment my workflow diagram. I will u

[jira] [Updated] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-02-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1949: Assignee: Giuseppe Totaro > Dump out the Nutch data into the Common Crawl format