[jira] [Created] (NUTCH-1269) Generate main problems

2012-02-08 Thread behnam nikbakht (Created) (JIRA)
Generate main problems -- Key: NUTCH-1269 URL: https://issues.apache.org/jira/browse/NUTCH-1269 Project: Nutch Issue Type: Improvement Components: generator Affects Versions: 1.4 Environment:

[jira] [Created] (NUTCH-1270) some of Deflate encoded pages not fetched

2012-02-08 Thread behnam nikbakht (Created) (JIRA)
some of Deflate encoded pages not fetched - Key: NUTCH-1270 URL: https://issues.apache.org/jira/browse/NUTCH-1270 Project: Nutch Issue Type: Bug Components: fetcher Affects Versions: 1.4
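
The snippet above does not say why the Deflate-encoded pages fail. As a hedged illustration only, one cause seen in HTTP clients generally (an assumption, not confirmed for NUTCH-1270) is that some servers send a raw DEFLATE stream instead of the zlib-wrapped one. The sketch below is not Nutch code; the helper name `inflate` is hypothetical, and it simply tries both forms:

```python
import zlib


def inflate(body: bytes) -> bytes:
    """Decompress a 'Content-Encoding: deflate' body.

    Hypothetical helper, not part of Nutch. It assumes the failure mode
    is a raw DEFLATE stream sent without the zlib header.
    """
    try:
        # Standard case: zlib-wrapped DEFLATE data.
        return zlib.decompress(body)
    except zlib.error:
        # Fallback: raw DEFLATE stream with no zlib header or checksum.
        return zlib.decompress(body, -zlib.MAX_WBITS)
```

Trying the zlib-wrapped form first keeps the common path unchanged; the fallback only runs when the header check fails.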

[jira] [Commented] (NUTCH-1269) Generate main problems

2012-02-08 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203453#comment-13203453 ] Lewis John McGibbney commented on NUTCH-1269: - Hi Behnam. Can you please

[jira] [Created] (NUTCH-1271) Fix errors @ compile time

2012-02-08 Thread Lewis John McGibbney (Created) (JIRA)
Fix errors @ compile time - Key: NUTCH-1271 URL: https://issues.apache.org/jira/browse/NUTCH-1271 Project: Nutch Issue Type: Improvement Components: build Affects Versions: nutchgora, 1.5

[jira] [Updated] (NUTCH-1269) Generate main problems

2012-02-08 Thread behnam nikbakht (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] behnam nikbakht updated NUTCH-1269: --- Attachment: NUTCH-1269.patch Generate main problems --

[jira] [Updated] (NUTCH-1269) Generate main problems

2012-02-08 Thread behnam nikbakht (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] behnam nikbakht updated NUTCH-1269: --- Patch Info: Patch Available yes, thanks for your attention Generate main

[jira] [Commented] (NUTCH-1269) Generate main problems

2012-02-08 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203488#comment-13203488 ] Markus Jelsma commented on NUTCH-1269: -- It won't patch for trunk, all hunks fail.

[jira] [Commented] (NUTCH-1269) Generate main problems

2012-02-08 Thread behnam nikbakht (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203494#comment-13203494 ] behnam nikbakht commented on NUTCH-1269: I am using Nutch-1.3, and I know about

[jira] [Commented] (NUTCH-1269) Generate main problems

2012-02-08 Thread Markus Jelsma (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203501#comment-13203501 ] Markus Jelsma commented on NUTCH-1269: -- Ah, yes, I understand now. Your patch is an

[jira] [Commented] (NUTCH-1270) some of Deflate encoded pages not fetched

2012-02-08 Thread Lewis John McGibbney (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203515#comment-13203515 ] Lewis John McGibbney commented on NUTCH-1270: - Hi Benham, again thanks for

Fwd: Mandatory svnpubsub migration by Jan 2013

2012-02-08 Thread Lewis John Mcgibbney
Hi, Can anyone comment where we lie with this? I really don't have a clue. Thanks Lewis -- Forwarded message -- From: Joe Schaefer joe_schae...@yahoo.com Date: Wed, Feb 8, 2012 at 12:26 PM Subject: Mandatory svnpubsub migration by Jan 2013 To: Apache Infrastructure

tika-core, tika-parser

2012-02-08 Thread Markus Jelsma
Hi, Can anyone shed light on this? We don't have any parsers in our libs dir and we don't have tika-parsers jar, only the tika-core jar. Where are the parsers and how does this all work? I've posted a question (same subject) on the Tika list and Nick tells me there must be parsers somewhere.

Re: tika-core, tika-parser

2012-02-08 Thread Lewis John Mcgibbney
Hi Markus, For starters http://svn.apache.org/viewvc/nutch/trunk/src/plugin/parse-tika/ivy.xml?view=markup Can we pick our way through this? Thanks On Wed, Feb 8, 2012 at 12:50 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Can anyone shed light on this? We don't have any parsers

Re: Mandatory svnpubsub migration by Jan 2013

2012-02-08 Thread Julien Nioche
The Nutch site is already based on svnpubsub. On 8 February 2012 12:40, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi, Can anyone comment where we lie with this? I really don't have a clue. Thanks Lewis -- Forwarded message -- From: Joe Schaefer

Re: tika-core, tika-parser

2012-02-08 Thread Julien Nioche
The dependencies for the plugins are defined locally as shown in the URL below, where you can see the ref to tika-parsers for parse-tika. Is that clearer for you, Markus? On 8 February 2012 12:58, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote: Hi Markus, For starters

Re: tika-core, tika-parser

2012-02-08 Thread Markus Jelsma
Yes, it's listed there indeed! But where are the parser impls then? I'll check this out. I must be getting crazy or something! On Wednesday 08 February 2012 13:58:46 Lewis John Mcgibbney wrote: Hi Markus, For starters

Re: tika-core, tika-parser

2012-02-08 Thread Markus Jelsma
Yes, it looks like it! It should also be upgraded to Tika 1.0. But that's something else. dependencies, dependencies, dependencies :( On Wednesday 08 February 2012 14:04:26 Julien Nioche wrote: The dependencies for the plugins are defined locally as shown in the URL below, where you can

Re: tika-core, tika-parser

2012-02-08 Thread Julien Nioche
sorry don't understand what your issue is. We have a dependency on tika-parsers and the actual parser implementations (listed in tika parsers' POM) are pulled transitively just like any other dependency managed by Ivy. They end up being copied in runtime/local/plugins/parse-tika/ or put in the

Re: tika-core, tika-parser

2012-02-08 Thread Markus Jelsma
On Wednesday 08 February 2012 14:22:36 Julien Nioche wrote: sorry don't understand what your issue is. We have a dependency on tika-parsers and the actual parser implementations (listed in tika parsers' POM) are pulled transitively just like any other dependency managed by Ivy. They end up

Finding specific file types only -- *.ics files

2012-02-08 Thread Peter Jameson
Hi, I'm interested in using Nutch to crawl certain websites looking for only a specific file type; in my case, I'm looking for any URL that ends in .ics. I don't need to parse the ics files, I just need to know all the .ics files that exist. A list of links would be great. Can
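
The filtering step described above (keep only links whose URL ends in .ics) can be sketched independently of Nutch. This is a minimal illustration, assuming the crawled links are already available as a plain list of URL strings; the function name `ics_links` and the sample URLs are hypothetical, and no Nutch API is used:

```python
from urllib.parse import urlsplit


def ics_links(urls):
    """Return the URLs whose path ends in .ics (case-insensitive).

    Hypothetical helper: the query string and fragment are ignored,
    so 'cal.ics?download=1' still counts as an .ics link.
    """
    return [u for u in urls if urlsplit(u).path.lower().endswith(".ics")]


if __name__ == "__main__":
    # Made-up sample links, for illustration only.
    sample = [
        "http://example.org/cal/events.ics",
        "http://example.org/index.html",
        "http://example.org/a.ICS?x=1",
    ]
    for link in ics_links(sample):
        print(link)
```

Matching on the parsed path rather than the whole string avoids missing links that carry query parameters after the file name.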

Re: tika-core, tika-parser

2012-02-08 Thread Ken Krugler
On Feb 8, 2012, at 5:28am, Markus Jelsma wrote: On Wednesday 08 February 2012 14:22:36 Julien Nioche wrote: sorry don't understand what your issue is. We have a dependency on tika-parsers and the actual parser implementations (listed in tika parsers' POM) are pulled transitively just