Re: Updating Tika in Nutch

2011-07-20 Thread Mattmann, Chris A (388J)
Sorry guys, I'm nutters! :) Cheers, Chris On Jul 20, 2011, at 1:39 AM, Julien Nioche wrote: Glad you managed to get it to work. I don't know what Chris meant by that; I can't see why we'd open a JIRA when we are already using the latest version. Julien On 20 July 2011 08:19, Fernando Arreola

Re: Updating Tika in Nutch

2011-07-19 Thread Mattmann, Chris A (388J)
Hey Fernando, Would be great to get a JIRA issue and patch to bring Nutch 1.4-branch up to date with the latest Tika based on your experience. Thanks for your help! Cheers, Chris On Jul 19, 2011, at 4:48 PM, Fernando Arreola wrote: Hi, You were right, it is enough to provide the right

Re: Updating Tika in Nutch

2011-07-13 Thread Julien Nioche
You probably need to make sure that conf/tika-mimetypes.xml is the version you've modified and contains the clues for detecting afm files. BTW out of curiosity why did you have to modify tika-core.jar? Isn't it enough to provide the clues in tika-mimetypes.xml? Jul On 13 July 2011 01:16,

Re: Updating Tika in Nutch

2011-07-13 Thread Fernando Arreola
I did update the runtime/local/conf/tika-mimetypes.xml and my changes are there. I looked at the code for the ParserChecker and it seems to be doing its own content type detection using a Protocol call, so I am trying to set up Solr in hopes that it would work there (having some Unix memory issues

Updating Tika in Nutch

2011-07-12 Thread Fernando Arreola
Hello, I have made some additions (a new parser) to the Apache Tika application and I am trying to see if I can run my new changes using the crawl mechanism in Nutch, but I am having some trouble updating Nutch with my modified Tika application. The Tika updates I made run fine if I run Tika as

Re: Updating Tika in Nutch

2011-07-12 Thread lewis john mcgibbney
Hi Fernando, One point for me to mention which I did not pick up from your post. Did you rebuild your Nutch dist after making the changes to include your new parser? I know that this is a pretty simple suggestion but hopefully it might be the right one. Also can you please provide more details

Re: Updating Tika in Nutch

2011-07-12 Thread Fernando Arreola
Hello, Thanks for the replies. I have started trying to use Nutch 1.3 after your suggestions, especially since I am using Tika 0.9, but I am not getting anywhere with it. I am able to build fine but whenever I try to run any command it gives the error stating that it cannot find C:\Program. For

Re: Updating Tika in Nutch

2011-07-12 Thread Fernando Arreola
Thanks, I really appreciate all the help. I used the ParserChecker and I could see the metadata my parser extracted! I have one more question though: I could only see the metadata my parser extracted if I used the -forceAs mimetype option. Otherwise it is detected as a text/plain file and my