Re: RSS-fecter and index individul-how can i realize this function

2009-01-05 Thread Doğacan Güney
On Mon, Jan 5, 2009 at 7:00 AM, Vlad Cananau vlad...@gmail.com wrote: Hello, I'm trying to make RSSParser do something similar to FeedParser (which doesn't work quite right) - that is, instead of indexing the whole contents [...] Why doesn't FeedParser work? Let's fix whatever is broken in it :D

nutch segment format

2009-01-05 Thread Matt Pearson
Hi everyone, I'm looking into reading data from Nutch segments with PHP. Is there anywhere I can get information on the format in which the data is stored? Thanks, and apologies if this isn't the right place to ask this question. Matt Pearson

Re: nutch segment format

2009-01-05 Thread Todd Lipcon
Hi Matt, The Nutch segments are stored as Hadoop SequenceFiles and MapFiles. A MapFile is made up of multiple SequenceFiles. I'm not certain if the format is documented anywhere, but the source is in org.apache.hadoop.io. I doubt you'll find a PHP library for reading them, so you'll probably have
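For anyone porting a reader to another language such as PHP, the first step is usually to parse the SequenceFile header. The following is a minimal sketch of that step, not the Hadoop implementation. It assumes a file written with SequenceFile version 4 or later, where the header starts with the bytes SEQ, a one-byte version number, and then the key and value class names as length-prefixed UTF-8 strings. The function names are made up for illustration, and the sketch only handles class names shorter than 128 bytes (a single-byte length prefix).

```python
import io

MAGIC = b"SEQ"  # first three bytes of a Hadoop SequenceFile


def _read_short_string(stream):
    """Read a length-prefixed UTF-8 string whose length fits in one byte.

    Assumption: the length is below 128, so Hadoop's variable-length
    integer encoding occupies exactly one byte.
    """
    prefix = stream.read(1)
    if len(prefix) != 1:
        raise EOFError("header ended before a string length")
    length = prefix[0]
    if length > 127:
        # Longer strings use a multi-byte length prefix, which this
        # sketch deliberately does not decode.
        raise ValueError("multi-byte length prefix not supported here")
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("header ended inside a string")
    return data.decode("utf-8")


def read_sequencefile_header(stream):
    """Return the version and key/value class names from a binary stream."""
    if stream.read(3) != MAGIC:
        raise ValueError("not a SequenceFile: missing SEQ magic bytes")
    version_byte = stream.read(1)
    if len(version_byte) != 1:
        raise EOFError("header ended before the version byte")
    version = version_byte[0]
    if version < 4:
        # Assumption: older versions encode class names differently.
        raise ValueError("unsupported SequenceFile version: %d" % version)
    key_class = _read_short_string(stream)
    value_class = _read_short_string(stream)
    return {"version": version, "key_class": key_class, "value_class": value_class}
```

To try it on real data, open a file in binary mode and pass the file object to read_sequencefile_header. A MapFile is a directory holding two SequenceFiles named data and index, so the function would be pointed at one of those files rather than at the directory. The rest of the header (compression flags, metadata, sync marker) and the records themselves are not covered here; org.apache.hadoop.io.SequenceFile remains the reference for those.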

Re: RSS-fecter and index individul-how can i realize this function

2009-01-05 Thread Vlad Cananau
On Mon, Jan 5, 2009 at 12:32 PM, Doğacan Güney doga...@gmail.com wrote: On Mon, Jan 5, 2009 at 7:00 AM, Vlad Cananau vlad...@gmail.com wrote: Hello, I'm trying to make RSSParser do something similar to FeedParser (which doesn't work quite right) - that is, instead of indexing the whole

Re: RSS-fecter and index individul-how can i realize this function

2009-01-05 Thread Vlad Cananau
Doğacan Güney wrote: On Mon, Jan 5, 2009 at 7:00 AM, Vlad Cananau vlad...@gmail.com wrote: Hello, I'm trying to make RSSParser do something similar to FeedParser (which doesn't work quite right) - that is, instead of indexing the whole contents [...] Why doesn't FeedParser work? Let's fix

Site update

2009-01-05 Thread Otis Gospodnetic
Hello, Quick heads up - I'm about to regenerate the files (HTML + PDF) for the site and update it tomorrow according to the instructions on http://wiki.apache.org/nutch/Website_Update_HOWTO . I have Forrest 0.8, and the site files were last generated with Forrest 0.7, so there will be some

Re: Site update

2009-01-05 Thread Otis Gospodnetic
One more thing. Forrest 0.8 wouldn't generate site files without me making the following change (so I'll commit this, too, unless somebody thinks this is bad): $ svn diff src/site Index: src/site/forrest.properties === ---

Re: Site update

2009-01-05 Thread Otis Gospodnetic
Below is what it spits out. I'm not sure what the cause is. I did try forrest seed and forrest validate, as prescribed at https://issues.apache.org/jira/browse/FOR-984?focusedCommentId=12649593#action_12649593 , but forrest validate failed. validate-sitemap:

Re: Site update

2009-01-05 Thread Dennis Kubes
http://www.mail-archive.com/d...@forrest.apache.org/msg15136.html This might help. Dennis Andrzej Bialecki wrote: Otis Gospodnetic wrote: Below is what it spits out. I'm not sure what the cause is. I did try forrest seed forrest validate as prescribed at

Build failed in Hudson: Nutch-trunk #684

2009-01-05 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/684/changes -- [...truncated 6523 lines...] [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.95 sec init: init-plugin: deps-jar: init: init-plugin: deps-jar: compile: