Re: Plugin development in Eclipse (Re: [nutch 0.5] frames)

2005-07-08 Thread Philipp Suter
Stefan Groschupf wrote: Code can be found here: http://cvs.sourceforge.net/viewcvs.py/tiniplug/nutch-extractors/src/net/nutch/extractor/FlashExtractor.java?rev=1.3&view=markup libs can be found here: http://cvs.sourceforge.net/viewcvs.py/tiniplug/nutch-extractors/libs/ Please note that java

Re: Plugin development in Eclipse (Re: [nutch 0.5] frames)

2005-07-08 Thread Philipp Suter
Andrzej Bialecki wrote: Philipp Suter wrote: I would have some spare cycles from the end of July until the end of August, but I would need a short explanation of where and how to integrate the flash text extractor. Furthermore, is there any document whatsoever explaining the nutch design

Re: [nutch 0.5] frames

2005-07-08 Thread Philipp Suter
Andrzej Bialecki wrote: Have you ever thought about integrating a JavaScript interpreter into nutch? This could be another big step towards a wider range of crawlable websites. If you need any help on this I would be very much interested to support anybody (timewise) implementing such a

Re: [nutch 0.5] frames

2005-07-08 Thread Philipp Suter
Vacuum Joe wrote: Have you evaluated flash either? is it possible to parse it? Yes, definitely: http://swift-tools.net/Flash/ Obviously, it's a non-trivial amount of work to take the basic ideas from that and port it into Java. However, we're only interested in grabbing text, so it's

Re: [nutch 0.5] frames

2005-07-08 Thread Philipp Suter
Andrzej Bialecki wrote: Philipp Suter wrote: does anybody know how to crawl frames? Or how to extend nutch to be able to crawl frames? We are using the api. The development version (available from SVN) should handle frames just fine, i.e. it should follow the src=... attributes in frames
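
For readers wondering what "following the src=... attributes" amounts to, here is a minimal sketch in plain Java. It is not Nutch's parser and uses no Nutch API; the class name FrameLinkExtractor and the method extractFrameSources are hypothetical, and a regular expression is only an approximation of real HTML parsing. It simply collects the src values of frame and iframe tags so that a crawler could queue them as outlinks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: pulls src values out of
// <frame> and <iframe> tags so they can be treated as outlinks.
public class FrameLinkExtractor {

    // Matches <frame ...> or <iframe ...> (but not <frameset>) and captures
    // the src value, whether double-quoted, single-quoted, or unquoted.
    private static final Pattern FRAME_SRC = Pattern.compile(
        "<i?frame\\b[^>]*?\\ssrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    public static List<String> extractFrameSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            // Exactly one of the three capture groups is non-null per match.
            String src = m.group(1) != null ? m.group(1)
                       : m.group(2) != null ? m.group(2)
                       : m.group(3);
            sources.add(src);
        }
        return sources;
    }
}
```

The returned values may be relative, so a crawler would still have to resolve them against the URL of the page containing the frameset before fetching them.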

[nutch 0.5] frames

2005-07-07 Thread Philipp Suter
does anybody know how to crawl frames? Or how to extend nutch to be able to crawl frames? We are using the api. cheers ph