Hi, I'm having some problems using scrAPI. I'm getting HTTPNoAccessError on certain URLs.

The program searches a page (http://en.wikiquote.org/wiki/List_of_films) for all of the links on it that go to pages with movie quotes on them. It then loops through the list, pulling out the details from each page using this method:

  def self.scrapemovies
    Scraper::Base.parser :html_parser
    urlarray = Movie.findurls

    moviescraper = Scraper.define do
      process "h1", :name => :text
      process "p:nth-child(4)", :description => :text
      result :description, :name
    end

    urlarray.each do |url|
      fullurl = "http://en.wikiquote.org#{url}"
      movieurl = URI.parse(fullurl)
      data = moviescraper.scrape(movieurl)
      movie = Movie.new
      movie.url = fullurl
      movie.name = data.name
      movie.description = data.description
      movie.save
    end
  end

This worked OK until it got to http://en.wikiquote.org/wiki/20,000_Leagues_Under_the_Sea, which gave me the HTTP error. I assumed that was because it had a comma in the URL.

As a bodge to get things working, I wrote a little bit of code in the Movie.findurls method that strips out any URLs containing commas or parentheses. But I'm still getting the error on this URL: http://en.wikiquote.org/wiki/27_Dresses. That is very odd, because the previous one, http://en.wikiquote.org/wiki/25th_Hour, worked fine. I can't see the difference between them. I've tried visiting the page manually, and it's fine.

I'm assuming that I need to do some sort of cleverer parsing on the URLs (so that I can include the ones with commas and parentheses too).

Has the Scraper::Base.parser :html_parser line got anything to do with it? I couldn't get the Tidy plugin to work properly, but I'm not sure that it has anything to do with the URL parsing anyway.

I'm totally stuck. Thanks in advance for any help.

Jules.
--
Posted via http://www.ruby-forum.com/.
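
One way to test the assumption that URL parsing is at fault is to run the three URLs from the post through URI.parse on its own, without scrAPI or any network access. The following is only a minimal sketch using Ruby's standard library; the helper name parsed_path is hypothetical and not part of scrAPI or of the original code.

```ruby
require "uri"

# URLs quoted in the post above.
URLS = [
  "http://en.wikiquote.org/wiki/20,000_Leagues_Under_the_Sea",
  "http://en.wikiquote.org/wiki/27_Dresses",
  "http://en.wikiquote.org/wiki/25th_Hour"
].freeze

# Hypothetical helper (not part of scrAPI): returns the path component
# if URI.parse accepts the string, or nil if it raises InvalidURIError.
def parsed_path(url)
  URI.parse(url).path
rescue URI::InvalidURIError
  nil
end

URLS.each do |url|
  puts "#{url} -> #{parsed_path(url).inspect}"
end
```

If every URL yields a path rather than nil, the comma is probably not being rejected at the parsing step, and the HTTP status that the server returns for the request would be the next thing to inspect.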