Dear all, I'd like to use R to read in data from the web. I need some help finding an efficient way to strip the HTML tags and reformat the data as a data.frame to analyze in R.
I'm currently using readLines() to read in the HTML code and then grep() to isolate the block of HTML code I want from each page, but this may not be the best approach. A short example:

x1 <- readLines("http://www.nascar.com/races/cup/2007/1/data/standings_official.html", n = -1)
grep1 <- grep("<table", x1, value = FALSE)
grep2 <- grep("</table>", x1, value = FALSE)
block1 <- x1[grep1:grep2]

It seems like there should be a straightforward way to extract a data.frame from the HTML code (especially since the data is already formatted as a table), but I haven't had any luck in my searches so far. Ultimately I'd like to compile several datasets from multiple webpages and websites, and I'm optimistic that I can use R to automate the process. If someone could point me in the right direction, that would be fantastic.

Many thanks in advance,
Ethan

Ethan Pew
Doctoral Candidate, Marketing
Leeds School of Business
University of Colorado at Boulder

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
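One way to go from the isolated block to a data.frame, sketched here in base R against a small hypothetical HTML fragment (the fragment, the regular expressions, and the variable names below are illustrative assumptions, not from the original post; the XML package's readHTMLTable() would be a more robust route for real-world markup):

# A small, well-formed HTML table standing in for the block1
# lines extracted with grep() above
html <- c(
  "<table>",
  "<tr><th>Rank</th><th>Driver</th><th>Points</th></tr>",
  "<tr><td>1</td><td>A</td><td>100</td></tr>",
  "<tr><td>2</td><td>B</td><td>90</td></tr>",
  "</table>"
)

# Keep only the table rows, then pull the cell contents out of each one
rows <- grep("<tr>", html, value = TRUE)
cells <- lapply(rows, function(r) {
  m <- gregexpr("<t[hd]>([^<]*)</t[hd]>", r)
  gsub("</?t[hd]>", "", regmatches(r, m)[[1]])
})

# First row holds the headers; the remaining rows become the data
df <- as.data.frame(do.call(rbind, cells[-1]), stringsAsFactors = FALSE)
names(df) <- cells[[1]]
df$Points <- as.numeric(df$Points)

This hand-rolled approach only works when each row sits on one line and the tags carry no attributes; pages in the wild rarely cooperate, which is why readHTMLTable() in the XML package (it parses the document properly and returns data.frames directly) is usually the better tool for the automation described above.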