Dear all,

I'd like to use R to read in data from the web. I need some help finding an
efficient way to strip the HTML tags and reformat the data as a data.frame
to analyze in R.

I'm currently using readLines() to read in the HTML and then grep() to
isolate the block of code I want from each page, but this may not be the
best approach.

A short example:
x1 <- readLines("http://www.nascar.com/races/cup/2007/1/data/standings_official.html", n = -1)

# find the lines where the table starts and ends
grep1 <- grep("<table", x1, value = FALSE)
grep2 <- grep("</table>", x1, value = FALSE)

# keep only the block between them (this assumes a single <table> on the page)
block1 <- x1[grep1:grep2]
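
One way I can imagine continuing from block1 is to strip the tags with gsub()
and split each row into cells, roughly like the sketch below. It assumes each
table row sits on its own line of the source, which probably won't hold for
every page:

# Rough sketch: pull out the <tr> lines, mark cell boundaries, drop the
# remaining tags, and bind the rows into a data.frame.
# Rows with a different number of cells (e.g. a header row) would need
# extra handling.
rows <- block1[grep("<tr", block1)]
cells <- lapply(rows, function(r) {
    r <- gsub("</t[dh]>", "\t", r)   # turn cell-closing tags into separators
    r <- gsub("<[^>]+>", "", r)      # strip any remaining tags
    strsplit(sub("\t$", "", r), "\t")[[1]]
})
standings <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)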


It seems like there should be a straightforward solution for extracting a
data.frame from the HTML (especially since the data is already formatted as
a table), but I haven't had any luck in my searches so far. Ultimately I'd
like to compile several datasets from multiple webpages and websites, and
I'm optimistic that I can use R to automate the process. If someone could
point me in the right direction, that would be fantastic.

Many thanks in advance,
Ethan



Ethan Pew
Doctoral Candidate, Marketing
Leeds School of Business
University of Colorado at Boulder
