I used the readDOC function in tm.
After storing the document locally on a Windows PC...
langren.sp.path <- "C:\\text\\" #store file by itself in this directory
langren.corpus <- (Corpus(DirSource(langren.sp.path), readerControl =
list(reader
When I open that link in OpenOffice.org Writer and then save in "Text
encoded" format with "Unicode" encoding, the diacriticals (is that the
correct font-ish term?) seem to remain intact when reopened. When I
read that file in, not with scan() but with readLines(), here is what
I get for