Dear R-List,

I'm trying to read a UTF-8-encoded text file, which works fine under the following configuration:
#####################################################################
### CONFIG 1
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

(running under Windows Server 2008)

### RESULT:
> read.csv2("example.utf", fileEncoding="UTF-8")
  VARIABLE        LABEL ORDER_IN_PROFILE
1        A  Umlauts:äüö               45
2        B Umlauts:äüöß               35
>
#####################################################################

The exact same command executed under R 2.14.0 (running under Windows 7) gives a different output:

#####################################################################
### CONFIG 2
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.14.0
>
### RESULT:
> read.csv2("example.utf", fileEncoding="UTF-8") #same command
[1] X.
<0 rows> (or 0-length row.names)
Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  invalid input found on input connection 'example.utf'
2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'example.utf'
>
## same results with
> read.csv2("example.utf", fileEncoding="UCS-2LE")
> read.csv2("example.utf", fileEncoding="UTF-16LE")

If I specify "encoding" instead of "fileEncoding", the non-ASCII characters are displayed correctly, but the UTF-8 byte order mark (BOM, U+FEFF) apparently is not stripped and ends up in the first column name:

### RESULT:
> read.csv2("example.utf", encoding="UTF-8")
  X.U.FEFF.VARIABLE        LABEL ORDER_IN_PROFILE
1                 A  Umlauts:äüö               45
2                 B Umlauts:äüöß               35
>
######################################################################

Any hints on what I could do to get the results from config 1 under config 2?

Many thanks in advance,
Christian
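A minimal sketch of one possible workaround, assuming the problem is the UTF-8 byte order mark (bytes EF BB BF, which shows up as U+FEFF in the column name above): read the file as raw bytes, drop a leading BOM if present, and hand the decoded lines to read.csv2() through a text connection. The helper name read_csv2_strip_bom() and the demo file contents are hypothetical and only mimic the structure of example.utf. paste() is used instead of paste0(), since paste0() only appeared in R 2.15.0.

```r
# Hypothetical helper: read a semicolon-separated UTF-8 file that may start
# with a byte order mark (BOM), removing the BOM before parsing.
read_csv2_strip_bom <- function(path, ...) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))        # UTF-8 encoding of U+FEFF
  if (length(bytes) >= 3 && all(bytes[1:3] == bom)) {
    bytes <- bytes[-(1:3)]                  # drop the BOM
  }
  txt <- rawToChar(bytes)
  Encoding(txt) <- "UTF-8"                  # mark (not convert) the text as UTF-8
  lines <- strsplit(txt, "\r?\n")[[1]]      # handle both LF and CRLF line endings
  tc <- textConnection(lines)               # marked strings are translated to the native encoding
  on.exit(close(tc))
  read.csv2(tc, ...)
}

# Demo file with assumed contents, written with a leading BOM
tmp <- tempfile(fileext = ".utf")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw(enc2utf8(paste(
  "VARIABLE;LABEL;ORDER_IN_PROFILE\n",
  "A;Umlauts:\u00e4\u00fc\u00f6;45\n",
  "B;Umlauts:\u00e4\u00fc\u00f6\u00df;35\n", sep = ""))), con)
close(con)

dat <- read_csv2_strip_bom(tmp)
names(dat)  # "VARIABLE" "LABEL" "ORDER_IN_PROFILE"
```

Extra arguments such as header or stringsAsFactors are passed through to read.csv2() unchanged. In R 3.0.0 and later, read.csv2("example.utf", fileEncoding = "UTF-8-BOM") should make a helper like this unnecessary.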
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.