Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2013-02-22 Thread Lawr Eskin
I think that I will have to install Linux on a VM... By the way, is there a shorter way? Could you please advise how to rebuild the 'XML' package for R with the latest libxml sources? Who could do that? Or is it possible to build a new R package based on other parsers not written in C, such as ones based on PyPy, Erlang a

Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2013-02-22 Thread Milan Bouchet-Valat
On Thursday, 21 February 2013 at 18:53 +0400, Lawr Eskin wrote:
> I tried iconv before in various attempts: same issue and same result with encoding
> = unknown
> Now I tried sub: same issue
This procedure works on Linux, but not on Windows:
library(RCurl)
library(XML)
u <- "http://www.cian.ru/cat.php?deal_type=2&o

Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2013-02-21 Thread Lawr Eskin
Hi Milan,
a <- getURL(con, .encoding = "UTF-8")
Encoding(a)
> [1] "UTF-8"
a # Here the UTF-8 codes look fine.
htmlParse(a, encoding = "UTF-8") ### again the same encoding issue
>> why didn't getURL() detect and set a's encoding correctly?
I think there is an issue with the page, because other sites work

Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2013-02-21 Thread Lawr Eskin
I tried iconv before in various attempts: same issue and same result with encoding = unknown.
Now I tried sub: same issue.
2013/2/21 Milan Bouchet-Valat
> On Thursday, 21 February 2013 at 18:31 +0400, Lawr Eskin wrote:
> > Hi Milan,
> >
> > a <- getURL(con, .encoding = "UTF-8")
> > Encoding(a)
> > > [1] "UTF-8"
>

Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2013-02-21 Thread Lawr Eskin
Hi Milan!
> Encoding(a)
[1] "unknown"
2013/2/21 Milan Bouchet-Valat
>> On Thursday, 21 February 2013 at 13:16 +0400, Lawr Eskin wrote:
>> > Hello dear R-help mailing list.
>> >
>> > Looks like the same issue in Russian:
>> >
>> > library(RCurl)
>> > library(XML)
>> >
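The "unknown" result above means the string carries no declared encoding, so R treats its bytes as being in the native locale. As a minimal sketch of the difference between declaring and converting an encoding, assuming the bytes really are valid UTF-8 (the sample bytes and all variable names below are hypothetical, not taken from the thread):

```r
# Sketch only: sample bytes and variable names are illustrative assumptions.
# UTF-8 bytes for a four-letter Hebrew word.
bytes <- as.raw(c(0xd7, 0xa9, 0xd7, 0x9c, 0xd7, 0x95, 0xd7, 0x9d))

# rawToChar() builds a string with no declared encoding.
x <- rawToChar(bytes)
before <- Encoding(x)

# Declaring the encoding only sets a mark; the bytes are left untouched.
Encoding(x) <- "UTF-8"
after <- Encoding(x)
same_bytes <- identical(charToRaw(x), bytes)

# iconv() actually converts between encodings; it returns NA if the
# input is not valid in the 'from' encoding.
y <- iconv(x, from = "UTF-8", to = "UTF-8")
```

Declaring the mark is only appropriate when the bytes are already in that encoding; if they are in some other encoding, a conversion with iconv() from the true source encoding is what would be needed.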

Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2013-02-21 Thread Milan Bouchet-Valat
On Thursday, 21 February 2013 at 18:31 +0400, Lawr Eskin wrote:
> Hi Milan,
>
> a <- getURL(con, .encoding = "UTF-8")
> Encoding(a)
> > [1] "UTF-8"
> a # Here the UTF-8 codes look fine.
> htmlParse(a, encoding = "UTF-8") ### again the same encoding issue
And what if you try this:
a2 <- htmlPars

Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2013-02-21 Thread Milan Bouchet-Valat
On Thursday, 21 February 2013 at 13:16 +0400, Lawr Eskin wrote:
> Hello dear R-help mailing list.
>
> Looks like the same issue in Russian:
>
> library(RCurl)
> library(XML)
> u = "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1";
> a = getURL(u)
> a # Here - the Russia

Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2012-01-30 Thread Duncan Temple Lang
With some off-line interaction and testing by Tal, the latest version of the XML package (3.9-4) should resolve these issues. So the encoding from the document is used in more cases as the default. It is often important to specify the encoding for HTML files in the call to htmlParse() and use "UTF
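The advice about passing the encoding explicitly to htmlParse() can be sketched as follows. This is an illustration under stated assumptions rather than the fix shipped in the package: the page is a hypothetical in-memory string (no network access), and the variable names are made up for the example.

```r
# Sketch only: the page content and variable names are illustrative assumptions.
library(XML)

# Hypothetical page; \u escapes keep this source file ASCII-only.
page <- paste0(
  "<html><head><title>\u05e9\u05dc\u05d5\u05dd</title></head>",
  "<body><p>\u05e9\u05dc\u05d5\u05dd</p></body></html>"
)
page <- enc2utf8(page)  # make sure the string itself is UTF-8

# Tell the parser the encoding explicitly instead of relying on detection.
doc <- htmlParse(page, asText = TRUE, encoding = "UTF-8")

# Extract the title text to check that the Hebrew characters survived.
title <- xpathSApply(doc, "//title", xmlValue)
```

For a page fetched with getURL(), the same call would take the downloaded string in place of `page`, provided that string really is UTF-8.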

[R] Getting htmlParse to work with Hebrew? (on windows)

2012-01-30 Thread Tal Galili
Hello dear R-help mailing list.
I would like htmlParse to work well with Hebrew, but it keeps scrambling the Hebrew text in the pages I feed into it. For example:
# why can't I parse the Hebrew correctly?
library(RCurl)
library(XML)
u = "http://humus101.com/?p=2737";
a = getURL(u)
a #