I think that I have to install Linux on a VM... That seems to be the shortest way.
By the way, could you please advise how to rebuild the 'XML' package for R with
the latest libxml sources? Who could do that?
Or is it possible to build a new R package based on parsers that are not
written in C, such as ones based on PyPy, Erlang a
On Thursday, 21 February 2013 at 18:53 +0400, Lawr Eskin wrote:
> I tried iconv before in various ways: same issue, and the resulting
> encoding is "unknown".
> I have now tried sub: same issue.
This procedure works on Linux, but not on Windows:
library(RCurl)
library(XML)
u <- "http://www.cian.ru/cat.php?deal_type=2&o
Hi Milan,
a <- getURL(con, .encoding = "UTF-8")
Encoding(a)
> [1] "UTF-8"
a # Here the UTF-8 text looks fine.
htmlParse(a, encoding = "UTF-8") # again the same encoding issue
>>why didn't getURL() detect and set a's encoding correctly?
I think this is an issue with the page itself, because other sites work.
I tried iconv before in various ways: same issue, and the resulting encoding
is "unknown".
I have now tried sub: same issue.
2013/2/21 Milan Bouchet-Valat
> On Thursday, 21 February 2013 at 18:31 +0400, Lawr Eskin wrote:
> > Hi Milan,
> >
> > a <- getURL(con, .encoding = "UTF-8")
> > Encoding(a)
> > > [1] "UTF-8"
>
Hi Milan!
> Encoding(a)
[1] "unknown"
2013/2/21 Milan Bouchet-Valat
>
>> On Thursday, 21 February 2013 at 13:16 +0400, Lawr Eskin wrote:
>> > Hello dear R-help mailing list.
>> >
>> >
>> > Looks like the same issue in Russian:
>> >
>> >
>> >
>> > library(RCurl)
>> >
>> > library(XML)
>> >
>> >
On Thursday, 21 February 2013 at 18:31 +0400, Lawr Eskin wrote:
> Hi Milan,
>
> a <- getURL(con, .encoding = "UTF-8")
> Encoding(a)
> > [1] "UTF-8"
> a # Here the UTF-8 text looks fine.
> htmlParse(a, encoding = "UTF-8") # again the same encoding issue
And what if you try this:
a2 <- htmlPars
On Thursday, 21 February 2013 at 13:16 +0400, Lawr Eskin wrote:
> Hello dear R-help mailing list.
>
>
> Looks like the same issue in Russian:
>
>
>
> library(RCurl)
>
> library(XML)
>
> u = "http://www.cian.ru/cat.php?deal_type=2&obl_id=1&room1=1"
>
> a = getURL(u)
>
> a # Here - the Russia
With some off-line interaction and testing by Tal, the latest
version of the XML package (3.9-4) should resolve these issues.
So the encoding from the document is used in more cases as the default.
It is often important to specify the encoding for HTML files in
the call to htmlParse() and use "UTF
Hello dear R-help mailing list.
I would like htmlParse to work well with Hebrew, but it keeps scrambling the
Hebrew text in the pages I feed into it.
For example:
# why can't I parse the Hebrew correctly?
library(RCurl)
library(XML)
u = "http://humus101.com/?p=2737";
a = getURL(u)
a #