Mike wrote:
> So it seems link.replace() function reads whether the first option is
> utf-8 and converts the second option automatically to utf-8? How do I
> prevent that?
Not sure what an option is... if you are talking about parameters,
rest assured that .replace does not know or care whether
Mike <[EMAIL PROTECTED]> writes:
> What you and I typed was ascii. The value of link came from importing
> that utf-8 web page into that variable. That is why I think it is not
> working. But not sure what the solution is.
Are you sure you're asking what you think you are asking? Both the
ampe
In playing with this I found link.replace does work but when I use
link.replace('&amp;','&')
it replaces it with &amp; instead of just &. link.replace is working
for me since if I changed the second option from & to something else I
see the change.
So it seems link.replace() function reads whether th
Mike wrote:
> Hi, I am using Python to scrape web pages and I do not have problem
> unless I run into a site that is utf-8. It seems & is changed to
> &amp; when the site is utf-8.
>
> [...]
> Any ideas?
How about using the universal feedparser from feedparser.org to fetch
and parse the RS
Steve Holden wrote:
>>>
> You must be doing *something* wrong:
>
> >>> link = "/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml"
> >>> link = link.replace('&amp;','&')
> >>> link
> '/news/newsArticle.aspx?type=businessNe
Unknown wrote:
> For example this is what I am trying to do that is not working.
>
> The contents of link is the reuters web page, containing
>
> "/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml"
>
> link = link.replace('&amp;','&'
For example this is what I am trying to do that is not working.
The contents of link is the reuters web page, containing
"/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml"
link = link.replace('&amp;','&')
But if I now view the
"Mike" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> However when I pull it into python the URL ends up looking like this
> (notice the &amp; instead of just & in the URL)
>
> Any ideas?
Some code would be helpful: the "&amp;" is in the page source to start
with (which is as it ought to
Hi, I am using Python to scrape web pages and I do not have problem
unless I run into a site that is utf-8. It seems & is changed to &amp;
when the site is utf-8.
If I try to replace it with .replace('&amp;','&') it for some reason
does not replace it.
For example: http://today.reuters.co.uk/news/def
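
For anyone following the thread, here is a minimal sketch of the replacement being discussed. It assumes a modern Python 3 interpreter rather than the Python of the original posts, and the sample link is a shortened, hypothetical stand-in for the Reuters URL. The names raw_link, fixed_by_replace and fixed_by_unescape are made up for illustration. It shows that str.replace matches the literal text "&amp;" regardless of the page encoding, and that the result has to be assigned back because strings are immutable. html.unescape is shown as a possible alternative, not as what anyone in the thread used.

```python
import html

# Hypothetical sample: the page source escapes "&" as "&amp;" inside the href.
raw_link = "/news/newsArticle.aspx?type=businessNews&amp;storyID=EXAMPLE.xml"

# str.replace looks for the literal substring "&amp;"; the encoding of the
# page the string came from plays no part in the match.
# Strings are immutable, so the result must be bound to a name.
fixed_by_replace = raw_link.replace("&amp;", "&")

# Alternative (Python 3.4+): decode every HTML character reference at once.
fixed_by_unescape = html.unescape(raw_link)

print(fixed_by_replace)
print(fixed_by_unescape)
```

If the output still appears to contain "&amp;", it may be worth checking how the value is being displayed, for example whether it is being re-escaped when written back into HTML.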