Re: Replacing utf-8 characters

2005-10-05 Thread Martin v. Löwis
Mike wrote: > So it seems link.replace() function reads whether the first option is > utf-8 and converts the second option automatically to utf-8? How do I > prevent that? Not sure what an option is... if you are talking about parameters, rest assured that .replace does not know or care whether
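Martin's point is easy to check. `.replace` does plain substring substitution and never looks at where the text came from. The sketch below uses Python 3. In the Python 2 of this 2005 thread, the byte and text types were `str` and `unicode`. It applies the same replacement to raw UTF-8 bytes and to the decoded text. The sample string is made up for illustration.

```python
# Hypothetical sample data: non-ASCII characters plus an HTML-escaped ampersand.
text = "café&amp;crème"
raw = text.encode("utf-8")  # the same text as UTF-8 bytes, as fetched from a page

# bytes.replace matches byte sequences. Multi-byte characters such as
# "é" (b"\xc3\xa9") do not interfere with matching the ASCII "&amp;".
fixed_bytes = raw.replace(b"&amp;", b"&")

# str.replace matches characters in the decoded text.
fixed_text = raw.decode("utf-8").replace("&amp;", "&")

print(fixed_bytes)  # b'caf\xc3\xa9&cr\xc3\xa8me'
# Both routes agree once the bytes are decoded.
assert fixed_bytes.decode("utf-8") == fixed_text
```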

Re: Replacing utf-8 characters

2005-10-05 Thread David Bolen
Mike <[EMAIL PROTECTED]> writes: > What you and I typed was ascii. The value of link came from importing > that utf-8 web page into that variable. That is why I think it is not > working. But not sure what the solution is. Are you sure you're asking what you think you are asking? Both the ampe
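David's point can be checked in a few lines of Python 3. The ampersand is ASCII, and UTF-8 encodes every ASCII character as the same single byte. The page's encoding therefore cannot be what produces `&amp;`. The `é` comparison at the end is an added illustration and does not come from the thread.

```python
# "&" and the entity "&amp;" are pure ASCII: their UTF-8 bytes are
# identical to their ASCII bytes, one byte per character.
for s in ("&", "&amp;"):
    assert s.encode("utf-8") == s.encode("ascii")
    print(repr(s), s.encode("utf-8"))

# By contrast, a non-ASCII character's bytes depend on the encoding.
print("\u00e9".encode("utf-8"))    # b'\xc3\xa9'
print("\u00e9".encode("latin-1"))  # b'\xe9'
```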

Re: Replacing utf-8 characters

2005-10-05 Thread Mike
In playing with this I found link.replace does work but when I use link.replace('&amp;','&') it replaces it with &amp; instead of just &. link.replace is working for me since if I changed the second option from & to something else I see the change. So it seems link.replace() function reads whether th
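The thread does not show how the replaced link was viewed, so what follows is only one possible explanation, not a diagnosis. If the string is HTML-escaped again when it is written into a page, `&` turns back into `&amp;`, which is correct markup. Python 3's standard `html` module shows the round trip. The sample string is a shortened, made-up fragment of the URL discussed in this thread.

```python
import html

# Hypothetical shortened fragment of the escaped URL from the page source.
src = "type=businessNews&amp;storyID=2005-10-05"

# html.unescape decodes all HTML character references, not only &amp;.
url = html.unescape(src)
print(url)               # type=businessNews&storyID=2005-10-05

# Writing the URL back into an HTML document escapes & to &amp; again,
# which is what the page source (or a "view source") would then show.
print(html.escape(url))  # type=businessNews&amp;storyID=2005-10-05
```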

Re: Replacing utf-8 characters

2005-10-05 Thread Klaus Alexander Seistrup
Mike wrote: > Hi, I am using Python to scrape web pages and I do not have problem > unless I run into a site that is utf-8. It seems & is changed to > &amp; when the site is utf-8. > > [...] > Any ideas? How about using the universal feedparser from feedparser.org to fetch and parse the RS

Re: Replacing utf-8 characters

2005-10-05 Thread Mike
Steve Holden wrote: >>> > You must be doing *something* wrong: > > >>> link = > "/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml" > > > >>> link = link.replace('&amp;','&') > >>> link > '/news/newsArticle.aspx?type=businessNe
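Here is Steve's interactive session restated as a small Python 3 script. It uses the URL from the thread and counts the entity before and after the replacement.

```python
# URL fragment as it appears in the page source, with the HTML-escaped ampersand.
link = ("/news/newsArticle.aspx?type=businessNews&amp;storyID="
        "2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml")
print(link.count("&amp;"))  # 1

# Keep the returned string; the replacement is then in effect.
link = link.replace("&amp;", "&")
print(link.count("&amp;"))  # 0
print(link)
```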

Re: Replacing utf-8 characters

2005-10-05 Thread Steve Holden
Unknown wrote: > For example this is what I am trying to do that is not working. > > The contents of link is the reuters web page, containing > > "/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml" > > link = link.replace('&amp;','&'

Re: Replacing utf-8 characters

2005-10-05 Thread Mike
For example this is what I am trying to do that is not working. The contents of link is the reuters web page, containing "/news/newsArticle.aspx?type=businessNews&amp;storyID=2005-10-05T151245Z_01_HO548006_RTRUKOC_0_UK-AIRLINES-BA.xml" link = link.replace('&amp;','&') But if I now view the

Re: Replacing utf-8 characters

2005-10-05 Thread Richard Brodie
"Mike" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > However when I pull it into python the URL ends up looking like this > (notice the &amp; instead of just & in the URL) > > Any ideas? Some code would be helpful: the "&amp;" is in the page source to start with (which is as it ought to
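Richard's point is that `&amp;` belongs in the page source. An HTML parser decodes it when it reads attribute values, so no manual `.replace` is needed. The following is a minimal sketch using Python 3's `html.parser` module. The `LinkCollector` class name and the sample markup, including `storyID=123`, are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags.

    HTMLParser unescapes character references in attribute values,
    so "&amp;" in the source arrives here as "&".
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up markup in the style of the page discussed in the thread.
page = ('<a href="/news/newsArticle.aspx?type=businessNews&amp;storyID=123">'
        'BA story</a>')
parser = LinkCollector()
parser.feed(page)
parser.close()
print(parser.links)  # ['/news/newsArticle.aspx?type=businessNews&storyID=123']
```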

Replacing utf-8 characters

2005-10-05 Thread Mike
Hi, I am using Python to scrape web pages and I do not have problem unless I run into a site that is utf-8. It seems & is changed to &amp; when the site is utf-8. If I try to replace it with .replace('&amp;','&') it for some reason does not replace it. For example: http://today.reuters.co.uk/news/def
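The thread never shows the exact code that "does not replace it," so the following is an assumption about one common cause rather than a diagnosis. Python strings are immutable, and `str.replace` returns a new string instead of changing the original. The sample value is a shortened, made-up fragment.

```python
# Hypothetical shortened fragment of the escaped URL.
link = "type=businessNews&amp;storyID=2005-10-05"

# Calling replace without keeping the result leaves link unchanged,
# because strings are immutable and replace returns a new object.
link.replace("&amp;", "&")
print(link)  # type=businessNews&amp;storyID=2005-10-05

# Rebinding the name to the returned string applies the change.
link = link.replace("&amp;", "&")
print(link)  # type=businessNews&storyID=2005-10-05
```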