Hi Udi,

I have tried it once again.
I had to change <br> to <br/> and <img...> to <img...></img>.
"unescape" works now!

But it would be interesting to learn more about this problem. Could you
please answer Massimo's questions?
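
For context, here is a minimal sketch of why the tag changes mentioned above mattered, assuming the errors came from expat itself: expat is an XML parser, so it rejects HTML void tags such as <br> that are never closed. The helper name `extract_text` is hypothetical, and the code is written for Python 3 rather than the Python 2 of the quoted function.

```python
import xml.parsers.expat


def extract_text(fragment):
    """Collect character data from an XML fragment, decoding entities.

    Hypothetical Python 3 counterpart of the quoted unescape() function.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True
    parser.CharacterDataHandler = chunks.append
    # Wrap the fragment in a dummy element so the document has a single root.
    parser.Parse("<e>", False)
    parser.Parse(fragment, False)
    parser.Parse("</e>", True)
    return "".join(chunks)


# Well-formed: the void tag is self-closed, so expat accepts it.
print(extract_text("&#x5DE;&#x5D4;<br/>Visual Studio"))

# Not well-formed: <br> is never closed, so expat raises ExpatError.
try:
    extract_text("&#x5DE;&#x5D4;<br>Visual Studio")
except xml.parsers.expat.ExpatError as err:
    print("parse failed:", err)
```

Note that only character data is collected, so the tags themselves are dropped from the result.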

2012/6/2 Massimo Di Pierro <massimo.dipie...@gmail.com>

> I am not sure there is an error here. Is the problem that the characters
> are not displayed properly? Are you using a custom layout? If so, is it
> setting the utf8 encoding or does it tell the browser it is latin1?
>
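
To illustrate the encoding question above, here is a small sketch (not taken from the thread) of what a latin1/UTF-8 mismatch looks like: the same bytes read correctly as UTF-8 and as gibberish as latin1. The two-letter sample string is an assumption chosen for illustration, and the code is Python 3.

```python
# Two Hebrew letters (U+05DE, U+05D4) that also occur in the sample text.
text = "\u05de\u05d4"

utf8_bytes = text.encode("utf-8")

# Decoded with the right encoding, the text round-trips.
print(utf8_bytes.decode("utf-8"))

# Decoded as latin1, every byte becomes its own character (mojibake).
garbled = utf8_bytes.decode("latin-1")
print(repr(garbled))
```

If a page shows output like the second line, the data is probably fine and the declared encoding is the thing to check.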
> On Friday, 1 June 2012 15:45:45 UTC-5, Udi Milo wrote:
>>
>> It does, but not completely,
>>
>> As it turns out, what I copy-pasted was only part of it, and your function
>> does work perfectly on that part. When I try to run it on the entire text,
>> I get errors that I can't figure out. Maybe you can help me once more?
>> Here is the complete text:
>>
>> <div><div class="post">
>>
>>         <div dir="rtl">...
>>
>
>>                     </div>
>>
>> Thanks!
>>
>>
>> On Friday, June 1, 2012 1:45:32 AM UTC-4, mweissen wrote:
>>>
>>> I have found at 
>>> http://wiki.python.org/moin/EscapingXml
>>> :
>>>
>>> import xml.parsers.expat
>>>
>>> def unescape(s):
>>>     want_unicode = False
>>>     if isinstance(s, unicode):
>>>         s = s.encode("utf-8")
>>>         want_unicode = True
>>>
>>>     # the rest of this assumes that `s` is UTF-8
>>>     list = []
>>>
>>>     # create and initialize a parser object
>>>     p = xml.parsers.expat.ParserCreate("utf-8")
>>>     p.buffer_text = True
>>>     p.returns_unicode = want_unicode
>>>     p.CharacterDataHandler = list.append
>>>
>>>     # parse the data wrapped in a dummy element
>>>     # (needed so the "document" is well-formed)
>>>     p.Parse("<e>", 0)
>>>     p.Parse(s, 0)
>>>     p.Parse("</e>", 1)
>>>
>>>     # join the extracted strings and return
>>>     es = ""
>>>     if want_unicode:
>>>         es = u""
>>>     return es.join(list)
>>>
>>> With
>>>
>>> t="""&#x5DE;&#x5E4;&#x5EA;&#x5D7;&#x5D9;&#x5DD;
>>> &#x5E8;&#x5D1;&#x5D9;&#x5DD; &#x5DE;&#x5D1;&#x5E7;&#x5E9;&#x5D9;&#x5DD;
>>> &#x5D0;&#x5EA; &#x5E2;&#x5D6;&#x5E8;&#x5EA;&#x5D9;
>>> &#x5D1;&#x5E4;&#x5EA;&#x5E8;&#x5D5;&#x5DF;
>>> &#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA; &#x5E9;&#x5DC;
>>> &#x5D1;&#x5D9;&#x5E6;&#x5D5;&#x5E2;&#x5D9; Visual Studio.
>>> \n&#x5D1;&#x5D3;&#x201D;&#x5DB; &#x5D0;&#x5EA; &#x5E8;&#x5D5;&#x5D1;
>>> &#x5D4;&#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA;
>>> &#x5E0;&#x5D9;&#x5EA;&#x5DF; &#x5DC;&#x5E4;&#x5EA;&#x5D5;&#x5E8;
>>> &#x5D9;&#x5D7;&#x5E1;&#x5D9;&#x5EA; &#x5D1;&#x5E7;&#x5DC;&#x5D5;&#x5EA;,
>>> \n&#x5D5;&#x5DB;&#x5DB;&#x5DC; &#x5E9;&#x5E2;&#x5D5;&#x5D1;&#x5E8;
>>> &#x5D4;&#x5D6;&#x5DE;&#x5DF; &#x5D0;&#x5E0;&#x5D9;
>>> &#x5DE;&#x5D5;&#x5E6;&#x5D0; &#x5D0;&#x5EA; &#x5E2;&#x5E6;&#x5DE;&#x5D9;
>>> &#x5DE;&#x5E1;&#x5E4;&#x5E7; &#x5E4;&#x5D7;&#x5D5;&#x5EA; &#x5D0;&#x5D5;
>>> &#x5D9;&#x5D5;&#x5EA;&#x5E8; &#x5D0;&#x5EA; &#x5D0;&#x5D5;&#x5EA;&#x5DF;
>>> &#x5D4;&#x5EA;&#x5E9;&#x5D5;&#x5D1;&#x5D5;&#x5EA;, \n&#x5DE;&#x5D4;
>>> &#x5E9;&#x5D2;&#x5E8;&#x5DD; &#x5DC;&#x5D9; &#x5DC;&#x5D7;&#x5E9;&#x5D5;&#x5D1;
>>> &#x5E9;&#x5DB;&#x5E0;&#x5E8;&#x5D0;&#x5D4;
>>> &#x5D4;&#x5D2;&#x5D9;&#x5E2; &#x5D4;&#x5D6;&#x5DE;&#x5DF;
>>> &#x5DC;&#x5D4;&#x5E2;&#x5DC;&#x5D5;&#x5EA;
>>> &#x5D0;&#x5D5;&#x5EA;&#x5DF; &#x5D1;&#x5E6;&#x5D5;&#x5E8;&#x5D4;
>>> &#x5DE;&#x5E1;&#x5D5;&#x5D3;&#x5E8;&#x5EA;
>>> &#x5DC;&#x5E4;&#x5D5;&#x5E1;&#x5D8;."""
>>> print unescape (t)
>>>
>>> the result is
>>>
>>> מפתחים רבים מבקשים את עזרתי בפתרון בעיות של ביצועי Visual Studio.
>>> בד”כ את רוב הבעיות ניתן לפתור יחסית בקלות,
>>> וככל שעובר הזמן אני מוצא את עצמי מספק פחות או יותר את אותן התשובות,
>>> מה שגרם לי לחשוב שכנראה הגיע הזמן להעלות אותן בצורה מסודרת לפוסט.
>>>
>>> I hope it helps.
>>> Regards Martin
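
On Python 3 the function above does not run as written (`unicode` and the `print` statement are Python 2 only). A possible alternative, offered as a sketch rather than as what this thread used, is the standard library's `html.unescape`, which decodes numeric character references directly and does not require well-formed XML. The sample string is an assumption built from characters in the text above.

```python
import html

sample = "&#x5DE;&#x5D4; Visual Studio.<br>"

# Numeric character references are decoded; tags are left untouched.
decoded = html.unescape(sample)
print(decoded)
```

Unlike the expat-based function, this keeps the markup in place, so an unclosed <br> is not an error.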
>>>
>>> 2012/6/1 Udi Milo <udim...@gmail.com>
>>>
>>>> Part of my product receives user text, saves it, and shows it later.
>>>>
>>>> One of my users added the Hebrew text attached below, and I do not know
>>>> how to convert it into letters instead of hex codes.
>>>> A simple text.encode('UTF-8') doesn't work, and I am far from being an
>>>> expert in the subject. Can someone help me out?
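
A short sketch of why `text.encode('UTF-8')` has no visible effect here, written for Python 3: the stored text consists of plain ASCII characters that spell out numeric character references, so encoding only turns those same ASCII characters into bytes. Decoding the references is a separate step. The regular expression below is one illustrative way to do it, the name `decode_hex_refs` is hypothetical, and it handles only hexadecimal references of the form `&#x...;`.

```python
import re

stored = "&#x5DE;&#x5D4;"

# Encoding leaves the references as they are: they are already plain ASCII.
print(stored.encode("utf-8"))


def decode_hex_refs(text):
    """Replace each &#xHHHH; reference with the character it names."""
    return re.sub(
        r"&#x([0-9A-Fa-f]+);",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


print(decode_hex_refs(stored))
```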
>>>>
>>>> see attached text:
>>>>
>>>> &#x5DE;&#x5E4;&#x5EA;&#x5D7;&#x5D9;&#x5DD;
>>>> &#x5E8;&#x5D1;&#x5D9;&#x5DD; &#x5DE;&#x5D1;&#x5E7;&#x5E9;&#x5D9;&#x5DD;
>>>> &#x5D0;&#x5EA; &#x5E2;&#x5D6;&#x5E8;&#x5EA;&#x5D9;
>>>> &#x5D1;&#x5E4;&#x5EA;&#x5E8;&#x5D5;&#x5DF;
>>>> &#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA; &#x5E9;&#x5DC;
>>>> &#x5D1;&#x5D9;&#x5E6;&#x5D5;&#x5E2;&#x5D9; Visual Studio.
>>>> &#x5D1;&#x5D3;&#x201D;&#x5DB; &#x5D0;&#x5EA; &#x5E8;&#x5D5;&#x5D1;
>>>> &#x5D4;&#x5D1;&#x5E2;&#x5D9;&#x5D5;&#x5EA;
>>>> &#x5E0;&#x5D9;&#x5EA;&#x5DF; &#x5DC;&#x5E4;&#x5EA;&#x5D5;&#x5E8;
>>>> &#x5D9;&#x5D7;&#x5E1;&#x5D9;&#x5EA; &#x5D1;&#x5E7;&#x5DC;&#x5D5;&#x5EA;,
>>>> &#x5D5;&#x5DB;&#x5DB;&#x5DC; &#x5E9;&#x5E2;&#x5D5;&#x5D1;&#x5E8;
>>>> &#x5D4;&#x5D6;&#x5DE;&#x5DF; &#x5D0;&#x5E0;&#x5D9;
>>>> &#x5DE;&#x5D5;&#x5E6;&#x5D0; &#x5D0;&#x5EA; &#x5E2;&#x5E6;&#x5DE;&#x5D9;
>>>> &#x5DE;&#x5E1;&#x5E4;&#x5E7; &#x5E4;&#x5D7;&#x5D5;&#x5EA; &#x5D0;&#x5D5;
>>>> &#x5D9;&#x5D5;&#x5EA;&#x5E8; &#x5D0;&#x5EA; &#x5D0;&#x5D5;&#x5EA;&#x5DF;
>>>> &#x5D4;&#x5EA;&#x5E9;&#x5D5;&#x5D1;&#x5D5;&#x5EA;,
>>>> &#x5DE;&#x5D4; &#x5E9;&#x5D2;&#x5E8;&#x5DD; &#x5DC;&#x5D9;
>>>> &#x5DC;&#x5D7;&#x5E9;&#x5D5;&#x5D1;
>>>> &#x5E9;&#x5DB;&#x5E0;&#x5E8;&#x5D0;&#x5D4;
>>>> &#x5D4;&#x5D2;&#x5D9;&#x5E2; &#x5D4;&#x5D6;&#x5DE;&#x5DF;
>>>> &#x5DC;&#x5D4;&#x5E2;&#x5DC;&#x5D5;&#x5EA;
>>>> &#x5D0;&#x5D5;&#x5EA;&#x5DF; &#x5D1;&#x5E6;&#x5D5;&#x5E8;&#x5D4;
>>>> &#x5DE;&#x5E1;&#x5D5;&#x5D3;&#x5E8;&#x5EA;
>>>> &#x5DC;&#x5E4;&#x5D5;&#x5E1;&#x5D8;.
>>>>
>>>
>>>
>>>
>>>
