[Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-22 Thread Aristotle Pagaltzis
* Marc SCHAEFER [2009-11-21 23:30]: > After investigating, the Content-Length: is one off per non > 7-bit character. As if the standard iso-8859-1 byte stream was > sent as is, but was, internally converted to UTF-8 just for > generating a wrong byte count. Very strange. Normally that > process sh
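
Editorial sketch, not taken from the thread: the "one off per non-7-bit character" count described above can be reproduced in a few lines of Perl. The sample string is hypothetical, and the explicit utf8::upgrade call stands in for whatever upgraded the response body in the original poster's application, which the snippet does not say.

```perl
use strict;
use warnings;
use bytes ();    # makes bytes::length callable without enabling byte semantics

# Hypothetical sample body: "caf" followed by the Latin-1 character 0xE9.
my $body = "caf\xe9";

# Assumption: something upstream upgraded the string to Perl's internal
# UTF-8 representation. The characters themselves do not change.
utf8::upgrade($body);

my $chars    = length $body;           # number of characters
my $internal = bytes::length($body);   # number of bytes in the internal buffer

printf "length=%d bytes::length=%d\n", $chars, $internal;

# The quick fix mentioned later in the thread: downgrade before sending.
utf8::downgrade($body);
printf "after downgrade: bytes::length=%d\n", bytes::length($body);
```

On an upgraded string, bytes::length reports one extra byte for each character in the range 0x80 to 0xFF, which matches the symptom reported here.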

[Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-22 Thread Aristotle Pagaltzis
Hi Marc, * Marc SCHAEFER [2009-11-22 15:05]: > On Sun, Nov 22, 2009 at 02:10:29PM +0100, Aristotle Pagaltzis wrote: > > As a quick fix, you want to utf8::downgrade the $c->res->body > > at the last moment before emitting the data to the wire. > > Interestingly, the data arrives on the other side

[Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Aristotle Pagaltzis
* Carl Johnstone [2009-11-23 15:35]: > Aristotle Pagaltzis wrote: > > # everything should be bytes at this point, but just in case > > $response->content_length( bytes::length( $response->body ) ); > > > > I was shocked to discover this! Any code that uses > > bytes::length is automaticall

[Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Aristotle Pagaltzis
* Marc SCHAEFER [2009-11-23 17:20]: > On Mon, Nov 23, 2009 at 07:42:06AM -0800, Bill Moseley wrote: > >Still not following. You are talking about Catalyst::View::TT? > > It appears that the latin1 -> htmlentities conversion is done > by View:TT's htmlentity, e.g.: > >[% FOREACH h IN cols %][

[Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Aristotle Pagaltzis
* Carl Johnstone [2009-11-23 18:50]: > Aristotle Pagaltzis wrote: > > Please please don’t make statements like “not in this case” > > without knowing what the thing you are talking about does, > > i.e. in this case bytes::length, does. There are enough > > misconceptions about Unicode in Perl alrea

[Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Aristotle Pagaltzis
* Bill Moseley [2009-11-23 20:10]: > I'd argue that when it's time to set the length it should die > if utf8 flag is still set. I’m of two minds about this… it may well be that a string is correctly encoded but has gotten upgraded, and such a string will produce the right output anyhow. I don’t k

[Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Aristotle Pagaltzis
* Jonathan Rockway [2009-12-08 06:40]: > Basically, if you are doing things right, this code will cause > no harm Yes it will, in some cases. > (as the string will be an octet stream There is no such thing as an octet stream in Perl. There are only strings, and strings are sequences of arbitrar

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Carl Johnstone
Aristotle Pagaltzis wrote: > # everything should be bytes at this point, but just in case > $response->content_length( bytes::length( $response->body ) ); > > I was shocked to discover this! Any code that uses bytes::length > is automatically broken. Not in this case, the HTTP spec says th

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Carl Johnstone
Aristotle Pagaltzis wrote: > But there’s no room for “likelies” here: that’s programming by > coincidence. The "likely" was correct. When using UTF-8 whether the length of the string is different in bytes and characters depends entirely on what the contents of the string are. Given a particular

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread J. Shirley
On Mon, Nov 23, 2009 at 8:34 AM, Aristotle Pagaltzis wrote: > > [huge snip] > Aristotle++ This was a fantastic explanation with examples. Even though I *think* I understand the unicode issues in perl, I still can find myself getting confused. These examples just help that. Thanks for this, i

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Bill Moseley
On Mon, Nov 23, 2009 at 10:14 AM, Aristotle Pagaltzis wrote: > > > > Encode the string to the destination encoding (not just character > set), so that the string represents an encoded octet stream, and > then look at the plain old character length of that string. That > will always give you the ri
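
Editorial sketch, not taken from the thread: the advice quoted above (encode to the destination encoding, then take the ordinary character length) might look like this with the core Encode module. The sample text and the two target encodings are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical body: a character string with one non-ASCII character (i with diaeresis).
my $text = "na\x{ef}ve";    # 5 characters

# Encode to whichever encoding will actually go out on the wire.
my $latin1 = encode('ISO-8859-1', $text);
my $utf8   = encode('UTF-8',      $text);

# After encoding, every character of the result stands for one octet,
# so plain length() gives the value to use for Content-Length.
printf "ISO-8859-1: %d octets\n", length $latin1;
printf "UTF-8:      %d octets\n", length $utf8;
```

The two results differ (5 and 6) because the count depends on the encoding that is sent, not on how Perl happens to store the string internally.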

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Marc SCHAEFER
On Mon, Nov 23, 2009 at 05:43:25PM +0100, Aristotle Pagaltzis wrote: > If you use the `html` filter instead of `html_entity`, it will > escape only the five characters that have to be. Thank you. It works like a charm. > I had an IRC convo with Tomas Doran last night and explained the > problem t

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-23 Thread Tomas Doran
On 23 Nov 2009, at 18:24, Marc SCHAEFER wrote: On Mon, Nov 23, 2009 at 05:43:25PM +0100, Aristotle Pagaltzis wrote: I had an IRC convo with Tomas Doran last night and explained the problem to him. He knocked out some tests for the broken Thank you for your time! It's nice to see the respo

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-30 Thread Tomas Doran
On 23 Nov 2009, at 22:20, Tomas Doran wrote: On 23 Nov 2009, at 18:24, Marc SCHAEFER wrote: On Mon, Nov 23, 2009 at 05:43:25PM +0100, Aristotle Pagaltzis wrote: I had an IRC convo with Tomas Doran last night and explained the problem to him. He knocked out some tests for the broken Than

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-11-30 Thread Tomas Doran
Apologies for replying to myself - late night and I'm getting confused. On 1 Dec 2009, at 02:21, Tomas Doran wrote: Any chance of a confirmation that this is fixed in Catalyst for you (or not)? http://search.cpan.org/CPAN/authors/id/B/BO/BOBTFISH/Catalyst-Runtime-5.80014_01.tar.gz http://s

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-07 Thread Jonathan Rockway
Sorry to dig up a very old thread, but I am very behind on email and wanted to comment :) * On Sun, Nov 22 2009, Aristotle Pagaltzis wrote: > So I went trawling the Catalyst sources and found what appears > to be the offending line. From finalize_headers in Catalyst.pm: > > # everything shou

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Bill Moseley
On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis wrote: > > There is no such thing as an octet stream in Perl. There are only > strings, and strings are sequences of arbitrarily large integers. > Help me out here. What I've stuck in my mind is that the poorly-named utf8 flag on Perl strings

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Tomas Doran
On 8 Dec 2009, at 05:34, Jonathan Rockway wrote: Sorry to dig up a very old thread, but I am very behind on email and wanted to comment :) No problem. Your insight as to why things are the way they are is useful :) I was shocked to discover this! Any code that uses bytes::length is auto

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Jonathan Rockway
* On Tue, Dec 08 2009, Bill Moseley wrote: > On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis wrote: > > There is no such thing as an octet stream in Perl. There are only > strings, and strings are sequences of arbitrarily large integers. > > Help me out here. > > What I've stuck in my

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Jonathan Rockway
* On Tue, Dec 08 2009, Aristotle Pagaltzis wrote: > This will work only if the string is using one of the two kinds > of internal representation but not in the other. Exactly my point. > The case the OP had was that he wanted to send Latin-1 and his > strings contained sequences of Latin-1 cha

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Jonathan Rockway
* On Tue, Dec 08 2009, Tomas Doran wrote: > On 8 Dec 2009, at 05:34, Jonathan Rockway wrote: >> Sorry to dig up a very old thread, but I am very behind on email and >> wanted to comment :) > > No problem. Your insight as to why things are the way they are is > useful :) > >>> >>> I was shocked to

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Bill Moseley
On Tue, Dec 8, 2009 at 7:05 PM, Jonathan Rockway wrote: > * On Tue, Dec 08 2009, Bill Moseley wrote: > > On Tue, Dec 8, 2009 at 12:26 AM, Aristotle Pagaltzis > wrote: > > > > There is no such thing as an octet stream in Perl. There are only > > strings, and strings are sequences of arbit

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-08 Thread Bill Moseley
On Tue, Dec 8, 2009 at 8:32 PM, Bill Moseley wrote: > > The UTF8 flag doesn't mean anything more than >> any of the other SV flags. > > > But the flag on indicates that the string was decoded. > Obviously, that's not the only way to get that flag set. What I meant was if the flag is on I'm pret

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-10 Thread Wade Stuart
On Wed, Dec 9, 2009 at 1:27 AM, Bill Moseley wrote: > > > On Tue, Dec 8, 2009 at 8:32 PM, Bill Moseley wrote: > >> >> The UTF8 flag doesn't mean anything more than >>> any of the other SV flags. >> >> >> But the flag on indicates that the string was decoded. >> > > Obviously, that's not the only

Re: [Catalyst] Re: Avoiding UTF8 in Catalyst

2009-12-10 Thread Tomas Doran
Wade Stuart wrote: How about making it skip that code for default behavior and a config var check to re-enable the backcompat behavior with bytes::length. I think you're confused. The bytes::length thing is already fixed (in 015). It's the having upgraded characters in a header issue which is