On Fri, Mar 2, 2012 at 9:11 AM, Craig A. Berry <craigbe...@mac.com> wrote:

> On Mar 2, 2012, at 3:07 AM, Eric Brine wrote:
> > On Thu, Mar 1, 2012 at 6:17 PM, Craig A. Berry <craigbe...@mac.com> wrote:
> > What happens on Unix when you have a pipe buffer that is 8192 bytes and
> > you set $/ to 8193 and read a record containing UTF-8 data through the pipe?
> >
> > Perl requests 8K (formerly 4K) chunks until it has received enough. It
> > requests 8K even if it only needs 1 byte.
>
> I think you're thinking of the PerlIO buffer that I increased from 4K to
> the larger of 8K and BUFSIZ in 5.14,


I'm not "thinking" anything. I'm reporting what Perl does as seen by
C<strace>.


> and which only applies to the perlio layer.


Yes, I'm only reporting what happens on a standard unix build. But isn't
that what you asked?

> I was thinking of a situation where something external to Perl limits how
> much data you can get in one read and thus gives you less than the full
> amount requested by $/.


That's exactly the situation I described. Here, let me provide the strace
output.

$ strace perl -e'$/=\40; <>;' < /dev/random
...
read(0, "\5|\200\"\360T0*\325\223\276\322\20S\244\16\341", 8192) = 17
read(0, "\370\356 \2652\236\27>", 8192) = 8
read(0, "\0\270\ve\332\223\225\312", 8192) = 8
read(0, "\316\366\272\311\215.\204\361", 8192) = 8
...


> I'm pretty sure you'll get mangled UTF-8 if you happen to be
> mid-character when you hit the end of the device buffer.


No, because Perl will just ask for more. You'll get mangled UTF-8 if you
happen to request a number of bytes that ends you mid-character (which is
what this ticket is about).
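
To make that failure mode concrete, here is a minimal sketch, assuming a raw (byte-oriented) in-memory handle and a made-up six-byte sample string; neither comes from the ticket. It reads fixed-size records whose boundary falls inside a two-byte character, then checks each record with utf8::decode. The helper name is_valid_utf8 is hypothetical.

```perl
use strict;
use warnings;

# Sample data (an assumption for illustration): "naive" with a diaeresis,
# encoded as UTF-8. The accented letter occupies two bytes (C3 AF), so the
# string is six bytes long.
my $bytes = "na\xC3\xAFve";

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($copy) = @_;    # work on a copy; utf8::decode modifies its argument
    return utf8::decode($copy) ? 1 : 0;
}

# Read the data back as 3-byte records from a raw in-memory handle.
open(my $fh, '<:raw', \$bytes) or die "open: $!";
my @records = do { local $/ = \3; <$fh> };
close $fh;

for my $rec (@records) {
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $rec;
    printf "%-10s %s\n", $hex, is_valid_utf8($rec) ? 'valid' : 'mangled';
}
```

Both records should be reported as mangled, because the record boundary lands between C3 and AF, while the complete string passes the same check.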

(If we were talking about sysread instead of readline or read, then yes, it
could happen then. Unlike read and readline, sysread returns as soon as
bytes are available.)
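
The difference can be sketched with a pipe and a deliberately slow writer. This is an illustration under assumptions of my own (a forked child writing five 8-byte chunks with short pauses, a 40-byte request, and the hypothetical helper name bytes_from_one_call), not a reproduction of the strace run above.

```perl
use strict;
use warnings;

$| = 1;    # keep STDOUT unbuffered so forked children inherit no pending output

# Hypothetical helper: report how many bytes a single 40-byte request
# returns while a child process trickles data into the pipe.
sub bytes_from_one_call {
    my ($use_sysread) = @_;

    pipe(my $r, my $w) or die "pipe: $!";
    defined(my $pid = fork()) or die "fork: $!";

    if ($pid == 0) {    # child: write 5 chunks of 8 bytes, pausing in between
        close $r;
        for (1 .. 5) {
            syswrite($w, 'x' x 8);
            select(undef, undef, undef, 0.1);    # sleep 0.1 seconds
        }
        exit 0;
    }

    close $w;
    my $buf = '';
    my $n = $use_sysread
        ? sysread($r, $buf, 40)    # returns as soon as some bytes are available
        : read($r, $buf, 40);      # keeps asking until 40 bytes or end of file
    close $r;
    waitpid($pid, 0);
    return $n;
}

printf "read:    %d bytes\n", bytes_from_one_call(0);
printf "sysread: %d bytes\n", bytes_from_one_call(1);
```

On a typical Unix system read should report the full 40 bytes, while sysread should report only what the first write delivered (usually 8), which is why a sysread caller has to handle a character split across calls itself.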

- Eric
