On 26.04.2017 03:55, josef.p...@gmail.com wrote:
> On Tue, Apr 25, 2017 at 9:27 PM, Charles R Harris
> wrote:
>>
>>
>> On Tue, Apr 25, 2017 at 5:50 PM, Robert Kern wrote:
>>>
>>> On Tue, Apr 25, 2017 at 3:47 PM, Chris Barker - NOAA Federal
>>> wrote:
>>>
> Presumably you're getting byte stri
On Wed, Apr 26, 2017 at 7:20 AM Stephan Hoyer wrote:
> On Tue, Apr 25, 2017 at 9:21 PM Robert Kern wrote:
>
>> On Tue, Apr 25, 2017 at 6:27 PM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>> > The maximum length of a UTF-8 character is 4 bytes, so we could use
>> that to size arr
On Wed, Apr 26, 2017 at 3:15 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
> On 26.04.2017 03:55, josef.p...@gmail.com wrote:
> > On Tue, Apr 25, 2017 at 9:27 PM, Charles R Harris
> > wrote:
> >>
> >>
> >> On Tue, Apr 25, 2017 at 5:50 PM, Robert Kern
> wrote:
> >>>
> >>> On Tue, Apr
> I think we can implement viewers for strings as ndarray subclasses. Then one
> could
> do `my_string_array.view(latin_1)`, and so on. Essentially that just
> changes the default
> encoding of the 'S' array. That could also work for uint8 arrays if needed.
>
> Chuck
To handle structured data-typ
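
The viewer idea quoted above could look roughly like the following. This is only a sketch under assumptions: the class name `Latin1View` and its `decoded()` method are hypothetical and not part of NumPy. It relies on `np.char.decode`, which does exist, to decode an 'S' array element-wise.

```python
import numpy as np


class Latin1View(np.ndarray):
    """Hypothetical viewer: an 'S' array whose items are read as Latin-1."""

    def decoded(self):
        # np.char.decode calls bytes.decode on each element and
        # returns a new 'U' (unicode) array.
        return np.char.decode(np.asarray(self), "latin-1")


# Fixed-width byte strings, as NumPy stores them in an 'S' array.
raw = np.array([b"caf\xe9", b"na\xefve"], dtype="S5")

# .view() reinterprets the same buffer; no bytes are copied.
view = raw.view(Latin1View)
text = view.decoded()
```

Only the decoding step allocates new memory here; the view itself shares the buffer of the original 'S' array.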
> > I DO recommend Latin-1 as a default encoding ONLY for "mostly ascii, with
> > a few extra characters" data. With all the sloppiness over the years, there
> > are way too many files like that.
>
> That sloppiness that you mention is precisely the "unknown encoding" problem.
Exactly -- but fro
On Wed, Apr 26, 2017 at 2:15 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
> Indeed,
> Most of this discussion is irrelevant to numpy.
> Numpy only really deals with the in memory storage of strings. And in
> that it is limited to fixed length strings (in bytes/codepoints).
> How you g
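
As an illustration of the fixed-length point quoted above, here is a minimal sketch of how the existing 'S' (bytes) and 'U' (code points) dtypes size their items. The variable names are illustrative only.

```python
import numpy as np

# 'S' arrays reserve a fixed number of bytes per item.
s = np.array([b"abc", b"abcdef"])
s_itemsize = s.dtype.itemsize      # 6 bytes per item

# 'U' arrays reserve a fixed number of code points, 4 bytes each.
u = np.array(["abc", "abcdef"])
u_itemsize = u.dtype.itemsize      # 6 code points * 4 bytes

# Assigning a longer value does not grow the item; it is truncated.
s[0] = b"abcdefghij"
truncated = s[0]
```

The truncation on assignment is silent, which is part of why the choice of width unit (bytes or code points) matters to users.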
On 26.04.2017 19:08, Robert Kern wrote:
> On Wed, Apr 26, 2017 at 2:15 AM, Julian Taylor
> <jtaylor.deb...@googlemail.com>
> wrote:
>
>> Indeed,
>> Most of this discussion is irrelevant to numpy.
>> Numpy only really deals with the in memory storage of strings. And in
>> that it is limited
On Wed, Apr 26, 2017 at 3:27 AM, Anne Archibald
wrote:
>
> On Wed, Apr 26, 2017 at 7:20 AM Stephan Hoyer wrote:
>>
>> On Tue, Apr 25, 2017 at 9:21 PM Robert Kern
wrote:
>>>
>>> On Tue, Apr 25, 2017 at 6:27 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:
>>>
>>> > The maximum length of a
On Apr 26, 2017 9:30 AM, "Chris Barker - NOAA Federal" <
chris.bar...@noaa.gov> wrote:
UTF-8 does not match the character-oriented Python text model. Plenty
of people argue that that isn't the "correct" model for Unicode text
-- maybe so, but it is the model python 3 has chosen. I wrote a much
lo
On Wed, Apr 26, 2017 at 2:31 PM, Nathaniel Smith wrote:
> On Apr 26, 2017 9:30 AM, "Chris Barker - NOAA Federal"
> wrote:
>
>
> UTF-8 does not match the character-oriented Python text model. Plenty
> of people argue that that isn't the "correct" model for Unicode text
> -- maybe so, but it is the
On Wed, 2017-04-26 at 19:43 +0200, Julian Taylor wrote:
> On 26.04.2017 19:08, Robert Kern wrote:
> > On Wed, Apr 26, 2017 at 2:15 AM, Julian Taylor
> > <jtaylor.deb...@googlemail.com>
> > wrote:
> >
> > > Indeed,
> > > Most of this discussion is irrelevant to numpy.
> > > Numpy only r
On Wed, Apr 26, 2017 at 10:43 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
>
> On 26.04.2017 19:08, Robert Kern wrote:
> > On Wed, Apr 26, 2017 at 2:15 AM, Julian Taylor
> > <jtaylor.deb...@googlemail.com>
> > wrote:
> >
> >> Indeed,
> >> Most of this discussion is irrelevant to
On Wed, Apr 26, 2017 at 11:38 AM, Sebastian Berg
wrote:
> I remember talking with a colleague about something like that. And
> basically an annoying thing there was that if you strip the zero bytes
> in a zero padded string, some encodings (UTF-16) may need one of the
> zero bytes to work right. (
On Wed, Apr 26, 2017 at 11:31 AM, Nathaniel Smith wrote:
> UTF-8 does not match the character-oriented Python text model. Plenty
> of people argue that that isn't the "correct" model for Unicode text
> -- maybe so, but it is the model python 3 has chosen. I wrote a much
> longer rant about that e
On Wed, Apr 26, 2017 at 11:38 AM, Sebastian Berg wrote:
> I remember talking with a colleague about something like that. And
> basically an annoying thing there was that if you strip the zero bytes
> in a zero padded string, some encodings (UTF-16) may need one of the
> zero bytes to work right.
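
The zero-padding problem described above can be reproduced with plain Python bytes. This is a sketch, not NumPy's behaviour: both helper functions are hypothetical names introduced for illustration, and little-endian UTF-16 is assumed.

```python
def strip_padding_bytewise(buf: bytes) -> bytes:
    # Naive approach: drop every trailing zero byte.
    return buf.rstrip(b"\x00")


def strip_padding_utf16(buf: bytes) -> bytes:
    # Drop trailing padding one 2-byte code unit at a time, so a zero
    # byte that belongs to the last character is kept.
    end = len(buf)
    while end >= 2 and buf[end - 2:end] == b"\x00\x00":
        end -= 2
    return buf[:end]


# "A" in UTF-16-LE is b"A\x00"; pad it to an 8-byte fixed-width field.
padded = "A".encode("utf-16-le").ljust(8, b"\x00")

naive = strip_padding_bytewise(padded)
try:
    naive.decode("utf-16-le")
    naive_ok = True
except UnicodeDecodeError:
    # The high byte of "A" was stripped along with the padding.
    naive_ok = False

safe = strip_padding_utf16(padded).decode("utf-16-le")
```

Stripping by code unit keeps the character intact, while the byte-wise strip leaves a single byte that no longer decodes.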
On Wed, Apr 26, 2017 at 10:45 AM, Robert Kern wrote:
> >>> > The maximum length of a UTF-8 character is 4 bytes, so we could use
> that to size arrays by character length. The advantage over UTF-32 is that
> it is easily compressible, probably by a factor of 4 in many cases.
>
isn't UTF-32 pret
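
To make the sizing argument quoted above concrete, here is a small sketch comparing character counts with encoded byte counts. The helper `utf8_worst_case_bytes` is a hypothetical name; the 4-byte bound it uses is the one stated in the quoted message.

```python
# One ASCII string, then single characters that need 2, 3 and 4 bytes in UTF-8.
samples = ["abc", "\u00e9", "\u20ac", "\U0001F600"]

# For each sample: (code points, UTF-8 bytes, UTF-32 bytes).
sizes = {
    s: (len(s), len(s.encode("utf-8")), len(s.encode("utf-32-le")))
    for s in samples
}


def utf8_worst_case_bytes(n_chars: int) -> int:
    # Buffer size that is always enough for n_chars code points in UTF-8.
    return 4 * n_chars


# For pure ASCII text, UTF-32 takes four times the space of UTF-8.
ascii_text = "numpy" * 100
ratio = len(ascii_text.encode("utf-32-le")) / len(ascii_text.encode("utf-8"))
```

Sizing by the worst case gives no memory saving over UTF-32 for the raw buffer; any gain for mostly-ASCII data would have to come from compressing the unused bytes.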
On Wed, Apr 26, 2017 at 3:27 PM, Chris Barker wrote:
> When a numpy user wants to put a string into a numpy array, they should
> know how long a string they can fit -- with "length" defined how python
> strings define it.
>
Sorry, I remain unconvinced (for the reasons that Robert, Nathaniel and
On Apr 26, 2017 12:09 PM, "Robert Kern" wrote:
On Wed, Apr 26, 2017 at 10:43 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
[...]
> I have read every mail and it has been a large waste of time. Everything
> has been said already many times in the last few years.
> Even if you memory ma
On Wed, Apr 26, 2017 at 4:30 PM, Stephan Hoyer wrote:
>
> Sorry, I remain unconvinced (for the reasons that Robert, Nathaniel and
> myself have already given), but we seem to be talking past each other here.
>
yeah -- I think it's not clear what the use cases we are talking about are.
> I am s
On Wed, Apr 26, 2017 at 4:49 PM, Nathaniel Smith wrote:
>
> On Apr 26, 2017 12:09 PM, "Robert Kern" wrote:
>> It's worthwhile enough that both major HDF5 bindings don't support
Unicode arrays, despite user requests for years. The sticking point seems
to be the difference between HDF5's view of a
On Wed, Apr 26, 2017 at 5:02 PM, Chris Barker wrote:
> But a bunch of folks have brought up that while we're messing around with
string encoding, let's solve another problem:
>
> * Exchanging unicode text at the binary level with other systems that
generally don't use UCS-4.
>
> For THAT -- utf-8
On Wed, Apr 26, 2017 at 5:17 PM, Robert Kern wrote:
> The proposal is for only latin-1 and UTF-32 to be supported at first, and
> the eventual support of UTF-8 will be constrained by specification of the
> width in terms of characters rather than bytes, which conflicts with the
> use cases of UTF
On Wed, Apr 26, 2017 at 4:49 PM, Nathaniel Smith wrote:
> It's worthwhile enough that both major HDF5 bindings don't support Unicode
> arrays, despite user requests for years. The sticking point seems to be the
> difference between HDF5's view of a Unicode string array (defined in size
> by the b