On Thu, Apr 20, 2017 at 1:16 PM, Phil Hodge wrote:
>
> On 04/20/2017 03:17 PM, Anne Archibald wrote:
>>
>> Actually if I understood the spec, FITS header lines are 80 bytes long
>> and contain ASCII with no NULLs; strings are quoted and trailing spaces are
>> stripped.
>
> FITS BINTABLE extensions can h
On 04/20/2017 03:17 PM, Anne Archibald wrote:
Actually if I understood the spec, FITS header lines are 80 bytes long
and contain ASCII with no NULLs; strings are quoted and trailing
spaces are stripped.
FITS BINTABLE extensions can have columns containing strings, and in
that case the value
On Thu, Apr 20, 2017 at 12:51 PM, Stephan Hoyer wrote:
>
> On Thu, Apr 20, 2017 at 12:17 PM, Robert Kern
wrote:
>>
>> On Thu, Apr 20, 2017 at 12:05 PM, Stephan Hoyer wrote:
>> >
>> > On Thu, Apr 20, 2017 at 11:53 AM, Robert Kern
wrote:
>> >>
>> >> I don't know of a format off-hand that works wi
> I suggest a new data type 'text[encoding]', 'T'.
I like the suggestion very much (it is even in between S and U!). The
utf-8 manifesto linked to above convinced me that the number that
should follow is the number of bytes, which is nicely consistent with
use in all numerical dtypes.
Any way, m
On Thu, Apr 20, 2017 at 12:27 PM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
>
> On 20.04.2017 20:53, Robert Kern wrote:
> > On Thu, Apr 20, 2017 at 6:15 AM, Julian Taylor
> > <jtaylor.deb...@googlemail.com>
> > wrote:
> >
> >> Do you have comments on how to go forward, in particu
On Thu, Apr 20, 2017 at 12:17 PM, Robert Kern wrote:
> On Thu, Apr 20, 2017 at 12:05 PM, Stephan Hoyer wrote:
> >
> > On Thu, Apr 20, 2017 at 11:53 AM, Robert Kern
> wrote:
> >>
> >> I don't know of a format off-hand that works with numpy uniform-length
> >> strings and Unicode as well. HDF5 (to m
On Thu, Apr 20, 2017 at 12:17 PM, Anne Archibald
wrote:
>
> On Thu, Apr 20, 2017 at 8:55 PM Robert Kern wrote:
>> For example, to my understanding, FITS files more or less follow numpy
>> assumptions for its string columns (i.e. uniform-length). But it enforces
>> 7-bit-clean ASCII and pads with termi
On 20.04.2017 20:59, Anne Archibald wrote:
> On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor
> <jtaylor.deb...@googlemail.com>
> wrote:
>
> I probably should have formulated my goal with the proposal a bit better; I am
> not very interested in a repetition of the which-encoding-to-use debate.
>
I suggest a new data type 'text[encoding]', 'T'.
1. text can be cast to python strings via decoding.
2. Conceptually, casting to python bytes first casts to a string, then
calls encode(); the current encoding in the metadata is used by
default, but it can be overridden with a new encoding.
I slightly f
On Thu, Apr 20, 2017 at 12:53 PM, Robert Kern wrote:
> On Thu, Apr 20, 2017 at 6:15 AM, Julian Taylor <
> jtaylor.deb...@googlemail.com> wrote:
>
> > Do you have comments on how to go forward, in particular in regards to
> > new dtype vs modify np.unicode?
>
> Can we restate the use cases explici
On 20.04.2017 20:53, Robert Kern wrote:
> On Thu, Apr 20, 2017 at 6:15 AM, Julian Taylor
> <jtaylor.deb...@googlemail.com>
> wrote:
>
>> Do you have comments on how to go forward, in particular in regards to
>> new dtype vs modify np.unicode?
>
> Can we restate the use cases explicitly? I
On Thu, Apr 20, 2017 at 12:05 PM, Stephan Hoyer wrote:
>
> On Thu, Apr 20, 2017 at 11:53 AM, Robert Kern
wrote:
>>
>> I don't know of a format off-hand that works with numpy uniform-length
>> strings and Unicode as well. HDF5 (to my recollection) supports arrays of
>> NULL-terminated, uniform-length AS
On Thu, Apr 20, 2017 at 8:55 PM Robert Kern wrote:
> On Thu, Apr 20, 2017 at 6:15 AM, Julian Taylor <
> jtaylor.deb...@googlemail.com> wrote:
>
> > Do you have comments on how to go forward, in particular in regards to
> > new dtype vs modify np.unicode?
>
> Can we restate the use cases explicitl
Perhaps `np.encoded_str[encoding]` as the name for the new type, if we
decide a new type is necessary?
Am I right in thinking that the general problem here is that it's very easy
to discard metadata when working with dtypes, and that by adding metadata
to `unicode_`, we risk existing code careless
On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor
wrote:
> I probably should have formulated my goal with the proposal a bit better; I am
> not very interested in a repetition of the which-encoding-to-use debate.
> In the end what will be done allows any encoding via a dtype with
> metadata like datetime.
> Thi
On Thu, Apr 20, 2017 at 11:53 AM, Robert Kern wrote:
> I don't know of a format off-hand that works with numpy uniform-length
> strings and Unicode as well. HDF5 (to my recollection) supports arrays of
> NULL-terminated, uniform-length ASCII like FITS, but only variable-length
> UTF8 strings.
>
On Thu, Apr 20, 2017 at 6:15 AM, Julian Taylor <
jtaylor.deb...@googlemail.com> wrote:
> Do you have comments on how to go forward, in particular in regards to
> new dtype vs modify np.unicode?
Can we restate the use cases explicitly? I feel like we ended up with the
current sub-optimal situation
On Thu, 20 Apr 2017 10:26:13 -0700
Stephan Hoyer wrote:
>
> I agree with Anne here. Variable-length encoding would be great to have,
> but even fixed length UTF-8 (in terms of memory usage, not characters)
> would solve NumPy's Python 3 string problem. NumPy's memory model needs a
> fixed size pe
On Thu, Apr 20, 2017 at 4:21 AM, Ralf Gommers
wrote:
>
>
> On Thu, Apr 20, 2017 at 6:28 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>> Hi All,
>>
>> Currently numpy master has a bogus stride that will cause an error when
>> downstream projects misuse it. That is done in order to
On Thu, Apr 20, 2017 at 10:43 AM, Chris Barker
wrote:
> On Thu, Apr 20, 2017 at 10:26 AM, Stephan Hoyer wrote:
>
>> I agree with Anne here. Variable-length encoding would be great to have,
>> but even fixed length UTF-8 (in terms of memory usage, not characters)
>> would solve NumPy's Python 3 s
I probably should have formulated my goal with the proposal a bit better; I am
not very interested in a repetition of the which-encoding-to-use debate.
In the end what will be done allows any encoding via a dtype with
metadata like datetime.
This allows any codec (including truncated utf8) to be added easily
> if you truncate a utf-8 bytestring, you may get invalid data
Note that in general truncating unicode codepoints is not a safe operation
either, as combining characters are a thing. So I don't think this is a
good argument against UTF8.
Also, is silent truncation a thing that we want to allow to
On Thu, Apr 20, 2017 at 10:36 AM, Neal Becker wrote:
> I'm no unicode expert, but can't we truncate unicode strings so that only
> valid characters are included?
>
sure -- it's just a bit fiddly -- and you need to make sure that everything
gets passed through the proper mechanism. numpy is all a
On Thu, Apr 20, 2017 at 10:26 AM, Stephan Hoyer wrote:
> I agree with Anne here. Variable-length encoding would be great to have,
> but even fixed length UTF-8 (in terms of memory usage, not characters)
> would solve NumPy's Python 3 string problem. NumPy's memory model needs a
> fixed size per a
I'm no unicode expert, but can't we truncate unicode strings so that only
valid characters are included?
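
A minimal sketch of what such a truncation could look like for UTF-8, using only the Python standard library. The helper name `truncate_utf8` is hypothetical and is not part of NumPy or of any proposal in this thread. It cuts an encoded string to a byte budget and backs up to the nearest code-point boundary, so the result always decodes cleanly. As noted elsewhere in the thread, this still does not protect combining sequences, which can be split even when every code point is intact.

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8 and cut it to at most max_bytes bytes
    without splitting a multi-byte code point.

    Hypothetical helper for illustration; assumes max_bytes >= 0.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded

    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    # If the first byte we would drop is a continuation byte, the cut
    # falls inside a character, so move the cut back to its start byte.
    cut = max_bytes
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut]


# "naïve" is 6 bytes in UTF-8 because "ï" takes two bytes.
short = truncate_utf8("naïve", 3)   # cut would land inside "ï"
exact = truncate_utf8("naïve", 4)   # cut lands just after "ï"
```

The truncated result can be shorter than the byte budget, which is the "fiddly" part: a fixed-width field would then need padding, and every write path has to go through a helper like this one.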
On Thu, Apr 20, 2017 at 1:32 PM Chris Barker wrote:
> On Thu, Apr 20, 2017 at 9:47 AM, Anne Archibald wrote:
>
>> Is there any reason not to support all Unicode encodings that python
>> do
On Thu, Apr 20, 2017 at 9:47 AM, Anne Archibald
wrote:
> Is there any reason not to support all Unicode encodings that python does,
> with the same names and semantics? This would surely be the simplest to
> understand.
>
I think it should support all fixed-length encodings, but not the non-fixe
Julian -- thanks for taking this on. NumPy's handling of strings on Python
3 certainly needs fixing.
On Thu, Apr 20, 2017 at 9:47 AM, Anne Archibald
wrote:
> Variable-length encodings, of which UTF-8 is obviously the one that makes
> good handling essential, are indeed more complicated. But is i
Thanks so much for reviving this conversation -- we really do need to
address this.
My thoughts:
> What people apparently want is a string type for Python3 which uses less
> memory for the common science use case which rarely needs more than
> latin1 encoding.
>
Yes -- I think there is a real dema
On Thu, Apr 20, 2017 at 3:17 PM Julian Taylor
wrote:
> To please everyone I think we need to go with a dtype that supports
> multiple encodings via metadata, similar to how datetime supports
> multiple units.
> E.g.: 'U10[latin1]' are 10 characters in latin1 encoding
>
> Encodings we should suppo
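
To make the memory argument concrete, here is a small standard-library sketch comparing the storage needed for the same 10-character string under different encodings. It does not use the proposed `'U10[latin1]'` syntax, which is only a suggestion in this thread and not an existing NumPy dtype. The sample string and the variable names are illustrative assumptions. UTF-32 is included because NumPy's existing `U` dtype stores 4 bytes per character.

```python
# Ten characters, two of which fall outside ASCII but inside latin1.
sample = "Ångström 1"

# Bytes needed to store the same text under each encoding.
sizes = {
    "latin1": len(sample.encode("latin1")),        # 1 byte per character
    "utf-8": len(sample.encode("utf-8")),          # 1-4 bytes per character
    "utf-32-le": len(sample.encode("utf-32-le")),  # 4 bytes per character
}

for name, nbytes in sizes.items():
    print(f"{name:10s} {nbytes:3d} bytes for {len(sample)} characters")
```

For text that fits in latin1, the one-byte encoding needs a quarter of the UTF-32 storage, which is the saving the metadata-based dtype is meant to offer.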
Hello,
As you probably know, numpy does not deal well with strings in Python3.
The np.string type is actually zero terminated bytes and not a string.
In Python2 this happened to work out as it treats bytes and strings the
same way. But in Python3 this type is pretty hard to work with as each
time yo
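
A short sketch of the behaviour described above, assuming NumPy is installed and Python 3 is in use. The array contents and variable names are made up for illustration. It shows that the `S` dtype stores NUL-padded bytes and hands back `bytes` objects, so getting a Python `str` requires an explicit decode.

```python
import numpy as np

# Fixed-width byte strings: 3 bytes per element, NUL-padded in memory.
arr = np.array(["abc", "de"], dtype="S3")

# The raw buffer shows the padding byte after "de".
raw = arr.tobytes()

# Indexing returns a bytes object on Python 3, not a str.
first = arr[0]

# An explicit decode is needed to get text back.
text = first.decode("ascii")

# Trailing NULs are treated as padding and dropped when an element is read.
padded = np.array([b"ab\x00"], dtype="S3")[0]

print(type(first).__name__, repr(text), repr(padded))
```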
On Thu, Apr 20, 2017 at 6:28 AM, Charles R Harris wrote:
> Hi All,
>
> Currently numpy master has a bogus stride that will cause an error when
> downstream projects misuse it. That is done in order to help smoke out
> errors. Previously that bogus stride has been fixed up for releases, but
> that
31 matches