Re: [Numpy-discussion] Tensor Contraction (HPTT) and Tensor Transposition (TCL)

2017-08-16 Thread Anne Archibald
(NB all: the thread title seems to interchange the acronyms for the Tensor Contraction Library (TCL) and the High-Performance Tensor Transpose (HPTT) packages. I'm not fixing it so as not to break threading.) On Wed, Aug 16, 2017 at 11:40 AM Paul Springer wrote: > If you want to get it into Numpy

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-26 Thread Anne Archibald
On Wed, Apr 26, 2017 at 7:20 AM Stephan Hoyer wrote: > On Tue, Apr 25, 2017 at 9:21 PM Robert Kern wrote: > >> On Tue, Apr 25, 2017 at 6:27 PM, Charles R Harris < >> charlesr.har...@gmail.com> wrote: >> >> > The maximum length of a UTF-8 character is 4 bytes, so we could use >> that to size arr

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-25 Thread Anne Archibald
On Tue, Apr 25, 2017 at 7:52 PM Phil Hodge wrote: > On 04/25/2017 01:34 PM, Anne Archibald wrote: > > I know they're not numpy-compatible, but FITS header values are > > space-padded; does that occur elsewhere? > > Strings in FITS headers are delimited by single quotes

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-25 Thread Anne Archibald
On Tue, Apr 25, 2017 at 6:36 PM Chris Barker wrote: > > This is essentially my rant about use-case (2): > > A compact dtype for mostly-ascii text: > I'm a little confused about exactly what you're trying to do. Do you need your in-memory format for this data to be compatible with anything in par

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-25 Thread Anne Archibald
On Tue, Apr 25, 2017 at 7:09 PM Robert Kern wrote: > * HDF5 supports fixed-length and variable-length string arrays encoded in > ASCII and UTF-8. In all cases, these strings are NULL-terminated (despite > the documentation claiming that there are more options). In practice, the > ASCII strings pe

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-25 Thread Anne Archibald
On Tue, Apr 25, 2017 at 6:05 PM Chris Barker wrote: > Anyway, I think I made the mistake of mingling possible solutions in with > the use-cases, so I'm not sure if there is any consensus on the use cases > -- which I think we really do need to nail down first -- as Robert has made > clear. > I w

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-20 Thread Anne Archibald
On Thu, Apr 20, 2017 at 8:55 PM Robert Kern wrote: > On Thu, Apr 20, 2017 at 6:15 AM, Julian Taylor < > jtaylor.deb...@googlemail.com> wrote: > > > Do you have comments on how to go forward, in particular in regards to > > new dtype vs modify np.unicode? > > Can we restate the use cases explicitl

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-20 Thread Anne Archibald
On Thu, Apr 20, 2017 at 8:17 PM Julian Taylor wrote: > I probably have formulated my goal with the proposal a bit better, I am > not very interested in a repetition of which encoding to use debate. > In the end what will be done allows any encoding via a dtype with > metadata like datetime. > Thi

Re: [Numpy-discussion] proposal: smaller representation of string arrays

2017-04-20 Thread Anne Archibald
On Thu, Apr 20, 2017 at 3:17 PM Julian Taylor wrote: > To please everyone I think we need to go with a dtype that supports > multiple encodings via metadata, similar to how datetime supports > multiple units. > E.g.: 'U10[latin1]' are 10 characters in latin1 encoding > > Encodings we should suppo
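
For readers skimming this thread, the storage sizes being argued about can be checked with plain Python. This is only a rough sketch and not code from any of the messages: the helper name encoded_size and the sample strings are made up for illustration, and the 'U10[latin1]' spelling quoted above is a proposal, not an existing NumPy dtype. UTF-32 is used here as a stand-in for the 4-bytes-per-character cost of NumPy's existing fixed-width unicode dtype.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies once encoded (hypothetical helper)."""
    return len(text.encode(encoding))


# Ten ASCII characters, the same length as the 'U10[latin1]' example in the thread.
sample = "a" * 10

# UTF-32 without a byte-order mark: always 4 bytes per code point.
utf32_bytes = encoded_size(sample, "utf-32-le")

# Latin-1: always 1 byte per character (for characters it can represent).
latin1_bytes = encoded_size(sample, "latin-1")

# UTF-8: 1 byte per ASCII character, up to 4 bytes for other code points.
utf8_ascii_bytes = encoded_size(sample, "utf-8")
utf8_widest_bytes = encoded_size("\U0001F600", "utf-8")  # one code point outside the BMP

print(utf32_bytes, latin1_bytes, utf8_ascii_bytes, utf8_widest_bytes)
```

For ten ASCII characters the fixed 4-byte representation takes four times the space of Latin-1 or UTF-8, which is the motivation behind a "smaller representation"; the last value shows why a UTF-8 buffer sized for the worst case would need 4 bytes per character.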