[Numpy-discussion] Re: Exporting numpy arrays to binary JSON (BJData) for better portability

2022-08-25 Thread Qianqian Fang
On 8/25/22 18:33, Neal Becker wrote:

the loading time (from an nvme drive, Ubuntu 18.04, python 3.6.9, numpy 1.19.5) for each file is listed below:

0.179s  eye1e4.npy (mmap_mode=None)
0.001s  eye1e4.npy (mmap_mode=r)
0.718s  eye1e4_bjd_raw_ndsyntax.jdb
1.474s  eye1e4_bjd_zlib.jdb
0.635s  eye1e4_bjd_lzma.jdb
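[A note for readers comparing the two .npy rows above: mmap_mode defers the actual disk reads, which is why the mapped load appears near-instant. A minimal sketch, assuming eye1e4.npy holds the np.eye(10000) array its name suggests:

import numpy as np

# plain load: the whole array is read from disk before np.load returns
a = np.load('eye1e4.npy')

# memory-mapped load: returns almost immediately; pages are pulled in
# from disk only when the corresponding elements are first accessed
b = np.load('eye1e4.npy', mmap_mode='r')
print(b[0, :5])  # touching the data is what triggers the real reads

The 0.001s figure therefore measures only the header parse and the mmap call, not the cost of reading the data.]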

[Numpy-discussion] Re: Exporting numpy arrays to binary JSON (BJData) for better portability

2022-08-25 Thread Neal Becker
> the loading time (from an nvme drive, Ubuntu 18.04, python 3.6.9, numpy
> 1.19.5) for each file is listed below:
>
> 0.179s eye1e4.npy (mmap_mode=None)
> 0.001s eye1e4.npy (mmap_mode=r)
> 0.718s eye1e4_bjd_raw_ndsyntax.jdb
> 1.474s eye1e4_bjd_zlib.jdb
> 0.635s eye1e4_bjd_lzma.jdb

[Numpy-discussion] Re: Exporting numpy arrays to binary JSON (BJData) for better portability

2022-08-25 Thread Bill Ross
>> For my case, I'd be curious about the time to add one 1T-entries file to
>> another.
>
> as I mentioned in the previous reply, bjdata is appendable [3], so you can
> simply append another array (or a slice) to the end of the file.

I'm thinking of numerical ops here, e.g. adding an array to i
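[For illustration, a minimal sketch of the append pattern discussed above, assuming the bjdata pip package and that its dump/load functions mirror the stdlib json module, with each load call consuming one concatenated record (both are assumptions); file name and arrays are made up:

import numpy as np
import bjdata  # pip install bjdata -- API assumed here

a = np.arange(5, dtype=np.float64)
b = np.ones(5)

# BJData records can be concatenated, so appending a new array is just
# reopening the file in append mode and dumping another record
with open('arrays.bjd', 'wb') as fp:
    bjdata.dump(a.tolist(), fp)
with open('arrays.bjd', 'ab') as fp:
    bjdata.dump(b.tolist(), fp)

# read the records back one by one, then do the numerical op in numpy
with open('arrays.bjd', 'rb') as fp:
    x = np.asarray(bjdata.load(fp))
    y = np.asarray(bjdata.load(fp))
print(x + y)

The numerical work itself still happens in numpy after decoding; the file format only determines how cheaply the second operand can be appended and re-read.]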

[Numpy-discussion] Exporting numpy arrays to binary JSON (BJData) for better portability

2022-08-25 Thread Qianqian Fang
To avoid derailing the other thread on extending .npy files, I am going to start a new thread on alternative array storage file formats using binary JSON - in case there is such a need a

[Numpy-discussion] Re: An extension of the .npy file format

2022-08-25 Thread Robert Kern
On Thu, Aug 25, 2022 at 3:47 PM Qianqian Fang wrote:
> On 8/25/22 12:25, Robert Kern wrote:
> > I don't quite know what this means. My installed version of `jq`, for
> > example, doesn't seem to know what to do with these files.
> >
> > ❯ jq --version
> > jq-1.6
> >
> > ❯ jq . eye5chunk_bjd_raw.jdb
> > parse error
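[One way to bridge the gap described here is to decode the binary record in Python and re-emit text JSON that jq can read. A sketch under stated assumptions: that the jdata package's load helper dispatches on the .jdb extension and that its encode helper returns a JSON-serializable annotated structure (neither is confirmed by this thread):

import json
import jdata as jd  # pip install jdata -- assumed API

# decode the binary JSON (BJData) record into Python/numpy objects
data = jd.load('eye5chunk_bjd_raw.jdb')  # assumed to dispatch on .jdb

# re-encode with JData annotations and write plain text JSON for jq
with open('eye5chunk.json', 'w') as fp:
    json.dump(jd.encode(data), fp)
]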

[Numpy-discussion] Re: An extension of the .npy file format

2022-08-25 Thread Qianqian Fang
On 8/25/22 12:25, Robert Kern wrote:
> No one is really proposing another format, just a minor tweak to the
> existing NPY format.

agreed. I was just following the previous comment on alternative formats
(such as hdf5) and pros/cons of npy.

> I don't quite know what this means. My installed versi

[Numpy-discussion] Re: An extension of the .npy file format

2022-08-25 Thread Robert Kern
On Thu, Aug 25, 2022 at 10:45 AM Qianqian Fang wrote:
> I am curious what you and other developers think about adopting
> JSON/binary JSON as a similarly simple, reverse-engineering-able but
> universally parsable array exchange format instead of designing another
> numpy-specific binary format.

[Numpy-discussion] Re: An extension of the .npy file format

2022-08-25 Thread Bill Ross
Can you give load times for these? (sizes below are in bytes)

> 8000128  eye5chunk.npy
> 5004297  eye5chunk_bjd_raw.jdb
>   10338  eye5chunk_bjd_zlib.jdb
>    2206  eye5chunk_bjd_lzma.jdb

For my case, I'd be curious about the time to add one 1T-entries file to another.

Thanks,
Bill

--
Phobrain.com

On 2022-08-24 20

[Numpy-discussion] Re: writing a known-size 1D ndarray serially as it's calced

2022-08-25 Thread Robert Kern
On Thu, Aug 25, 2022 at 4:27 AM Bill Ross wrote:
> Thanks, np.lib.format.open_memmap() works great! With prediction procs
> using minimal sys memory, I can get twice as many on GPU, with fewer
> optimization warnings.
>
> Why even have the number of records in the header? Shouldn't record size
> plus system-reported/growable file size be enough?
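[For background on the quoted header question, a short sketch using np.lib.format's documented helpers shows what the header actually records up front; the file name is reused from earlier in the thread:

import numpy as np

with open('eye1e4.npy', 'rb') as fp:
    major, minor = np.lib.format.read_magic(fp)
    # shape, memory order, and dtype are all stored in the header itself
    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(fp)

print(shape, fortran_order, dtype)  # e.g. (10000, 10000) False float64

Because the shape is explicit rather than inferred from the file size, a reader does not need to trust the filesystem-reported length, and multidimensional shapes remain unambiguous.]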

[Numpy-discussion] Re: An extension of the .npy file format

2022-08-25 Thread Qianqian Fang
I am curious what you and other developers think about adopting JSON/binary JSON as a similarly simple, reverse-engineering-able but universally parsable array exchange format instead of designing another numpy-specific binary format. I am interested in this topic (as well as thoughts among nu
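[For concreteness, the round trip being proposed might look like the following minimal sketch, assuming the jdata pip package's save/load helpers (an assumption; the file name is illustrative):

import numpy as np
import jdata as jd  # pip install jdata bjdata -- assumed API

x = np.eye(1000)

# write the array as binary JSON (BJData)
jd.save(x, 'eye1e3_bjd_raw.jdb')

# any BJData-aware reader can parse the file; round-trip it here
y = jd.load('eye1e3_bjd_raw.jdb')
assert np.array_equal(x, np.asarray(y))
]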

[Numpy-discussion] Re: writing a known-size 1D ndarray serially as it's calced

2022-08-25 Thread Bill Ross
Thanks, np.lib.format.open_memmap() works great! With prediction procs
using minimal sys memory, I can get twice as many on GPU, with fewer
optimization warnings.

Why even have the number of records in the header? Shouldn't record size
plus system-reported/growable file size be enough?

I'd lov
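[For context, the pattern being thanked here, sketched minimally: open_memmap preallocates the full .npy file on disk, so a known-size 1D result can be filled in chunk by chunk without ever holding it all in process memory. Sizes and file name are illustrative:

import numpy as np

n = 10_000_000  # the known final length

# preallocate the .npy file on disk, header and all
out = np.lib.format.open_memmap('result.npy', mode='w+',
                                dtype=np.float32, shape=(n,))

# fill it serially as results are calculated
chunk = 1_000_000
for start in range(0, n, chunk):
    stop = min(start + chunk, n)
    out[start:stop] = np.random.rand(stop - start)  # stand-in for the real calc

out.flush()  # flush dirty pages to disk
del out      # close the memmap

The resulting file is a normal .npy readable with np.load, with or without mmap_mode.]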