Also you should avoid calling release directly, because it will also be
called automatically here:
https://github.com/apache/arrow/blob/master/python/pyarrow/_plasma.pyx#L222
Instead, you should call "del buffer" on the PlasmaBuffer. I'll submit a PR
to make the release method private.
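Here is a pure-Python model of why the explicit call causes trouble. It is not pyarrow code. `FakeClient` and `FakeBuffer` are hypothetical stand-ins. The sketch also assumes CPython reference counting, where `del` on the last reference runs `__del__` immediately, mirroring the automatic release at the linked line.

```python
class FakeClient:
    """Hypothetical stand-in for a Plasma client that counts releases per object."""

    def __init__(self):
        self.releases = {}

    def release(self, object_id):
        self.releases[object_id] = self.releases.get(object_id, 0) + 1


class FakeBuffer:
    """Hypothetical stand-in for PlasmaBuffer: releases its object when collected."""

    def __init__(self, client, object_id):
        self.client = client
        self.object_id = object_id

    def __del__(self):
        # Models the automatic release that happens when the buffer is freed.
        self.client.release(self.object_id)


client = FakeClient()

# Recommended: drop the last reference and let the buffer release itself.
buf = FakeBuffer(client, "a")
del buf

# Discouraged: an explicit release, followed by the automatic one on del.
buf = FakeBuffer(client, "b")
client.release("b")
del buf

print(client.releases)  # {'a': 1, 'b': 2}
```

Object "b" is released twice. That is what happens when you call release by hand and then let the buffer go out of scope.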
The only
Hi Corey,
It is possible that the current eviction policy will evict a ton of objects
at once. Since the plasma store is single threaded, this could cause the
plasma store to be unresponsive while the eviction is happening (though it
should not hang permanently, just temporarily).
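A toy model shows how a single allocation can trigger a burst of evictions in a single-threaded store. It assumes strict LRU order and evicts until the new object fits. It is a simplified illustration, not Plasma's actual eviction policy, and `LRUStore` is a hypothetical name.

```python
from collections import OrderedDict


class LRUStore:
    """Toy single-threaded object store with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.objects = OrderedDict()  # object_id -> size, oldest first

    def get(self, object_id):
        # Touching an object marks it as most recently used.
        self.objects.move_to_end(object_id)
        return self.objects[object_id]

    def create(self, object_id, size):
        """Insert an object, evicting old ones first; returns the evicted ids."""
        if size > self.capacity:
            raise MemoryError("object larger than store capacity")
        evicted = []
        # All eviction happens inside this one call; a single-threaded store
        # serves no other request until the loop finishes.
        while self.used + size > self.capacity:
            old_id, old_size = self.objects.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self.objects[object_id] = size
        self.used += size
        return evicted


store = LRUStore(capacity=100)
for i in range(100):
    store.create(f"small-{i}", 1)  # fill the store with 100 one-unit objects

evicted = store.create("big", 60)
print(len(evicted))  # 60
```

One request for 60 units evicts 60 small objects in a single pass. With many small objects and a large request, that pass can take long enough to look like a stall.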
You could alway
Robert,
Yes I am using separate Plasma clients in each different thread. I also
verified that I am not using up all the file descriptors or reaching the
overcommit limit.
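One way to run the file-descriptor check from inside each worker process is sketched below. It assumes Linux, because it reads /proc/self/fd, and uses only the standard library. `fd_usage` is a hypothetical helper name.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    Assumes Linux. The count includes the descriptor opened to list
    /proc/self/fd itself.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


open_fds, soft, hard = fd_usage()
print(f"{open_fds} open descriptors, soft limit {soft}, hard limit {hard}")
```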
I do see that the Plasma server is evicting objects every so often. I'm
assuming this eviction may be going on in the backgrou
Seems like we might want to write down some best practices for this
level of large-scale usage, essentially a supercomputer-like rig. I
wouldn't even know where to come by a machine with > 2TB of memory
for scalability / concurrency load testing.
On Mon, Jul 16, 2018 at 2:59 PM, Robert
Are you using the same plasma client from all of the different threads? If
so, that could cause race conditions as the client is not thread safe.
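Because the client is not thread safe, each thread needs its own. The sketch below uses `threading.local` for that. `StubClient` is a hypothetical stand-in; a real worker would create its client with `pyarrow.plasma.connect(...)` instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


class StubClient:
    """Hypothetical stand-in for a Plasma client, which is not thread safe."""

    def __init__(self):
        self.owner = threading.get_ident()


def get_client():
    # Lazily create one client per thread; threads never share an instance.
    client = getattr(_local, "client", None)
    if client is None:
        client = _local.client = StubClient()
    return client


def work(item):
    client = get_client()
    # A real worker would talk to the store through `client` here.
    return threading.get_ident(), id(client), client.owner


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(20)))
```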
Alternatively, if you have a separate plasma client for each thread, then
you may be running out of file descriptors somewhere (either the client
proces
Update:
I'm investigating the possibility that I've reached the overcommit limit in
the kernel as a result of all the parallel processes.
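The following Linux-only sketch checks how close the machine is to the commit limit. It reads the CommitLimit and Committed_AS fields of /proc/meminfo and the vm.overcommit_memory setting. The helper names are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """CommitLimit minus Committed_AS, in kB.

    Under strict overcommit (vm.overcommit_memory = 2), allocations start
    failing as this approaches zero. In the default heuristic mode (0), the
    kernel does not enforce CommitLimit.
    """
    return info["CommitLimit"] - info["Committed_AS"]


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("overcommit mode:", overcommit_mode())
    print("commit headroom (kB):", commit_headroom_kb(info))
```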
This still doesn't fix the client.release() problem but it might explain
why the processing appears to halt, after some time, until I restart the
Jupyter kerne
Wes,
Unfortunately, my code is on a separate network. I'll try to explain what
I'm doing and if you need further detail, I can certainly pseudocode
specifics.
I am using multiprocessing.Pool() to fire up a bunch of worker processes for
different filenames. In each worker, I'm performing a pd.read_csv(),
s
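With multiprocessing.Pool, the Pool's `initializer` argument is one way to give each worker process its own client. This is a minimal sketch under several assumptions. `StubClient` stands in for a real Plasma client. The stdlib `csv` module replaces `pd.read_csv()` so the example has no third-party dependencies. The "fork" start method assumes a POSIX platform.

```python
import csv
import multiprocessing as mp
import os

# Set once per worker process by init_worker().
_client = None


class StubClient:
    """Hypothetical stand-in for a per-process Plasma client."""

    def __init__(self):
        self.pid = os.getpid()
        self.objects = {}

    def put(self, key, value):
        self.objects[key] = value


def init_worker():
    # Pool runs the initializer once in each worker process, so every
    # process gets its own client rather than sharing one.
    global _client
    _client = StubClient()


def convert(path):
    # Stands in for the pd.read_csv() step of the original workflow.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    _client.put(path, rows)
    return path, _client.pid, len(rows)


def convert_all(paths, processes=2):
    # Assumes a POSIX platform where the "fork" start method is available.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=processes, initializer=init_worker) as pool:
        return pool.map(convert, paths)
```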
hi Corey,
Can you provide the code (or a simplified version thereof) that shows
how you're using Plasma?
- Wes
On Tue, Jul 10, 2018 at 11:45 AM, Corey Nolet wrote:
> I'm on a system with 12TB of memory and attempting to use PyArrow's Plasma
> client to convert a series of CSV files (via Pandas)