You have to use a dedicated framework to distribute the computation on a
cluster like your Cray system.
You can use MPI, or dask with dask-jobqueue, but you also need to run
parallel algorithms that remain efficient in a distributed setting with a
high cost for communication between distributed wo
Hi Manuel,
Thanks for your reply. Before trying an alternative such as PipeGraph, or
implementing the class as you propose, I would prefer to include some code
in the _fit method of BaggingClassifier, so that the correct value of X can
be passed to the base_estimator (the DataFrame or its array of values).
M
Sorry, I have only just reread your answer more closely.
It seems that the "n_jobs" parameter of the DBSCAN routine brings no
benefit to performance. If I want to improve the performance of the
DBSCAN routine, I will have to redesign the solution to use MPI
resources.
Is that correct?
---
Regards,
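One way to check the "n_jobs" question on a given machine is to time the same DBSCAN fit with different settings. The following is only a rough sketch: the synthetic data, its size, and the eps and min_samples values are arbitrary assumptions and do not come from the original test, so the timings will not match those seen on the Cray system.

```python
import time

import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data; size and DBSCAN parameters are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))

timings = {}
labels = {}
for n_jobs in (1, 4):
    start = time.perf_counter()
    # Same data and parameters each time; only n_jobs changes.
    labels[n_jobs] = DBSCAN(eps=0.3, min_samples=10, n_jobs=n_jobs).fit_predict(X)
    timings[n_jobs] = time.perf_counter() - start
    print(f"n_jobs={n_jobs}: {timings[n_jobs]:.3f} s")
```

Repeating the loop several times and on the real data gives a fairer comparison, since a single run on a small array is dominated by start-up overhead.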
My laptop has an Intel i7 processor with 4 cores. When I run the program on
Windows 10, the "joblib.cpu_count()" routine returns "4". In this
case, the same test I did on the Cray computer caused a 10% increase in
the processing time of the DBSCAN routine when I used the "n_jobs = 4"
parameter c
You can always add a first step that turns your NumPy array into a DataFrame
such as the one required afterwards.
A bit of object-oriented programming might be required though, for deriving
your class from BaseTransformer and writing your particular code for the fit
and transform methods.
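A minimal sketch of such a first step could look like the following. It assumes the base classes meant here are scikit-learn's BaseEstimator and TransformerMixin; the class name ArrayToDataFrame and its "columns" parameter are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ArrayToDataFrame(BaseEstimator, TransformerMixin):
    """Hypothetical helper: wrap a 2-D array in a DataFrame with given column names."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing is learned; fit exists only to satisfy the estimator API.
        return self

    def transform(self, X):
        # Pass DataFrames through unchanged so the step is safe to apply twice.
        if isinstance(X, pd.DataFrame):
            return X
        return pd.DataFrame(np.asarray(X), columns=self.columns)


X = np.array([[1.0, 2.0], [3.0, 4.0]])
df = ArrayToDataFrame(columns=["a", "b"]).fit_transform(X)
print(df)
```

Placed as the first step of a Pipeline, a transformer like this hands a DataFrame to whatever step follows it.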
Alternatively you
Thank you! That's very helpful :)
On Thu, 27 Jun 2019 at 12:27, Roman Yurchak via scikit-learn <
scikit-learn@python.org> wrote:
> Meanwhile, loading the CSV from OpenML (https://www.openml.org/d/40945)
> would also work,
>
> pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')
>
>
> where you can see "ncpus = 1" (I still do not know why 4 lines were
> printed -
>
> (total of 40 nodes) and each node has 1 CPU and 1 GPU!
>
> #PBS -l select=1:ncpus=8:mpiprocs=8
> aprun -n 4 p.sh ./ncpus.py
>
You can request 8 CPUs from a job scheduler, but if each node the script
runs on c
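To see how many CPUs a process can really use, as opposed to how many were requested, a small check like the one below may help. It is only a sketch using the Python standard library, not the ncpus.py script from this thread, and the helper name usable_cpus is made up for illustration.

```python
import os


def usable_cpus():
    """Return the number of CPUs this process is allowed to run on."""
    # os.sched_getaffinity is only available on some platforms (e.g. Linux).
    # It reflects CPU binding imposed from outside the process, which
    # os.cpu_count() does not.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback where affinity is not exposed (e.g. Windows, macOS).
    return os.cpu_count() or 1


total = os.cpu_count() or 1
usable = usable_cpus()
print(f"CPUs reported by the machine: {total}, usable by this process: {usable}")
```

Running this inside the batch job, rather than on a login node, shows what the launched processes actually get.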