Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-28 Thread Olivier Grisel
You have to use a dedicated framework to distribute the computation on a cluster like your Cray system. You can use MPI, or Dask with dask-jobqueue, but you also need to run parallel algorithms that are efficient in a distributed setting, where communication between distributed workers is expensive…
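
A minimal sketch of the Dask route, assuming `dask.distributed` and scikit-learn are installed. `LocalCluster` is only a stand-in so the example runs on one machine; on a PBS-managed Cray one would presumably build the cluster with `dask_jobqueue.PBSCluster` instead, with site-specific queue and resource arguments that are not shown here. The estimator and data are illustrative. This pattern distributes independent fits (here, the cross-validation folds); it does not turn a single estimator's fit into a distributed algorithm.

```python
import joblib
from dask.distributed import Client, LocalCluster
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Stand-in cluster (assumption): on a batch-scheduled system this would be
# replaced by a scheduler-backed cluster such as dask_jobqueue.PBSCluster.
with LocalCluster(n_workers=2, threads_per_worker=1, processes=False) as cluster, \
        Client(cluster) as client:
    # Route joblib-based parallelism (the 5 independent CV fits) to Dask workers.
    with joblib.parallel_backend("dask"):
        scores = cross_val_score(clf, X, y, cv=5, n_jobs=-1)

print("mean CV accuracy:", scores.mean())
```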

Re: [scikit-learn] baggingClassifier with pipeline

2019-06-28 Thread Roxana Danger
Hi Manuel, thanks for your reply. Before trying an alternative such as PipeGraph, or implementing the class as you propose, I would prefer to add some code to the _fit method of BaggingClassifier so that the correct value of X (the DataFrame or its array of values) can be passed to the base_estimator. …
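
One way to observe the behaviour being discussed is to fit BaggingClassifier on a DataFrame with a dummy base estimator that records the type of X it receives. This is a sketch assuming a recent scikit-learn and pandas; `InputTypeRecorder` is a hypothetical helper written only for this check. As far as we can tell, BaggingClassifier validates its input into a NumPy array before fitting the base estimators, so the recorded type should be `ndarray`, not `DataFrame`.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import BaggingClassifier


class InputTypeRecorder(ClassifierMixin, BaseEstimator):
    """Hypothetical dummy classifier that remembers the type of X passed to fit."""

    def fit(self, X, y):
        self.seen_type_ = type(X)
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Always predicts the first class seen; only the fit input matters here.
        return np.full(len(X), self.classes_[0])


df = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0] * 5, "b": [1.0, 0.0, 1.0, 0.0] * 5})
y = np.array([0, 1] * 10)

# Pass the estimator positionally (the keyword name differs across versions).
bag = BaggingClassifier(InputTypeRecorder(), n_estimators=3, random_state=0)
bag.fit(df, y)
print({est.seen_type_.__name__ for est in bag.estimators_})
```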

Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-28 Thread Mauricio Reis
Sorry, I have just reread your answer more closely. It seems that the "n_jobs" parameter of the DBSCAN routine brings no performance benefit. If I want to improve the performance of the DBSCAN routine, I will have to redesign the solution to use MPI resources. Is that correct? --- Regards,

Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-28 Thread Mauricio Reis
My laptop has an Intel Core i7 processor with 4 cores. When I run the program on Windows 10, the "joblib.cpu_count()" routine returns "4". In this case, running the same test I did on the Cray computer increased the processing time of the DBSCAN routine by 10% when I used the "n_jobs = 4" parameter …
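
A small benchmark along these lines, assuming synthetic `make_blobs` data because the original data set and the `eps`/`min_samples` values are not given. As far as we can tell from the scikit-learn source, DBSCAN forwards `n_jobs` to its nearest-neighbors radius query while the cluster expansion itself runs single-threaded, so the achievable speed-up is limited, and for small inputs the overhead of parallel workers can outweigh the gain.

```python
import time

import joblib
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

print("joblib.cpu_count():", joblib.cpu_count())

# Synthetic stand-in for the real data (assumption: 2-D points in 5 blobs).
X, _ = make_blobs(n_samples=10000, centers=5, random_state=0)

timings, labels = {}, {}
for n_jobs in (1, 4):
    start = time.perf_counter()
    labels[n_jobs] = DBSCAN(eps=0.3, min_samples=10, n_jobs=n_jobs).fit_predict(X)
    timings[n_jobs] = time.perf_counter() - start
    print(f"n_jobs={n_jobs}: {timings[n_jobs]:.3f} s")

# Parallelism should change only the run time, not the resulting clusters.
print("same labels:", (labels[1] == labels[4]).all())
```

Timings from a single run are noisy; repeating each measurement a few times gives a fairer comparison.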

Re: [scikit-learn] baggingClassifier with pipeline

2019-06-28 Thread Manuel CASTEJÓN LIMAS via scikit-learn
You can always add a first step that turns your NumPy array into a DataFrame such as the one required afterwards. A bit of object-oriented programming might be required, though: deriving your class from BaseEstimator and TransformerMixin and writing your own fit and transform methods. Alternatively, you …
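
A minimal sketch of the first-step approach, assuming scikit-learn and pandas. `ArrayToDataFrame` and the column names `a` to `d` are hypothetical; the `ColumnTransformer` step stands in for whatever later step needs named columns.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ArrayToDataFrame(TransformerMixin, BaseEstimator):
    """Hypothetical first step: wrap the incoming array back into a DataFrame."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; the column names are fixed at construction time.
        return self

    def transform(self, X):
        # Restore the column labels that later steps select on.
        return pd.DataFrame(np.asarray(X), columns=self.columns)


columns = ["a", "b", "c", "d"]
pipe = Pipeline([
    ("to_df", ArrayToDataFrame(columns=columns)),
    # Selecting columns by name only works when the input is a DataFrame.
    ("prep", ColumnTransformer([("scale", StandardScaler(), ["a", "b"])],
                               remainder="passthrough")),
    ("clf", LogisticRegression()),
])

X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)
bag = BaggingClassifier(pipe, n_estimators=5, random_state=0).fit(X, y)
print("training accuracy:", bag.score(X, y))
```

The pipeline is passed positionally because the keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2. Keep `max_features` at its default here: if BaggingClassifier subsamples features, each pipeline receives fewer columns than the fixed list of names and the DataFrame construction fails.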

Re: [scikit-learn] titanic dataset, use for book

2019-06-28 Thread Sole Galli
Thank you! That's very helpful :)

On Thu, 27 Jun 2019 at 12:27, Roman Yurchak via scikit-learn <scikit-learn@python.org> wrote:
> Meanwhile, loading the CSV from OpenML (https://www.openml.org/d/40945)
> would also work,
>
> pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')
>

Re: [scikit-learn] Scikit Learn in a Cray computer

2019-06-28 Thread Brown J.B. via scikit-learn
> > where you can see "ncpus = 1" (I still do not know why 4 lines were
> printed -
> > (total of 40 nodes) and each node has 1 CPU and 1 GPU!
> > #PBS -l select=1:ncpus=8:mpiprocs=8
> aprun -n 4 p.sh ./ncpus.py
>
You can request 8 CPUs from a job scheduler, but if each node the script runs on …
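
A diagnostic in the spirit of the ncpus.py script under discussion could print what each launched process actually sees. The real script's contents are not shown, so `describe_cpus` is a hypothetical stand-in. `os.sched_getaffinity` is Linux-only and reports the CPUs the process is allowed to run on, which may be fewer than the node's total when the launcher pins processes. Since `aprun -n 4` starts four processing elements, a script like this would presumably print one line per process, which would account for the four lines.

```python
import os
import socket


def describe_cpus():
    """Hypothetical helper: report the CPUs visible to the current process."""
    info = {
        "host": socket.gethostname(),
        # Total logical CPUs on the machine (may be None if undeterminable).
        "os.cpu_count": os.cpu_count(),
    }
    # Linux-only: CPUs this particular process may actually be scheduled on.
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    try:
        import joblib  # optional; the count joblib uses when n_jobs=-1
        info["joblib.cpu_count"] = joblib.cpu_count()
    except ImportError:
        pass
    return info


if __name__ == "__main__":
    print(describe_cpus())
```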