If X2 doesn't have the same ordering you wouldn't be able to pass that
directly either. The data is split before being run into the pipeline,
so just using hstack is fine.
I've got the code I use to make this easier here, by the way:
https://github.com/andaag/scikit_helpers .
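For illustration, a minimal sketch of what I mean, assuming X and X2 are numpy
arrays describing the same samples in the same row order and y is the target
(the pipeline steps are just placeholders):

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestClassifier

    # X and X2 must be row-aligned: row i of both arrays describes sample i.
    X_combined = np.hstack([X, X2])

    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier()),
    ])
    pipe.fit(X_combined, y)

Because the stacking happens before the pipeline ever sees the data, the
cross-validation splits inside GridSearchCV keep the two feature blocks aligned.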
On Fri, Aug 22,
Hey,
Your tip put me very far in the right direction. I have one further question.
It seems that just appending the features in the FeatureUnion as a pipeline
step may create havoc, as when I implement GridSearchCV the data is going to
have some sort of randomness in its order compared to t
Great work, and I hope to see it merged soon. Thanks!
On 17 August 2014 23:54, Issam wrote:
> Hi all,
>
> I finished writing the final summary of my work for GSoC 2014. It is
> posted here: http://issamlaradji.blogspot.com/
>
> Thank you!
>
> Best regards,
> --Issam Laradji
>
>
> -
Really interesting work, well done in GSoC!
On 22 August 2014 09:35, Manoj Kumar wrote:
> Hi,
>
> A quick wrap up post about my Summer of Code
>
> http://manojbits.wordpress.com/2014/08/21/gsoc-the-end-of-another-journey/
>
>
> --
> Godspeed,
> Manoj Kumar,
> Mech Undergrad
> http://manojbits.wordpress.com
Hi,
A quick wrap up post about my Summer of Code
http://manojbits.wordpress.com/2014/08/21/gsoc-the-end-of-another-journey/
--
Godspeed,
Manoj Kumar,
Mech Undergrad
http://manojbits.wordpress.com
Hi,
The OS denied me memory when running CV in the script below. I am still
investigating whether it was a mistake by the scheduler on the server, but
I think the process had access to 240 GB of memory and yet reproducibly crashes
after using 120035176K, with the error message below. I paste my conda info
output
Great effort Hamzeh -- your GSoC has dealt with problems everyone has put
off until later and helped the community a lot, thanks!
On 19 August 2014 11:32, Hamzeh Alsalhi wrote:
> Hello, I am wrapping up my final blogpost and I want to say that this was
> an awesome summer of code! It has been a
Hi, Zoraida,
thanks for the follow-up! I went with a short, custom ColumnSelector class, but
the itemgetter approach is even nicer.
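For anyone finding this in the archives, a rough sketch of the kind of
ColumnSelector I mean (the names and details are mine, not anything from
scikit-learn itself):

    from sklearn.base import BaseEstimator, TransformerMixin

    class ColumnSelector(BaseEstimator, TransformerMixin):
        """Keep only the given columns of a 2-d feature array."""
        def __init__(self, columns):
            self.columns = columns

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X[:, self.columns]

    # used e.g. as the first step of a Pipeline:
    #     ("select", ColumnSelector([0, 2, 5]))
    # operator.itemgetter("some_field") can play the same role inside a small
    # transformer when each sample is a dict rather than a row of an array.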
Best,
Sebastian
On Aug 21, 2014, at 2:57 PM, ZORAIDA HIDALGO SANCHEZ
wrote:
> Sebastian,
>
> a few days ago, I asked a very similar question and I got this link as a
>
Sebastian,
a few days ago, I asked a very similar question and I got this link as a
response:
https://github.com/scikit-learn/scikit-learn/issues/2034
I think that you could try something similar.
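Something along these lines (just a rough sketch; the class and step names are
made up) can sit between the tf-idf step and the forest:

    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import Pipeline

    class DenseTransformer(BaseEstimator, TransformerMixin):
        """Turn a scipy sparse matrix into a dense ndarray."""
        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X.toarray()

    pipe = Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("to_dense", DenseTransformer()),
        ("clf", RandomForestClassifier()),
    ])

Keep in mind that densifying a large vocabulary can use a lot of memory, so it
may be worth capping max_features on the vectorizer.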
Best,
Zoraida.-
On 21/08/14 18:48, "Sebastian Okser" wrote:
>I am trying to use the pipe
I am trying to use a Pipeline combining a CountVectorizer, TfidfTransformer and
a RandomForest. However, the output of the second step is a sparse matrix and
the random forest requires a dense one. How can I add a step to allow for a
conversion of the matrix from sparse to dense, using something
a
If you set n_jobs to XXX, it will spawn XXX threads or processes. Thus, you will
need to ask for XXX cores. Note that it’s often possible to retrieve XXX in
your script using os.environ.
If you use fewer than the XXX cores, then you won’t
use all the available CPU. If you ask for more than XXX cor
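As a rough sketch (the environment variable name depends on your scheduler --
NSLOTS is what SGE sets, others differ -- and estimator, param_grid, X and y
are placeholders):

    import os
    from sklearn.grid_search import GridSearchCV  # import path as of 0.15

    # Ask the scheduler how many cores this job was actually allocated.
    n_cores = int(os.environ.get("NSLOTS", "1"))

    grid = GridSearchCV(estimator, param_grid, cv=4, n_jobs=n_cores)
    grid.fit(X, y)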
I still have the following doubt:
I understand that n_jobs "should be depending on the number of cpu cores
available on your machine". But I am running code in a grid computing
environment where I have to specify the number of CPU cores in advance.
Does this mean if I (reserve 64 cores and) specify n_j
2014-08-21 13:44 GMT+02:00 Joel Nothman :
> I think RandomForestClassifier, using multithreading in version 0.15, should
> work nested in multiprocessing.
It would work, but the p * n threads from p processes using n threads
each would still compete for the cores, right?
-
On 21 August 2014 21:46, Gael Varoquaux
wrote:
> On Thu, Aug 21, 2014 at 09:44:37PM +1000, Joel Nothman wrote:
> > I think RandomForestClassifier, using multithreading in version 0.15,
> should
> > work nested in multiprocessing.
>
> Good point, as it uses threading. Thus, for version 0.15, what
On Thu, Aug 21, 2014 at 09:44:37PM +1000, Joel Nothman wrote:
> I think RandomForestClassifier, using multithreading in version 0.15, should
> work nested in multiprocessing.
Good point, as it uses threading. Thus, for version 0.15, what I just
said was irrelevant.
G
On 21 August 2014 21:39, Gael Varoquaux
wrote:
> On Thu, Aug 21, 2014 at 12:32:08PM +0200, Sheila the angel wrote:
> > 2. If I use the classifier such as RandomForestClassifier where
> > 'n_jobs' can be specified, will it make any difference if I specify
> > "n_jobs" at the classifier level also-
On Thu, Aug 21, 2014 at 12:32:08PM +0200, Sheila the angel wrote:
> 2. If I use the classifier such as RandomForestClassifier where
> 'n_jobs' can be specified, will it make any difference if I specify
> "n_jobs" at the classifier level also-
We don't support nested parallelism, unfortunately.
G
First, thanks for the reply.
@Hames: I understand that n_jobs "should be depending on the number of cpu
cores available on your machine". But I am running code in a grid computing
environment where I have to specify the number of CPUs in advance.
Does this mean if I (reserve 64 cores and) specify n_job
2014-08-21 12:32 GMT+02:00 Sheila the angel :
> 1. What should be the n_jobs value, 8 or (8*4=) 32 ?
n_jobs is the number of CPUs you want to use, not the amount of work.
(It's a misnomer because the number of jobs/work items is variable;
the parameter determines the number of workers performing the work.)
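A small sketch to make the distinction concrete (the parameter values here are
made up):

    from sklearn.grid_search import GridSearchCV, ParameterGrid
    from sklearn.svm import SVC

    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1]}  # 8 combinations
    n_fits = len(ParameterGrid(param_grid)) * 4  # 8 candidates x 4 folds = 32 fits

    # n_jobs=8 still means 8 worker processes; the 32 fits are queued across them.
    grid = GridSearchCV(SVC(), param_grid, cv=4, n_jobs=8)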
Hi,
1. The n_jobs parameter controls the number of physical processes started in
parallel. It should be set depending on the
number of CPU cores available on your machine, independent of the type or
size of the CV search you are trying to
run. On a typical desktop machine with four cores t
Hi,
Using GridSearchCV, I am trying to optimize two parameters values.
In total, I have 8 parameter combinations and doing 4 fold cross validation.
I want to run it in a parallel environment.
My questions are:
1. What should be the n_jobs value, 8 or (8*4=) 32 ?
(I know I can specify n_jobs=-1 but du
There was a thread on the mailing-list a while ago on instance reduction
methods.
It was decided not to include such methods for the time being, as changing
n_samples is not supported by transformers or pipelines.
It is also not clear yet how such methods would play with grid search, for
instance.