ML Pipeline question about caching

2015-03-17 Thread Cesar Flores
Hello all: I am using the ML Pipeline, which I consider very powerful. I have the following use case:
- I have three transformers, which I will call A, B, C, that basically extract features from text files, with no parameters.
- I have a final stage D, which is the logistic regression

Re: ML Pipeline question about caching

2015-03-17 Thread Peter Rudenko
Hi Cesar, I had a similar issue. Yes, for now it’s better to do A, B, C outside a crossvalidator. Take a look at my comment https://issues.apache.org/jira/browse/SPARK-4766?focusedCommentId=14320038&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14320038 and this
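
To illustrate why running A, B, C outside the cross-validator helps, here is a minimal plain-Python sketch (not Spark code). The names extract_features, fit_and_score and cross_validate are hypothetical stand-ins for the parameter-free stages A, B, C, for stage D, and for the cross-validation loop; the score is a dummy value. It only counts how often feature extraction runs in each arrangement.

```python
# Counter kept in a dict so the helper below can update it without globals.
counter = {"extract": 0}

def extract_features(doc):
    # Hypothetical stand-in for the parameter-free stages A, B and C.
    counter["extract"] += 1
    return (len(doc), doc.count(" "))

def fit_and_score(train, held_out, reg_param):
    # Hypothetical stand-in for stage D (the estimator being tuned).
    # Returns a dummy score; a real model would be fit on `train`.
    return -reg_param * len(held_out) / max(len(train), 1)

def cross_validate(rows, reg_params, num_folds, featurize=None):
    # `rows` are raw documents if `featurize` is given,
    # otherwise they are already-extracted features.
    scores = {}
    for reg_param in reg_params:
        total = 0.0
        for fold in range(num_folds):
            train = [r for i, r in enumerate(rows) if i % num_folds != fold]
            held_out = [r for i, r in enumerate(rows) if i % num_folds == fold]
            if featurize is not None:
                # Feature extraction repeated for every (parameter, fold) pair.
                train = [featurize(r) for r in train]
                held_out = [featurize(r) for r in held_out]
            total += fit_and_score(train, held_out, reg_param)
        scores[reg_param] = total / num_folds
    return scores

docs = ["a b c", "d e", "f", "g h i j", "k l", "m n o"]
reg_params = [0.01, 0.1, 1.0]

# A, B, C inside the cross-validation loop.
counter["extract"] = 0
inside_scores = cross_validate(docs, reg_params, num_folds=3,
                               featurize=extract_features)
calls_inside = counter["extract"]

# A, B, C run once up front; only D is cross-validated.
counter["extract"] = 0
features = [extract_features(d) for d in docs]
outside_scores = cross_validate(features, reg_params, num_folds=3)
calls_outside = counter["extract"]

print(calls_inside, calls_outside)
```

With 6 documents, 3 parameter values and 3 folds, extraction runs 54 times inside the loop and 6 times when hoisted out, while the scores are identical. In Spark terms this roughly corresponds to transforming the dataset once with A, B, C and caching the result before handing it to the cross-validator that tunes D.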