Re: [scikit-learn] Pipegraph is on its way!

2018-02-12 Thread Manuel Castejón Limas
While we keep working on the docs and figures, here is a little example you all can already run: import numpy as np import pandas as pd from sklearn.preprocessing import MinMaxScaler from sklearn.preprocessing import PolynomialFeatures from sklearn.linear_model import LinearRegression from sklearn

Re: [scikit-learn] Applying clustering to cosine distance matrix

2018-02-12 Thread prince gosavi
Will look into it. However, I have a problem generating clusters: my data is a 14000x14000 distance_matrix and I get a "Memory Error". I have 6 GB of RAM. Any insight on this error is welcome. Regards On Tue, Feb 13, 2018 at 3:19 AM, federico vaggi wrote:
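To put the reported sizes in perspective, here is a minimal back-of-the-envelope sketch of how much memory one dense 14000x14000 matrix needs. The helper name `dense_matrix_bytes` is hypothetical, and the sketch assumes a dense array with a fixed item size; it says nothing about extra copies or neighbor structures a given clustering algorithm may allocate on top of that.

```python
def dense_matrix_bytes(n, itemsize):
    """Bytes needed for one dense n x n array (hypothetical helper)."""
    return n * n * itemsize

GIB = 1024 ** 3
n = 14000  # matrix side length quoted in the message

for dtype_name, itemsize in (("float64", 8), ("float32", 4)):
    size = dense_matrix_bytes(n, itemsize)
    # One copy of the matrix only; temporary copies would add to this.
    print(f"{dtype_name}: {size} bytes = {size / GIB:.2f} GiB")
```

A single float64 copy fits in 6 GB, so the error may come from additional copies or intermediate structures rather than from the matrix alone; that is a guess, not something the thread confirms.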

Re: [scikit-learn] Applying clustering to cosine distance matrix

2018-02-12 Thread federico vaggi
As a caveat, a lot of clustering algorithms assume that the distance matrix is a proper metric. If your distance is not a proper metric then the results might be meaningless (the narrative docs do a good job of discussing this). On Mon, 12 Feb 2018 at 13:30 prince gosavi wrote: > Hi, > Thanks f
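As an illustration of the caveat, the sketch below checks the triangle inequality for cosine distance, taken here as 1 minus cosine similarity (an assumption about which definition the original poster used). The three vectors are made up for the example.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy vectors: b lies halfway (in angle) between a and c.
a = (1.0, 0.0)
b = (1.0, 1.0)
c = (0.0, 1.0)

d_ab = cosine_distance(a, b)
d_bc = cosine_distance(b, c)
d_ac = cosine_distance(a, c)

# A proper metric requires d(a, c) <= d(a, b) + d(b, c).
violates_triangle_inequality = d_ac > d_ab + d_bc
print(d_ab, d_bc, d_ac, violates_triangle_inequality)
```

For these vectors the direct distance exceeds the sum of the two detours, so this form of cosine distance is not a proper metric. Whether that matters depends on the clustering algorithm chosen.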

Re: [scikit-learn] Multi-Output Decision Trees for mixed classification-regression problems

2018-02-12 Thread prince gosavi
Thanks for the reply. I will definitely try to open a PR for this issue. Regards On Tue, Feb 13, 2018 at 2:10 AM, Joel Nothman wrote:

Re: [scikit-learn] Applying clustering to cosine distance matrix

2018-02-12 Thread prince gosavi
Hi, Thanks for those tips, Sebastian. That just saved my day. Regards, Rajkumar On Tue, Feb 13, 2018 at 12:44 AM, Sebastian Raschka wrote:

Re: [scikit-learn] Multi-Output Decision Trees for mixed classification-regression problems

2018-02-12 Thread Joel Nothman
Presuming there are clear applications for this, other models, like MLP, should be able to support mixed targets similarly. Since we don't really have an API design for this, it might take some time to find consensus on what it should look like. But a PR would be a good way to concretely consider it

Re: [scikit-learn] Applying clustering to cosine distance matrix

2018-02-12 Thread Sebastian Raschka
Hi, by default, the clustering classes from sklearn (e.g., DBSCAN) take a [num_examples, num_features] array as input, but you can also provide the distance matrix directly, e.g., by instantiating the class with metric='precomputed': my_dbscan = DBSCAN(..., metric='precomputed') my_dbscan.fit(my_dis
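The approach described above can be sketched end to end on toy data. Everything apart from `DBSCAN(..., metric='precomputed')` is an assumption made for illustration: the random toy vectors, the use of `cosine_distances` to build the matrix, and the `eps` and `min_samples` values, which would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import cosine_distances

rng = np.random.RandomState(0)

# Toy data (assumed): two groups of vectors pointing in different directions.
group_a = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.02, size=(20, 3))
group_b = rng.normal(loc=[0.0, 1.0, 0.0], scale=0.02, size=(20, 3))
X = np.vstack([group_a, group_b])

# Square [num_examples, num_examples] matrix of pairwise cosine distances.
distance_matrix = cosine_distances(X)

# metric='precomputed' tells DBSCAN that the input is already a distance
# matrix rather than a [num_examples, num_features] array.
my_dbscan = DBSCAN(eps=0.1, min_samples=5, metric='precomputed')
labels = my_dbscan.fit_predict(distance_matrix)

print(distance_matrix.shape)
print(sorted(set(labels)))
```

A label of -1 in the output would mark points that DBSCAN treats as noise.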

[scikit-learn] Applying clustering to cosine distance matrix

2018-02-12 Thread prince gosavi
I have generated a cosine distance matrix and would like to apply a clustering algorithm to it. np.shape(distance_matrix)==(14000,14000) I would like to know which clustering algorithm suits it better and whether the data needs further processing to get it into a form that a model can be app

Re: [scikit-learn] Multi-Output Decision Trees for mixed classification-regression problems

2018-02-12 Thread Дмитрий Игнатов
Just a comment: it would be a useful tool. -Dmitry Sent from iPhone > On 12 Feb 2018, at 14:40, Evgeniya Korneva > wrote: > > > Dear all, > > For my research, I'm working with multi-output decision trees. In the current > sklearn implementation, a tree can predict either several n

[scikit-learn] Multi-Output Decision Trees for mixed classification-regression problems

2018-02-12 Thread Evgeniya Korneva
Dear all, For my research, I'm working with multi-output decision trees. In the current sklearn implementation, a tree can predict either several numerical or several categorical targets simultaneously, but not a mixture of those. However, predicting various targets jointly is often beneficia
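For context, the two cases that the message describes as already supported can be sketched as follows: one tree fitted on several numerical targets and another on several categorical targets. The toy data and target definitions are invented for illustration. The mixed case that the thread asks for is not shown, since no API for it exists in the discussion.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 3)

# Two numerical targets per sample (toy definitions, assumed).
y_num = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] ** 2])
# Two categorical targets per sample (toy definitions, assumed).
y_cat = np.column_stack([(X[:, 0] > 0.5).astype(int),
                         (X[:, 1] > 0.5).astype(int)])

# One tree per target type; each predicts all of its targets jointly.
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_num)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_cat)

pred_num = reg.predict(X[:5])
pred_cat = clf.predict(X[:5])
print(pred_num.shape, pred_cat.shape)
```

Each prediction array has one column per target, so a mixed-target tree would presumably need to return columns of different kinds side by side.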