[scikit-learn] urgent help in scikit-learn

2017-03-30 Thread Shuchi Mala
Hi everyone, I have data with the following attributes: (Latitude, Longitude). I am clustering this data using DBSCAN and have the following questions: 1. How can I add my data to the data set of the package? 2. How can I calculate the Rand index for my data? 3. How to use make_blobs command fo

[scikit-learn] GSoC 2017 Proposal: Improve online learning for linear models

2017-03-30 Thread Yizheng Zhao
Hi developers, I am excited to have the opportunity to work with you! I am Yizheng Zhao, a graduate student at Carnegie Mellon University majoring in Software Engineering, and I received my Bachelor’s degree in Math from Jilin University in 2016. I love Python and machine learning, and that is why I wann

Re: [scikit-learn] urgent help in scikit-learn

2017-03-30 Thread Sebastian Raschka
Hi, Shuchi, > 1. How can I add data to the data set of the package? You don’t need to add your dataset to the dataset module to run your analysis. A convenient way to load it into a numpy array would be via pandas. E.g., import pandas as pd df = pd.read_csv('your_data.txt', delimiter=r"\s+") X
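
The loading step suggested above can be sketched as follows. This is only an illustration: the in-memory text stands in for 'your_data.txt', and the column names and the absence of a header row are assumptions, not details from the original message.

```python
import io

import pandas as pd

# Stand-in for 'your_data.txt': whitespace-separated latitude/longitude rows.
# The values and the lack of a header row are assumptions for illustration.
raw = io.StringIO("40.7128 -74.0060\n51.5074 -0.1278\n")

# delimiter=r"\s+" splits on any run of whitespace, as in the message above.
df = pd.read_csv(raw, delimiter=r"\s+", header=None,
                 names=["latitude", "longitude"])

# Convert the DataFrame to a plain NumPy array of shape (n_samples, 2).
X = df.to_numpy()
print(X.shape)
```

The resulting array can be passed directly to an estimator's fit method; nothing needs to be added to the package's own datasets module.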

Re: [scikit-learn] urgent help in scikit-learn

2017-03-30 Thread Shane Grigsby
Since you're using lat / long coords, you'll also want to convert them to radians and specify 'haversine' as your distance metric; i.e. : coords = np.vstack([lats.ravel(),longs.ravel()]).T coords *= np.pi / 180. # to radians ...and: db = DBSCAN(eps=0.3, min_samples=10, metric='haversi
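
A minimal, self-contained sketch of the haversine advice above, under stated assumptions: the coordinates are made up, and eps and min_samples are chosen only to suit this toy data rather than taken from the message. With metric='haversine', scikit-learn expects (latitude, longitude) in radians and measures distance as an angle, so a radius in kilometres is divided by the Earth's radius.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

# Hypothetical (latitude, longitude) pairs in degrees:
# two tight groups of three points and one isolated point.
coords_deg = np.array([
    [40.7128, -74.0060],
    [40.7130, -74.0055],
    [40.7125, -74.0065],
    [51.5074, -0.1278],
    [51.5076, -0.1275],
    [51.5072, -0.1280],
    [-33.8688, 151.2093],
])

# The haversine metric works on radians, in (lat, lon) order.
coords_rad = np.radians(coords_deg)

# eps is an angle in radians: neighbourhood radius in km / Earth radius.
eps_km = 1.0  # assumed radius for this toy data
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=3,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(coords_rad)

# Points labelled -1 are treated as noise by DBSCAN.
print(labels)
```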

Re: [scikit-learn] urgent help in scikit-learn

2017-03-30 Thread Shuchi Mala
Thank you so much for your quick reply. I have one more question. The statement below is used to calculate the adjusted Rand score: metrics.adjusted_rand_score(labels_true, labels_pred). In my case, what will labels_true and labels_pred be, and how do I calculate labels_pred? With Best Regards, Shuchi Mala Resea
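
As a hedged illustration of the question above: labels_pred is the cluster assignment produced by the clustering algorithm (for DBSCAN, the output of fit_predict), while labels_true is a reference labelling that has to come from somewhere else. The sketch below uses make_blobs only because it returns such reference labels; the centers, cluster_std, eps, and min_samples values are assumptions picked for this synthetic data.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data with known group membership: make_blobs returns both the
# points and the index of the blob each point was drawn from.
X, labels_true = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# labels_pred is whatever the clustering algorithm assigns (-1 means noise).
labels_pred = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)

# The score compares the two labellings; it does not depend on which
# integer is used for which cluster.
score = adjusted_rand_score(labels_true, labels_pred)
print(score)

# Swapping the label names leaves the score at its maximum of 1.0.
same_partition = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print(same_partition)
```

For real latitude/longitude data with no reference labelling, there is no labels_true to pass in, so the adjusted Rand score cannot be computed for it.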