Re: [scikit-learn] Need help in dealing with large dataset

2018-03-05 Thread Sebastian Raschka
Like Guillaume suggested, you don't want to load the whole array into memory if it's that large. There are many different ways to deal with this. The most naive would be to break up your NumPy array into smaller NumPy arrays and load them iteratively, with a running accuracy calculation.
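A minimal sketch of that naive approach: evaluate a fitted classifier chunk by chunk and accumulate a running accuracy, so only one chunk is ever in memory at a time. The classifier, data shapes, and chunk size here are assumptions for illustration, not from the thread; for data saved as `.npy`, `np.load(..., mmap_mode="r")` gives the same chunked access without loading the file up front.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Stand-in data (in practice this could be a memory-mapped array on disk).
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (X[:, 0] > 0).astype(int)

# Fit on the first 8000 rows; evaluate on the rest in chunks.
clf = SGDClassifier(random_state=0).fit(X[:8000], y[:8000])

correct = total = 0
for start in range(8000, len(X), 1000):
    X_chunk = X[start:start + 1000]   # only this slice is touched
    y_chunk = y[start:start + 1000]
    correct += int((clf.predict(X_chunk) == y_chunk).sum())
    total += len(y_chunk)

accuracy = correct / total  # running accuracy over all chunks
```

The same pattern works for incremental training with estimators that expose `partial_fit`.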

Re: [scikit-learn] Need help in dealing with large dataset

2018-03-05 Thread Guillaume Lemaître
If you work with deep nets you should check the utilities of the deep-learning library itself. For instance, in Keras you should create a batch generator to deal with a large dataset. In PyTorch you can use the DataLoader together with ImageFolder from torchvision, which manages the loading for you.
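A sketch of the Keras-style batch generator idea in plain NumPy (the function name, batch size, and data are assumptions for illustration): it yields small batches indefinitely, so the training loop never needs the shuffled full array in memory at once.

```python
import numpy as np

def batch_generator(X, y, batch_size=32, shuffle=True, seed=0):
    """Yield (X_batch, y_batch) pairs forever, reshuffling each epoch,
    as Keras-style generator training loops expect."""
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield X[batch], y[batch]

# Example usage: pull one batch.
X = np.arange(100 * 4, dtype=np.float32).reshape(100, 4)
y = np.arange(100)
gen = batch_generator(X, y, batch_size=32)
xb, yb = next(gen)
```

In PyTorch the equivalent would be `torch.utils.data.DataLoader` wrapping a `torchvision.datasets.ImageFolder`, which additionally reads images from disk lazily.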

[scikit-learn] Need help in dealing with large dataset

2018-03-05 Thread CHETHAN MURALI
Dear All, I am working on building a CNN model for an image classification problem. As part of it I have converted all my test images to a NumPy array. Now when I try to split the array into training and test sets I get a memory error. Details are as below: X = np.load("./data/X_train.npy",
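One way to avoid the memory error at the split step, consistent with the replies above (file path, shapes, and split ratio here are assumptions): memory-map the saved `.npy` file and split *indices* instead of copying the data, materializing only small batches on demand.

```python
import os
import tempfile
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in file; in the original question this would be ./data/X_train.npy.
path = os.path.join(tempfile.gettempdir(), "X_train_demo.npy")
np.save(path, np.zeros((1000, 8), dtype=np.float32))

X = np.load(path, mmap_mode="r")  # memory-mapped, not read into RAM

# Split row indices, not the array itself, so nothing large is copied.
idx_train, idx_test = train_test_split(
    np.arange(len(X)), test_size=0.2, random_state=0
)

first_batch = np.asarray(X[idx_train[:32]])  # copies just these 32 rows
```

Passing the memmapped `X` directly to `train_test_split` would copy both halves into RAM, which is exactly what triggers the MemoryError.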