Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-08 Thread Andrew Howe
This doesn't seem like a sklearn issue, but rather an OS / hardware issue. Again, a full stack trace would be useful information. Either way, you can try training on a sample or via cross-validation. I believe some estimators can also use incremental training. Andrew
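As a rough illustration of the incremental-training suggestion, here is a minimal sketch using `partial_fit`. The thread never names the estimator involved, so `SGDClassifier`, the synthetic data, the batch size, and the helper `iter_batches` are all assumptions chosen for illustration, not the original poster's setup.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def iter_batches(n_samples, n_features, batch_size):
    """Hypothetical helper: yields synthetic (X, y) chunks.

    In practice this would read the real file piece by piece,
    so that only one chunk is held in memory at a time.
    """
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        X = rng.normal(size=(size, n_features))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        yield X, y

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # partial_fit must see all class labels on the first call

n_batches = 0
for X_batch, y_batch in iter_batches(n_samples=20_000, n_features=50, batch_size=2_000):
    clf.partial_fit(X_batch, y_batch, classes=classes)
    n_batches += 1

# Evaluate on a fresh synthetic chunk.
X_test, y_test = next(iter_batches(2_000, 50, 2_000))
accuracy = clf.score(X_test, y_test)
```

Only estimators that implement `partial_fit` can be trained this way, so whether this applies depends on which estimator or transformer was actually in use.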

Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-07 Thread Liu James
Thanks for the reply. I tested different data sizes on different distros and found that when the data exceeds 500 thousand rows (with 50 columns), the crash happens with the same error message: kernel page error. On Wed, 6 Jan 2021 at 22:33, Guillaume Lemaître wrote: > And it seems that the piece of traceback re

Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Guillaume Lemaître
And it seems that the piece of traceback refers to NumPy. On Wed, 6 Jan 2021 at 12:48, Andrew Howe wrote: > A core dump generally happens when a process tries to access memory > outside its allocated address space. You've not specified what estimator > you were using, but I'd guess it attempted

Re: [scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Andrew Howe
A core dump generally happens when a process tries to access memory outside its allocated address space. You've not specified what estimator you were using, but I'd guess it attempted to do something with the dataset that resulted in it being duplicated or otherwise expanded beyond the memory capa
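To make the duplication concern concrete, here is a back-of-the-envelope estimate of the dense in-memory size. It assumes 2 million rows and 50 columns (the figures reported in this thread) stored as 64-bit floats; the number of copies and the expanded column count are hypothetical values picked only to show how quickly the footprint grows.

```python
def dense_size_gib(n_rows, n_cols, bytes_per_value=8):
    """Size in GiB of a dense array, assuming 8-byte (float64) values by default."""
    return n_rows * n_cols * bytes_per_value / 2**30

# One copy of the dataset as reported in the thread.
single_copy = dense_size_gib(2_000_000, 50)

# Hypothetical: a transform that holds the input, a working copy, and the output.
three_copies = 3 * single_copy

# Hypothetical: an encoding step that expands 50 columns to 500 dense columns.
expanded = dense_size_gib(2_000_000, 500)

print(f"single copy:  {single_copy:.2f} GiB")
print(f"three copies: {three_copies:.2f} GiB")
print(f"expanded:     {expanded:.2f} GiB")
```

A single copy is well under 1 GiB, so any memory exhaustion would have to come from copies or expansion made during the transform, which is consistent with the guess above but not confirmed by anything in the thread.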

[scikit-learn] 2 million samples dataset caused python and OS crash

2021-01-06 Thread Liu James
Hi all, I'm using a medium-sized dataset, KDD99 IDS (https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset), for model training, and the dataset has 2 million samples. When using fit_transform(), the OS crashed with the log "Process 13851(python) of user xxx dumped core. Sta