> Are you concerned about storing the whole corpus text in memory, or the
> whole corpus' statistics? If the text, use input='file' or input='filename'
> (or a generator of texts).
I am not really sure which stage takes the most memory, as my program
gets killed when it hits the memory limit. But I sus
Are you concerned about storing the whole corpus text in memory, or the
whole corpus' statistics? If the text, use input='file' or input='filename'
(or a generator of texts).
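
A minimal sketch of both suggestions, assuming a toy corpus written to
temporary files purely for illustration (the file names, the docs list and
the iter_texts helper are hypothetical, not from the original post):

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer


def iter_texts(paths):
    """Yield one document at a time so only a single text is read at once."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield fh.read()


# Hypothetical toy corpus, written to disk only to make the sketch runnable.
tmpdir = tempfile.mkdtemp()
docs = ["the cat sat on the mat", "the dog chased the cat", "dogs and cats"]
paths = []
for i, text in enumerate(docs):
    path = os.path.join(tmpdir, f"doc{i}.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)
    paths.append(path)

# Option 1: pass file names; the vectorizer opens and reads each file itself.
vec_files = TfidfVectorizer(input="filename")
X_files = vec_files.fit_transform(paths)

# Option 2: pass a generator of texts (the default input="content").
vec_gen = TfidfVectorizer()
X_gen = vec_gen.fit_transform(iter_texts(paths))
```

Either way the raw texts are streamed rather than held in a list, but the
vocabulary and the resulting sparse matrix still live in memory, so this
only helps if the text itself is what exhausts the memory.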
On Tue, 28 Jan 2020 at 18:01, Peng Yu wrote:
> Hi,
>
> To use TfidfVectorizer, the whole corpus must be loaded into memory.
Hi,
To use TfidfVectorizer, the whole corpus must be loaded into memory.
This can be a problem for machines without a lot of memory. Is there a
way to use only a small amount of memory by saving most intermediate
results to disk? Thanks.
--
Regards,
Peng