: FWIW: I used the script below to build myself 3.8 million documents, with
: 300 "text fields" consisting of anywhere from 1-10 "words" (integers
: between 1 and 200)

Whoops ... forgot to post the script...

#!/usr/bin/perl

use strict;
use warnings;

my $num_docs = 3_800_000;
my $max_words_in_field = 10;
my $words_in_vocab = 200;
my $num_fields = 300;

# header
print "id";
map { print ",${_}_t" } 1..$num_fields;
print "\n";

while ($num_docs--) {
    print "$num_docs"; # uniqueKey
    for (1..$num_fields) {
        my $words_in_field = int(rand($max_words_in_field));
        print ",\"";
        map { print int(rand($words_in_vocab)) . " " } 0..$words_in_field;
        print "\"";
    }
    print "\n";
}