All,

I am a newbie Mahout user and am trying to follow the "Quick tour of text analysis using the Mahout command line". Thank you to whoever contributed to that page.
> https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line

I went all the way from the beginning to the end of the page with seemingly no hiccups. At the very end of the tour, I became confused because the command

> mahout seqdumper -i reuters-matrix/matrix | more

allowed me to see this output (snippet):

> Key: 1: Value:
> /reut2-000.sgm-1.txt:{312:0.1250488193181003,2962:0.07532412503846121,4403:0.2
> 2792379999043863,5405:0.0964390139170019,5997:0.030023608542497426,10108:0.126
> 28552842745744,13043:0.14709923014699935,13653:0.07372109235301716,13750:0.188
> 8955967611108,15886:0.1543819831189062,15901:0.10756083643096839,15969:0.36601
> 581899071867,16138:0.12548750176412274,16553:0.11490460601515046,17734:0.10869
> 648237816114,17978:0.11932381316475806,18019:0.1051527785317777,22224:0.123091
> 46422711122,22456:0.1371221887995933,22837:0.19295627853659875,25480:0.0616936
> 10076373216,25958:0.09251293588851367,26105:0.10304941346400417,26507:0.123271
> 84002913602,28332:0.1794774670703689,28335:0.10843140748339948,28480:0.0801873
> 7549811794,29541:0.11169278315306423,30534:0.18480378614987836,30921:0.1987470
> 224449987,31071:0.17024007142554856,31386:0.22792379999043863,31433:0.14788025
> 30196623,31815:0.06001469365693789,32099:0.1284458798636675,32334:0.1097379357
> 6935256,32385:0.12143572490835457,34782:0.030407287755940444,35425:0.035819767
> 691229826,37264:0.20518922008525398,37355:0.2879544482952078,37818:0.108198203
> 50102567,39273:0.10347873039101099,39831:0.08810699655751153,39979:0.095282500
> 26282217,40427:0.18975048184863322,41154:0.06582064373931332,}

Reading through that snippet made me think that there is a document with row id 41154 and a cosine value of ~0.0658 (the last element in the snippet). The problem is that the folder

> /Users/scottccote/Documents/toy-workspace/MiA/reuters-extracted

has only 21578 files in it.
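To make the mismatch concrete, here is a minimal Python sketch of the check I otherwise did by eye. It assumes each seqdumper value has the form name:{index:weight,...}; the helper names parse_vector and indices_out_of_range are my own and not part of Mahout, and the sample is shortened to three entries taken from the snippet above.

```python
import re

def parse_vector(value):
    """Split 'name:{index:weight,...}' into (name, {index: weight})."""
    # Everything before the first ':{' is treated as the row name (an assumption).
    name, _, body = value.partition(":{")
    pairs = re.findall(r"(\d+):([0-9.eE+-]+)", body)
    return name, {int(i): float(w) for i, w in pairs}

def indices_out_of_range(value, num_docs):
    """Return the vector indices that cannot be document ids in 0..num_docs-1."""
    _, vec = parse_vector(value)
    return sorted(i for i in vec if i >= num_docs)

# Three entries copied from the seqdumper snippet, shortened for illustration.
sample = ("/reut2-000.sgm-1.txt:{312:0.1250488193181003,"
          "15886:0.1543819831189062,41154:0.06582064373931332,}")

NUM_DOCS = 21578  # number of files in reuters-extracted
out_of_range = indices_out_of_range(sample, NUM_DOCS)
print(out_of_range)
```

On this shortened sample the check flags only index 41154, which is the entry I cannot account for.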
Indeed, my dictionary file (dumped with the command below)

> mahout seqdumper -i reuters-matrix/docIndex | tail

has a max key of

> Key: 21576: Value: /reut2-021.sgm-98.txt
> Key: 21577: Value: /reut2-021.sgm-99.txt
> Count: 21578

So I cannot find the document with key 41154. What does the 41154 relate to? Obviously I have misunderstood something that I did, or needed to do, in the tour. Can someone please shine a light on where I strayed?

I have scripted every step that I took and can share the scripts here if desired (I noticed that some of the output file names have changed since the page was written, so I made adjustments).

Regards,
Scott

PS Thanks TD for helping me earlier