Nutch appears to be indexing only the first 200 files in a given directory (listed in seeds.txt). My evidence for this is:
1.) I am not getting the search hits I'd expect.
2.) In the _0.fdt file, only the first 200 HTML files are referenced, in alphabetical order.

By "the first 200 files" I mean the first 200 files in one of the directories in seeds.txt. The actual number of files referenced in _0.fdt is 202. seeds.txt contains 2 directories. One of them contains two files that I'd expect to be indexed, and the paths to both are listed in _0.fdt. The other contains 2900 files that I'd expect to be indexed, but only the first 200 of them (in alphabetical order) are in _0.fdt.

These files are indexed directly from the filesystem. In general, our indexing and searching work properly; we only have a problem when a directory to be indexed contains more than 200 files. I do not see any Nutch configuration setting that would impose such a limit.

Does anyone know why this may be happening?

Thanks in advance!

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexed-Files-Limited-to-200-tp2825662p2825662.html
Sent from the Nutch - User mailing list archive at Nabble.com.
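For anyone who wants to reproduce the check in point 2, here is a rough Python sketch. It is not part of Nutch or Lucene. It assumes the stored-fields file keeps document paths as plain, uncompressed bytes (which matches what I see in _0.fdt), and that file names contain no characters that would be URL-escaped. The function name missing_from_index and the command-line arguments are made up for illustration.

```python
import sys
from pathlib import Path


def missing_from_index(doc_dir: Path, stored_fields: bytes, suffix: str = ".html") -> list[str]:
    """Return, in alphabetical order, the files in doc_dir whose name
    never appears (preceded by '/') in the raw stored-fields bytes.

    This is a plain byte search, not a parse of the Lucene file format,
    so treat the result as a rough indication only.
    """
    names = sorted(
        p.name for p in doc_dir.iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
    # Prefix with '/' so that "a.html" does not match inside "aa.html".
    return [n for n in names if b"/" + n.encode("utf-8") not in stored_fields]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_index.py <document dir> <path to _0.fdt>
    doc_dir = Path(sys.argv[1])
    fdt_bytes = Path(sys.argv[2]).read_bytes()
    missing = missing_from_index(doc_dir, fdt_bytes)
    print(f"{len(missing)} file(s) not referenced in the index")
    if missing:
        print("first missing file:", missing[0])
```

If the limit really is 200 files per directory, the first missing file reported should be the 201st name in alphabetical order.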

