Hello Sebastian,

Thanks again.

Yes, you are absolutely right: the indexer runs only once. I didn't express my idea well. What I was trying to say is that the indexer writes the document info to the file (nutch.csv) twice, so in the end I found only the last 11 documents in the file:

org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,448 INFO o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Finished CSV index in csvindexwriter/nutch.csv
...
org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,563 WARN o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Removing existing output path csvindexwriter/nutch.csv
...
org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,650 INFO o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Finished CSV index in csvindexwriter/nutch.csv

I don't know how to make the indexer write all documents without being reloaded. It writes the first 14 documents, stops, reloads, and starts again with the last 11. I think I'm missing some configuration, but I haven't found it yet (I read https://cwiki.apache.org/confluence/display/NUTCH/IndexWriters#IndexWriters-CSVindexerproperties ).

Best,

On Wed, 23 Nov 2022 at 9:00, Sebastian Nagel (<wastl.na...@googlemail.com>) wrote:

> Hi Paul,
>
> as far as I can see the indexer is run only once and now indexes 26 documents:
>
> org.apache.nutch.indexer.IndexingJob 2022-11-22 06:32:57,164 INFO
> o.a.n.i.IndexingJob [main] Indexer: 26 indexed (add/update)
>
> The logs also indicate that both segments are indexed at once:
>
> org.apache.nutch.indexer.IndexerMapReduce 2022-11-22 06:32:51,811 INFO
> o.a.n.i.IndexerMapReduce [main] IndexerMapReduces: adding segment:
> file:/home/paulesco/Downloads/apache-nutch-1.19/crawl/segments/20221122062645
> org.apache.nutch.indexer.IndexerMapReduce 2022-11-22 06:32:51,814 INFO
> o.a.n.i.IndexerMapReduce [main] IndexerMapReduces: adding segment:
> file:/home/paulesco/Downloads/apache-nutch-1.19/crawl/segments/20221122062728
>
> Best,
> Sebastian

-- 
Paul Escobar Mossos
skype: paulescom
phone: +57 1 3006815404
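
P.S. In case it helps to reproduce the observation, here is a minimal sketch of how the overwrite can be spotted in the log. It only assumes the two message texts from the CSVIndexWriter excerpt in this mail ("Finished CSV index in" and "Removing existing output path"); the function name `count_overwrites` is hypothetical and not part of Nutch. It diagnoses the symptom only and does not fix the configuration.

```python
# Message fragments copied from the CSVIndexWriter log excerpt in this mail.
FINISHED = "Finished CSV index in"
REMOVING = "Removing existing output path"


def count_overwrites(log_lines):
    """Count how often an output path is removed after it was already finished.

    Hypothetical helper: each such event means earlier rows written to
    that file were discarded.
    """
    finished_paths = set()
    overwrites = 0
    for line in log_lines:
        if FINISHED in line:
            # Remember the path that has been completely written once.
            finished_paths.add(line.split(FINISHED, 1)[1].strip())
        elif REMOVING in line:
            path = line.split(REMOVING, 1)[1].strip()
            if path in finished_paths:
                overwrites += 1
    return overwrites


# The three log lines quoted in this mail.
sample = [
    "org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,448 INFO "
    "o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Finished CSV index in csvindexwriter/nutch.csv",
    "org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,563 WARN "
    "o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Removing existing output path csvindexwriter/nutch.csv",
    "org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,650 INFO "
    "o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Finished CSV index in csvindexwriter/nutch.csv",
]

print(count_overwrites(sample))
```

For the excerpt above the count is 1, which matches what I see: the file is finished once, removed, and then written again with only the last batch of documents.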