Hello Sebastian,

Thanks again.

Yes, you are absolutely right: the indexer runs only once. I didn't explain
my idea well. What I meant is that the indexer writes the document
information to the file (nutch.csv) twice, so in the end I found only the
last 11 documents in the file:

org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,448
INFO o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Finished CSV index in
csvindexwriter/nutch.csv
...
org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,563
WARN o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Removing existing output
path csvindexwriter/nutch.csv
...
org.apache.nutch.indexwriter.csv.CSVIndexWriter 2022-11-22 06:32:56,650
INFO o.a.n.i.c.CSVIndexWriter [pool-5-thread-1] Finished CSV index in
csvindexwriter/nutch.csv
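To spot this pattern in a full log, a small Python sketch can scan for a "Removing existing output path" warning that follows a "Finished CSV index" message for the same path. It assumes each log entry sits on a single line (as in a local-mode logs/hadoop.log; in the excerpt above the lines are only wrapped by the mail client), and the function name find_overwrites is hypothetical, not part of Nutch.

```python
import re

# Message texts copied from the CSVIndexWriter log lines quoted above;
# the surrounding log layout (timestamp, level, thread) is assumed.
FINISHED = re.compile(r"Finished CSV index in (\S+)")
REMOVING = re.compile(r"Removing existing output path (\S+)")


def find_overwrites(log_lines):
    """Return output paths that were removed after an earlier completed
    write to the same path, i.e. files whose first contents were lost."""
    finished = set()
    overwritten = []
    for line in log_lines:
        m = FINISHED.search(line)
        if m:
            finished.add(m.group(1))
            continue
        m = REMOVING.search(line)
        if m and m.group(1) in finished:
            overwritten.append(m.group(1))
    return overwritten
```

For example, `find_overwrites(open("logs/hadoop.log", encoding="utf-8"))` would return `["csvindexwriter/nutch.csv"]` for a log containing the three entries above.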

I don't know how to get the indexer to write all documents without the
output file being reset. It writes the first 14 documents, stops, starts
over, and then writes the last 11. I think I'm missing some configuration,
but I haven't found it yet (I read
https://cwiki.apache.org/confluence/display/NUTCH/IndexWriters#IndexWriters-CSVindexerproperties
).
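The symptom can be modelled with a minimal Python sketch. This is not Nutch's CSVIndexWriter code: OverwritingCsvWriter and surviving_rows are hypothetical names, and the sketch only assumes what the "Removing existing output path" warning suggests, namely that each time the writer is opened it deletes the existing file before writing. With two open/write/close cycles of 14 and 11 documents, only the second batch survives.

```python
import os
import tempfile


class OverwritingCsvWriter:
    """Hypothetical writer that deletes its output file on every open()."""

    def __init__(self, path):
        self.path = path
        self.file = None

    def open(self):
        if os.path.exists(self.path):
            os.remove(self.path)  # rows from an earlier cycle are lost here
        self.file = open(self.path, "w", encoding="utf-8")

    def write(self, url):
        self.file.write(url + "\n")

    def close(self):
        self.file.close()


def surviving_rows(batch_sizes):
    """Run one open/write/close cycle per batch and count the rows left."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "nutch.csv")
        writer = OverwritingCsvWriter(path)
        for batch, size in enumerate(batch_sizes):
            writer.open()
            for i in range(size):
                writer.write(f"https://example.org/{batch}/{i}")
            writer.close()
        with open(path, encoding="utf-8") as f:
            return sum(1 for _ in f)
```

Here `surviving_rows([14, 11])` returns 11, while a single cycle such as `surviving_rows([25])` keeps all 25 rows, which is why the fix has to make the writer open the file only once (or append) rather than change what is indexed.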

Best,


On Wed, 23 Nov 2022 at 9:00, Sebastian Nagel (<wastl.na...@googlemail.com>)
wrote:

> Hi Paul,
>
> as far as I can see, the indexer is run only once and now indexes 26 documents:
>
> org.apache.nutch.indexer.IndexingJob 2022-11-22 06:32:57,164 INFO
> o.a.n.i.IndexingJob [main] Indexer:     26  indexed (add/update)
>
> The logs also indicate that both segments are indexed at once:
>
> org.apache.nutch.indexer.IndexerMapReduce 2022-11-22 06:32:51,811 INFO
> o.a.n.i.IndexerMapReduce [main] IndexerMapReduces: adding segment:
> file:/home/paulesco/Downloads/apache-nutch-1.19/crawl/segments/20221122062645
> org.apache.nutch.indexer.IndexerMapReduce 2022-11-22 06:32:51,814 INFO
> o.a.n.i.IndexerMapReduce [main] IndexerMapReduces: adding segment:
> file:/home/paulesco/Downloads/apache-nutch-1.19/crawl/segments/20221122062728
>
>
> Best,
> Sebastian
>
>

-- 
Paul Escobar Mossos
skype: paulescom
phone: +57 1 3006815404
