Done, thanks Markus.

On Fri, 25 Nov 2022 at 8:04, Markus Jelsma (<markus.jel...@openindex.io>)
wrote:

> Hi Paul, the account has been created. You should receive an email from
> Jira in your inbox or spam box.
>
> Thanks,
> Markus
>
> On Fri, 25 Nov 2022 at 14:01, Paul Escobar <
> paul.escobar.mos...@gmail.com> wrote:
>
> > Hello Markus,
> >
> > I'm very comfortable with your proposal. Open source projects should take
> > advantage of every contribution, however small and however it arrives.
> >
> > Best,
> >
> > On Fri, 25 Nov 2022 at 7:21, Markus Jelsma (<markus.jel...@openindex.io>)
> > wrote:
> >
> > > Hello Paul,
> > >
> > > > I tried to comment on this jira issue, but I don't have access,
> > > > unfortunately I don't know how to do it.
> > >
> > > Due to too much spam, it is no longer possible to create an account for
> > > yourself, but we can do that for you if you wish.
> > >
> > > Regards,
> > > Markus
> > >
> > > On Thu, 24 Nov 2022 at 22:46, Paul Escobar <
> > > paul.escobar.mos...@gmail.com> wrote:
> > >
> > > > Hello Sebastian,
> > > >
> > > > > I got it: the CSV indexer needs a single task to run properly. I tested
> > > > > it and it worked. Thank you for the advice.
> > > >
> > > > I tried to comment on this jira issue, but I don't have access,
> > > > unfortunately I don't know how to do it.
> > > >
> > > > > I think if a committer changed CSVIndexerWriter.java:
> > > >
> > > > if (fs.exists(csvLocalOutFile)) {
> > > >    // clean-up
> > > >    LOG.warn("Removing existing output path {}", csvLocalOutFile);
> > > >    fs.delete(csvLocalOutFile, true);
> > > > }
> > > >
> > > > > to append data instead of deleting and re-creating the file, the
> > > > > issue would be fixed, at least in local mode.
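
A minimal sketch of the append idea, for illustration only. It uses plain java.nio.file rather than the Hadoop FileSystem API that CSVIndexerWriter works with, and the class name, method name and header handling are all hypothetical, not taken from the Nutch source.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical illustration; not the actual CSVIndexerWriter code.
public class AppendingCsvSketch {

    /** Appends rows to csvFile; the header is written only when the file is created. */
    static void appendRows(Path csvFile, String header, List<String> rows) throws IOException {
        boolean isNew = Files.notExists(csvFile);
        StringBuilder sb = new StringBuilder();
        if (isNew) {
            sb.append(header).append('\n');
        }
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        // CREATE + APPEND keeps existing content instead of deleting the file first.
        Files.write(csvFile, sb.toString().getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("nutch-sketch", ".csv");
        Files.delete(out); // start from a missing file, like a fresh run
        appendRows(out, "id,title", List.of("1,first"));  // first task
        appendRows(out, "id,title", List.of("2,second")); // second task
        System.out.println(Files.readAllLines(out)); // prints [id,title, 1,first, 2,second]
    }
}
```

With appending, a file left over from an earlier run would probably have to be removed once before the job starts, and tasks writing at the same time are not coordinated here.
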
> > > >
> > > > Thanks again,
> > > >
> > > >
> > > > > On Thu, 24 Nov 2022 at 7:38, Sebastian Nagel (<
> > > > > wastl.na...@googlemail.com>)
> > > > > wrote:
> > > >
> > > > > Hi Paul,
> > > > >
> > > > >  > the indexer was writing the
> > > > >  > documents info in the file (nutch.csv) twice,
> > > > >
> > > > > > Yes, I see. And now I know what I overlooked:
> > > > >
> > > > >   .../bin/nutch index -Dmapreduce.job.reduces=2
> > > > >
> > > > > You need to run the CSV indexer with only a single reducer.
> > > > > In order to do so, please pass the option
> > > > >    --num-tasks 1
> > > > > to the script bin/crawl.
> > > > >
> > > > > Alternatively, you could change
> > > > >    NUM_TASKS=2
> > > > > in bin/crawl to
> > > > >    NUM_TASKS=1
> > > > >
> > > > > > This is related to why, for now, you can't run the CSV indexer
> > > > > > in (pseudo)distributed mode, see my previous note:
> > > > >
> > > > > >  > A final note: the CSV indexer only works in local mode, it does not
> > > > > >  > yet work in distributed mode (on a real Hadoop cluster). It was
> > > > > >  > initially intended for debugging, not for larger production setups.
> > > > >
> > > > > The issue is described here:
> > > > >    https://issues.apache.org/jira/browse/NUTCH-2793
> > > > >
> > > > > > It's a tough one because a solution requires a change to the IndexWriter
> > > > > > interface. Index writers are plugins and do not know which reducer task
> > > > > > they are run from or which path on a distributed or parallelized system
> > > > > > they have to write to. On Hadoop, writing the output is done in two steps:
> > > > > > write to a local file and then "commit" the output to the final location
> > > > > > on the distributed file system.
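
The two-step pattern described above (write locally, then commit) can be sketched roughly as follows. This is only an illustration with java.nio.file, not Hadoop's committer API or the Nutch IndexWriter interface. All names are hypothetical, and the task id is passed in explicitly because that is the piece of information an index writer currently does not have.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration of "write to a private file, then commit".
public class WriteThenCommitSketch {

    /** Step 1: a task writes its rows to a working file named after its task id. */
    static Path writeTaskOutput(Path workDir, int taskId, List<String> rows) throws IOException {
        Path tmp = workDir.resolve("part-" + taskId + ".csv.tmp");
        Files.write(tmp, rows);
        return tmp;
    }

    /** Step 2: "commit" moves the finished file to its final location. */
    static Path commit(Path tmp, Path finalDir, int taskId) throws IOException {
        Files.createDirectories(finalDir);
        Path target = finalDir.resolve("part-" + taskId + ".csv");
        return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("work");
        Path out = Files.createTempDirectory("out");
        // Two tasks, each with its own file, so neither overwrites the other.
        for (int task = 0; task < 2; task++) {
            Path tmp = writeTaskOutput(work, task, List.of("doc-from-task-" + task));
            commit(tmp, out, task);
        }
        try (Stream<Path> files = Files.list(out)) {
            files.sorted().forEach(System.out::println);
        }
    }
}
```
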
> > > > >
> > > > > > But yes, this issue, which has been stalled for quite some time,
> > > > > > deserves another look, also because it's now clear that you might
> > > > > > run into issues even in local mode.
> > > > >
> > > > > > Thanks for reporting the issue! If you can, please also comment on the
> > > > > > Jira issue!
> > > > >
> > > > > Best,
> > > > > Sebastian
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Paul Escobar Mossos
> > > > skype: paulescom
> > > > telefono: +57 1 3006815404
> > > >
> > >
> >
> >
> > --
> > Paul Escobar Mossos
> > skype: paulescom
> > telefono: +57 1 3006815404
> >
>


-- 
Paul Escobar Mossos
skype: paulescom
telefono: +57 1 3006815404
