Hi Kamil,
> 1. json-indexer: indexes documents in json lines format
Sounds good. There's already an indexer-csv (works only in local mode).
> 2. selenium extracts the html tag vs the body tag
Definitely makes sense.
> I am hesitant about this change because it could have bigger effects.
In d
Hello,
I have a few improvements to Nutch and would like feedback on whether this
community thinks I should submit them to the main branch. Once my first PR
is approved, I can start adding these. Some of them might not be good
ideas, so I am happy to hear that feedback as well.
1. json-ind