Re: Inquiries on potential improvements

2023-01-21 Sebastian Nagel
Hi Kamil,

> 1. json-indexer: indexes documents in json lines format

Sounds good. There's already an indexer-csv (works only in local mode).

> 2. selenium extracts the html tag vs the body tag

Definitely makes sense.

> I am hesitant about this change because it could have bigger effects.

In d
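The json-indexer idea above refers to the JSON Lines format. In that format each document is written as one complete JSON object on its own line, with no enclosing array and no commas between records. The following is a minimal Java sketch of that output, offered only as an illustration. It assumes each document is available as a flat map of string fields. The class `JsonLinesSketch`, its methods, and the field names `url` and `title` are hypothetical. This is not Nutch's indexer plugin API and not the proposed implementation.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of JSON Lines output: one JSON object per document,
 * one document per line. Not Nutch's indexer API; names are illustrative.
 */
public class JsonLinesSketch {

    /** Quotes and escapes a string as a JSON string literal (per RFC 8259). */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes one document's fields as a single-line JSON object. */
    static String toJsonLine(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(escape(e.getKey())).append(':').append(escape(e.getValue()));
        }
        return sb.append('}').toString();
    }

    /** Writes each document on its own line, with no array brackets or separators. */
    static void writeAll(List<Map<String, String>> docs, Writer out) throws IOException {
        for (Map<String, String> doc : docs) {
            out.write(toJsonLine(doc));
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps field order stable in the output.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "https://example.com/");
        doc.put("title", "Example \"Domain\"");
        StringWriter out = new StringWriter();
        writeAll(List.of(doc, doc), out);
        System.out.print(out);
    }
}
```

Because every line is independent, a file in this format can be appended to, split, or streamed line by line. That is the main practical difference from writing all documents as a single JSON array.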

Inquiries on potential improvements

2023-01-20 Kamil Mroczek
Hello,

I have a few improvements to Nutch and would like feedback on whether this community thinks I should submit them to the main branch. Once my first PR is approved, I can start adding these. Some of them may not be good ideas, so I'm happy to hear that feedback too.

1. json-ind