I'm editing a simple scraper that crawls a YouTube video's comment page. The crawler uses AJAX to page through the comments (infinite scroll) and then saves them to a JSON file. Even with a small number of comments (< 5), it still takes 3+ minutes for the comments to be added to the JSON file.
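Here's a minimal sketch of how one could time each AJAX request to check whether the network round-trip itself is the slow part (fetch_page, the URL, and the payload are placeholders, not my actual code):

    import time
    import requests

    def fetch_page(session, url, data):
        # Time a single AJAX request so we can see whether the
        # network round-trips account for the 3+ minutes.
        start = time.perf_counter()
        response = session.post(url, data=data)
        elapsed = time.perf_counter() - start
        print("request took %.2fs" % elapsed)
        return response

    session = requests.Session()  # reusing one session keeps the connection alive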
I've tried adding requests-cache and using ujson instead of json to see if there is any benefit, but there's no noticeable difference. You can view the code here: http://stackoverflow.com/questions/34575586/how-to-speed-up-ajax-requests-python-youtube-scraper

I'm new to Python, so I'm not sure where the bottlenecks are. The finished script will be used to parse through 100,000+ comments, so performance is a major factor.

- Would using multithreading solve the issue? If so, how would I refactor the code to benefit from it? (See the sketch below for the kind of refactor I have in mind.)
- Or is this strictly a network issue?

Thanks!
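For context, this is roughly the multithreaded version I'm imagining, using concurrent.futures (fetch_comments, page_urls, and comments.json are placeholders). One caveat I'm aware of: threads only help if the page requests are independent of each other; with YouTube-style infinite scroll, each response typically carries a continuation token for the next page, so the pages of a single video may have to be fetched sequentially anyway.

    import json
    from concurrent.futures import ThreadPoolExecutor

    import requests

    def fetch_comments(url):
        # One AJAX page request; returns the parsed JSON body.
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.json()

    # page_urls would come from however the scraper discovers the
    # AJAX continuation URLs; shown here as a placeholder list.
    page_urls = ["https://example.com/ajax?page=%d" % i for i in range(1, 6)]

    # Fetch up to 8 pages concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch_comments, page_urls))

    # Write everything once at the end instead of re-dumping the
    # whole file after every page.
    with open("comments.json", "w") as f:
        json.dump(results, f)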