I'm editing a simple scraper that crawls a YouTube video's comment page. The crawler uses AJAX to page through the comments (infinite scroll) and then saves them to a JSON file. Even with a small number of comments (< 5), it still takes 3+ minutes for the comments to be added to the JSON file.
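Here's a minimal sketch of how one could time each AJAX request to check whether the network round-trip itself is the slow part (fetch_page, the URL, and the payload are placeholders, not my actual code):

    import time
    import requests

    def fetch_page(session, url, data):
        # Time a single AJAX request so we can see whether the
        # network round-trips account for the 3+ minutes.
        start = time.perf_counter()
        response = session.post(url, data=data)
        elapsed = time.perf_counter() - start
        print("request took %.2fs" % elapsed)
        return response

    session = requests.Session()  # reusing one session keeps the connection alive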
I've tried adding requests-cache and using ujson instead of json to see if there is any benefit, but there's no noticeable difference. You can view the code here: http://stackoverflow.com/questions/34575586/how-to-speed-up-ajax-requests-python-youtube-scraper

I'm new to Python, so I'm not sure where the bottlenecks are. The finished script will be used to parse through 100,000+ comments, so performance is a major factor.

- Would using multithreading solve the issue? If so, how would I refactor the code to benefit from it? (See the sketch below for the kind of refactor I have in mind.)
- Or is this strictly a network issue?

Thanks!
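For context, this is roughly the multithreaded version I'm imagining, using concurrent.futures (fetch_comments, page_urls, and comments.json are placeholders). One caveat I'm aware of: threads only help if the page requests are independent of each other; with YouTube-style infinite scroll, each response typically carries a continuation token for the next page, so the pages of a single video may have to be fetched sequentially anyway.

    import json
    from concurrent.futures import ThreadPoolExecutor

    import requests

    def fetch_comments(url):
        # One AJAX page request; returns the parsed JSON body.
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.json()

    # page_urls would come from however the scraper discovers the
    # AJAX continuation URLs; shown here as a placeholder list.
    page_urls = ["https://example.com/ajax?page=%d" % i for i in range(1, 6)]

    # Fetch up to 8 pages concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(fetch_comments, page_urls))

    # Write everything once at the end instead of re-dumping the
    # whole file after every page.
    with open("comments.json", "w") as f:
        json.dump(results, f)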