Thank you for the reply, Mats. I agree that wrapping the files in an .exe is ridiculous. We're talking about a $15B company doing this, by the way, not a mom-and-pop shop. Anyways...
If I understand you correctly, you're saying I can:

1) Use Python to download the file from the web (but not by using a web scraper, according to Alan)
2) Simply ignore the .exe wrapper and use, maybe Windows Task Manager, to unzip the file and place the .txt file in the desired folder

Am I understanding you correctly?

Thank you
-Ian

On Mon, May 1, 2017 at 4:14 PM, Mats Wichmann <m...@wichmann.us> wrote:

> On 05/01/2017 03:44 PM, Alan Gauld via Tutor wrote:
> > On 01/05/17 18:20, Ian Monat wrote:
> >> ... I've written a script using the requests module but I
> >> think a web scraper like Scrapy, Beautiful Soup or Selenium may be
> >> required.
> >
> > I'm not sure what you are looking for. Scrapy, BS etc will
> > help you read the HTML but not to fetch the file. Also do
> > you want to process the file (extract the text) in Python
> > too, or is it enough to just fetch the file?
> >
> > If the problem is with reading the HTML then you need to
> > give us more detail about the problem areas and HTML
> > format.
> >
> > If the problem is fetching the file, it sounds like you
> > have already done that and it should be a case of fine
> > tuning/tidying up the code you've written.
> >
> > What kind of help exactly are you asking for?
> >
>
> This is a completely non-Python, non-Tutor response to part of this:
>
> The self-extracting archive. Convenience, at a price: running
> executables of unverified reliability is just a terrible idea.
>
> I know you said your disty won't change their website, but you should
> tell them they should: a tremendous number of organizations have
> policies that don't just allow pulling down and running an exe file from
> a website. Even if that's not currently the case for you, you could say
> that you're not allowed, and get someone in your management chain to
> promise to support that if there's a question - should not be hard. It
> may be wired into the distributor's content delivery system, but that's
> a stupid choice on their part.
>
> "Then you have to run the .exe which produces a zipped file"
>
> Don't do this ("run"), unless there's a way you trust to be able to
> verify the security of what is offered. Just about any payload could be
> buried in the exe, especially if someone broke into the distributor's
> site.
>
> Possibly slightly pythonic:
>
> if it is really just a wrapper for a zipfile (i.e. the aforementioned
> self-extracting archive), you should be able to open it in 7zip or
> similar, and extract the zipfile, without ever "running" it. And if
> that is the case, you should be able to script extracting the zipfile
> from the .exe, and then extracting the text file from the zipfile, using
> Python (or other scripting languages: that's not particularly
> Python-specific).
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
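
For reference, the extract-without-running approach Mats describes might look roughly like the sketch below. It assumes the .exe is a ZIP-based self-extracting archive (Python's zipfile module can usually read those directly, because it locates the archive from the end of the file); a 7z- or RAR-based wrapper would need a different tool. The function name extract_text_files and the file paths in the usage note are made up for illustration and do not come from the thread.

```python
import io
import zipfile
from pathlib import Path


def extract_text_files(exe_path, dest_dir):
    """Pull .txt files out of a ZIP-based self-extracting .exe without running it.

    Hypothetical helper: assumes the .exe is a stub with a ZIP archive
    appended, and that the .txt file sits either directly in that archive
    or inside a .zip nested one level down.
    """
    exe_path = Path(exe_path)
    dest_dir = Path(dest_dir)

    # Refuse anything zipfile cannot recognise (e.g. a 7z or RAR wrapper).
    if not zipfile.is_zipfile(exe_path):
        raise ValueError(f"{exe_path} does not look like a ZIP-based archive")

    extracted = []
    with zipfile.ZipFile(exe_path) as outer:
        for name in outer.namelist():
            lower = name.lower()
            if lower.endswith(".zip"):
                # Open the nested zipfile in memory rather than writing it out.
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                    for inner_name in inner.namelist():
                        if inner_name.lower().endswith(".txt"):
                            extracted.append(Path(inner.extract(inner_name, dest_dir)))
            elif lower.endswith(".txt"):
                extracted.append(Path(outer.extract(name, dest_dir)))
    return extracted
```

Usage would be something like extract_text_files("download.exe", "C:/data/incoming"), with both paths as placeholders. Nothing in the .exe is executed; the file is only read as an archive.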