Hey WeeWX'ers!!! =D I have a fix in the hopper.
There's nothing that can be done for the occasional HTTP 404, or even 503's I am now seeing, but the HTTP 403 was due to a change on WU's part where they are rejecting certain HTTP User-Agent strings.

The fact that they are putting Akamai in the middle is almost certainly a great thing re: their scalability issues; however, they probably inherited some default settings that filter "bots" and malware and such, which is likely why the HTTP User-Agent now matters.

I have set the User-Agent to "CURL" and it works. I have set it to "Mozilla" and it works. I'm going with that one, since it means Mosaic Killer, both of which were among the very first User-Agents I ever worked with, circa 1993, back before there was such a thing as Netscape. =D
/ye-olde-farte mode off ;-)

My testing has so far been under Python3, but coincidentally (and not a causation), WU started throwing HTTP 503's around the time that I tried validating my code also under Python2. Everything is working against today's date. It's when I go after yesterday's date that I get the HTTP server error 503.

I expect the 404's and 503's to go away eventually, but at least for now I have a fix for the 403 (forbidden)'s, just based on the User-Agent string.

I'll submit a change for wunderfixer both to the 3.9.x "master" and 4.0.x "development" branches in a moment and reply back with direct links for anyone who wants a fix sooner. =D

Isn't this fun? =D

Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPad Pro)

> On May 22, 2019, at 4:20 PM, Leon Shaner <l...@isylum.org> wrote:
>
> I'm still working on this.
> CURL is telling me they are not only using https, but also TLSv1.2.
> Here is a transcript, in case one of y'all beats me to the fix. =D
>
> --
> You received this message because you are subscribed to the Google Groups
> "weewx-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to weewx-user+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/weewx-user/DA01E425-B99A-4959-8FB2-B564A61B3E77%40isylum.org.
> For more options, visit https://groups.google.com/d/optout.
> <wu.txt>
>
> Working from here:
> https://docs.python.org/2/library/ssl.html
>
> So far I have tried this, to no avail.
> Really just doing the "import ssl" and using https in the URL, and adding
> context=ssl_context to the urllib.request.
>
> A snippet of that looks as follows, but still getting 403 forbidden. :-(
>
> # For new WU interface which uses SSL and TLSv1.2
> import ssl
>
> ...
>
>         _url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=%s" \
>                "&month=%d&day=%d&year=%d&format=1" % (self.station,
>                                                        dayRequested_tt[1],
>                                                        dayRequested_tt[2],
>                                                        dayRequested_tt[0])
>
>         # specify TLSv1.2 and SSLv2, but not SSLv3
>         ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
>         ssl_context.options |= ssl.PROTOCOL_SSLv23
>         ssl_context.options |= ssl.OP_NO_SSLv3
>
>         try:
>             # Hit the weather underground site:
>             _wudata = urllib.request.urlopen(_url, context=ssl_context)
>
> Regards,
> \Leon
> --
> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>
>> On May 22, 2019, at 2:42 PM, Leon Shaner <l...@isylum.org> wrote:
>>
>> Jarom,
>>
>> CURL is pretty sophisticated in its ability to emulate browser state in
>> pretty much any way but JavaScript. When it worked this morning, I saw some
>> cookies were involved.
>> It may well be that the python way isn't handling that part.
>> I don't know enough about how python fetches pages to work that out, but I
>> am very familiar with CURL, so if I can find a path that works consistently,
>> then I'll go back to the python to see about how to implement same.
>>
>> I was getting 404's in the browser even, when I looked at it earlier.
>>
>> I'll keep working on it, but not too hard, so as to not get on their radar
>> in any unwanted sort of way.
>> ;-)
>>
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>>
>>> On May 22, 2019, at 2:04 PM, Jarom Hatch <jsha...@gmail.com> wrote:
>>>
>>> Interesting, using curl sometimes I can get it fine, but wunderfixer is always
>>> getting a 403 Forbidden, as if it is actively being blocked... When it
>>> doesn't work in curl I get `HTTP/1.1 404 Not Found` and when it does work I
>>> get `HTTP/1.1 200 OK`. Curl never gets a 403 error.
>>>
>>>> On Wednesday, May 22, 2019 at 11:48:08 AM UTC-6, Jarom Hatch wrote:
>>>> I was able to get it to work twice in my web browser, but as you said, it
>>>> is sporadic. I don't ever recall them using Akamai before so that may
>>>> very well be a contributing factor.
>>>>
>>>> I wonder if we can find out the origin address and see what happens if we
>>>> can bypass Akamai...
>>>>
>>>>> On Wednesday, May 22, 2019 at 7:35:18 AM UTC-6, Leon Shaner wrote:
>>>>> For one thing, the URL of this form:
>>>>>
>>>>> http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>>
>>>>> Is now redirecting to one using HTTPS:
>>>>>
>>>>> https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>>
>>>>> Also, the redirect itself takes an excruciatingly long time.
>>>>> So I just changed the URL to https directly...
>>>>>
>>>>> The first time I tried any of the above using CURL this morning it
>>>>> worked, but then after that I started getting:
>>>>>
>>>>> An error occurred while processing your request.
>>>>> Reference #30.6f451160.1558531514.16ced4f6
>>>>>
>>>>> It looks as if they've put some kind of Akamai proxy in the middle, which
>>>>> is fine for static content, but not so fine for a query of this nature.
>>>>> Strange that it worked for me the very first time.
>>>>> It's almost as if the
>>>>> Akamai "farm" has lost some "state" information and not all nodes have
>>>>> the same content, so if you get stuck going through a bad node you get a
>>>>> bogus response.
>>>>>
>>>>> Attached is a transcript of a failed attempt. I put SOMESTATION there
>>>>> only after the fact. The actual query was for my actual station, which
>>>>> used to work.
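
For anyone who wants to try the User-Agent workaround described at the top of this message before the wunderfixer change lands, a minimal Python 3 sketch might look like the following. This is not the actual wunderfixer patch. The helper names (build_wu_request, fetch_daily_history) and the 30-second timeout are made up for illustration; only the URL format and the "Mozilla" User-Agent string come from this thread.

```python
import urllib.request

# URL format quoted earlier in this thread (HTTPS used directly to skip the slow redirect).
WU_URL = ("https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=%s"
          "&month=%d&day=%d&year=%d&format=1")


def build_wu_request(station, year, month, day, user_agent="Mozilla"):
    """Hypothetical helper: build a request that carries an explicit User-Agent.

    Without one, urllib sends its default "Python-urllib/x.y" string, which
    appears to be what was being answered with HTTP 403.
    """
    url = WU_URL % (station, month, day, year)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_daily_history(station, year, month, day):
    """Hypothetical helper: fetch one day's history as text.

    The occasional 404 or 503 would still surface here as
    urllib.error.HTTPError; this sketch does not retry.
    """
    req = build_wu_request(station, year, month, day)
    # The 30 second timeout is an arbitrary choice for this sketch.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_daily_history("SOMESTATION", 2019, 5, 22) would request the same URL shown in the quoted messages, with your own station ID in place of SOMESTATION.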