Hey WeeWX'ers!!!  =D

I have a fix in the hopper.

There's nothing that can be done for the occasional HTTP 404, or even 503's I 
am now seeing, but the HTTP 403 was due to a change on WU's part where they are 
rejecting certain HTTP User-Agent strings.  The fact that they are putting 
Akamai in the middle is almost certainly a great thing re: their scalability 
issues; however, they probably inherited some default settings that filter 
"bots" and malware and such, which is likely why the HTTP User-Agent now 
matters.

I have set the User-Agent to "CURL" and it works.
I have set it to "Mozilla" and it works.  I'm going with that one, since it 
means Mosaic Killer, both of which were among the very first User-Agents I 
ever worked with, circa 1993, back before there was such a thing as Netscape.  
=D
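
For anyone who wants to try the idea before the real change lands, here is a rough sketch of what setting the User-Agent looks like with urllib.  This is not the actual wunderfixer patch; `WU_URL` and `build_wu_request` are made-up names for illustration, and the try/except import is just one assumed way to cover both Python3 and Python2.

```python
try:
    # Python 3
    from urllib.request import Request
except ImportError:
    # Python 2
    from urllib2 import Request

# Same query form as the URL quoted later in this thread, using https directly.
WU_URL = ("https://www.wunderground.com/weatherstation/WXDailyHistory.asp"
          "?ID=%s&month=%d&day=%d&year=%d&format=1")


def build_wu_request(station, year, month, day, user_agent="Mozilla"):
    """Return a Request for one day's history with an explicit User-Agent.

    Hypothetical helper for illustration; not part of wunderfixer.
    """
    url = WU_URL % (station, month, day, year)
    return Request(url, headers={"User-Agent": user_agent})


req = build_wu_request("SOMESTATION", 2019, 5, 22)
# urllib stores header names via str.capitalize(), hence "User-agent" here.
print(req.get_header("User-agent"))
```

The resulting object can be handed to `urlopen()` in place of the bare URL string.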

/ye-olde-farte mode off  ;-)

My testing has so far been under Python3, but coincidentally (not because of 
it), WU started throwing HTTP 503's around the time that I tried validating my 
code under Python2 as well.

Everything is working against today's date.
It's when I go after yesterday's date that I get the HTTP server error 503.

I expect the 404's and 503's to go away eventually, but at least for now I have 
a fix for the 403 (Forbidden) errors, based solely on the User-Agent string.
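
Until the 404's and 503's settle down, one possible stopgap is to retry only those codes and give up immediately on anything else (such as a 403).  The sketch below is an assumption on my part, not something in wunderfixer, and whether retrying actually helps against WU is untested; `fetch_with_retry` and `TRANSIENT` are hypothetical names.

```python
import time

try:
    from urllib.error import HTTPError   # Python 3
except ImportError:
    from urllib2 import HTTPError        # Python 2

# Status codes treated as "try again later" in this sketch (an assumption).
TRANSIENT = (404, 503)


def fetch_with_retry(fetch, attempts=3, delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Only HTTP errors whose code is in TRANSIENT are retried; any other
    HTTPError (a 403, for example) is re-raised on the spot.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except HTTPError as e:
            if e.code not in TRANSIENT or attempt == attempts:
                raise
            sleep(delay)
```

Here `fetch` would be any zero-argument callable that performs the request, for example a lambda wrapping `urlopen()`.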

I'll submit a change for wunderfixer both to the 3.9.x "master" and 4.0.x 
"development" branches in a moment and reply back with direct links for anyone 
who wants a fix sooner.  =D

Isn't this fun?  =D

Regards,
\Leon
--
Leon Shaner :: Dearborn, Michigan (iPad Pro)

> On May 22, 2019, at 4:20 PM, Leon Shaner <l...@isylum.org> wrote:
> 
> I'm still working on this.
> CURL is telling me they are not only using https, but also TLSv1.2.
> Here is a transcript, in case one of y'all beats me to the fix.  =D
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "weewx-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to weewx-user+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/weewx-user/DA01E425-B99A-4959-8FB2-B564A61B3E77%40isylum.org.
> For more options, visit https://groups.google.com/d/optout.
> <wu.txt>
> 
> 
> 
> Working from here:
> https://docs.python.org/2/library/ssl.html
> 
> So far I have tried this, to no avail.
> Really just doing the "import ssl" and using https in the URL, and adding 
> context=ssl_context to the urllib.request.
> 
> A snippet of that looks as follows, but still getting 403 forbidden.  :-(
> 
> # For new WU interface which uses SSL and TLSv1.2
> import ssl
> 
> ...
> 
>         _url = "https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=%s" \
>                "&month=%d&day=%d&year=%d&format=1" % (self.station, dayRequested_tt[1],
>                                                       dayRequested_tt[2], dayRequested_tt[0])
> 
>         # Pin the connection to TLSv1.2 and explicitly rule out SSLv3.
>         # (ssl.PROTOCOL_SSLv23 is a protocol constant, not an option flag,
>         # so it must not be OR'ed into ssl_context.options.)
>         ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
>         ssl_context.options |= ssl.OP_NO_SSLv3
> 
>         try :
>             # Hit the weather underground site:
>             _wudata = urllib.request.urlopen(_url, context=ssl_context)
> 
> 
> 
> Regards,
> \Leon
> --
> Leon Shaner :: Dearborn, Michigan (iPad Pro)
> 
>> On May 22, 2019, at 2:42 PM, Leon Shaner <l...@isylum.org> wrote:
>> 
>> Jarom,
>> 
>> CURL is pretty sophisticated in its ability to emulate browser state in 
>> pretty much any way but JavaScript.  When it worked this morning, I saw some 
>> cookies were involved.
>> It may well be that the python way isn't handling that part.
>> I don't know enough about how python fetches pages to work that out, but I 
>> am very familiar with CURL, so if I can find a path that works consistently, 
>> then I'll go back to the python to see about how to implement same.
>> 
>> I was getting 404's in the browser even, when I looked at it earlier.
>> 
>> I'll keep working on it, but not too hard, so as to not get on their radar 
>> in any unwanted sort of way.  ;-)
>> 
>> Regards,
>> \Leon
>> --
>> Leon Shaner :: Dearborn, Michigan (iPad Pro)
>> 
>>> On May 22, 2019, at 2:04 PM, Jarom Hatch <jsha...@gmail.com> wrote:
>>> 
>>> Interesting, using curl sometimes I can get it fine, but wunderfixer is 
>>> always getting a 403 Forbidden, as if it is actively being blocked...  When 
>>> it doesn't work in curl I get `HTTP/1.1 404 Not Found` and when it does work 
>>> I get `HTTP/1.1 200 OK`.  Curl never gets a 403 error.
>>> 
>>>> On Wednesday, May 22, 2019 at 11:48:08 AM UTC-6, Jarom Hatch wrote:
>>>> I was able to get it to work twice in my web browser, but as you said, it 
>>>> is sporadic.  I don't ever recall them using Akamai before so that may 
>>>> very well be a contributing factor.
>>>> 
>>>> I wonder if we can find out the origin address and see what happens if we 
>>>> can bypass Akamai...
>>>> 
>>>>> On Wednesday, May 22, 2019 at 7:35:18 AM UTC-6, Leon Shaner wrote:
>>>>> For one thing, the URL of this form:
>>>>> 
>>>>> http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>> 
>>>>> Is now redirecting to one using HTTPS:
>>>>> 
>>>>> https://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=SOMESTATION&month=5&day=22&year=2019&format=1
>>>>> 
>>>>> Also, the redirect itself takes an excruciatingly long time.
>>>>> So I just changed the URL to https directly...
>>>>> 
>>>>> The first time I tried any of the above using CURL this morning it 
>>>>> worked, but then after that I started getting:
>>>>> 
>>>>> An error occurred while processing your request.
>>>>> Reference #30.6f451160.1558531514.16ced4f6
>>>>> 
>>>>> It looks as if they've put some kind of Akamai proxy in the middle, which 
>>>>> is fine for static content, but not so fine for a query of this nature.  
>>>>> Strange that it worked for me the very first time.  It's almost as if the 
>>>>> Akamai "farm" has lost some "state" information and not all nodes have 
>>>>> the same content, so if you get stuck going through a bad node you get a 
>>>>> bogus response.
>>>>> 
>>>>> Attached is a transcript of a failed attempt.  I put SOMESTATION there 
>>>>> only after the fact.  The actual query was for my actual station, which 
>>>>> used to work.
>>>>> 
>>> 
> 
