On Wednesday, 28 April 2021 06:52:30 BST Glyph wrote:
>
> > On Apr 27, 2021, at 8:58 PM, Wim Lewis <[email protected]> wrote:
> >
> > On Thursday, April 8, 2021 8:43:35 AM PDT, Barry Scott wrote:
> >> We just added a patch to our twisted to prevent twisted from doing idna
> >> validation.
> >> _idnaBytes and _idnaText now convert from bytes to unicode based on the
> >> type of the provided arg.
> >>
> >> We had to do this because there are domain names that youtube.com uses
> >> that are not valid under IDNA-2008:
> >> https://tools.ietf.org/html/rfc5891#section-4.2.3.1
> >
> > My reading of the RFC is that the YouTube domain you mention
> > (r2---sn-aigzrn7e.googlevideo.com) is an invalid "U-Label", but that
> > doesn't mean it's an entirely invalid domain label. It just means you can't
> > legally run it through IDNA and turn it into "xn--r2---sn-aigzrn7e-". The
> > intent, as I understand it, is to forbid any possibility of double-encoding
> > or double-decoding a label, not to forbid the possibility of using labels
> > like the one you mention.
>
> I agree with this reading.
>
> >> I can see why a UI would need to do IDNA-2008 conversion and validation
> >> but I'm not clear why it's of value deep in the guts of twisted.
> >
> > My guess is that this is just an accident of the way that the
> > bytes/characters distinction and the IDNA features were added to Twisted,
> > and is probably a bug.
>
> +1.
>
> We also have other issues with the Python IDNA library:
> https://github.com/kjd/idna/issues/18
> and would generally like to reduce our strictness via whatever mechanisms we
> can, even for things that genuinely require it (which this does not).
>
> >> Why is this code needed at all in twisted?
> >> If it's for a high level API then why isn't it being called at the
> >> edge of the high level API calls?
> >
> > I'd argue that resolving URLs is in fact a high level API (from the point
> > of view of the name resolution system) but even so, it seems to me that
> > Twisted is doing the wrong thing here. The format of that label should
> > prevent it from ever being transformed by IDNA, but shouldn't prevent it
> > from being passed through unchanged, since it doesn't contain any
> > codepoints outside of the usual ASCII range.
>
> Also agreed with all of this.
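
A rough sketch of the pass-through behaviour described above, for
illustration only. This is not Twisted's code: to_ascii_hostname is a
hypothetical helper name, and the non-ASCII branch uses Python's built-in
"idna" codec (which implements IDNA 2003, not IDNA 2008) purely as a
stand-in for whatever IDNA library is in use.

```python
def to_ascii_hostname(name):
    """Hypothetical helper: return a hostname as bytes.

    ASCII names are passed through unchanged; only names containing
    non-ASCII code points are IDNA-encoded.
    """
    if isinstance(name, bytes):
        # Already bytes: nothing to convert.
        return name
    try:
        # Pure-ASCII names, including labels with "--" in the third and
        # fourth positions, are never run through IDNA.
        return name.encode("ascii")
    except UnicodeEncodeError:
        # Assumption: the stdlib codec (IDNA 2003) is good enough here.
        return name.encode("idna")


print(to_ascii_hostname("r2---sn-aigzrn7e.googlevideo.com"))
print(to_ascii_hostname("b\u00fccher.example"))
```

With this approach the googlevideo.com label never reaches the IDNA
encoder, so the IDNA-2008 hyphen rule cannot reject it.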
>
> >> The key idea here is that it's human input that will be converted.
> >> But the code is used deep in the _sslverify.py where no human
> >> input is entered.
> >
> > _sslverify has to check whether the information in the server's certificate
> > matches the URL that the user supplied. Certificates can contain Unicode
> > text — at least in the (completely obsolete) CN-as-domain-name situation —
> > so _sslverify probably picked up the requirement for IDNA transformations
> > from that. (I don't remember whether dNSName SANs can contain unicode.)
>
> Yep.
>
> > What is the patch you decided to add to your version? Where in _sslverify
> > did the problem surface?
The problem surfaced when _idnaBytes was called from
ClientTLSOptions.__init__ and raised an exception.
> I am also very curious about this :).
Attached is the patch we are using. We are still on Twisted 19.7.0, for sad reasons.
Barry
Only in tmp2: twisted-remove-idna-checks.patch
diff -r -u tmp1/twisted-twisted-19.7.0/src/twisted/internet/_idna.py tmp2/twisted-twisted-19.7.0/src/twisted/internet/_idna.py
--- tmp1/twisted-twisted-19.7.0/src/twisted/internet/_idna.py 2019-07-28 10:17:29.000000000 +0100
+++ tmp2/twisted-twisted-19.7.0/src/twisted/internet/_idna.py 2021-04-08 14:38:04.449987636 +0100
@@ -22,14 +22,10 @@
     @return: The domain name's IDNA representation, encoded as bytes.
     @rtype: L{bytes}
     """
-    try:
-        import idna
-    except ImportError:
-        return text.encode("idna")
+    if type(text) == bytes:
+        return text
     else:
-        return idna.encode(text)
-
-
+        return text.encode('ascii')
 def _idnaText(octets):
     """
@@ -43,12 +39,10 @@
     @return: A human-readable domain name.
     @rtype: L{unicode}
     """
-    try:
-        import idna
-    except ImportError:
-        return octets.decode("idna")
+    if type(octets) == bytes:
+        return octets.decode('ascii')
     else:
-        return idna.decode(octets)
+        return octets