[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-11 Thread Steve Barnes
-Original Message- From: Steven D'Aprano Sent: 11 May 2020 06:02 To: [email protected] Subject: [Python-ideas] Re: Improve handling of Unicode quotes and hyphens On Mon, May 11, 2020 at 04:28:38AM +, Steve Barnes wrote: > So we currently have a situation where not only does

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-11 Thread David Mertz
A third-party module on PyPI for "fix-the-horrible-things-Outlook-does" could be useful. There is no way the standard library can or should keep up with the newest mangling techniques mail handlers employ in this week's version. I don't understand what you mean by the current interpreter not tell
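
Below is a minimal sketch of what such a third-party "fix the quotes" helper might do. It assumes the only damage is typographic quotes, dashes and no-break spaces. The name `unsmarten` and the mapping table are hypothetical and do not come from any existing package.

```python
# Hypothetical helper: map common "smart" punctuation back to ASCII.
# The table is an assumption about what mail clients and word processors
# typically substitute; it is not exhaustive.
SMART_TO_ASCII = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
    "\u00a0": " ",  # NO-BREAK SPACE
})


def unsmarten(source: str) -> str:
    """Return source with typographic quotes and dashes replaced by ASCII."""
    return source.translate(SMART_TO_ASCII)


if __name__ == "__main__":
    pasted = "print(\u201chello\u201d, 3 \u2013 1)"
    print(unsmarten(pasted))  # print("hello", 3 - 1)
```

A blind translation like this also rewrites characters that were meant to appear inside string literals, so it is better suited to a standalone cleanup tool than to the interpreter itself.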

[Python-ideas] Sanitize filename (path part) 2nd try

2020-05-11 Thread Steve Jorgensen
Based on responses to my previous proposal, I am convinced that it was over-ambitious and not appropriate for inclusion in the Python standard library, so starting over with a more narrowly scoped suggestion. Proposal: Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for
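
The following is a minimal sketch of the proposed behaviour for a single path component. It is not the proposal's actual design: the name `sanitize_part` is hypothetical (the thread only floats `os.path.sanitizepart`). It assumes that "/", "\\" and NUL are the unsafe characters and that "." or ".." (or an empty result) should raise rather than be rewritten.

```python
# Characters assumed unsafe in a single path component on common
# platforms: the POSIX separator, the Windows separator, and NUL.
_UNSAFE = {"/", "\\", "\0"}


def sanitize_part(value: str, replacement: str = "_") -> str:
    """Hypothetical sketch of a sanitizepart-style function.

    Replaces separator and NUL characters with `replacement` and refuses
    values that would name the current or parent directory.
    """
    if any(ch in _UNSAFE for ch in replacement):
        raise ValueError("replacement must not itself be unsafe")
    cleaned = "".join(replacement if ch in _UNSAFE else ch for ch in value)
    if cleaned in ("", ".", ".."):
        raise ValueError(f"{value!r} cannot be used as a path component")
    return cleaned


if __name__ == "__main__":
    print(sanitize_part("reports/2020"))  # reports_2020
```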

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-11 Thread Steve Barnes
From: David Mertz Sent: 11 May 2020 08:34 To: Steve Barnes Cc: [email protected] Subject: Re: [Python-ideas] Re: Improve handling of Unicode quotes and hyphens A third-party module on PyPI for "fix-the-horrible-things-Outlook-does" could be useful. There is no way the standard library

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-11 Thread Chris Angelico
On Mon, May 11, 2020 at 6:09 PM Steve Barnes wrote: > > Actually, in the case of the “wrong quotes” it puts the pointer under the > character before the space character or at the end of the line (if you have a > fixed spacing font – worse if you don’t) – it still doesn’t tell you which > charac
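
As a workaround for a caret that lands in the wrong place, here is a small sketch that reports the exact line, column and Unicode name of every non-ASCII character in a piece of pasted source. The helper name `find_non_ascii` is hypothetical. It only uses the standard `unicodedata.name` lookup.

```python
import unicodedata


def find_non_ascii(source: str):
    """Yield (line, column, char, name) for every non-ASCII character.

    Lines and columns are 1-based, matching how editors usually count.
    """
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, ch, unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    code = "x = 5\ny = \u201chello\u201d\n"
    for lineno, col, ch, name in find_non_ascii(code):
        print(f"line {lineno}, col {col}: {ch!r} {name}")
```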

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Steve Jorgensen
Steve Jorgensen wrote: > When escape is supplied (typically "%") it is used as the escape character > in the same way that "%" is used in URL encoding. When a non-ASCII character > is escaped, > it is represented as a sequence of encoded bytes/octets. I neglected to say that the octet sequence
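
To show what "represented as a sequence of encoded bytes/octets" means in practice, the sketch below borrows the standard library's URL quoting as a stand-in. It assumes UTF-8 as the encoding. It is also broader than the proposal: `quote(..., safe="")` escapes every character outside the URL unreserved set, not only characters that are unsafe in filenames. The wrapper name `escape_part` is hypothetical.

```python
from urllib.parse import quote, unquote


def escape_part(value: str) -> str:
    # safe="" makes quote() escape "/" as well; non-ASCII characters are
    # encoded to UTF-8 first, so each becomes a run of %XX octets.
    # "%" itself becomes "%25", which keeps the escaping reversible.
    return quote(value, safe="", encoding="utf-8")


if __name__ == "__main__":
    print(escape_part("café/menu"))     # caf%C3%A9%2Fmenu
    print(unquote(escape_part("50%")))  # 50%
```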

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-11 Thread Serhiy Storchaka
10.05.20 10:09, Steve Barnes writes: 4. Start accepting hyphens as minus & Unicode quotation marks – this would be the ideal answer for pasted code but has a lot of possible things to iron out such as do we require that the quotes match and are in the typographically correct order. It

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-11 Thread Serhiy Storchaka
11.05.20 03:34, Steven D'Aprano writes: There are a couple of professionally published Python books written using reStructuredText, Sphinx and Python. So people do have a choice, or at least a technical choice. There was a similar issue with Sphinx. It uses third-party tools to "improve" the HTML

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Steve Jorgensen
Steve Jorgensen wrote: > Based on responses to my previous proposal, I am convinced that it was > over-ambitious > and not appropriate for inclusion in the Python standard library, so starting > over with a > more narrowly scoped suggestion. > Proposal: > Add a new function (possibly os.path.sani

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 00:40, Steve Jorgensen wrote: > > Proposal: > > Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for > use as a single component of a path. In the default case, the value must also > not be a reference to the current or parent directory ("." or "..")

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-11 Thread MRAB
On 2020-05-11 09:21, Chris Angelico wrote: On Mon, May 11, 2020 at 6:09 PM Steve Barnes wrote: Actually, in the case of the “wrong quotes” it puts the pointer under the character before the space character or at the end of the line (if you have a fixed spacing font – worse if you don’t) – it

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 21:51, Christopher Barker wrote: > > > On Sun, May 10, 2020 at 9:36 PM Andrew Barnert wrote: > >> However, there is one potential problem with the property I hadn’t thought >> of until just now: I think people will understand that mylist.view[2:] is >> not mutable, but w
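
Here is a minimal sketch of a read-only slice view like the `mylist.view[2:]` discussed above. It is written as a standalone class rather than as the proposed `view` property, and the name `SliceView` is hypothetical. Because it defines no `__setitem__`, assignment raises `TypeError`, which matches the "not mutable" expectation.

```python
from collections.abc import Sequence


class SliceView(Sequence):
    """Hypothetical read-only view of seq[slc] that copies no elements."""

    def __init__(self, seq, slc=slice(None)):
        self._seq = seq
        # range already implements slice semantics (negative bounds,
        # steps, clamping), so it maps view indices to seq indices.
        self._indices = range(len(seq))[slc]

    def __len__(self):
        return len(self._indices)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing a view gives another view on the same sequence.
            view = SliceView(self._seq)
            view._indices = self._indices[key]
            return view
        return self._seq[self._indices[key]]
```

One design consequence: the index range is fixed when the view is created, so later changes to element values show through the view, but changes to the underlying list's length do not.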

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Alex Hall
On Mon, May 11, 2020 at 12:50 AM Christopher Barker wrote: > I'm still confused what you mean by extend to all iterators? you mean that > you could use slice syntax with anything iterable> > > And where does this fit in to the iterable vs iterator continuum? > > iterables will return an iterator

[Python-ideas] Re: [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 22:36, Stephen J. Turnbull wrote: > > Andrew Barnert via Python-ideas writes: > >> A lot of people get this confused. I think the problem is that we >> don’t have a word for “iterable that’s not an iterator”, > > I think part of the problem is that people rarely see explic
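
The iterable/iterator distinction can be checked directly. An iterator returns itself from `iter()` and is consumed as you loop over it. An iterable that is not an iterator, such as a list, hands out a fresh iterator each time. The helper name `is_iterator` is hypothetical.

```python
from collections.abc import Iterator


def is_iterator(obj) -> bool:
    # An iterator returns itself from iter(); a reusable iterable such as
    # a list returns a new iterator object on every call.
    return iter(obj) is obj


if __name__ == "__main__":
    nums = [1, 2, 3]
    it = iter(nums)
    print(is_iterator(nums), is_iterator(it))  # False True
    print(isinstance(it, Iterator))            # True
    print(list(it), list(it))                  # [1, 2, 3] []
```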

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 10:57, Alex Hall wrote: > > >> On Mon, May 11, 2020 at 12:50 AM Christopher Barker >> wrote: > > >> Though it is heading in a different direction than where Andrew was >> proposing, that this would be about making and using views on sequences, >> which really wouldn't

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Wes Turner
What does sanitizepart do with newlines \n \r \r\n in filenames? Are these control characters? What does sanitizepart do with a leading slash? assert os.path.join("a", "/b") == "/b" A new safejoin() or joinsafe() or join(safe='True') could call sanitizepart() such that: assert joinsafe("a\n", "
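
The `os.path.join` behaviour quoted above can be reproduced with `posixpath`, which is used explicitly here so the result is the same on every OS. The `join_safe` function below is a hypothetical sketch of the suggested joinsafe(). It assumes that rejecting absolute, empty, ".", "..", separator-containing and NUL-containing components is enough, and it does not address the newline question.

```python
import posixpath

# An absolute later component discards everything before it.
assert posixpath.join("a", "/b") == "/b"


def join_safe(base: str, *parts: str) -> str:
    """Hypothetical joinsafe(): refuse components that could escape base."""
    for part in parts:
        if not part or part in (".", "..") or "/" in part or "\0" in part:
            raise ValueError(f"unsafe path component: {part!r}")
    return posixpath.join(base, *parts)


if __name__ == "__main__":
    print(join_safe("uploads", "report.txt"))  # uploads/report.txt
```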

[Python-ideas] Re: Sanitize filename (path part)

2020-05-11 Thread Barry Scott
> On 10 May 2020, at 01:34, Steve Jorgensen wrote: > > I believe the Python standard library should include a means of sanitizing a > filesystem entry, and this should not be something requiring a 3rd party > package. snip I found that I needed to have code that could tell me if a filename

[Python-ideas] Re: Sanitize filename (path part)

2020-05-11 Thread Wes Turner
FWIW, here are some of the CWE codes for related vulnerabilities/weaknesses in implementations: CWE-73: External Control of File Name or Path https://cwe.mitre.org/data/definitions/73.html CWE-707: Improper Neutralization https://cwe.mitre.org/data/definitions/707.html CWE-22: Improper Limitatio

[Python-ideas] Re: Sanitize filename (path part)

2020-05-11 Thread Wes Turner
(Is it almost always better to just use a hash of the provided filename (maybe in a p/a/ir/tree234 implementation to avoid the max files in a directory limit of whichever filesystem) instead of the user-supplied filename string?) On Mon, May 11, 2020 at 4:48 PM Wes Turner wrote: > FWIW, here are
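
The hashing alternative might look like the following sketch. The helper name `hashed_path` and the choice of SHA-256 are assumptions. The leading hex digits become nested directories, so no single directory has to hold every file. With the defaults (`depth=2`, `width=2`) there are up to 16**4 = 65536 leaf directories. The original filename would have to be stored elsewhere if it is needed later.

```python
import hashlib
import posixpath


def hashed_path(filename: str, depth: int = 2, width: int = 2) -> str:
    """Hypothetical helper: derive a storage path from a SHA-256 digest.

    The first `depth` groups of `width` hex digits become nested
    directories, and the full digest is the final file name, so the
    user-supplied string never reaches the filesystem.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    dirs = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return posixpath.join(*dirs, digest)
```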

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Steve Jorgensen
Andrew Barnert wrote: > On May 11, 2020, at 00:40, Steve Jorgensen [email protected] wrote: > > Proposal: > > Add a new function (possibly os.path.sanitizepart) to sanitize a value for > > use as a single component of a path. In the default case, the value must > > also not be a > > reference to

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Barry Scott
> On 11 May 2020, at 18:09, Andrew Barnert via Python-ideas > wrote: > > More generally, what’s the use case for %-encoding filenames like this? Are > people expecting it to interact transparently with URLs, so if I save a file > “spam\0eggs” in a Python script and then try to browse to file

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Steve Jorgensen
Andrew Barnert wrote: > On May 11, 2020, at 00:40, Steve Jorgensen [email protected] wrote: > > Proposal: > > Add a new function (possibly os.path.sanitizepart) to sanitize a value for > > use as a single component of a path. In the default case, the value must > > also not be a > > reference to

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 12:59, Barry Scott wrote: > > >> On 11 May 2020, at 18:09, Andrew Barnert via Python-ideas >> wrote: >> >> More generally, what’s the use case for %-encoding filenames like this? Are >> people expecting it to interact transparently with URLs, so if I save a file >> “spam

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 12:54, Wes Turner wrote: > > > What does sanitizepart do with newlines \n \r \r\n in filenames? Are these > control characters? >>> unicodedata.category('\n') Cc
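
Building on the `unicodedata.category` check above, here is a small sketch that lists every character in a name whose Unicode general category is `Cc`. That category covers `\n`, `\r`, `\t` and NUL, among others. The function name `control_chars` is hypothetical.

```python
import unicodedata


def control_chars(name: str):
    """Return the characters in name whose Unicode category is Cc."""
    return [ch for ch in name if unicodedata.category(ch) == "Cc"]


if __name__ == "__main__":
    print(control_chars("line1\r\nline2"))  # ['\r', '\n']
```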

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Andrew Barnert via Python-ideas
> On May 11, 2020, at 14:18, Steve Jorgensen wrote: > > Andrew Barnert wrote: >>> On May 11, 2020, at 00:40, Steve Jorgensen [email protected] wrote: >>> Proposal: >>> Add a new function (possibly os.path.sanitizepart) to sanitize a value for >>> use as a single component of a path. In the defa

[Python-ideas] Re: Sanitize filename (path part)

2020-05-11 Thread Andrew Barnert via Python-ideas
On May 11, 2020, at 13:31, Barry Scott wrote: > > macOS and Unix version (I only use Unicode input so avoid the random bytes > problems): But that doesn’t avoid the problem. If someone gives you a character whose encoding on the target filesystem includes a null or pathsep byte, your sanitize
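
The point about encodings can be checked by testing the encoded bytes instead of the `str`. For example, the katakana ソ (U+30BD) encodes to `b'\x83\\'` in Shift_JIS, so its second byte is the Windows path separator, while its UTF-8 encoding contains no such byte. The function name `unsafe_when_encoded` and its parameters are hypothetical.

```python
def unsafe_when_encoded(name: str, encoding: str, sep: bytes = b"/") -> bool:
    """Hypothetical check: would name's encoded bytes contain NUL or sep?

    Inspecting the str alone is not enough, because in some encodings a
    character that is not a separator encodes to bytes that include one.
    """
    data = name.encode(encoding)
    return b"\0" in data or sep in data


if __name__ == "__main__":
    print(unsafe_when_encoded("\u30bd", "shift_jis", b"\\"))  # True
    print(unsafe_when_encoded("\u30bd", "utf-8", b"\\"))      # False
```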

[Python-ideas] Re: Sanitize filename (path part) 2nd try

2020-05-11 Thread Oleg Broytman
On Mon, May 11, 2020 at 09:12:52PM -, Steve Jorgensen wrote: > When the platform is Windows, certainly, ":" should not be allowed, > and perhaps colon should not be allowed at all. https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file Forbidden characters: chr(0) < > : "
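
For reference, the sketch below reflects Microsoft's naming rules as summarised on the linked page. The reserved characters are < > : " / \ | ? *, and code points 0 through 31 are also excluded. Device names such as CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9 should be avoided even with an extension. Names should not end in a space or a period. The function name `windows_name_problems` is hypothetical, and the check is not exhaustive.

```python
import re

# Reserved characters from Microsoft's naming guidance, plus code
# points 0-31 (NUL and the other control characters).
_FORBIDDEN = set('<>:"/\\|?*') | {chr(i) for i in range(32)}

# Reserved device names; Microsoft also advises against them with an
# extension, e.g. "NUL.txt". Matching is case-insensitive.
_RESERVED = re.compile(r"(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?", re.IGNORECASE)


def windows_name_problems(name):
    """Return reasons name is unsuitable on Windows (empty list if none)."""
    problems = []
    bad = sorted({ch for ch in name if ch in _FORBIDDEN})
    if bad:
        problems.append(f"forbidden characters: {bad!r}")
    if _RESERVED.fullmatch(name):
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems
```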

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-11 Thread Christopher Barker
On Mon, May 11, 2020 at 11:38 AM Andrew Barnert wrote: > On May 11, 2020, at 10:57, Alex Hall wrote: > > > On Mon, May 11, 2020 at 12:50 AM Christopher Barker > wrote: > > >> Though it is heading in a different direction than where Andrew was >> proposing, that this would be about making and