[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Stephen J. Turnbull wrote: > Steve Jorgensen writes: > > I'm thinking of this specifically in terms of > > sanitizing input, > > assuming that later usage of the value might or might not properly > > protect against potential vulnerabilities. This is also limited to > > the case where the value is

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Ricky Teachey
On Mon, May 11, 2020 at 1:52 AM Ricky Teachey wrote: > I have nothing particularly useful to add, only that this is potentially a > really fantastic idea with a lot of promise, IMO. > > It would be nice to have some view objects with a lot of > functionality that can be sliced not only for effici

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Ricky Teachey
I have nothing particularly useful to add, only that this is potentially a really fantastic idea with a lot of promise, IMO. It would be nice to have some view objects with a lot of functionality that can be sliced not only for efficiency, but for other purposes. One might be (note that below I am

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Stephen J. Turnbull
Steve Jorgensen writes: > I'm thinking of this specifically in terms of sanitizing input, > assuming that later usage of the value might or might not properly > protect against potential vulnerabilities. This is also limited to > the case where the value is supposed to be a single path referri

[Python-ideas] [Suspected Spam]Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Stephen J. Turnbull
Andrew Barnert via Python-ideas writes: > > Which is why it's not wrong to say that a range object is an > > iterator, but it IS wrong to say that it's just an iterator ... > > No, they’re not iterators. You’ve got it backward—every iterator is > an iterable, but most iterables are not iter
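
The distinction drawn in this message can be checked directly in the interpreter. The following is a minimal sketch using only `collections.abc` and the built-in `range`; the variable names are illustrative, not taken from the thread.

```python
from collections.abc import Iterable, Iterator

r = range(3)

# A range object is an iterable (and a sequence), but not an iterator.
assert isinstance(r, Iterable)
assert not isinstance(r, Iterator)

# Each call to iter() returns a fresh, independent iterator over the range.
it1 = iter(r)
it2 = iter(r)
assert next(it1) == 0
assert next(it1) == 1
assert next(it2) == 0  # it2 is unaffected by advancing it1

# An iterator, by contrast, is its own iterator.
assert isinstance(it1, Iterator)
assert iter(it1) is it1
```

Because `range` hands out a new iterator each time, it can be looped over repeatedly, which a plain iterator cannot.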

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steve Barnes
My personal experience of the most common problematic substitutions by tools such as Outlook, Word & some web tools: 1. Double Quotes \u201c & \u201d “” 2. Single Quotes \u2018 & \u2019 ‘’ 3. The m-hyphen \u2013 – 4. Copyright © \xa9 and others, Registered ® \xae and trademark ™ \u2122

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steven D'Aprano
On Mon, May 11, 2020 at 04:28:38AM +, Steve Barnes wrote: > So we currently have a situation where not only does whether code > works or not depends on who typed it, in what environment, with what > settings but also on the same factors for who received it You say "currently", but that has

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Christopher Barker
On Sun, May 10, 2020 at 9:36 PM Andrew Barnert wrote: Here is where I think you (Andrew) and I (Chris B.) differ in our goals. My >> goal here is to have an easily accessible way to use the slice syntax to >> get an iterable that does not make a copy. >> > > It’s just a small difference in emp

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steve Barnes
Andrew, I already have a module that I include in a couple of widely used utilities at work that I called cmd_line_fixup (but I think that I like defancier better) it is used on the command line options prior to processing to fix several of these issues. This was written in response to the fr

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 15:39, Christopher Barker wrote: > >  >> On Sun, May 10, 2020 at 12:48 PM Andrew Barnert wrote: > >> Is there any way you can fix the reply quoting on your mail client, or >> manually work around it? > > I'm trying -- sorry I've missed a few. It seems more and more "moder

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Chris Angelico
On Mon, May 11, 2020 at 2:29 PM Steve Barnes wrote: > > > > So I thought that Steve made the opposite mistake, accidentally using regular > ASCII quotes when he intended to use Unicode quotes. But it turns out that > Steve's mail client sends emails with a HTML part and a plain text part, and >

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steve Barnes
So I thought that Steve made the opposite mistake, accidentally using regular ASCII quotes when he intended to use Unicode quotes. But it turns out that Steve's mail client sends emails with a HTML part and a plain text part, and the plain text part substitutes the ASCII quotes for smart quot

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steven D'Aprano
On Sun, May 10, 2020 at 02:13:37PM -0400, Richard Damon wrote: > A lot of this reminds me of a story told by a programming instructor in > the 70's, he submitted a FORTRAN program deck to the machine, the > compiler gave him a warning on a statement which read INTEGER > misspelled, it then ran the

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread David Mertz
On Sun, May 10, 2020 at 9:05 PM Steven D'Aprano wrote: > On Sun, May 10, 2020 at 01:17:17PM -0700, Andrew Barnert via Python-ideas > wrote: > > (By the way, the reason I used -f rather than —fix is that I can’t > > figure out how to get the iPhone Mail.app to not replace double > > hyphens with a

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steven D'Aprano
On Sun, May 10, 2020 at 01:17:17PM -0700, Andrew Barnert via Python-ideas wrote: > (By the way, the reason I used -f rather than —fix is that I can’t > figure out how to get the iPhone Mail.app to not replace double > hyphens with an em-dash, or even how to fix it when it does. All of > the oth

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread David Mertz
I think it is a bad idea to change the design of Python because some college professor teaching intro to programming very poorly might require inappropriate tooling. On Sun, May 10, 2020, 8:39 PM Jonathan Goble wrote: > On Sun, May 10, 2020 at 8:08 PM David Mertz wrote: > >> Why would you use

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steven D'Aprano
On Sun, May 10, 2020 at 12:57:11PM -0700, Andrew Barnert via Python-ideas wrote: > Can the error message actually include the Unicode character itself? I can think of cases where that could be confusing, or mess up the display of the error message. E.g. if it were an invisible character, or swa

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steven D'Aprano
On Sun, May 10, 2020 at 07:39:32PM -0400, Jonathan Goble wrote: > Sometimes people are forced to use Word to type code. One example is > creating user manuals. MS Word is not the only word processor capable of creating user manuals. The LibreOffice people, and others, would like a word :-) If y

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Jonathan Goble
On Sun, May 10, 2020 at 8:08 PM David Mertz wrote: > Why would you use Word? LaTeX exists, and will export to PDF. Heck, the > PDF export from Jupyter is quite good (by way of LaTeX). > > On Sun, May 10, 2020, 7:40 PM Jonathan Goble < > Several of the questions required us to write Bash scripts o

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread David Mertz
Why would you use Word? LaTeX exists, and will export to PDF. Heck, the PDF export from Jupyter is quite good (by way of LaTeX). On Sun, May 10, 2020, 7:40 PM Jonathan Goble < Several of the questions required us to write Bash scripts or Python functions, and we were required to write all of that

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steven D'Aprano
By the way, did anyone else notice the irony that Steve's examples of invalid code are actually perfectly valid? Copying and pasting into the interpreter shows that they are valid strings. On Sun, May 10, 2020 at 07:09:15AM +, Steve Barnes wrote: > But the following all result in an error

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Jonathan Goble
On Sun, May 10, 2020 at 4:00 AM Steven D'Aprano wrote: > On Sun, May 10, 2020 at 07:09:15AM +, Steve Barnes wrote about > Unicode dashes and quotes sneaking into code: > > > 2. Tell all users that they need to use a "proper" editor or IDE - > > This seems like adding an additional barrie

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread André Roberge
On Sun, May 10, 2020 at 7:34 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote: > On May 10, 2020, at 14:33, Christopher Barker wrote: > > > > Having a "tabnanny-like" function / module in the stdlib would be nice, > though I'd think a stand alone module in PyPi would be almost

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Christopher Barker
On Sat, May 9, 2020 at 1:58 PM Alex Hall wrote: > > https://github.com/PythonCHB/islice-pep/blob/master/pep-xxx-islice.rst >> >> And the prototype implementation: >> >> https://github.com/PythonCHB/islice-pep/blob/master/islice.py >> > > I think this is a good idea. For sequences I'm not sure ho

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Christopher Barker
On Sun, May 10, 2020 at 12:48 PM Andrew Barnert wrote: > Is there any way you can fix the reply quoting on your mail client, or > manually work around it? > I'm trying -- sorry I've missed a few. It seems more and more "modern" email clients make "interspersed" posting really hard. But I hate bo

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 14:33, Christopher Barker wrote: > > Having a "tabnanny-like" function / module in the stdlib would be nice, > though I'd think a stand alone module in PyPi would be almost as good, and a > good way to see if it gains traction. Good point. Plus, it might well turn out that

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread David Mertz
I wonder if The Fuck could be customized to handle these improved error messages envisioned: https://github.com/nvbn/thefuck It's a lovely tool. I don't mind the minor profanity, but when I teach I add an alias of 'fix' for students to see instead. On Sun, May 10, 2020 at 5:34 PM Christopher Bar

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Eric V. Smith
On 5/10/2020 4:04 PM, Steve Jorgensen wrote: I totally get what you're saying. For the sake of simplicity, I thought that the 2 permissiveness options should be one that only prevents path traversal and one that is extremely conservative, omitting characters that are often safe and appropriat

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Christopher Barker
On Sun, May 10, 2020 at 1:17 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote: > (By the way, the reason I used -f rather than —fix is that I can’t figure > out how to get the iPhone Mail.app to not replace double hyphens with an > em-dash, or even how to fix it when it does. Al

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Dan Sommers
On Sun, 10 May 2020 14:13:37 -0400 Richard Damon wrote: > An error like character (whatever) is not a quote (or is not a minus > sign) seems similar. It is one thing to not recognize a funny > character in the language, but to actually parse it well enough to > give a message that says in effect,

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Dan Sommers wrote: > I know what sanitize means (in English and in the technical sense I > believe you intend here), but can you provide some context and actual > use cases? > Sanitize on input so that your application code doesn't "accidentally" > spit out the contents of /etc/shadow? Sanitize

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 00:11, Steve Barnes wrote: > > What can be done? I think there’s another option (in addition to improving SyntaxError, not instead of it): Add a defancier module to the stdlib. It has functions that take some text and turn smart quotes into plain ASCII quotes, dashes and m
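
A rough sketch of what such a function might look like follows. The module name `defancier` comes from the message; the function name `defancy` and the exact replacement table are assumptions made here for illustration, since the thread does not spell them out. In particular, mapping the em dash to two hyphens is only one possible choice.

```python
# Hypothetical replacement table: the thread does not fix the exact mapping.
_REPLACEMENTS = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash (assumed to stand in for a double hyphen)
}
_TABLE = str.maketrans(_REPLACEMENTS)


def defancy(text: str) -> str:
    """Return text with common 'smart' punctuation replaced by ASCII."""
    return text.translate(_TABLE)


print(defancy("print(\u201chello\u201d)"))
print(defancy("\u2014fix"))
```

A blanket translation like this also rewrites smart quotes that appear deliberately inside string literals, so a real tool would probably need to be more selective.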

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Dan Sommers wrote: > On Sun, 10 May 2020 00:34:43 - > "Steve Jorgensen" ste...@stevej.name wrote: > > I believe the Python standard library should include > > a means of > > sanitizing a filesystem entry, and this should not be something > > requiring a 3rd party package. > > I'm not disagreein

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 03:47, Ned Batchelder wrote: > > On 5/10/20 3:09 AM, Steve Barnes wrote: >> Change the error message “SyntaxError: invalid character in identifier” to >> include which character and its Unicode value so that it becomes >> “SyntaxError: invalid character 0x201c “ in ident

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 11:09, Christopher Barker wrote: Is there any way you can fix the reply quoting on your mail client, or manually work around it? I keep reading paragraphs and saying “why is he saying the same thing I said” only to realize that you’re not, that’s just a quote from me that i

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Alex Hall
On Sun, May 10, 2020 at 8:20 PM Andrew Barnert wrote: > On May 10, 2020, at 02:42, Alex Hall wrote: > > > > - Handling negative indices for sequences (is there any reason we don't > have that now?) > > Presumably partly just to keep it minimal and simple. Itertools is all > about transforming it

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread David Mertz
On Sun, May 10, 2020 at 2:17 PM Richard Damon wrote: > An error like character (whatever) is not a quote (or is not a minus > sign) seems similar. It is one thing to not recognize a funny character > in the language, but to actually parse it well enough to give a message > that says in effec

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Andrew Barnert via Python-ideas
On May 10, 2020, at 02:42, Alex Hall wrote: > > - Handling negative indices for sequences (is there any reason we don't have > that now?) Presumably partly just to keep it minimal and simple. Itertools is all about transforming iterables into other iterables in as generic a way as possible. N
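
For context, `itertools.islice` works on arbitrary iterables and therefore rejects negative indices, whereas a sequence knows its own length and can resolve them. The sketch below shows one way a sequence-only helper could iterate over a slice lazily without copying; the name `iter_slice` is hypothetical and is not the API proposed in the thread.

```python
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_slice(
    seq: Sequence[T],
    start: Optional[int] = None,
    stop: Optional[int] = None,
    step: Optional[int] = None,
) -> Iterator[T]:
    """Lazily yield the items seq[start:stop:step] without building a copy."""
    # slice.indices() resolves None and negative values against len(seq).
    for i in range(*slice(start, stop, step).indices(len(seq))):
        yield seq[i]


print(list(iter_slice([0, 1, 2, 3, 4], -3)))
print(list(iter_slice("abcdef", None, None, -1)))
```

The helper relies on `len()` and integer indexing, which is exactly why it cannot be generalised to plain iterators.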

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Richard Damon
A lot of this reminds me of a story told by a programming instructor in the 70's, he submitted a FORTRAN program deck to the machine, the compiler gave him a warning on a statement which read INTEGER misspelled, it then ran the program, but IGNORED the statement, even though it clearly understood w

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Christopher Barker
On Sat, May 9, 2020 at 9:11 PM Andrew Barnert wrote: > I don’t think it invalidates the basic idea at all, just that it suggests the design should be different. Originally, dict returned lists for keys, values, and items. In 2.2, iterator variants were added. In 3.0, the list and iterator varian

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread David Mertz
On Sun, May 10, 2020 at 4:03 AM Steven D'Aprano wrote: > I think that David(?) may have a Vim or Emacs mode that allows him to > use Unicode chars as syntax? > I use the vim-conceal plugin: https://github.com/khzaw/vim-conceal. I know that something similar exists for Emacs, but don't remember

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Christopher Barker
To reinforce what others have said a bit: It is absolutely OK to expect people to write code with a code editor. Period. And having more than the already two quote characters would be a mess. But this IS an issue, not with people writing code with tools meant for writing text, but by people copyin

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Dan Sommers
On Sun, 10 May 2020 00:34:43 - "Steve Jorgensen" wrote: > I believe the Python standard library should include a means of > sanitizing a filesystem entry, and this should not be something > requiring a 3rd party package. I'm not disagreeing. > What I am envisioning is a function (presumably

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Ned Batchelder
On 5/10/20 3:09 AM, Steve Barnes wrote: Change the error message “SyntaxError: invalid character in identifier” to include which character and its Unicode value so that it becomes “SyntaxError: invalid character 0x201c “ in identifier” – this is almost certainly the easiest change and fits w
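
To make the proposed message concrete, here is a small stand-alone sketch that reports each non-ASCII character in a piece of source text together with its code point and Unicode name. It is not how CPython builds its `SyntaxError` messages; the helper names `describe_char` and `find_suspicious` are invented for this illustration.

```python
import unicodedata
from typing import Iterator, Tuple


def describe_char(ch: str) -> str:
    """Return a description such as 'U+201C LEFT DOUBLE QUOTATION MARK'."""
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}"


def find_suspicious(source: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (line, column, description) for every non-ASCII character."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col_no, describe_char(ch)


for hit in find_suspicious("x = \u201chi\u201d"):
    print(hit)
```

Reporting the name as well as the glyph helps when the character is invisible or looks identical to an ASCII one. Note that this simple scan also flags legitimate non-ASCII text inside strings and comments.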

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Steve Jorgensen wrote: > Steve Jorgensen wrote: > > I believe the Python standard library should include > > a means of sanitizing a filesystem > > entry, and this should not be something requiring a 3rd party package. > > One of the reasons I think this should be in the standard lib is because that >

[Python-ideas] Re: Adding slice Iterator to Sequences (was: islice with actual slices)

2020-05-10 Thread Alex Hall
On Sun, May 10, 2020 at 5:00 AM Christopher Barker wrote: > On Sat, May 9, 2020 at 1:58 PM Alex Hall wrote: > >> I think this is a good idea. For sequences I'm not sure how big the >> benefit is - I get that it's more efficient, but I rarely care that much, >> because most lists are small. Why

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Steve Jorgensen wrote: > Steve Jorgensen wrote: > > Andrew Barnert wrote: > > On May 9, 2020, at 17:35, Steve Jorgensen > > ste...@stevej.name wrote: > > I believe the Python standard library should > > include > > a means of sanitizing a filesystem entry, and this should not be something > > req

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Steve Jorgensen wrote: > Andrew Barnert wrote: > > On May 9, 2020, at 17:35, Steve Jorgensen > > ste...@stevej.name wrote: > > I believe the Python standard library should > > include > > a means of sanitizing a filesystem entry, and this should not be something > > requiring a > > 3rd > > party

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Andrew Barnert wrote: > On May 9, 2020, at 17:35, Steve Jorgensen ste...@stevej.name wrote: > > I believe the Python standard library should include > > a means of sanitizing a filesystem entry, and this should not be something > > requiring a 3rd > > party package. > > One of the reasons I think thi

[Python-ideas] Re: Sanitize filename (path part)

2020-05-10 Thread Steve Jorgensen
Responding to points individually to avoid confusing multi-topic threads. :) Andrew Barnert wrote: < snip > > > When permissive is False, > > characters that are generally unsafe are rejected. When permissive is > > True, only path separator characters are rejected. Generally unsafe > > character
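
The two modes described in the quoted text could be sketched roughly as below. Only the `permissive` flag and its two behaviours come from the message; the function name `check_path_part`, the conservative allow-list (ASCII letters, digits, `.`, `_`, `-`), and the rejection of empty, `.` and `..` names are assumptions made here, because the snippet is cut off before the proposal lists its "generally unsafe" characters.

```python
import string

# Assumption: treat both "/" and "\" as separators regardless of platform.
_SEPARATORS = frozenset("/\\")
# Assumption: a deliberately conservative allow-list for the strict mode.
_CONSERVATIVE = frozenset(string.ascii_letters + string.digits + "._-")


def check_path_part(name: str, permissive: bool = False) -> str:
    """Return name unchanged if acceptable, otherwise raise ValueError."""
    if name in ("", ".", ".."):
        raise ValueError(f"not a usable single path part: {name!r}")
    if permissive:
        # Permissive mode: only path separators are rejected.
        bad = [ch for ch in name if ch in _SEPARATORS]
    else:
        # Strict mode: anything outside the allow-list is rejected.
        bad = [ch for ch in name if ch not in _CONSERVATIVE]
    if bad:
        raise ValueError(f"disallowed characters in {name!r}: {bad!r}")
    return name


print(check_path_part("report-2020.txt"))
print(check_path_part("my file.txt", permissive=True))
```

This sketch rejects rather than rewrites bad input; a variant that substitutes a replacement character would be an equally plausible reading of "sanitize".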

[Python-ideas] Re: Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steven D'Aprano
On Sun, May 10, 2020 at 07:09:15AM +, Steve Barnes wrote about Unicode dashes and quotes sneaking into code: > What can be done? > > 1. Persuade Microsoft, and others, to stop being so helpful by > default - good luck with that! No, I think that in the broader picture, they are doing

[Python-ideas] Improve handling of Unicode quotes and hyphens

2020-05-10 Thread Steve Barnes
Hi All, Apologies if this has already been discussed to death. Python 3 allows Unicode characters in strings and identifiers but the actual quotation marks are only accepted in plain ASCII, i.e. the following all successfully initialise strings: ``` S1 = "Double Quoted" # Opened and closed wit