Re: Popping key causes dict derived from object to revert to object

2024-03-25 Thread Jon Ribbens via Python-list
On 2024-03-25, Loris Bennett  wrote:
> "Michael F. Stemper"  writes:
>
>> On 25/03/2024 01.56, Loris Bennett wrote:
>>> Grant Edwards  writes:
>>> 
>>>> On 2024-03-22, Loris Bennett via Python-list wrote:
>>>>
>>>>> Yes, I was mistakenly thinking that popping the element would
>>>>> leave me with the dict minus the popped key-value pair.
>>>>
>>>> It does.
>>> Indeed, but I was thinking in the context of
>>>     dict_list = [d.pop('a') for d in dict_list]
>>> and incorrectly expecting to get a list of 'd' without key 'a',
>>> instead of a list of the d['a'].
>> I apologize if this has already been mentioned in this thread, but are
>> you aware of "d.keys()" and "d.values()"?
>>
>>  >>> d = {}
>>  >>> d['do'] = 'a deer, a female deer'
>>  >>> d['re'] = 'a drop of golden sunshine'
>>  >>> d['mi'] = 'a name I call myself'
>>  >>> d['fa'] = 'a long, long way to run'
>>  >>> d.keys()
>>  ['fa', 'mi', 'do', 're']
>>  >>> d.values()
>>  ['a long, long way to run', 'a name I call myself', 'a deer, a female 
>> deer', 'a drop of golden sunshine']
>>  >>>
>
> Yes, I am, thank you.  However, I didn't want either the keys or the
> values.  Instead I wanted to remove a key within a list comprehension.

Do you mean something like:

  [my_dict[key] for key in my_dict if key != 'a']

?
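
If the aim is instead a list of the dicts themselves with key 'a' removed, one possible sketch is a dict comprehension nested inside the list comprehension. The sample data and the name without_a are made up for illustration:

```python
# Hypothetical sample data, not taken from the thread.
dict_list = [{'a': 1, 'b': 2}, {'a': 3, 'c': 4}]

# Build new dicts that omit key 'a'; the original dicts are left untouched.
without_a = [{k: v for k, v in d.items() if k != 'a'} for d in dict_list]

print(without_a)
```

Unlike d.pop('a'), this does not mutate the dicts in the original list.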
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Await expressions (Posting On Python-List Prohibited)

2024-02-02 Thread Jon Ribbens via Python-list
On 2024-02-02, Lawrence D'Oliveiro  wrote:
> On 1 Feb 2024 10:09:10 GMT, Stefan Ram wrote:
>
>>   Heck, even of the respected members of this newsgroup, IIRC, no one
>>   mentioned "__await__".
>
> It’s part of the definition of an “awaitable”, if you had looked that up.
>
>

To be fair, I've been using Python for well over a quarter of a century,
and I never knew it had a glossary.


Re: How to enter multiple, similar, dictionaries?

2023-12-11 Thread Jon Ribbens via Python-list
On 2023-12-11, Chris Green  wrote:
> Chris Green  wrote:
>> Is there a way to abbreviate the following code somehow?
>> 
>> lv = {'dev':'bbb', 'input':'1', 'name':'Leisure volts'}
>> sv = {'dev':'bbb', 'input':'0', 'name':'Starter volts'}
>> la = {'dev':'bbb', 'input':'2', 'name':'Leisure Amps'}
>> sa = {'dev':'bbb', 'input':'3', 'name':'Starter Amps'}
>> bv = {'dev':'adc2', 'input':0, 'name':'BowProp Volts'}
>> 
>> It's effectively a 'table' with columns named 'dev', 'input' and
>> 'name' and I want to access the values of the table using the variable
>> name.
>> 
> Or, more sensibly, make the above into a list (or maybe dictionary)
> of dictionaries:-
>
> adccfg = [
> {'abbr':'lv', 'dev':'bbb', 'input':'1', 'name':'Leisure volts'},
> {'abbr':'sv', 'dev':'bbb', 'input':'0', 'name':'Starter volts'},
> {'abbr':'la', 'dev':'bbb', 'input':'2', 'name':'Leisure Amps'},
> {'abbr':'sa', 'dev':'bbb', 'input':'3', 'name':'Starter Amps'},
> {'abbr':'bv', 'dev':'adc2', 'input':0, 'name':'BowProp Volts'}
> ]
>
> This pickles nicely, I just want an easy way to enter the data!

adccfg = [
    dict(zip(('abbr', 'dev', 'input', 'name'), row))
    for row in (
        ('lv', 'bbb', '1', 'Leisure volts'),
        ('sv', 'bbb', '0', 'Starter volts'),
        ('la', 'bbb', '2', 'Leisure Amps'),
        ('sa', 'bbb', '3', 'Starter Amps'),
        ('bv', 'adc2', 0, 'BowProp Volts'),
    )
]
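
Since the original question was about looking rows up by their short name, the same zip() idea can probably be keyed on the abbreviation instead of building a list. This is only a sketch; COLUMNS, ROWS and by_abbr are names invented here:

```python
# Column names and rows as in the post above.
COLUMNS = ('abbr', 'dev', 'input', 'name')
ROWS = (
    ('lv', 'bbb', '1', 'Leisure volts'),
    ('sv', 'bbb', '0', 'Starter volts'),
    ('la', 'bbb', '2', 'Leisure Amps'),
    ('sa', 'bbb', '3', 'Starter Amps'),
    ('bv', 'adc2', 0, 'BowProp Volts'),
)

# Map each abbreviation to its row dict, so rows can be looked up by name.
by_abbr = {row[0]: dict(zip(COLUMNS, row)) for row in ROWS}

print(by_abbr['lv']['name'])
```

A dict of dicts pickles just as readily as a list of dicts.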


Re: Python Golf

2023-11-07 Thread Jon Ribbens via Python-list
On 2023-11-07,   wrote:
> Discussions like this feel a bit silly after a while. How long
> something is to type on a command line is not a major issue and
> brevity can lead to being hard to remember too especially using
> obscure references.

Of course it's silly, that's why it's called "golf"!

It would be basically insane to use open(0) instead of sys.stdin
like this except where the length of the source code overrides
all other considerations - which is essentially never, unless
playing code golf...
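
For contrast, a non-golfed version of the same computation might look something like the sketch below. The function name and the sample lines are invented for illustration; in a real script the argument would be sys.stdin (or, when golfing, open(0)):

```python
def sum_second_column(lines):
    """Sum the integer in the second whitespace-separated field of each line."""
    return sum(int(line.split()[1]) for line in lines)

# Made-up sample input standing in for sys.stdin.
sample = ["x 10 foo", "y 20 bar", "z 12"]
print(sum_second_column(sample))
```

Any iterable of lines works here, which is also what makes the open(0) trick possible: a file object iterates over its lines.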


Re: Python Golf

2023-11-07 Thread Jon Ribbens via Python-list
On 2023-11-07, Stefan Ram  wrote:
>   I read this in a shell newsgroup:
>
> perl -anE '$s += $F[1]; END {say $s}' in
>
>   , so I wrote
>
> py -c "import sys; print(sum(int(F.split()[1])for F in sys.stdin))" 
>   to show that this is possible with Python too. 
>
>   But now people complain that it's longer than the Perl version.
>
>   Do you see ways to make it shorter (beyond removing one space
>   after the semicolon ";")?

It's a bit of an unfair competition given that, unlike Perl,
Python is not designed to be an 'awk' replacement.

Having said that, you could make it a bit shorter:

py -c "print(sum(int(F.split()[1])for F in open(0)))"


Re: Checking if email is valid

2023-11-06 Thread Jon Ribbens via Python-list
On 2023-11-06, Mats Wichmann  wrote:
> On 11/6/23 01:57, Simon Connah via Python-list wrote:
>> The thing I truly hate is when you have two telephone number fields.
>> One for landline and one for mobile. I mean who in hell has a
>> landline these days? 
>
> People who live in places with spotty, or no, mobile coverage. We do
> exist.

Catering for people in minority situations is, of course, important.

Catering for people in the majority situation is probably important too.


Re: Checking if email is valid

2023-11-06 Thread Jon Ribbens via Python-list
On 2023-11-06, D'Arcy Cain  wrote:
> On 2023-11-05 06:48, Jon Ribbens via Python-list wrote:
>> Sometimes I think that these sorts of stupid, wrong, validation are the
>> fault of idiot managers. When it's apostrophes though I'm suspicious
>> that it may be idiot programmers who don't know how to prevent SQL
>> injection attacks without just saying "ban all apostrophes everywhere".
>> Or perhaps it's idiot "security consultancies" who make it a tick-box
>> requirement.
>
> https://xkcd.com/327/

Indeed. My point is that the correct way to solve this problem is not
to declare vast swathes of valid inputs verboten, but to *not execute
user input as code*. Controversial, I know.
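
As a rough illustration of what "not executing user input as code" looks like in practice, here is a minimal sketch using the standard sqlite3 module with a placeholder. The table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")  # hypothetical table

name = "D'Arcy"  # the apostrophe is just ordinary data

# The ? placeholder passes the value separately from the SQL text,
# so it is never parsed as part of the statement.
conn.execute("INSERT INTO people (name) VALUES (?)", (name,))

rows = conn.execute(
    "SELECT name FROM people WHERE name = ?", (name,)
).fetchall()
print(rows)
```

No apostrophes needed banning, and no string was pasted into the SQL.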

>>> OK, now that I am started, what else?  Oh yah.  Look at your credit
>>> card.  The number has spaces in it.  Why do I have to remove them.  If
>>> you don't like them then you are a computer, just remove them.
>> 
>> Yes, this is also very stupid and annoying. Does nobody who works for
>> the companies making these sorts of websites ever use their own, or
>> indeed anyone else's, website?
>
> Gotta wonder for sure.  It could also be the case of programmers 
> depending on user input but the users insist on living with the bugs 
> and/or working around them.  We made crash reporting dead simple to 
> report on and still users didn't bother.  We would get the traceback and 
> have to guess what the user was doing.

That was another thing that I used to find ridiculous, but seems to have
improved somewhat in recent years - website error pages that said "please
contact us to let us know about this error". I'm sorry, what? You want
me to contact you to tell you about what your own website is doing? How
does that make any sense? Websites should be self-reporting problems.

(Not least because, as you say, people are absolutely terrible at
reporting problems, with almost all bug reports reading effectively as
"I was doing something that I'm not going to tell you and I was expecting
something to happen which I'm not going to tell you, but instead
something else happened, which I'm also not going to tell you".)


Re: pip/pip3 confusion and keeping up to date

2023-11-05 Thread Jon Ribbens via Python-list
On 2023-11-05, Karsten Hilbert  wrote:
> On Fri, Nov 03, 2023 at 01:53:32PM - Jon Ribbens via Python-list wrote:
>
>> >> Are they not available in your system's package manager?
>> >
>> > ... this clearly often answers to "no" for applications of
>> > any complexity.
>> >
>> > Is there a suggested proper path to deal with that (Debian is
>> > of interest to me here) ?
>>
>> Yes, as previously mentioned, use virtual environments.
>>
>> These days they don't even need to be "activated". For package 'foo'
>> for example you could create /usr/local/lib/foo, under which you would
>> create a virtual environment and install the 'foo' package inside it,
>> and then you could do:
>>
>> ln -s /usr/local/lib/foo/env/bin/foo /usr/local/bin/foo
>>
>> and then you could just type 'foo' to run it.
>
> This all being nice and well, but:
>
> How does one "fill" that venv with packages from pip during
>
>   apt-get install python3-app-of-interest
>
> ?
>
> Is the suggested way really to pip-install into this venv
> during apt-get install ?

I don't know what you mean by that. But if you install the apt packages
and then create your venv with --system-site-packages then I believe
your venv should be able to see the apt packages and import them.
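
For what it's worth, the same option is exposed by the standard venv module, so a sketch of creating such an environment from Python might look like this. The helper name make_env and the temporary path are only for illustration:

```python
import tempfile
import venv
from pathlib import Path

def make_env(env_dir):
    """Create a venv that can also see the system's site-packages."""
    # with_pip=False keeps this sketch quick; pass True to bootstrap pip too.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    # pyvenv.cfg records whether system site-packages are included.
    return (Path(env_dir) / "pyvenv.cfg").read_text()

with tempfile.TemporaryDirectory() as tmp:
    cfg = make_env(Path(tmp) / "env")
    print(cfg)
```

This should be equivalent to "python3 -m venv --system-site-packages", apart from leaving pip out.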


Re: Checking if email is valid

2023-11-05 Thread Jon Ribbens via Python-list
On 2023-11-05, Grant Edwards  wrote:
> On 2023-11-05, D'Arcy Cain via Python-list  wrote:
>> On 2023-11-05 00:39, Grant Edwards via Python-list wrote:
>>> Definitely. Syntactic e-mail address "validation" is one of the most
>>> useless and widely broken things on the Interwebs.  People who do
>>> anything other than require an '@' (and optionally make you enter the
>>> same @-containing string twice) are deluding themselves.
>>
>> And don't get me started on phone number validation.
>
> I can see how the truly dim-witted might forget that other countries
> have phone numbers with differing lengths and formatting/punctuation,
> but there are tons of sites where it takes multiple tries when
> entering even a bog-standard USA 10-digit phone number because they
> are completely flummoxed by an area code in parens or hyphens in the
> usual places (or lack of hyphens in the usual places). This stuff
> isn't that hard, people...

Indeed - you just do "pip install phonenumbers" :-)


Re: pip/pip3 confusion and keeping up to date

2023-11-05 Thread Jon Ribbens via Python-list
On 2023-11-05, Chris Green  wrote:
> Jon Ribbens  wrote:
>> On 2023-11-03, Karsten Hilbert  wrote:
>> > On Thu, Nov 02, 2023 at 04:07:33PM -0600 Mats Wichmann via
>> > Python-list wrote:
>> >> >So they now have only python3 and there is no python executable in
>> >> >PATH.
>> >>
>> >> FWIW, for this you install the little stub package python-is-python3.
>> >> Especially if you want to keep a python2 installation around -
>> >> "python" will still be python3 in this case.
>> >
>> > Since you seem knowledgeable in this area: Do you know of a
>> > resource for learning the *canonical* way of packaging a
>> > Python application for installation via apt which
>> >
>> > - needs some packages available via apt
>> > - needs some packages only available via pip
>> > - needs some packages newer than what is available via apt
>> >
>> > ?
>> 
>> I suspect the answer to that is that you would have to:
>> 
>>   * create packages yourself for the unpackaged dependencies
>>   * create a dependency graph of *every* Python package in the package
>> repository (whether or not the package is relevant to what you're doing)
>>   * work out what versions of every Python package are required in order
>> to have a dependency graph that can be successfully resolved, taking
>> into account the requirements of your new package also
>>   * contact every single maintainer of every single one of the packages
>> that needs updating and persuade them to update their packages and
>> reassure them that you are getting all the other package maintainers
>> to update their packages accordingly and that you have a plan and
>> that you know what you're doing
>> 
>>   ... screen fades to black, title card "3 years later", fade in to ...
>> 
>>   * publish your package
>> 
> Surely it's not that bad, the vast bulk of Debian, Ubuntu and other
> distributions are installed via systems that sort out dependencies once
> given a particular package's requirements.  Python is surely not
> unique in its dependency requirements.

I think there's a lot of work that goes on behind the scenes to keep the
entire package set consistent so that you don't end up in the situation
where, e.g. package A depends on D<2.0 and package B requires D>=2.0 and
therefore you can't install A and B at the same time (and trying to
avoid as much as possible the hacky situation where you have two
packages for D, one for <2.0 and another called D2 for >=2.0).


Re: pip/pip3 confusion and keeping up to date

2023-11-05 Thread Jon Ribbens via Python-list
On 2023-11-03, Karsten Hilbert  wrote:
> On Thu, Nov 02, 2023 at 04:07:33PM -0600 Mats Wichmann via
> Python-list wrote:
>> >So they now have only python3 and there is no python executable in
>> >PATH.
>>
>> FWIW, for this you install the little stub package python-is-python3.
>> Especially if you want to keep a python2 installation around -
>> "python" will still be python3 in this case.
>
> Since you seem knowledgeable in this area: Do you know of a
> resource for learning the *canonical* way of packaging a
> Python application for installation via apt which
>
> - needs some packages available via apt
> - needs some packages only available via pip
> - needs some packages newer than what is available via apt
>
> ?

I suspect the answer to that is that you would have to:

  * create packages yourself for the unpackaged dependencies
  * create a dependency graph of *every* Python package in the package
repository (whether or not the package is relevant to what you're doing)
  * work out what versions of every Python package are required in order
to have a dependency graph that can be successfully resolved, taking
into account the requirements of your new package also
  * contact every single maintainer of every single one of the packages
that needs updating and persuade them to update their packages and
reassure them that you are getting all the other package maintainers
to update their packages accordingly and that you have a plan and
that you know what you're doing

  ... screen fades to black, title card "3 years later", fade in to ...

  * publish your package



Re: Checking if email is valid

2023-11-05 Thread Jon Ribbens via Python-list
On 2023-11-05, D'Arcy Cain  wrote:
> On 2023-11-05 00:39, Grant Edwards via Python-list wrote:
>> Definitely. Syntactic e-mail address "validation" is one of the most
>> useless and widely broken things on the Interwebs.  People who do
>> anything other than require an '@' (and optionally make you enter the
>> same @-containing string twice) are deluding themselves.
>
> And don't get me started on phone number validation.  The most annoying 
> thing to me, though, is sites that reject names that have an apostrophe 
> in them.  I hate being told that my name, that I have been using for 
> over seventy years, is invalid.

Sometimes I think that these sorts of stupid, wrong, validation are the
fault of idiot managers. When it's apostrophes though I'm suspicious
that it may be idiot programmers who don't know how to prevent SQL
injection attacks without just saying "ban all apostrophes everywhere".
Or perhaps it's idiot "security consultancies" who make it a tick-box
requirement.

> OK, now that I am started, what else?  Oh yah.  Look at your credit 
> card.  The number has spaces in it.  Why do I have to remove them.  If 
> you don't like them then you are a computer, just remove them.

Yes, this is also very stupid and annoying. Does nobody who works for
the companies making these sorts of websites ever use their own, or
indeed anyone else's, website?

Another one that's become popular recently is the sort of annoying
website that insists on "email 2FA", i.e. you try to login and then
they send you an email with a 6-digit code in that you have to enter
to authenticate yourself. So you go to your mail client and double-click
on the number to select it, and stupid thing (A) happens: for no sane
reason, the computer selects the digits *and also an invisible space
after them*. Then you copy the digits into the web form, the invisible
space tags along for the ride, and stupid thing (B) happens: the web
server rejects the code because of the trailing space.

Honestly I don't understand why every web application platform doesn't
automatically strip all leading and trailing whitespace on user input
by default. It's surely incredibly rare that it's sensible to preserve
it. (I see Django eventually got around to this in version 1.9.)
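
A sketch of how little is needed on the server side, assuming a 6-digit numeric code as in the example above (the function name is invented):

```python
import re

def normalise_code(raw):
    """Return the 6-digit code from user input, or None if it is not one."""
    # Drop leading/trailing whitespace, such as a space pasted along
    # with the digits, before doing any validation.
    cleaned = raw.strip()
    return cleaned if re.fullmatch(r"[0-9]{6}", cleaned) else None

print(normalise_code("123456 "))
```

The stray space is removed before validation, so the pasted code is accepted rather than rejected.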


Re: pip/pip3 confusion and keeping up to date

2023-11-04 Thread Jon Ribbens via Python-list
On 2023-11-03, Karsten Hilbert  wrote:
> On Thu, Nov 02, 2023 at 09:35:43PM - Jon Ribbens via Python-list wrote:
>
> Regardless of ...
>
>> Because pip barely plays well by itself, let alone with other package
>> managers at the same time.
>
> ... being true ...
>
>> > I do only install a few things using pip.
>>
>> Are they not available in your system's package manager?
>
> ... this clearly often answers to "no" for applications of
> any complexity.
>
> Is there a suggested proper path to deal with that (Debian is
> of interest to me here) ?

Yes, as previously mentioned, use virtual environments.

These days they don't even need to be "activated". For package 'foo'
for example you could create /usr/local/lib/foo, under which you would
create a virtual environment and install the 'foo' package inside it,
and then you could do:

ln -s /usr/local/lib/foo/env/bin/foo /usr/local/bin/foo

and then you could just type 'foo' to run it.


Re: Checking if email is valid

2023-11-03 Thread Jon Ribbens via Python-list
On 2023-11-03, Chris Angelico  wrote:
> On Fri, 3 Nov 2023 at 12:21, AVI GROSS via Python-list
> wrote:
>> My guess is that a first test of an email address might be to see if
>> a decent module of that kind fills out the object to your
>> satisfaction. You can then perhaps test parts of the object, rather
>> than everything at once, to see if it is obviously invalid. As an
>> example, what does u...@alpha...com with what seems to be lots of
>> meaningless periods, get parsed into?
>
> What do you mean by "obviously invalid"? Have you read the RFC?

What do you mean by 'What do you mean by "obviously invalid"?'
Have you read the RFC?


Re: pip/pip3 confusion and keeping up to date

2023-11-02 Thread Jon Ribbens via Python-list
On 2023-11-02, Chris Green  wrote:
> Jon Ribbens  wrote:
>> On 2023-11-02, Dieter Maurer  wrote:
>> > Chris Green wrote at 2023-11-2 10:58 +:
>> >> ...
>> >>So, going on from this, how do I do the equivalent of "apt update; apt
>> >>upgrade" for my globally installed pip packages?
>> >
>> > `pip list -o` will tell you for which packages there are upgrades
>> > available.
>> > `pip install -U ...` will upgrade packages.
>> >
>> > Be careful, though.
>> > With `apt`, you usually have (`apt`) sources representing a consistent
>> > package universe. Someone tests that package upgrades in this
>> > universe do not break other packages (in this universe).
>> > Because of this, upgrading poses low risk.
>> >
>> > `PyPI` does not guarantee consistency. A new package version
>> > may be incompatible with a previous one -- and with other
>> > packages you have installed.
>> >
>> > I do not think that you would want to auto-upgrade all installed
>> > packages.
>> 
>> Indeed. What you're describing is a very unfortunate failing of pip.
>> 'Upgrade' doesn't even follow requirements when you tell it what to
>> upgrade - e.g. if you do "pip install foo" and foo requires "bar<2"
>> so you end up with:
>> 
>>    Package Version
>>    ------- -------
>>    foo     1.0.0
>>    bar     1.2.0
>> 
>> and then a new version 1.3.0 of bar comes out and you do
>> "pip install -U foo", pip will not upgrade bar even though it could
>> and should, because foo is already at the latest version so pip won't
>> even look at its dependencies.
>> 
>> Indeed there is no way of knowing that you should upgrade bar without
>> manually following all the dependency graphs. ("pip list -o" will tell
>> you there's a newer version, but that isn't the same - e.g. if the new
>> version of bar was 2.0.0 then "pip list -o" will list it, but you should
>> not upgrade to it.)
>> 
>> You can do "pip install -I foo", which will pointlessly reinstall foo
>> and then presumably upgrade bar as well, thus probably getting to the
>> right result via a rather roundabout route, but I'm not sure if that
>> does indeed work properly and if it is a reliable and recommended way
>> of doing things.
>
> It is a bit of a minefield isn't it.  I try to minimise my use of
> packages installed using pip for this very reason.  Maybe the safest
> route would simply be to uninstall everything and then re-install it.

That is literally what I do quite often - completely erase the
virtual env and then re-create it from scratch - because it seems
to be the only / easiest way to upgrade the packages to the latest
versions consistent with given dependencies.
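
A rough sketch of that erase-and-recreate routine using the standard venv module follows. The function name, its parameters and the idea of a requirements file are assumptions made for illustration, not a recommended tool:

```python
import subprocess
import sys
import venv
from pathlib import Path

def recreate_env(env_dir, requirements=None, with_pip=True):
    """Wipe env_dir, build a fresh venv there, optionally install requirements."""
    env_dir = Path(env_dir)
    # clear=True deletes the existing contents of env_dir first.
    venv.create(env_dir, clear=True, with_pip=with_pip)
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    if requirements is not None:
        # Install into the new env by running its own interpreter.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Because the environment starts out empty each time, pip resolves everything afresh against the stated requirements.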


Re: pip/pip3 confusion and keeping up to date

2023-11-02 Thread Jon Ribbens via Python-list
On 2023-11-02, Chris Green  wrote:
> Jon Ribbens  wrote:
>> On 2023-11-02, Chris Green  wrote:
>> > I have a couple of systems which used to have python2 as well as
>> > python3 but as Ubuntu and Debian versions have moved on they have
>> > finally eliminated all dependencies on python2.
>> >
>> > So they now have only python3 and there is no python executable in
>> > PATH. 
>> >
>> > There's still both /usr/bin/pip and /usr/bin/pip3 but they're
>> > identical so presumably I can now simply use pip and it will be a
>> > python3 pip.
>> >
>> >
>> > So, going on from this, how do I do the equivalent of "apt update; apt
>> > upgrade" for my globally installed pip packages?
>> 
>> I'm not sure what that question has to do with everything that preceded
>> it, but you don't want to install python packages globally using pip.
>> Either install them with 'apt', or install them in a virtual environment.
>
> Why in a virtual environment?  When I install a package whether from
> apt or from pip I want everyone/everything on my system to be able to
> use it.

Because pip barely plays well by itself, let alone with other package
managers at the same time.

> I do only install a few things using pip.

Are they not available in your system's package manager?
I guess you might get away with "sudo -H pip install -U foo"
for a couple of things, if they don't have many dependencies.


Re: pip/pip3 confusion and keeping up to date

2023-11-02 Thread Jon Ribbens via Python-list
On 2023-11-02, Chris Green  wrote:
> I have a couple of systems which used to have python2 as well as
> python3 but as Ubuntu and Debian versions have moved on they have
> finally eliminated all dependencies on python2.
>
> So they now have only python3 and there is no python executable in
> PATH. 
>
> There's still both /usr/bin/pip and /usr/bin/pip3 but they're
> identical so presumably I can now simply use pip and it will be a
> python3 pip.
>
>
> So, going on from this, how do I do the equivalent of "apt update; apt
> upgrade" for my globally installed pip packages?

I'm not sure what that question has to do with everything that preceded
it, but you don't want to install python packages globally using pip.
Either install them with 'apt', or install them in a virtual environment.


Re: pip/pip3 confusion and keeping up to date

2023-11-02 Thread Jon Ribbens via Python-list
On 2023-11-02, Dieter Maurer  wrote:
> Chris Green wrote at 2023-11-2 10:58 +:
>> ...
>>So, going on from this, how do I do the equivalent of "apt update; apt
>>upgrade" for my globally installed pip packages?
>
> `pip list -o` will tell you for which packages there are upgrades
> available.
> `pip install -U ...` will upgrade packages.
>
> Be careful, though.
> With `apt`, you usually have (`apt`) sources representing a consistent
> package universe. Someone tests that package upgrades in this
> universe do not break other packages (in this universe).
> Because of this, upgrading poses low risk.
>
> `PyPI` does not guarantee consistency. A new package version
> may be incompatible with a previous one -- and with other
> packages you have installed.
>
> I do not think that you would want to auto-upgrade all installed
> packages.

Indeed. What you're describing is a very unfortunate failing of pip.
'Upgrade' doesn't even follow requirements when you tell it what to
upgrade - e.g. if you do "pip install foo" and foo requires "bar<2"
so you end up with:

   Package Version
   ------- -------
   foo     1.0.0
   bar     1.2.0

and then a new version 1.3.0 of bar comes out and you do
"pip install -U foo", pip will not upgrade bar even though it could
and should, because foo is already at the latest version so pip won't
even look at its dependencies.

Indeed there is no way of knowing that you should upgrade bar without
manually following all the dependency graphs. ("pip list -o" will tell
you there's a newer version, but that isn't the same - e.g. if the new
version of bar was 2.0.0 then "pip list -o" will list it, but you should
not upgrade to it.)

You can do "pip install -I foo", which will pointlessly reinstall foo
and then presumably upgrade bar as well, thus probably getting to the
right result via a rather roundabout route, but I'm not sure if that
does indeed work properly and if it is a reliable and recommended way
of doing things.


Re: Checking if email is valid

2023-11-02 Thread Jon Ribbens via Python-list
On 2023-11-02, Simon Connah  wrote:
> Valid as in conforms to the standard. Although having looked at the
> standard that might be more difficult than originally planned.

Yes. Almost nobody actually implements "the standard" as in RFC 2822
section 3.4.1 (which can contain, for example, non-printable control
characters, and comments), nor is it particularly clear that they
should. So while checking against "the spec" might sound right, it's
highly unlikely that it's what you actually want. Would you really
want to allow:

(jam today) "chris @ \"home\""@ (Chris's host.)public.example

for example? And would you be able to do anything with it if you did?


Re: Checking if email is valid

2023-11-02 Thread Jon Ribbens via Python-list
On 2023-11-02, D'Arcy Cain  wrote:
> On 2023-11-01 17:17, Chris Angelico via Python-list wrote:
>> On Thu, 2 Nov 2023 at 08:09, Grant Edwards via Python-list
>>  wrote:
>>> Make sure it has an '@' in it.  Possibly require at least one '.'
>>> after the '@'.
>> 
>> No guarantee that there'll be a dot after the at. (Technically there's
>> no guarantee of an at sign either, but email addresses without at
>> signs are local-only, so in many contexts, you can assume there needs
>> to be an at.)
>
> druid!darcy - doesn't work any more but not because it is syntactically 
> incorrect.
>
> Remember the good old days when we were able to test if an address 
> existed without sending?  That was before the black hats discovered the 
> Internet.

I remember the good old days when we were able to send email.


Re: Checking if email is valid

2023-11-01 Thread Jon Ribbens via Python-list
On 2023-11-01, Chris Angelico  wrote:
> On Thu, 2 Nov 2023 at 05:21, Simon Connah via Python-list
> wrote:
>> Could someone push me in the right direction please? I just want to
>> find out if a string is a valid email address.
>
> There is only one way to know that a string is a valid email address,
> and that's to send an email to it.
>
> What is your goal though? For example, if you're trying to autolink
> email addresses in text, you don't really care whether it's valid,
> only that it looks like an address.

There's often value in even only partially-effective checks though.
With an email address you can easily check to see if it has an "@",
and if the stuff after the "@" is a syntactically valid domain name.
You can also go a bit further and check to see if the domain has an
MX record, and if it doesn't then it is extremely unlikely that the
address is valid.
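
A minimal sketch of the first two checks, using only the standard library. The function name is invented, it only handles ASCII domain names, and it deliberately leaves out the MX lookup, which would need a DNS resolver library:

```python
import re

# One DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen.
_LABEL = re.compile(r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def plausible_address(address):
    """Partial check: has an '@' and a syntactically valid domain after it."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if len(domain) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in domain.split("."))

print(plausible_address("user@example.com"))
```

Passing this check says nothing about whether the mailbox exists; it only weeds out strings that cannot possibly be deliverable.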


Re: path to python in venv

2023-09-27 Thread Jon Ribbens via Python-list
On 2023-09-27, Larry Martell  wrote:
> On Wed, Sep 27, 2023 at 12:42 PM Jon Ribbens via Python-list
> wrote:
>> On 2023-09-27, Larry Martell  wrote:
>> > I was under the impression that in a venv the python used would be in
>> > the venv's bin dir. But in my venvs I see this in the bin dirs:
>> >
>> > lrwxrwxrwx 1 larrymartell larrymartell  7 Sep 27 11:21 python -> python3
>> > lrwxrwxrwx 1 larrymartell larrymartell 16 Sep 27 11:21 python3 -> /usr/bin/python3
>> ...
>> > Not sure what this really means, nor how to get python to be in my venv.
>>
>> Why do you want python to be "in your venv"?
>
> Isn't that the entire point of a venv? To have a completely self
> contained env? So if someone messes with the system python it will not
> break code running in the venv.

The main point of the venv is to isolate the installed packages,
rather than Python itself. I'm a bit surprised your symlinks are
as shown above though - mine link from python to python3.11 to
/usr/bin/python3.11, so it wouldn't change the version of python
used even if I installed a different system python version.
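
One way to see what a given interpreter thinks is going on is to compare sys.prefix with sys.base_prefix and follow the symlink. A small sketch (the helper name is made up):

```python
import os
import sys

def in_venv():
    """True when running inside a virtual environment."""
    # In a venv, sys.prefix points at the venv and sys.base_prefix
    # at the Python installation it was created from.
    return sys.prefix != sys.base_prefix

print("in a venv:     ", in_venv())
print("sys.executable:", sys.executable)
print("resolves to:   ", os.path.realpath(sys.executable))
```

Run from inside the venv, the last two lines show the venv's bin/python path and the system interpreter it links to.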


Re: path to python in venv

2023-09-27 Thread Jon Ribbens via Python-list
On 2023-09-27, Larry Martell  wrote:
> I was under the impression that in a venv the python used would be in
> the venv's bin dir. But in my venvs I see this in the bin dirs:
>
> lrwxrwxrwx 1 larrymartell larrymartell  7 Sep 27 11:21 python -> python3
> lrwxrwxrwx 1 larrymartell larrymartell 16 Sep 27 11:21 python3 -> /usr/bin/python3
...
> Not sure what this really means, nor how to get python to be in my venv.

Why do you want python to be "in your venv"?


Re: isinstance()

2023-08-04 Thread Jon Ribbens via Python-list
On 2023-08-02, dn  wrote:
> Can you please explain why a multi-part second-argument must be a tuple 
> and not any other form of collection-type?

The following comment may hold a clue:

if (PyTuple_Check(cls)) {
/* Not a general sequence -- that opens up the road to
   recursion and stack overflow. */

https://github.com/python/cpython/blob/main/Objects/abstract.c#L2684

Plus an almost total lack of demand for change I should think.
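
In practice the restriction is easy to live with, as this short sketch suggests (the variable names are made up):

```python
value = 3

# A tuple of types works, and tuples may be nested.
print(isinstance(value, (int, str)))
print(isinstance(value, ((bytes, float), (int, str))))

# A list of types is rejected with TypeError...
kinds = [int, str]
list_rejected = False
try:
    isinstance(value, kinds)
except TypeError as exc:
    list_rejected = True
    print("list rejected:", exc)

# ...but converting it to a tuple is a one-liner.
print(isinstance(value, tuple(kinds)))
```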


Re: Bug in io.TextIOWrapper?

2023-06-19 Thread Jon Ribbens via Python-list
On 2023-06-19, Inada Naoki  wrote:
> I checked TextIOWrapper source code and confirmed that it doesn't call
> encoder.encode(text, final=True) on close.
> Since TextIOWrapper allows random access, it is difficult to call it
> automatically. So please think of it as just a limitation rather than a bug.
> Please use codec and binary file manually for now.

It could call it on seek() or flush(). It seems like a definite bug to
me, in that its behaviour appears clearly incorrect - it's just that
there isn't an entirely obvious "100% correct" behaviour to choose.


Bug in io.TextIOWrapper?

2023-06-19 Thread Jon Ribbens via Python-list
io.TextIOWrapper() wraps a binary stream so you can write text to it.
It takes an 'encoding' parameter, which it uses to look up the codec
in the codecs registry, and then it uses the IncrementalEncoder and
IncrementalDecoder classes for the appropriate codec.

The IncrementalEncoder.encode() function is given the object to encode
of course, and also an optional second parameter which indicates if
this is the final output.

The bug is that TextIOWrapper() never sets the second parameter to
indicate that the output is complete - not even if you call close().

Example:

>>> import io
>>> buffer = io.BytesIO()
>>> stream = io.TextIOWrapper(buffer, encoding='idna')
>>> stream.write('abc.example.com')
15
>>> stream.flush()
>>> buffer.getvalue()
b'abc.example.'

Obviously using the 'idna' wrapper as an encoding on a stream is a bit
unlikely, but nevertheless any other codec which cares about the 'final'
parameter will also have this problem.
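
As a sketch of the manual workaround (using the codec and a binary
stream directly, rather than TextIOWrapper), something like the
following shows what the 'final' parameter does. It assumes the
standard library 'idna' codec behaves as in the example above.

```python
import codecs
import io

buffer = io.BytesIO()
encoder = codecs.getincrementalencoder('idna')()

# Without final=True the encoder holds back the last label, because it
# cannot know whether more characters of that label are still to come.
buffer.write(encoder.encode('abc.example.com'))
partial = buffer.getvalue()

# An empty write with final=True flushes whatever is still buffered.
buffer.write(encoder.encode('', final=True))
complete = buffer.getvalue()

print(partial, complete)
```

The second encode() call is the step that TextIOWrapper never makes.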
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: What to use instead of nntplib?

2023-05-22 Thread Jon Ribbens via Python-list
On 2023-05-22, Skip Montanaro  wrote:
>> My understanding is that nntplib isn't being erased from reality,
>> it's merely being removed from the set of modules that are provided
>> by default.
>>
>> I presume that once it's removed from the core, it will still be
>> possible to install it via pip or some other mechanism.
>
> It won't magically be available via pip unless someone steps up to maintain
> it as a PyPI package

That would appear to have already happened over a month ago.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to get get_body() to work? (about email)

2023-03-20 Thread Jon Ribbens via Python-list
On 2023-03-19, Greg Ewing  wrote:
> On 20/03/23 7:07 am, Jon Ribbens wrote:
>> Ah, apparently it got removed in Python 3, which is a bit odd as the
>> last I heard it was added in Python 2.2 in order to achieve consistency
>> with other types.
>
> As far as I remember, the file type came into existence
> with type/class unification, and "open" became an alias
> for the file type, so you could use open() and file()
> interchangeably.
>
> With the Unicode revolution in Python 3, file handling got
> a lot more complicated. Rather than a single file type,
> there are now a bunch of classes that handle low-level I/O,
> encoding/decoding, etc, and open() is a function again
> that builds the appropriate combination of underlying
> objects.

This is true, however there does exist a base class which, according to
the documentation, underlies all of the different IO classes - IOBase -
so it might have been neater to make 'file' be an alias for that.
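
To illustrate the point about IOBase, here is a minimal sketch that
opens the same temporary file in three modes and checks what open()
hands back. The file name demo.txt is arbitrary.

```python
import io
import os
import tempfile

kinds = {}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')
    with open(path, 'w') as handle:               # text mode
        handle.write('hello\n')
        kinds['text'] = type(handle)
    with open(path, 'rb') as handle:              # buffered binary mode
        kinds['buffered'] = type(handle)
    with open(path, 'rb', buffering=0) as handle: # raw binary mode
        kinds['raw'] = type(handle)

# Three different classes, one common base class.
for mode, cls in kinds.items():
    print(mode, cls.__name__, issubclass(cls, io.IOBase))
```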
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to get get_body() to work? (about email)

2023-03-20 Thread Jon Ribbens via Python-list
On 2023-03-19, Stefan Ram  wrote:
> Jon Ribbens  writes:
>>(Also, I too find it annoying to have to avoid, but calling a local
>>variable 'file' is somewhat suspect since it shadows the builtin.)
>
>   Thanks for your remarks, but I'm not aware
>   of such a predefined name "file"!

Ah, apparently it got removed in Python 3, which is a bit odd as the
last I heard it was added in Python 2.2 in order to achieve consistency
with other types.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to get get_body() to work? (about email)

2023-03-20 Thread Jon Ribbens via Python-list
On 2023-03-19, Stefan Ram  wrote:
> Peng Yu  writes:
>>But when I try the following code, get_body() is not found. How to get
>>get_body() to work?
>
>   Did you know that this post of mine here was posted to
>   Usenet with a Python script I wrote?
>
>   That Python script has a function to show the body of
>   a post before posting. The post is contained in a file,
>   so it reads the post from that file.
>
>   I copy it here, maybe it can help some people to see
>   how I do this.
>
> # Python 3.5
>
> import email
>
>   ...
>
> def showbody( file ): # lightly edited for posting on 2023-03-19
> output = ''
> msg = email.message_from_binary_file\
> ( file, policy=email.policy.default )

I wouldn't generally be pedantic about code style, but that's giving me
painful convulsions. Backslashes for line continuations are generally
considered a bad idea (as they mean that any whitespace after the
backslash, which is often invisible, becomes significant). And not
indenting the continuation line(s) is pretty shocking. Writing it as
below is objectively better:

msg = email.message_from_binary_file(
    file, policy=email.policy.default )

(Also, I too find it annoying to have to avoid, but calling a local
variable 'file' is somewhat suspect since it shadows the builtin.)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: =- and -= snag

2023-03-14 Thread Jon Ribbens via Python-list
On 2023-03-13, Morten W. Petersen  wrote:
> I was working in Python today, and sat there scratching my head as the
> numbers for calculations didn't add up.  It went into negative numbers,
> when that shouldn't have been possible.
>
> Turns out I had a very small typo, I had =- instead of -=.
>
> Isn't it unpythonic to be able to make a mistake like that?

Why would it be? How could it be? Mandating white-space between
operators would be unpythonic.

That's nothing anyway - yesterday I had an issue in TypeScript which
confused me for a while which turned out to be because 1 + 1 = 11.
(I thought the whole point of TypeScript was to prevent things like
that...)
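
For anyone who hasn't been bitten by this yet, a tiny sketch of the
two spellings side by side (the variable names are made up):

```python
balance = 10
balance -= 3      # augmented assignment: subtract 3, leaving 7
after_subtract = balance

balance =- 3      # parsed as "balance = -3": a plain assignment
after_typo = balance

print(after_subtract, after_typo)
```

Both lines are valid Python, which is why no error is raised.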
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 2.7 range Function provokes a Memory Error

2023-03-02 Thread Jon Ribbens via Python-list
On 2023-03-02, Stephen Tucker  wrote:
> The range function in Python 2.7 (and yes, I know that it is now
> superseded), provokes a Memory Error when asked to deliver a very long
> list of values.
>
> I assume that this is because the function produces a list which it then
> iterates through.
>
> 1. Does the  range  function in Python 3.x behave the same way?

No, in Python 3 it is a lazy sequence object which computes each number
on demand instead of building a list in memory.

> 2. Is there any equivalent way that behaves more like a  for loop (that is,
> without producing a list)?

Yes, 'xrange' in Python 2 behaves like 'range' in Python 3.
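
A quick sketch of why the Python 3 range doesn't run out of memory: it
stores only start, stop and step. This assumes a 64-bit build, since
len() of a range this long would overflow on a 32-bit one.

```python
import sys

big = range(10**12)            # no list is built, so this returns at once

length = len(big)
element = big[10**11]          # computed from start and step on demand
contains_last = (10**12 - 1) in big
size_in_bytes = sys.getsizeof(big)

print(length, element, contains_last, size_in_bytes)
```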
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Lock free ID generation (was: Is there a more efficient threading lock?)

2023-03-01 Thread Jon Ribbens via Python-list
On 2023-03-02, Chris Angelico  wrote:
> On Thu, 2 Mar 2023 at 08:01, <2qdxy4rzwzuui...@potatochowder.com> wrote:
>> On 2023-03-01 at 14:35:35 -0500,
>> avi.e.gr...@gmail.com wrote:
>> > What would have happened if all processors had been required to have
>> > some low level instruction that effectively did something in an atomic
>> > way that allowed a way for anyone using any language running on that
>> > machine a way to do a simple thing like set a lock or check it?
>>
>> Have happened?  I don't know about "required," but processors have
>> indeed had such instructions for decades; e.g., the MC68000 from the
>> early to mid 1980s (and used in the original Apple Macintosh, but I
>> digress) has/had a Test and Set instruction.
>
> As have all CPUs since; it's the only way to implement locks (push the
> locking all the way down to the CPU level).

Indeed, I remember thinking it was very fancy when they added the SWP
instruction to the ARM processor.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 3.10 Fizzbuzz

2023-03-01 Thread Jon Ribbens via Python-list
On 2023-03-01, Simon Ward  wrote:
> On Tue, Feb 28, 2023 at 04:05:19PM -0500, avi.e.gr...@gmail.com wrote:
>>Is it rude to name something "black" to make it hard for some of us to 
>>remind them of the rules or claim that our personal style is so often 
>>the opposite that it should be called "white" or at least shade of 
>>gray?
>>
>>The usual kidding aside, I have no idea what it was called black but in 
>>all seriousness this is not a black and white issue. Opinions may 
>>differ when a language provides many valid options on how to write 
>>code. If someone wants to standardize and impose some decisions, fine. 
>>But other may choose their own variant and take their chances.
>
> https://pypi.org/project/grey/
> https://pypi.org/project/white/
> https://pypi.org/project/blue/
> https://pypi.org/project/oitnb/
>
>:o
>
> It amuses me that an opinionated formatter, with very little 
> configurability by design, in the face of differing opinions just 
> results in forks or wrappers that modify the behaviours that might 
> otherwise have been configuration options.

The mysterious bit is that two of the above projects do nothing except
change the default of the one configuration option that *does* exist
(line length). I mean, "black"'s line-length choice of 88 is insane,
but I don't see the point of creating new pypi projects that do nothing
except run another project with a single option set!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Jon Ribbens via Python-list
On 2023-02-28, Thomas Passin  wrote:
> On 2/28/2023 10:05 AM, Roel Schroeven wrote:
>> Op 28/02/2023 om 14:35 schreef Thomas Passin:
>>> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>>>> [...]
>>>> (2) Searching for a string in another string, in a performant way, is
>>>> not as simple as it first appears. Your version works correctly, but
>>>> slowly. In some situations it doesn't matter, but in other cases it
>>>> will. For better performance, string searching algorithms jump ahead
>>>> either when they found a match or when they know for sure there isn't
>>>> a match for some time (see e.g. the Boyer–Moore string-search
>>>> algorithm). You could write such a more efficient algorithm, but then
>>>> it becomes more complex and more error-prone. Using a well-tested
>>>> existing function becomes quite attractive.
>>>
>>> Sure, it all depends on what the real task will be.  That's why I 
>>> wrote "Without knowing how general your expressions will be". For the 
>>> example string, it's unlikely that speed will be a factor, but who 
>>> knows what target strings and keys will turn up in the future?
>> On hindsight I think it was overthinking things a bit. "It all depends 
>> on what the real task will be" you say, and indeed I think that should 
>> be the main conclusion here.
>
> It is interesting, though, how pre-processing the search pattern can 
> improve search times if you can afford the pre-processing.  Here's a 
> paper on rapidly finding matches when there may be up to one misspelled 
> character.  It's easy enough to implement, though in Python you can't 
> take the additional step of tuning it to stay in cache.
>
> https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf

You've somehow title-cased that URL. The correct URL is:

https://robert.muth.org/Papers/1996-approx-multi.pdf
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Jon Ribbens via Python-list
On 2023-02-26, Chris Angelico  wrote:
> On Sun, 26 Feb 2023 at 16:16, Jon Ribbens via Python-list
> wrote:
>> On 2023-02-25, Paul Rubin  wrote:
>> > The GIL is an evil thing, but it has been around for so long that most
>> > of us have gotten used to it, and some user code actually relies on it.
>> > For example, with the GIL in place, a statement like "x += 1" is always
>> > atomic, I believe.  But, I think it is better to not have any shared
>> > mutables regardless.
>>
>> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
>> Any replacement for the GIL would have to keep the former at least,
>> plus the fact that you can do hundreds of things like list.append(foo)
>> which are all effectively atomic.
>
> The GIL is most assuredly *not* an evil thing. If you think it's so
> evil, go ahead and remove it, because we'll clearly be better off
> without it, right?

If you say so. I said nothing whatsoever about the GIL being evil.

> As it turns out, most GIL-removal attempts have had a fairly nasty
> negative effect on performance. The GIL is a huge performance boost.
>
> As to what is atomic and what is not... it's complicated, as always.
> Suppose that x (or foo.x) is a custom type:

Yes, sure, you can make x += 1 not work even single-threaded if you
make custom types which override basic operations. I'm talking about
when you're dealing with simple atomic built-in types such as integers.

> Here's the equivalent with just incrementing a global:
>
> >>> def thrd():
> ...     x += 1
> ...
> >>> dis.dis(thrd)
>   1   0 RESUME   0
>
>   2   2 LOAD_FAST_CHECK  0 (x)
>   4 LOAD_CONST   1 (1)
>   6 BINARY_OP   13 (+=)
>  10 STORE_FAST   0 (x)
>  12 LOAD_CONST   0 (None)
>  14 RETURN_VALUE
>>>>
>
> The exact same sequence: load, add, store. Still not atomic.

And yet, it appears that *something* changed between Python 2
and Python 3 such that it *is* atomic:

import sys, threading
class Foo:
    x = 0
foo = Foo()
y = 0
def thrd():
    global y
    for _ in range(1):
        foo.x += 1
        y += 1
threads = [threading.Thread(target=thrd) for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(sys.version)
print(foo.x, y)

2.7.5 (default, Jun 28 2022, 15:30:04)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
(64489, 59854)

3.8.10 (default, Nov 14 2022, 12:59:47)
[GCC 9.4.0]
50 50
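
Whatever the interpreter happens to do, a version that doesn't depend
on it is easy enough. This is only a sketch, with a made-up Counter
class and arbitrary thread and iteration counts, that guards the
read-modify-write with an explicit lock:

```python
import threading

THREADS = 8
INCREMENTS = 10_000


class Counter:
    """Integer counter whose increment is protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Load, add and store all happen while the lock is held.
        with self._lock:
            self.value += 1


counter = Counter()


def worker():
    for _ in range(INCREMENTS):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```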

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-26 Thread Jon Ribbens via Python-list
On 2023-02-26, Barry Scott  wrote:
> On 25/02/2023 23:45, Jon Ribbens via Python-list wrote:
>> I think it is the case that x += 1 is atomic but foo.x += 1 is not.
>
> No that is not true, and has never been true.
>
>:>>> def x(a):
>:...    a += 1
>:...
>:>>>
>:>>> dis.dis(x)
>   1   0 RESUME   0
>
>   2   2 LOAD_FAST    0 (a)
>   4 LOAD_CONST   1 (1)
>   6 BINARY_OP   13 (+=)
>  10 STORE_FAST   0 (a)
>  12 LOAD_CONST   0 (None)
>  14 RETURN_VALUE
>:>>>
>
> As you can see there are 4 byte code ops executed.
>
> Python's eval loop can switch to another thread between any of them.
>
> It is not true that the GIL provides atomic operations in python.

That's oversimplifying to the point of falsehood (just as the opposite
would be too). And: see my other reply in this thread just now - if the
GIL isn't making "x += 1" atomic, something else is.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Jon Ribbens via Python-list
On 2023-02-25, Paul Rubin  wrote:
> Jon Ribbens  writes:
>>> 1) you generally want to use RLock rather than Lock
>> Why?
>
> So that a thread that tries to acquire it twice doesn't block itself,
> etc.  Look at the threading lib docs for more info.

Yes, I know what the docs say, I was asking why you were making the
statement above. I haven't used Lock very often, but I've literally
never once in 25 years needed to use RLock. As you say, it's best
to keep the lock-protected code brief, so it's usually pretty
obvious that the code can't be re-entered.
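
For reference, the difference under discussion can be shown without
deadlocking anything by using non-blocking acquires. A minimal sketch:

```python
import threading

lock = threading.Lock()
lock.acquire()
# A plain Lock is not re-entrant: the same thread cannot take it twice.
lock_second_try = lock.acquire(blocking=False)
lock.release()

rlock = threading.RLock()
rlock.acquire()
# An RLock counts acquisitions per owning thread, so this succeeds.
rlock_second_try = rlock.acquire(blocking=False)
rlock.release()
rlock.release()   # one release per successful acquire

print(lock_second_try, rlock_second_try)
```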

>> What does this mean? Are you saying the GIL has been removed?
>
> Last I heard there was an experimental version of CPython with the GIL
> removed.  It is supposed to take less of a performance hit due to
> INCREF/DECREF than an earlier attempt some years back.  I don't know its
> current status.
>
> The GIL is an evil thing, but it has been around for so long that most
> of us have gotten used to it, and some user code actually relies on it.
> For example, with the GIL in place, a statement like "x += 1" is always
> atomic, I believe.  But, I think it is better to not have any shared
> mutables regardless.

I think it is the case that x += 1 is atomic but foo.x += 1 is not.
Any replacement for the GIL would have to keep the former at least,
plus the fact that you can do hundreds of things like list.append(foo)
which are all effectively atomic.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a more efficient threading lock?

2023-02-25 Thread Jon Ribbens via Python-list
On 2023-02-25, Paul Rubin  wrote:
> Skip Montanaro  writes:
>> from threading import Lock
>
> 1) you generally want to use RLock rather than Lock

Why?

> 2) I have generally felt that using locks at the app level at all is an
> antipattern.  The main way I've stayed sane in multi-threaded Python
> code is to have every mutable strictly owned by exactly one thread, pass
> values around using Queues, and have an event loop in each thread taking
> requests from Queues.
>
> 3) I didn't know that no-gil was a now thing and I'm used to having the
> GIL.  So I would have considered the multiprocessing module rather than
> threading, for something like this.

What does this mean? Are you saying the GIL has been removed?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Licensing?

2023-02-02 Thread Jon Ribbens via Python-list
On 2023-02-03, Greg Ewing  wrote:
> On 3/02/23 6:38 am, Jon Ribbens wrote:
>> If you change someone else's code then you have created a derived
>> work, which requires permission from both the original author and you
>> to copy. (Unless you change it so much that nothing remains of the
>> original author's code, of course.)
>
> "Nothing" is probably a bit extreme; somewhere between "exactly the
> same" and "completely different" there will be a borderline case,
> although exactly where the border lies would require a court case
> to determine.

Well yes, technically if you remove so much code that what remains
of the original is so de minimis that it can't be considered
copyrightable then you're good. But that doesn't seem that useful
to know, because if you've removed that much then what remains,
pretty much by definition, isn't going to be useful. You'd be
better off simply starting from scratch and having an unimpeachable
claim to own the entire copyright yourself.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Licensing?

2023-02-02 Thread Jon Ribbens via Python-list
On 2023-02-02, Stefan Ram  wrote:
>   Many licenses in the Python world are like: "You can make
>   changes, but have to leave in my Copyright notice.".
>
>   Would it be possible that the original author could not
>   claim a Copyright anymore when code has been changed?

No. If you change someone else's code then you have created a derived
work, which requires permission from both the original author and you
to copy. (Unless you change it so much that nothing remains of the
original author's code, of course.)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Usenet vs. Mailing-list

2023-01-29 Thread Jon Ribbens via Python-list
On 2023-01-29, Peter J. Holzer  wrote:
> On 2023-01-29 02:09:28 -0000, Jon Ribbens via Python-list wrote:
>> I'm not aware of any significant period in the last twenty-one years
>> that
> [the gateway]
>> hasn't been working. Although sometimes it does feel like it isn't, in
>> that I reply to a post with an answer and then several other people
>> reply significantly later with the same answer, as if my one had never
>> existed...
>
> That's just because people don't read before they post.
>
> Happens in any usenet group or mailing list (and probably in web forums,
> too; but I don't really use those). I have to admit that I'm sometimes
> guilty of this behaviour, too.

Well, let's just assume for a moment that I'm familiar with Usenet and
with mailing lists ;-) This sort of replying-without-reading seems to
happen on comp.lang.python/python-list more than usual.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Usenet vs. Mailing-list

2023-01-28 Thread Jon Ribbens via Python-list
On 2023-01-29, Ben Bacarisse  wrote:
> "Peter J. Holzer"  writes:
>
>> On 2023-01-27 21:04:58 +, Ben Bacarisse wrote:
>>> mutt...@dastardlyhq.com writes:
>>> 
>>> > Hi
>>> 
>>> It looks like you posted this question via Usenet.  comp.lang.python is
>>> essentially dead as a Usenet group.  It exists, and gets NNTP versions
>>> of mail sent to the mailing list, but nothing posted to the group via
>>> NNTP gets sent on to the mailing list.
>>
>> This is wrong. I did get Muttley's and your postings via the
>> mailing-list.
>
> Ah, OK.  I thought that was the case but I am obviously wrong.  Has
> there been a change, or have I been wrong for a long time!?

I'm not aware of any significant period in the last twenty-one years
that it hasn't been working. Although sometimes it does feel like it
isn't, in that I reply to a post with an answer and then several
other people reply significantly later with the same answer, as if
my one had never existed... but whenever I check into it, my message
has actually always made it to the list.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Usenet vs. Mailing-list (was: evaluation question)

2023-01-28 Thread Jon Ribbens via Python-list
On 2023-01-28, Peter J. Holzer  wrote:
> On 2023-01-27 21:04:58 +, Ben Bacarisse wrote:
>> It looks like you posted this question via Usenet.  comp.lang.python is
>> essentially dead as a Usenet group.  It exists, and gets NNTP versions
>> of mail sent to the mailing list, but nothing posted to the group via
>> NNTP gets sent on to the mailing list.
>
> This is wrong. I did get Muttley's and your postings via the
> mailing-list.

Yes, it's certainly false. I only ever post via the newsgroup,
and I can see my postings reach the list because they appear
in the list archive on the web.

https://mail.python.org/pipermail/python-list/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ok, I feel stupid, but there must be a better way than this! (finding name of unique key in dict)

2023-01-20 Thread Jon Ribbens via Python-list
On 2023-01-20, Dino  wrote:
>
> let's say I have this list of nested dicts:
>
> [
>{ "some_key": {'a':1, 'b':2}},
>{ "some_other_key": {'a':3, 'b':4}}
> ]
>
> I need to turn this into:
>
> [
>{ "value": "some_key", 'a':1, 'b':2},
>{ "value": "some_other_key", 'a':3, 'b':4}
> ]

[{"value": key, **value} for d in input_data for key, value in d.items()]
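
Spelled out as a runnable sketch, with input_data being the list from
the question:

```python
input_data = [
    {"some_key": {'a': 1, 'b': 2}},
    {"some_other_key": {'a': 3, 'b': 4}},
]

# For each outer dict, take its (single) key and merge it with the
# inner dict's items using ** unpacking.
result = [
    {"value": key, **value}
    for d in input_data
    for key, value in d.items()
]

print(result)
```

Note that an outer dict with more than one key would simply produce
more than one output entry.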

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: FTP without username and password

2022-12-06 Thread Jon Ribbens via Python-list
On 2022-12-06, ^Bart  wrote:
> Hi Guys,
>
> usually I use this code on my Debian Bullseye:
>
> # python3 -m pyftpdlib -i 192.168.0.71 -p 21 -d /home/my_user/ftp
>
> It works, it's simply easy and perfect but... a device in my lan needs a 
> ftp folder without username and password!
>
> I tried to search on internet how to set the code above to be available 
> without username and password but... I didn't understand how to fix it :\

The code above already does make the directory available without a
username and password. Do you mean you need the directory to be
*writable* without a username and password? If so try the '-w' option.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: In code, list.clear doesn't throw error - it's just ignored

2022-11-14 Thread Jon Ribbens via Python-list
On 2022-11-14, Stefan Ram  wrote:
> Jon Ribbens  writes:
>>"""Create an array and print its length"""
>>array = [1, 2, 3]
>>array.clear
>
>   BTW: Above, there are /two/ expression statements
>   with no effect; the other one is
>
> """Create an array and print its length"""
>
>   . Apparently, linters know this and will not create
>   a warning for such string literals.

Not only do they know this, pylint will complain if you *don't* include
that line, which is why I included it ;-)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: In code, list.clear doesn't throw error - it's just ignored

2022-11-13 Thread Jon Ribbens via Python-list
On 2022-11-14, Greg Ewing  wrote:
> On 14/11/22 1:31 pm, Jon Ribbens wrote:
>> On 2022-11-13, DFS  wrote:
>>> But why is it allowed in the first place?
>> 
>> Because it's an expression, and you're allowed to execute expressions.
>
> To put it a bit more clearly, you're allowed to evaluate
> an expression and ignore the result.

... because it may have side effects, and it's not possible to determine
whether it will or not in advance.
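
A contrived sketch of why the interpreter can't just reject a bare
attribute access: with a property, evaluating the expression runs
arbitrary code. The class name Noisy is made up for illustration.

```python
class Noisy:
    """Object whose attribute access has a visible side effect."""

    def __init__(self):
        self.reads = 0

    @property
    def value(self):
        self.reads += 1      # side effect of merely reading the attribute
        return 42


n = Noisy()
n.value                      # expression statement, result discarded
n.value
print(n.reads)
```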
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: In code, list.clear doesn't throw error - it's just ignored

2022-11-13 Thread Jon Ribbens via Python-list
On 2022-11-13, DFS  wrote:
> On 11/13/2022 5:20 PM, Jon Ribbens wrote:
>> On 2022-11-13, DFS  wrote:
>>> In code, list.clear is just ignored.
>>> At the terminal, list.clear shows
>>> 
>>>
>>>
>>> in code:
>>> x = [1,2,3]
>>> x.clear
>>> print(len(x))
>>> 3
>>>
>>> at terminal:
>>> x = [1,2,3]
>>> x.clear
>>> 
>>> print(len(x))
>>> 3
>>>
>>>
>>> Caused me an hour of frustration before I noticed list.clear() was what
>>> I needed.
>>>
>>> x = [1,2,3]
>>> x.clear()
>>> print(len(x))
>>> 0
>> 
>> If you want to catch this sort of mistake automatically then you need
>> a linter such as pylint:
>> 
>>$ cat test.py
>>"""Create an array and print its length"""
>> 
>>array = [1, 2, 3]
>>array.clear
>>print(len(array))
>>$ pylint -s n test.py
>>* Module test
>>test.py:4:0: W0104: Statement seems to have no effect 
>> (pointless-statement)
>
>
> Thanks, I should use linters more often.
>
> But why is it allowed in the first place?

Because it's an expression, and you're allowed to execute expressions.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: In code, list.clear doesn't throw error - it's just ignored

2022-11-13 Thread Jon Ribbens via Python-list
On 2022-11-13, DFS  wrote:
> In code, list.clear is just ignored.
> At the terminal, list.clear shows
>
>
>
> in code:
> x = [1,2,3]
> x.clear
> print(len(x))
> 3
>
> at terminal:
> x = [1,2,3]
> x.clear
>
> print(len(x))
> 3
>
>
> Caused me an hour of frustration before I noticed list.clear() was what 
> I needed.
>
> x = [1,2,3]
> x.clear()
> print(len(x))
> 0

If you want to catch this sort of mistake automatically then you need
a linter such as pylint:

  $ cat test.py
  """Create an array and print its length"""

  array = [1, 2, 3]
  array.clear
  print(len(array))
  $ pylint -s n test.py
  * Module test
  test.py:4:0: W0104: Statement seems to have no effect (pointless-statement)
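
Roughly what such a check involves can be sketched with the standard
ast module. This is not how pylint is implemented, just an
illustration; the function name pointless_statements is made up and
it only looks for bare names and attribute accesses.

```python
import ast

SOURCE = '''"""Create an array and print its length"""

array = [1, 2, 3]
array.clear
print(len(array))
'''


def pointless_statements(source):
    """Return line numbers of statements that are a bare name or attribute."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # ast.Expr is an expression used as a statement; a docstring is
        # an ast.Constant and a call is an ast.Call, so neither matches.
        if isinstance(node, ast.Expr) and isinstance(
            node.value, (ast.Name, ast.Attribute)
        ):
            hits.append(node.lineno)
    return sorted(hits)


flagged = pointless_statements(SOURCE)
print(flagged)
```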

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Jon Ribbens via Python-list
On 2022-10-24, Chris Angelico  wrote:
> On Tue, 25 Oct 2022 at 02:45, Jon Ribbens via Python-list
> wrote:
>>
>> On 2022-10-24, Chris Angelico  wrote:
>> > On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer  wrote:
>> >> Yes, I got that. What I wanted to say was that this is indeed a bug in
>> >> html.parser and not an error (or sloppyness, as you called it) in the
>> >> input or ambiguity in the HTML standard.
>> >
>> > I described the HTML as "sloppy" for a number of reasons, but I was of
>> > the understanding that it's generally recommended to have the closing
>> > tags. Not that it matters much.
>>
>> Some elements don't need close tags, or even open tags. Unless you're
>> using XHTML you don't need them and indeed for the case of void tags
>> (e.g. , ) you must not include the close tags.
>
> Yep, I'm aware of void tags, but I'm talking about the container tags
> - in this case,  and  - which, in a lot of older HTML pages,
> are treated as "separator" tags.

Yes, hence why I went on to talk about container tags.

> Consider this content:
>
>
> Hello, world!
>
> Paragraph 2
>
> Hey look, a third paragraph!
>
>
> Stick a doctype onto that and it should be valid HTML5,

Nope, it's missing a .

>> Adding in the omitted , , , , and 
>> would make no difference and there's no particular reason to recommend
>> doing so as far as I'm aware.
>
> And yet most people do it. Why?

They agree with Tim Peters that "Explicit is better than implicit",
I suppose? ;-)

> Are you saying that it's better to omit them all?

No, I'm saying neither option is necessarily better than the other.

> More importantly: Would you omit all the  closing tags you can, or
> would you include them?

It would depend on how much content was inside them I guess.
Something like:

  
First item
Second item
Third item
  

is very easy to understand, but if each item was many lines long then it
may be less confusing to explicitly close - not least for indentation
purposes.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Beautiful Soup - close tags more promptly?

2022-10-24 Thread Jon Ribbens via Python-list
On 2022-10-24, Chris Angelico  wrote:
> On Mon, 24 Oct 2022 at 23:22, Peter J. Holzer  wrote:
>> Yes, I got that. What I wanted to say was that this is indeed a bug in
>> html.parser and not an error (or sloppyness, as you called it) in the
>> input or ambiguity in the HTML standard.
>
> I described the HTML as "sloppy" for a number of reasons, but I was of
> the understanding that it's generally recommended to have the closing
> tags. Not that it matters much.

Some elements don't need close tags, or even open tags. Unless you're
using XHTML you don't need them and indeed for the case of void tags
(e.g. , ) you must not include the close tags.

A minimal HTML file might look like this:


Minimal HTML file
Minimal HTML fileThis is a minimal HTML file.

which would be parsed into this:



  

Minimal HTML file
  
  

  Minimal HTML file
  This is a minimal HTML file.

  


Adding in the omitted , , , , and 
would make no difference and there's no particular reason to recommend
doing so as far as I'm aware.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: xml.etree and namespaces -- why?

2022-10-19 Thread Jon Ribbens via Python-list
On 2022-10-19, Robert Latest  wrote:
> If the XML input has namespaces, tags and attributes with prefixes
> in the form prefix:sometag get expanded to {uri}sometag where the
> prefix is replaced by the full URI.
>
> Which means that given an Element e, I cannot directly access its attributes
> using e.get() because in order to do that I need to know the URI of the
> namespace.

That's because you *always* need to know the URI of the namespace,
because that's its only meaningful identifier. If you assume that a
particular namespace always uses the same prefix then your code will be
completely broken. The following two pieces of XML should be understood
identically:

http://www.inkscape.org/namespaces/inkscape;>
  

and:

http://www.inkscape.org/namespaces/inkscape;>
  

So you can see why e.get('inkscape:label') cannot possibly work, and why
e.get('{http://www.inkscape.org/namespaces/inkscape}label') makes sense.

The xml.etree author obviously knew that this was cumbersome, and
hence you can do something like:

namespaces = {'inkspace': 'http://www.inkscape.org/namespaces/inkscape'}
element = root.find('inkspace:foo', namespaces)

which will work for both of the above pieces of XML.

But unfortunately as far as I can see nobody's thought about doing the
same for attributes rather than tags.
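
A sketch of both lookups, using two made-up documents that bind
different prefixes to the same namespace URI. The element name foo and
attribute name label are only for illustration.

```python
import xml.etree.ElementTree as ET

INKSCAPE = 'http://www.inkscape.org/namespaces/inkscape'

# Same namespace URI, different prefixes: these must be treated alike.
doc_a = f'<svg xmlns:inkscape="{INKSCAPE}"><inkscape:foo inkscape:label="bar"/></svg>'
doc_b = f'<svg xmlns:ink="{INKSCAPE}"><ink:foo ink:label="bar"/></svg>'

# The prefix used here is local to this code, not to the documents.
namespaces = {'inkscape': INKSCAPE}

labels = []
for doc in (doc_a, doc_b):
    root = ET.fromstring(doc)
    element = root.find('inkscape:foo', namespaces)   # prefix map works for tags
    labels.append(element.get(f'{{{INKSCAPE}}}label'))  # attributes need {uri}name

print(labels)
```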
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Find the path of a shell command

2022-10-13 Thread Jon Ribbens via Python-list
On 2022-10-12, Paulo da Silva  wrote:
> At 22:38 on 12/10/22, Jon Ribbens wrote:
>> On 2022-10-12, Jon Ribbens  wrote:
>>> On 2022-10-12, Paulo da Silva  wrote:
>>>> At 19:14 on 12/10/22, Jon Ribbens wrote:
>>>>> On 2022-10-12, Paulo da Silva wrote:
>>>>>> At 05:00 on 12/10/22, Paulo da Silva wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>> The simple question: How do I find the full path of a shell command
>>>>>>> (linux), i.e. how do I obtain the corresponding of, for example,
>>>>>>> "type rm" in command line?
>>>>>>>
>>>>>>> The reason:
>>>>>>> I have a python program that launches a detached rm. It works pretty well
>>>>>>> until it is invoked by cron! I suspect that for cron we need to specify
>>>>>>> the full path.
>>>>>>> Of course I can hardcode /usr/bin/rm. But, is rm always in /usr/bin?
>>>>>>> What about other commands?
>>>>>>>
>>>>>> Thank you all who have responded so far.
>>>>>> I think that the suggestion of searching the PATH env seems the best.
>>>>>> Another thing that I thought of is that of the 'which', but, to avoid
>>>>>> the mentioned recurrent problem of not knowing where 'which' is I would
>>>>>> use 'type' instead. 'type' is a bash (sh?) command.
>>>>>
>>>>> If you're using subprocess.run / subprocess.Popen then the computer is
>>>>> *already* searching PATH for you.
>>>> Yes, and it works out of cron.
>>>>> Your problem must be that your cron
>>>>> job is being run without PATH being set, perhaps you just need to edit
>>>>> your crontab to set PATH to something sensible.
>>>> I could do that, but I am using /etc/cron.* for convenience.
>>>>
>>>>> Or just hard-code your
>>>>> program to run '/bin/rm' explicitly, which should always work (unless
>>>>> you're on Windows, of course!)
>>>> It can also be in /bin, at least.
>>>
>>> I assume you mean /usr/bin. But it doesn't matter. As already
>>> discussed, even if 'rm' is in /usr/bin, it will be in /bin as well
>>> (or /usr/bin and /bin will be symlinks to the same place).
>>>
 A short idea is to just check /bin/rm and /usr/bin/rm, but I prefer
 searching thru PATH env. It only needs to do that once.
>>>
>>> I cannot think of any situation in which that will help you. But if for
>>> some reason you really want to do that, you can use the shutil.which()
>>> function from the standard library:
>>>
>>>  >>> import shutil
>>>  >>> shutil.which('rm')
>>>  '/usr/bin/rm'
>> 
>> Actually if I'm mentioning shutil I should probably mention
>> shutil.rmtree() as well, which does the same as 'rm -r', without
>> needing to find or run any other executables.
> Except that you can't have parallel tasks, at least in an easy way.
> Using Popen I just launch rm's and end the script.

[threading.Thread(target=shutil.rmtree, args=(item,)).start()
for item in items_to_delete]
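
Spelled out a little more, as a sketch only: the directories below are temporary ones created purely for illustration, standing in for whatever items_to_delete really holds.

```python
import shutil
import tempfile
import threading
from pathlib import Path

# Hypothetical stand-ins for the real directories to delete.
items_to_delete = [Path(tempfile.mkdtemp()) for _ in range(3)]
for item in items_to_delete:
    (item / "file.txt").write_text("junk")

# One thread per directory, each running shutil.rmtree on its own item.
threads = [
    threading.Thread(target=shutil.rmtree, args=(item,))
    for item in items_to_delete
]
for thread in threads:
    thread.start()

# Joining is only needed if later code depends on the deletions being done;
# non-daemon threads are waited for at interpreter exit in any case.
for thread in threads:
    thread.join()

remaining = [item for item in items_to_delete if item.exists()]
print(remaining)
```

Note that this is not quite the same as launching detached rm processes: the Python process stays alive until the threads have finished.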



Re: Find the path of a shell command

2022-10-12 Thread Jon Ribbens via Python-list
On 2022-10-12, Jon Ribbens  wrote:
> On 2022-10-12, Paulo da Silva  wrote:
>> At 19:14 on 12/10/22, Jon Ribbens wrote:
>>> On 2022-10-12, Paulo da Silva wrote:
>>>> At 05:00 on 12/10/22, Paulo da Silva wrote:
>>>>> Hi!
>>>>>
>>>>> The simple question: How do I find the full path of a shell command
>>>>> (linux), i.e. how do I obtain the corresponding of, for example,
>>>>> "type rm" in command line?
>>>>>
>>>>> The reason:
>>>>> I have python program that launches a detached rm. It works pretty well
>>>>> until it is invoked by cron! I suspect that for cron we need to specify
>>>>> the full path.
>>>>> Of course I can hardcode /usr/bin/rm. But, is rm always in /usr/bin?
>>>>> What about other commands?
>>>>>
>>>> Thank you all who have responded so far.
>>>> I think that the suggestion of searching the PATH env seems the best.
>>>> Another thing that I thought of is that of the 'which', but, to avoid
>>>> the mentioned recurrent problem of not knowing where 'which' is I would
>>>> use 'type' instead. 'type' is a bash (sh?) command.
>>> 
>>> If you're using subprocess.run / subprocess.Popen then the computer is
>>> *already* searching PATH for you.
>> Yes, and it works out of cron.
>>> Your problem must be that your cron
>>> job is being run without PATH being set, perhaps you just need to edit
>>> your crontab to set PATH to something sensible.
>> I could do that, but I am using /etc/cron.* for convenience.
>>
>>> Or just hard-code your
>>> program to run '/bin/rm' explicitly, which should always work (unless
>>> you're on Windows, of course!)
>> It can also be in /bin, at least.
>
> I assume you mean /usr/bin. But it doesn't matter. As already
> discussed, even if 'rm' is in /usr/bin, it will be in /bin as well
> (or /usr/bin and /bin will be symlinks to the same place).
>
>> A short idea is to just check /bin/rm and /usr/bin/rm, but I prefer 
>> searching thru PATH env. It only needs to do that once.
>
> I cannot think of any situation in which that will help you. But if for
> some reason you really want to do that, you can use the shutil.which()
> function from the standard library:
>
> >>> import shutil
> >>> shutil.which('rm')
> '/usr/bin/rm'

Actually if I'm mentioning shutil I should probably mention
shutil.rmtree() as well, which does the same as 'rm -r', without
needing to find or run any other executables.


Re: Find the path of a shell command

2022-10-12 Thread Jon Ribbens via Python-list
On 2022-10-12, Paulo da Silva  wrote:
> At 19:14 on 12/10/22, Jon Ribbens wrote:
>> On 2022-10-12, Paulo da Silva wrote:
>>> At 05:00 on 12/10/22, Paulo da Silva wrote:
>>>> Hi!
>>>>
>>>> The simple question: How do I find the full path of a shell command
>>>> (linux), i.e. how do I obtain the corresponding of, for example,
>>>> "type rm" in command line?
>>>>
>>>> The reason:
>>>> I have python program that launches a detached rm. It works pretty well
>>>> until it is invoked by cron! I suspect that for cron we need to specify
>>>> the full path.
>>>> Of course I can hardcode /usr/bin/rm. But, is rm always in /usr/bin?
>>>> What about other commands?
>>>>
>>> Thank you all who have responded so far.
>>> I think that the suggestion of searching the PATH env seems the best.
>>> Another thing that I thought of is that of the 'which', but, to avoid
>>> the mentioned recurrent problem of not knowing where 'which' is I would
>>> use 'type' instead. 'type' is a bash (sh?) command.
>> 
>> If you're using subprocess.run / subprocess.Popen then the computer is
>> *already* searching PATH for you.
> Yes, and it works out of cron.
>> Your problem must be that your cron
>> job is being run without PATH being set, perhaps you just need to edit
>> your crontab to set PATH to something sensible.
> I could do that, but I am using /etc/cron.* for convenience.
>
>> Or just hard-code your
>> program to run '/bin/rm' explicitly, which should always work (unless
>> you're on Windows, of course!)
> It can also be in /bin, at least.

I assume you mean /usr/bin. But it doesn't matter. As already
discussed, even if 'rm' is in /usr/bin, it will be in /bin as well
(or /usr/bin and /bin will be symlinks to the same place).

> A short idea is to just check /bin/rm and /usr/bin/rm, but I prefer 
> searching thru PATH env. It only needs to do that once.

I cannot think of any situation in which that will help you. But if for
some reason you really want to do that, you can use the shutil.which()
function from the standard library:

>>> import shutil
>>> shutil.which('rm')
'/usr/bin/rm'
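
If the inherited PATH might be missing or minimal (as under cron), shutil.which() also accepts an explicit search path. A rough sketch, where FALLBACK_PATH and resolve_command are names invented here and the directory list is only a guess at something sensible:

```python
import os
import shutil

# Assumed fallback search path; adjust for your own system.
FALLBACK_PATH = os.pathsep.join(["/usr/local/bin", "/usr/bin", "/bin"])


def resolve_command(name):
    """Return the full path of `name`, or None if it cannot be found."""
    # First try whatever PATH the process inherited.
    found = shutil.which(name)
    if found is None:
        # Then try the explicit fallback directories.
        found = shutil.which(name, path=FALLBACK_PATH)
    return found


rm_path = resolve_command("rm")
missing = resolve_command("no-such-command-hopefully")
print(rm_path, missing)
```

The lookup only needs doing once at startup; the result can then be passed to subprocess as an absolute path.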



Re: Find the path of a shell command

2022-10-12 Thread Jon Ribbens via Python-list
On 2022-10-12, Jon Ribbens  wrote:
> On 2022-10-12, Joe Pfeiffer  wrote:
>> Jon Ribbens  writes:
>>
>>> On 2022-10-12, Michael F. Stemper  wrote:
>>>> On 12/10/2022 07.20, Chris Green wrote:
>>>>> ... and rm will just about always be in /usr/bin.
>>>>
>>>> On two different versions of Ubuntu, it's in /bin.
>>>
>>> It will almost always be in /bin in any Unix or Unix-like system,
>>> because it's one of the fundamental utilities that may be vital in
>>> fixing the system when it's booted in single-user mode and /usr may
>>> not be available. Also, the Filesystem Hierarchy Standard *requires*
>>> it to be in /bin.
>>>
>>> Having said that, nothing requires it not to be elsewhere *as well*,
>>> and in Ubuntu and other Linux systems it is in /usr/bin too. And because
>>> PATH for non-root users will usually contain /usr/bin before /bin (or
>>> indeed may not contain /bin at all), 'command -v rm' or 'which rm' will
>>> usually list the version of rm that is in /usr/bin.
>>>
>>> e.g. on Amazon Linux:
>>>
>>> $ which rm
>>> /usr/bin/rm
>>> $ sudo which rm
>>> /bin/rm
>>
>> Have some major Linux distributions not done usrmerge yet?  For any that
>> have, /bin is a symbolic link to /usr/bin
>
> I have immediate access to CentOS 7, Ubuntu 20, and Amazon Linux 2,
> and none of those have done that.

Sorry, in fact they have done that - I misread your comment as being
that they had symlinked the executables not the directories. This seems
quite an unwise move to me but presumably they've thought it through.


Re: Find the path of a shell command

2022-10-12 Thread Jon Ribbens via Python-list
On 2022-10-12, Paulo da Silva  wrote:
> At 05:00 on 12/10/22, Paulo da Silva wrote:
>> Hi!
>> 
>> The simple question: How do I find the full path of a shell command 
>> (linux), i.e. how do I obtain the corresponding of, for example,
>> "type rm" in command line?
>> 
>> The reason:
>> I have python program that launches a detached rm. It works pretty well 
>> until it is invoked by cron! I suspect that for cron we need to specify 
>> the full path.
>> Of course I can hardcode /usr/bin/rm. But, is rm always in /usr/bin? 
>> What about other commands?
>> 
> Thank you all who have responded so far.
> I think that the suggestion of searching the PATH env seems the best.
> Another thing that I thought of is that of the 'which', but, to avoid 
> the mentioned recurrent problem of not knowing where 'which' is I would 
> use 'type' instead. 'type' is a bash (sh?) command.

If you're using subprocess.run / subprocess.Popen then the computer is
*already* searching PATH for you. Your problem must be that your cron
job is being run without PATH being set, perhaps you just need to edit
your crontab to set PATH to something sensible. Or just hard-code your
program to run '/bin/rm' explicitly, which should always work (unless
you're on Windows, of course!)
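
As a sketch of the "set PATH to something sensible" option done from inside the program rather than in the crontab: SAFE_PATH is an assumed value, and the scratch file is a throwaway created only for this example. On POSIX systems, as I understand it, subprocess uses the PATH from the env you pass when it searches for the executable.

```python
import os
import subprocess
import tempfile

# Assumed "sensible" PATH; cron often supplies a much shorter one.
SAFE_PATH = "/usr/local/bin:/usr/bin:/bin"

# Hypothetical scratch file standing in for whatever the real job removes.
fd, scratch = tempfile.mkstemp()
os.close(fd)

# Copy the current environment and override only PATH.
env = dict(os.environ, PATH=SAFE_PATH)
completed = subprocess.run(["rm", "--", scratch], env=env, check=True)

still_there = os.path.exists(scratch)
print(completed.returncode, still_there)
```

This keeps the fix local to the one program instead of relying on every cron entry being edited.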


Re: Find the path of a shell command

2022-10-12 Thread Jon Ribbens via Python-list
On 2022-10-12, Joe Pfeiffer  wrote:
> Jon Ribbens  writes:
>
>> On 2022-10-12, Michael F. Stemper  wrote:
>>> On 12/10/2022 07.20, Chris Green wrote:
>>>> ... and rm will just about always be in /usr/bin.
>>>
>>> On two different versions of Ubuntu, it's in /bin.
>>
>> It will almost always be in /bin in any Unix or Unix-like system,
>> because it's one of the fundamental utilities that may be vital in
>> fixing the system when it's booted in single-user mode and /usr may
>> not be available. Also, the Filesystem Hierarchy Standard *requires*
>> it to be in /bin.
>>
>> Having said that, nothing requires it not to be elsewhere *as well*,
>> and in Ubuntu and other Linux systems it is in /usr/bin too. And because
>> PATH for non-root users will usually contain /usr/bin before /bin (or
>> indeed may not contain /bin at all), 'command -v rm' or 'which rm' will
>> usually list the version of rm that is in /usr/bin.
>>
>> e.g. on Amazon Linux:
>>
>> $ which rm
>> /usr/bin/rm
>> $ sudo which rm
>> /bin/rm
>
> Have some major Linux distributions not done usrmerge yet?  For any that
> have, /bin is a symbolic link to /usr/bin

I have immediate access to CentOS 7, Ubuntu 20, and Amazon Linux 2,
and none of those have done that.


Re: Find the path of a shell command

2022-10-12 Thread Jon Ribbens via Python-list
On 2022-10-12, Michael F. Stemper  wrote:
> On 12/10/2022 07.20, Chris Green wrote:
>> ... and rm will just about always be in /usr/bin.
>
> On two different versions of Ubuntu, it's in /bin.

It will almost always be in /bin in any Unix or Unix-like system,
because it's one of the fundamental utilities that may be vital in
fixing the system when it's booted in single-user mode and /usr may
not be available. Also, the Filesystem Hierarchy Standard *requires*
it to be in /bin.

Having said that, nothing requires it not to be elsewhere *as well*,
and in Ubuntu and other Linux systems it is in /usr/bin too. And because
PATH for non-root users will usually contain /usr/bin before /bin (or
indeed may not contain /bin at all), 'command -v rm' or 'which rm' will
usually list the version of rm that is in /usr/bin.

e.g. on Amazon Linux:

$ which rm
/usr/bin/rm
$ sudo which rm
/bin/rm


Re: for -- else: what was the motivation?

2022-10-10 Thread Jon Ribbens via Python-list
On 2022-10-10, Calvin Spealman  wrote:
> On Sat, Oct 8, 2022 at 5:35 PM rbowman  wrote:
>> On 10/7/22 21:32, Axy wrote:
>> > So, seriously, why they needed else if the following pieces produce same
>> > result? Does anyone know or remember their motivation?
>>
>> In real scenarios there would be more logic in the for block that would
>> meet a condition and break out of the loop. If the condition is never
>> met, the else block runs. To steal from w3schools:
>>
>>
>> fruits = ["apple", "peach", "cherry"]
>> for x in fruits:
>>     print(x)
>>     if x == "banana":
>>         break
>> else:
>>     print("Yes we got no bananas")
>>
>
> I wonder if for/else could have been less confusing if it was referred to
> as for-break-else and if the else clause was only valid syntax if the for
> loop actually contained a break statement in the first place.

Watch out, I suggested that here some years ago and it was derided
as being an "arrogant and foolish" idea.
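
For anyone following along, a small sketch of the "for-break-else" reading of the construct. The function name first_match is invented here; the point is only that the else branch runs exactly when the loop ends without hitting break:

```python
def first_match(items, wanted):
    for item in items:
        if item == wanted:
            result = f"found {item}"
            break
    else:
        # Runs only if the loop finished without executing `break`,
        # which includes the case where `items` is empty.
        result = f"no {wanted}"
    return result


fruits = ["apple", "peach", "cherry"]
print(first_match(fruits, "peach"))
print(first_match(fruits, "banana"))
```

Without a break somewhere in the loop body the else clause would always run, which is why tying the two together syntactically seemed reasonable to me.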


Re: Asynchronous execution of synchronous functions

2022-09-26 Thread Jon Ribbens via Python-list
On 2022-09-26, Stefan Ram  wrote:
>   So, I wanted to try to download all pages in parallel with
>   processes to avoid any GIL effect, while I don't understand
>   what the GIL actually is. But processes didn't work here, so
>   I tried threads. This worked and now the total run time is
>   down to about 50 seconds.

Downloading things from the network is *extremely* I/O-bound.
So, as you have discovered, the GIL is going to make essentially
no difference whatsoever.
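
A rough illustration, with time.sleep() standing in for the network wait (fake_download and the page names are invented for the example). While a thread is blocked waiting it releases the GIL, so the waits overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(page):
    """Hypothetical stand-in for a network fetch: it just waits, as I/O does."""
    time.sleep(0.2)
    return f"contents of {page}"


pages = [f"page{n}" for n in range(5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(pages)) as pool:
    # map() preserves the input order of the results.
    results = list(pool.map(fake_download, pages))
elapsed = time.perf_counter() - start

print(len(results), elapsed)
```

Run one after another, the five waits would take about a second in total; in threads they should take roughly as long as a single one.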


Re: Mutating an HTML file with BeautifulSoup

2022-08-22 Thread Jon Ribbens via Python-list
On 2022-08-22, Peter J. Holzer  wrote:
> On 2022-08-22 00:45:56 -0000, Jon Ribbens via Python-list wrote:
>> With the offset though, BeautifulSoup made an arbitrary decision to
>> use ISO-8859-1 encoding and so when you chopped the bytestring at
>> that offset it only worked because BeautifulSoup had happened to
>> choose a 1-byte-per-character encoding. Ironically, *without* the
>> "\xed\xa0\x80\xed\xbc\x9f" it wouldn't have worked.
>
> Actually it would. The unit is bytes if you feed it with bytes, and
> characters if you feed it with str.

No it isn't. If you give BeautifulSoup's 'html.parser' bytes as input,
it first chooses an encoding and decodes the bytes before sending that
output to html.parser, which is what provides the offset. So the offsets
it gives are in characters, and you've no simple way of converting that
back to byte offsets.
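
A small sketch of the difference, using the standard library's html.parser directly rather than BeautifulSoup (the class name TagLocator and the sample text are made up). The parser is fed str, so the offset it reports counts characters:

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Record the (line, offset) at which each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = {}

    def handle_starttag(self, tag, attrs):
        self.positions[tag] = self.getpos()


text = "é<b>bold</b>"
locator = TagLocator()
locator.feed(text)
locator.close()

line, char_offset = locator.positions["b"]
# The same tag located in the UTF-8 encoded bytes instead.
byte_offset = text.encode("utf-8").index(b"<b>")
print(line, char_offset, byte_offset)
```

The two offsets only agree when every preceding character happens to encode to a single byte.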

> (OTOH it seems that the html parser doesn't heed any <meta charset=...>
> tags, which seems less than ideal for more pedestrian purposes.)

html.parser doesn't accept bytes as input, so it couldn't do anything
with the encoding even if it knew it. BeautifulSoup's 'html.parser'
however does look for and use <meta charset=...> (using a regexp, natch).

>> It looks like BeautifulSoup is doing something like that, yes.
>> Personally I would be nervous about some of my files being parsed
>> as UTF-8 and some of them ISO-8859-1 (due to decoding errors rather
>> than some of the files actually *being* ISO-8859-1 ;-) )
>
> Since none of the syntactically meaningful characters have a code >=
> 0x80, you can parse HTML at the byte level if you know that it's encoded
> in a strict superset of ASCII (which all of the ISO-8859 family and
> UTF-8 are). Only if that's not true (e.g. if your files might be UTF-16
> (or Shift-JIS  or EUC, if I remember correctly) then you have to know
> the character set.
>
> (By parsing I mean only "create a syntax tree". Obviously you have to
> know the encoding to know whether to display «c3 bc» as «ü» or
> «Ã¼».)

But the job here isn't to create a syntax tree. It's to change some of
the content, which for all we know is not ASCII.


Re: Mutating an HTML file with BeautifulSoup

2022-08-22 Thread Jon Ribbens via Python-list
On 2022-08-21, Chris Angelico  wrote:
> On Mon, 22 Aug 2022 at 05:43, Jon Ribbens via Python-list
> wrote:
>> On 2022-08-21, Chris Angelico  wrote:
>> > On Sun, 21 Aug 2022 at 09:31, Jon Ribbens via Python-list
>> > wrote:
>> >> On 2022-08-20, Chris Angelico  wrote:
>> >> > On Sun, 21 Aug 2022 at 03:27, Stefan Ram  
>> >> > wrote:
>> >> >> 2qdxy4rzwzuui...@potatochowder.com writes:
>> >> >> >textual representations.  That way, the following two elements are the
>> >> >> >same (and similar with a collection of sub-elements in a different 
>> >> >> >order
>> >> >> >in another document):
>> >> >>
>> >> >>   The /elements/ differ. They have the /same/ infoset.
>> >> >
>> >> > That's the bit that's hard to prove.
>> >> >
>> >> >>   The OP could edit the files with regexps to create a new version.
>> >> >
>> >> > To you and Jon, who also suggested this: how would that be beneficial?
>> >> > With Beautiful Soup, I have the line number and position within the
>> >> > line where the tag starts; what does a regex give me that I don't have
>> >> > that way?
>> >>
>> >> You mean you could use BeautifulSoup to read the file and identify the
>> >> bits you want to change by line number and offset, and then you could
>> >> use that data to try and update the file, hoping like hell that your
>> >> definition of "line" and "offset" are identical to BeautifulSoup's
>> >> and that you don't mess up later changes when you do earlier ones (you
>> >> could do them in reverse order of line and offset I suppose) and
>> >> probably resorting to regexps anyway in order to find the part of the
>> >> tag you want to change ...
>> >>
>> >> ... or you could avoid all that faff and just do re.sub()?
>> >
>> > Stefan answered in part, but I'll add that it is far FAR easier to do
>> > the analysis with BS4 than regular expressions. I'm not sure what
>> > "hoping like hell" is supposed to mean here, since the line and offset
>> > have been 100% accurate in my experience;
>>
>> Given the string:
>>
>> b"\n \r\r\n\v\n\r\xed\xa0\x80\xed\xbc\x9f\xcc\x80e\xc3\xa8?"
>>
>> what is the line number and offset of the question mark - and does
>> BeautifulSoup agree with your answer? Does the answer to that second
>> question change depending on what parser you tell BeautifulSoup to use?
>
> I'm not sure, because I don't know how to ask BS4 about the location
> of a question mark. But I replaced that with a tag, and:
>
> >>> raw = b"\n \r\r\n\v\n\r\xed\xa0\x80\xed\xbc\x9f\xcc\x80e\xc3\xa8"
> >>> from bs4 import BeautifulSoup
> >>> soup = BeautifulSoup(raw, "html.parser")
> >>> soup.body.sourceline
> 4
> >>> soup.body.sourcepos
> 12
> >>> raw.split(b"\n")[3]
> b'\r\xed\xa0\x80\xed\xbc\x9f\xcc\x80e\xc3\xa8'
> >>> raw.split(b"\n")[3][12:]
> b''
>
> So, yes, it seems to be correct. (Slightly odd in that the sourceline
> is 1-based but the sourcepos is 0-based, but that is indeed the case,
> as confirmed with a much more straight-forward string.)
>
> And yes, it depends on the parser, but I'm using html.parser and it's fine.

Hah, yes, it appears html.parser does an end-run around my lovely
carefully crafted hard case by not even *trying* to work out what
type of line endings the file uses and is just hard-coded to only
recognise "\n" as a line ending.
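
This is easy enough to check against the standard library's html.parser on its own, without BeautifulSoup in the way (FirstTagPosition and locate_first_tag are names invented for this sketch):

```python
from html.parser import HTMLParser


class FirstTagPosition(HTMLParser):
    """Remember the (line, offset) of the first start tag seen."""

    def __init__(self):
        super().__init__()
        self.position = None

    def handle_starttag(self, tag, attrs):
        if self.position is None:
            self.position = self.getpos()


def locate_first_tag(text):
    parser = FirstTagPosition()
    parser.feed(text)
    parser.close()
    return parser.position


# A bare carriage return does not start a new line; a line feed does.
after_cr = locate_first_tag("x\r<b>")
after_lf = locate_first_tag("x\n<b>")
after_crlf = locate_first_tag("x\r\n<b>")
print(after_cr, after_lf, after_crlf)
```

So a file with old Mac-style "\r" line endings would be reported as one very long line.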

With the offset though, BeautifulSoup made an arbitrary decision to
use ISO-8859-1 encoding and so when you chopped the bytestring at
that offset it only worked because BeautifulSoup had happened to
choose a 1-byte-per-character encoding. Ironically, *without* the
"\xed\xa0\x80\xed\xbc\x9f" it wouldn't have worked.

>> (If your answer is "if the input contains \xed\xa0\x80\xed\xbc\x9f then
>> I am happy with the program throwing an exception" then feel free to
>> remove that substring from the question.)
>
> Malformed UTF-8 doesn't seem to be a problem. Every file here seems to
> be either UTF-8 or ISO-8859, and in the latter case, I'm assuming
> 8859-1. So I would probably just let this one go through as 8859-1.

It looks like BeautifulSoup is doing something like that, yes.
Personally I would be nervous about some of my files being parsed
as UTF-8 and some of t

Re: Mutating an HTML file with BeautifulSoup

2022-08-22 Thread Jon Ribbens via Python-list
On 2022-08-21, Peter J. Holzer  wrote:
> On 2022-08-20 21:51:41 -0000, Jon Ribbens via Python-list wrote:
>> On 2022-08-20, Stefan Ram  wrote:
>> > Jon Ribbens  writes:
>> >>... or you could avoid all that faff and just do re.sub()?
>
>> > source = ''
>> >
>> > # Use Python to change the source, keeping the order of attributes.
>> >
>> > result = re.sub( r'href\s*=\s*"http"', r'href="https"', source )
>> > result = re.sub( r"href\s*=\s*'http'", r"href='https'", result )
>
> Depending on the content of the site, this might replace some stuff
> which is not a link.
>
>> You could go a bit harder with the regexp of course, e.g.:
>> 
>>   result = re.sub(
>>   r"""(<\s*a\s+[^>]*href\s*=\s*)(['"])\s*OLD\s*\2""",
>
> This will fail on:
> 

I've seen *a lot* of bad/broken/weird HTML over the years, and I don't
believe I've ever seen anyone do that. (Wrongly putting an 'alt'
attribute on an 'a' element is very common, on the other hand ;-) )

> The problem can be solved with regular expressions (and given the
> constraints I think I would prefer that to using Beautiful Soup), but
> getting the regexps right is not trivial, at least in the general case.

I would like to see the regular expression that could fully parse
general HTML...


Re: Mutating an HTML file with BeautifulSoup

2022-08-21 Thread Jon Ribbens via Python-list
On 2022-08-21, Chris Angelico  wrote:
> On Sun, 21 Aug 2022 at 09:31, Jon Ribbens via Python-list
> wrote:
>> On 2022-08-20, Chris Angelico  wrote:
>> > On Sun, 21 Aug 2022 at 03:27, Stefan Ram  wrote:
>> >> 2qdxy4rzwzuui...@potatochowder.com writes:
>> >> >textual representations.  That way, the following two elements are the
>> >> >same (and similar with a collection of sub-elements in a different order
>> >> >in another document):
>> >>
>> >>   The /elements/ differ. They have the /same/ infoset.
>> >
>> > That's the bit that's hard to prove.
>> >
>> >>   The OP could edit the files with regexps to create a new version.
>> >
>> > To you and Jon, who also suggested this: how would that be beneficial?
>> > With Beautiful Soup, I have the line number and position within the
>> > line where the tag starts; what does a regex give me that I don't have
>> > that way?
>>
>> You mean you could use BeautifulSoup to read the file and identify the
>> bits you want to change by line number and offset, and then you could
>> use that data to try and update the file, hoping like hell that your
>> definition of "line" and "offset" are identical to BeautifulSoup's
>> and that you don't mess up later changes when you do earlier ones (you
>> could do them in reverse order of line and offset I suppose) and
>> probably resorting to regexps anyway in order to find the part of the
>> tag you want to change ...
>>
>> ... or you could avoid all that faff and just do re.sub()?
>
> Stefan answered in part, but I'll add that it is far FAR easier to do
> the analysis with BS4 than regular expressions. I'm not sure what
> "hoping like hell" is supposed to mean here, since the line and offset
> have been 100% accurate in my experience;

Given the string:

b"\n \r\r\n\v\n\r\xed\xa0\x80\xed\xbc\x9f\xcc\x80e\xc3\xa8?"

what is the line number and offset of the question mark - and does
BeautifulSoup agree with your answer? Does the answer to that second
question change depending on what parser you tell BeautifulSoup to use?

(If your answer is "if the input contains \xed\xa0\x80\xed\xbc\x9f then
I am happy with the program throwing an exception" then feel free to
remove that substring from the question.)

> the only part I'm unsure about is where the _end_ of the tag is (and
> maybe there's a way I can use BS4 again to get that??).

There doesn't seem to be. More to the point, there doesn't seem to be
a way to find out where the *attributes* are, so as I said you'll most
likely end up using regexps anyway.


Re: Mutating an HTML file with BeautifulSoup

2022-08-20 Thread Jon Ribbens via Python-list
On 2022-08-20, Stefan Ram  wrote:
> Jon Ribbens  writes:
>>... or you could avoid all that faff and just do re.sub()?
>
> import bs4
> import re
>
> source = ''
>
> # Use Python to change the source, keeping the order of attributes.
>
> result = re.sub( r'href\s*=\s*"http"', r'href="https"', source )
> result = re.sub( r"href\s*=\s*'http'", r"href='https'", result )

You could go a bit harder with the regexp of course, e.g.:

  result = re.sub(
  r"""(<\s*a\s+[^>]*href\s*=\s*)(['"])\s*OLD\s*\2""",
  r"\1\2NEW\2",
  source,
  flags=re.IGNORECASE
  )
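
A filled-in version of the same idea, as a sketch only: OLD, NEW and the sample markup are assumptions for illustration. It uses re.escape() so the dots in the URL are matched literally, and a replacement function so nothing in NEW is treated as a backreference:

```python
import re

# Hypothetical old and new link targets.
OLD = "http://example.com/"
NEW = "https://example.com/"

pattern = re.compile(
    r"""(<\s*a\s+[^>]*href\s*=\s*)(['"])\s*"""
    + re.escape(OLD)
    + r"""\s*\2""",
    flags=re.IGNORECASE,
)


def swap(match):
    # Keep everything before the value and reuse the original quote character.
    return f"{match.group(1)}{match.group(2)}{NEW}{match.group(2)}"


source = '<p><a class="sister" href="http://example.com/" id="link1">Elsie</a></p>'
result = pattern.sub(swap, source)

source2 = "<A HREF = 'http://example.com/'>x</A>"
result2 = pattern.sub(swap, source2)
print(result)
print(result2)
```

Attribute order and the rest of the markup are left alone, although whitespace inside the quotes around the URL is dropped.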

> # Now use BeautifulSoup only for the verification of the result.
>
> reference = bs4.BeautifulSoup( source, features="html.parser" )
> for a in reference.find_all( "a" ):
>     if a[ 'href' ]== 'http': a[ 'href' ]='https'
>
> print( bs4.BeautifulSoup( result, features="html.parser" )== reference )

Hmm, yes that seems like a pretty good idea.


Re: Mutating an HTML file with BeautifulSoup

2022-08-20 Thread Jon Ribbens via Python-list
On 2022-08-20, Chris Angelico  wrote:
> On Sun, 21 Aug 2022 at 03:27, Stefan Ram  wrote:
>> 2qdxy4rzwzuui...@potatochowder.com writes:
>> >textual representations.  That way, the following two elements are the
>> >same (and similar with a collection of sub-elements in a different order
>> >in another document):
>>
>>   The /elements/ differ. They have the /same/ infoset.
>
> That's the bit that's hard to prove.
>
>>   The OP could edit the files with regexps to create a new version.
>
> To you and Jon, who also suggested this: how would that be beneficial?
> With Beautiful Soup, I have the line number and position within the
> line where the tag starts; what does a regex give me that I don't have
> that way?

You mean you could use BeautifulSoup to read the file and identify the
bits you want to change by line number and offset, and then you could
use that data to try and update the file, hoping like hell that your
definition of "line" and "offset" are identical to BeautifulSoup's
and that you don't mess up later changes when you do earlier ones (you
could do them in reverse order of line and offset I suppose) and
probably resorting to regexps anyway in order to find the part of the
tag you want to change ...

... or you could avoid all that faff and just do re.sub()?


Re: Mutating an HTML file with BeautifulSoup

2022-08-20 Thread Jon Ribbens via Python-list
On 2022-08-19, Chris Angelico  wrote:
> What's the best way to precisely reconstruct an HTML file after
> parsing it with BeautifulSoup?
>
> Using the Alice example from the BS4 docs:
>
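> >>> html_doc = """<html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
>
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
> <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
> <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
>
> <p class="story">...</p>
> """
> >>> print(soup)
> <html><head><title>The Dormouse's story</title></head>
> <body>
> <p class="title"><b>The Dormouse's story</b></p>
> <p class="story">Once upon a time there were three little sisters; and
> their names were
> <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
> <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
> <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
> and they lived at the bottom of a well.</p>
> <p class="story">...</p>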
>

>
> Note two distinct changes: firstly, whitespace has been removed, and
> secondly, attributes are reordered (I think alphabetically). There are
> other canonicalizations being done, too.
>
> I'm trying to make some automated changes to a huge number of HTML
> files, with minimal diffs so they're easy to validate. That means that
> spurious changes like these are very much unwanted. Is there a way to
> get BS4 to reconstruct the original precisely?
>
> The mutation itself would be things like finding an anchor tag and
> changing its href attribute. Fairly simple changes, but might alter
> the length of the file (eg changing "http://example.com/" into
> "https://example.com/"). I'd like to do them intelligently rather than
> falling back on element.sourceline and element.sourcepos, but worst
> case, that's what I'll have to do (which would be fiddly).

I'm tempting the Wrath of Zalgo by saying it, but ... regexp?


Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-18, Tobiah  wrote:
>> You configure the web server to send:
>> 
>>  Content-Type: text/html; charset=...
>> 
>> in the HTTP header when it serves HTML files.
>
> So how does this break down?  When a person enters
> Montréal, Quebéc into a form field, what are they
> doing on the keyboard to make that happen?

It depends on what keyboard they have. Using a standard UK or US
("qwerty") keyboard and Windows you should be able to type "é" by
holding down the 'Alt' key to the right of the spacebar, and typing
'e'.  If they're using a French ("azerty") keyboard then I think they
can enter it by holding 'shift' and typing '2'.

> As the string sits there in the text box, is it latin1, or utf-8
> or something else?

That depends on which browser you're using. I think it's quite likely
it will use UTF-32 (i.e. fixed-width 32 bits per character).

> How does the browser know what sort of data it has in that text box?

It's a text box, so it knows it's text.


Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-18, Tobiah  wrote:
>> Generally speaking browser submissions were/are supposed to be sent
>> using the same encoding as the page, so if you're sending the page
>> as "latin1" then you'll see that a fair amount I should think. If you
>> send it as "utf-8" then you'll get 100% utf-8 back.
>
> The only trick I know is to use <meta charset="utf-8">.  Would
> that 'send' the post as utf-8?  I always expected it had more
> to do with the way the user entered the characters.  How do
> they by the way, enter things like Montréal, Quebéc.  When they
> enter that into a text box on a web page can we say it's in
> a particular encoding at that time?  At submit time?

You configure the web server to send:

Content-Type: text/html; charset=...

in the HTTP header when it serves HTML files. Another way is to put:

    <meta http-equiv="Content-Type" content="text/html; charset=...">

or:

    <meta charset="...">

in the <head> section of your HTML document. The HTML "standard"
nowadays says that you are only allowed to use the "utf-8" encoding,
but if you use another encoding then browsers will generally use that
as both the encoding to use when reading the HTML file and the encoding
to use when submitting form data.
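
To make the form-submission side concrete, here is a sketch using urllib.parse to mimic what ends up on the wire; the field name 'city' is made up. The same text produces different bytes depending on the page's encoding, and the receiving end has to decode with the matching one:

```python
from urllib.parse import parse_qs, urlencode

form = {"city": "Montréal"}

# What a form would submit from a latin-1 page versus a utf-8 page.
as_latin1 = urlencode(form, encoding="latin-1")
as_utf8 = urlencode(form, encoding="utf-8")

# Decoding with the matching encoding recovers the text ...
decoded = parse_qs(as_latin1, encoding="latin-1")
# ... while decoding latin-1 data as utf-8 does not.
wrong = parse_qs(as_latin1, encoding="utf-8")

print(as_latin1, as_utf8)
print(decoded, wrong)
```

Nothing in the submitted data itself says which encoding was used, which is why the server has to know what it served the page as.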


Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-17, Barry  wrote:
>> On 17 Aug 2022, at 18:30, Jon Ribbens via Python-list 
>>  wrote:
>> On 2022-08-17, Tobiah  wrote:
>>> I get data from various sources; client emails, spreadsheets, and
>>> data from web applications.  I find that I can do 
>>> some_string.decode('latin1')
>>> to get unicode that I can use with xlsxwriter,
>>> or put  in the header of a web page to display
>>> European characters correctly.  But normally UTF-8 is recommended as
>>> the encoding to use today.  latin1 works correctly more often when I
>>> am using data from the wild.  It's frustrating that I have to play
>>> a guessing game to figure out how to use incoming text.   I'm just wondering
>>> if there are any thoughts.  What if we just globally decided to use utf-8?
>>> Could that ever happen?
>> 
>> That has already been decided, as much as it ever can be. UTF-8 is
>> essentially always the correct encoding to use on output, and almost
>> always the correct encoding to assume on input absent any explicit
>> indication of another encoding. (e.g. the HTML "standard" says that
>> all HTML files must be UTF-8.)
>> 
>> If you are finding that your specific sources are often encoded with
>> latin-1 instead then you could always try something like:
>> 
>>     try:
>>         text = data.decode('utf-8')
>>     except UnicodeDecodeError:
>>         text = data.decode('latin-1')
>> 
>> (I think latin-1 text will almost always fail to be decoded as utf-8,
>> so this would work fairly reliably assuming those are the only two
>> encodings you see.)
>
> Only if a reserved byte is used in the string.
> It will often work in either.

Because it's actually ASCII and hence there's no difference between
interpreting it as utf-8 or iso-8859-1? In which case, who cares?

> For web pages it cannot be assumed that markup saying it’s utf-8 is
> correct. Many pages are in fact cp1252. Usually you find out because
> of a smart quote that is 0xa0 in cp1252 and illegal in utf-8.

Hence what I said above. But if a source explicitly states an encoding
and it's false then these days I see little need for sympathy.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UTF-8 and latin1

2022-08-18 Thread Jon Ribbens via Python-list
On 2022-08-17, Tobiah  wrote:
>> That has already been decided, as much as it ever can be. UTF-8 is
>> essentially always the correct encoding to use on output, and almost
>> always the correct encoding to assume on input absent any explicit
>> indication of another encoding. (e.g. the HTML "standard" says that
>> all HTML files must be UTF-8.)

> I got an email from a client with blast text that
> was in French with stuff like: Montréal, Quebéc.
> latin1 did the trick.

There's no accounting for the Québécois. They think they speak French.

> Also, whenever I get a spreadsheet from a client and save as .csv,
> or take browser data through PHP, it always seems to work with latin1,
> but not UTF-8.

That depends on how you "saved as .csv" and what you did with PHP.
Generally speaking browser submissions were/are supposed to be sent
using the same encoding as the page, so if you're sending the page
as "latin1" then you'll see that a fair amount I should think. If you
send it as "utf-8" then you'll get 100% utf-8 back.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: UTF-8 and latin1

2022-08-17 Thread Jon Ribbens via Python-list
On 2022-08-17, Tobiah  wrote:
> I get data from various sources; client emails, spreadsheets, and
> data from web applications.  I find that I can do some_string.decode('latin1')
> to get unicode that I can use with xlsxwriter,
> or put  in the header of a web page to display
> European characters correctly.  But normally UTF-8 is recommended as
> the encoding to use today.  latin1 works correctly more often when I
> am using data from the wild.  It's frustrating that I have to play
> a guessing game to figure out how to use incoming text.   I'm just wondering
> if there are any thoughts.  What if we just globally decided to use utf-8?
> Could that ever happen?

That has already been decided, as much as it ever can be. UTF-8 is
essentially always the correct encoding to use on output, and almost
always the correct encoding to assume on input absent any explicit
indication of another encoding. (e.g. the HTML "standard" says that
all HTML files must be UTF-8.)

If you are finding that your specific sources are often encoded with
latin-1 instead then you could always try something like:

    try:
        text = data.decode('utf-8')
    except UnicodeDecodeError:
        text = data.decode('latin-1')

(I think latin-1 text will almost always fail to be decoded as utf-8,
so this would work fairly reliably assuming those are the only two
encodings you see.)
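
The fallback above can be wrapped up as a small function. This is only a sketch: the name decode_text is made up for illustration, and it assumes UTF-8 and latin-1 really are the only two encodings in play. The latin-1 branch can never raise, because latin-1 assigns a character to every possible byte value.

```python
def decode_text(data: bytes) -> str:
    """Decode bytes as UTF-8, falling back to latin-1 if that fails.

    decode_text is a hypothetical helper name, not part of any library.
    """
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail.
        return data.decode('latin-1')


# The same word encoded both ways ('\u00e9' is e-acute).
utf8_bytes = 'Montr\u00e9al'.encode('utf-8')      # two bytes for the accent
latin1_bytes = 'Montr\u00e9al'.encode('latin-1')  # one byte for the accent

from_utf8 = decode_text(utf8_bytes)
from_latin1 = decode_text(latin1_bytes)
```

Both calls should give back the original word. The latin-1 bytes fail the UTF-8 attempt because the byte 0xE9 announces a multi-byte sequence that the following ASCII letter does not continue.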

Or you could use something fancy like https://pypi.org/project/chardet/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fwd: timedelta object recursion bug

2022-07-28 Thread Jon Ribbens via Python-list
On 2022-07-28, Ben Hirsig  wrote:
> Hi, I noticed this when using the requests library in the response.elapsed
> object (type timedelta). Tested using the standard datetime library alone
> with the example displayed on
> https://docs.python.org/3/library/datetime.html#examples-of-usage-timedelta
>
> It appears as though the timedelta object recursively adds its own
> attributes (min, max, resolution) as further timedelta objects. I’m not
> sure how deep they go, but presumably hitting the recursion limit.
>
> >>> from datetime import timedelta
> >>> year = timedelta(days=365)
> >>> print(year.max)
> 999999999 days, 23:59:59.999999
> >>> print(year.max.min.max.resolution.max.min)
> -999999999 days, 0:00:00

Why do you think this is a bug?
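
One way to see why this is probably not a bug: min, max and resolution are attributes of the timedelta class, so looking them up on an instance just returns the class-level object, and nothing is created recursively. A small check, using only the standard library:

```python
from datetime import timedelta

year = timedelta(days=365)

# min, max and resolution live on the class; an instance merely looks them up.
same_max = year.max is timedelta.max

# However long the chain, each step is another lookup of a class attribute.
chained = year.max.min.max.resolution.max.min
```

The chain can be made as long as you like without touching the recursion limit, since no function calls are nested.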
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: REPL with multiple function definitions

2022-06-27 Thread Jon Ribbens via Python-list
On 2022-06-26, Rob Cliffe  wrote:
> This 2-line program
>
> def f(): pass
> def g(): pass
>
> runs silently (no Exception).  But:
>
> 23:07:02 c:\>python
> Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 
> bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> def f(): pass
> ... def g(): pass
>   File "<stdin>", line 2
>     def g(): pass
>     ^
> SyntaxError: invalid syntax
> >>>
>
> Is there a good reason for this?

The REPL can't cope with consecutive one-line blocks like that: it only
treats a compound statement such as 'def' as finished once it sees a
blank line. If you put a blank line after each one-line block then it
will work.
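
The difference between running a file and typing at the prompt can be reproduced with the built-in compile(), which has an "exec" mode for whole modules and a "single" mode for one interactive statement. This is a sketch of the effect, not a description of how the REPL is implemented internally:

```python
source = "def f(): pass\ndef g(): pass\n"

# As a file ("exec" mode) the two definitions compile without complaint.
compile(source, "<example>", "exec")

# As one interactive statement ("single" mode) the same text is rejected.
try:
    compile(source, "<example>", "single")
    single_ok = True
except SyntaxError:
    single_ok = False
```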
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to replace characters in a string?

2022-06-08 Thread Jon Ribbens via Python-list
On 2022-06-08, Dave  wrote:
> I misunderstood how it worked, basically I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString

> Which returns a new string replacing the common characters.
>
> This can easily be extended to include other characters as and when
> they come up by adding a line as so:
>
>     myNewString = myNewString.replace("\u2014", "]")  # just an example
>
> Which is what I was trying to achieve.

Here's a head-start on some characters you might want to translate,
mostly spaces, hyphens, quotation marks, and ligatures:

    def unicode_translate(s):
        return s.translate({
            8192: ' ', 8193: ' ', 8194: ' ', 8195: ' ', 8196: ' ',
            8197: ' ', 198: 'AE', 8199: ' ', 8200: ' ', 8201: ' ',
            8202: ' ', 8203: '', 64258: 'fl', 8208: '-', 8209: '-',
            8210: '-', 8211: '-', 8212: '-', 8722: '-', 8216: "'",
            8217: "'", 8220: '"', 8221: '"', 64256: 'ff', 160: ' ',
            64260: 'ffl', 8198: ' ', 230: 'ae', 12288: ' ', 173: '',
            497: 'DZ', 498: 'Dz', 499: 'dz', 64259: 'ffi', 8230: '...',
            64257: 'fi', 64262: 'st'})
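
If the numeric keys are hard to read, str.maketrans() can build the same kind of table from the characters themselves. The sketch below uses a much smaller table than the one above, with entries chosen only for illustration:

```python
# Keys are single characters here; str.maketrans converts them to code points.
table = str.maketrans({
    '\u2019': "'",    # right single quotation mark
    '\u2014': '-',    # em dash
    '\u2026': '...',  # horizontal ellipsis
    '\ufb01': 'fi',   # "fi" ligature
    '\u00ad': '',     # soft hyphen, removed entirely
})

cleaned = 'It\u2019s \ufb01ne\u2026'.translate(table)
```

A single translate() call handles every replacement in one pass, which also avoids chaining many separate replace() calls.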

If you want to go further then the Unidecode package might be helpful:

https://pypi.org/project/Unidecode/

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python & nmap

2022-05-18 Thread Jon Ribbens via Python-list
On 2022-05-18, ^Bart  wrote:
> THE INPUT
> -
> import nmap
> nm.scan(hosts='192.168.205.0/24', arguments='-n -sP -PE -PA21,23,80,3389')
> hosts_list = [(x, nm[x]['status']['state']) for x in nm.all_hosts()]
> for host, status in hosts_list:
>   print('{0}:{1}'.host)
>
> THE OUTPUT
> -
> Traceback (most recent call last):
>File "/home/gabriele/Documenti/Python/nmap.py", line 1, in 
>  import nmap
>File "/home/gabriele/Documenti/Python/nmap.py", line 2, in 
>  nm.scan(hosts='192.168.205.0/24', arguments='-n -sP -PE 
> -PA21,23,80,3389')
> NameError: name 'nm' is not defined

You forgot the second line (after 'import nmap' and before 'nm.scan()'):

nm = nmap.PortScanner()

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Printing Unicode strings in a list

2022-04-28 Thread Jon Ribbens via Python-list
On 2022-04-28, Stephen Tucker  wrote:
> Hi PythonList Members,
>
> Consider the following log from a run of IDLE:
>
>==
>
> Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)]
> on win32
> Type "copyright", "credits" or "license()" for more information.
> >>> print (u"\u2551")
> ║
> >>> print ([u"\u2551"])
> [u'\u2551']
> >>>
>
>==
>
> Yes, I am still using Python 2.x - I have good reasons for doing so and
> will be moving to Python 3.x in due course.
>
> I have the following questions arising from the log:
>
> 1. Why does the second print statement not produce [ ║]  or ["║"] ?

print(x) implicitly calls str(x) to convert 'x' to a string for output.
lists don't have their own str converter, so fall back to repr instead,
which outputs '[', followed by the repr of each list item separated by
', ', followed by ']'.
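
That description can be checked directly by rebuilding the list's string form by hand. The sample strings are arbitrary; they were picked because their repr differs visibly from their str:

```python
items = ["it's", "tab\there"]

as_text = str(items)

# Rebuild the same text by hand, following the description above.
by_hand = "[" + ", ".join(repr(item) for item in items) + "]"
```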

> 2. Should the second print statement produce [ ║]  or ["║"] ?

There's certainly no obvious reason why it *should*, and pretty decent
reasons why it shouldn't (it would be a hybrid mess of Python-syntax
repr output and raw string output).

> 3. Given that I want to print a list of Unicode strings so that their
> characters are displayed (instead of their Unicode codepoint definitions),
> is there a more Pythonic way of doing it than concatenating them into a
> single string and printing that?

print(' '.join(list_of_strings)) is probably most common. I suppose you
could do print(*list_of_strings) if you like, but I'm not sure I'd call
it "pythonic" as I've never seen anyone do that (that doesn't mean of
course that other people haven't seen it done!) Personally I only tend
to use print() for debugging output.
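
For comparison, here are the two spellings side by side in Python 3 syntax. The output is captured in io.StringIO objects purely so the two results can be compared; normally you would just print to the terminal:

```python
import io

list_of_strings = ["\u2551", "\u2554", "\u255d"]  # box-drawing characters

# Join the strings first, then print the single result.
joined = io.StringIO()
print(' '.join(list_of_strings), file=joined)

# Let print() do the separating: each item becomes its own argument.
unpacked = io.StringIO()
print(*list_of_strings, file=unpacked)
```

With the default separator the two produce identical text. The join form only accepts strings, whereas print(*items) calls str() on each item for you.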

> 4. Does Python 3.x exhibit the same behaviour as Python 2.x in this respect?

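Yes, in that a list is still printed using the repr of each item. The
visible output differs, though: Python 3's repr leaves printable
non-ASCII characters unescaped, so you would see ['║'].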
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-19 Thread Jon Ribbens via Python-list
On 2022-04-19, Barry  wrote:
>> On 19 Apr 2022, at 19:38, Dennis Lee Bieber  wrote:
>> *I /think/ this is the year used for leap-day calculations, and
>>  why some leap centuries are skipped as it is really less than a
>>  quarter day per year, so eventually one gets to over-correcting
>>  by a day.
>
> Leap century is skipped unless it’s a leap quadra century.

Indeed, which is why "leap=not year & 3" works for years in
range(1901, 2100). Which I have found useful before when
programming in an assembly language that has no division
operation ;-)
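
The shortcut is easy to check against the standard library. In the sketch below the function name is made up for illustration, and calendar.isleap() serves as the reference implementation:

```python
import calendar


def is_leap_1901_2099(year):
    """Shortcut valid only for 1901 <= year <= 2099 (hypothetical helper)."""
    # '&' binds tighter than 'not', so this is: not (year & 3)
    return not year & 3


# Years in the stated range where the shortcut disagrees with the real rule.
mismatches = [y for y in range(1901, 2100)
              if is_leap_1901_2099(y) != calendar.isleap(y)]
```

The range matters: 1900 and 2100 are divisible by 4 but are not leap years, so the shortcut gives the wrong answer for both.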
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-19 Thread Jon Ribbens via Python-list
On 2022-04-19, Loris Bennett  wrote:
> If I am merely trying to represent part of a very large number of seconds
> as a number of years, 365 days per year does not seem that controversial
> to me.  Obviously there are issues if you expect all periods of an
> integer number of years which start on a given date to all end on the
> same date.
>
> In my little niche, I just need a very simple period and am anyway not
> bothered about years, since in my case the number of days is usually
> capped at 14 and only in extremely exceptional circumstances could it
> get up to anywhere near 100.
>
> However, surely there are plenty of people measuring durations of a few
> hours or less who don't want to have to deal with seconds all the time
> (I am in fact also in this other group when I record my working hours).

Well, that's my point. Everyone's all in their own slightly-different
little niches. There isn't one straightforward standard that makes all
or even most of them happy.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-19 Thread Jon Ribbens via Python-list
On 2022-04-19, Loris Bennett  wrote:
> Jon Ribbens  writes:
>> On 2022-04-19, Loris Bennett  wrote:
>>> I now realise that timedelta is not really what I need.  I am interested
>>> solely in pure periods, i.e. numbers of seconds,
>>
>> That's exactly what timedelta is.
>>
>>> that I can convert back and forth from a format such as
>>>
>>>   11-22::44:55
>>
>> I don't recognise that format and can't work out what it means.
>> It should be trivial to write functions to parse whatever format
>> you wanted and convert between it and timedelta objects though.
>
> days-hours:minutes:seconds

If by 'days' it means '86,400 seconds' then that's very easily
convertible to and from timedelta.
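
As a rough sketch of such a conversion, assuming 'days' does mean 86,400 seconds and that periods are never negative: the names parse_period and format_period are invented here, and the days part is treated as optional.

```python
import re
from datetime import timedelta

# days-hours:minutes:seconds, with the "days-" part optional (an assumption).
_PERIOD_RE = re.compile(r"(?:(\d+)-)?(\d+):(\d+):(\d+)")


def parse_period(text):
    """Turn 'D-HH:MM:SS' into a timedelta (hypothetical helper)."""
    match = _PERIOD_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid period: {text!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def format_period(delta):
    """Turn a non-negative timedelta back into 'D-HH:MM:SS'."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)   # a 'day' here is exactly 86,400 s
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


example = parse_period("11-22:33:44")
round_trip = format_period(example)
```

Fractions of a second are dropped when formatting, which may or may not matter for a given use.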

>> I would be very surprised if any language supported the arbitrary format
>> above you happen to be interested in!
>
> But most languages support fairly arbitrary formatting of timedate-style
> objects.  It doesn't seem unreasonable to me that such formatting might
> be available for simple periods.
>
>>> I would have thought that periods crop up all over
>>> the place and therefore formatting as strings and parsing of string
>>> would be supported natively by most modern languages.  Apparently not.
>>
>> I think most languages think that a simple number suffices to represent
>> a fixed time period (commonly seconds or milliseconds). And if you want
>> more dynamic intervals (e.g. x months y days) then there is insufficient
>> consensus as to what that actually means.
>
> Maybe.  It just seems to me that once you get up to more than a few
> hundred seconds, the ability to convert to and from a more readable format
> becomes very useful.  The length of a month may be unclear, but the
> definitions for year, week, day, hours, and minute are all trivial.

Eh? The definitions for "year, week, day" are not in the slightest bit
trivial (unless you define 'day' as '86,400 seconds', in which case
'year' is still not remotely trivial).

I think the issue is simply lack of consensus. Even though ISO 8601,
which is extremely common (possibly even ubiquitous, for anything
modern) for the format of date/times, also defines a format for
durations (e.g. 'P4Y3M' for '4 years 3 months'), I don't think
I have ever seen it used in practice - not least because apparently
it doesn't define what it actually means. So there isn't one simple
standard agreed by everyone that is an obvious candidate for inclusion
in language standard libraries.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-19 Thread Jon Ribbens via Python-list
On 2022-04-19, Loris Bennett  wrote:
> I now realise that timedelta is not really what I need.  I am interested
> solely in pure periods, i.e. numbers of seconds,

That's exactly what timedelta is.

> that I can convert back and forth from a format such as
>
>   11-22::44:55

I don't recognise that format and can't work out what it means.
It should be trivial to write functions to parse whatever format
you wanted and convert between it and timedelta objects though.

> It is obviously fairly easy to rustle up something to do this, but I am
> surprised that this is not baked into Python (such a class also seems to
> be missing from R).

I would be very surprised if any language supported the arbitrary format
above you happen to be interested in!

> I would have thought that periods crop up all over
> the place and therefore formatting as strings and parsing of string
> would be supported natively by most modern languages.  Apparently not.

I think most languages think that a simple number suffices to represent
a fixed time period (commonly seconds or milliseconds). And if you want
more dynamic intervals (e.g. x months y days) then there is insufficient
consensus as to what that actually means.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-16 Thread Jon Ribbens via Python-list
On 2022-04-16, Dennis Lee Bieber  wrote:
> On Sat, 16 Apr 2022 20:35:22 - (UTC), Jon Ribbens
> declaimed the following:
>>I can categorically guarantee you it is not. But let's put it a
>>different way, if you like, if I want to add 24 hours, i.e. 86,400
>>seconds (or indeed any other fixed time period), to a timezone-aware
>>datetime in Python, how do I do it?  It would appear that, without
>>converting to UTC before doing the calculation, you can't.
>
> Which is probably the recommended means to do just that.

Yes, as I've already mentioned it is good advice to always use UTC
when doing date/time calculations and only convert to another timezone
for display. However it is somewhat surprising that Python's datetime
simply *does not work* when doing arithmetic on timezone-aware objects.
It's not "disrecommended", it's straight-up broken.

> The only thing that is most noticeable about UTC is the incorporation
> of leap-seconds.

I've never yet managed to find an application where leap-seconds matter
;-)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-16 Thread Jon Ribbens via Python-list
On 2022-04-16, Peter J. Holzer  wrote:
> On 2022-04-16 14:22:04 -0000, Jon Ribbens via Python-list wrote:
>> On 2022-04-16, Jon Ribbens  wrote:
>> > On 2022-04-16, Peter J. Holzer  wrote:
>> >> Python missed the switch to DST here, the timezone is wrong.
>> >
>> > Because you didn't let it use any timezone information. You need to
>> > either use the third-party 'pytz' module, or in Python 3.9 or above,
>> > the built-in 'zoneinfo' module.
>>
>> ... although now having looked into the new 'zoneinfo' module slightly,
>> it really should have a giant red flashing notice at the top of it
>> saying "BEWARE, TIMEZONES IN PYTHON ARE UTTERLY BROKEN, NEVER USE THEM".
>>
>> Suppose we do this:
>>
>> >>> import datetime, zoneinfo
>> >>> LOS_ANGELES = zoneinfo.ZoneInfo('America/Los_Angeles')
>> >>> UTC = zoneinfo.ZoneInfo('UTC')
>> >>> d = datetime.datetime(2020, 10, 31, 12, tzinfo=LOS_ANGELES)
>> >>> print(d)
>> 2020-10-31 12:00:00-07:00
>> >>> d1 = d + datetime.timedelta(days=1)
>> >>> print(d1)
>> 2020-11-01 12:00:00-08:00
>>
>> d1 is *wrong*.
>
> No, this is correct. That's the result you want.

I can categorically guarantee you it is not. But let's put it a
different way, if you like, if I want to add 24 hours, i.e. 86,400
seconds (or indeed any other fixed time period), to a timezone-aware
datetime in Python, how do I do it?  It would appear that, without
converting to UTC before doing the calculation, you can't.
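
The UTC round trip can be wrapped in a small helper. This is a sketch only: add_elapsed is a made-up name, and it assumes the caller wants the result back in the timezone of the input.

```python
from datetime import datetime, timedelta, timezone


def add_elapsed(dt, delta):
    """Add a fixed amount of elapsed time to an aware datetime.

    Hypothetical helper: does the arithmetic in UTC, then converts the
    result back to the timezone the input was expressed in.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)
```

Applied to the Los Angeles datetime 'd' from earlier in the thread with timedelta(days=1), this should give 2020-11-01 11:00:00-08:00, the same as doing the UTC conversion by hand.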

> So why didn't this work for me (I also used Python 3.9)? My guess is
> that astimezone() doesn't pick the correct time zone.

astimezone() doesn't pick a time zone at all. It works out the current
local offset from UTC. It doesn't know anything about when or if that
offset ever changes.
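
This is easy to confirm: the tzinfo that a bare astimezone() attaches is an instance of datetime.timezone, which is a fixed offset and carries no transition rules. The date below is just the one from the example being discussed:

```python
from datetime import datetime, timezone

# Naive datetime, interpreted as local time and made "aware".
local = datetime(2022, 3, 26, 12, 0).astimezone()

tz = local.tzinfo
is_fixed_offset = isinstance(tz, timezone)
```

Whatever the machine's local zone is, is_fixed_offset should come out true, which is why adding a timedelta afterwards cannot notice a DST change.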

>> timedelta(days=1) is 24 hours (as you can check by
>> calling timedelta(days=1).total_seconds() ),
>
> It shouldn't be. 1 Day is not 24 hours in the real world.

Nevertheless, timedelta is a fixed time period so that is the only
definition possible.

>> then it can pretend timezones don't exist and do 'naive' arithmetic.
>
> On the contrary. When a datetime is timezone aware, it must use that
> timezone's rules. Adding one day to a datetime just before a DST switch
> must add 23 or 25 hours, not 24. This is NOT naive.

But it is. It's adding 24 hours and while it's doing so it's naively
assuming that the UTC offset doesn't change during that period. Then
once it's got that naive result it's labelling it with what it thinks
is the timezone data at that time.

Here's another example to prove the point:

>>> LONDON = zoneinfo.ZoneInfo('Europe/London')
>>> d0 = datetime.datetime(2022, 3, 27, 0, tzinfo=LONDON)
>>> print(d0)
2022-03-27 00:00:00+00:00
>>> print(d0 + datetime.timedelta(seconds=3600+1800))
2022-03-27 01:30:00+00:00

That is impossible - 2022-03-27 01:30 is a time that *doesn't exist*
in the Europe/London timezone. At 01:00 the clocks moved instantly
to 02:00 as daylight savings kicked in. So the following is wrong too:

>>> print(d0 + datetime.timedelta(seconds=3600*2))
2022-03-27 02:00:00+01:00

That's not 2 hours after midnight, that's 1 hour after midnight.

Doing the calculations in UTC works of course:

>>> print((d0.astimezone(UTC) + datetime.timedelta(seconds=3600+1800)).astimezone(LONDON))
2022-03-27 02:30:00+01:00
>>> print((d0.astimezone(UTC) + datetime.timedelta(seconds=3600*2)).astimezone(LONDON))
2022-03-27 03:00:00+01:00

> (There is an ambiguity, though: Should 2021-03-27T12:00 CEST -
> 2021-03-26T12:00 CET return 1 day or 25 hours? Both results are correct,
> and depending on context you might prefer one or the other).

But if you're returning that result as a timedelta then only "25 hours"
is correct (or indeed "1 day 3,600 seconds"), because for a timedelta a
day is 24 hours *by definition*.

>> There is a general guideline that you should always keep and use your
>> datetimes as UTC, only ever using timezones for the purposes of display.
>> Usually this is because it keeps things simpler for the programmer, and
>> hence they are less likely to introduce bugs into their programs.
>
> While I generally do this (and often preach it to my colleagues) it must
> be stated that this is only a GENERAL guideline.

Yes, that's what I just said.

>> It appears that with Python it's not so much a guideline as an
>> absolute concrete rule, and not because programmers will introduce
>> bugs, but because you need to avoid bugs in the standard library!
>
> As a programmer you must always adapt to the problem. Saying "I must do
> it the wrong way because my library is buggy" is just lazy.

I didn't say any of that. I said you must do it the conservative way,
and it's not "my library" that's buggy, it's the language's built-in
*standard library* that's buggy.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-16 Thread Jon Ribbens via Python-list
On 2022-04-16, Peter J. Holzer  wrote:
> On 2022-04-16 13:47:32 -0000, Jon Ribbens via Python-list wrote:
>> That's impossible unless you redefine 'timedelta' from being, as it is
>> now, a fixed-length period of time, to instead being the difference
>> between two specific dates and times in specific timezones. Days and
>> months have different lengths depending on when and where you are.
>
> That's what I would have expected it to be. Otherwise, why bother with a
> class when a simple float suffices?
>
> Date arithmetic isn't simple. You need complex data types to implement
> it correctly.
>
>> >> It's an undocumented feature of timedelta that by 'day' it means '86400
>> >> seconds'.
>> >
>> > I'd call that a bug, not a feature:
>>
>> It's the only possible way of implementing it,
>
> It's definitely not the only possible way of implementing it.

It's the only possible way of implementing a fixed time period, which is
what timedelta is (modulo the bugs in 'datetime' I mentioned in my other
post).

>> > Python missed the switch to DST here, the timezone is wrong.
>>
>> Because you didn't let it use any timezone information.
>
> I used astimezone() and it returned something that Python
> calls "timezone aware" containing time zone information which is
> correct for that time and my location
> (tzinfo=datetime.timezone(datetime.timedelta(seconds=3600), 'CET')). Why
> should I expect a "timezone aware" datetime to not be actually timezone
> aware?

I think by "timezone aware" it doesn't mean "timezone aware", it means
"has a specified fixed offset from UTC". Yes this is distinctly sub-optimal.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-16 Thread Jon Ribbens via Python-list
On 2022-04-16, Jon Ribbens  wrote:
> On 2022-04-16, Peter J. Holzer  wrote:
>> Python missed the switch to DST here, the timezone is wrong.
>
> Because you didn't let it use any timezone information. You need to
> either use the third-party 'pytz' module, or in Python 3.9 or above,
> the built-in 'zoneinfo' module.

... although now having looked into the new 'zoneinfo' module slightly,
it really should have a giant red flashing notice at the top of it
saying "BEWARE, TIMEZONES IN PYTHON ARE UTTERLY BROKEN, NEVER USE THEM".

Suppose we do this:

>>> import datetime, zoneinfo
>>> LOS_ANGELES = zoneinfo.ZoneInfo('America/Los_Angeles')
>>> UTC = zoneinfo.ZoneInfo('UTC')
>>> d = datetime.datetime(2020, 10, 31, 12, tzinfo=LOS_ANGELES)
>>> print(d)
2020-10-31 12:00:00-07:00
>>> d1 = d + datetime.timedelta(days=1)
>>> print(d1)
2020-11-01 12:00:00-08:00

d1 is *wrong*. timedelta(days=1) is 24 hours (as you can check by
calling timedelta(days=1).total_seconds() ), but d1 is 25 hours later
than 'd'. If we do the calculations in UTC instead, it works correctly:

>>> print((d.astimezone(UTC) + datetime.timedelta(days=1)).astimezone(LOS_ANGELES))
2020-11-01 11:00:00-08:00

It seems that Python is assuming that if the tzinfo attributes of two
datetimes are the same, then it can pretend timezones don't exist and
do 'naive' arithmetic. This is of course a totally false assumption.
Apparently when making the native version of 'zoneinfo', the lessons
learned from 'pytz' have been discarded.

There is a general guideline that you should always keep and use your
datetimes as UTC, only ever using timezones for the purposes of display.
Usually this is because it keeps things simpler for the programmer, and
hence they are less likely to introduce bugs into their programs. It
appears that with Python it's not so much a guideline as an absolute
concrete rule, and not because programmers will introduce bugs, but
because you need to avoid bugs in the standard library!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-16 Thread Jon Ribbens via Python-list
On 2022-04-16, Peter J. Holzer  wrote:
> On 2022-04-14 15:22:29 -0000, Jon Ribbens via Python-list wrote:
>> On 2022-04-14, Paul Bryan  wrote:
>> > I think because minutes and hours can easily be composed by multiplying
>> > seconds. days is separate because you cannot compose days from seconds;
>> > leap seconds are applied to days at various times, due to
>> > irregularities in the Earth's rotation.
>>
>> That's an argument that timedelta should *not* have a 'days' attribute,
>> because a day is not a fixed number of seconds long (to know how long
>> a day is, you have to know which day you're talking about, and where).
>
> Which is exactly why timedelta *must* have separate fields for seconds,
> days and months. You can't simply express the larger units as
> multiples of the smaller units, so they have to be stored separately for
> date arithmetic to work.

That's impossible unless you redefine 'timedelta' from being, as it is
now, a fixed-length period of time, to instead being the difference
between two specific dates and times in specific timezones. Days and
months have different lengths depending on when and where you are.

>> It's an undocumented feature of timedelta that by 'day' it means '86400
>> seconds'.
>
> I'd call that a bug, not a feature:

It's the only possible way of implementing it, so it can't be a bug.
The documentation could be better though.

> >>> from datetime import datetime, timedelta
> >>> t0 = datetime.fromisoformat("2022-03-26T12:00").astimezone()
> >>> t0
> datetime.datetime(2022, 3, 26, 12, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600), 'CET'))
> >>> d = timedelta(days=1)
> >>> t1 = t0 + d
> >>> t1
> datetime.datetime(2022, 3, 27, 12, 0, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600), 'CET'))
> >>> t1.isoformat()
> '2022-03-27T12:00:00+01:00'
>
> Python missed the switch to DST here, the timezone is wrong.

Because you didn't let it use any timezone information. You need to
either use the third-party 'pytz' module, or in Python 3.9 or above,
the built-in 'zoneinfo' module.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-14 Thread Jon Ribbens via Python-list
On 2022-04-14, MRAB  wrote:
> On 2022-04-14 16:22, Jon Ribbens via Python-list wrote:
>> On 2022-04-14, Paul Bryan  wrote:
>>> I think because minutes and hours can easily be composed by multiplying
>>> seconds. days is separate because you cannot compose days from seconds;
>>> leap seconds are applied to days at various times, due to
>>> irregularities in the Earth's rotation.
>> 
>> That's an argument that timedelta should *not* have a 'days' attribute,
>> because a day is not a fixed number of seconds long (to know how long
>> a day is, you have to know which day you're talking about, and where).
>> It's an undocumented feature of timedelta that by 'day' it means '86400
>> seconds'.
>
> When you're working only with dates, timedelta not having a 'days' 
> attribute would be annoying, especially when you consider that a day is 
> usually 24 hours, but sometimes 23 or 25 hours (DST).

The second half of your sentence is the argument as to why the first half
of your sentence is wrong. The difference between noon on the 26th March
2022 in London and noon on the 27th March 2022 is "1 day" from one point
of view but is not "1 day" according to timedelta.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why does datetime.timedelta only have the attributes 'days' and 'seconds'?

2022-04-14 Thread Jon Ribbens via Python-list
On 2022-04-14, Paul Bryan  wrote:
> I think because minutes and hours can easily be composed by multiplying
> seconds. days is separate because you cannot compose days from seconds;
> leap seconds are applied to days at various times, due to
> irregularities in the Earth's rotation.

That's an argument that timedelta should *not* have a 'days' attribute,
because a day is not a fixed number of seconds long (to know how long
a day is, you have to know which day you're talking about, and where).
It's an undocumented feature of timedelta that by 'day' it means '86400
seconds'.
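
Since the documentation does not spell it out, here is a quick check of what timedelta means by a day. Only the standard library is used:

```python
from datetime import timedelta

one_day = timedelta(days=1)
seconds_in_day = one_day.total_seconds()

# Anything over 24 hours is folded into the 'days' attribute automatically.
normalised = timedelta(hours=25)
```

So timedelta(hours=25) is stored as 1 day and 3600 seconds, with no reference to any calendar date.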
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Behavior of the for-else construct

2022-03-03 Thread Jon Ribbens via Python-list
On 2022-03-03, computermaster360  wrote:
> Do you find the for-else construct useful? Have you used it in
> practice?

Yes, I use it frequently.

> I have used it maybe once. My issue with this construct is that
> calling the second block `else` doesn't make sense; a much more
> sensible name would be `then`.

You are not the only person with this opinion, although personally
I have the opposite opinion. I think of 'for...else' as being
a search for something that matches a condition, and the 'else'
block is if no item is found that matches. If you think of it like
that, the syntax makes perfect sense.
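
A minimal example of that "search" reading, with a function name and data invented for illustration:

```python
def first_negative(numbers):
    """Return the first negative number, or None if there isn't one."""
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Reached only if the loop finished without hitting 'break',
        # i.e. the search found nothing (this includes an empty input).
        result = None
    return result
```

Read as "look for a match, else do this", the keyword fits. Note that the else block also runs when the iterable is empty, because no break happens then either.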

> Now, imagine a parallel universe, where the for-else construct would
> have a different behavior:
>
> for elem in iterable:
>     process(elem)
> else:
>     # executed only when the iterable was initially empty
>     print('Nothing to process')
>
> Wouldn't this be more natural? I think so. Also, I face this case much
> more often than having to detect whether I broke out of a loop early
> (which is what the current for-else construct is for).

I guess people's needs vary. I can't even remember the last time
I've needed something as you suggest above - certainly far less
often than I need 'for...else' as it is now.

> What are your thoughts? Do you agree?

I don't agree. But it doesn't really matter if anyone agrees or not,
since there is no chance whatsoever that a valid Python syntax is
suddenly going to change to mean something completely different, not
even in "Python 4000" or whatever far-future version we might imagine.

This exact topic was discussed in November 2017 by the way, under the
subject heading "Re: replacing `else` with `then` in `for` and `try`".
I'm not sure any particular conclusion was reached though except that
some people think 'else' is more intuitive and some people think
'then' would be more intuitive.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: keep getting a syntax error on the very first program I am running

2022-01-14 Thread Jon Ribbens via Python-list
On 2022-01-15, Bob Griffin  wrote:
>I am running this program and keep getting this error.  Is this normal?
>
>Invalid syntax.  Perhaps you forgot a comma?
>
>Also the t in tags is highlighted.
>
>I even tried different versions of Python also.
>
>Python 3.10.1 (tags/v3.10.1:2cd268a, Dec  6 2021, 19:10:37) [MSC v.1929 64
>bit (AMD64)] on win32
>
>Type "help", "copyright", "credits" or "license()" for more information.
>
>print("Hello World")
>
>Hello World
>
>input("\n\nPress the enter key to exit.")

You're only supposed to enter the lines:

print("Hello World")

and

input("\n\nPress the enter key to exit.")

The other lines are showing you the output you should see.


Re: Short, perfect program to read sentences of webpage

2021-12-08 Thread Jon Ribbens via Python-list
On 2021-12-08, Julius Hamilton  wrote:
> 1. The HTML extraction is not perfect. It doesn’t produce as clean text as
> I would like. Sometimes random links or tags get left in there. And the
> sentences are sometimes randomly broken by newlines.

Oh. Leaving tags in suggests you are doing this very wrongly. Python
has plenty of open source libraries you can use that will parse the
HTML reliably into tags and text for you.

> 2. Neither is the segmentation perfect. I am currently researching
> developing an optimal segmenter with tools from Spacy.
>
> Brevity is greatly valued. I mean, anyone who can make the program more
> perfect, that’s hugely appreciated. But if someone can do it in very few
> lines of code, that’s also appreciated.

It isn't something that can be done in a few lines of code. There's the
spaces issue you mention for example. Nor is it something that can
necessarily be done just by inspecting the HTML alone. To take a trivial
example:

  powergenitalia  = powergen  italia

but:

  powergenitalia= powergenitalia

but the second with the addition of:

  span { display: block }

is back to "powergen  italia". So you need to parse and apply styles
(including external stylesheets) as well. Potentially you may also need
to execute JavaScript on the page, which means you also need a JavaScript
interpreter and a DOM implementation. Basically you need a complete
browser to do it on general web pages.
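
To make the block-versus-inline point concrete, here is a rough sketch
using only the standard library's html.parser. The markup in the usage
lines is an assumed reconstruction (the archive has stripped the tags
from the example above), and BLOCK_TAGS is a deliberately simplified,
hypothetical list of elements that are block-level by default. It knows
nothing about CSS, so it cannot handle the 'display: block' case:

```python
from html.parser import HTMLParser

# Assumed, simplified set of elements that browsers lay out as blocks
# by default; a real browser consults the stylesheet instead.
BLOCK_TAGS = {"div", "p", "li", "h1", "h2", "h3", "br"}


class TextExtractor(HTMLParser):
    """Collect text, inserting a space at the edges of block elements."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def visible_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # collapse runs of whitespace the way a browser roughly would
    return " ".join("".join(parser.parts).split())


print(visible_text("<div>powergen</div><div>italia</div>"))    # powergen italia
print(visible_text("<span>powergen</span><span>italia</span>"))  # powergenitalia
```

Even this toy version has to hard-code layout knowledge, which is why
the general case ends up needing stylesheet and script support.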


Re: copy.copy

2021-11-22 Thread Jon Ribbens via Python-list
On 2021-11-22, ast  wrote:
> Hi,
>
> >>> a = 6
> >>> b = 6
> >>> a is b
> True
>
> ok, we all know that Python creates a sole instance
> with small integers, but:
>
> >>> import copy
> >>> b = copy.copy(a)
> >>> a is b
> True
>
> I was expecting False

Why did you expect False?

For immutable types, copy(foo) just returns foo.
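
A quick way to see this, as a sketch (behaviour as observed in CPython's
copy module; the variable names are arbitrary):

```python
import copy

a = 6
t = (1, 2, 3)
s = "spam"
lst = [1, 2, 3]

print(copy.copy(a) is a)      # True: int is immutable
print(copy.copy(t) is t)      # True: tuple is immutable
print(copy.copy(s) is s)      # True: str is immutable
print(copy.copy(lst) is lst)  # False: list is mutable, so a new list is made
```

Since an immutable object can never change, handing back the same object
is indistinguishable from handing back a copy, apart from the 'is' test.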


Re: Proliferation of Python packaging formats

2021-11-17 Thread Jon Ribbens via Python-list
On 2021-11-17, Skip Montanaro  wrote:
> Is the proliferation of packaging formats in Python as nutzo as this author
> believes?
>
> https://drewdevault.com/2021/11/16/Python-stop-screwing-distros-over.html
>
> Asking because I've never been in the business of releasing "retail" Python
> applications or packages.

Well the first paragraph is ridiculous. I've never heard of half of the
things he lists as being necessary to deal with, and half the remainder
are just words relating to packages, i.e. you could make a similar list
for any language.

The other major problem with that post is that it gives no examples
or even clues as to what the author's actual problem is...

On the other hand it is true that Python's packaging system is confusing
and very badly documented, and I doubt the vast majority of people have
any idea of what the difference between 'distutils' and 'setuptools' is
(I certainly don't), except inasmuch as 'setuptools' (and 'wheel') is
something you have to remember to manually update after creating a
virtual environment before installing your actual packages.

It's also true that you have to remember with Python that you basically
cannot use 'pip' to install anything globally as it will interfere with
the operating system's packaging. You must use virtual envs for
everything, or the operating system's provided packages.

Also PEP 518's choice of TOML is absolutely risible, a language about
which the only positive thing that can be said is that it's not as bad as
YAML, and for which Python doesn't even have a built-in parser -
something that should have absolutely ruled it out as an option.


Re: Avoid nested SIGINT handling

2021-11-10 Thread Jon Ribbens via Python-list
On 2021-11-10, Paulo da Silva  wrote:
> Hi!
>
> How do I handle a SIGINT (or any other signal) and avoid nesting?

I don't think you need to. Python will only call signal handlers in
the main thread, so a handler can't be executed while another handler
is running anyway.


Re: Get a Joke in Python

2021-10-28 Thread Jon Ribbens via Python-list
On 2021-10-28, Greg Ewing  wrote:
> On 29/10/21 11:34 am, Chris Angelico wrote:
>> On Fri, Oct 29, 2021 at 7:31 AM Mostowski Collapse  
>> wrote:
>>> QA engineer walks into a bar. Orders a beer. Orders 0 beers.
>>> Orders 9 beers. Orders a lizard. Orders -1 beers.
>>> Orders a sfdeljknesv.
>>>
>> 
>> Orders 1 пиво and is served a пиво. QA engineer sighs "not again".
>
> Orders NaN beers and 

The variant of this I saw the other day follows the above, and after
the QA engineer has ordered various things it continues:

  A real customer walks into the bar and doesn't order anything,
  but asks where the toilets are.

  The bar explodes in flames.


Re: New assignmens ...

2021-10-28 Thread Jon Ribbens via Python-list
On 2021-10-28, Paul Rubin  wrote:
> Chris Angelico  writes:
>> But it all depends on the exact process being done, which is why I've
>> been asking for real examples.
>
> My most frequent use case for walrus is so common that I have sometimes
> implemented a special class for it:
>
>if g := re.search(pat1, text):
>   hack(g.group(1))
>elif g := re.search(pat2, text):
>   smack(g.group(2), "foo")
>...
>
> It's way messier if you have to separate the assignment and test the old
> way.  That said, I'm still on Python 3.7 so I haven't yet gotten to use
> walrus or the new match statement (or is it expression).
>
> I do feel surprised that you can't use an arbitrary lvalue (to use C
> terminology) on the lhs of a walrus.  That seems downright weird to me.
> But, I haven't studied the PEP so I don't know if there was a particular
> rationale.

Well, that's what I was saying: there's no rationale - the limitation
is not even mentioned, let alone explained.
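
The restriction itself is easy to check. The sketch below assumes
Python 3.8 or later and uses a made-up helper, compiles(), that only
asks the compiler whether a snippet is syntactically valid; the names
x, obj and d are never evaluated:

```python
def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles("(x := 1)"))         # True: a plain name is accepted
print(compiles("(obj.attr := 1)"))  # False: attribute target is rejected
print(compiles("(d['k'] := 1)"))    # False: subscript target is rejected
```

Ordinary assignment statements accept all three targets, which is what
makes the difference noticeable.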


Re: New assignmens ...

2021-10-23 Thread Jon Ribbens via Python-list
On 2021-10-23, Chris Angelico  wrote:
> On Sun, Oct 24, 2021 at 4:39 AM Jon Ribbens via Python-list
> wrote:
>> On 2021-10-23, Chris Angelico  wrote:
>> > In what situations do you need to mutate an attribute and also test
>> > it, and how much hassle is it to simply break it out into two lines?
>>
>> It's not hard to imagine something like:
>>
>> def get_expensive(self):
>>     return self.expensive or self.expensive := self.calculate_expensive()
>
> I usually write this sort of thing the other way:
>
> def get_expensive(self, key):
>     if key not in self.cache:
>         self.cache[key] = ...
>     return self.cache[key]
>
> and then if you don't like the duplication, the cleanest way is to put
> the expensive calculation into the __missing__ method of a dict
> subclass.

Sure, but if "there's already another way of doing it" was a winning
argument then assignment expressions wouldn't have been accepted into
the language at all.
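
For reference, the __missing__ approach mentioned in the quoted text
could look roughly like this. It is only a sketch: the class name
ExpensiveCache, the compute callback and slow_square are all
hypothetical names chosen for illustration:

```python
class ExpensiveCache(dict):
    """dict that computes and stores a value on first lookup of a key."""

    def __init__(self, compute):
        super().__init__()
        self._compute = compute

    def __missing__(self, key):
        # called by dict.__getitem__ only when key is not present
        value = self._compute(key)
        self[key] = value
        return value


calls = []


def slow_square(n):
    calls.append(n)  # record each real computation
    return n * n


cache = ExpensiveCache(slow_square)
print(cache[4], cache[4])  # 16 16
print(calls)               # [4]
```

Note that __missing__ is only consulted by cache[key]; cache.get(key)
bypasses it and returns None for an absent key.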

>> > The onus is on you to show that it needs to be more flexible.
>>
>> Is it though? It seems to me that the onus is on you to show that
>> this special case is special enough to be given its own unique
>> existence. It's a bit surprising that the PEP doesn't discuss this
>> decision at all.
>
> The PEP was accepted. Thus it is now up to someone proposing a change
> to show that the change is worthwhile.
>
> Python has frequently started with a more restricted rule set, with
> the option to make it less restricted in the future. Case in point:
> Decorator syntax used to be limited to a small set of options, aimed
> at the known use-cases at the time. Then very recently, that was
> opened up to basically any expression.
>
> https://www.python.org/dev/peps/pep-0614/
>
> Read over that document for an excellent example of how to take a
> tight proposal and recommend that it be made more flexible. Assignment
> expressions are currently in the restricted form, allowing only simple
> names, and it's up to you to propose and demonstrate the value of the
> increased flexibility.

I think we're allowed to discuss things in this group without them
having to turn into formal proposals. Personally I've never written
a Python assignment expression, and I think it'll be a few years
before Python 3.8 is old enough for them to be conveniently usable.


Re: New assignmens ...

2021-10-23 Thread Jon Ribbens via Python-list
On 2021-10-23, Chris Angelico  wrote:
> I've never used ctr:=ctr-1 either, though, so I don't know the actual
> use cases. Why is this being used in an assignment expression? Is it
> an ersatz loop?
>
> Common use-cases include:
>
> if m := re.match(...):
>
> while data := thing.read():
>
> etc. All of them are doing exactly two things: testing if something is
> empty, and if it isn't, using it in a block of code.
>
> In what situations do you need to mutate an attribute and also test
> it, and how much hassle is it to simply break it out into two lines?

It's not hard to imagine something like:

def get_expensive(self):
    return self.expensive or self.expensive := self.calculate_expensive()

> The onus is on you to show that it needs to be more flexible.

Is it though? It seems to me that the onus is on you to show that
this special case is special enough to be given its own unique
existence. It's a bit surprising that the PEP doesn't discuss this
decision at all.
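
Since ':=' currently rejects attribute targets, the get_expensive idea
above has to be spelled with an ordinary assignment statement today.
A sketch of that spelling, with a hypothetical class Report, a call
counter and a placeholder return value added purely so the example runs:

```python
class Report:
    def __init__(self):
        self.expensive = None
        self.calls = 0  # counts how often the real calculation runs

    def calculate_expensive(self):
        self.calls += 1
        return 42  # stand-in for the costly computation

    def get_expensive(self):
        # Same logic as 'self.expensive or <assign>', in statement form.
        if not self.expensive:
            self.expensive = self.calculate_expensive()
        return self.expensive


r = Report()
print(r.get_expensive(), r.get_expensive())  # 42 42
print(r.calls)                               # 1
```

As with the 'or' version, a falsy cached result (0, '', an empty list)
would be recalculated on every call.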

