OK I believe I’ve cracked it. Spurred on by the process tracing I mentioned in
the last message, I looked into the repeated spawning of
zsh -c getconf DARWIN_USER_TEMP_DIR
which occurs with both remote and local start (even more times for the remote
start). I found DARWIN_USER_TEMP_DIR mentioned in cus-start.el, and it is
obviously mac specific. There is the interesting related docstring:
temporary-file-directory is a compiled Lisp function in ‘files.el’.
(temporary-file-directory)
Probably introduced at or before Emacs version 20.3.
The directory for writing temporary files.
In case of a remote ‘default-directory’, this is a directory for
temporary files on that remote host. ...
For a non-remote and non-mounted ‘default-directory’, the value of
the variable ‘temporary-file-directory’ is returned.
So a canonical local vs remote difference, as you mentioned. To investigate
further, first I explicitly set my shell-file-name to “/bin/bash”. This
radically improves things: local and remote are much closer to the same speed
now.
But still, the issue remains, why is DARWIN_USER_TEMP_DIR being queried via
shell-command-to-string so often? This variable is just a startup constant,
taken ONCE from the environment, or from getconf (as a backup in case getenv
fails).
The traceback is revealing:
shell-command-to-string("getconf DARWIN_USER_TEMP_DIR")
(progn (shell-command-to-string "getconf DARWIN_USER_TEMP_DIR"))
(condition-case nil (progn (shell-command-to-string "getconf
DARWIN_USER_TEMP_DIR")) (error nil))
(ignore-errors (shell-command-to-string "getconf DARWIN_USER_TEMP_DIR"))
(let* (file-name-handler-alist (tmp (ignore-errors (shell-command-to-string
"getconf DARWIN_USER_TEMP_DIR")))) (and (stringp tmp) (setq tmp
(replace-regexp-in-string "\n\\'" "" tmp)) (file-directory-p tmp) tmp))
(or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") (let*
(file-name-handler-alist (tmp (ignore-errors (shell-command-to-string "getconf
DARWIN_USER_TEMP_DIR")))) (and (stringp tmp) (setq tmp
(replace-regexp-in-string "\n\\'" "" tmp)) (file-directory-p tmp) tmp)) "/tmp")
(cond ((memq system-type '(ms-dos windows-nt)) (or (getenv "TEMP") (getenv
"TMPDIR") (getenv "TMP") "c:/temp")) ((eq system-type 'darwin) (or (getenv
"TMPDIR") (getenv "TMP") (getenv "TEMP") (let* (file-name-handler-alist (tmp
(ignore-errors (shell-command-to-string "getconf DARWIN_USER_TEMP_DIR")))) (and
(stringp tmp) (setq tmp (replace-regexp-in-string "\n\\'" "" tmp))
(file-directory-p tmp) tmp)) "/tmp")) (t (or (getenv "TMPDIR") (getenv "TMP")
(getenv "TEMP") "/tmp")))
(file-name-as-directory (cond ((memq system-type '(ms-dos windows-nt)) (or
(getenv "TEMP") (getenv "TMPDIR") (getenv "TMP") "c:/temp")) ((eq system-type
'darwin) (or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") (let*
(file-name-handler-alist (tmp (ignore-errors ...))) (and (stringp tmp) (setq
tmp (replace-regexp-in-string "\n\\'" "" tmp)) (file-directory-p tmp) tmp))
"/tmp")) (t (or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") "/tmp"))))
eval((file-name-as-directory (cond ((memq system-type '(ms-dos windows-nt))
(or (getenv "TEMP") (getenv "TMPDIR") (getenv "TMP") "c:/temp")) ((eq
system-type 'darwin) (or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") (let*
(file-name-handler-alist (tmp ...)) (and (stringp tmp) (setq tmp ...)
(file-directory-p tmp) tmp)) "/tmp")) (t (or (getenv "TMPDIR") (getenv "TMP")
(getenv "TEMP") "/tmp")))))
tramp-sh-handle-expand-file-name("/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..."
nil)
apply(tramp-sh-handle-expand-file-name
("/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil))
tramp-sh-file-name-handler(expand-file-name
"/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil)
apply(tramp-sh-file-name-handler expand-file-name
("/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil))
tramp-file-name-handler(expand-file-name
"/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil)
Stepping through `tramp-sh-handle-expand-file-name', I noticed that, after the
main SSH connection is established, every other invocation of
`tramp-compat-temporary-file-directory’ spawns a new shell with getconf
DARWIN_USER_TEMP_DIR.
And, at long last, HERE we see the problem, in
`tramp-compat-temporary-file-directory':
(eval (car (get 'temporary-file-directory 'standard-value)) t))
Why is this an issue? That standard value of `temporary-file-directory'
contains, for 'darwin systems:
(shell-command-to-string \"getconf DARWIN_USER_TEMP_DIR\"))))
So, to recap my understanding of this issue:
During file completion, tramp requests the temporary file directory many, many
times, using its own compat function.
Because on remote hosts, the function `temporary-file-directory' returns a
*remote* directory, tramp instead evaluates the standard value of the variable
`temporary-file-directory' (defined in cus-start).
For 'darwin systems, that standard value includes a shell command call to
retrieve DARWIN_USER_TEMP_DIR.
Tramp thus evals code which spawns a shell to getconf DARWIN_USER_TEMP_DIR
hundreds of times per file completion.
Some shells are slower than others to start. For me zsh is much slower than
bash (due to my config).
[not really relevant] For some unexplained reasons, starting zsh in a remote
directory leads to faster local shell startups. Possibly environment variable
setting is the culprit.
One thing I don’t understand is why tramp needs to re-evaluate the standard
value of the variable `temporary-file-directory’ at all. That variable doesn’t
appear to change on remote hosts, or ever really. What does change is the
returned result of the identically-name function `temporary-file-directory’.
A quick fix for me was simply to (see FIXED timing, below):
(defsubst tramp-compat-temporary-file-directory ()
"Return name of directory for temporary files.
It is the default value of `temporary-file-directory'."
temporary-file-directory)
With this, my tramp file browsing is blazingly fast, whether starting local or
remote, with Emacs 27.2 and tramp 2.4.3-pre. Some timing, from my earlier test:
Normal Tramp Local, zsh: 11.2 seconds
Normal Tramp Remote, zsh: 0.88s
Normal Tramp Local, bash: 1.48 seconds
Normal Tramp Remote, bash: 0.80s
Fixed Tramp Local, zsh: 0.76 seconds
Fixed Tramp Remote, zsh: 0.85 seconds
Fixed Tramp Local, bash: 0.74 seconds
Fixed Tramp Remote, bash: 0.80 seconds
I note that this remains an issue in Emacs 28.1 and Tramp v2.5.1.2. I would
not be at all surprised if this also underlies many of the complaints about
tramp performance on MacOS (e.g.
https://emacs.stackexchange.com/questions/16489/tramp-is-unbearably-slow-osx-ssh
<https://emacs.stackexchange.com/questions/16489/tramp-is-unbearably-slow-osx-ssh>),
for people who have slow-to-start shells.
Thanks for your help with this (and as always, for Tramp, which is unique and
incredibly useful).
Best,
JDS
> On Sep 20, 2021, at 2:31 PM, JD Smith <[email protected]> wrote:
>
>>
>> Furthermore, I have switched to a remote machine in my LAN, because it
>> doesn't matter whether the machine is far away. So I have changed the
>> code snippet to
>>
>> (progn
>> (require 'tramp-sh)
>> (defun my/test ()
>> (interactive)
>> (tramp-cleanup-all-connections)
>> (elp-instrument-list
>> (append
>> '(completion-file-name-table
>> completion--sifn-requote
>> completion--file-name-table
>> completing-read
>> read-file-name-internal
>> read-file-name-default)
>> (mapcar #'intern (all-completions "tramp-handle-" obarray 'functionp))
>> (mapcar #'intern (all-completions "tramp-sh-handle-" obarray
>> 'functionp))))
>> (setq unread-command-events
>> (mapcar 'identity "///ssh:detlef:/hom\talbin\tsr\temacs\t/lisp/\t\n"))
>> (call-interactively #'find-file)
>> (let ((elp-recycle-buffers-p nil))
>> (elp-results)
>> (elp-reset-all)))
>>
>> (dolist (df '("/ssh::" "/"))
>> (let ((default-directory df))
>> (my/test))))
>
> Thanks for this. The final bit of your updated test unfortunately does not
> work for me, as it tries and fails to open a connection to my local machine
> for some reason (“Could not resolve hostname”).
>
> But just running my/test from *scratch*, and then again from a dired buffer
> on the remote host (another server this time, also wired), yields the
> attached results, in graphical form this time. As you can see, in the worst
> case, the performance per call is >100x worse.
>
> One additional bit of data: on my MacOS terminal, when a running process
> (here emacs -Q) spawns any sub-processes, the name of the sub-process appears
> in the terminal window title. I have noticed that, while performing our test
> from *scratch*, the title flashes wildly for the full ~9 seconds of the test,
> mentioning “launchctl” and “date” and others, over and over. These appear
> perhaps a dozen or more times. Conversely, when running my/test from the
> remote dired buffer, it is updated almost not at all (maybe “gzip” for the
> final file transfer?... it goes by fast).
>
> So I’d conjecture the root cause of the discrepancy here is that there’s some
> real difference in how and how often external processes are being spawned,
> between local and remote starting points.
>
> To check into this, I advised call-process, to report on which processes are
> being made. But this looks… identical, except the fast version spawns my zsh
> shell to collect DARWIN_USER_TEMP_DIR even more often (134 times!)! Somehow
> the processes have to be involved though...
>
> <tramp_local_remote.pdf>
>
>
>
>
>