OK I believe I’ve cracked it.  Spurred on by the process tracing I mentioned in 
the last message, I looked into the repeated spawning of 

zsh -c getconf DARWIN_USER_TEMP_DIR 

which occurs with both remote and local start (even more times for the remote 
start).  I found DARWIN_USER_TEMP_DIR mentioned in cus-start.el, and it is 
obviously mac specific.  There is the interesting related docstring:

temporary-file-directory is a compiled Lisp function in ‘files.el’.

(temporary-file-directory)

  Probably introduced at or before Emacs version 20.3.

The directory for writing temporary files.
In case of a remote ‘default-directory’, this is a directory for
temporary files on that remote host.  ...
For a non-remote and non-mounted ‘default-directory’, the value of
the variable ‘temporary-file-directory’ is returned.

So a canonical local vs remote difference, as you mentioned.  To investigate 
further, first I explicitly set my shell-file-name to “/bin/bash”.  This 
radically improves things: local and remote are much closer to the same speed 
now.  

But still, the issue remains, why is DARWIN_USER_TEMP_DIR being queried via 
shell-command-to-string so often?  This variable is just a startup constant, 
taken ONCE from the environment, or from getconf (as a backup in case getenv 
fails).  

The traceback is revealing:

  shell-command-to-string("getconf DARWIN_USER_TEMP_DIR")
  (progn (shell-command-to-string "getconf DARWIN_USER_TEMP_DIR"))
  (condition-case nil (progn (shell-command-to-string "getconf 
DARWIN_USER_TEMP_DIR")) (error nil))
  (ignore-errors (shell-command-to-string "getconf DARWIN_USER_TEMP_DIR"))
  (let* (file-name-handler-alist (tmp (ignore-errors (shell-command-to-string 
"getconf DARWIN_USER_TEMP_DIR")))) (and (stringp tmp) (setq tmp 
(replace-regexp-in-string "\n\\'" "" tmp)) (file-directory-p tmp) tmp))
  (or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") (let* 
(file-name-handler-alist (tmp (ignore-errors (shell-command-to-string "getconf 
DARWIN_USER_TEMP_DIR")))) (and (stringp tmp) (setq tmp 
(replace-regexp-in-string "\n\\'" "" tmp)) (file-directory-p tmp) tmp)) "/tmp")
  (cond ((memq system-type '(ms-dos windows-nt)) (or (getenv "TEMP") (getenv 
"TMPDIR") (getenv "TMP") "c:/temp")) ((eq system-type 'darwin) (or (getenv 
"TMPDIR") (getenv "TMP") (getenv "TEMP") (let* (file-name-handler-alist (tmp 
(ignore-errors (shell-command-to-string "getconf DARWIN_USER_TEMP_DIR")))) (and 
(stringp tmp) (setq tmp (replace-regexp-in-string "\n\\'" "" tmp)) 
(file-directory-p tmp) tmp)) "/tmp")) (t (or (getenv "TMPDIR") (getenv "TMP") 
(getenv "TEMP") "/tmp")))
  (file-name-as-directory (cond ((memq system-type '(ms-dos windows-nt)) (or 
(getenv "TEMP") (getenv "TMPDIR") (getenv "TMP") "c:/temp")) ((eq system-type 
'darwin) (or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") (let* 
(file-name-handler-alist (tmp (ignore-errors ...))) (and (stringp tmp) (setq 
tmp (replace-regexp-in-string "\n\\'" "" tmp)) (file-directory-p tmp) tmp)) 
"/tmp")) (t (or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") "/tmp"))))
  eval((file-name-as-directory (cond ((memq system-type '(ms-dos windows-nt)) 
(or (getenv "TEMP") (getenv "TMPDIR") (getenv "TMP") "c:/temp")) ((eq 
system-type 'darwin) (or (getenv "TMPDIR") (getenv "TMP") (getenv "TEMP") (let* 
(file-name-handler-alist (tmp ...)) (and (stringp tmp) (setq tmp ...) 
(file-directory-p tmp) tmp)) "/tmp")) (t (or (getenv "TMPDIR") (getenv "TMP") 
(getenv "TEMP") "/tmp")))))
  
tramp-sh-handle-expand-file-name("/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..."
 nil)
  apply(tramp-sh-handle-expand-file-name 
("/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil))
  tramp-sh-file-name-handler(expand-file-name 
"/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil)
  apply(tramp-sh-file-name-handler expand-file-name 
("/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil))
  tramp-file-name-handler(expand-file-name 
"/ssh:tir:/home/jdsmith/code/python/scraps/Untitled..." nil)

Stepping through `tramp-sh-handle-expand-file-name', I noticed that, after the 
main SSH connection is established, every other invocation of 
`tramp-compat-temporary-file-directory’ spawns a new shell with getconf 
DARWIN_USER_TEMP_DIR.  

And, at long last, HERE we see the problem, in 
`tramp-compat-temporary-file-directory':

(eval (car (get 'temporary-file-directory 'standard-value)) t))

Why is this an issue?  That standard value of `temporary-file-directory' 
contains, for 'darwin systems:

(shell-command-to-string \"getconf DARWIN_USER_TEMP_DIR\"))))

So, to recap my understanding of this issue:

During file completion, tramp requests the temporary file directory many, many 
times, using its own compat function.
Because on remote hosts, the function `temporary-file-directory' returns a 
*remote* directory, tramp instead evaluates the standard value of the variable 
`temporary-file-directory' (defined in cus-start). 
For 'darwin systems, that standard value includes a shell command call to 
retrieve DARWIN_USER_TEMP_DIR.  
Tramp thus evals code which spawns a shell to getconf DARWIN_USER_TEMP_DIR 
hundreds of times per file completion. 
Some shells are slower than others to start.  For me zsh is much slower than 
bash (due to my config).
[not really relevant] For some unexplained reasons, starting zsh in a remote 
directory leads to faster local shell startups. Possibly environment variable 
setting is the culprit.

One thing I don’t understand is why tramp needs to re-evaluate the standard 
value of the variable `temporary-file-directory’ at all.  That variable doesn’t 
appear to change on remote hosts, or ever really.  What does change is the 
returned result of the identically-name function `temporary-file-directory’.  

A quick fix for me was simply to (see FIXED timing, below):

(defsubst tramp-compat-temporary-file-directory ()
  "Return name of directory for temporary files.
It is the default value of `temporary-file-directory'."
  temporary-file-directory)

With this, my tramp file browsing is blazingly fast, whether starting local or 
remote, with Emacs 27.2 and tramp 2.4.3-pre.  Some timing, from my earlier test:

Normal Tramp Local, zsh: 11.2 seconds
Normal Tramp Remote, zsh: 0.88s
Normal Tramp Local, bash: 1.48 seconds
Normal Tramp Remote, bash: 0.80s
Fixed Tramp Local, zsh: 0.76 seconds
Fixed Tramp Remote, zsh: 0.85 seconds
Fixed Tramp Local, bash: 0.74 seconds
Fixed Tramp Remote, bash: 0.80 seconds

I note that this remains an issue in Emacs 28.1 and Tramp v2.5.1.2.  I would 
not be at all surprised if this also underlies many of the complaints about 
tramp performance on MacOS (e.g. 
https://emacs.stackexchange.com/questions/16489/tramp-is-unbearably-slow-osx-ssh
 
<https://emacs.stackexchange.com/questions/16489/tramp-is-unbearably-slow-osx-ssh>),
 for people who have slow-to-start shells. 

Thanks for your help with this (and as always, for Tramp, which is unique and 
incredibly useful).

Best,
JDS


> On Sep 20, 2021, at 2:31 PM, JD Smith <[email protected]> wrote:
> 
>> 
>> Furthermore, I have switched to a remote machine in my LAN, because it
>> doesn't matter whether the machine is far away. So I have changed the
>> code snippet to
>> 
>> (progn
>>  (require 'tramp-sh)
>>  (defun my/test ()
>>    (interactive)
>>    (tramp-cleanup-all-connections)
>>    (elp-instrument-list
>>     (append
>>      '(completion-file-name-table
>>      completion--sifn-requote
>>      completion--file-name-table
>>      completing-read
>>      read-file-name-internal
>>      read-file-name-default)
>>      (mapcar #'intern (all-completions "tramp-handle-" obarray 'functionp))
>>      (mapcar #'intern (all-completions "tramp-sh-handle-" obarray 
>> 'functionp))))
>>    (setq unread-command-events
>>        (mapcar 'identity "///ssh:detlef:/hom\talbin\tsr\temacs\t/lisp/\t\n"))
>>    (call-interactively #'find-file)
>>    (let ((elp-recycle-buffers-p nil))
>>      (elp-results)
>>      (elp-reset-all)))
>> 
>>  (dolist (df '("/ssh::" "/"))
>>    (let ((default-directory df))
>>      (my/test))))
> 
> Thanks for this.  The final bit of your updated test unfortunately does not 
> work for me, as it tries and fails to open a connection to my local machine 
> for some reason (“Could not resolve hostname”).
> 
> But just running my/test from *scratch*, and then again from a dired buffer 
> on the remote host (another server this time, also wired), yields the 
> attached results, in graphical form this time.  As you can see, in the worst 
> case, the performance per call is >100x worse.
> 
> One additional bit of data: on my MacOS terminal, when a running process 
> (here emacs -Q) spawns any sub-processes, the name of the sub-process appears 
> in the terminal window title.  I have noticed that, while performing our test 
> from *scratch*, the title flashes wildly for the full ~9 seconds of the test, 
> mentioning “launchctl” and “date” and others, over and over.  These appear 
> perhaps a dozen or more times.  Conversely, when running my/test from the 
> remote dired buffer, it is updated almost not at all (maybe “gzip” for the 
> final file transfer?... it goes by fast).
> 
> So I’d conjecture the root cause of the discrepancy here is that there’s some 
> real difference in how and how often external processes are being spawned, 
> between local and remote starting points.  
> 
> To check into this, I advised call-process, to report on which processes are 
> being made.  But this looks… identical, except the fast version spawns my zsh 
> shell to collect DARWIN_USER_TEMP_DIR even more often (134 times!)!  Somehow 
> the processes have to be involved though...
> 
> <tramp_local_remote.pdf>
> 
> 
> 
> 
> 

Reply via email to