Ah I hadn't copy-pasted, didn't realize there were two underscores __ at
the beginning of that environment variable.
Exporting that to disable the cache works.
Yes our homes are NFS mounted via AutoFS, so something strange might be
happening there.
I'll look into this, and will also pose the question to Nvidia via the
thread that I have open with them.
Thanks so much for your help!
Also I'll update this thread once I get more info
On Friday, April 29, 2022 at 1:42:05 PM UTC-4 DRC wrote:
> I'm not sure why the shader cache is so slow on your particular machine,
> but if your home directory is NFS-mounted, then that could be part of the
> problem. You can use the __GL_SHADER_DISK_CACHE_PATH environment
> variable to relocate the cache to a local volume. I've never seen this
> particular issue before, and nVidia would be better equipped to explain
> what is happening. Note that there are also xorg.conf settings for
> disabling the shader disk cache, so perhaps enabling one of those settings
> on the 3D X server would fix this system-wide.
> DRC
>
> On 4/29/22 12:36 PM, DRC wrote:
>
> That really seems like a driver bug. Did setting __GL_SHADER_DISK_CACHE=0
> in the environment have any effect? (Note that I typoed below. There
> should be two underscores preceding the environment variable.)
>
> DRC
> On 4/29/22 12:20 PM, Ryan Salomon wrote:
>
> Huh, most of the way there now!!
>
> i removed ~.nv, which then subsequent runs created .cache/ and so on from
> there and still had the issue.
>
> So I renamed that directory, and lo and behold, it works!! Also it works
> upon subsequent runs /within/ that terminal session in xterm, through the
> VNC client.
>
> ...and then I close the xterm, open a new one, and same thing. It doesn't
> work until I recursively remove the .cache folder.
>
>
>
> On Thursday, April 28, 2022 at 9:09:20 PM UTC-4 DRC wrote:
>
>> Seems like it may be an issue with nVidia's shader cache. Try deleting
>> ~/.nv or setting _GL_SHADER_DISK_CACHE=0 in the environment.
>>
>> DRC
>> On 4/28/22 7:52 PM, Ryan Salomon wrote:
>>
>> Hmm, I'm not sure how to do that.
>>
>> All I did was run vglrun +tr glxinfo
>>
>> Ok, I tried running the same command but with strace at the start, and
>> found something interesting during the start of the wait:
>>
>> poll([{fd=4, events=POLLIN}], 1, -1) = 1 ([{fd=4, revents=POLLIN}])
>> recvmsg(4, {msg_name=NULL, msg_namelen=0,
>> msg_iov=[{iov_base="\1\0023\0\0\0\0\0H\2@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
>>
>> iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 32
>> getpid() = 16364
>> getpid() = 16364
>> getpid() = 16364
>> getpid() = 16364
>> getuid() = 37869
>> geteuid() = 37869
>> getgid() = 37869
>> getegid() = 37869
>> getuid() = 37869
>> geteuid() = 37869
>> getgid() = 37869
>> getegid() = 37869
>> stat("/home/username/.nv/GLCache", {st_mode=S_IFDIR|0700, st_size=4096,
>> ...}) = 0
>> mkdir("/home/username/.nv/", 0700) = -1 EEXIST (File exists)
>> mkdir("/home/username/.nv/GLCache", 0700) = -1 EEXIST (File exists)
>> mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/",
>> 0700) = -1 EEXIST (File exists)
>> mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/",
>>
>> 0700) = -1 EEXIST (File exists)
>>
>> open("/home/rsalomon/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/c1b003dd27d07ca9.toc",
>>
>> O_RDWR|O_CLOEXEC) = 15
>>
>>
>> fcntl(15, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_CUR, l_start=0,
>> l_len=1}) = -1 EIO (Input/output error)
>> close(15) = 0
>> getuid() = 37869
>> geteuid() = 37869
>> getgid() = 37869
>> getegid() = 37869
>> getuid() = 37869
>> geteuid() = 37869
>> getgid() = 37869
>> getegid() = 37869
>> stat("/home/username/.nv/GLCache", {st_mode=S_IFDIR|0700, st_size=4096,
>> ...}) = 0
>> mkdir("/home/username/.nv/", 0700) = -1 EEXIST (File exists)
>> mkdir("/home/username/.nv/GLCache", 0700) = -1 EEXIST (File exists)
>> mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/",
>> 0700) = -1 EEXIST (File exists)
>> mkdir("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/",
>>
>> 0700) = -1 EEXIST (File exists)
>> open("/home/username/.nv/GLCache/9f7e9a18c64449cd7a0049bdadc5d015/cf07cf62221389dd/c1b003dd27d07caa.toc",
>>
>> O_RDWR|O_CLOEXEC) = 15
>> fcntl(15, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_CUR, l_start=0,
>> l_len=1}) = -1 EIO (Input/output error)
>>
>> and it repeats from there
>>
>> home directory edited here since this is public
>>
>> On Thursday, April 28, 2022 at 4:49:02 PM UTC-4 DRC wrote:
>>
>> It's interesting that the delay occurs in the body of glXMakeCurrent().
>>> Can you ascertain whether the delay occurs in the backend::makeCurrent()
>>> method (which, when using the GLX back end, is just a wrapper for
>>> glXMakeContextCurrent())?
>>>
>>> DRC
>>> On 4/28/22 2:54 PM, Ryan Salomon wrote:
>>>
>>> Direct rendering says Yes.
>>>
>>> I ran vglrun +tr glxinfo, here's a snippet of the result, with a large
>>> gap that I had added manually by pressing enter when there was the
>>> extremely long delay.
>>> This might not help but it was my first thought in terms of possibly
>>> getting more visibility
>>>
>>> vglrun +tr glxinfo
>>> [VGL 0x88d75800] XOpenDisplay (name=NULL dpy=0x00db5b00(:1.0) ) 2.102852
>>> ms
>>> name of display: :1.0
>>> [VGL 0x88d75800] glXChooseVisual (dpy=0x00db5b00(:1.0) screen=0
>>> attrib_list=[0x0004 0x0008=0x0001 0x0009=0x0001 0x000a=0x0001 0x000c=0x0001
>>> 0x000d=0x0001 0x000e=0x0001 0x000f=0x0001 0x0010=0x0001 0x0011=0x0001
>>> 0x0005 ] glxattribs=[0x000c=0x0001 0x000d=0x0001 0x000e=0x0001
>>> 0x000f=0x0001 0x0010=0x0001 0x0011=0x0001 0x0005=0x0001 0x0008=0x0001
>>> 0x0009=0x0001 0x000a=0x0001 0x8011=0x0001 0x8010=0x0006 ] [VGL] dlopen
>>> (filename=libGLX_nvidia.so.0 flag=1 retval=0x00dd2f30)
>>> [VGL] dlopen (filename=libX11-xcb.so.1 flag=1 retval=0x00e392b0)
>>> [VGL] dlopen (filename=libxcb.so.1 flag=1 retval=0x7fb888d78000)
>>> [VGL] dlopen (filename=libxcb-glx.so.0 flag=1 retval=0x00e39290)
>>> [VGL] dlopen (filename=libGLX_mesa.so.0 flag=1 retval=0x00e4ddc0)
>>> [VGL] dlopen (filename=libGLX_mesa.so.0 flag=258 retval=0x00e4ddc0)
>>> [VGL] dlopen (filename=/usr/lib64/dri/tls/swrast_dri.so flag=258
>>> retval=0x00000000)
>>> [VGL] dlopen (filename=/usr/lib64/dri/swrast_dri.so flag=258
>>> retval=0x00e973b0)
>>> vis=0x00e68f50(0x21) config=0x00e3bdd0(0x79) ) 117.562056 ms
>>> [VGL 0x88d75800] glXChooseFBConfig (dpy=0x00db5b00(:1.0) screen=0
>>> attrib_list=[0x8011=0x0001 0x0008=0x0001 0x0009=0x0001 0x000a=0x0001
>>> 0x0005=0x0000 ] glxattribs=[0x0005=0x0000 0x0008=0x0001 0x0009=0x0001
>>> 0x000a=0x0001 0x8011=0x0001 ] configs[0]=0x00e9dfc0(0x7f)
>>> configs[1]=0x00e9c520(0xab) configs[2]=0x00e66930(0x7b)
>>> configs[3]=0x00e65cc0(0xa7) configs[4]=0x00e5a340(0x83)
>>> configs[5]=0x00de8d90(0xaf) configs[6]=0x00ed1d60(0x89)
>>> configs[7]=0x00ea34c0(0xb5) configs[8]=0x00e37130(0x91)
>>> configs[9]=0x00dd4830(0xbd) configs[10]=0x00e9ef00(0x8b)
>>> configs[11]=0x00ea0ac0(0xb7) configs[12]=0x00e9e1a0(0x93)
>>> configs[13]=0x00e9c700(0xbf) configs[14]=0x00e9ae00(0x97)
>>> configs[15]=0x00e65ea0(0xc3) configs[16]=0x00e65270(0x9b)
>>> configs[17]=0x00e64640(0xc7) configs[18]=0x00e639e0(0x9f)
>>> configs[19]=0x00e62f50(0xcb) configs[20]=0x00e62360(0xa3)
>>> configs[21]=0x00e60a00(0xcf) configs[22]=0x00e5fd00(0x80)
>>> configs[23]=0x00e5f110(0xac) configs[24]=0x00e5e4a0(0x7c)
>>> configs[25]=0x00e5d840(0xa8) configs[26]=0x00e5cbe0(0x84)
>>> configs[27]=0x00e5bd80(0xb0) configs[28]=0x00e5b170(0x8a)
>>> configs[29]=0x00e5a520(0xb6) configs[30]=0x00e59820(0x92)
>>> configs[31]=0x00e39960(0xbe) configs[32]=0x00ea36b0(0x8c)
>>> configs[33]=0x00ea29a0(0xb8) configs[34]=0x00ea1d10(0x94)
>>> configs[35]=0x00e38060(0xc0) configs[36]=0x00e36760(0x98)
>>> configs[37]=0x00de9cb0(0xc4) configs[38]=0x00de8f70(0x9c)
>>> configs[39]=0x00e9ec90(0xc8) configs[40]=0x00e9d0d0(0xa0)
>>> configs[41]=0x00e9b810(0xcc) configs[42]=0x00e614e0(0xa4)
>>> configs[43]=0x00e9bac0(0xd0) *nelements=44 ) 0.370026 ms
>>> [VGL 0x88d75800] glXGetProcAddressARB ((char
>>> *)procName=glXCreateContextAttribsARB [INTERPOSED]) 0.004768 ms
>>> [VGL 0x88d75800] glXCreateContextAttribsARB (dpy=0x00db5b00(:1.0)
>>> config=0x00e9dfc0(0x7f) share_context=0x00000000 direct=1
>>> attribs=[0x2091=0x0004 0x2092=0x0006 0x9126=0x0001 ] [VGL] dlopen
>>> (filename=NULL flag=1 retval=0x7fb888d9b150)
>>> [VGL] dlopen (filename=NULL flag=1 retval=0x7fb888d9b150)
>>> [VGL] dlopen (filename=libdbus-1.so.3 flag=1 retval=0x00f780f0)
>>> [VGL] dlopen (filename=libdrm.so.2 flag=1 retval=0x00e57b70)
>>> [VGL] dlopen (filename=liballocator.so.0 flag=1 retval=0x00000000)
>>> ctx=0x00e80b48 ) 77.847004 ms
>>> [VGL 0x88d75800] glXIsDirect (dpy=0x00db5b00(:1.0) ctx=0x00e80b48
>>> direct=1 ) 0.001192 ms
>>> [VGL 0x88d75800] glXGetVisualFromFBConfig (dpy=0x00db5b00(:1.0)
>>> config=0x00e9dfc0(0x7f) vis=0x0118e810(0x21) ) 0.023842 ms
>>> [VGL 0x88d75800] XCreateWindow (dpy=0x00db5b00(:1.0) parent=0x000003af
>>> x=0 y=0 width=100 height=100 depth=24 c_class=1 visual=0x00dc1e70(0x21)
>>> win=0x02000002 ) 0.020027 ms
>>> [VGL 0x88d75800] glXMakeCurrent (dpy=0x00db5b00(:1.0)
>>> drawable=0x02000002 ctx=0x00e80b48 [VGL] dlopen (filename=liballocator.so.0
>>> flag=1 retval=0x00000000)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> config=0x00e9dfc0(0x7f) drawable=0x00600002 renderer=NVIDIA
>>> A30/PCIe/SSE2 ) 1540881.658077 ms
>>> [VGL 0x88d75800] glXGetProcAddressARB ((char
>>> *)procName=glGetProgramivARB [passed through]) 0.017166 ms
>>> [VGL 0x88d75800] glXGetProcAddressARB ((char *)procName=glGetStringi
>>> [INTERPOSED]) 0.007868 ms
>>> [VGL 0x88d75800] glXGetProcAddressARB ((char
>>> *)procName=glGetConvolutionParameteriv [passed through]) 0.009060 ms
>>> display: :1 screen: 0
>>> [VGL 0x88d75800] glXIsDirect (dpy=0x00db5b00(:1.0) ctx=0x00e80b48
>>> direct=1 ) 0.003099 ms
>>> direct rendering: Yes
>>> server glx vendor string: VirtualGL
>>> server glx version string: 1.4
>>> server glx extensions:
>>>
>>> On Thursday, April 28, 2022 at 10:59:13 AM UTC-4 Ryan Salomon wrote:
>>>
>>>> The version of TurboVNC is TurboVNC Server v2.2.90 (build 20211222)
>>>>
>>>> I had however tried version 3 and had encountered the same issues.
>>>>
>>>> Hmm. So far none of the suggested steps helped.
>>>> Running vglrun glxspheres64, glxgears, etc results in a blank output
>>>> window.
>>>>
>>>> I'm currently trying to run vglrun +tr glxinfo to see if it gives any
>>>> useful output
>>>>
>>>> On Wednesday, April 27, 2022 at 6:34:26 PM UTC-4 DRC wrote:
>>>>
>>>>> Your message ended up in my spam folder for some reason, and I was out
>>>>> of the office last week anyhow. Unfortunately I've never encountered
>>>>> those
>>>>> symptoms, so I have no good ideas. The only symptoms I've observed that
>>>>> are even remotely similar are due to nVidia's HardDPMS feature, which
>>>>> causes 3D applications to run at 1 frame/second with VirtualGL if the
>>>>> screen saver is active on the 3D X server. (Add
>>>>>
>>>>> Option "HardDPMS" "false"
>>>>>
>>>>> under the "Device" or "Screen" section in xorg.conf to work around
>>>>> that issue.)
>>>>>
>>>>> Other shots in the dark:
>>>>>
>>>>> - Double check that 'vglrun -d :0 glxinfo' reports a direct OpenGL
>>>>> context. "Back in the day" (15 years ago), I seem to recall that, on
>>>>> some
>>>>> systems, insufficient 3D X server permissions resulted in an indirect
>>>>> OpenGL context rather than an error. I can't imagine why that would
>>>>> cause
>>>>> a 25-minute delay, but it would almost certainly cause a delay.
>>>>>
>>>>> - Try removing ~/.Xauthority and restarting the system, in case there
>>>>> is some cruft in that file that is causing problems.
>>>>>
>>>>> - Try running 'vglrun /opt/VirtualGL/bin/glxspheres64' instead of
>>>>> 'vglrun glxgears' and observing whether the behavior is different. That
>>>>> may provide a clue.
>>>>>
>>>>> - Try running 'vglrun /opt/VirtualGL/bin/glxspheres64' and then
>>>>> 'vglrun -sp /opt/VirtualGL/bin/glxspheres64' and observing whether the
>>>>> behavior is different. That may also provide a clue.
>>>>>
>>>>> - On the off-hand chance that this is a TurboVNC problem, which
>>>>> version of TurboVNC are you using on the host and the client? Can you
>>>>> provide more details about your network connection? Try requesting a
>>>>> screen refresh from the TurboVNC Server (using Ctrl-Alt-Shift-R or the
>>>>> toolbar button.)
>>>>> - Try setting VGL_PROBEGLX=0 in the environment prior to invoking
>>>>> vglrun, on the off-hand chance that VirtualGL's 2D X server GLX probing
>>>>> (which is unnecessary in a TurboVNC session) is causing issues.
>>>>> (TurboVNC
>>>>> 3.0 will set that environment variable by default.)
>>>>>
>>>>> DRC
>>>>> On 4/27/22 4:17 PM, Ryan Salomon wrote:
>>>>>
>>>>> bump! Anyone have at least an ideal direction to look in for more
>>>>> information?
>>>>>
>>>>> On Thursday, April 21, 2022 at 2:22:10 PM UTC-4 Ryan Salomon wrote:
>>>>>
>>>>>> Hello! I have a Linux GPU server upon which I'm testing logging into
>>>>>> via a TurboVNC session and running 3D apps via VirtualGL, which I've
>>>>>> configured.
>>>>>>
>>>>>> I've tested vglrun -d :0 glxinfo and vglrun -d :0 glxgears , and both
>>>>>> run perfectly fine as the root user, but run fine as me but with an
>>>>>> insaaane 25 minute delay before they output anything.
>>>>>>
>>>>>> When running vglserver_config , I answered No to all of the questions
>>>>>> because this host is sequestered sufficiently from the rest of our
>>>>>> network
>>>>>> (not to discuss security concerns, but to give background of the setup).
>>>>>>
>>>>>> Also for background of the setup, this is a CentOS 7.9 host, that is
>>>>>> AD bound to our ActiveDirectory domain, and my user account that I
>>>>>> tested
>>>>>> with, as well as client accounts, are AD bound.
>>>>>> I add this in case it is at all likely that the delay is from AD
>>>>>> doing something silly.
>>>>>>
>>>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "VirtualGL User Discussion/Support" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/virtualgl-users/fbe4168f-480b-40f3-b061-6d193bdd09d3n%40googlegroups.com
>>>
>>> <https://groups.google.com/d/msgid/virtualgl-users/fbe4168f-480b-40f3-b061-6d193bdd09d3n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> --
>> You received this message because you are subscribed to the Google Groups
>> "VirtualGL User Discussion/Support" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/virtualgl-users/074bade4-49cb-4d2e-b198-0c6c14a1f919n%40googlegroups.com
>>
>> <https://groups.google.com/d/msgid/virtualgl-users/074bade4-49cb-4d2e-b198-0c6c14a1f919n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "VirtualGL User Discussion/Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/virtualgl-users/4db8ba2c-4023-4a12-9bd7-c7b5140838fan%40googlegroups.com
>
> <https://groups.google.com/d/msgid/virtualgl-users/4db8ba2c-4023-4a12-9bd7-c7b5140838fan%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
>
--
You received this message because you are subscribed to the Google Groups
"VirtualGL User Discussion/Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/virtualgl-users/cd4ea137-9013-4453-ae78-8efe37cbf098n%40googlegroups.com.