So far I don't know of any simple way to trigger the bug. The best I have is running the R script

    library(rgl)
    quartz()
    rgl.open()

in R running in the terminal. That requires R and rgl to be installed, so I wouldn't call it simple at all, but here are the instructions:
1. You need a build of R that supports debugging, as described here: https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#I-cannot-attach-debugger-to-R.
2. You need to install rgl from the "quartzbug" branch on Github, because the standard version crashes when the glXMakeCurrent call fails, and the master branch version has an ugly workaround. To do that, run this in R:

    install.packages(c("remotes", "rgl"))
    remotes::install_github("dmurdoch/rgl", "quartzbug")

This installs the released version first because that's the quickest way to install the dependencies, then installs the patched version.

Duncan Murdoch

On 25/02/2021 12:31 a.m., Jeremy Huddleston Sequoia wrote:

Yeah, gldAttachDrawable does gpuiReleaseDrawable, IOAccelGLContextClearDrawable, and then returns the error on its error-out paths.

gldAttachDrawable takes the drawable type as the second argument. The types are:

none (0)
pbuffer (90)
window (80)
offscreen (53)
fullscreen (54)

When gldAttachDrawable() is called with type none, it looks to me like it should just do its error-out path (release / clear / return error). I don't see how it would call gfxIODataBindSurface in that case. Of course, I'm not necessarily looking at the exact same implementation as is on your system. Can you provide the output of `image list` from lldb, so I can see the UUIDs of the various dylibs to look at the exact source version?

In any event, we need to figure out why we're getting a type of none into gldAttachDrawable.

gliAttachDrawableWithOptions takes a type and passes it straight through, so it's not that.

CGLSetSurface just takes a context, connection id, window ID, and surface ID.

The context is passed straight from the input to xp_attach_gl_context, and the xp_surface_id input to xp_attach_gl_context maps to the wid / sid passed to CGLSetSurface.

---

Would you be able to reduce this issue to a very small X11 + GLX application that I could use to debug deeper myself?

On Feb 24, 2021, at 12:38, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

On 24/02/2021 3:10 p.m., Duncan Murdoch wrote:

The only call it makes is to libGFXShared.dylib`gfxIODataBindSurface, and when it returns from that it jumps to the error exit. Inside that function, it checks whether a pointer is non-null, then uses it to jump to libGPUSupportMercury.dylib`gldAttachDrawable. In gldAttachDrawable, it looks like it is detecting something wrong, then it calls gpuiReleaseDrawable, IOAccelGLContextClearDrawable, and then returns the 0x2715 = 10005 = kCGLBadDrawable value. I don't have the source (do I?), and I don't know the argument passing conventions.
Can you tell me how type would be passed in?

I've found some info that says the 2nd argument would be passed in RSI. If that's the case, then what I'm seeing is the following:

- When things are working properly, the value 0x50 = 80 is passed in several times.
- After calling quartz(), the value is 0, and the error is triggered.

Duncan Murdoch

I don't think we get to either of the other functions.

Duncan Murdoch

On 24/02/2021 12:59 p.m., Jeremy Huddleston Sequoia wrote:

IOAccelGLContextClearDrawable is called on the error-out path of that function, so yeah, we need to see how we got there.

enum32_t gldAttachDrawable(GLDContext ctx, enum32_t type, const GLDDrawable drawable, bitfield32_t options, GLTDimensions *size_ret)

Can you tell me what the type is here? Do we get to IOAccelGLContextSetDrawable()? If so, what does it return? Do we get to gpulUpdateDrawableDepth()? If so, what does it return?

On Feb 24, 2021, at 09:46, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

Yes, I did get it wrong. It looks like the error was detected before the call to IOAccelGLContextClearDrawable and the stack checking. I'll see if I can figure out where.

Duncan Murdoch

On 24/02/2021 12:11 p.m., Duncan Murdoch wrote:

I don't see any calls to __stack_chk_fail. It's possible I misinterpreted what was going on after the IOAccelGLContextClearDrawable call. I'll take another look.

Duncan Murdoch

On 24/02/2021 11:41 a.m., Jeremy Huddleston Sequoia wrote:

__stack_chk_guard is part of stack protector. If it's not liking the value in __stack_chk_guard, it means the stack was smashed. When this is detected, the compiler runtime should call __stack_chk_fail() if implemented or abort if not.
Given that we're not crashing, I wonder if there's a handler somewhere that ends up causing us to return the bad value instead of crashing.

Can you break on __stack_chk_fail and see if that gives us anything useful?

On Feb 24, 2021, at 06:26, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

Tracing in with lldb, it appears to be this sequence of calls leading to the 10005 error value:

  * frame #0: 0x00007fff5afc19e0 libGPUSupportMercury.dylib`gldAttachDrawable + 1
    frame #1: 0x00007fff4467f396 GLEngine`gliAttachDrawableWithOptions + 251
    frame #2: 0x00007fff4465d9f5 OpenGL`___lldb_unnamed_symbol40$$OpenGL + 972
    frame #3: 0x00007fff446618e2 OpenGL`___lldb_unnamed_symbol59$$OpenGL + 82
    frame #4: 0x00007fff44661c29 OpenGL`CGLSetSurface + 330
    frame #5: 0x00007fff70c6ca63 libXplugin.1.dylib`xp_attach_gl_context + 95
    frame #6: 0x0000000108590dee libGL.1.dylib`surface_make_current + 206
    frame #7: 0x000000010858df6a libGL.1.dylib`apple_glx_make_current_context + 1274
    frame #8: 0x0000000108574579 libGL.1.dylib`applegl_bind_context + 185
    frame #9: 0x000000010856237e libGL.1.dylib`MakeContextCurrent + 414
    frame #10: 0x00000001085621d9 libGL.1.dylib`glXMakeCurrent + 41

The libGPUSupportMercury.dylib`gldAttachDrawable function calls IOAccelGLContextClearDrawable then does some sort of check of __stack_chk_guard and doesn't like what it sees, and sets the error. Does this give any hint about what's wrong, or a way to fix it?

Duncan Murdoch

On 23/02/2021 4:31 p.m., Duncan Murdoch wrote:

On 23/02/2021 3:47 p.m., Jeremy Huddleston Sequoia wrote:

On Feb 23, 2021, at 06:14, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

On 23/02/2021 12:47 a.m., Jeremy Huddleston Sequoia wrote:

xp_attach_gl_context is in libXplugin (check Xplugin.h in the SDK).

On Feb 22, 2021, at 14:38, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

I've made a little bit of progress.

The message "error: xp_attach_gl_context returned: 2" comes from the Mesa routine surface_make_current, which calls xp_attach_gl_context.

I haven't found where xp_attach_gl_context is defined.

2 is XP_BadValue, which is returned if cgl_ctx is NULL... so I'd suggest looking into why mesa is calling xp_attach_gl_context with a NULL context.

Thanks, that's helpful. The context is not NULL, so I need to think of other ways it could be "bad".

Ok, well xp_attach_gl_context is just a wrapper around CGLSetSurface(), which is an internal function to do exactly what we're trying to do here. If it returns any error, xp_attach_gl_context returns bad value.

Are you able to capture this in the debugger and figure out what the return value from CGLSetSurface() is?
That will tell us what the underlying CGLError is, which might help shed some light on this.

I believe it's returning 0x0000000000002715 when there's an error.

That's 10005, kCGLBadDrawable. So now I need to find out what happened to the drawable. This feels like progress! Thanks again.

Duncan

Here's what I see with LIBGL_DIAGNOSTIC=1. For a successful open,

rgl.open()
function is no-op
Debug ../src/glx/apple/apple_glx_context.c:205 apple_glx_create_context(4295810496): apple_glx_create_context: ac 0x100a10a00 ac->context_obj 0x107cdce00
2021-02-23 08:23:00.041711-0500 R[45754:1283995] apple_glx_create_context: ac 0x100a10a00 ac->context_obj 0x107cdce00
Debug ../src/glx/apple/apple_glx_drawable.c:342 apple_glx_drawable_create(4295810496): apple_glx_drawable_create: new drawable 0x107ce0e00
2021-02-23 08:23:00.042235-0500 R[45754:1283995] apple_glx_drawable_create: new drawable 0x107ce0e00
Debug ../src/glx/apple/apple_glx_surface.c:154 create_surface(4295810496): create_surface: created a surface for drawable 0x600066 with uid 621
2021-02-23 08:23:00.044773-0500 R[45754:1283995] create_surface: created a surface for drawable 0x600066 with uid 621
Debug ../src/glx/apple/apple_glx_surface.c:69 surface_make_current(4295810496): surface_make_current: ac->context_obj 0x107cdce00 s->surface_id 9
2021-02-23 08:23:00.044839-0500 R[45754:1283995] surface_make_current: ac->context_obj 0x107cdce00 s->surface_id 9
Debug ../src/glx/apple/apple_glx_surface.c:89 surface_make_current(4295810496): surface_make_current: drawable 0x600066
2021-02-23 08:23:00.045680-0500 R[45754:1283995] surface_make_current: drawable 0x600066
...
(more lines deleted)

After I run quartz(), I see this:

rgl.open()
Debug ../src/glx/apple/apple_glx_context.c:205 apple_glx_create_context(4295810496): apple_glx_create_context: ac 0x10262bb00 ac->context_obj 0x1058c4800
2021-02-23 08:23:35.666675-0500 R[45754:1283995] apple_glx_create_context: ac 0x10262bb00 ac->context_obj 0x1058c4800
Debug ../src/glx/apple/apple_glx_drawable.c:342 apple_glx_drawable_create(4295810496): apple_glx_drawable_create: new drawable 0x107648000
2021-02-23 08:23:35.667040-0500 R[45754:1283995] apple_glx_drawable_create: new drawable 0x107648000
Debug ../src/glx/apple/apple_glx_surface.c:154 create_surface(4295810496): create_surface: created a surface for drawable 0x6000c9 with uid 629
2021-02-23 08:23:35.669119-0500 R[45754:1283995] create_surface: created a surface for drawable 0x6000c9 with uid 629
Debug ../src/glx/apple/apple_glx_surface.c:69 surface_make_current(4295810496): surface_make_current: ac->context_obj 0x1058c4800 s->surface_id 13
2021-02-23 08:23:35.669195-0500 R[45754:1283995] surface_make_current: ac->context_obj 0x1058c4800 s->surface_id 13
error: xp_attach_gl_context returned: 2
Debug ../src/glx/applegl_glx.c:60 applegl_bind_context(4295810496): applegl_bind_context: error YES
2021-02-23 08:23:35.669834-0500 R[45754:1283995] applegl_bind_context: error YES

and then I get my own messages from the failure of glXMakeCurrent().

As far as I can see, everything appears fine until the call to xp_attach_gl_context. Everything looks very similar up to the failure of xp_attach_gl_context. Any idea why the value returned a few lines earlier from apple_glx_create_context() should be a bad value?

Duncan Murdoch

_______________________________________________
Xquartz-dev mailing list
Xquartz-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/xquartz-dev