I asked the developer of the R quartz() device code for some help, and he thinks it's related to a conflict between event loops.

My bug only occurs when R is running in the terminal. Before starting rgl, R has no Cocoa event loop. I imagine XQuartz starts one when I initialize things. Then the quartz() device initializes Cocoa and starts its own event loop, and somehow the two event loops clash.

When R is running its GUI R.app, it already has an event loop and the bug doesn't happen.

I have a workaround: making sure quartz() starts first. I think I'll be satisfied with that.

Duncan Murdoch




On 25/02/2021 12:31 a.m., Jeremy Huddleston Sequoia wrote:
Yeah, gldAttachDrawable does gpuiReleaseDrawable, IOAccelGLContextClearDrawable, and then returns the error on its error-out paths.

gldAttachDrawable takes the drawable type as the second argument. The types are:

none (0)
pbuffer  (90)
window (80)
offscreen (53)
fullscreen (54)
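
As a rough sketch of how the raw values seen in the debugger map onto the table above: the enumerator and function names below are hypothetical (I'm not quoting the real header); only the numeric values come from this list.

```c
#include <stdio.h>

/* Hypothetical names; only the numeric values are taken from the list above. */
enum sketch_drawable_type {
    SKETCH_DRAWABLE_NONE       = 0,
    SKETCH_DRAWABLE_OFFSCREEN  = 53,
    SKETCH_DRAWABLE_FULLSCREEN = 54,
    SKETCH_DRAWABLE_WINDOW     = 80,  /* 0x50 */
    SKETCH_DRAWABLE_PBUFFER    = 90
};

/* Translate a raw value (for example, one read from a register) to a label. */
static const char *sketch_drawable_type_name(unsigned int type)
{
    switch (type) {
    case SKETCH_DRAWABLE_NONE:       return "none";
    case SKETCH_DRAWABLE_OFFSCREEN:  return "offscreen";
    case SKETCH_DRAWABLE_FULLSCREEN: return "fullscreen";
    case SKETCH_DRAWABLE_WINDOW:     return "window";
    case SKETCH_DRAWABLE_PBUFFER:    return "pbuffer";
    default:                         return "unknown";
    }
}

/* Print a raw value in decimal and hex next to its label. */
static void sketch_print_drawable_type(unsigned int type)
{
    printf("type %u (0x%x) = %s\n", type, type, sketch_drawable_type_name(type));
}
```

Calling sketch_print_drawable_type() on a value read in lldb gives the label directly; anything outside the five listed values is reported as "unknown".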


When gldAttachDrawable() is called with type none, it looks to me like it should just take its error-out path (release / clear / return error). I don't see how it would call gfxIODataBindSurface in that case. Of course, I'm not necessarily looking at the exact same implementation as is on your system. Can you provide the output of `image list` from lldb, so I can see the UUIDs of the various dylibs and look at the exact source version?

In any event, we need to figure out why we're getting a type of none into gldAttachDrawable.

gliAttachDrawableWithOptions takes a type and passes it straight through, so it's not that.

CGLSetSurface just takes a context, connection id, window ID, and surface ID.

The context is passed straight from the input to xp_attach_gl_context, and the xp_surface_id input to xp_attach_gl_context maps to the wid / sid passed to CGLSetSurface.

---

Would you be able to reduce this issue to a very small X11 + GLX application that I could use to debug deeper myself?

On Feb 24, 2021, at 12:38, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

On 24/02/2021 3:10 p.m., Duncan Murdoch wrote:
The only call it makes is to libGFXShared.dylib`gfxIODataBindSurface,
and when it returns from that it jumps to the error exit.
Inside that function, it checks whether a pointer is non-null, then uses
it to jump to libGPUSupportMercury.dylib`gldAttachDrawable.
In gldAttachDrawable, it looks like it is detecting something wrong,
then it calls gpuiReleaseDrawable, IOAccelGLContextClearDrawable, and
then returns the 0x2715 = 10005 = kCGLBadDrawable value.
I don't have the source (do I?), and I don't know the argument passing
conventions.  Can you tell me how type would be passed in?

I've found some info that says the 2nd argument would be passed in RSI.  If that's the case, then what I'm seeing is the following:

- When things are working properly, the value 0x50 = 80 is passed in several times.

- After calling quartz(), the value is 0, and the error is triggered.

Duncan Murdoch

I don't think we get to either of the other functions.
Duncan Murdoch
On 24/02/2021 12:59 p.m., Jeremy Huddleston Sequoia wrote:
IOAccelGLContextClearDrawable is called on the error-out path of that function, so yeah, we need to see how we got there.

enum32_t gldAttachDrawable(GLDContext ctx, enum32_t type, const GLDDrawable drawable, bitfield32_t options, GLTDimensions *size_ret)

Can you tell me what the type is here?

Do we get to IOAccelGLContextSetDrawable()?  If so, what does it return?
Do we get to gpulUpdateDrawableDepth()?  If so, what does it return?


On Feb 24, 2021, at 09:46, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

Yes, I did get it wrong.  It looks like the error was detected before the call to IOAccelGLContextClearDrawable and the stack checking.  I'll see if I can figure out where.

Duncan Murdoch

On 24/02/2021 12:11 p.m., Duncan Murdoch wrote:
I don't see any calls to __stack_chk_fail .  It's possible I
misinterpreted what was going on after the IOAccelGLContextClearDrawable
call.  I'll take another look.
Duncan Murdoch
On 24/02/2021 11:41 a.m., Jeremy Huddleston Sequoia wrote:
__stack_chk_guard is part of stack protector.

If it's not liking the value in __stack_chk_guard, it means the stack
was smashed.

When this is detected, the compiler runtime should
call __stack_chk_fail() if implemented or abort if not.  Given that
we're not crashing, I wonder if there's a handler somewhere that ends up
causing us to return the bad value instead of crashing.

Can you break on __stack_chk_fail and see if that gives us anything useful?




On Feb 24, 2021, at 06:26, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

Tracing in with lldb, it appears to be this sequence of calls leading
to the 10005 error value:

r
  * frame #0: 0x00007fff5afc19e0 libGPUSupportMercury.dylib`gldAttachDrawable + 1
    frame #1: 0x00007fff4467f396 GLEngine`gliAttachDrawableWithOptions + 251
    frame #2: 0x00007fff4465d9f5 OpenGL`___lldb_unnamed_symbol40$$OpenGL + 972
    frame #3: 0x00007fff446618e2 OpenGL`___lldb_unnamed_symbol59$$OpenGL + 82
    frame #4: 0x00007fff44661c29 OpenGL`CGLSetSurface + 330
    frame #5: 0x00007fff70c6ca63 libXplugin.1.dylib`xp_attach_gl_context + 95
    frame #6: 0x0000000108590dee libGL.1.dylib`surface_make_current + 206
    frame #7: 0x000000010858df6a libGL.1.dylib`apple_glx_make_current_context + 1274
    frame #8: 0x0000000108574579 libGL.1.dylib`applegl_bind_context + 185
    frame #9: 0x000000010856237e libGL.1.dylib`MakeContextCurrent + 414
    frame #10: 0x00000001085621d9 libGL.1.dylib`glXMakeCurrent + 41


The libGPUSupportMercury.dylib`gldAttachDrawable function calls

IOAccelGLContextClearDrawable

then does some sort of check of __stack_chk_guard and doesn't like
what it sees, and sets the error.

Does this give any hint about what's wrong, or a way to fix it?

Duncan Murdoch



On 23/02/2021 4:31 p.m., Duncan Murdoch wrote:
On 23/02/2021 3:47 p.m., Jeremy Huddleston Sequoia wrote:


On Feb 23, 2021, at 06:14, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

On 23/02/2021 12:47 a.m., Jeremy Huddleston Sequoia wrote:
On Feb 22, 2021, at 14:38, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

I've made a little bit of progress.

The message "error: xp_attach_gl_context returned: 2" comes from the Mesa routine surface_make_current, which calls xp_attach_gl_context.
  I haven't found where xp_attach_gl_context is defined.
xp_attach_gl_context is in libXplugin (check Xplugin.h in the SDK).
2 is XP_BadValue, which is returned if cgl_ctx is NULL... so I'd
suggest looking into why mesa is calling xp_attach_gl_context with a
NULL context.

Thanks, that's helpful.  The context is not NULL, so I need to think
of other ways it could be "bad".

Ok, well xp_attach_gl_context is just a wrapper around CGLSetSurface(), which is an internal function to do exactly what we're trying to do here.  If it returns any error, xp_attach_gl_context returns bad value.
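
To illustrate why the caller only ever sees 2, here is a minimal sketch of that wrapper pattern. It is not the libXplugin source: every name starting with sketch_ is hypothetical, the CGL call is a stub, and the cgl_err_out parameter exists only for illustration. The two numeric codes (2 for XP_BadValue, 10005 for kCGLBadDrawable) are the ones mentioned in this thread.

```c
#include <stdio.h>
#include <stddef.h>

/* Codes mentioned in this thread; the enumerator names are hypothetical. */
enum { SKETCH_XP_SUCCESS = 0, SKETCH_XP_BAD_VALUE = 2 };
enum { SKETCH_CGL_NO_ERROR = 0, SKETCH_CGL_BAD_DRAWABLE = 10005 };

/* Stub standing in for CGLSetSurface(); it fails only when asked to. */
static int sketch_cgl_set_surface(const void *ctx, int simulate_failure)
{
    (void)ctx;
    return simulate_failure ? SKETCH_CGL_BAD_DRAWABLE : SKETCH_CGL_NO_ERROR;
}

/*
 * Stand-in for the wrapper: a NULL context and any CGL failure both
 * collapse to the same "bad value" code. cgl_err_out is an illustrative
 * extra that exposes the underlying error the caller normally never sees.
 */
static int sketch_attach_gl_context(const void *ctx, int simulate_failure,
                                    int *cgl_err_out)
{
    int cgl_err;

    if (cgl_err_out != NULL)
        *cgl_err_out = SKETCH_CGL_NO_ERROR;
    if (ctx == NULL)
        return SKETCH_XP_BAD_VALUE;

    cgl_err = sketch_cgl_set_surface(ctx, simulate_failure);
    if (cgl_err_out != NULL)
        *cgl_err_out = cgl_err;

    return cgl_err == SKETCH_CGL_NO_ERROR ? SKETCH_XP_SUCCESS
                                          : SKETCH_XP_BAD_VALUE;
}

/* Print both layers of error so the collapse is visible. */
static void sketch_report(int xp_err, int cgl_err)
{
    printf("wrapper returned %d, underlying CGL error %d (0x%x)\n",
           xp_err, cgl_err, (unsigned int)cgl_err);
}
```

With a non-NULL context and a simulated failure, the wrapper returns 2 even though the underlying code is 10005, which is why the real CGLError has to be read in the debugger.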

Are you able to capture this in the debugger and figure out what the
return value from CGLSetSurface() is?  That will tell us what the
underlying CGLError is, which might help shed some light on this.
I believe it's returning  0x0000000000002715 when there's an error.
That's 10005, kCGLBadDrawable.  So now I need to find out what happened
to the drawable.
This feels like progress!  Thanks again.
Duncan

Here's what I see with LIBGL_DIAGNOSTIC=1.  For a successful open,

rgl.open()
function is no-op
Debug     ../src/glx/apple/apple_glx_context.c:205
apple_glx_create_context(4295810496): apple_glx_create_context: ac
0x100a10a00 ac->context_obj 0x107cdce00
2021-02-23 08:23:00.041711-0500 R[45754:1283995]
apple_glx_create_context: ac 0x100a10a00 ac->context_obj 0x107cdce00
Debug     ../src/glx/apple/apple_glx_drawable.c:342
apple_glx_drawable_create(4295810496): apple_glx_drawable_create: new
drawable 0x107ce0e00
2021-02-23 08:23:00.042235-0500 R[45754:1283995]
apple_glx_drawable_create: new drawable 0x107ce0e00
Debug     ../src/glx/apple/apple_glx_surface.c:154
create_surface(4295810496): create_surface: created a surface for
drawable 0x600066 with uid 621
2021-02-23 08:23:00.044773-0500 R[45754:1283995] create_surface:
created a surface for drawable 0x600066 with uid 621
Debug     ../src/glx/apple/apple_glx_surface.c:69
surface_make_current(4295810496): surface_make_current:
ac->context_obj 0x107cdce00 s->surface_id 9
2021-02-23 08:23:00.044839-0500 R[45754:1283995] surface_make_current:
ac->context_obj 0x107cdce00 s->surface_id 9
Debug     ../src/glx/apple/apple_glx_surface.c:89
surface_make_current(4295810496): surface_make_current: drawable
0x600066
2021-02-23 08:23:00.045680-0500 R[45754:1283995] surface_make_current:
drawable 0x600066
... (more lines deleted)

After I run quartz(), I see this:

rgl.open()
Debug     ../src/glx/apple/apple_glx_context.c:205
apple_glx_create_context(4295810496): apple_glx_create_context: ac
0x10262bb00 ac->context_obj 0x1058c4800
2021-02-23 08:23:35.666675-0500 R[45754:1283995]
apple_glx_create_context: ac 0x10262bb00 ac->context_obj 0x1058c4800
Debug     ../src/glx/apple/apple_glx_drawable.c:342
apple_glx_drawable_create(4295810496): apple_glx_drawable_create: new
drawable 0x107648000
2021-02-23 08:23:35.667040-0500 R[45754:1283995]
apple_glx_drawable_create: new drawable 0x107648000
Debug     ../src/glx/apple/apple_glx_surface.c:154
create_surface(4295810496): create_surface: created a surface for
drawable 0x6000c9 with uid 629
2021-02-23 08:23:35.669119-0500 R[45754:1283995] create_surface:
created a surface for drawable 0x6000c9 with uid 629
Debug     ../src/glx/apple/apple_glx_surface.c:69
surface_make_current(4295810496): surface_make_current:
ac->context_obj 0x1058c4800 s->surface_id 13
2021-02-23 08:23:35.669195-0500 R[45754:1283995] surface_make_current:
ac->context_obj 0x1058c4800 s->surface_id 13
error: xp_attach_gl_context returned: 2
Debug     ../src/glx/applegl_glx.c:60
applegl_bind_context(4295810496): applegl_bind_context: error YES
2021-02-23 08:23:35.669834-0500 R[45754:1283995] applegl_bind_context:
error YES

and then I get my own messages from the failure of glXMakeCurrent().
  As far as I can see, everything appears fine until the call to
xp_attach_gl_context.


Everything looks very similar up to the failure of
xp_attach_gl_context.  Any idea why the value returned a few lines
earlier from apple_glx_create_context() should be a bad value?

Duncan Murdoch









_______________________________________________
Xquartz-dev mailing list
Xquartz-dev@lists.macosforge.org
https://lists.macosforge.org/mailman/listinfo/xquartz-dev



