Hi Daniel,

On 28.07.2017 12:46, Daniel Stone wrote:
> On 28 July 2017 at 10:24, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
>> On 28.07.2017 09:44, Daniel Stone wrote:
>>> No, I don't think it is. Tiled layouts still have a stride: if you
>>> look at i915 X/Y/Yf/Y_CCS/Yf_CCS (the latter two containing an
>>> auxiliary compression/fast-clear buffer), iMX/etnaviv
>>> tiled/supertiled, or VC4 T-tiled modifiers and how they're handled
>>> both for DRIImage and KMS interchange, they all specify a stride which
>>> is conceptually the same as linear, if you imagine linear to be 1x1
>>> tiled.
>>>
>>> Most tiling users accept any integer units of tiles (buffer width
>>> aligned to tile width), but not always. The NV12MT format used by
>>> Samsung media decoders (also shipped in some Qualcomm SoCs) is
>>> particularly special, IIRC requiring height to be aligned to a
>>> multiple of two tiles.
>>
>> Fair enough, but I think you need to distinguish between the chosen stride
>> and stride *requirements*. I do think it makes sense to consider the stride
>> requirement as part of the format/layout description, but more below.
>
> Right. Stride is a property of one buffer, stride requirements are a
> property of the users of that buffer (GPU, display control, media
> encode, etc). The requirements also depend on use, e.g. trying to do
> rotation through your scanout engine can change those requirements.

Right.


>>> It definitely seems attractive to kill two birds with one stone, but
>>> I'd really much rather not conflate format description/advertisement,
>>> and allocation restriction, into one enum. I'm still on the side of
>>> saying that this is a problem modifiers do not solve, deferring to the
>>> allocator we need anyway in order to determine things like placement
>>> and global optimality (e.g. rotated scanout placing further
>>> restrictions on allocation).
>>
>> Okay, the original issue here is that the allocator *cannot* determine the
>> alignment requirement in the use case that prompted this sub-thread.
>>
>> The use case is PRIME off-loading, where the rendering GPU supports linear
>> layouts with a 64 byte stride, while the display GPU requires a 256 byte
>> stride.
>>
>> The allocator *cannot* solve this issue, because the allocation happens on
>> the rendering GPU. We need to communicate somehow what the display GPU's
>> stride requirements are.
>>
>> How do you propose to do that?
>
> The allocator[0] in itself can't magically reach across processes to
> determine desired usage and resolve dependencies. But the entire
> design behind it was to be able to solve cross-device usage: between
> GPU and scanout, between both of those and media encode/decode, etc.
> Obviously it can't do that without help, so winsys will need to gain
> protocol in order to express those in terms the allocator will
> understand.
>
> The idea was to split information into positive capabilities and
> negative constraints. Modifier queries fall into the same boat as
> format queries: you're expressing an additional capability ('I can
> speak tiled'). Stride alignment, for me, falls into a negative
> constraint ('linear allocations must have stride aligned to 256
> bytes'). Similarly, placement constraints (VRAM possibly only
> accessible to SLI-type paired GPU vs. GTT vs. pure system RAM, etc)
> are in the same boat AFAICT. So this helps solve one side of the
> equation, but not the other.

I've been thinking about this some more, and I can see now that the changed modifier scheme that I originally proposed does not fit well into places where modifiers are used to express buffer properties (e.g. DRI3PixmapFromBuffers, DRI3BuffersFromPixmap).

But I see no proposal on how to fix the issue so far. You cannot fully separate capabilities from constraints. As is, we (AMD) cannot properly implement the proposed DRI3 v1.1: what would we return in DRI3GetSupportedModifiers?

The natural option is to return (at least) DRM_FORMAT_MOD_LINEAR, but that would be a lie, because we *don't* speak arbitrary linear formats.
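
To make the mismatch concrete, here is a minimal sketch in C of the stride arithmetic, using the 64 byte and 256 byte alignments from the PRIME case quoted above. The helper names, and the 1000 pixel wide, 4 bytes per pixel buffer used in the note below, are assumptions for illustration only; none of this is existing driver or protocol code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round a stride in bytes up to the next multiple of align.
 * align must be a non-zero power of two. (Hypothetical helper.) */
static uint32_t stride_align_up(uint32_t stride, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (stride + align - 1) & ~(align - 1);
}

/* True if stride already satisfies the given alignment. */
static bool stride_is_aligned(uint32_t stride, uint32_t align)
{
    return stride_align_up(stride, align) == stride;
}

/* Stride that both devices accept. For power-of-two alignments the
 * larger one implies the smaller one, so rounding up to the maximum
 * is enough. The renderer can only do this if it is told the display
 * GPU's alignment, which is the information missing today. */
static uint32_t shared_linear_stride(uint32_t width_px, uint32_t cpp,
                                     uint32_t render_align,
                                     uint32_t display_align)
{
    uint32_t align = render_align > display_align ? render_align
                                                  : display_align;
    return stride_align_up(width_px * cpp, align);
}
```

With these assumed numbers, a renderer that only knows its own 64 byte rule picks a stride of 4032 bytes, which is not a multiple of 256, so the display GPU cannot use the buffer; shared_linear_stride(1000, 4, 64, 256) gives 4096 instead.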

I don't think this is difficult to fix in terms of protocol, although there's plenty of opportunity for bike-shedding :)

I see roughly two options:

1. Make the constraints per-modifier, and add a "constraints: ListOfCard32" (or ListOfCard64) field to the response to DRI3GetSupportedModifiers. We can then reserve some bits for global constraints (e.g. placement) and the remaining bits for constraints that are specific to each modifier (e.g. stride alignment for linear). You could build constraints like DRM_CONSTRAINT_PLACEMENT_SYSTEM | DRM_CONSTRAINT_LINEAR_STRIDE_256B.

2. Make the constraints global, and add a DRI3GetConstraints protocol with the same signature as DRI3GetSupportedModifiers. We'd need vendor namespaces for the constraint defines, to support constraints that are specific to vendor-specific modifiers. You could have entries like DRM_CONSTRAINT_PLACEMENT(DRM_CONSTRAINT_PLACEMENT_SYSTEM) and, as a separate list entry, DRM_CONSTRAINT_LINEAR_STRIDE(256).
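
As a rough sketch of how the two options could look on the client side, in C: only the four DRM_CONSTRAINT_* names used above come from this mail. The bit positions, the DRM_CONSTRAINT_CODE packing (loosely modelled on how fourcc_mod_code puts the vendor into the top 8 bits of a modifier), the vendor/type fields and the helper functions are all made up for illustration and would be subject to the bike-shedding mentioned earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Option 1: one bitmask per modifier (all bit positions are assumptions).
 * Low 16 bits: global constraints. High 16 bits: per-modifier meaning. */
#define DRM_CONSTRAINT_PLACEMENT_SYSTEM    (1u << 0)
#define DRM_CONSTRAINT_LINEAR_STRIDE_256B  (1u << 16)

/* Option 2: a global list of 64-bit entries (layout is an assumption):
 * bits 63..56 vendor namespace, 55..48 constraint type, 47..0 value. */
#define DRM_CONSTRAINT_VENDOR_NONE         0
#define DRM_CONSTRAINT_TYPE_PLACEMENT      1
#define DRM_CONSTRAINT_TYPE_LINEAR_STRIDE  2

#define DRM_CONSTRAINT_CODE(vendor, type, val)          \
    (((uint64_t)(vendor) << 56) |                       \
     ((uint64_t)(type) << 48) |                         \
     ((uint64_t)(val) & 0x0000ffffffffffffULL))

#define DRM_CONSTRAINT_PLACEMENT(p) \
    DRM_CONSTRAINT_CODE(DRM_CONSTRAINT_VENDOR_NONE, \
                        DRM_CONSTRAINT_TYPE_PLACEMENT, p)
#define DRM_CONSTRAINT_LINEAR_STRIDE(bytes) \
    DRM_CONSTRAINT_CODE(DRM_CONSTRAINT_VENDOR_NONE, \
                        DRM_CONSTRAINT_TYPE_LINEAR_STRIDE, bytes)

/* Option 1: stride alignment implied by the mask returned for
 * DRM_FORMAT_MOD_LINEAR; 1 means "no constraint". */
static uint32_t option1_linear_stride_align(uint32_t constraints)
{
    return (constraints & DRM_CONSTRAINT_LINEAR_STRIDE_256B) ? 256u : 1u;
}

/* Option 2: scan the global list and return the strictest linear
 * stride alignment found; entries of other types or from other
 * vendor namespaces are skipped. */
static uint32_t option2_linear_stride_align(const uint64_t *entries,
                                            size_t count)
{
    uint32_t align = 1;

    for (size_t i = 0; i < count; i++) {
        uint64_t vendor = entries[i] >> 56;
        uint64_t type = (entries[i] >> 48) & 0xff;
        uint32_t value = (uint32_t)(entries[i] & 0x0000ffffffffffffULL);

        if (vendor != DRM_CONSTRAINT_VENDOR_NONE ||
            type != DRM_CONSTRAINT_TYPE_LINEAR_STRIDE)
            continue;
        if (value > align)
            align = value;
    }
    return align;
}
```

The practical difference shows up here: option 1 needs a new flag for every alignment value anyone wants to express, while option 2 carries the value in the entry itself at the cost of a second request and a namespace scheme.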

Cheers,
Nicolai
_______________________________________________
xorg-devel@lists.x.org: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: https://lists.x.org/mailman/listinfo/xorg-devel
