On Fri, Jun 23, 2017 at 3:45 PM, axie wrote:
> Hi Marek,
>
> I understand you spent time on your original logic too. I really don't
> understand why you talked about pain if somebody can improve it.
>
> To reduce the pain, I am now seriously considering dropping this patch. But
>
On Fri, Jun 23, 2017 at 1:55 PM, Zhou, David(ChunMing)
wrote:
>
>
> From: Marek Olšák [mar...@gmail.com]
> Sent: Friday, June 23, 2017 6:49 PM
> To: Christian König
> Cc: Zhou, David(ChunMing); Xie, AlexBin;
On Fri, Jun 23, 2017 at 3:01 PM, Christian König
wrote:
> The key point here is while optimizing this is nice the much bigger pile is
> the locking done for each BO.
>
> In other words even when we optimize all the other locks involved into
> atomics or RCU, the BO
On Fri, Jun 23, 2017 at 05:02:58PM -0400, Felix Kuehling wrote:
> Hi John,
>
> I haven't read your patches. Just a question based on the cover letter.
>
> I understand that visible VRAM is the biggest pain point. But could the
> same reasoning make sense for invisible VRAM? That is, doing all
> -Original Message-
> From: Gavin Wan [mailto:gavin@amd.com]
> Sent: Friday, June 23, 2017 5:33 PM
> To: dl.gcr.gpu-virtual; brahma_sw_dev; amd-gfx@lists.freedesktop.org
> Cc: Wan, Gavin
> Subject: [PATCH] drm/amdgpu: Support passing amdgpu critical error to host
> via GPU Mailbox.
>
+static const struct acpi_device_id cz_audio_acpi_match[] = {
+ { "I2SC1002", 0 },
This one goes on my list of _HID that don't follow ACPI/PCI
vendorID/PartID conventions.
AMD should use the "AMDI" ACPI ID or the 0x1002 PCI ID for the first 4
characters; if everyone does what they feel
Hi John,
I haven't read your patches. Just a question based on the cover letter.
I understand that visible VRAM is the biggest pain point. But could the
same reasoning make sense for invisible VRAM? That is, doing all the
migrations to VRAM in a workqueue?
Regards,
Felix
On 17-06-23 01:39
Reviewed-by: Samuel Li
Sam
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Alex Deucher
> Sent: Thursday, June 22, 2017 6:29 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander
This allows selecting the GPU by its PCI device both with and
without kernel mode support. The instance is populated automatically
so that the proper corresponding debugfs files are used if present.
Signed-off-by: Jean-Francois Thibert
---
doc/umr.1 | 4 +++
Sorry for the delay.
The series is Reviewed-by: Felix Kuehling .
Regards,
Felix
On 17-06-13 02:24 PM, Christian König wrote:
> On 13.06.2017 at 19:07, Alex Deucher wrote:
>> On Fri, Jun 9, 2017 at 5:47 PM, Harish Kasiviswanathan
>>
This allows a BO to have busy placements that are not part of its normal
placements.
Users that want the busy placements to be the same can change the
placement.busy_placement pointer and corresponding count to be the same as
the regular placements.
Signed-off-by: John Brooks
Signed-off-by: John Brooks
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++
3 files changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
Moving CPU-accessible BOs from GTT to visible VRAM reduces latency on the
GPU and improves average framerate. However, it's an expensive operation.
When visible VRAM is full and evictions are necessary, it can easily take
tens of milliseconds. On the CS path, that directly increases the frame time
Allow specifying a limit on visible VRAM via a module parameter. This is
helpful for testing performance under visible VRAM pressure.
Signed-off-by: John Brooks
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4
When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace,
it should only be treated as a hint to initially place a BO somewhere CPU
accessible, rather than having a permanent effect on BO placement.
Instead of the flag being set in stone at BO creation, set the flag when a
page
amdgpu_ttm_placement_init() callers that are using both VRAM and GTT as
domains usually don't want visible VRAM as a busy placement.
Signed-off-by: John Brooks
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +-
1 file changed, 9 insertions(+), 1 deletion(-)
The BO move throttling code is designed to allow VRAM to fill quickly if it
is relatively empty. However, this does not take into account situations
where the visible VRAM is smaller than total VRAM, and total VRAM may not
be close to full but the visible VRAM segment is under pressure. In such
This patch series is intended to improve performance when limited CPU-visible
VRAM is under pressure.
Moving BOs into visible VRAM is essentially a housekeeping task. It's faster to
access them in VRAM than GTT, but it isn't a hard requirement for them to be in
VRAM. As such, it is unnecessary to
There is no need for page faults to force BOs into visible VRAM if it's
full, and the time it takes to do so is great enough to cause noticeable
stuttering. Add GTT as a possible placement so that if visible VRAM is
full, page faults move BOs to GTT instead of evicting other BOs from VRAM.
On Fri, Jun 23, 2017 at 12:43 PM, Christian König
wrote:
> On 23.06.2017 at 18:34, Alex Deucher wrote:
>>
>> From: Vijendar Mukunda
>>
>> asic_type information is passed to ACP DMA Driver as platform data.
>> We need this to determine whether
On 23.06.2017 at 18:34, Alex Deucher wrote:
From: Vijendar Mukunda
asic_type information is passed to ACP DMA Driver as platform data.
We need this to determine whether the asic is Carrizo (CZ) or
Stoney (ST) in the acp sound driver.
Reviewed-by: Alex Deucher
From: Vijendar Mukunda
Stoney uses 16 KB of SRAM memory for playback and 16 KB
for capture. Modified the max buffer size to have the
correct mapping between system memory and SRAM.
Added snd_pcm_hardware structures for playback
and capture for Stoney.
Reviewed-by: Alex Deucher
From: Vijendar Mukunda
Added condition checks for CZ specific code based on asic_type.
Stoney specific code will be added in a future commit.
Reviewed-by: Alex Deucher
Signed-off-by: Vijendar Mukunda
Signed-off-by:
From: Akshu Agrawal
The driver is used for AMD board using rt5650 codec.
Reviewed-by: Alex Deucher
Signed-off-by: Akshu Agrawal
Signed-off-by: Alex Deucher
---
sound/soc/amd/Kconfig |
From: Vijendar Mukunda
Power Gating is disabled in Stoney platform.
Reviewed-by: Alex Deucher
Signed-off-by: Vijendar Mukunda
Signed-off-by: Alex Deucher
---
From: Vijendar Mukunda
Added DW_I2S_QUIRK_16BIT_IDX_OVERRIDE quirk for Stoney.
The supported format and bus width for the I2S controller are read
from the I2S Component Parameter registers.
These are read-only registers.
For Stoney, the I2S Component Parameter registers are programmed
From: Vijendar Mukunda
Added DMA driver changes for Stoney platform.
Below are the key differences between Stoney and CZ:
- Memory Gating is disabled
- SRAM Banks won't be turned off
- Number of SRAM banks reduced to 6
- DAGB Garlic Interface used
- 16 bit resolution is
This patch set updates the AMD GPU and Audio CoProcessor (ACP)
audio drivers and the designware i2s driver for Stoney (ST).
ST is an APU similar to Carrizo (CZ) which already has ACP audio
support. The i2s controller and ACP audio DMA engine are part of
the GPU and both need updating so I would
From: Vijendar Mukunda
Added quirk DW_I2S_QUIRK_16BIT_IDX_OVERRIDE to Designware
driver. This quirk will set idx value to 1.
By setting this quirk, it will override supported format
as 16 bit resolution and bus width as 2 Bytes.
Reviewed-by: Alex Deucher
From: Vijendar Mukunda
asic_type information is passed to ACP DMA Driver as platform data.
We need this to determine whether the asic is Carrizo (CZ) or
Stoney (ST) in the acp sound driver.
Reviewed-by: Alex Deucher
Signed-off-by: Vijendar
On the other hand, after you optimize the BO reservation lock, other
locks still need optimization, right?
In theory yes; in practice no.
There are so many other things we should tackle before taking care of
removing any locks that we will probably never get to that point, even with
more
Hi Christian,
I agree with you. On the other hand, after you optimize the BO
reservation lock, other locks still need optimization, right?
1. Locking itself is not cheap.
2. Waiting in lock is even more expensive.
Thanks,
Alex Bin Xie
On 2017-06-23 09:01 AM, Christian König wrote:
The
Hi Marek,
I understand you spent time on your original logic too. I really don't
understand why you talked about pain if somebody can improve it.
To reduce the pain, I am now seriously considering dropping this patch.
But please read on before you conclude. Let us treat open source
software
On 2017-06-22 13:49, Philippe CORNU wrote:
> On 06/22/2017 08:06 AM, Peter Rosin wrote:
>> The redundant fb helper .load_lut is no longer used, and can not
>> work right without also providing the fb helpers .gamma_set and
>> .gamma_get thus rendering the code in this driver suspect.
>>
>
> Hi
The key point here is while optimizing this is nice the much bigger pile
is the locking done for each BO.
In other words even when we optimize all the other locks involved into
atomics or RCU, the BO reservation lock will still dominate everything.
One possible solution to this would be per
From: Marek Olšák [mar...@gmail.com]
Sent: Friday, June 23, 2017 6:49 PM
To: Christian König
Cc: Zhou, David(ChunMing); Xie, AlexBin; amd-gfx@lists.freedesktop.org; Xie,
AlexBin
Subject: Re: [PATCH 1/3] drm/amdgpu: fix a typo
On Fri, Jun 23, 2017 at
I agree with you about the spinlock. You seem to be good at this.
It's always good to do measurements to validate that a code change
improves something, especially when the code size and code complexity
has to be increased. A CPU profiler such as sysprof can show you
improvements on the order of
Some style/flow issues inline below.
On 22/06/17 04:36 PM, Jean-Francois Thibert wrote:
This allows selecting the GPU by its PCI device both with and
without kernel mode support. The instance is populated automatically
so that the proper corresponding debugfs files are used if present.
On Fri, Jun 23, 2017 at 11:27 AM, Christian König
wrote:
> On 23.06.2017 at 11:08, zhoucm1 wrote:
>>
>>
>>
>> On June 23, 2017 17:01, zhoucm1 wrote:
>>>
>>>
>>>
>>> On June 23, 2017 16:25, Christian König wrote:
On 23.06.2017 at 09:09, zhoucm1 wrote:
>
>
On Thu, Jun 22, 2017 at 11:49:34AM +, Philippe CORNU wrote:
>
>
> On 06/22/2017 08:06 AM, Peter Rosin wrote:
> > The redundant fb helper .load_lut is no longer used, and can not
> > work right without also providing the fb helpers .gamma_set and
> > .gamma_get thus rendering the code in this
On 23.06.2017 at 11:08, zhoucm1 wrote:
On June 23, 2017 17:01, zhoucm1 wrote:
On June 23, 2017 16:25, Christian König wrote:
On 23.06.2017 at 09:09, zhoucm1 wrote:
On June 23, 2017 14:57, Christian König wrote:
But giving the CS IOCTL an option for directly specifying the BOs
instead of a
On June 23, 2017 17:01, zhoucm1 wrote:
On June 23, 2017 16:25, Christian König wrote:
On 23.06.2017 at 09:09, zhoucm1 wrote:
On June 23, 2017 14:57, Christian König wrote:
But giving the CS IOCTL an option for directly specifying the BOs
instead of a BO list like Marek suggested would indeed
On June 23, 2017 16:25, Christian König wrote:
On 23.06.2017 at 09:09, zhoucm1 wrote:
On June 23, 2017 14:57, Christian König wrote:
But giving the CS IOCTL an option for directly specifying the BOs
instead of a BO list like Marek suggested would indeed save us some
time here.
interesting,
On Thu, Jun 22, 2017 at 10:48:10AM +0200, Peter Rosin wrote:
> On 2017-06-22 08:36, Daniel Vetter wrote:
> > On Wed, Jun 21, 2017 at 11:40:52AM +0200, Peter Rosin wrote:
> >> On 2017-06-21 09:38, Daniel Vetter wrote:
> >>> On Tue, Jun 20, 2017 at 09:25:25PM +0200, Peter Rosin wrote:
> This
On 23.06.2017 at 09:09, zhoucm1 wrote:
On June 23, 2017 14:57, Christian König wrote:
But giving the CS IOCTL an option for directly specifying the BOs
instead of a BO list like Marek suggested would indeed save us some
time here.
Interesting. I always follow how to improve our CS IOCTL,
On June 23, 2017 14:57, Christian König wrote:
But giving the CS IOCTL an option for directly specifying the BOs
instead of a BO list like Marek suggested would indeed save us some
time here.
Interesting. I always follow how to improve our CS IOCTL, since the UMD guys
often complain our command
Hi Alex,
actually Marek is right, command submission is actually not much of a
bottleneck to us because it is handled from a separate userspace thread.
So those micro optimizations you do here on CPU cycles are actually
rather superfluous.
But giving the CS IOCTL an option for directly