[PATCH v2] drm/i915/guc: Enable w/a 16021333562 for DG2, MTL and ARL

2024-05-28 Thread John . C . Harrison
From: John Harrison Enable another workaround that is implemented inside the GuC. v2: Use the correct Gen12 w/a id rather than the Xe version (review feedback from Matthew R) also extend to include ARL. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 1 +

[PATCH] drm/i915/guc: Enable w/a 14019882105 for DG2 and MTL

2024-05-24 Thread John . C . Harrison
From: John Harrison Enable another workaround that is implemented inside the GuC. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 32 --- 2 files changed, 21 insertions(+), 12 deletions(-)

[PATCH] drm/i915/guc: Fix the fix for reset lock confusion

2024-03-29 Thread John . C . Harrison
From: John Harrison The previous fix for the circular lock splat about the busyness worker wasn't quite complete. Even though the reset-in-progress flag is cleared at the start of intel_uc_reset_finish, the entire function is still inside the reset mutex lock. Not sure why the patch appeared to

[PATCH] drm/i915/guc: Update w/a 14019159160

2024-03-07 Thread John . C . Harrison
From: John Harrison An existing workaround has been extended in both platforms affected and implementation complexity. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 3 ++- drivers/gpu/drm/i915/gt/uc/intel_guc.c| 3 ++-

[PATCH v3 3/3] drm/i915/guc: Enable Wa_14019159160

2024-02-23 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar ---

[PATCH v3 0/3] Enable Wa_14019159160 and Wa_16019325821 for MTL

2024-02-23 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a

[PATCH v3 2/3] drm/i915/guc: Add support for w/a KLVs

2024-02-23 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
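A minimal sketch of the key-length-value idea mentioned above, not the driver's actual code. It assumes a 32-bit header with the key in the upper 16 bits and the value length (in dwords) in the lower 16 bits; the macro names, function names and key values are hypothetical and chosen only for illustration. The point is that a new enable flag becomes one more appended entry rather than one more bit in a fixed 32 bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: key in bits 31:16, value length in dwords in bits 15:0. */
#define KLV_KEY_SHIFT 16
#define KLV_LEN_MASK  0xffffu

/*
 * Append one KLV entry to buf, starting at *offset (counted in dwords).
 * Returns 0 on success, or -1 if the entry does not fit in capacity dwords.
 */
static int klv_append(uint32_t *buf, size_t capacity, size_t *offset,
                      uint16_t key, const uint32_t *value, uint16_t len)
{
    size_t needed = 1 + (size_t)len;

    if (*offset > capacity || capacity - *offset < needed)
        return -1;

    buf[(*offset)++] = ((uint32_t)key << KLV_KEY_SHIFT) | len;
    for (uint16_t i = 0; i < len; i++)
        buf[(*offset)++] = value[i];

    return 0;
}

/* Walk the list and report whether an entry with the given key is present. */
static int klv_has_key(const uint32_t *buf, size_t used, uint16_t key)
{
    size_t pos = 0;

    while (pos < used) {
        uint32_t header = buf[pos];

        if ((header >> KLV_KEY_SHIFT) == key)
            return 1;
        /* Skip the header dword plus the value dwords it announces. */
        pos += 1 + (header & KLV_LEN_MASK);
    }
    return 0;
}
```

A flag-style workaround would be appended with a zero-length value, e.g. klv_append(buf, cap, &off, key, NULL, 0), while a workaround that needs a parameter carries it in the value dwords.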

[PATCH v3 1/3] drm/i915: Enable Wa_16019325821

2024-02-23 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19

[PATCH] drm/i915/guc: Correct capture of EIR register on hang

2024-02-23 Thread John . C . Harrison
From: John Harrison The EIR register (0x20B0) was being included in the engine class list for render and compute as the absolute register address. However, it is actually a ring register available on all engines at an offset of (base) + 0xB0. As it was included as an RCS engine but with the
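As an illustration of the addressing point above, here is a small sketch of computing a ring register address relative to an engine's base rather than hard-coding an absolute address. The 0xB0 offset and the 0x20B0 absolute address come from the snippet; the macro and function names are hypothetical, and the 0x2000 base is simply what those two numbers imply for the render engine.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of EIR from an engine's MMIO base, as stated in the patch description. */
#define RING_EIR_OFFSET 0xB0u

/* Hypothetical helper: the EIR address for an engine with the given MMIO base. */
static inline uint32_t ring_eir(uint32_t mmio_base)
{
    return mmio_base + RING_EIR_OFFSET;
}
```

With a base of 0x2000 this yields 0x20B0, the absolute address that was previously listed; any other engine gets its own EIR by passing its own base.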

[PATCH v3] drm/i915/guc: Simplify/extend platform check for Wa_14018913170

2024-02-23 Thread John . C . Harrison
From: John Harrison The above w/a is required for every platform that the i915 driver supports. It is fixed on the latest platforms but they are only supported by Xe instead of i915. So just remove the platform check completely and keep the code simple. v2: Add extra comment (review feedback

[PATCH v3] drm/i915/guc: Simplify/extend platform check for Wa_14018913170

2024-02-16 Thread John . C . Harrison
From: John Harrison The above w/a is required for every platform that the i915 driver supports. It is fixed on the latest platforms but they are only supported by Xe instead of i915. So just remove the platform check completely and keep the code simple. Signed-off-by: John Harrison ---

[PATCH] drm/i915/gt: Restart the heartbeat timer when forcing a pulse

2024-01-10 Thread John . C . Harrison
From: John Harrison The context persistence code does things like send super high priority heartbeat pulses to ensure any leaked context can still be pre-empted and thus isn't a total denial of service but only a minor denial of service. Unfortunately, it wasn't bothering to restart the heartbeat

[PATCH v3 2/3] drm/i915/guc: Add support for w/a KLVs

2024-01-04 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +

[PATCH v3 3/3] drm/i915/guc: Enable Wa_14019159160

2024-01-04 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar ---

[PATCH v3 1/3] drm/i915: Enable Wa_16019325821

2024-01-04 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19

[PATCH v3 0/3] Enable Wa_14019159160 and Wa_16019325821 for MTL

2024-01-04 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a

[PATCH] drm/i915/huc: Allow for very slow HuC loading

2024-01-02 Thread John . C . Harrison
From: John Harrison A failure to load the HuC is occasionally observed where the cause is believed to be a low GT frequency leading to very long load times. So a) increase the timeout so that the user still gets a working system even in the case of slow load. And b) report the frequency during

[PATCH v3 2/3] drm/i915/guc: Add support for w/a KLVs

2023-12-20 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +

[PATCH v3 0/3] Enable Wa_14019159160 and Wa_16019325821 for MTL

2023-12-20 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a

[PATCH v3 1/3] drm/i915: Enable Wa_16019325821

2023-12-20 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19

[PATCH v3 3/3] drm/i915/guc: Enable Wa_14019159160

2023-12-20 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison Reviewed-by: Vinay Belgaumkar ---

[PATCH] drm/i915/guc: Avoid circular locking issue on busyness flush

2023-12-19 Thread John . C . Harrison
From: John Harrison Avoid the following lockdep complaint: <4> [298.856498] == <4> [298.856500] WARNING: possible circular locking dependency detected <4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G N <4> [298.856505]

[PATCH v2 2/2] drm/i915/guc: Add a selftest for FAST_REQUEST errors

2023-11-13 Thread John . C . Harrison
From: John Harrison There is a mechanism for reporting errors from fire and forget H2G messages. This is the only way to find out about almost any error in the GuC backend submission path. So it would be useful to know that it is working. v2: Fix some dumb over-complications and a couple of

[PATCH v2 0/2] Selftest for FAST_REQUEST feature

2023-11-13 Thread John . C . Harrison
From: John Harrison Add a selftest to verify that the FAST_REQUEST mechanism (getting errors back from fire-and-forget H2G commands) is functional. Also fix up a potential false positive in the GuC hang selftest. v2: Fix some dumb over-complications and typos - review feedback from Daniele.

[PATCH v2 1/2] drm/i915/guc: Fix for potential false positives in GuC hang selftest

2023-11-13 Thread John . C . Harrison
From: John Harrison Noticed that the hangcheck selftest is submitting a non-preemptible spinner. That means that even if the GuC does not die, the heartbeat will still kick in and trigger a reset. Which is rather defeating the purpose of the test - to verify that the heartbeat will kick in if

[PATCH 1/2] drm/i915/guc: Don't double enable a context

2023-11-09 Thread John . C . Harrison
From: John Harrison If a context is blocked, unblocked and submitted repeatedly in rapid succession, the driver can end up trying to enable the context while the previous enable request is still in flight. This can lead to much confusion in the state tracking. Prevent that by checking the

[PATCH 0/2] Don't send double context enable/disable requests

2023-11-09 Thread John . C . Harrison
From: John Harrison The driver could sometimes send context enable/disable requests when a previous request was still pending. This is not allowed. So stop doing it. Signed-off-by: John Harrison John Harrison (2): drm/i915/guc: Don't double enable a context drm/i915/guc: Don't disable a

[PATCH 2/2] drm/i915/guc: Don't disable a context whose enable is still pending

2023-11-09 Thread John . C . Harrison
From: John Harrison Various processes involve requesting GuC to disable a given context. However context enable/disable is an asynchronous process in the GuC. Thus, it is possible the previous enable request is still being processed when the disable request is triggered. Having both enable and

[PATCH 1/2] drm/i915/guc: Fix for potential false positives in GuC hang selftest

2023-11-06 Thread John . C . Harrison
From: John Harrison Noticed that the hangcheck selftest is submitting a non-preemptible spinner. That means that even if the GuC does not die, the heartbeat will still kick in and trigger a reset. Which is rather defeating the purpose of the test - to verify that the heartbeat will kick in if

[PATCH 2/2] drm/i915/guc: Add a selftest for FAST_REQUEST errors

2023-11-06 Thread John . C . Harrison
From: John Harrison There is a mechanism for reporting errors from fire and forget H2G messages. This is the only way to find out about almost any error in the GuC backend submission path. So it would be useful to know that it is working. Signed-off-by: John Harrison ---

[PATCH 0/2] Selftest for FAST_REQUEST feature

2023-11-06 Thread John . C . Harrison
From: John Harrison Add a selftest to verify that the FAST_REQUEST mechanism (getting errors back from fire-and-forget H2G commands) is functional. Also fix up a potential false positive in the GuC hang selftest. Signed-off-by: John Harrison John Harrison (2): drm/i915/guc: Fix for

[PATCH v2 4/4] drm/i915/mtl: Add module parameter override for Wa_16019325821/Wa_14019159160

2023-10-27 Thread John . C . Harrison
From: John Harrison These w/a's can have significant performance implications for any workload which uses both RCS and CCS. On the other hand, the hang itself is only seen in one or two very specific workloads. So add a module parameter to control whether the w/a's are enabled or not and default

[PATCH v2 3/4] drm/i915/guc: Enable Wa_14019159160

2023-10-27 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 3 ++

[PATCH v2 1/4] drm/i915: Enable Wa_16019325821

2023-10-27 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++

[PATCH v2 2/4] drm/i915/guc: Add support for w/a KLVs

2023-10-27 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 +

[PATCH v2 0/4] Enable Wa_14019159160 and Wa_16019325821 for MTL

2023-10-27 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. v2: Fix bug in WA KLV implementation (offset not being reset to start of list). Add better comment to prep patch about how KLVs can be added. Add a module parameter override and disable the w/a

[PATCH 2/2] drm/i915: More use of GT specific print helpers

2023-10-09 Thread John . C . Harrison
From: John Harrison Update a bunch of GT related print messages in non-GT files to use the GT specific helpers. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 8 +++- drivers/gpu/drm/i915/i915_driver.c| 3 ++- drivers/gpu/drm/i915/i915_perf.c

[PATCH 1/2] drm/i915/gt: More use of GT specific print helpers

2023-10-09 Thread John . C . Harrison
From: John Harrison A bunch of print messages got missed in the update to using sub-system specific helpers. So update those. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 29 + drivers/gpu/drm/i915/gt/intel_gsc.c | 11

[PATCH 0/2] More print message helper updates

2023-10-09 Thread John . C . Harrison
From: John Harrison There was an update a while back to use sub-system specific print helpers that implicitly add sub-system specific information to the print. It seems a bunch of GT related messages got missed in that update. So update them now. Signed-off-by: John Harrison John Harrison

[PATCH] drm/i915/guc: Update 'recommended' version to 70.12.1 for DG2/ADL-S/ADL-P/MTL

2023-10-06 Thread John . C . Harrison
From: John Harrison The latest GuC has new features and new workarounds that we wish to enable. So let the universe know that it is useful to update their firmware. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 8 1 file changed, 4 insertions(+), 4

[PATCH] drm/i915/guc: Enable WA 14018913170

2023-10-05 Thread John . C . Harrison
From: Daniele Ceraolo Spurio The GuC handles the WA, the KMD just needs to set the flag to enable it on the appropriate platforms. Signed-off-by: John Harrison Signed-off-by: Daniele Ceraolo Spurio Reviewed-by: Vinay Belgaumkar --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 6 ++

[PATCH 1/3] drm/i915/guc: Support new and improved engine busyness

2023-09-22 Thread John . C . Harrison
From: John Harrison The GuC has been extended to support a much more friendly engine busyness interface. So partition the old interface into a 'busy_v1' space and add 'busy_v2' support alongside. And if v2 is available, use that in preference to v1. Note that v2 provides extra features over and

[PATCH 3/3] drm/i915/mtl: Add counters for engine busyness ticks

2023-09-22 Thread John . C . Harrison
From: Umesh Nerlige Ramappa In new version of GuC engine busyness, GuC provides engine busyness ticks as a 64 bit counter. Add a new counter to relay this value to the user as is. Signed-off-by: Umesh Nerlige Ramappa Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/intel_engine.h

[PATCH 0/3] Engine busyness v2

2023-09-22 Thread John . C . Harrison
From: John Harrison The latest GuC implements a new and improved scheme for tracking engine busyness. So make use of it. Note that this change comes along with a new set of PMU counters. The old counters have a fundamental problem that they are defined in terms of wall time but the sampling is

[PATCH 2/3] drm/i915/mtl: Add a PMU counter for total active ticks

2023-09-22 Thread John . C . Harrison
From: Umesh Nerlige Ramappa Current engine busyness interface exposed by GuC has a few issues: - The busyness of active engine is calculated using 2 values provided by GuC and is prone to race between CPU reading those values and GuC updating them. Any sort of HW synchronization would be at

[PATCH] drm/i915/guc: Suppress 'ignoring reset notification' message

2023-09-21 Thread John . C . Harrison
From: John Harrison If an active context has been banned (e.g. Ctrl+C killed) then it is likely to be reset as part of evicting it from the hardware. That results in an 'ignoring context reset notification: banned = 1' message at info level. This confuses/concerns people and makes them think

[PATCH 3/4] drm/i915/guc: Add support for w/a KLVs

2023-09-15 Thread John . C . Harrison
From: John Harrison To prevent running out of bits, new w/a enable flags are being added via a KLV system instead of a 32 bit flags word. Signed-off-by: John Harrison --- .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 3 +

[PATCH 4/4] drm/i915/guc: Enable Wa_14019159160

2023-09-15 Thread John . C . Harrison
From: John Harrison Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a super-set of Wa_16019325821, so requires turning that one on as well as setting the new flag for Wa_14019159160 itself. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 3 +++

[PATCH 2/4] drm/i915: Enable Wa_16019325821

2023-09-15 Thread John . C . Harrison
From: John Harrison Some platforms require holding RCS context switches until CCS is idle (the reverse w/a of Wa_14014475959). Some platforms require both versions. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++

[PATCH 1/4] drm/i915/guc: Update 'recommended' version to 70.11.0 for DG2/ADL-P/MTL

2023-09-15 Thread John . C . Harrison
From: John Harrison The latest GuC has new features and new workarounds that we wish to enable. So let the universe know that it is useful to update their firmware. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 6 +++--- 1 file changed, 3 insertions(+), 3

[PATCH 0/4] Enable Wa_14019159160 and Wa_16019325821 for MTL

2023-09-15 Thread John . C . Harrison
From: John Harrison Enable Wa_14019159160 and Wa_16019325821 for MTL RCS/CCS workarounds for MTL. Signed-off-by: John Harrison John Harrison (4): drm/i915/guc: Update 'recommended' version to 70.11.0 for DG2/ADL-P/MTL drm/i915: Enable Wa_16019325821 drm/i915/guc: Add support for

[PATCH 1/2] drm/i915/guc: Update 'recommended' version to 70.11.0 for DG2/ADL-P/MTL

2023-09-14 Thread John . C . Harrison
From: John Harrison The latest GuC has new features and new workarounds that we wish to enable. So let the universe know that it is useful to update their firmware. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 6 +++--- 1 file changed, 3 insertions(+), 3

[PATCH 2/2] drm/i915/guc: Enable WA 14018913170

2023-09-14 Thread John . C . Harrison
From: Daniele Ceraolo Spurio The GuC handles the WA, the KMD just needs to set the flag to enable it on the appropriate platforms. Signed-off-by: John Harrison Signed-off-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/intel_guc.c | 6 ++

[PATCH 0/2] Enable Wa_14018913170 on DG2/MTL/PVD

2023-09-14 Thread John . C . Harrison
From: John Harrison Enable a WA on the latest platforms. Also update the recommended GuC version for those platforms to the latest available. Further patches will follow to make use of other features in the latest GuC firmware, but the w/a at least requires something newer than what was

[PATCH v2] drm/i915/guc: Force a reset on internal GuC error

2023-08-15 Thread John . C . Harrison
From: John Harrison If GuC hits an internal error (and survives long enough to report it to the KMD), it is basically toast and will stop until a GT reset and subsequent GuC reload is performed. Previously, the KMD just printed an error message and then waited for the heartbeat to eventually

[PATCH] drm/i915/guc: Fix potential null pointer deref in GuC 'steal id' test

2023-08-02 Thread John . C . Harrison
From: John Harrison It was noticed that if the very first 'stealing' request failed to create for some reason then the 'steal all ids' loop would immediately exit with 'last' still being NULL. The test would attempt to continue but using a null pointer. Fix that by aborting the test if it fails

[PATCH] drm/i915/guc: Force a reset on internal GuC error

2023-06-05 Thread John . C . Harrison
From: John Harrison If GuC hits an internal error (and survives long enough to report it to the KMD), it is basically toast and will stop until a GT reset and subsequent GuC reload is performed. Previously, the KMD just printed an error message and then waited for the heartbeat to eventually

[PATCH] drm/i915/guc: Remove some obsolete definitions

2023-05-31 Thread John . C . Harrison
From: John Harrison There were a bunch of defines and structures left over from an API update a very long time ago. Remove them. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 33 - 1 file changed, 33 deletions(-) diff --git

[PATCH 0/3] Use FAST_REQUEST mechanism for non-blocking H2G calls

2023-05-26 Thread John . C . Harrison
From: John Harrison The GuC interface supports a mechanism for returning errors against non-blocking H2G calls. This is called FAST_REQUEST. Given that the call is asynchronous, matching the returned error up is difficult. However, getting any error at all back is better than no error. If any

[PATCH 3/3] drm/i915/guc: Track all sent actions to GuC

2023-05-26 Thread John . C . Harrison
From: Michal Wajdeczko For easier debug of any unexpected error responses from GuC that might be related to non-blocking fast requests, track action code (and stack if under DEBUG_GUC config) for every H2G request. Signed-off-by: Michal Wajdeczko Signed-off-by: John Harrison ---

[PATCH 2/3] drm/i915/guc: Update log for unsolicited CTB response

2023-05-26 Thread John . C . Harrison
From: Michal Wajdeczko Instead of printing message fence twice, include HXG header of the unexpected message and its len. Signed-off-by: Michal Wajdeczko Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)

[PATCH 1/3] drm/i915/guc: Use FAST_REQUEST for non-blocking H2G calls

2023-05-26 Thread John . C . Harrison
From: Michal Wajdeczko In addition to the already defined REQUEST HXG message format, which is used when sender expects some confirmation or data, HXG protocol includes definition of the FAST REQUEST message, that may be used when sender does not expect any useful data to be returned. Using

[PATCH] drm/i915/guc: Fix confused register capture list creation

2023-05-11 Thread John . C . Harrison
From: John Harrison The GuC has a completely separate engine class enum when referring to register capture lists, which combines render and compute. The driver was using the 'normal' GuC specific engine class enum instead. That meant that it thought it was defining a capture list for compute

[PATCH] drm/i915/guc: Fix probe injection CI failures after recent change

2023-05-10 Thread John . C . Harrison
From: John Harrison A recent change bumped a 'notice' message up to 'error' level for debug builds to help trap incorrect configurations in CI systems. Unfortunately, the error condition in question is triggered by the error injection probe test. So change the message again to be 'probe error'

[PATCH 1/2] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-05-04 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The
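The version policy described above can be sketched roughly as follows. This is an illustrative reading of the snippet, not the driver's implementation: the major component must match exactly, while an older minor or patch level only produces a notification. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation of a major.minor.patch firmware version. */
struct fw_version {
    uint16_t major;
    uint16_t minor;
    uint16_t patch;
};

enum fw_check {
    FW_OK,             /* same major, and minor.patch is at least what was wanted */
    FW_OLDER_NOTICE,   /* same major, but minor or patch is older: notify only */
    FW_MAJOR_MISMATCH  /* different major: not compatible */
};

/* Compare the firmware that was found against the version the driver wants. */
static enum fw_check fw_check_version(struct fw_version found,
                                      struct fw_version wanted)
{
    if (found.major != wanted.major)
        return FW_MAJOR_MISMATCH;
    if (found.minor < wanted.minor)
        return FW_OLDER_NOTICE;
    /* Same minor: the patch level is tracked too, for the same reasons. */
    if (found.minor == wanted.minor && found.patch < wanted.patch)
        return FW_OLDER_NOTICE;
    return FW_OK;
}
```

For example, with 70.6.6 wanted, a 70.6.4 file would load but trigger the notice, whereas before patch level tracking it would have been indistinguishable from 70.6.6.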

[PATCH 0/2] Update MTL GuC firmware

2023-05-04 Thread John . C . Harrison
From: John Harrison Update MTL to the latest GuC release and switch to using reduced version file names. Also, pull in a patch from an earlier series that is waiting to merge to prevent merge conflicts later. Signed-off-by: John Harrison John Harrison (2): drm/i915/uc: Track patch level

[PATCH 2/2] drm/i915/mtl: Update GuC firmware version for MTL to 70.6.6

2023-05-04 Thread John . C . Harrison
From: John Harrison Also switch to using reduced version file naming as it is no longer such a work-in-progress and likely to change. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[PATCH v3 6/6] drm/i915/uc: Make unexpected firmware versions an error in debug builds

2023-05-02 Thread John . C . Harrison
From: John Harrison If the DEBUG_GEM config option is set then escalate the 'unexpected firmware version' message from a notice to an error. This will ensure that the CI system treats such occurrences as a failure and logs a bug about it (or fails the pre-merge testing). Signed-off-by: John

[PATCH v3 5/6] drm/i915/uc: Reject duplicate entries in firmware table

2023-05-02 Thread John . C . Harrison
From: John Harrison It was noticed that duplicate entries in the firmware table could cause an infinite loop in the firmware loading code if that entry failed to load. Duplicate entries are a bug anyway and so should never happen. Ensure they don't by tweaking the table validation code to reject

[PATCH v3 4/6] drm/i915/uc: Enhancements to firmware table validation

2023-05-02 Thread John . C . Harrison
From: John Harrison The validation of the firmware table was being done inside the code for scanning the table for the next available firmware blob. Which is unnecessary. So pull it out into a separate function that is only called once per blob type at init time. Also, drop the CONFIG_SELFTEST

[PATCH v3 1/6] drm/i915/guc: Decode another GuC load failure case

2023-05-02 Thread John . C . Harrison
From: John Harrison Explain another potential firmware failure mode and early exit the long wait if hit. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++ 2

[PATCH v3 2/6] drm/i915/guc: Print status register when waiting for GuC to load

2023-05-02 Thread John . C . Harrison
From: John Harrison If the GuC load is taking an excessively long time, the wait loop currently prints the GT frequency. Extend that to include the GuC status as well so we can see if the GuC is actually making progress or not. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio

[PATCH v3 3/6] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-05-02 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The

[PATCH v3 0/6] Improvements to uc firmware management

2023-05-02 Thread John . C . Harrison
From: John Harrison Enhance the firmware table verification code to catch more potential errors and to generally improve the code itself. Track patch level version even on reduced version files to allow user notification of missing bug fixes. Detect another immediate failure case when loading

[PATCH v2 4/4] drm/i915/guc: Fix error capture for virtual engines

2023-04-28 Thread John . C . Harrison
From: John Harrison GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No

[PATCH v2 3/4] drm/i915/guc: Capture list naming clean up

2023-04-28 Thread John . C . Harrison
From: John Harrison Don't use 'xe_lp*' prefixes for register lists that are common with Gen8. Don't add Xe only GSC registers to pre-Xe devices that don't even have a GSC engine. Fix Xe_LP name. Don't use GEN9 as a prefix for register lists that contain all GEN8 registers. Rename the

[PATCH v2 2/4] drm/i915/guc: Consolidate duplicated capture list code

2023-04-28 Thread John . C . Harrison
From: John Harrison Remove 99% duplicated steered register list code. Also, include the pre-Xe steered registers in the pre-Xe list generation. Signed-off-by: John Harrison Reviewed-by: Alan Previn --- .../gpu/drm/i915/gt/uc/intel_guc_capture.c| 112 +- 1 file changed, 29

[PATCH v2 0/4] Improvements to GuC error capture

2023-04-28 Thread John . C . Harrison
From: John Harrison The GuC error capture list creation was including Gen8 registers on Xe platforms. While fixing that, it was noticed that there were other issues. The platform naming was wrong, the naming of lists was misleading, the steered register code was duplicated and steered registers

[PATCH v2 1/4] drm/i915/guc: Don't capture Gen8 regs on Xe devices

2023-04-28 Thread John . C . Harrison
From: John Harrison A pair of pre-Xe registers were being included in the Xe capture list. GuC was rejecting those as being invalid and logging errors about them. So, stop doing it. Signed-off-by: John Harrison Reviewed-by: Alan Previn Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers

[PATCH 6/6] drm/i915/guc: Capture list clean up - 5

2023-04-26 Thread John . C . Harrison
From: John Harrison Rename the 'default_' register list prefix to 'gen8_' as that is the more accurate name. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git

[PATCH] drm/i915/guc: Actually return an error if GuC version range check fails

2023-04-21 Thread John . C . Harrison
From: John Harrison Dan Carpenter pointed out that 'err' was not being set in the case where the GuC firmware version range check fails. Fix that. Note that while this is a bug fix for a previous patch (see Fixes tag below), it is an exceedingly low risk bug. The range check is asserting that the

[PATCH 6/6] drm/i915/uc: Make unexpected firmware versions an error in debug builds

2023-04-20 Thread John . C . Harrison
From: John Harrison If the DEBUG_GEM config option is set then escalate the 'unexpected firmware version' message from a notice to an error. This will ensure that the CI system treats such occurrences as a failure and logs a bug about it (or fails the pre-merge testing). Signed-off-by: John

[PATCH 4/6] drm/i915/uc: Enhancements to firmware table validation

2023-04-20 Thread John . C . Harrison
From: John Harrison The validation of the firmware table was being done inside the code for scanning the table for the next available firmware blob. Which is unnecessary. So pull it out into a separate function that is only called once per blob type at init time. Also, drop the CONFIG_SELFTEST

[PATCH 0/6] Improvements to uc firmware management

2023-04-20 Thread John . C . Harrison
From: John Harrison Enhance the firmware table verification code to catch more potential errors and to generally improve the code itself. Track patch level version even on reduced version files to allow user notification of missing bug fixes. Detect another immediate failure case when loading

[PATCH 1/6] drm/i915/guc: Decode another GuC load failure case

2023-04-20 Thread John . C . Harrison
From: John Harrison Explain another potential firmware failure mode and early exit the long wait if hit. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++ 2

[PATCH 3/6] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-04-20 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The

[PATCH 2/6] drm/i915/guc: Print status register when waiting for GuC to load

2023-04-20 Thread John . C . Harrison
From: John Harrison If the GuC load is taking an excessively long time, the wait loop currently prints the GT frequency. Extend that to include the GuC status as well so we can see if the GuC is actually making progress or not. Signed-off-by: John Harrison Reviewed-by: Daniele Ceraolo Spurio

[PATCH 5/6] drm/i915/uc: Reject duplicate entries in firmware table

2023-04-20 Thread John . C . Harrison
From: John Harrison It was noticed that duplicate entries in the firmware table could cause an infinite loop in the firmware loading code if that entry failed to load. Duplicate entries are a bug anyway and so should never happen. Ensure they don't by tweaking the table validation code to reject

[PATCH v2 2/2] drm/i915/guc: Dump error capture to dmesg on CTB error

2023-04-18 Thread John . C . Harrison
From: John Harrison In the past, there have been sporadic CTB failures which proved hard to reproduce manually. The most effective solution was to dump the GuC log at the point of failure and let the CI system do the repro. It is preferable not to dump the GuC log via dmesg for all issues as it

[PATCH v2 1/2] drm/i915: Dump error capture to kernel log

2023-04-18 Thread John . C . Harrison
From: John Harrison This is useful for getting debug information out in certain situations, such as failing kernel selftests and CI runs that don't log error captures. It is especially useful for things like retrieving GuC logs as GuC operation can't be tracked by adding printk or ftrace

[PATCH v2 0/2] Add support for dumping error captures via kernel logging

2023-04-18 Thread John . C . Harrison
From: John Harrison Sometimes, the only effective way to debug an issue is to dump all the interesting information at the point of failure. So add support for doing that. v2: Extra CONFIG wrapping (review feedback from Rodrigo) Signed-off-by: John Harrison John Harrison (2): drm/i915:

[PATCH 2/5] drm/i915/guc: Print status register when waiting for GuC to load

2023-04-14 Thread John . C . Harrison
From: John Harrison If the GuC load is taking an excessively long time, the wait loop currently prints the GT frequency. Extend that to include the GuC status as well so we can see if the GuC is actually making progress or not. Signed-off-by: John Harrison ---

[PATCH 5/5] drm/i915/uc: Reject duplicate entries in firmware table

2023-04-14 Thread John . C . Harrison
From: John Harrison It was noticed that duplicate entries in the firmware table could cause an infinite loop in the firmware loading code if that entry failed to load. Duplicate entries are a bug anyway and so should never happen. Ensure they don't by tweaking the table validation code to reject

[PATCH 3/5] drm/i915/uc: Track patch level versions on reduced version firmware files

2023-04-14 Thread John . C . Harrison
From: John Harrison When reduced version firmware files were added (matching major component being the only strict requirement), the minor version was still tracked and a notification reported if it was older. However, the patch version should really be tracked as well for the same reasons. The

[PATCH 1/5] drm/i915/guc: Decode another GuC load failure case

2023-04-14 Thread John . C . Harrison
From: John Harrison Explain another potential firmware failure mode and early exit the long wait if hit. Signed-off-by: John Harrison --- drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++ 2 files changed, 7 insertions(+) diff

[PATCH 4/5] drm/i915/uc: Split firmware table validation to a separate function

2023-04-14 Thread John . C . Harrison
From: John Harrison The validation of the firmware table was being done inside the code for scanning the table for the next available firmware blob, which is unnecessary. Potentially, it should be a selftest. But either way, the first step is pulling it out into a separate function that can be

[PATCH 0/5] Improvements to uc firmware management

2023-04-14 Thread John . C . Harrison
From: John Harrison Enhance the firmware table verification code to catch more potential errors and to generally improve the code itself. Track patch level version even on reduced version files to allow user notification of missing bug fixes. Detect another immediate failure case when loading

[PATCH] drm/i915/guc: Fix error capture for virtual engines

2023-04-14 Thread John . C . Harrison
From: John Harrison GuC based register dumps in error capture logs were basically broken for virtual engines. This can be seen in igt@gem_exec_balancer@hang: [IGT] gem_exec_balancer: starting subtest hang [drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388] [drm] GT0: GUC: No

[PATCH 1/2] drm/i915: Dump error capture to kernel log

2023-04-10 Thread John . C . Harrison
From: John Harrison This is useful for getting debug information out in certain situations, such as failing kernel selftests and CI runs that don't log error captures. It is especially useful for things like retrieving GuC logs as GuC operation can't be tracked by adding printk or ftrace

[PATCH 2/2] drm/i915/guc: Dump error capture to dmesg on CTB error

2023-04-10 Thread John . C . Harrison
From: John Harrison In the past, there have been sporadic CTB failures which proved hard to reproduce manually. The most effective solution was to dump the GuC log at the point of failure and let the CI system do the repro. It is preferable not to dump the GuC log via dmesg for all issues as it