From: John Harrison
Enable another workaround that is implemented inside the GuC.
v2: Use the correct Gen12 w/a id rather than the Xe version (review
feedback from Matthew R) also extend to include ARL.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 1 +
From: John Harrison
Enable another workaround that is implemented inside the GuC.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 1 +
drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 32 ---
2 files changed, 21 insertions(+), 12 deletions(-)
From: John Harrison
The previous fix for the circular lock splat about the busyness
worker wasn't quite complete. Even though the reset-in-progress flag
is cleared at the start of intel_uc_reset_finish, the entire function
is still inside the reset mutex lock. Not sure why the patch appeared
to
From: John Harrison
An existing workaround has been extended in both platforms affected
and implementation complexity.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/abi/guc_klvs_abi.h | 3 ++-
drivers/gpu/drm/i915/gt/uc/intel_guc.c| 3 ++-
From: John Harrison
Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a
super-set of Wa_16019325821, so requires turning that one on as well as
setting the new flag for Wa_14019159160 itself.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
From: John Harrison
Enable Wa_14019159160 and Wa_16019325821 for MTL
RCS/CCS workarounds for MTL.
v2: Fix bug in WA KLV implementation (offset not being reset to start
of list). Add better comment to prep patch about how KLVs can be added.
Add a module parameter override and disable the w/a
From: John Harrison
To prevent running out of bits, new w/a enable flags are being added
via a KLV system instead of a 32 bit flags word.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
.../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
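
For readers unfamiliar with the idea, a key-length-value list can be sketched roughly as below. This is only an illustration, not the driver's code: the header layout (key in bits 31:16, length in dwords in bits 15:0), the helper names klv_append and klv_has_key, and any key values are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: key in bits 31:16, value length (dwords) in bits 15:0. */
#define KLV_KEY_SHIFT 16
#define KLV_LEN_MASK  0xffffu

/*
 * Append one KLV entry to buf at *offset (both counted in dwords).
 * Returns 0 on success, -1 if the entry does not fit in the buffer.
 */
static int klv_append(uint32_t *buf, size_t size, size_t *offset,
                      uint16_t key, const uint32_t *value, uint16_t len)
{
    size_t i;

    if (*offset > size || size - *offset < (size_t)len + 1)
        return -1;

    /* Header dword first, then 'len' value dwords. */
    buf[(*offset)++] = ((uint32_t)key << KLV_KEY_SHIFT) | (len & KLV_LEN_MASK);
    for (i = 0; i < len; i++)
        buf[(*offset)++] = value[i];

    return 0;
}

/* Walk the first 'used' dwords of buf; return 1 if key is present, else 0. */
static int klv_has_key(const uint32_t *buf, size_t used, uint16_t key)
{
    size_t pos = 0;

    while (pos < used) {
        uint32_t hdr = buf[pos];

        if ((hdr >> KLV_KEY_SHIFT) == key)
            return 1;
        /* Skip the header plus the value dwords it announces. */
        pos += 1 + (hdr & KLV_LEN_MASK);
    }
    return 0;
}
```

An enable flag with no payload is then just a header with a length of zero, so adding a new w/a costs one dword in the list rather than one of a fixed 32 bits.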
From: John Harrison
Some platforms require holding RCS context switches until CCS is idle
(the reverse w/a of Wa_14014475959). Some platforms require both
versions.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19
From: John Harrison
The EIR register (0x20B0) was being included in the engine class list
for render and compute as the absolute register address. However, it
is actually a ring register available on all engines at an offset of
(base) + 0xB0. As it was included as an RCS engine but with the
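
The difference between an absolute address and a ring-relative one can be sketched as follows. The 0xB0 offset comes from the text above, and the render base of 0x2000 is implied by the absolute address 0x20B0; the macro and function names are made up for this example and are not the driver's.

```c
#include <stdint.h>

/* From the text: EIR lives at (ring base) + 0xB0 on every engine. */
#define RING_EIR_OFFSET 0xB0u

/* Implied by the text: 0x20B0 - 0xB0. Name is hypothetical. */
#define RCS_RING_BASE 0x2000u

/* Compute the EIR address for an engine from that engine's ring base. */
static uint32_t ring_eir(uint32_t ring_base)
{
    return ring_base + RING_EIR_OFFSET;
}
```

With a helper like this, the same list entry resolves to a different address per engine instead of always pointing at the render copy.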
From: John Harrison
The above w/a is required for every platform that the i915 driver
supports. It is fixed on the latest platforms but they are only
supported by Xe instead of i915. So just remove the platform check
completely and keep the code simple.
v2: Add extra comment (review feedback
From: John Harrison
The above w/a is required for every platform that the i915 driver
supports. It is fixed on the latest platforms but they are only
supported by Xe instead of i915. So just remove the platform check
completely and keep the code simple.
Signed-off-by: John Harrison
---
From: John Harrison
The context persistence code does things like send super high priority
heartbeat pulses to ensure any leaked context can still be pre-empted
and thus isn't a total denial of service but only a minor denial of
service. Unfortunately, it wasn't bothering to restart the heartbeat
From: John Harrison
To prevent running out of bits, new w/a enable flags are being added
via a KLV system instead of a 32 bit flags word.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
.../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
From: John Harrison
Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a
super-set of Wa_16019325821, so requires turning that one on as well as
setting the new flag for Wa_14019159160 itself.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
From: John Harrison
Some platforms require holding RCS context switches until CCS is idle
(the reverse w/a of Wa_14014475959). Some platforms require both
versions.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19
From: John Harrison
Enable Wa_14019159160 and Wa_16019325821 for MTL
RCS/CCS workarounds for MTL.
v2: Fix bug in WA KLV implementation (offset not being reset to start
of list). Add better comment to prep patch about how KLVs can be added.
Add a module parameter override and disable the w/a
From: John Harrison
A failure to load the HuC is occasionally observed where the cause is
believed to be a low GT frequency leading to very long load times.
So a) increase the timeout so that the user still gets a working
system even in the case of slow load. And b) report the frequency
during
From: John Harrison
To prevent running out of bits, new w/a enable flags are being added
via a KLV system instead of a 32 bit flags word.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
.../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
From: John Harrison
Enable Wa_14019159160 and Wa_16019325821 for MTL
RCS/CCS workarounds for MTL.
v2: Fix bug in WA KLV implementation (offset not being reset to start
of list). Add better comment to prep patch about how KLVs can be added.
Add a module parameter override and disable the w/a
From: John Harrison
Some platforms require holding RCS context switches until CCS is idle
(the reverse w/a of Wa_14014475959). Some platforms require both
versions.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19
From: John Harrison
Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a
super-set of Wa_16019325821, so requires turning that one on as well as
setting the new flag for Wa_14019159160 itself.
Signed-off-by: John Harrison
Reviewed-by: Vinay Belgaumkar
---
From: John Harrison
Avoid the following lockdep complaint:
<4> [298.856498] ==
<4> [298.856500] WARNING: possible circular locking dependency detected
<4> [298.856503] 6.7.0-rc5-CI_DRM_14017-g58ac4ffc75b6+ #1 Tainted: G
N
<4> [298.856505]
From: John Harrison
There is a mechanism for reporting errors from fire and forget H2G
messages. This is the only way to find out about almost any error in
the GuC backend submission path. So it would be useful to know that it
is working.
v2: Fix some dumb over-complications and a couple of
From: John Harrison
Add a selftest to verify that the FAST_REQUEST mechanism (getting
errors back from fire-and-forget H2G commands) is functional.
Also fix up a potential false positive in the GuC hang selftest.
v2: Fix some dumb over-complications and typos - review feedback from
Daniele.
From: John Harrison
Noticed that the hangcheck selftest is submitting a non-preemptible
spinner. That means that even if the GuC does not die, the heartbeat
will still kick in and trigger a reset. Which is rather defeating the
purpose of the test - to verify that the heartbeat will kick in if
From: John Harrison
If a context is blocked, unblocked and submitted repeatedly in rapid
succession, the driver can end up trying to enable the context while
the previous enable request is still in flight. This can lead to much
confusion in the state tracking.
Prevent that by checking the
From: John Harrison
The driver could sometimes send context enable/disable requests when a
previous request was still pending. This is not allowed. So stop doing
it.
Signed-off-by: John Harrison
John Harrison (2):
drm/i915/guc: Don't double enable a context
drm/i915/guc: Don't disable a
From: John Harrison
Various processes involve requesting GuC to disable a given context.
However context enable/disable is an asynchronous process in the GuC.
Thus, it is possible the previous enable request is still being
processed when the disable request is triggered. Having both enable
and
From: John Harrison
Noticed that the hangcheck selftest is submitting a non-preemptible
spinner. That means that even if the GuC does not die, the heartbeat
will still kick in and trigger a reset. Which is rather defeating the
purpose of the test - to verify that the heartbeat will kick in if
From: John Harrison
There is a mechanism for reporting errors from fire and forget H2G
messages. This is the only way to find out about almost any error in
the GuC backend submission path. So it would be useful to know that it
is working.
Signed-off-by: John Harrison
---
From: John Harrison
Add a selftest to verify that the FAST_REQUEST mechanism (getting
errors back from fire-and-forget H2G commands) is functional.
Also fix up a potential false positive in the GuC hang selftest.
Signed-off-by: John Harrison
John Harrison (2):
drm/i915/guc: Fix for
From: John Harrison
These w/a's can have significant performance implications for any
workload which uses both RCS and CCS. On the other hand, the hang
itself is only seen in one or two very specific workloads. So add a
module parameter to control whether the w/a's are enabled or not and
default
From: John Harrison
Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a
super-set of Wa_16019325821, so requires turning that one on as well as
setting the new flag for Wa_14019159160 itself.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 3 ++
From: John Harrison
Some platforms require holding RCS context switches until CCS is idle
(the reverse w/a of Wa_14014475959). Some platforms require both
versions.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++
From: John Harrison
To prevent running out of bits, new w/a enable flags are being added
via a KLV system instead of a 32 bit flags word.
Signed-off-by: John Harrison
---
.../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
drivers/gpu/drm/i915/gt/uc/intel_guc.h| 2 +
From: John Harrison
Enable Wa_14019159160 and Wa_16019325821 for MTL
RCS/CCS workarounds for MTL.
v2: Fix bug in WA KLV implementation (offset not being reset to start
of list). Add better comment to prep patch about how KLVs can be added.
Add a module parameter override and disable the w/a
From: John Harrison
Update a bunch of GT related print messages in non-GT files to use the
GT specific helpers.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.c | 8 +++-
drivers/gpu/drm/i915/i915_driver.c| 3 ++-
drivers/gpu/drm/i915/i915_perf.c
From: John Harrison
A bunch of print messages got missed in the update to using sub-system
specific helpers. So update those.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/intel_engine_cs.c | 29 +
drivers/gpu/drm/i915/gt/intel_gsc.c | 11
From: John Harrison
There was an update a while back to use sub-system specific print
helpers that implicitly add sub-system specific information to the
print. It seems a bunch of GT related messages got missed in that
update. So update them now.
Signed-off-by: John Harrison
John Harrison
From: John Harrison
The latest GuC has new features and new workarounds that we wish to
enable. So let the universe know that it is useful to update their
firmware.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 8
1 file changed, 4 insertions(+), 4
From: Daniele Ceraolo Spurio
The GuC handles the WA, the KMD just needs to set the flag to enable
it on the appropriate platforms.
Signed-off-by: John Harrison
Signed-off-by: Daniele Ceraolo Spurio
Reviewed-by: Vinay Belgaumkar
---
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 6 ++
From: John Harrison
The GuC has been extended to support a much more friendly engine
busyness interface. So partition the old interface into a 'busy_v1'
space and add 'busy_v2' support alongside. And if v2 is available, use
that in preference to v1. Note that v2 provides extra features over
and
From: Umesh Nerlige Ramappa
In new version of GuC engine busyness, GuC provides engine busyness
ticks as a 64 bit counter. Add a new counter to relay this value to the
user as is.
Signed-off-by: Umesh Nerlige Ramappa
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/intel_engine.h
From: John Harrison
The latest GuC implements a new and improved scheme for tracking
engine busyness. So make use of it.
Note that this change comes along with a new set of PMU counters. The
old counters have a fundamental problem that they are defined in terms
of wall time but the sampling is
From: Umesh Nerlige Ramappa
Current engine busyness interface exposed by GuC has a few issues:
- The busyness of active engine is calculated using 2 values provided by
GuC and is prone to race between CPU reading those values and GuC
updating them. Any sort of HW synchronization would be at
From: John Harrison
If an active context has been banned (e.g. Ctrl+C killed) then it is
likely to be reset as part of evicting it from the hardware. That
results in a 'ignoring context reset notification: banned = 1'
message at info level. This confuses/concerns people and makes them
think
From: John Harrison
To prevent running out of bits, new w/a enable flags are being added
via a KLV system instead of a 32 bit flags word.
Signed-off-by: John Harrison
---
.../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
drivers/gpu/drm/i915/gt/uc/intel_guc.h| 3 +
From: John Harrison
Use the new w/a KLV support to enable a MTL w/a. Note, this w/a is a
super-set of Wa_16019325821, so requires turning that one on as well as
setting the new flag for Wa_14019159160 itself.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 3 +++
From: John Harrison
Some platforms require holding RCS context switches until CCS is idle
(the reverse w/a of Wa_14014475959). Some platforms require both
versions.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 19 +++
From: John Harrison
The latest GuC has new features and new workarounds that we wish to
enable. So let the universe know that it is useful to update their
firmware.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 6 +++---
1 file changed, 3 insertions(+), 3
From: John Harrison
Enable Wa_14019159160 and Wa_16019325821 for MTL
RCS/CCS workarounds for MTL.
Signed-off-by: John Harrison
John Harrison (4):
drm/i915/guc: Update 'recommended' version to 70.11.0 for
DG2/ADL-P/MTL
drm/i915: Enable Wa_16019325821
drm/i915/guc: Add support for
From: John Harrison
The latest GuC has new features and new workarounds that we wish to
enable. So let the universe know that it is useful to update their
firmware.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 6 +++---
1 file changed, 3 insertions(+), 3
From: Daniele Ceraolo Spurio
The GuC handles the WA, the KMD just needs to set the flag to enable
it on the appropriate platforms.
Signed-off-by: John Harrison
Signed-off-by: Daniele Ceraolo Spurio
---
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 6 ++
From: John Harrison
Enable a WA on the latest platforms. Also update the recommended GuC
version for those platforms to the latest available. Further patches
will follow to make use of other features in the latest GuC firmware,
but the w/a at least requires something newer than what was
From: John Harrison
If GuC hits an internal error (and survives long enough to report it
to the KMD), it is basically toast and will stop until a GT reset and
subsequent GuC reload is performed. Previously, the KMD just printed
an error message and then waited for the heartbeat to eventually
From: John Harrison
It was noticed that if the very first 'stealing' request failed to
create for some reason then the 'steal all ids' loop would immediately
exit with 'last' still being NULL. The test would attempt to continue
but using a null pointer. Fix that by aborting the test if it fails
From: John Harrison
If GuC hits an internal error (and survives long enough to report it
to the KMD), it is basically toast and will stop until a GT reset and
subsequent GuC reload is performed. Previously, the KMD just printed
an error message and then waited for the heartbeat to eventually
From: John Harrison
There were a bunch of defines and structures left over from an API
update a very long time ago. Remove them.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 33 -
1 file changed, 33 deletions(-)
diff --git
From: John Harrison
The GuC interface supports a mechanism for returning errors against
non-blocking H2G calls. This is called FAST_REQUEST. Given that the
call is asynchronous, matching the returned error up is difficult.
However, getting any error at all back is better than no error.
If any
From: Michal Wajdeczko
For easier debug of any unexpected error responses from GuC that
might be related to non-blocking fast requests, track action code (and
stack if under DEBUG_GUC config) for every H2G request.
Signed-off-by: Michal Wajdeczko
Signed-off-by: John Harrison
---
From: Michal Wajdeczko
Instead of printing message fence twice, include HXG header of the
unexpected message and its len.
Signed-off-by: Michal Wajdeczko
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
From: Michal Wajdeczko
In addition to the already defined REQUEST HXG message format,
which is used when sender expects some confirmation or data,
HXG protocol includes definition of the FAST REQUEST message,
that may be used when sender does not expect any useful data
to be returned.
Using
From: John Harrison
The GuC has a completely separate engine class enum when referring to
register capture lists, which combines render and compute. The driver
was using the 'normal' GuC specific engine class enum instead. That
meant that it thought it was defining a capture list for compute
From: John Harrison
A recent change bumped a 'notice' message up to 'error' level for
debug builds to help trap incorrect configurations in CI systems.
Unfortunately, the error condition in question is triggered by the
error injection probe test. So change the message again to be 'probe
error'
From: John Harrison
When reduced version firmware files were added (matching major
component being the only strict requirement), the minor version was
still tracked and a notification reported if it was older. However,
the patch version should really be tracked as well for the same
reasons. The
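
The policy described here (major must match, an older minor or patch only earns a notification) might look something like the sketch below. The struct, enum and function names are hypothetical and chosen for the example; they are not taken from the patch.

```c
/* Hypothetical version triple for a firmware file. */
struct fw_ver {
    unsigned int major;
    unsigned int minor;
    unsigned int patch;
};

enum fw_check {
    FW_OK,        /* found version is at least the wanted one */
    FW_OLDER,     /* usable, but the user should be told to update */
    FW_MISMATCH,  /* major differs: the only strict requirement */
};

/* Compare the version found on disk against the version the driver wants. */
static enum fw_check fw_check_version(struct fw_ver found, struct fw_ver wanted)
{
    if (found.major != wanted.major)
        return FW_MISMATCH;

    if (found.minor < wanted.minor)
        return FW_OLDER;

    /* Same minor: the patch level decides whether bug fixes are missing. */
    if (found.minor == wanted.minor && found.patch < wanted.patch)
        return FW_OLDER;

    return FW_OK;
}
```

Tracking only major and minor would report FW_OK for a file that is missing patch-level fixes, which is the gap the text describes.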
From: John Harrison
Update MTL to the latest GuC release and switch to using reduced
version file names. Also, pull in a patch from an earlier series that
is waiting to merge to prevent merge conflicts later.
Signed-off-by: John Harrison
John Harrison (2):
drm/i915/uc: Track patch level
From: John Harrison
Also switch to using reduced version file naming as it is no longer
such a work-in-progress and likely to change.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
From: John Harrison
If the DEBUG_GEM config option is set then escalate the 'unexpected
firmware version' message from a notice to an error. This will ensure
that the CI system treats such occurrences as a failure and logs a bug
about it (or fails the pre-merge testing).
Signed-off-by: John
From: John Harrison
It was noticed that duplicate entries in the firmware table could cause
an infinite loop in the firmware loading code if that entry failed to
load. Duplicate entries are a bug anyway and so should never happen.
Ensure they don't by tweaking the table validation code to reject
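
A duplicate check of the kind described could be sketched as below. The table layout and all names are assumptions made for illustration; the real table and its validation rules are not shown in this snippet.

```c
#include <stddef.h>

/* Hypothetical firmware table entry: one blob per platform and version. */
struct fw_entry {
    int platform;
    unsigned int major;
    unsigned int minor;
    unsigned int patch;
};

/* Return 1 if both entries describe the same platform and version. */
static int fw_entry_same(const struct fw_entry *a, const struct fw_entry *b)
{
    return a->platform == b->platform && a->major == b->major &&
           a->minor == b->minor && a->patch == b->patch;
}

/*
 * Validate the table once at init time.
 * Returns 0 if every entry is unique, -1 if any duplicate is found.
 * The quadratic scan is acceptable for a small, static table.
 */
static int fw_table_validate(const struct fw_entry *table, size_t count)
{
    size_t i, j;

    for (i = 0; i < count; i++)
        for (j = i + 1; j < count; j++)
            if (fw_entry_same(&table[i], &table[j]))
                return -1;

    return 0;
}
```

Rejecting the table up front means a loader that retries "the next entry" can never be handed the same failing entry twice.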
From: John Harrison
The validation of the firmware table was being done inside the code
for scanning the table for the next available firmware blob. Which is
unnecessary. So pull it out into a separate function that is only
called once per blob type at init time.
Also, drop the CONFIG_SELFTEST
From: John Harrison
Explain another potential firmware failure mode and early exit the
long wait if hit.
Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
---
drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++
2
From: John Harrison
If the GuC load is taking an excessively long time, the wait loop
currently prints the GT frequency. Extend that to include the GuC
status as well so we can see if the GuC is actually making progress or
not.
Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
From: John Harrison
When reduced version firmware files were added (matching major
component being the only strict requirement), the minor version was
still tracked and a notification reported if it was older. However,
the patch version should really be tracked as well for the same
reasons. The
From: John Harrison
Enhance the firmware table verification code to catch more potential
errors and to generally improve the code itself.
Track patch level version even on reduced version files to allow user
notification of missing bug fixes.
Detect another immediate failure case when loading
From: John Harrison
GuC based register dumps in error capture logs were basically broken
for virtual engines. This can be seen in igt@gem_exec_balancer@hang:
[IGT] gem_exec_balancer: starting subtest hang
[drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388]
[drm] GT0: GUC: No
From: John Harrison
Don't use 'xe_lp*' prefixes for register lists that are common with
Gen8.
Don't add Xe only GSC registers to pre-Xe devices that don't
even have a GSC engine.
Fix Xe_LP name.
Don't use GEN9 as a prefix for register lists that contain all GEN8
registers.
Rename the
From: John Harrison
Remove 99% duplicated steered register list code. Also, include the
pre-Xe steered registers in the pre-Xe list generation.
Signed-off-by: John Harrison
Reviewed-by: Alan Previn
---
.../gpu/drm/i915/gt/uc/intel_guc_capture.c| 112 +-
1 file changed, 29
From: John Harrison
The GuC error capture list creation was including Gen8 registers on Xe
platforms. While fixing that, it was noticed that there were other
issues. The platform naming was wrong, the naming of lists was
misleading, the steered register code was duplicated and steered
registers
From: John Harrison
A pair of pre-Xe registers were being included in the Xe capture list.
GuC was rejecting those as being invalid and logging errors about
them. So, stop doing it.
Signed-off-by: John Harrison
Reviewed-by: Alan Previn
Fixes: dce2bd542337 ("drm/i915/guc: Add Gen9 registers
From: John Harrison
Rename the 'default_' register list prefix to 'gen8_' as that is the
more accurate name.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c | 14 +++---
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git
From: John Harrison
Dan Carpenter pointed out that 'err' was not being set in the case
where the GuC firmware version range check fails. Fix that.
Note that while this is bug fix for a previous patch (see Fixes tag
below). It is an exceedingly low risk bug. The range check is
asserting that the
From: John Harrison
If the DEBUG_GEM config option is set then escalate the 'unexpected
firmware version' message from a notice to an error. This will ensure
that the CI system treats such occurrences as a failure and logs a bug
about it (or fails the pre-merge testing).
Signed-off-by: John
From: John Harrison
The validation of the firmware table was being done inside the code
for scanning the table for the next available firmware blob. Which is
unnecessary. So pull it out into a separate function that is only
called once per blob type at init time.
Also, drop the CONFIG_SELFTEST
From: John Harrison
Enhance the firmware table verification code to catch more potential
errors and to generally improve the code itself.
Track patch level version even on reduced version files to allow user
notification of missing bug fixes.
Detect another immediate failure case when loading
From: John Harrison
Explain another potential firmware failure mode and early exit the
long wait if hit.
Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
---
drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++
2
From: John Harrison
When reduced version firmware files were added (matching major
component being the only strict requirement), the minor version was
still tracked and a notification reported if it was older. However,
the patch version should really be tracked as well for the same
reasons. The
From: John Harrison
If the GuC load is taking an excessively long time, the wait loop
currently prints the GT frequency. Extend that to include the GuC
status as well so we can see if the GuC is actually making progress or
not.
Signed-off-by: John Harrison
Reviewed-by: Daniele Ceraolo Spurio
From: John Harrison
It was noticed that duplicate entries in the firmware table could cause
an infinite loop in the firmware loading code if that entry failed to
load. Duplicate entries are a bug anyway and so should never happen.
Ensure they don't by tweaking the table validation code to reject
From: John Harrison
In the past, there have been sporadic CTB failures which proved hard
to reproduce manually. The most effective solution was to dump the GuC
log at the point of failure and let the CI system do the repro. It is
preferable not to dump the GuC log via dmesg for all issues as it
From: John Harrison
This is useful for getting debug information out in certain
situations, such as failing kernel selftests and CI runs that don't
log error captures. It is especially useful for things like retrieving
GuC logs as GuC operation can't be tracked by adding printk or ftrace
From: John Harrison
Sometimes, the only effective way to debug an issue is to dump all the
interesting information at the point of failure. So add support for
doing that.
v2: Extra CONFIG wrapping (review feedback from Rodrigo)
Signed-off-by: John Harrison
John Harrison (2):
drm/i915:
From: John Harrison
If the GuC load is taking an excessively long time, the wait loop
currently prints the GT frequency. Extend that to include the GuC
status as well so we can see if the GuC is actually making progress or
not.
Signed-off-by: John Harrison
---
From: John Harrison
It was noticed that duplicate entries in the firmware table could cause
an infinite loop in the firmware loading code if that entry failed to
load. Duplicate entries are a bug anyway and so should never happen.
Ensure they don't by tweaking the table validation code to reject
From: John Harrison
When reduced version firmware files were added (matching major
component being the only strict requirement), the minor version was
still tracked and a notification reported if it was older. However,
the patch version should really be tracked as well for the same
reasons. The
From: John Harrison
Explain another potential firmware failure mode and early exit the
long wait if hit.
Signed-off-by: John Harrison
---
drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 1 +
drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 6 ++
2 files changed, 7 insertions(+)
diff
From: John Harrison
The validation of the firmware table was being done inside the code
for scanning the table for the next available firmware blob. Which is
unnecessary. Potentially, it should be a selftest. But either way, the
first step is pulling it out into a separate function that can be
From: John Harrison
Enhance the firmware table verification code to catch more potential
errors and to generally improve the code itself.
Track patch level version even on reduced version files to allow user
notification of missing bug fixes.
Detect another immediate failure case when loading
From: John Harrison
GuC based register dumps in error capture logs were basically broken
for virtual engines. This can be seen in igt@gem_exec_balancer@hang:
[IGT] gem_exec_balancer: starting subtest hang
[drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388]
[drm] GT0: GUC: No
From: John Harrison
This is useful for getting debug information out in certain
situations, such as failing kernel selftests and CI runs that don't
log error captures. It is especially useful for things like retrieving
GuC logs as GuC operation can't be tracked by adding printk or ftrace
From: John Harrison
In the past, there have been sporadic CTB failures which proved hard
to reproduce manually. The most effective solution was to dump the GuC
log at the point of failure and let the CI system do the repro. It is
preferable not to dump the GuC log via dmesg for all issues as it