Bug 216528
Summary: | commit ed589c7d6485b0f4bacd1c4fb385b79176a33a73 leads to silent hang on boot for MSI Laptop | |
---|---|---|---
Product: | Drivers | Reporter: | spasswolf
Component: | Other | Assignee: | drivers_other
Status: | RESOLVED CODE_FIX | |
Severity: | normal | CC: | apmbugs
Priority: | P1 | |
Hardware: | All | |
OS: | Linux | |
Kernel Version: | next-20220923 | Subsystem: |
Regression: | Yes | Bisected commit-id: |
Attachments: | kernel config | |
Description
spasswolf
2022-09-25 10:10:51 UTC
Created attachment 301871
kernel config
Unfortunately the reversion may have introduced a new bug which leads to a hang of the system without traces in the logs. The hang seems to occur after 15min to 1h and under higher graphical load. Could this be a temperature issue caused by the reversion?

This is a locking issue, the following patch made the kernel boot again:

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9b27211b806f..9179989fe920 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1154,7 +1154,7 @@ int thermal_zone_get_crit_temp(struct thermal_zone_device *tz, int *temp)
 	if (!tz->trips)
 		return -EINVAL;
-	mutex_lock(&tz->lock);
+	//mutex_lock(&tz->lock);
 	for (i = 0; i < tz->num_trips; i++) {
 		if (tz->trips[i].type == THERMAL_TRIP_CRITICAL) {
@@ -1165,7 +1165,7 @@ int thermal_zone_get_crit_temp(struct thermal_zone_device *tz, int *temp)
 	ret = -EINVAL;
 out:
-	mutex_unlock(&tz->lock);
+	//mutex_unlock(&tz->lock);
 	return ret;
 }
@@ -1202,9 +1202,9 @@ int thermal_zone_get_trip(struct thermal_zone_device *tz, int trip_id,
 {
 	int ret;
-	mutex_lock(&tz->lock);
+	//mutex_lock(&tz->lock);
 	ret = __thermal_zone_get_trip(tz, trip_id, trip);
-	mutex_unlock(&tz->lock);
+	//mutex_unlock(&tz->lock);
 	return ret;
 }
@@ -1216,7 +1216,7 @@ int thermal_zone_set_trip(struct thermal_zone_device *tz, int trip_id,
 	struct thermal_trip t;
 	int ret = -EINVAL;
-	mutex_lock(&tz->lock);
+	//mutex_lock(&tz->lock);
 	if (!tz->ops->set_trip_temp && !tz->ops->set_trip_hyst && !tz->trips)
 		goto out;
@@ -1244,7 +1244,7 @@ int thermal_zone_set_trip(struct thermal_zone_device *tz, int trip_id,
 	tz->trips[trip_id] = *trip;
 out:
-	mutex_unlock(&tz->lock);
+	//mutex_unlock(&tz->lock);
 	if (!ret) {
 		thermal_notify_tz_trip_change(tz->id, trip_id, trip->type,

This is the important part, which removes the hang:

@@ -1202,9 +1202,9 @@ int thermal_zone_get_trip(struct thermal_zone_device *tz, int trip_id,
 {
 	int ret;
-	mutex_lock(&tz->lock);
+	//mutex_lock(&tz->lock);
 	ret = __thermal_zone_get_trip(tz, trip_id, trip);
-	mutex_unlock(&tz->lock);
+	//mutex_unlock(&tz->lock);
 	return ret;
 }

CC'ing Stephen Rothwell

ACPI thermal zone is the problem, setting CONFIG_ACPI_THERMAL_ZONE=n makes the kernel bootable again. CONFIG_ACPI_THERMAL=m postpones the hang until the module is loaded (3s into boot instead of 0.3s with =y).

The hang when booting is fixed by the changes in linux-20220927, but the instability continues. After about 15min the system locks up with the capslock LED flashing.

As the said instability also occurs when I revert the thermal patches, it is caused by something else ...
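For readers trying to picture the boot hang: the report only establishes that dropping the mutex_lock()/mutex_unlock() pair in thermal_zone_get_trip() makes the hang go away. It does not say which caller is involved. One plausible reading, which is an assumption and not confirmed here, is that some path already holding tz->lock calls the locking wrapper, and since kernel mutexes are not recursive the task then waits on itself. The userspace sketch below illustrates that pattern with POSIX threads. All names (struct zone, zone_init, zone_get_trip, zone_get_trip_unlocked, zone_update) and the temperature value are hypothetical stand-ins, not kernel API. An error-checking mutex is used so the relock is reported as EDEADLK instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct thermal_zone_device: a lock and one trip. */
struct zone {
	pthread_mutex_t lock;
	int trip_temp;
};

/* Set up the zone with an error-checking mutex, so that relocking from the
 * same thread returns EDEADLK rather than blocking forever. */
int zone_init(struct zone *z, int trip_temp)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&z->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	z->trip_temp = trip_temp;
	return ret;
}

/* Unlocked helper: the caller is expected to hold z->lock. */
int zone_get_trip_unlocked(struct zone *z, int *temp)
{
	*temp = z->trip_temp;
	return 0;
}

/* Locking wrapper: takes z->lock itself, so it must not be called
 * by a thread that already holds it. */
int zone_get_trip(struct zone *z, int *temp)
{
	int ret = pthread_mutex_lock(&z->lock);

	if (ret)
		return ret; /* EDEADLK if this thread already owns the lock */
	ret = zone_get_trip_unlocked(z, temp);
	pthread_mutex_unlock(&z->lock);
	return ret;
}

/* A caller that holds z->lock while reading the trip. use_wrapper != 0
 * selects the assumed buggy path that goes through the locking wrapper. */
int zone_update(struct zone *z, int use_wrapper, int *temp)
{
	int ret = pthread_mutex_lock(&z->lock);

	if (ret)
		return ret;
	ret = use_wrapper ? zone_get_trip(z, temp)
			  : zone_get_trip_unlocked(z, temp);
	pthread_mutex_unlock(&z->lock);
	return ret;
}
```

With a zone initialised via zone_init(&z, 95000), zone_update(&z, 0, &temp) succeeds and fills temp, while zone_update(&z, 1, &temp) returns EDEADLK. With a plain non-recursive lock and no error checking, the second call would simply never return. Under this reading, the usual remedy is to have lock-holding callers use the unlocked helper rather than commenting the locking out of the wrapper.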