commit b7f70762d1584a2b66e056412fd39ad6f6344c89 Author: Greg Kroah-Hartman Date: Sun Jan 16 09:15:39 2022 +0100 Linux 5.4.172 Link: https://lore.kernel.org/r/20220114081541.465841464@linuxfoundation.org Tested-by: Jon Hunter Tested-by: Florian Fainelli Tested-by: Shuah Khan Tested-by: Linux Kernel Functional Testing Tested-by: Sudip Mukherjee Tested-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit f415409551b04f6ef9f0b9d70745271edebd7203 Author: Arnd Bergmann Date: Thu Dec 9 12:51:42 2021 -0700 staging: greybus: fix stack size warning with UBSAN commit 144779edf598e0896302c35a0926ef0b68f17c4b upstream. clang warns about excessive stack usage in this driver when UBSAN is enabled: drivers/staging/greybus/audio_topology.c:977:12: error: stack frame size of 1836 bytes in function 'gbaudio_tplg_create_widget' [-Werror,-Wframe-larger-than=] Rework this code to no longer use compound literals for initializing the structure in each case, but instead keep the common bits in a preallocated constant array and copy them as needed. Link: https://github.com/ClangBuiltLinux/linux/issues/1535 Link: https://lore.kernel.org/r/20210103223541.2790855-1-arnd@kernel.org/ Reviewed-by: Nick Desaulniers Reviewed-by: Alex Elder Signed-off-by: Arnd Bergmann [nathan: Address review comments from v1] Signed-off-by: Nathan Chancellor Link: https://lore.kernel.org/r/20211209195141.1165233-1-nathan@kernel.org Signed-off-by: Greg Kroah-Hartman commit 65c2e7176f77597a3ecded95d7b126f901c57ee1 Author: Nathan Chancellor Date: Thu Oct 14 14:19:16 2021 -0700 drm/i915: Avoid bitwise vs logical OR warning in snb_wm_latency_quirk() commit 2e70570656adfe1c5d9a29940faa348d5f132199 upstream. A new warning in clang points out a place in this file where a bitwise OR is being used with boolean types: drivers/gpu/drm/i915/intel_pm.c:3066:12: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical] changed = ilk_increase_wm_latency(dev_priv, dev_priv->wm.pri_latency, 12) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This construct is intentional, as it allows every one of the calls to ilk_increase_wm_latency() to occur (instead of short circuiting with logical OR) while still caring about the result of each call. To make this clearer to the compiler, use the '|=' operator to assign the result of each ilk_increase_wm_latency() call to changed, which keeps the meaning of the code the same but makes it obvious that every one of these calls is expected to happen. Link: https://github.com/ClangBuiltLinux/linux/issues/1473 Reported-by: Nick Desaulniers Signed-off-by: Nathan Chancellor Suggested-by: Dávid Bolvanský Reviewed-by: Nick Desaulniers Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20211014211916.3550122-1-nathan@kernel.org Signed-off-by: Greg Kroah-Hartman commit 86ded7a6cf400d13dfa677ad42985e31dd14416c Author: Nathan Chancellor Date: Thu Oct 14 14:57:03 2021 -0700 staging: wlan-ng: Avoid bitwise vs logical OR warning in hfa384x_usb_throttlefn() commit 502408a61f4b7eb4713f44bd77f4a48e6cb1b59a upstream. A new warning in clang points out a place in this file where a bitwise OR is being used with boolean expressions: In file included from drivers/staging/wlan-ng/prism2usb.c:2: drivers/staging/wlan-ng/hfa384x_usb.c:3787:7: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical] ((test_and_clear_bit(THROTTLE_RX, &hw->usb_flags) && ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/staging/wlan-ng/hfa384x_usb.c:3787:7: note: cast one or both operands to int to silence this warning 1 warning generated. The comment explains that short circuiting here is undesirable, as the calls to test_and_{clear,set}_bit() need to happen for both sides of the expression. Clang's suggestion would work to silence the warning but the readability of the expression would suffer even more. To clean up the warning and make the block more readable, use a variable for each side of the bitwise expression. Link: https://github.com/ClangBuiltLinux/linux/issues/1478 Signed-off-by: Nathan Chancellor Link: https://lore.kernel.org/r/20211014215703.3705371-1-nathan@kernel.org Signed-off-by: Greg Kroah-Hartman commit a459686f986ca8b5dba98d7e539f7d4ffedc6cec Author: Ricardo Ribalda Date: Tue Dec 7 01:38:37 2021 +0100 media: Revert "media: uvcvideo: Set unique vdev name based in type" commit f66dcb32af19faf49cc4a9222c3152b10c6ec84a upstream. A lot of userspace depends on a descriptive name for vdev. Without this patch, users have a hard time figuring out which camera shall they use for their video conferencing. This reverts commit e3f60e7e1a2b451f538f9926763432249bcf39c4. Link: https://lore.kernel.org/linux-media/20211207003840.1212374-2-ribalda@chromium.org Cc: Fixes: e3f60e7e1a2b ("media: uvcvideo: Set unique vdev name based in type") Reported-by: Nicolas Dufresne Signed-off-by: Ricardo Ribalda Reviewed-by: Laurent Pinchart Reviewed-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 7e07bedae159d8b304238b434e45128f4b640bdf Author: Dominik Brodowski Date: Wed Dec 29 22:10:03 2021 +0100 random: fix crash on multiple early calls to add_bootloader_randomness() commit f7e67b8e803185d0aabe7f29d25a35c8be724a78 upstream. Currently, if CONFIG_RANDOM_TRUST_BOOTLOADER is enabled, multiple calls to add_bootloader_randomness() are broken and can cause a NULL pointer dereference, as noted by Ivan T. Ivanov. This is not only a hypothetical problem, as qemu on arm64 may provide bootloader entropy via EFI and via devicetree. On the first call to add_hwgenerator_randomness(), crng_fast_load() is executed, and if the seed is long enough, crng_init will be set to 1. On subsequent calls to add_bootloader_randomness() and then to add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead, wait_event_interruptible() and then credit_entropy_bits() will be called. If the entropy count for that second seed is large enough, that proceeds to crng_reseed(). However, both wait_event_interruptible() and crng_reseed() depends (at least in numa_crng_init()) on workqueues. Therefore, test whether system_wq is already initialized, which is a sufficient indicator that workqueue_init_early() has progressed far enough. If we wind up hitting the !system_wq case, we later want to do what would have been done there when wqs are up, so set a flag, and do that work later from the rand_initialize() call. Reported-by: Ivan T. Ivanov Fixes: 18b915ac6b0a ("efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness") Cc: stable@vger.kernel.org Signed-off-by: Dominik Brodowski [Jason: added crng_need_done state and related logic.] Signed-off-by: Jason A. Donenfeld Signed-off-by: Greg Kroah-Hartman commit 517ab153f50369345d62268edadac8b87208d60c Author: Eric Biggers Date: Mon Dec 20 16:41:57 2021 -0600 random: fix data race on crng init time commit 009ba8568be497c640cab7571f7bfd18345d7b24 upstream. _extract_crng() does plain loads of crng->init_time and crng_global_init_time, which causes undefined behavior if crng_reseed() and RNDRESEEDCRNG modify these corrently. Use READ_ONCE() and WRITE_ONCE() to make the behavior defined. Don't fix the race on crng->init_time by protecting it with crng->lock, since it's not a problem for duplicate reseedings to occur. I.e., the lockless access with READ_ONCE() is fine. Fixes: d848e5f8e1eb ("random: add new ioctl RNDRESEEDCRNG") Fixes: e192be9d9a30 ("random: replace non-blocking pool with a Chacha20-based CRNG") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Acked-by: Paul E. McKenney Signed-off-by: Jason A. Donenfeld Signed-off-by: Greg Kroah-Hartman commit 90ceecdaa062fd8b6f02a39317e8a6df559aaf6b Author: Eric Biggers Date: Mon Dec 20 16:41:56 2021 -0600 random: fix data race on crng_node_pool commit 5d73d1e320c3fd94ea15ba5f79301da9a8bcc7de upstream. extract_crng() and crng_backtrack_protect() load crng_node_pool with a plain load, which causes undefined behavior if do_numa_crng_init() modifies it concurrently. Fix this by using READ_ONCE(). Note: as per the previous discussion https://lore.kernel.org/lkml/20211219025139.31085-1-ebiggers@kernel.org/T/#u, READ_ONCE() is believed to be sufficient here, and it was requested that it be used here instead of smp_load_acquire(). Also change do_numa_crng_init() to set crng_node_pool using cmpxchg_release() instead of mb() + cmpxchg(), as the former is sufficient here but is more lightweight. Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Acked-by: Paul E. McKenney Signed-off-by: Jason A. Donenfeld Signed-off-by: Greg Kroah-Hartman commit a4fa4377c91b557d1ce9977d5321270bbd38ff63 Author: Brian Silverman Date: Wed Jan 5 16:29:50 2022 -0800 can: gs_usb: gs_can_start_xmit(): zero-initialize hf->{flags,reserved} commit 89d58aebe14a365c25ba6645414afdbf4e41cea4 upstream. No information is deliberately sent in hf->flags in host -> device communications, but the open-source candleLight firmware echoes it back, which can result in the GS_CAN_FLAG_OVERFLOW flag being set and generating spurious ERRORFRAMEs. While there also initialize the reserved member with 0. Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices") Link: https://lore.kernel.org/all/20220106002952.25883-1-brian.silverman@bluerivertech.com Link: https://github.com/candle-usb/candleLight_fw/issues/87 Cc: stable@vger.kernel.org Signed-off-by: Brian Silverman [mkl: initialize the reserved member, too] Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit e90a7524b5c8f6285908928272cf675e884acb25 Author: Marc Kleine-Budde Date: Fri Dec 10 10:03:09 2021 +0100 can: gs_usb: fix use of uninitialized variable, detach device on reception of invalid USB data commit 4a8737ff068724f509d583fef404d349adba80d6 upstream. The received data contains the channel the received data is associated with. If the channel number is bigger than the actual number of channels assume broken or malicious USB device and shut it down. This fixes the error found by clang: | drivers/net/can/usb/gs_usb.c:386:6: error: variable 'dev' is used | uninitialized whenever 'if' condition is true | if (hf->channel >= GS_MAX_INTF) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | drivers/net/can/usb/gs_usb.c:474:10: note: uninitialized use occurs here | hf, dev->gs_hf_size, gs_usb_receive_bulk_callback, | ^~~ Link: https://lore.kernel.org/all/20211210091158.408326-1-mkl@pengutronix.de Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices") Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 9e9241d3345af3f2a78a5b60701a9cf0d15bf942 Author: Joe Perches Date: Wed Sep 16 13:40:39 2020 -0700 drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions commit aa838896d87af561a33ecefea1caa4c15a68bc47 upstream. Convert the various sprintf fmaily calls in sysfs device show functions to sysfs_emit and sysfs_emit_at for PAGE_SIZE buffer safety. Done with: $ spatch -sp-file sysfs_emit_dev.cocci --in-place --max-width=80 . And cocci script: $ cat sysfs_emit_dev.cocci @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - sprintf(buf, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... return - strcpy(buf, chr); + sysfs_emit(buf, chr); ...> } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - sprintf(buf, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - snprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... len = - scnprintf(buf, PAGE_SIZE, + sysfs_emit(buf, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; identifier len; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { <... - len += scnprintf(buf + len, PAGE_SIZE - len, + len += sysfs_emit_at(buf, len, ...); ...> return len; } @@ identifier d_show; identifier dev, attr, buf; expression chr; @@ ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf) { ... - strcpy(buf, chr); - return strlen(buf); + return sysfs_emit(buf, chr); } Signed-off-by: Joe Perches Link: https://lore.kernel.org/r/3d033c33056d88bbe34d4ddb62afd05ee166ab9a.1600285923.git.joe@perches.com Cc: Lee Jones Signed-off-by: Greg Kroah-Hartman commit ada3805f1423954db0bacbd140a42557dec622dd Author: Andy Shevchenko Date: Mon Nov 1 21:00:08 2021 +0200 mfd: intel-lpss: Fix too early PM enablement in the ACPI ->probe() commit c9e143084d1a602f829115612e1ec79df3727c8b upstream. The runtime PM callback may be called as soon as the runtime PM facility is enabled and activated. It means that ->suspend() may be called before we finish probing the device in the ACPI case. Hence, NULL pointer dereference: intel-lpss INT34BA:00: IRQ index 0 not found BUG: kernel NULL pointer dereference, address: 0000000000000030 ... Workqueue: pm pm_runtime_work RIP: 0010:intel_lpss_suspend+0xb/0x40 [intel_lpss] To fix this, first try to register the device and only after that enable runtime PM facility. Fixes: 4b45efe85263 ("mfd: Add support for Intel Sunrisepoint LPSS devices") Reported-by: Orlando Chamberlain Reported-by: Aditya Garg Signed-off-by: Andy Shevchenko Tested-by: Aditya Garg Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20211101190008.86473-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit d08a0a88db88462bdf1b2df33ded23269b832afc Author: Daniel Borkmann Date: Thu Jan 6 01:46:06 2022 +0100 veth: Do not record rx queue hint in veth_xmit commit 710ad98c363a66a0cd8526465426c5c5f8377ee0 upstream. Laurent reported that they have seen a significant amount of TCP retransmissions at high throughput from applications residing in network namespaces talking to the outside world via veths. The drops were seen on the qdisc layer (fq_codel, as per systemd default) of the phys device such as ena or virtio_net due to all traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the setup was _not_ using XDP on veths as the issue is generic.) More specifically, after edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence") which made it all the way back to v4.19.184+, skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX queue by default for veths) instead of leaving at 0. This is eventually retained and callbacks like ena_select_queue() will also pick single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic is forwarded to that device via upper stack or other means. Similarly, for others not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well. In general, it is a _bad_ idea for virtual devices like veth to mess around with queue selection [by default]. Given dev->real_num_tx_queues is by default 1, the skb->queue_mapping was left untouched, and so prior to edbea9220251 the netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device. Unbreak this and restore prior behavior by removing the skb_record_rx_queue() from veth_xmit() altogether. If the veth peer has an XDP program attached, then it would return the first RX queue index in xdp_md->rx_queue_index (unless configured in non-default manner). However, this is still better than breaking the generic case. Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence") Fixes: 638264dc9022 ("veth: Support per queue XDP ring") Reported-by: Laurent Bernaille Signed-off-by: Daniel Borkmann Cc: Maciej Fijalkowski Cc: Toshiaki Makita Cc: Eric Dumazet Cc: Paolo Abeni Cc: John Fastabend Cc: Willem de Bruijn Acked-by: John Fastabend Reviewed-by: Eric Dumazet Acked-by: Toshiaki Makita Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a6722b497401e514eb7952738a3f99736ecb5e5b Author: Adrian Hunter Date: Wed Nov 24 11:48:50 2021 +0200 mmc: sdhci-pci: Add PCI ID for Intel ADL commit e53e97f805cb1abeea000a61549d42f92cb10804 upstream. Add PCI ID for Intel ADL eMMC host controller. Signed-off-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211124094850.1783220-1-adrian.hunter@intel.com Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman commit 1199f0928488199c30ade617cecccdf606a92fd2 Author: Alan Stern Date: Fri Dec 31 21:07:12 2021 -0500 USB: Fix "slab-out-of-bounds Write" bug in usb_hcd_poll_rh_status commit 1d7d4c07932e04355d6e6528d44a2f2c9e354346 upstream. When the USB core code for getting root-hub status reports was originally written, it was assumed that the hub driver would be its only caller. But this isn't true now; user programs can use usbfs to communicate with root hubs and get status reports. When they do this, they may use a transfer_buffer that is smaller than the data returned by the HCD, which will lead to a buffer overflow error when usb_hcd_poll_rh_status() tries to store the status data. This was discovered by syzbot: BUG: KASAN: slab-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline] BUG: KASAN: slab-out-of-bounds in usb_hcd_poll_rh_status+0x5f4/0x780 drivers/usb/core/hcd.c:776 Write of size 2 at addr ffff88801da403c0 by task syz-executor133/4062 This patch fixes the bug by reducing the amount of status data if it won't fit in the transfer_buffer. If some data gets discarded then the URB's completion status is set to -EOVERFLOW rather than 0, to let the user know what happened. Reported-and-tested-by: syzbot+3ae6a2b06f131ab9849f@syzkaller.appspotmail.com Signed-off-by: Alan Stern Cc: Link: https://lore.kernel.org/r/Yc+3UIQJ2STbxNua@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman commit 43aac50196f3214def6a313e65079babb5c7f047 Author: Alan Stern Date: Sat Jan 1 14:52:14 2022 -0500 USB: core: Fix bug in resuming hub's handling of wakeup requests commit 0f663729bb4afc92a9986b66131ebd5b8a9254d1 upstream. Bugzilla #213839 reports a 7-port hub that doesn't work properly when devices are plugged into some of the ports; the kernel goes into an unending disconnect/reinitialize loop as shown in the bug report. This "7-port hub" comprises two four-port hubs with one plugged into the other; the failures occur when a device is plugged into one of the downstream hub's ports. (These hubs have other problems too. For example, they bill themselves as USB-2.0 compliant but they only run at full speed.) It turns out that the failures are caused by bugs in both the kernel and the hub. The hub's bug is that it reports a different bmAttributes value in its configuration descriptor following a remote wakeup (0xe0 before, 0xc0 after -- the wakeup-support bit has changed). The kernel's bug is inside the hub driver's resume handler. When hub_activate() sees that one of the hub's downstream ports got a wakeup request from a child device, it notes this fact by setting the corresponding bit in the hub->change_bits variable. But this variable is meant for connection changes, not wakeup events; setting it causes the driver to believe the downstream port has been disconnected and then connected again (in addition to having received a wakeup request). Because of this, the hub driver then tries to check whether the device currently plugged into the downstream port is the same as the device that had been attached there before. Normally this check succeeds and wakeup handling continues with no harm done (which is why the bug remained undetected until now). But with these dodgy hubs, the check fails because the config descriptor has changed. This causes the hub driver to reinitialize the child device, leading to the disconnect/reinitialize loop described in the bug report. The proper way to note reception of a downstream wakeup request is to set a bit in the hub->event_bits variable instead of hub->change_bits. That way the hub driver will realize that something has happened to the port but will not think the port and child device have been disconnected. This patch makes that change. Cc: Tested-by: Jonathan McDowell Signed-off-by: Alan Stern Link: https://lore.kernel.org/r/YdCw7nSfWYPKWQoD@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman commit ed5c2683b67b1762c60cc9afc089edc53a5f816d Author: Johan Hovold Date: Mon Oct 25 13:39:44 2021 +0200 Bluetooth: bfusb: fix division by zero in send path commit b5e6fa7a12572c82f1e7f2f51fbb02a322291291 upstream. Add the missing bulk-out endpoint sanity check to probe() to avoid division by zero in bfusb_send_frame() in case a malicious device has broken descriptors (or when doing descriptor fuzz testing). Note that USB core will reject URBs submitted for endpoints with zero wMaxPacketSize but that drivers doing packet-size calculations still need to handle this (cf. commit 2548288b4fb0 ("USB: Fix: Don't skip endpoint descriptors with maxpacket=0")). Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: Marcel Holtmann Signed-off-by: Greg Kroah-Hartman commit 784e873af3dc69b61c80f0ce7d99bb7a1bdc3100 Author: Mark-YW.Chen Date: Thu Oct 14 00:22:04 2021 +0800 Bluetooth: btusb: fix memory leak in btusb_mtk_submit_wmt_recv_urb() commit 60c6a63a3d3080a62f3e0e20084f58dbeff16748 upstream. Driver should free `usb->setup_packet` to avoid the leak. $ cat /sys/kernel/debug/kmemleak unreferenced object 0xffffffa564a58080 (size 128): backtrace: [<000000007eb8dd70>] kmem_cache_alloc_trace+0x22c/0x384 [<000000008a44191d>] btusb_mtk_hci_wmt_sync+0x1ec/0x994 [btusb] [<00000000ca7189a3>] btusb_mtk_setup+0x6b8/0x13cc [btusb] [<00000000c6105069>] hci_dev_do_open+0x290/0x974 [bluetooth] [<00000000a583f8b8>] hci_power_on+0xdc/0x3cc [bluetooth] [<000000005d80e687>] process_one_work+0x514/0xc80 [<00000000f4d57637>] worker_thread+0x818/0xd0c [<00000000dc7bdb55>] kthread+0x2f8/0x3b8 [<00000000f9999513>] ret_from_fork+0x10/0x30 Fixes: a1c49c434e150 ("Bluetooth: btusb: Add protocol support for MediaTek MT7668U USB devices") Signed-off-by: Mark-YW.Chen Signed-off-by: Marcel Holtmann Signed-off-by: Greg Kroah-Hartman commit ad07b60837b24346a9bb3bf018da5a6926c22a53 Author: Frederic Weisbecker Date: Wed Dec 1 16:19:44 2021 +0100 workqueue: Fix unbind_workers() VS wq_worker_running() race commit 07edfece8bcb0580a1828d939e6f8d91a8603eb2 upstream. At CPU-hotplug time, unbind_worker() may preempt a worker while it is waking up. In that case the following scenario can happen: unbind_workers() wq_worker_running() -------------- ------------------- if (!(worker->flags & WORKER_NOT_RUNNING)) //PREEMPTED by unbind_workers worker->flags |= WORKER_UNBOUND; [...] atomic_set(&pool->nr_running, 0); //resume to worker atomic_inc(&worker->pool->nr_running); After unbind_worker() resets pool->nr_running, the value is expected to remain 0 until the pool ever gets rebound in case cpu_up() is called on the target CPU in the future. But here the race leaves pool->nr_running with a value of 1, triggering the following warning when the worker goes idle: WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0 Modules linked in: CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014 Workqueue: 0x0 (rcu_par_gp) RIP: 0010:worker_enter_idle+0x95/0xc0 Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93 0 RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086 RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140 RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080 R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20 R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140 FS: 0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: worker_thread+0x89/0x3d0 ? process_one_work+0x400/0x400 kthread+0x162/0x190 ? set_kthread_struct+0x40/0x40 ret_from_fork+0x22/0x30 Also due to this incorrect "nr_running == 1", further queued work may end up not being served, because no worker is awaken at work insert time. This raises rcutorture writer stalls for example. Fix this with disabling preemption in the right place in wq_worker_running(). It's worth noting that if the worker migrates and runs concurrently with unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update due to set_cpus_allowed_ptr() acquiring/releasing rq->lock. Fixes: 6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq lock") Reviewed-by: Lai Jiangshan Tested-by: Paul E. McKenney Acked-by: Peter Zijlstra (Intel) Signed-off-by: Frederic Weisbecker Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Sebastian Andrzej Siewior Cc: Daniel Bristot de Oliveira Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman