commit 545aecd229613138d6db54fb2b5221faca10137f Author: Greg Kroah-Hartman Date: Sat Jul 2 16:41:19 2022 +0200 Linux 5.15.52 Link: https://lore.kernel.org/r/20220630133232.926711493@linuxfoundation.org Tested-by: Jon Hunter Tested-by: Shuah Khan Tested-by: Florian Fainelli Tested-by: Guenter Roeck Tested-by: Bagas Sanjaya Tested-by: Linux Kernel Functional Testing Tested-by: Sudip Mukherjee Tested-by: Christian Brauner (Microsoft) Tested-by: Ron Economos Signed-off-by: Greg Kroah-Hartman commit ea512d540a55a75058ee4b78abfd1d499b229d1d Author: Pavel Begunkov Date: Thu Jun 9 08:34:35 2022 +0100 io_uring: fix not locked access to fixed buf table commit 05b538c1765f8d14a71ccf5f85258dcbeaf189f7 upstream. We can look inside the fixed buffer table only while holding ->uring_lock, however in some cases we don't do the right async prep for IORING_OP_{WRITE,READ}_FIXED ending up with NULL req->imu forcing making an io-wq worker to try to resolve the fixed buffer without proper locking. Move req->imu setup into early req init paths, i.e. io_prep_rw(), which is called unconditionally for rw requests and under uring_lock. Fixes: 634d00df5e1cf ("io_uring: add full-fledged dynamic buffers support") Signed-off-by: Pavel Begunkov Signed-off-by: Greg Kroah-Hartman commit 5696f7983d5d0dc070b4c0c07969d528aa553827 Author: Vladimir Oltean Date: Tue Jun 28 20:20:15 2022 +0300 net: mscc: ocelot: allow unregistered IP multicast flooding to CPU Since commit 4cf35a2b627a ("net: mscc: ocelot: fix broken IP multicast flooding") from v5.12, unregistered IP multicast flooding is configurable in the ocelot driver for bridged ports. However, by writing 0 to the PGID_MCIPV4 and PGID_MCIPV6 port masks at initialization time, the CPU port module, for which ocelot_port_set_mcast_flood() is not called, will have unknown IP multicast flooding disabled. This makes it impossible for an application such as smcroute to work properly, since all IP multicast traffic received on a standalone port is treated as unregistered (and dropped). Starting with commit 7569459a52c9 ("net: dsa: manage flooding on the CPU ports"), the limitation above has been lifted, because when standalone ports become IFF_PROMISC or IFF_ALLMULTI, ocelot_port_set_mcast_flood() would be called on the CPU port module, so unregistered multicast is flooded to the CPU on an as-needed basis. But between v5.12 and v5.18, IP multicast flooding to the CPU has remained broken, promiscuous or not. Delete the inexplicable premature optimization of clearing PGID_MCIPV4 and PGID_MCIPV6 as part of the init sequence, and allow unregistered IP multicast to be flooded freely to the CPU port module. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Cc: stable@kernel.org Signed-off-by: Vladimir Oltean Signed-off-by: Greg Kroah-Hartman commit 810962c794178b44c2c29d61d51e2e62d61d0218 Author: Ping-Ke Shih Date: Thu Jan 6 20:47:39 2022 -0600 rtw88: rtw8821c: enable rfe 6 devices commit e109e3617e5d563b431a52e6e2f07f0fc65a93ae upstream. Ping-Ke Shih answered[1] a question for a user about an rtl8821ce device that reported RFE 6, which the driver did not support. Ping-Ke suggested a possible fix, but the user never reported back. A second user discovered the above thread and tested the proposed fix. Accordingly, I am pushing this change, even though I am not the author. [1] https://lore.kernel.org/linux-wireless/3f5e2f6eac344316b5dd518ebfea2f95@realtek.com/ Signed-off-by: Ping-Ke Shih Reported-and-tested-by: masterzorag Signed-off-by: Larry Finger Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220107024739.20967-1-Larry.Finger@lwfinger.net Signed-off-by: Meng Tang Signed-off-by: Greg Kroah-Hartman commit d52f1c58882477e1ed7dfb99c95fa3fd56936035 Author: Guo-Feng Fan Date: Wed Sep 22 10:36:36 2021 +0800 rtw88: 8821c: support RFE type4 wifi NIC commit b789e3fe7047296be0ccdbb7ceb0b58856053572 upstream. RFE type4 is a new NIC which has one RF antenna shares with BT. RFE type4 HW is the same as RFE type2 but attaching antenna to aux antenna connector. RFE type2 attach antenna to main antenna connector. Load the same parameter as RFE type2 when initializing NIC. Signed-off-by: Guo-Feng Fan Signed-off-by: Ping-Ke Shih Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20210922023637.9357-1-pkshih@realtek.com Signed-off-by: Meng Tang Signed-off-by: Greg Kroah-Hartman commit e8d4878dcd00239c54b089314c625477c5a20a7f Author: Christian Brauner Date: Tue Jun 28 14:16:20 2022 +0200 fs: account for group membership commit 168f912893407a5acb798a4a58613b5f1f98c717 upstream. When calling setattr_prepare() to determine the validity of the attributes the ia_{g,u}id fields contain the value that will be written to inode->i_{g,u}id. This is exactly the same for idmapped and non-idmapped mounts and allows callers to pass in the values they want to see written to inode->i_{g,u}id. When group ownership is changed a caller whose fsuid owns the inode can change the group of the inode to any group they are a member of. When searching through the caller's groups we need to use the gid mapped according to the idmapped mount otherwise we will fail to change ownership for unprivileged users. Consider a caller running with fsuid and fsgid 1000 using an idmapped mount that maps id 65534 to 1000 and 65535 to 1001. Consequently, a file owned by 65534:65535 in the filesystem will be owned by 1000:1001 in the idmapped mount. The caller now requests the gid of the file to be changed to 1000 going through the idmapped mount. In the vfs we will immediately map the requested gid to the value that will need to be written to inode->i_gid and place it in attr->ia_gid. Since this idmapped mount maps 65534 to 1000 we place 65534 in attr->ia_gid. When we check whether the caller is allowed to change group ownership we first validate that their fsuid matches the inode's uid. The inode->i_uid is 65534 which is mapped to uid 1000 in the idmapped mount. Since the caller's fsuid is 1000 we pass the check. We now check whether the caller is allowed to change inode->i_gid to the requested gid by calling in_group_p(). This will compare the passed in gid to the caller's fsgid and search the caller's additional groups. Since we're dealing with an idmapped mount we need to pass in the gid mapped according to the idmapped mount. This is akin to checking whether a caller is privileged over the future group the inode is owned by. And that needs to take the idmapped mount into account. Note, all helpers are nops without idmapped mounts. New regression test sent to xfstests. Link: https://github.com/lxc/lxd/issues/10537 Link: https://lore.kernel.org/r/20220613111517.2186646-1-brauner@kernel.org Fixes: 2f221d6f7b88 ("attr: handle idmapped mounts") Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Al Viro Cc: stable@vger.kernel.org # 5.15+ CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit dc85bc24fbf13d2ef36d41da970776c4f94b68dd Author: Christian Brauner Date: Tue Jun 28 14:16:19 2022 +0200 fs: fix acl translation commit 705191b03d507744c7e097f78d583621c14988ac upstream. Last cycle we extended the idmapped mounts infrastructure to support idmapped mounts of idmapped filesystems (No such filesystem yet exist.). Since then, the meaning of an idmapped mount is a mount whose idmapping is different from the filesystems idmapping. While doing that work we missed to adapt the acl translation helpers. They still assume that checking for the identity mapping is enough. But they need to use the no_idmapping() helper instead. Note, POSIX ACLs are always translated right at the userspace-kernel boundary using the caller's current idmapping and the initial idmapping. The order depends on whether we're coming from or going to userspace. The filesystem's idmapping doesn't matter at the border. Consequently, if a non-idmapped mount is passed we need to make sure to always pass the initial idmapping as the mount's idmapping and not the filesystem idmapping. Since it's irrelevant here it would yield invalid ids and prevent setting acls for filesystems that are mountable in a userns and support posix acls (tmpfs and fuse). I verified the regression reported in [1] and verified that this patch fixes it. A regression test will be added to xfstests in parallel. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1] Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems") Cc: Seth Forshee Cc: Christoph Hellwig Cc: # 5.15+ Cc: Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Linus Torvalds Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit 38753e9173a5903e902c856b41fb325762bf5945 Author: Christian Brauner Date: Tue Jun 28 14:16:18 2022 +0200 fs: support mapped mounts of mapped filesystems commit bd303368b776eead1c29e6cdda82bde7128b82a7 upstream. In previous patches we added new and modified existing helpers to handle idmapped mounts of filesystems mounted with an idmapping. In this final patch we convert all relevant places in the vfs to actually pass the filesystem's idmapping into these helpers. With this the vfs is in shape to handle idmapped mounts of filesystems mounted with an idmapping. Note that this is just the generic infrastructure. Actually adding support for idmapped mounts to a filesystem mountable with an idmapping is follow-up work. In this patch we extend the definition of an idmapped mount from a mount that that has the initial idmapping attached to it to a mount that has an idmapping attached to it which is not the same as the idmapping the filesystem was mounted with. As before we do not allow the initial idmapping to be attached to a mount. In addition this patch prevents that the idmapping the filesystem was mounted with can be attached to a mount created based on this filesystem. This has multiple reasons and advantages. First, attaching the initial idmapping or the filesystem's idmapping doesn't make much sense as in both cases the values of the i_{g,u}id and other places where k{g,u}ids are used do not change. Second, a user that really wants to do this for whatever reason can just create a separate dedicated identical idmapping to attach to the mount. Third, we can continue to use the initial idmapping as an indicator that a mount is not idmapped allowing us to continue to keep passing the initial idmapping into the mapping helpers to tell them that something isn't an idmapped mount even if the filesystem is mounted with an idmapping. Link: https://lore.kernel.org/r/20211123114227.3124056-11-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-11-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-11-brauner@kernel.org Cc: Seth Forshee Cc: Amir Goldstein Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit 968e66f8ff70285744ac506cd5668223d5eebe41 Author: Christian Brauner Date: Tue Jun 28 14:16:17 2022 +0200 fs: add i_user_ns() helper commit a1ec9040a2a9122605ac26e5725c6de019184419 upstream. Since we'll be passing the filesystem's idmapping in even more places in the following patches and we do already dereference struct inode to get to the filesystem's idmapping multiple times add a tiny helper. Link: https://lore.kernel.org/r/20211123114227.3124056-10-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-10-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-10-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit 21c6c720be75049913336099c33129089497d7cb Author: Christian Brauner Date: Tue Jun 28 14:16:16 2022 +0200 fs: port higher-level mapping helpers commit 209188ce75d0d357c292f6bb81d712acdd4e7db7 upstream. Enable the mapped_fs{g,u}id() helpers to support filesystems mounted with an idmapping. Apart from core mapping helpers that use mapped_fs{g,u}id() to initialize struct inode's i_{g,u}id fields xfs is the only place that uses these low-level helpers directly. The patch only extends the helpers to be able to take the filesystem idmapping into account. Since we don't actually yet pass the filesystem's idmapping in no functional changes happen. This will happen in a final patch. Link: https://lore.kernel.org/r/20211123114227.3124056-9-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-9-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-9-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit 7d0536a8fab719544db98e58ad49b9a07332922d Author: Christian Brauner Date: Tue Jun 28 14:16:15 2022 +0200 fs: remove unused low-level mapping helpers commit 02e4079913500f24ceb082d8d87d8665f044b298 upstream. Now that we ported all places to use the new low-level mapping helpers that are able to support filesystems mounted with an idmapping we can remove the old low-level mapping helpers. With the removal of these old helpers we also conclude the renaming of the mapping helpers we started in commit a65e58e791a1 ("fs: document and rename fsid helpers"). Link: https://lore.kernel.org/r/20211123114227.3124056-8-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-8-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-8-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit f895d0ff47be88e950e526e757678ef6392c958f Author: Christian Brauner Date: Tue Jun 28 14:16:14 2022 +0200 fs: use low-level mapping helpers commit 4472071331549e911a5abad41aea6e3be855a1a4 upstream. In a few places the vfs needs to interact with bare k{g,u}ids directly instead of struct inode. These are just a few. In previous patches we introduced low-level mapping helpers that are able to support filesystems mounted an idmapping. This patch simply converts the places to use these new helpers. Link: https://lore.kernel.org/r/20211123114227.3124056-7-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-7-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-7-brauner@kernel.org Cc: Seth Forshee Cc: Amir Goldstein Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit 1c62e0186d947b3b04a78a8f3750c6446c01c839 Author: Christian Brauner Date: Tue Jun 28 14:16:13 2022 +0200 docs: update mapping documentation commit 8cc5c54de44c5e8e104d364a627ac4296845fc7f upstream. Now that we implement the full remapping algorithms described in our documentation remove the section about shortcircuting them. Link: https://lore.kernel.org/r/20211123114227.3124056-6-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-6-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-6-brauner@kernel.org Cc: Seth Forshee Cc: Amir Goldstein Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit b20dcf603b8d0bb24a45c8e6cdd345e3fb3aa3d4 Author: Christian Brauner Date: Tue Jun 28 14:16:12 2022 +0200 fs: account for filesystem mappings commit 1ac2a4104968e0a60b4b3572216a92aab5c1b025 upstream. Currently we only support idmapped mounts for filesystems mounted without an idmapping. This was a conscious decision mentioned in multiple places (cf. e.g. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystem's mounted with an idmapping should the need arise. The need has been there for some time now. Various container projects in userspace need this to run unprivileged and nested unprivileged containers (cf. [2]). Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts we need to first extend the mapping helpers to account for the filesystem's idmapping. This again, is explained at length in our documentation at [3] but I'll give an overview here again. Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner. Because we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping the translation step from or into the filesystem idmapping could be skipped. In order to support idmapped mounts of filesystem's mountable with an idmapping the translation step we were able to skip before cannot be skipped anymore. A filesystem mounted with an idmapping is very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. More details with examples can be found in [3]. This patch adds a few new and prepares some already existing low-level mapping helpers to perform the full translation algorithm explained in [3]. The low-level helpers can be written in a way that they only perform the additional translation step when the filesystem is indeed mounted with an idmapping. If the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant k{g,u}id unchanged; no remapping needs to be performed at all. The no_idmapping() helper detects whether the shortcut can be used. If the low-level helpers detected that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shorcut and can continue to skip the translation step from or into the filesystem's idmapping. These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all. This patch adds the helpers mapped_k{g,u}id_fs() and mapped_k{g,u}id_user(). Following patches will port all places to replace the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() with these two new helpers. After the conversion is done k{g,u}id_into_mnt() and k{g,u}id_from_mnt() will be removed. This also concludes the renaming of the mapping helpers we started in [4]. Now, all mapping helpers will started with the "mapped_" prefix making everything nice and consistent. The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt() helpers. They are to be used when k{g,u}ids are to be mapped from the vfs, e.g. from from struct inode's i_{g,u}id. Conversely, the mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers. They are to be used when k{g,u}ids are to be written to disk, e.g. when entering from a system call to change ownership of a file. This patch only introduces the helpers. It doesn't yet convert the relevant places to account for filesystem mounted with an idmapping. [1]: commit 2ca4dcc4909d ("fs/mount_setattr: tighten permission checks") [2]: https://github.com/containers/podman/issues/10374 [3]: Documentations/filesystems/idmappings.rst [4]: commit a65e58e791a1 ("fs: document and rename fsid helpers") Link: https://lore.kernel.org/r/20211123114227.3124056-5-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-5-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-5-brauner@kernel.org Cc: Seth Forshee Cc: Amir Goldstein Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit 3374eb1b0afc9d4b2c35599d240b062488d976b4 Author: Christian Brauner Date: Tue Jun 28 14:16:11 2022 +0200 fs: tweak fsuidgid_has_mapping() commit 476860b3eb4a50958243158861d5340066df5af2 upstream. If the caller's fs{g,u}id aren't mapped in the mount's idmapping we can return early and skip the check whether the mapped fs{g,u}id also have a mapping in the filesystem's idmapping. If the fs{g,u}id aren't mapped in the mount's idmapping they consequently can't be mapped in the filesystem's idmapping. So there's no point in checking that. Link: https://lore.kernel.org/r/20211123114227.3124056-4-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-4-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-4-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit 7bc23abcb4142ea3cb81d1772b8c6501bd63f2ad Author: Christian Brauner Date: Tue Jun 28 14:16:10 2022 +0200 fs: move mapping helpers commit a793d79ea3e041081cd7cbd8ee43d0b5e4914a2b upstream. The low-level mapping helpers were so far crammed into fs.h. They are out of place there. The fs.h header should just contain the higher-level mapping helpers that interact directly with vfs objects such as struct super_block or struct inode and not the bare mapping helpers. Similarly, only vfs and specific fs code shall interact with low-level mapping helpers. And so they won't be made accessible automatically through regular {g,u}id helpers. Link: https://lore.kernel.org/r/20211123114227.3124056-3-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-3-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-3-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit b3679e8b59966cf106d1f24f4467f8d346a4480b Author: Christian Brauner Date: Tue Jun 28 14:16:09 2022 +0200 fs: add is_idmapped_mnt() helper commit bb49e9e730c2906a958eee273a7819f401543d6c upstream. Multiple places open-code the same check to determine whether a given mount is idmapped. Introduce a simple helper function that can be used instead. This allows us to get rid of the fragile open-coding. We will later change the check that is used to determine whether a given mount is idmapped. Introducing a helper allows us to do this in a single place instead of doing it for multiple places. Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1) Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2) Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Amir Goldstein Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Greg Kroah-Hartman commit ab0b6dc5e16ba6a876d141082912e8ed05e2a491 Author: Naveen N. Rao Date: Mon May 16 12:44:22 2022 +0530 powerpc/ftrace: Remove ftrace init tramp once kernel init is complete commit 84ade0a6655bee803d176525ef457175cbf4df22 upstream. Stop using the ftrace trampoline for init section once kernel init is complete. Fixes: 67361cf8071286 ("powerpc/ftrace: Handle large kernel configs") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Naveen N. Rao Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220516071422.463738-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman commit ce6bfe55237e933fa31dad1c85686023ae387249 Author: Darrick J. Wong Date: Tue Jun 28 11:39:51 2022 -0700 xfs: only bother with sync_filesystem during readonly remount [ Upstream commit b97cca3ba9098522e5a1c3388764ead42640c1a5 ] In commit 02b9984d6408, we pushed a sync_filesystem() call from the VFS into xfs_fs_remount. The only time that we ever need to push dirty file data or metadata to disk for a remount is if we're remounting the filesystem read only, so this really could be moved to xfs_remount_ro. Once we've moved the call site, actually check the return value from sync_filesystem. Fixes: 02b9984d6408 ("fs: push sync_filesystem() down to the file system's remount_fs()") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 3465b167831e9688efa9b5dba44070661b142e22 Author: Darrick J. Wong Date: Tue Jun 28 11:39:50 2022 -0700 xfs: prevent UAF in xfs_log_item_in_current_chkpt [ Upstream commit f8d92a66e810acbef6ddbc0bd0cbd9b117ce8acd ] While I was running with KASAN and lockdep enabled, I stumbled upon an KASAN report about a UAF to a freed CIL checkpoint. Looking at the comment for xfs_log_item_in_current_chkpt, it seems pretty obvious to me that the original patch to xfs_defer_finish_noroll should have done something to lock the CIL to prevent it from switching the CIL contexts while the predicate runs. For upper level code that needs to know if a given log item is new enough not to need relogging, add a new wrapper that takes the CIL context lock long enough to sample the current CIL context. This is kind of racy in that the CIL can switch the contexts immediately after sampling, but that's ok because the consequence is that the defer ops code is a little slow to relog items. ================================================================== BUG: KASAN: use-after-free in xfs_log_item_in_current_chkpt+0x139/0x160 [xfs] Read of size 8 at addr ffff88804ea5f608 by task fsstress/527999 CPU: 1 PID: 527999 Comm: fsstress Tainted: G D 5.16.0-rc4-xfsx #rc4 Call Trace: dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x83/0xdf xfs_log_item_in_current_chkpt+0x139/0x160 xfs_defer_finish_noroll+0x3bb/0x1e30 __xfs_trans_commit+0x6c8/0xcf0 xfs_reflink_remap_extent+0x66f/0x10e0 xfs_reflink_remap_blocks+0x2dd/0xa90 xfs_file_remap_range+0x27b/0xc30 vfs_dedupe_file_range_one+0x368/0x420 vfs_dedupe_file_range+0x37c/0x5d0 do_vfs_ioctl+0x308/0x1260 __x64_sys_ioctl+0xa1/0x170 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f2c71a2950b Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe8c0e03c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00005600862a8740 RCX: 00007f2c71a2950b RDX: 00005600862a7be0 RSI: 00000000c0189436 RDI: 0000000000000004 RBP: 000000000000000b R08: 0000000000000027 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000005a R13: 00005600862804a8 R14: 0000000000016000 R15: 00005600862a8a20 Allocated by task 464064: kasan_save_stack+0x1e/0x50 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0xcd/0x2c0 [xfs] xlog_cil_ctx_alloc+0x17/0x1e0 [xfs] xlog_cil_push_work+0x141/0x13d0 [xfs] process_one_work+0x7f6/0x1380 worker_thread+0x59d/0x1040 kthread+0x3b0/0x490 ret_from_fork+0x1f/0x30 Freed by task 51: kasan_save_stack+0x1e/0x50 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xed/0x130 slab_free_freelist_hook+0x7f/0x160 kfree+0xde/0x340 xlog_cil_committed+0xbfd/0xfe0 [xfs] xlog_cil_process_committed+0x103/0x1c0 [xfs] xlog_state_do_callback+0x45d/0xbd0 [xfs] xlog_ioend_work+0x116/0x1c0 [xfs] process_one_work+0x7f6/0x1380 worker_thread+0x59d/0x1040 kthread+0x3b0/0x490 ret_from_fork+0x1f/0x30 Last potentially related work creation: kasan_save_stack+0x1e/0x50 __kasan_record_aux_stack+0xb7/0xc0 insert_work+0x48/0x2e0 __queue_work+0x4e7/0xda0 queue_work_on+0x69/0x80 xlog_cil_push_now.isra.0+0x16b/0x210 [xfs] xlog_cil_force_seq+0x1b7/0x850 [xfs] xfs_log_force_seq+0x1c7/0x670 [xfs] xfs_file_fsync+0x7c1/0xa60 [xfs] __x64_sys_fsync+0x52/0x80 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88804ea5f600 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 8 bytes inside of 256-byte region [ffff88804ea5f600, ffff88804ea5f700) The buggy address belongs to the page: page:ffffea00013a9780 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804ea5ea00 pfn:0x4ea5e head:ffffea00013a9780 order:1 compound_mapcount:0 flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff) raw: 04fff80000010200 ffffea0001245908 ffffea00011bd388 ffff888004c42b40 raw: ffff88804ea5ea00 0000000000100009 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88804ea5f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88804ea5f580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88804ea5f600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88804ea5f680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88804ea5f700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fixes: 4e919af7827a ("xfs: periodically relog deferred intent items") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 4f0c91ab4c7d7608707bab4ffecd6034a5d926a1 Author: Dave Chinner Date: Tue Jun 28 11:39:49 2022 -0700 xfs: check sb_meta_uuid for dabuf buffer recovery [ Upstream commit 09654ed8a18cfd45027a67d6cbca45c9ea54feab ] Got a report that a repeated crash test of a container host would eventually fail with a log recovery error preventing the system from mounting the root filesystem. It manifested as a directory leaf node corruption on writeback like so: XFS (loop0): Mounting V5 Filesystem XFS (loop0): Starting recovery (logdev: internal) XFS (loop0): Metadata corruption detected at xfs_dir3_leaf_check_int+0x99/0xf0, xfs_dir3_leaf1 block 0x12faa158 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3d f1 00 00 e1 9e d5 8b ........=....... 00000010: 00 00 00 00 12 fa a1 58 00 00 00 29 00 00 1b cc .......X...).... 00000020: 91 06 78 ff f7 7e 4a 7d 8d 53 86 f2 ac 47 a8 23 ..x..~J}.S...G.# 00000030: 00 00 00 00 17 e0 00 80 00 43 00 00 00 00 00 00 .........C...... 00000040: 00 00 00 2e 00 00 00 08 00 00 17 2e 00 00 00 0a ................ 00000050: 02 35 79 83 00 00 00 30 04 d3 b4 80 00 00 01 50 .5y....0.......P 00000060: 08 40 95 7f 00 00 02 98 08 41 fe b7 00 00 02 d4 .@.......A...... 00000070: 0d 62 ef a7 00 00 01 f2 14 50 21 41 00 00 00 0c .b.......P!A.... XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_do_force_shutdown+0x1a/0x20 (fs/xfs/xfs_buf.c:1514). Shutting down. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed Tracing indicated that we were recovering changes from a transaction at LSN 0x29/0x1c16 into a buffer that had an LSN of 0x29/0x1d57. That is, log recovery was overwriting a buffer with newer changes on disk than was in the transaction. Tracing indicated that we were hitting the "recovery immediately" case in xfs_buf_log_recovery_lsn(), and hence it was ignoring the LSN in the buffer. The code was extracting the LSN correctly, then ignoring it because the UUID in the buffer did not match the superblock UUID. The problem arises because the UUID check uses the wrong UUID - it should be checking the sb_meta_uuid, not sb_uuid. This filesystem has sb_uuid != sb_meta_uuid (which is fine), and the buffer has the correct matching sb_meta_uuid in it, it's just the code checked it against the wrong superblock uuid. The is no corruption in the filesystem, and failing to recover the buffer due to a write verifier failure means the recovery bug did not propagate the corruption to disk. Hence there is no corruption before or after this bug has manifested, the impact is limited simply to an unmountable filesystem.... This was missed back in 2015 during an audit of incorrect sb_uuid usage that resulted in commit fcfbe2c4ef42 ("xfs: log recovery needs to validate against sb_meta_uuid") that fixed the magic32 buffers to validate against sb_meta_uuid instead of sb_uuid. It missed the magicda buffers.... Fixes: ce748eaa65f2 ("xfs: create new metadata UUID field and incompat flag") Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit c4f376ba8be836d7aea18c954514470cacfeab06 Author: Darrick J. Wong Date: Tue Jun 28 11:39:48 2022 -0700 xfs: remove all COW fork extents when remounting readonly [ Upstream commit 089558bc7ba785c03815a49c89e28ad9b8de51f9 ] As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount # (0) fsstress & # (1) while true; do fsstress mount -o remount,ro # (2) fsstress mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. Solve this race by forcing the xfs_blockgc_free_space to run synchronously, which causes xfs_icwalk to return to inodes that were skipped because the blockgc code couldn't take the IOLOCK. This is safe to do here because the VFS has already prohibited new writer threads. Fixes: 10ddf64e420f ("xfs: remove leftover CoW reservations when remounting ro") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Reviewed-by: Chandan Babu R Signed-off-by: Leah Rumancik Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 40de647b2bab58745a396b4f1bb29287ae5ec7f5 Author: Yang Xu Date: Tue Jun 28 11:39:47 2022 -0700 xfs: Fix the free logic of state in xfs_attr_node_hasname [ Upstream commit a1de97fe296c52eafc6590a3506f4bbd44ecb19a ] When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine. Adding a getxattr operation after xattr corrupted, I can reproduce it 100%. The deadlock as below: [983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080 [ 983.923405] Call Trace: [ 983.923410] __schedule+0x2c4/0x700 [ 983.923412] schedule+0x37/0xa0 [ 983.923414] schedule_timeout+0x274/0x300 [ 983.923416] __down+0x9b/0xf0 [ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923453] down+0x3b/0x50 [ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs] [ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs] [ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs] [ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs] [ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs] [ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs] [ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs] [ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs] [ 983.923624] ? kmem_cache_alloc+0x12e/0x270 [ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs] [ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs] [ 983.923664] xfs_attr_set+0x273/0x320 [xfs] [ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs] [ 983.923686] __vfs_removexattr+0x4d/0x60 [ 983.923688] __vfs_removexattr_locked+0xac/0x130 [ 983.923689] vfs_removexattr+0x4e/0xf0 [ 983.923690] removexattr+0x4d/0x80 [ 983.923693] ? __check_object_size+0xa8/0x16b [ 983.923695] ? strncpy_from_user+0x47/0x1a0 [ 983.923696] ? getname_flags+0x6a/0x1e0 [ 983.923697] ? _cond_resched+0x15/0x30 [ 983.923699] ? __sb_start_write+0x1e/0x70 [ 983.923700] ? mnt_want_write+0x28/0x50 [ 983.923701] path_removexattr+0x9b/0xb0 [ 983.923702] __x64_sys_removexattr+0x17/0x20 [ 983.923704] do_syscall_64+0x5b/0x1a0 [ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 983.923707] RIP: 0033:0x7f080f10ee1b When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job. Then subsequent removexattr will hang because of it. This bug was introduced by kernel commit 07120f1abdff ("xfs: Add xfs_has_attr and subroutines"). It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state in this case. But xfs_attr_node_hasname will free state itself instead of caller if xfs_da3_node_lookup_int fails. Fix this bug by moving the step of free state into caller. Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and xfs_attr_node_removename_setup function because we should free state ourselves. Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines") Signed-off-by: Yang Xu Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 0e84e17c16a3c720f2492cb8c4d85a32d1c5511e Author: Brian Foster Date: Tue Jun 28 11:39:46 2022 -0700 xfs: punch out data fork delalloc blocks on COW writeback failure [ Upstream commit 5ca5916b6bc93577c360c06cb7cdf71adb9b5faf ] If writeback I/O to a COW extent fails, the COW fork blocks are punched out and the data fork blocks left alone. It is possible for COW fork blocks to overlap non-shared data fork blocks (due to cowextsz hint prealloc), however, and writeback unconditionally maps to the COW fork whenever blocks exist at the corresponding offset of the page undergoing writeback. This means it's quite possible for a COW fork extent to overlap delalloc data fork blocks, writeback to convert and map to the COW fork blocks, writeback to fail, and finally for ioend completion to cancel the COW fork blocks and leave stale data fork delalloc blocks around in the inode. The blocks are effectively stale because writeback failure also discards dirty page state. If this occurs, it is likely to trigger assert failures, free space accounting corruption and failures in unrelated file operations. For example, a subsequent reflink attempt of the affected file to a new target file will trip over the stale delalloc in the source file and fail. Several of these issues are occasionally reproduced by generic/648, but are reproducible on demand with the right sequence of operations and timely I/O error injection. To fix this problem, update the ioend failure path to also punch out underlying data fork delalloc blocks on I/O error. This is analogous to the writeback submission failure path in xfs_discard_page() where we might fail to map data fork delalloc blocks and consistent with the successful COW writeback completion path, which is responsible for unmapping from the data fork and remapping in COW fork blocks. Fixes: 787eb485509f ("xfs: fix and streamline error handling in xfs_end_io") Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 71a218ca4fde7255fde96f5dc84e420e496dee17 Author: Rustam Kovhaev Date: Tue Jun 28 11:39:45 2022 -0700 xfs: use kmem_cache_free() for kmem_cache objects [ Upstream commit c30a0cbd07ecc0eec7b3cd568f7b1c7bb7913f93 ] For kmalloc() allocations SLOB prepends the blocks with a 4-byte header, and it puts the size of the allocated blocks in that header. Blocks allocated with kmem_cache_alloc() allocations do not have that header. SLOB explodes when you allocate memory with kmem_cache_alloc() and then try to free it with kfree() instead of kmem_cache_free(). SLOB will assume that there is a header when there is none, read some garbage to size variable and corrupt the adjacent objects, which eventually leads to hang or panic. Let's make XFS work with SLOB by using proper free function. Fixes: 9749fee83f38 ("xfs: enable the xfs_defer mechanism to process extents to free") Signed-off-by: Rustam Kovhaev Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Darrick J. Wong Signed-off-by: Greg Kroah-Hartman commit 1cdcd496b7ca3821df8b8551f1775b53efc934e0 Author: Coly Li Date: Fri May 27 23:28:16 2022 +0800 bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() commit 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c upstream. The local variables check_state (in bch_btree_check()) and state (in bch_sectors_dirty_init()) should be fully filled by 0, because before allocating them on stack, they were dynamically allocated by kzalloc(). Signed-off-by: Coly Li Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit edbaf6e5e93acda96aae23ba134ef3c1466da3b5 Author: Greg Kroah-Hartman Date: Thu Jun 30 12:19:47 2022 +0200 x86, kvm: use proper ASM macros for kvm_vcpu_is_preempted The build rightfully complains about: arch/x86/kernel/kvm.o: warning: objtool: __raw_callee_save___kvm_vcpu_is_preempted()+0x12: missing int3 after ret because the ASM_RET call is not being used correctly in kvm_vcpu_is_preempted(). This was hand-fixed-up in the kvm merge commit a4cfff3f0f8c ("Merge branch 'kvm-older-features' into HEAD") which of course can not be backported to stable kernels, so just fix this up directly instead. Cc: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman commit f4a80ec8c51d68be4b7a7830c510f75080c5e417 Author: Masahiro Yamada Date: Mon Jun 27 12:22:09 2022 +0900 tick/nohz: unexport __init-annotated tick_nohz_full_setup() commit 2390095113e98fc52fffe35c5206d30d9efe3f78 upstream. EXPORT_SYMBOL and __init is a bad combination because the .init.text section is freed up after the initialization. Hence, modules cannot use symbols annotated __init. The access to a freed symbol may end up with kernel panic. modpost used to detect it, but it had been broken for a decade. Commit 28438794aba4 ("modpost: fix section mismatch check for exported init/exit sections") fixed it so modpost started to warn it again, then this showed up: MODPOST vmlinux.symvers WARNING: modpost: vmlinux.o(___ksymtab_gpl+tick_nohz_full_setup+0x0): Section mismatch in reference from the variable __ksymtab_tick_nohz_full_setup to the function .init.text:tick_nohz_full_setup() The symbol tick_nohz_full_setup is exported and annotated __init Fix this by removing the __init annotation of tick_nohz_full_setup or drop the export. Drop the export because tick_nohz_full_setup() is only called from the built-in code in kernel/sched/isolation.c. Fixes: ae9e557b5be2 ("time: Export tick start/stop functions for rcutorture") Reported-by: Linus Torvalds Signed-off-by: Masahiro Yamada Tested-by: Paul E. McKenney Signed-off-by: Linus Torvalds Cc: Thomas Backlund Signed-off-by: Greg Kroah-Hartman