A recent snafu where Intel ignored upstream feedback on a firmware
change led to a late -rc6 fix being required. In order to avoid this
in the future, we should document some expectations around
linux-firmware.
I was originally going to write this for drm, but the advice seems
quite generic.
v2: rewritten with suggestions from Thorsten Leemhuis
v3: rewritten with suggestions from Mauro
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://lore.kernel.org/r/20220721044352.3110507-1-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
init_cpu_topology() is called only once at boot and all the cache
attributes are detected early for all possible CPUs. However, when
a CPU is hotplugged out, its cacheinfo gets removed. While the
attributes are added back when the CPU is hotplugged back in as part
of the CPU hotplug state machine, that ends up happening quite late,
after update_siblings_masks() is called in secondary_start_kernel(),
resulting in wrong llc_sibling_masks.
Move the call to detect_cache_attributes() inside update_siblings_masks()
to ensure the cacheinfo is updated before the LLC sibling masks are
updated. This fixes the incorrect LLC sibling masks generated when
CPUs are hotplugged out and back in again.
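A minimal sketch of the intended ordering (the exact error handling in
update_siblings_masks() may differ):

void update_siblings_masks(unsigned int cpuid)
{
	int ret;

	/* refresh cacheinfo before any LLC sibling mask is built */
	ret = detect_cache_attributes(cpuid);
	if (ret)
		pr_info("Early cacheinfo failed, ret = %d\n", ret);

	/* existing core/thread/LLC sibling mask updates follow */
}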
Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20220720-arch_topo_fixes-v3-3-43d696288e84@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, every time information needs to be fetched from the PPTT,
the table is mapped via acpi_get_table() and unmapped after use via
acpi_put_table(), which is fine. However, we do this at runtime,
especially when a CPU is hotplugged out and plugged back in, since we
re-populate the cache topology and other information.
However, with the support for fetching LLC information from the PPTT
in the CPU hotplug path, which is executed in atomic context, it is
preferable to avoid mapping and unmapping the PPTT for every single
use, as acpi_get_table() might sleep waiting for a mutex.
In order to avoid that, the table just needs to be mapped once on the
boot CPU and never unmapped, allowing it to be used at runtime
without the hassle of mapping and unmapping the table.
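A minimal sketch of the idea, with an illustrative helper name (the
actual patch may structure this differently):

static struct acpi_table_header *pptt_table;

static struct acpi_table_header *acpi_get_pptt(void)
{
	acpi_status status;

	/*
	 * Map the PPTT once on the boot CPU and never unmap it, so later
	 * (atomic context) users don't go through acpi_get_table() again.
	 */
	if (!pptt_table) {
		status = acpi_get_table(ACPI_SIG_PPTT, 0, &pptt_table);
		if (ACPI_FAILURE(status))
			pptt_table = NULL;
	}
	return pptt_table;
}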
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
--
Hi Rafael,
Sorry to bother you again about these PPTT changes. Guenter reported an
issue with lockdep enabled in -next, which includes my
cacheinfo/arch_topology changes to utilise the LLC information from the
PPTT in the CPU hotplug path.
Please ack the change once you are happy with it so that I can get it
merged with the other fixes via Greg's tree.
Regards,
Sudeep
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20220720-arch_topo_fixes-v3-2-43d696288e84@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On a couple of architectures, like RISC-V and arm64, we need to detect
cache attributes quite early during boot, when the secondary CPUs
start. So we will call detect_cache_attributes() in atomic context,
and since normal allocation can sleep, we will end up getting a
"sleeping in atomic context" bug splat.
In order to avoid that, move the allocation to the atomic variant in
preparation for moving the actual detection of cache attributes into
the CPU hotplug path, which is atomic.
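A sketch of the allocation change (the helper name is illustrative):

static int allocate_cache_info(unsigned int cpu)
{
	/* GFP_ATOMIC: this may now run in atomic context on the hotplug path */
	per_cpu_cacheinfo(cpu) = kcalloc(cache_leaves(cpu),
					 sizeof(struct cacheinfo), GFP_ATOMIC);
	if (!per_cpu_cacheinfo(cpu)) {
		cache_leaves(cpu) = 0;
		return -ENOMEM;
	}

	return 0;
}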
Cc: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20220720-arch_topo_fixes-v3-1-43d696288e84@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using bin_attributes with a 0 size causes fstat and friends to return that
0 size. This breaks userspace code that retrieves the size before reading
the file. Rather than reverting 75bd50fa84 ("drivers/base/node.c: use
bin_attribute to break the size limitation of cpumap ABI") let's put in a
size value at compile time.
For cpulist the maximum size is on the order of
NR_CPUS * (ceil(log10(NR_CPUS)) + 1)/2
which for 8192 is 20480 ((8192 * 5)/2). In order to get near that you'd
need a system with every other CPU on one node, for example
(0, 2, 4, 6, ...).
To simplify the math and support larger NR_CPUS in the future we are
using (NR_CPUS * 7)/2. We also set it to a minimum of PAGE_SIZE to
retain the older behavior for smaller NR_CPUS.
For the cpumap file the size works out to be NR_CPUS/4 + NR_CPUS/32 - 1
(or NR_CPUS * 9/32 - 1), including the ","s.
Add a set of macros for these values to cpumask.h so they can be used in
multiple places. Apply these to the handful of such files in
drivers/base/topology.c as well as node.c.
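A sketch of what such macros could look like in include/linux/cpumask.h
(the names and exact form are illustrative; the point is that both
evaluate to compile-time constants with a PAGE_SIZE floor):

#define CPUMAP_FILE_MAX_BYTES						\
	(((NR_CPUS * 9) / 32 - 1) > PAGE_SIZE ?				\
	 (NR_CPUS * 9) / 32 - 1 : PAGE_SIZE)

#define CPULIST_FILE_MAX_BYTES						\
	(((NR_CPUS * 7) / 2) > PAGE_SIZE ? (NR_CPUS * 7) / 2 : PAGE_SIZE)

A bin attribute can then be declared with a non-zero size, e.g.
BIN_ATTR_RO(cpumap, CPUMAP_FILE_MAX_BYTES).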
As an example, on an 80 cpu 4-node system (NR_CPUS == 8192):
before:
-r--r--r--. 1 root root 0 Jul 12 14:08 system/node/node0/cpulist
-r--r--r--. 1 root root 0 Jul 11 17:25 system/node/node0/cpumap
after:
-r--r--r--. 1 root root 28672 Jul 13 11:32 system/node/node0/cpulist
-r--r--r--. 1 root root 4096 Jul 13 11:31 system/node/node0/cpumap
CONFIG_NR_CPUS = 16384
-r--r--r--. 1 root root 57344 Jul 13 14:03 system/node/node0/cpulist
-r--r--r--. 1 root root 4607 Jul 13 14:02 system/node/node0/cpumap
The actual number of CPUs doesn't matter for the reported sizes since
they are based on NR_CPUS.
Fixes: 75bd50fa84 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI")
Fixes: bb9ec13d15 ("topology: use bin_attribute to break the size limitation of cpumap ABI")
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Yury Norov <yury.norov@gmail.com> (for include/linux/cpumask.h)
Signed-off-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220715134924.3466194-1-pauld@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Going forward, I'll be using my kernel.org address for upstream work.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20220714112533.539910-6-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Going forward, I'll be using my kernel.org address for upstream work.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20220714112533.539910-5-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Should toolchain mitigations ever become necessary, add a group
for toolchain ambassadors.
Add Nick Desaulniers as LLVM's ambassador for the embargoed hardware
issues process.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220711181101.1559558-1-ndesaulniers@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'arch-cache-topo-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into driver-core-next
Sudeep writes:
cacheinfo and arch_topology updates for v5.20
These are updates to fix some discrepancies we have in the CPU topology
parsing from the device tree /cpu-map node and the divergence from the
behaviour on an ACPI enabled platform. The expectation is that both DT
and ACPI enabled systems must present a consistent view of the CPU
topology.
The current assignment of the generated cluster count as the physical
package identifier for each CPU is wrong. The device tree bindings for
CPU topology support socket nodes to infer the socket or physical
package identifier for a given CPU, and they are now made use of to
address the issue. These updates also assign the cluster identifier as
parsed from the device tree cluster nodes within /cpu-map, without
support for nesting of clusters as there are no such reported/known
platforms.
In order to be on par with the ACPI PPTT physical package/socket
support, these updates also include support for socket nodes in
/cpu-map.
The only exception is that the last level cache ID information can be
inferred from the same ACPI PPTT, while we need to parse the CPU cache
nodes in the device tree. The cacheinfo changes here enable the re-use
of the cacheinfo to detect the cache attributes for all the CPUs quite
early, even before the secondaries are booted, so that the information
can be used to build the scheduler domains, especially the last level
cache (LLC) one.
* tag 'arch-cache-topo-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (21 commits)
ACPI: Remove the unused find_acpi_cpu_cache_topology()
arch_topology: Warn that topology for nested clusters is not supported
arch_topology: Add support for parsing sockets in /cpu-map
arch_topology: Set cluster identifier in each core/thread from /cpu-map
arch_topology: Limit span of cpu_clustergroup_mask()
arch_topology: Don't set cluster identifier as physical package identifier
arch_topology: Avoid parsing through all the CPUs once a outlier CPU is found
arch_topology: Check for non-negative value rather than -1 for IDs validity
arch_topology: Set thread sibling cpumask only within the cluster
arch_topology: Drop LLC identifier stash from the CPU topology
arm64: topology: Remove redundant setting of llc_id in CPU topology
arch_topology: Use the last level cache information from the cacheinfo
arch_topology: Add support to parse and detect cache attributes
cacheinfo: Align checks in cache_shared_cpu_map_{setup,remove} for readability
cacheinfo: Use cache identifiers to check if the caches are shared if available
cacheinfo: Allow early detection and population of cache attributes
cacheinfo: Add support to check if last level cache(LLC) is valid or shared
cacheinfo: Move cache_leaves_are_shared out of CONFIG_OF
cacheinfo: Add helper to access any cache index for a given CPU
cacheinfo: Use of_cpu_device_node_get instead cpu_dev->of_node
...
This reverts commit b8f35fa118.
This is causing a regression due to the same kernfs_node getting
added multiple times to kernfs_notify_list, so revert it until a
safe way of using llist in this context is found.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Michael Walle <michael@walle.cc>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Cc: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220705201026.2487665-1-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The sole user of find_acpi_cpu_cache_topology() was the arm64 topology
code, which is now consolidated into the generic arch_topology without
the need for this function.
Drop the unused function find_acpi_cpu_cache_topology().
Link: https://lore.kernel.org/r/20220704101605.1318280-22-sudeep.holla@arm.com
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reported-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
We don't support the topology for clusters of CPU clusters, while the
DT and ACPI bindings theoretically support it. Just warn about it so
that it is clear to the users of arch_topology that nested clusters
are not yet supported.
Link: https://lore.kernel.org/r/20220704101605.1318280-21-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Finally, let us add support for socket nodes in /cpu-map in the device
tree. Since this may not be present on all the old platforms, and even
on most of the existing platforms, we need to assume that the absence
of the socket node indicates a single socket system and handle it
appropriately. Also, it is likely that most single socket systems skip
adding the socket node since it is optional.
Link: https://lore.kernel.org/r/20220704101605.1318280-20-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Let us set the cluster identifier as parsed from the device tree
cluster nodes within /cpu-map.
We don't support nesting of clusters yet as there is no real hardware
to support clusters of clusters.
Link: https://lore.kernel.org/r/20220704101605.1318280-19-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Currently the cluster identifier is not set on DT based platforms.
The reset or default value is -1 for all the CPUs. Once we assign the
cluster identifier values correctly, the cluster_sibling mask will be
populated and returned by cpu_clustergroup_mask() to contribute to the
creation of the CLS scheduling domain level, if SCHED_CLUSTER is
enabled.
To avoid topologies that will result in questionable or incorrect
scheduling domains, impose restrictions regarding the span of clusters,
as presented to the scheduling domain building code: cluster_sibling
should not span more CPUs than, or the same CPUs as, cpu_coregroup_mask().
This is needed in order to obtain a strict separation between the MC and
CLS levels, and maintain the same domains for existing platforms in
the presence of CONFIG_SCHED_CLUSTER, where the new cluster information
is redundant and irrelevant for the scheduler.
While previously the scheduling domain builder code would have removed MC
as redundant and kept CLS if SCHED_CLUSTER was enabled and the
cpu_coregroup_mask() and cpu_clustergroup_mask() spanned the same CPUs,
now CLS will be removed and MC kept.
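A sketch of one way to implement the restriction (assuming the
arch_topology cpu_topology[] layout):

const struct cpumask *cpu_clustergroup_mask(int cpu)
{
	/*
	 * Forbid cpu_clustergroup_mask() to span more or the same CPUs as
	 * cpu_coregroup_mask(): fall back to a single-CPU mask in that case.
	 */
	if (cpumask_subset(cpu_coregroup_mask(cpu),
			   &cpu_topology[cpu].cluster_sibling))
		return get_cpu_mask(cpu);

	return &cpu_topology[cpu].cluster_sibling;
}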
Link: https://lore.kernel.org/r/20220704101605.1318280-18-sudeep.holla@arm.com
Cc: Darren Hart <darren@os.amperecomputing.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Currently, as we parse the CPU topology from the /cpu-map node of the
device tree, we assign the generated cluster count as the physical
package identifier for each CPU, which is wrong.
The device tree bindings for CPU topology support socket nodes to infer
the socket or physical package identifier for a given CPU. Since that
is fairly new and not supported on most of the old and existing
systems, we can assume all such systems have a single socket/physical
package.
Fix the physical package identifier to 0 by removing the assignment of
the cluster identifier to it.
Link: https://lore.kernel.org/r/20220704101605.1318280-17-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
There is no point in looping through all the CPUs' physical package
identifiers to check whether they are valid once a CPU that is outside
the topology (i.e. an outlier CPU) is found.
Let us just break out of the loop early in such a case.
Link: https://lore.kernel.org/r/20220704101605.1318280-16-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Instead of just comparing the cpu topology IDs with -1 to check their
validity, improve that by checking for a valid non-negative value.
Link: https://lore.kernel.org/r/20220704101605.1318280-15-sudeep.holla@arm.com
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Currently the cluster identifier is not set on the DT based platforms.
The reset or default value is -1 for all the CPUs. Once we assign the
cluster identifier values correctly, that may result in getting the
thread siblings wrong, as the core identifiers can be the same for 2
different CPUs belonging to 2 different clusters.
So, in order to get the thread sibling cpumasks correct, we need to
update them only if the cores they belong to are in the same cluster
within the socket. Let us skip updating the thread sibling cpumasks if
the cluster identifier doesn't match.
This change has no effect while the cluster identifiers are not set,
but it will avoid any breakage once we set them correctly.
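A sketch of the intended check inside the update_siblings_masks() loop
(field names as in struct cpu_topology):

	/* only CPUs in the same cluster can be thread siblings */
	if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
		continue;

	if (cpuid_topo->core_id != cpu_topo->core_id)
		continue;

	cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling);
	cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);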
Link: https://lore.kernel.org/r/20220704101605.1318280-14-sudeep.holla@arm.com
Tested-by: Gavin Shan <gshan@redhat.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Since the cacheinfo LLC information is used directly in arch_topology,
there is no need to parse and store the LLC ID information only for
ACPI systems in the CPU topology.
Remove the redundant LLC ID from the generic CPU arch_topology
information.
Link: https://lore.kernel.org/r/20220704101605.1318280-13-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Since the cacheinfo LLC information is used directly in arch_topology,
there is no need to parse and fetch the LLC ID information only for
ACPI systems.
Just drop the redundant parsing and setting of llc_id in CPU topology
from ACPI PPTT.
Link: https://lore.kernel.org/r/20220704101605.1318280-12-sudeep.holla@arm.com
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The cacheinfo is now initialised early along with the CPU topology
initialisation. Instead of relying on the LLC ID information parsed
separately only with ACPI PPTT elsewhere, migrate to use the similar
information from the cacheinfo.
This is generic for both DT and ACPI systems. The ACPI LLC ID information
parsed separately can now be removed from arch specific code.
Link: https://lore.kernel.org/r/20220704101605.1318280-11-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Currently ACPI populates just the minimum information about the last
level cache from the PPTT in order to feed it into building the
sched_domains. Similar support for DT platforms is not present.
In order to enable that, the entire cache hierarchy information can
be built as part of the CPU topology parsing, both on ACPI and DT
platforms.
Note that this change builds the cacheinfo early even on ACPI systems,
but the current mechanism of building the llc_sibling mask remains
unchanged.
Link: https://lore.kernel.org/r/20220704101605.1318280-10-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The checks to skip the CPU itself or the no-cacheinfo case are
implemented a bit differently in cache_shared_cpu_map_{setup,remove},
though the effect is exactly the same. Just align the implementation
in both for improved readability. No functional change.
Link: https://lore.kernel.org/r/20220704101605.1318280-9-sudeep.holla@arm.com
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The cache identifier is an optional property on most platforms. Its
presence must be indicated by the CACHE_ID valid bit in the attributes.
We can use the cache identifiers provided by the firmware to check if
any two CPUs share the same cache, instead of relying on the fw_token
generated and set by the OS.
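A sketch of how the comparison could look (CACHE_ID is the valid bit
in struct cacheinfo::attributes; the fallback matches the existing
fw_token check):

static inline bool cache_leaves_are_shared(struct cacheinfo *this_leaf,
					   struct cacheinfo *sib_leaf)
{
	/* prefer firmware-provided cache IDs when both leaves have one */
	if ((sib_leaf->attributes & CACHE_ID) &&
	    (this_leaf->attributes & CACHE_ID))
		return sib_leaf->id == this_leaf->id;

	/* otherwise fall back to the OS-generated fw_token */
	return sib_leaf->fw_token == this_leaf->fw_token;
}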
Link: https://lore.kernel.org/r/20220704101605.1318280-8-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Some architectures/platforms may need to set up cache properties very
early in the boot, along with the other CPU topology information, so
that all of it can be used to build the sched_domains that the
scheduler relies on.
Allow detect_cache_attributes() to be called quite early during boot.
Link: https://lore.kernel.org/r/20220704101605.1318280-7-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
It is useful to have a helper to check whether two given CPUs share the
last level cache. We can do that check by comparing the fw_token or by
comparing the cache ID. Currently we check just the fw_token, as the
cache ID is optional.
This helper can be used to build llc_sibling during arch specific
topology parsing and to feed the information to the sched_domains. It
also helps to get rid of llc_id in the CPU topology, as that is more or
less duplicate information.
Also add a helper to check whether the LLC information in the cacheinfo
is valid.
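A sketch of what such helpers could look like (names and the exact
validity checks are illustrative; the per_cpu_cacheinfo_idx() helper is
the one added elsewhere in this series):

bool last_level_cache_is_valid(unsigned int cpu)
{
	struct cacheinfo *llc;

	if (!cache_leaves(cpu))
		return false;

	llc = per_cpu_cacheinfo_idx(cpu, cache_leaves(cpu) - 1);

	/* valid if either a firmware cache ID or a fw_token is present */
	return (llc->attributes & CACHE_ID) || !!llc->fw_token;
}

bool last_level_cache_is_shared(unsigned int cpu_x, unsigned int cpu_y)
{
	struct cacheinfo *llc_x, *llc_y;

	if (!last_level_cache_is_valid(cpu_x) ||
	    !last_level_cache_is_valid(cpu_y))
		return false;

	llc_x = per_cpu_cacheinfo_idx(cpu_x, cache_leaves(cpu_x) - 1);
	llc_y = per_cpu_cacheinfo_idx(cpu_y, cache_leaves(cpu_y) - 1);

	return cache_leaves_are_shared(llc_x, llc_y);
}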
Link: https://lore.kernel.org/r/20220704101605.1318280-6-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
cache_leaves_are_shared() is already used even with ACPI and PPTT. It
checks whether the cache leaves are shared, based on the fw_token
pointer. However, it is defined conditionally only if CONFIG_OF is
enabled, which is wrong.
Move cache_leaves_are_shared() out of CONFIG_OF and keep it generic. It
also handles the case where neither OF nor ACPI is enabled.
Link: https://lore.kernel.org/r/20220704101605.1318280-5-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The cacheinfo for a given CPU at a given index is used in quite a few
places by fetching the base pointer for index 0 using the helper
per_cpu_cacheinfo(cpu) and offsetting it by the required index.
Instead, add another helper to fetch the required pointer directly and
use it to simplify and improve readability.
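A sketch of such a helper (the name matches what is used elsewhere in
this series, but treat it as illustrative here):

static inline struct cacheinfo *per_cpu_cacheinfo_idx(unsigned int cpu,
						      unsigned int index)
{
	/* pointer to the cacheinfo leaf @index for @cpu */
	return per_cpu_cacheinfo(cpu) + index;
}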
Link: https://lore.kernel.org/r/20220704101605.1318280-4-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
of_cpu_device_node_get() takes care of fetching the CPU's device node,
either from the cached cpu_dev->of_node if cpu_dev is initialised, or
by using of_get_cpu_node() to parse and fetch the node if cpu_dev isn't
available yet.
Just use of_cpu_device_node_get() instead of getting the cpu device
first and then using cpu_dev->of_node, for two reasons:
1. There is no other use of cpu_dev, so this can be simplified.
2. It enables the use of detect_cache_attributes(), and hence
cache_setup_of_node(), much earlier, before the CPUs are registered as
devices.
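A sketch of the resulting pattern (the surrounding cacheinfo setup is
elided):

	struct device_node *np;

	/*
	 * of_cpu_device_node_get() falls back to of_get_cpu_node() when
	 * the CPU device is not registered yet, so this works early in
	 * boot; it returns a node with an elevated refcount.
	 */
	np = of_cpu_device_node_get(cpu);
	if (!np)
		return -ENOENT;

	/* ... set up cacheinfo from @np ... */

	of_node_put(np);	/* drop the reference taken above */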
Link: https://lore.kernel.org/r/20220704101605.1318280-3-sudeep.holla@arm.com
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
There is a need to use the cache sharing information quite early during
boot, before the secondary cores are up and running. The permanent
memory map for all the ACPI tables (via acpi_permanent_mmap) is turned
on in acpi_early_init(), which is quite late for the above requirement.
As a result, there is a possibility that the ACPI PPTT gets mapped to
different virtual addresses. In such scenarios, using the virtual
address as the fw_token before acpi_permanent_mmap is enabled results
in different fw_tokens for the same cache entity, and hence wrong cache
sharing information will be deduced from them.
Instead of using the virtual address, just use the table offset as the
unique firmware token for the caches. The same offset is already used
as the ACPI identifier for other entries in the ACPI PPTT if the
firmware has not set a valid one.
Link: https://lore.kernel.org/r/20220704101605.1318280-2-sudeep.holla@arm.com
Cc: linux-acpi@vger.kernel.org
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
When lockdep is enabled, the lockdep_assert_held_write() call can cause
a potential NULL pointer dereference.
Fix the following smatch warnings:
fs/kernfs/dir.c:1353 __kernfs_remove() warn: variable dereferenced before check 'kn' (see line 1346)
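A sketch of the fix: do the NULL check before anything dereferences
@kn (the rest of the removal path is untouched):

static void __kernfs_remove(struct kernfs_node *kn)
{
	/* Short-circuit if non-root @kn has already finished removal. */
	if (!kn)
		return;

	/* only now is it safe to reach the per-fs rwsem through @kn */
	lockdep_assert_held_write(&kernfs_root(kn)->kernfs_rwsem);

	/* rest of the removal path is unchanged */
}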
Fixes: 393c371408 ("kernfs: switch global kernfs_rwsem lock to per-fs lock")
Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Link: https://lore.kernel.org/r/20220630082512.3482581-1-zys.zljxml@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In of_register_trusted_foundations(), we need to hold the reference
returned by of_find_compatible_node() and then call of_node_put() on
it to keep the refcount balanced.
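A sketch of the pattern (the compatible string and surrounding code
are illustrative):

	struct device_node *np;

	np = of_find_compatible_node(NULL, NULL, "tlm,trusted-foundations");
	if (!np)
		return;

	/* ... read properties from @np ... */

	of_node_put(np);	/* balance of_find_compatible_node()'s reference */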
Signed-off-by: Liang He <windhl@126.com>
Link: https://lore.kernel.org/r/20220628021640.4015-1-windhl@126.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 71066545b4 ("driver core: Set fw_devlink.strict=1 by default")
enabled iommus and dmas dependency enforcement by default. On some
systems, this caused the console device's probe to get delayed until
the deferred_probe_timeout expired.
We need consoles to work as soon as possible, so mark the console
device node with FWNODE_FLAG_BEST_EFFORT so that fw_devlink knows not
to delay the probe of the console device for suppliers without
drivers. The driver can then make the decision on whether it can probe
without those suppliers or should defer its probe.
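A sketch of the marking, assuming the DT stdout path (of_stdout) is
where the console node is known:

	/* let fw_devlink probe the console without waiting for all suppliers */
	if (of_stdout)
		of_fwnode_handle(of_stdout)->flags |= FWNODE_FLAG_BEST_EFFORT;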
Fixes: 71066545b4 ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer <sha@pengutronix.de>
Reported-by: Peng Fan <peng.fan@nxp.com>
Tested-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220623080344.783549-3-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode,
fw_devlink will do a best effort ordering for that device where it'll
only enforce the probe/suspend/resume ordering of that device with
suppliers that have drivers. The driver of that device can then decide
if it wants to defer probe or probe without the suppliers.
This will be useful for avoiding the probe delays of the console device
that were caused by commit 71066545b4 ("driver core: Set
fw_devlink.strict=1 by default").
Fixes: 71066545b4 ("driver core: Set fw_devlink.strict=1 by default")
Reported-by: Sascha Hauer <sha@pengutronix.de>
Reported-by: Peng Fan <peng.fan@nxp.com>
Tested-by: Peng Fan <peng.fan@nxp.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20220623080344.783549-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the current kernfs design a single mutex, kernfs_open_file_mutex,
protects the list of kernfs_open_file instances corresponding to a
sysfs attribute. So even if different tasks are opening or closing
different sysfs files, they can contend on the osq_lock of this mutex.
The contention is more apparent in large scale systems with a few
hundred CPUs, where most of the CPUs have running tasks that are
opening, accessing or closing sysfs files at any point of time.
Using hashed mutexes in place of a single global mutex can
significantly reduce contention around the global mutex and hence
provide better scalability. Moreover, as these hashed mutexes are not
part of kernfs_node objects, we will not see any significant change in
the memory utilization of kernfs based file systems like sysfs,
cgroupfs etc.
Modify the interface introduced in the previous patch to make use of
hashed mutexes. Use the kernfs_node address as the hashing key.
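A sketch of the hashing scheme with illustrative names and sizing:

#define KERNFS_LOCK_BITS	6	/* 64 hashed mutexes, for example */

static struct mutex kernfs_open_file_mutex[1 << KERNFS_LOCK_BITS];

static void __init kernfs_lock_init(void)
{
	int count;

	for (count = 0; count < ARRAY_SIZE(kernfs_open_file_mutex); count++)
		mutex_init(&kernfs_open_file_mutex[count]);
}

static inline struct mutex *kernfs_open_file_mutex_ptr(struct kernfs_node *kn)
{
	/* hash the kernfs_node address to pick one of the hashed mutexes */
	int idx = hash_ptr(kn, KERNFS_LOCK_BITS);

	return &kernfs_open_file_mutex[idx];
}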
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Link: https://lore.kernel.org/r/20220615021059.862643-5-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This allows changing the underlying mutex locking without needing to
change the users of the lock. For example, the next patch modifies this
interface to use hashed mutexes in place of the single global
kernfs_open_file_mutex.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Link: https://lore.kernel.org/r/20220615021059.862643-4-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At present kernfs_notify_list is implemented as a singly linked
list of kernfs_node(s), where the last element points to itself and
the value of ->attr.next tells whether a node is present on the list
or not. Both addition to and deletion from the list happen under
kernfs_notify_lock.
Change kernfs_notify_list to an llist so that additions to the list
can happen locklessly.
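A sketch of the lockless producer side, assuming ->attr gains an
llist_node member (notify_next) and reusing the existing notify work
item; note that llist_add() does not prevent the same node from being
added twice, which is what the revert earlier in this log was about:

static void kernfs_notify_workfn(struct work_struct *work);
static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
static LLIST_HEAD(kernfs_notify_list);

void kernfs_notify(struct kernfs_node *kn)
{
	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
		return;

	kernfs_get(kn);
	/* llist_add() returns true only when the list was previously empty */
	if (llist_add(&kn->attr.notify_next, &kernfs_notify_list))
		schedule_work(&kernfs_notify_work);
}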
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Link: https://lore.kernel.org/r/20220615021059.862643-3-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the removal of kernfs_open_node->refcnt in the previous patch,
kernfs_open_node_lock can be removed as well by making ->attr.open
RCU protected. kernfs_put_open_node() can delegate the freeing of
->attr.open to RCU, and other readers of ->attr.open can access it
under rcu_read_(un)lock.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Link: https://lore.kernel.org/r/20220615021059.862643-2-imran.f.khan@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the previous 'rc = -EINVAL;', the rc value doesn't change, so it
is not necessary to re-assign it again.
Signed-off-by: Lin Feng <linf@wangsu.com>
Link: https://lore.kernel.org/r/20220617091746.206515-1-linf@wangsu.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the __driver_attach() function there is also an A-A deadlock
problem, like the one in commit b232b02bf3 ("driver core: fix deadlock
in __device_attach"), with a similar stack.
In __driver_attach() the lock holding logic is as follows:
...
__driver_attach
  if (driver_allows_async_probing(drv))
    device_lock(dev)  // take lock dev
    async_schedule_dev(__driver_attach_async_helper, dev); // func
      async_schedule_node
        async_schedule_node_domain(func)
          entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
          /* when allocation fails or the work limit is hit, func is
             executed synchronously, but __driver_attach_async_helper
             will take lock dev as well, which leads to an A-A
             deadlock. */
          if (!entry || atomic_read(&entry_count) > MAX_WORK)
            func;
          else
            queue_work_node(node, system_unbound_wq, &entry->work)
    device_unlock(dev)
As shown above, when async probing is allowed but the async work cannot
be queued because of an out-of-memory condition or the work limit, the
function is executed synchronously instead. This leads to an A-A
deadlock because __driver_attach_async_helper() takes lock dev as well.
Reproduce:
It can be reproduced by forcing the condition
(!entry || atomic_read(&entry_count) > MAX_WORK) to be true, as below:
[ 370.785650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 370.787154] task:swapper/0 state:D stack: 0 pid: 1 ppid:
0 flags:0x00004000
[ 370.788865] Call Trace:
[ 370.789374] <TASK>
[ 370.789841] __schedule+0x482/0x1050
[ 370.790613] schedule+0x92/0x1a0
[ 370.791290] schedule_preempt_disabled+0x2c/0x50
[ 370.792256] __mutex_lock.isra.0+0x757/0xec0
[ 370.793158] __mutex_lock_slowpath+0x1f/0x30
[ 370.794079] mutex_lock+0x50/0x60
[ 370.794795] __device_driver_lock+0x2f/0x70
[ 370.795677] ? driver_probe_device+0xd0/0xd0
[ 370.796576] __driver_attach_async_helper+0x1d/0xd0
[ 370.797318] ? driver_probe_device+0xd0/0xd0
[ 370.797957] async_schedule_node_domain+0xa5/0xc0
[ 370.798652] async_schedule_node+0x19/0x30
[ 370.799243] __driver_attach+0x246/0x290
[ 370.799828] ? driver_allows_async_probing+0xa0/0xa0
[ 370.800548] bus_for_each_dev+0x9d/0x130
[ 370.801132] driver_attach+0x22/0x30
[ 370.801666] bus_add_driver+0x290/0x340
[ 370.802246] driver_register+0x88/0x140
[ 370.802817] ? virtio_scsi_init+0x116/0x116
[ 370.803425] scsi_register_driver+0x1a/0x30
[ 370.804057] init_sd+0x184/0x226
[ 370.804533] do_one_initcall+0x71/0x3a0
[ 370.805107] kernel_init_freeable+0x39a/0x43a
[ 370.805759] ? rest_init+0x150/0x150
[ 370.806283] kernel_init+0x26/0x230
[ 370.806799] ret_from_fork+0x1f/0x30
To fix the deadlock, move async_schedule_dev() outside of
device_lock(). As we can see in async_schedule_node_domain(), the
workqueue passed to queue_work_node() is system_unbound_wq, so it can
accept concurrent operations. This does not change the code logic and
no longer leads to the deadlock.
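A sketch of the reordering, assuming the async_driver field added to
struct device_private by the earlier __device_attach fix:

	if (driver_allows_async_probing(drv)) {
		bool async = false;

		device_lock(dev);
		if (!dev->driver) {
			get_device(dev);
			dev->p->async_driver = drv;
			async = true;
		}
		device_unlock(dev);
		/*
		 * Schedule only after dropping the device lock, so a
		 * synchronous fallback inside async_schedule_dev() cannot
		 * re-acquire the same lock (A-A deadlock).
		 */
		if (async)
			async_schedule_dev(__driver_attach_async_helper, dev);
		return 0;
	}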
Fixes: ef0ff68351 ("driver core: Probe devices asynchronously instead of the driver")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Link: https://lore.kernel.org/r/20220622074327.497102-1-zhangwensheng5@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This was already defined at stable/sysfs-devices-system-cpu with
the same description, as pointed out by get_abi.pl:
Warning: /sys/devices/system/cpu/cpuX/topology/core_id is defined 2 times: Documentation/ABI/stable/sysfs-devices-system-cpu:38 Documentation/ABI/testing/sysfs-devices-system-cpu:69
Remove the duplicated one.
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/1e92337c1ef74f5eb9e1c1871e20b858b490d269.1656235926.git.mchehab@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When devtmpfs fails to mount, a dangling pointer remains in a global
variable. Specifically, the err variable is passed by pointer to
devtmpfsd. When devtmpfsd exits, it sets the error and completes
setup_done. In this situation, the thread pointer is not set to NULL.
After devtmpfsd has exited, devtmpfs can wake up the destroyed
devtmpfsd thread via wake_up_process() if a device change event comes
in.
Signed-off-by: Yangxi Xiang <xyangxi5@gmail.com>
Link: https://lore.kernel.org/r/20220627120409.11174-1-xyangxi5@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 77515ebaf0 as it
causes build problems in linux-next. It needs to be reintroduced in a
way that allows the API to evolve without requiring a "flag day" to
catch all users.
Link: https://lore.kernel.org/r/20220623160723.7a44b573@canb.auug.org.au
Cc: Duoming Zhou <duoming@zju.edu.cn>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit a52ed4866d as it
causes build problems in linux-next. It needs to be reintroduced in a
way that allows the API to evolve without requiring a "flag day" to
catch all users.
Link: https://lore.kernel.org/r/20220623160723.7a44b573@canb.auug.org.au
Cc: Duoming Zhou <duoming@zju.edu.cn>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are sleep-in-atomic-context bugs when uploading device dump
data in mwifiex. The root cause is that dev_coredumpv() cannot
be used in atomic contexts, because it calls dev_set_name(), which
includes operations that may sleep. The call tree below shows the
execution paths that could lead to bugs:
(Interrupt context)
fw_dump_timer_fn
  mwifiex_upload_device_dump
    dev_coredumpv(..., GFP_KERNEL)
      dev_coredumpm()
        kzalloc(sizeof(*devcd), gfp); //may sleep
        dev_set_name
          kobject_set_name_vargs
            kvasprintf_const(GFP_KERNEL, ...); //may sleep
              kstrdup(s, GFP_KERNEL); //may sleep
The corresponding fail log is shown below:
[ 135.275938] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start
[ 135.281029] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265
...
[ 135.293613] Call Trace:
[ 135.293613] <IRQ>
[ 135.293613] dump_stack_lvl+0x57/0x7d
[ 135.293613] __might_resched.cold+0x138/0x173
[ 135.293613] ? dev_coredumpm+0xca/0x2e0
[ 135.293613] kmem_cache_alloc_trace+0x189/0x1f0
[ 135.293613] ? devcd_match_failing+0x30/0x30
[ 135.293613] dev_coredumpm+0xca/0x2e0
[ 135.293613] ? devcd_freev+0x10/0x10
[ 135.293613] dev_coredumpv+0x1c/0x20
[ 135.293613] ? devcd_match_failing+0x30/0x30
[ 135.293613] mwifiex_upload_device_dump+0x65/0xb0
[ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0
[ 135.293613] call_timer_fn+0x122/0x3d0
[ 135.293613] ? msleep_interruptible+0xb0/0xb0
[ 135.293613] ? lock_downgrade+0x3c0/0x3c0
[ 135.293613] ? __next_timer_interrupt+0x13c/0x160
[ 135.293613] ? lockdep_hardirqs_on_prepare+0xe/0x220
[ 135.293613] ? mwifiex_dnld_fw+0x1b0/0x1b0
[ 135.293613] __run_timers.part.0+0x3f8/0x540
[ 135.293613] ? call_timer_fn+0x3d0/0x3d0
[ 135.293613] ? arch_restore_msi_irqs+0x10/0x10
[ 135.293613] ? lapic_next_event+0x31/0x40
[ 135.293613] run_timer_softirq+0x4f/0xb0
[ 135.293613] __do_softirq+0x1c2/0x651
...
[ 135.293613] RIP: 0010:default_idle+0xb/0x10
[ 135.293613] RSP: 0018:ffff888006317e68 EFLAGS: 00000246
[ 135.293613] RAX: ffffffff82ad8d10 RBX: ffff888006301cc0 RCX: ffffffff82ac90e1
[ 135.293613] RDX: ffffed100d9ff1b4 RSI: ffffffff831ad140 RDI: ffffffff82ad8f20
[ 135.293613] RBP: 0000000000000003 R08: 0000000000000000 R09: ffff88806cff8d9b
[ 135.293613] R10: ffffed100d9ff1b3 R11: 0000000000000001 R12: ffffffff84593410
[ 135.293613] R13: 0000000000000000 R14: 0000000000000000 R15: 1ffff11000c62fd2
...
[ 135.389205] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end
This patch uses a delayed work to replace the timer and moves the
operations that may sleep into the delayed work in order to mitigate
the bugs. It was tested on a Marvell 88W8801 chip whose interface is
USB and whose firmware is usb8801_uapsta.bin. The following is the
result after using the delayed work to replace the timer.
[ 134.936453] usb 1-1: == mwifiex dump information to /sys/class/devcoredump start
[ 135.043344] usb 1-1: == mwifiex dump information to /sys/class/devcoredump end
As we can see, there is no bug now.
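A sketch of the delayed-work approach with illustrative names (the
devdump_work field, the upload entry point and the delay are
placeholders for whatever the driver actually uses):

static void mwifiex_devdump_work(struct work_struct *work)
{
	struct mwifiex_adapter *adapter =
		container_of(work, struct mwifiex_adapter, devdump_work.work);

	/* process context now, so the coredump path is allowed to sleep */
	mwifiex_upload_device_dump(adapter);
}

/* at adapter init:
 * INIT_DELAYED_WORK(&adapter->devdump_work, mwifiex_devdump_work);
 */

static void mwifiex_schedule_devdump(struct mwifiex_adapter *adapter)
{
	/* replaces arming the old fw_dump timer */
	schedule_delayed_work(&adapter->devdump_work, msecs_to_jiffies(10));
}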
Fixes: f5ecd02a8b ("mwifiex: device dump support for usb interface")
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/b63b77fc84ed3e8a6bef02378e17c7c71a0bc3be.1654569290.git.duoming@zju.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dev_coredumpv() and dev_coredumpm() cannot be used in atomic
context, because they call kvasprintf_const() and kstrdup() with
the GFP_KERNEL parameter. The call chain is shown below:
dev_coredumpv(.., gfp_t gfp)
  dev_coredumpm(.., gfp_t gfp)
    dev_set_name
      kobject_set_name_vargs
        kvasprintf_const(GFP_KERNEL, ...); //may sleep
          kstrdup(s, GFP_KERNEL); //may sleep
This patch removes the gfp_t parameter of dev_coredumpv() and
dev_coredumpm() and changes the gfp_t parameter of kzalloc() in
dev_coredumpm() to GFP_KERNEL, to make it explicit that they cannot
be used in atomic context.
Fixes: 833c95456a ("device coredump: add new device coredump class")
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/df72af3b1862bac7d8e793d1f3931857d3779dfd.1654569290.git.duoming@zju.edu.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We already have a helper to get the first child device; use it and
drop the custom approach.
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220610120219.18988-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are several places in the kernel where this kind of functionality is
being used. Provide a generic helper for such cases.
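A sketch of how such a helper can be built on top of the existing
device_find_child() (the helper name here is illustrative):

static int match_any(struct device *dev, void *unused)
{
	return 1;
}

/*
 * Return a reference to any child of @parent, or NULL if there is none.
 * The caller must drop the reference with put_device().
 */
struct device *device_find_any_child(struct device *parent)
{
	return device_find_child(parent, NULL, match_any);
}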
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220610120219.18988-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>