cgroup: Changes for v6.10
- The locking around cpuset hotplug processing has always been a bit of a
  mess, which was worked around by making hotplug processing asynchronous.
  The asynchronicity isn't great and led to other issues. We tried to make
  the behavior synchronous a while ago, but that led to lockdep splats.
  Waiman took another stab at cleaning it up and making it synchronous. The
  patch has been in -next for well over a month and there haven't been any
  complaints, so fingers crossed.

- Tracepoints added to help understand rstat lock contention.

- A bunch of minor changes - doc updates, code cleanups and selftests.

-----BEGIN PGP SIGNATURE-----

iIQEABYKACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZkUrFA4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGfTyAQCwd0aNQOqaKRhJGtWYShqV/aYzurCy1Z2tB9/3
dkdy9gD7BHNk6kZQEbT97RrHPIduFansLtc76VziACibWBuomgg=
=2DNQ
-----END PGP SIGNATURE-----

Merge tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo (summary as above):

* tag 'cgroup-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
  cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints
  selftests/cgroup: Drop define _GNU_SOURCE
  docs: cgroup-v1: Update page cache removal functions
  selftests/cgroup: fix uninitialized variables in test_zswap.c
  selftests/cgroup: cpu_hogger init: use {} instead of {NULL}
  selftests/cgroup: fix clang warnings: uninitialized fd variable
  selftests/cgroup: fix clang build failures for abs() calls
  cgroup/cpuset: Remove outdated comment in sched_partition_write()
  cgroup/cpuset: Fix incorrect top_cpuset flags
  cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice
  cgroup/cpuset: Statically initialize more members of top_cpuset
  cgroup: Avoid unnecessary looping in cgroup_no_v1()
  cgroup, legacy_freezer: update comment for freezer_css_offline()
  docs, cgroup: add entries for pids to cgroup-v2.rst
  cgroup: don't call cgroup1_pidlist_destroy_all() for v2
  cgroup_freezer: update comment for freezer_css_online()
  cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release
  cgroup/rstat: add cgroup_rstat_lock helpers and tracepoints
  cgroup/pids: Remove superfluous zeroing
  docs: cgroup-v1: Fix description for css_online
  ...
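The rstat tracepoints mentioned above hook into a "trylock first, then lock" pattern: the lock is first tried non-blockingly so contention can be recorded before blocking on the real acquisition. The following is a minimal userspace sketch of that pattern, not kernel code; the mutex, counter and function names are illustrative, and the comments only indicate where the kernel helpers added later in this series emit their tracepoints.

```c
/*
 * Userspace sketch of the contention-detection pattern the new rstat
 * tracepoints hook into: try the lock first, note whether it was
 * contended, then take it for real.  Names here are illustrative.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t stat_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long contended_events;	/* illustrative counter, not atomic */

static void stat_lock_acquire(void)
{
	bool contended = pthread_mutex_trylock(&stat_lock) != 0;

	if (contended) {
		/* the kernel emits trace_cgroup_rstat_lock_contended() here */
		contended_events++;
		pthread_mutex_lock(&stat_lock);
	}
	/* the kernel emits trace_cgroup_rstat_locked() here */
}

static void stat_lock_release(void)
{
	/* the kernel emits trace_cgroup_rstat_unlock() here */
	pthread_mutex_unlock(&stat_lock);
}

int main(void)
{
	stat_lock_acquire();
	stat_lock_release();
	printf("contended %lu times\n", contended_events);
	return 0;
}
```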
This commit is contained in: de6fef50ea
@@ -570,7 +570,7 @@ visible to cgroup_for_each_child/descendant_*() iterators. The
 subsystem may choose to fail creation by returning -errno. This
 callback can be used to implement reliable state sharing and
 propagation along the hierarchy. See the comment on
-cgroup_for_each_descendant_pre() for details.
+cgroup_for_each_live_descendant_pre() for details.

 ``void css_offline(struct cgroup *cgrp);``
 (cgroup_mutex held by caller)

@@ -102,7 +102,7 @@ Under below explanation, we assume CONFIG_SWAP=y.
 The logic is very clear. (About migration, see below)

 Note:
-__remove_from_page_cache() is called by remove_from_page_cache()
+__filemap_remove_folio() is called by filemap_remove_folio()
 and __remove_mapping().

 6. Shmem(tmpfs) Page Cache

@@ -1058,12 +1058,15 @@ cpufreq governor about the minimum desired frequency which should always be
 provided by a CPU, as well as the maximum desired frequency, which should not
 be exceeded by a CPU.

-WARNING: cgroup2 doesn't yet support control of realtime processes and
-the cpu controller can only be enabled when all RT processes are in
-the root cgroup. Be aware that system management software may already
-have placed RT processes into nonroot cgroups during the system boot
-process, and these processes may need to be moved to the root cgroup
-before the cpu controller can be enabled.
+WARNING: cgroup2 doesn't yet support control of realtime processes. For
+a kernel built with the CONFIG_RT_GROUP_SCHED option enabled for group
+scheduling of realtime processes, the cpu controller can only be enabled
+when all RT processes are in the root cgroup. This limitation does
+not apply if CONFIG_RT_GROUP_SCHED is disabled. Be aware that system
+management software may already have placed RT processes into nonroot
+cgroups during the system boot process, and these processes may need
+to be moved to the root cgroup before the cpu controller can be enabled
+with a CONFIG_RT_GROUP_SCHED enabled kernel.


 CPU Interface Files

@@ -2190,11 +2193,25 @@ PID Interface Files
 Hard limit of number of processes.

 pids.current
-A read-only single value file which exists on all cgroups.
+A read-only single value file which exists on non-root cgroups.

 The number of processes currently in the cgroup and its
 descendants.

+pids.peak
+A read-only single value file which exists on non-root cgroups.
+
+The maximum value that the number of processes in the cgroup and its
+descendants has ever reached.
+
+pids.events
+A read-only flat-keyed file which exists on non-root cgroups. The
+following entries are defined. Unless specified otherwise, a value
+change in this file generates a file modified event.
+
+max
+Number of times fork failed because limit was hit.
+
 Organisational operations are not blocked by cgroup policies, so it is
 possible to have pids.current > pids.max. This can be done by either
 setting the limit to be smaller than pids.current, or attaching enough
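The pids.current and pids.peak semantics documented above can be checked from userspace by reading the files directly. A minimal sketch follows; the cgroup path is hypothetical and assumed to have the pids controller enabled.

```c
/* Read pids.current and pids.peak for one cgroup (hypothetical path). */
#include <stdio.h>

static long read_long(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	const char *cg = "/sys/fs/cgroup/test";	/* hypothetical cgroup */
	char path[256];
	long cur, peak;

	snprintf(path, sizeof(path), "%s/pids.current", cg);
	cur = read_long(path);
	snprintf(path, sizeof(path), "%s/pids.peak", cg);
	peak = read_long(path);

	printf("pids.current=%ld pids.peak=%ld\n", cur, peak);
	return 0;
}
```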
@@ -690,7 +690,7 @@ static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
 void cgroup_rstat_updated(struct cgroup *cgrp, int cpu);
 void cgroup_rstat_flush(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
-void cgroup_rstat_flush_release(void);
+void cgroup_rstat_flush_release(struct cgroup *cgrp);

 /*
  * Basic resource stats.
@@ -70,7 +70,6 @@ extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
-extern void cpuset_wait_for_hotplug(void);
 extern void inc_dl_tasks_cs(struct task_struct *task);
 extern void dec_dl_tasks_cs(struct task_struct *task);
 extern void cpuset_lock(void);

@@ -185,8 +184,6 @@ static inline void cpuset_update_active_cpus(void)
 partition_sched_domains(1, NULL, NULL);
 }

-static inline void cpuset_wait_for_hotplug(void) { }
-
 static inline void inc_dl_tasks_cs(struct task_struct *task) { }
 static inline void dec_dl_tasks_cs(struct task_struct *task) { }
 static inline void cpuset_lock(void) { }
@@ -204,6 +204,98 @@ DEFINE_EVENT(cgroup_event, cgroup_notify_frozen,
 TP_ARGS(cgrp, path, val)
 );

+DECLARE_EVENT_CLASS(cgroup_rstat,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended),
+
+TP_STRUCT__entry(
+__field( int, root )
+__field( int, level )
+__field( u64, id )
+__field( int, cpu )
+__field( bool, contended )
+),
+
+TP_fast_assign(
+__entry->root = cgrp->root->hierarchy_id;
+__entry->id = cgroup_id(cgrp);
+__entry->level = cgrp->level;
+__entry->cpu = cpu;
+__entry->contended = contended;
+),
+
+TP_printk("root=%d id=%llu level=%d cpu=%d lock contended:%d",
+__entry->root, __entry->id, __entry->level,
+__entry->cpu, __entry->contended)
+);
+
+/* Related to global: cgroup_rstat_lock */
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_lock_contended,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_locked,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_unlock,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+/* Related to per CPU: cgroup_rstat_cpu_lock */
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_lock_contended,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_lock_contended_fastpath,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_locked,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_locked_fastpath,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_unlock,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
+DEFINE_EVENT(cgroup_rstat, cgroup_rstat_cpu_unlock_fastpath,
+
+TP_PROTO(struct cgroup *cgrp, int cpu, bool contended),
+
+TP_ARGS(cgrp, cpu, contended)
+);
+
 #endif /* _TRACE_CGROUP_H */

 /* This part must be outside protection */
@@ -1335,6 +1335,7 @@ static int __init cgroup_no_v1(char *str)
 continue;

 cgroup_no_v1_mask |= 1 << i;
+break;
 }
 }
 return 1;

@@ -5368,7 +5368,8 @@ static void css_free_rwork_fn(struct work_struct *work)
 } else {
 /* cgroup free path */
 atomic_dec(&cgrp->root->nr_cgrps);
-cgroup1_pidlist_destroy_all(cgrp);
+if (!cgroup_on_dfl(cgrp))
+cgroup1_pidlist_destroy_all(cgrp);
 cancel_work_sync(&cgrp->release_agent_work);
 bpf_cgrp_storage_free(cgrp);
@@ -201,6 +201,14 @@ struct cpuset {
 struct list_head remote_sibling;
 };

+/*
+ * Legacy hierarchy call to cgroup_transfer_tasks() is handled asynchrously
+ */
+struct cpuset_remove_tasks_struct {
+struct work_struct work;
+struct cpuset *cs;
+};
+
 /*
  * Exclusive CPUs distributed out to sub-partitions of top_cpuset
  */

@@ -360,9 +368,10 @@ static inline void notify_partition_change(struct cpuset *cs, int old_prs)
 }

 static struct cpuset top_cpuset = {
-.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
-(1 << CS_MEM_EXCLUSIVE)),
+.flags = BIT(CS_ONLINE) | BIT(CS_CPU_EXCLUSIVE) |
+BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
 .partition_root_state = PRS_ROOT,
+.relax_domain_level = -1,
 .remote_sibling = LIST_HEAD_INIT(top_cpuset.remote_sibling),
 };

@@ -449,12 +458,6 @@ static DEFINE_SPINLOCK(callback_lock);

 static struct workqueue_struct *cpuset_migrate_mm_wq;

-/*
- * CPU / memory hotplug is handled asynchronously.
- */
-static void cpuset_hotplug_workfn(struct work_struct *work);
-static DECLARE_WORK(cpuset_hotplug_work, cpuset_hotplug_workfn);
-
 static DECLARE_WAIT_QUEUE_HEAD(cpuset_attach_wq);

 static inline void check_insane_mems_config(nodemask_t *nodes)

@@ -540,22 +543,10 @@ static void guarantee_online_cpus(struct task_struct *tsk,
 rcu_read_lock();
 cs = task_cs(tsk);

-while (!cpumask_intersects(cs->effective_cpus, pmask)) {
+while (!cpumask_intersects(cs->effective_cpus, pmask))
 cs = parent_cs(cs);
-if (unlikely(!cs)) {
-/*
- * The top cpuset doesn't have any online cpu as a
- * consequence of a race between cpuset_hotplug_work
- * and cpu hotplug notifier. But we know the top
- * cpuset's effective_cpus is on its way to be
- * identical to cpu_online_mask.
- */
-goto out_unlock;
-}
-}
-cpumask_and(pmask, pmask, cs->effective_cpus);
-
-out_unlock:
+
+cpumask_and(pmask, pmask, cs->effective_cpus);
 rcu_read_unlock();
 }
@@ -1217,7 +1208,7 @@ static void rebuild_sched_domains_locked(void)
 /*
  * If we have raced with CPU hotplug, return early to avoid
  * passing doms with offlined cpu to partition_sched_domains().
- * Anyways, cpuset_hotplug_workfn() will rebuild sched domains.
+ * Anyways, cpuset_handle_hotplug() will rebuild sched domains.
  *
  * With no CPUs in any subpartitions, top_cpuset's effective CPUs
  * should be the same as the active CPUs, so checking only top_cpuset

@@ -1260,12 +1251,17 @@
 }
 #endif /* CONFIG_SMP */

-void rebuild_sched_domains(void)
+static void rebuild_sched_domains_cpuslocked(void)
 {
-cpus_read_lock();
 mutex_lock(&cpuset_mutex);
 rebuild_sched_domains_locked();
 mutex_unlock(&cpuset_mutex);
+}
+
+void rebuild_sched_domains(void)
+{
+cpus_read_lock();
+rebuild_sched_domains_cpuslocked();
 cpus_read_unlock();
 }
@@ -2079,14 +2075,11 @@ write_error:

 /*
  * For partcmd_update without newmask, it is being called from
- * cpuset_hotplug_workfn() where cpus_read_lock() wasn't taken.
- * Update the load balance flag and scheduling domain if
- * cpus_read_trylock() is successful.
+ * cpuset_handle_hotplug(). Update the load balance flag and
+ * scheduling domain accordingly.
  */
-if ((cmd == partcmd_update) && !newmask && cpus_read_trylock()) {
+if ((cmd == partcmd_update) && !newmask)
 update_partition_sd_lb(cs, old_prs);
-cpus_read_unlock();
-}

 notify_partition_change(cs, old_prs);
 return 0;

@@ -3599,8 +3592,8 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 * proceeding, so that we don't end up keep removing tasks added
 * after execution capability is restored.
 *
- * cpuset_hotplug_work calls back into cgroup core via
- * cgroup_transfer_tasks() and waiting for it from a cgroupfs
+ * cpuset_handle_hotplug may call back into cgroup core asynchronously
+ * via cgroup_transfer_tasks() and waiting for it from a cgroupfs
 * operation like this one can lead to a deadlock through kernfs
 * active_ref protection. Let's break the protection. Losing the
 * protection is okay as we check whether @cs is online after

@@ -3609,7 +3602,6 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 */
 css_get(&cs->css);
 kernfs_break_active_protection(of->kn);
-flush_work(&cpuset_hotplug_work);

 cpus_read_lock();
 mutex_lock(&cpuset_mutex);
@@ -3782,9 +3774,6 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,

 buf = strstrip(buf);

-/*
- * Convert "root" to ENABLED, and convert "member" to DISABLED.
- */
 if (!strcmp(buf, "root"))
 val = PRS_ROOT;
 else if (!strcmp(buf, "member"))

@@ -4060,11 +4049,6 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 cs->effective_mems = parent->effective_mems;
 cs->use_parent_ecpus = true;
 parent->child_ecpus_count++;
-/*
- * Clear CS_SCHED_LOAD_BALANCE if parent is isolated
- */
-if (!is_sched_load_balance(parent))
-clear_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 }

 /*

@@ -4318,8 +4302,6 @@ int __init cpuset_init(void)
 nodes_setall(top_cpuset.effective_mems);

 fmeter_init(&top_cpuset.fmeter);
-set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
-top_cpuset.relax_domain_level = -1;
 INIT_LIST_HEAD(&remote_children);

 BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
@@ -4354,6 +4336,16 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 }
 }

+static void cpuset_migrate_tasks_workfn(struct work_struct *work)
+{
+struct cpuset_remove_tasks_struct *s;
+
+s = container_of(work, struct cpuset_remove_tasks_struct, work);
+remove_tasks_in_empty_cpuset(s->cs);
+css_put(&s->cs->css);
+kfree(s);
+}
+
 static void
 hotplug_update_tasks_legacy(struct cpuset *cs,
 struct cpumask *new_cpus, nodemask_t *new_mems,

@@ -4383,12 +4375,21 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 /*
  * Move tasks to the nearest ancestor with execution resources,
  * This is full cgroup operation which will also call back into
- * cpuset. Should be done outside any lock.
+ * cpuset. Execute it asynchronously using workqueue.
  */
-if (is_empty) {
-mutex_unlock(&cpuset_mutex);
-remove_tasks_in_empty_cpuset(cs);
-mutex_lock(&cpuset_mutex);
+if (is_empty && cs->css.cgroup->nr_populated_csets &&
+css_tryget_online(&cs->css)) {
+struct cpuset_remove_tasks_struct *s;
+
+s = kzalloc(sizeof(*s), GFP_KERNEL);
+if (WARN_ON_ONCE(!s)) {
+css_put(&cs->css);
+return;
+}
+
+s->cs = cs;
+INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
+schedule_work(&s->work);
 }
 }
@@ -4421,30 +4422,6 @@ void cpuset_force_rebuild(void)
 force_rebuild = true;
 }

-/*
- * Attempt to acquire a cpus_read_lock while a hotplug operation may be in
- * progress.
- * Return: true if successful, false otherwise
- *
- * To avoid circular lock dependency between cpuset_mutex and cpus_read_lock,
- * cpus_read_trylock() is used here to acquire the lock.
- */
-static bool cpuset_hotplug_cpus_read_trylock(void)
-{
-int retries = 0;
-
-while (!cpus_read_trylock()) {
-/*
- * CPU hotplug still in progress. Retry 5 times
- * with a 10ms wait before bailing out.
- */
-if (++retries > 5)
-return false;
-msleep(10);
-}
-return true;
-}
-
 /**
  * cpuset_hotplug_update_tasks - update tasks in a cpuset for hotunplug
  * @cs: cpuset in interest
@@ -4493,13 +4470,11 @@ retry:
 compute_partition_effective_cpumask(cs, &new_cpus);

 if (remote && cpumask_empty(&new_cpus) &&
-partition_is_populated(cs, NULL) &&
-cpuset_hotplug_cpus_read_trylock()) {
+partition_is_populated(cs, NULL)) {
 remote_partition_disable(cs, tmp);
 compute_effective_cpumask(&new_cpus, cs, parent);
 remote = false;
 cpuset_force_rebuild();
-cpus_read_unlock();
 }

 /*

@@ -4519,18 +4494,8 @@ retry:
 else if (is_partition_valid(parent) && is_partition_invalid(cs))
 partcmd = partcmd_update;

-/*
- * cpus_read_lock needs to be held before calling
- * update_parent_effective_cpumask(). To avoid circular lock
- * dependency between cpuset_mutex and cpus_read_lock,
- * cpus_read_trylock() is used here to acquire the lock.
- */
 if (partcmd >= 0) {
-if (!cpuset_hotplug_cpus_read_trylock())
-goto update_tasks;
-
 update_parent_effective_cpumask(cs, partcmd, NULL, tmp);
-cpus_read_unlock();
 if ((partcmd == partcmd_invalidate) || is_partition_valid(cs)) {
 compute_partition_effective_cpumask(cs, &new_cpus);
 cpuset_force_rebuild();

@@ -4558,8 +4523,7 @@ unlock:
 }

 /**
- * cpuset_hotplug_workfn - handle CPU/memory hotunplug for a cpuset
- * @work: unused
+ * cpuset_handle_hotplug - handle CPU/memory hot{,un}plug for a cpuset
 *
 * This function is called after either CPU or memory configuration has
 * changed and updates cpuset accordingly. The top_cpuset is always
@@ -4573,8 +4537,10 @@ unlock:
 *
 * Note that CPU offlining during suspend is ignored. We don't modify
 * cpusets across suspend/resume cycles at all.
+ *
+ * CPU / memory hotplug is handled synchronously.
 */
-static void cpuset_hotplug_workfn(struct work_struct *work)
+static void cpuset_handle_hotplug(void)
 {
 static cpumask_t new_cpus;
 static nodemask_t new_mems;

@@ -4585,6 +4551,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 if (on_dfl && !alloc_cpumasks(NULL, &tmp))
 ptmp = &tmp;

+lockdep_assert_cpus_held();
 mutex_lock(&cpuset_mutex);

 /* fetch the available cpus/mems and find out which changed how */

@@ -4666,7 +4633,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 /* rebuild sched domains if cpus_allowed has changed */
 if (cpus_updated || force_rebuild) {
 force_rebuild = false;
-rebuild_sched_domains();
+rebuild_sched_domains_cpuslocked();
 }

 free_cpumasks(NULL, ptmp);
@@ -4679,12 +4646,7 @@ void cpuset_update_active_cpus(void)
 * inside cgroup synchronization. Bounce actual hotplug processing
 * to a work item to avoid reverse locking order.
 */
-schedule_work(&cpuset_hotplug_work);
-}
-
-void cpuset_wait_for_hotplug(void)
-{
-flush_work(&cpuset_hotplug_work);
+cpuset_handle_hotplug();
 }

 /*

@@ -4695,7 +4657,7 @@ void cpuset_wait_for_hotplug(void)
 static int cpuset_track_online_nodes(struct notifier_block *self,
 unsigned long action, void *arg)
 {
-schedule_work(&cpuset_hotplug_work);
+cpuset_handle_hotplug();
 return NOTIFY_OK;
 }
@@ -106,8 +106,7 @@ freezer_css_alloc(struct cgroup_subsys_state *parent_css)
 * @css: css being created
 *
 * We're committing to creation of @css. Mark it online and inherit
- * parent's freezing state while holding both parent's and our
- * freezer->lock.
+ * parent's freezing state while holding cpus read lock and freezer_mutex.
 */
 static int freezer_css_online(struct cgroup_subsys_state *css)
 {

@@ -133,7 +132,7 @@ static int freezer_css_online(struct cgroup_subsys_state *css)
 * freezer_css_offline - initiate destruction of a freezer css
 * @css: css being destroyed
 *
- * @css is going away. Mark it dead and decrement system_freezing_count if
+ * @css is going away. Mark it dead and decrement freezer_active if
 * it was holding one.
 */
 static void freezer_css_offline(struct cgroup_subsys_state *css)
@@ -75,9 +75,7 @@ pids_css_alloc(struct cgroup_subsys_state *parent)
 if (!pids)
 return ERR_PTR(-ENOMEM);

-atomic64_set(&pids->counter, 0);
 atomic64_set(&pids->limit, PIDS_MAX);
-atomic64_set(&pids->events_limit, 0);
 return &pids->css;
 }
@@ -7,6 +7,8 @@
 #include <linux/btf.h>
 #include <linux/btf_ids.h>

+#include <trace/events/cgroup.h>
+
 static DEFINE_SPINLOCK(cgroup_rstat_lock);
 static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);

@@ -17,6 +19,60 @@ static struct cgroup_rstat_cpu *cgroup_rstat_cpu(struct cgroup *cgrp, int cpu)
 return per_cpu_ptr(cgrp->rstat_cpu, cpu);
 }

+/*
+ * Helper functions for rstat per CPU lock (cgroup_rstat_cpu_lock).
+ *
+ * This makes it easier to diagnose locking issues and contention in
+ * production environments. The parameter @fast_path determine the
+ * tracepoints being added, allowing us to diagnose "flush" related
+ * operations without handling high-frequency fast-path "update" events.
+ */
+static __always_inline
+unsigned long _cgroup_rstat_cpu_lock(raw_spinlock_t *cpu_lock, int cpu,
+struct cgroup *cgrp, const bool fast_path)
+{
+unsigned long flags;
+bool contended;
+
+/*
+ * The _irqsave() is needed because cgroup_rstat_lock is
+ * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
+ * this lock with the _irq() suffix only disables interrupts on
+ * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
+ * interrupts on both configurations. The _irqsave() ensures
+ * that interrupts are always disabled and later restored.
+ */
+contended = !raw_spin_trylock_irqsave(cpu_lock, flags);
+if (contended) {
+if (fast_path)
+trace_cgroup_rstat_cpu_lock_contended_fastpath(cgrp, cpu, contended);
+else
+trace_cgroup_rstat_cpu_lock_contended(cgrp, cpu, contended);
+
+raw_spin_lock_irqsave(cpu_lock, flags);
+}
+
+if (fast_path)
+trace_cgroup_rstat_cpu_locked_fastpath(cgrp, cpu, contended);
+else
+trace_cgroup_rstat_cpu_locked(cgrp, cpu, contended);
+
+return flags;
+}
+
+static __always_inline
+void _cgroup_rstat_cpu_unlock(raw_spinlock_t *cpu_lock, int cpu,
+struct cgroup *cgrp, unsigned long flags,
+const bool fast_path)
+{
+if (fast_path)
+trace_cgroup_rstat_cpu_unlock_fastpath(cgrp, cpu, false);
+else
+trace_cgroup_rstat_cpu_unlock(cgrp, cpu, false);
+
+raw_spin_unlock_irqrestore(cpu_lock, flags);
+}
+
 /**
  * cgroup_rstat_updated - keep track of updated rstat_cpu
  * @cgrp: target cgroup

@@ -42,7 +98,7 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 if (data_race(cgroup_rstat_cpu(cgrp, cpu)->updated_next))
 return;

-raw_spin_lock_irqsave(cpu_lock, flags);
+flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, cgrp, true);

 /* put @cgrp and all ancestors on the corresponding updated lists */
 while (true) {

@@ -70,7 +126,7 @@ __bpf_kfunc void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
 cgrp = parent;
 }

-raw_spin_unlock_irqrestore(cpu_lock, flags);
+_cgroup_rstat_cpu_unlock(cpu_lock, cpu, cgrp, flags, true);
 }

 /**
@@ -151,15 +207,7 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
 struct cgroup *head = NULL, *parent, *child;
 unsigned long flags;

-/*
- * The _irqsave() is needed because cgroup_rstat_lock is
- * spinlock_t which is a sleeping lock on PREEMPT_RT. Acquiring
- * this lock with the _irq() suffix only disables interrupts on
- * a non-PREEMPT_RT kernel. The raw_spinlock_t below disables
- * interrupts on both configurations. The _irqsave() ensures
- * that interrupts are always disabled and later restored.
- */
-raw_spin_lock_irqsave(cpu_lock, flags);
+flags = _cgroup_rstat_cpu_lock(cpu_lock, cpu, root, false);

 /* Return NULL if this subtree is not on-list */
 if (!rstatc->updated_next)

@@ -196,7 +244,7 @@ static struct cgroup *cgroup_rstat_updated_list(struct cgroup *root, int cpu)
 if (child != root)
 head = cgroup_rstat_push_children(head, child, cpu);
 unlock_ret:
-raw_spin_unlock_irqrestore(cpu_lock, flags);
+_cgroup_rstat_cpu_unlock(cpu_lock, cpu, root, flags, false);
 return head;
 }

@@ -222,6 +270,35 @@ __weak noinline void bpf_rstat_flush(struct cgroup *cgrp,

 __bpf_hook_end();

+/*
+ * Helper functions for locking cgroup_rstat_lock.
+ *
+ * This makes it easier to diagnose locking issues and contention in
+ * production environments. The parameter @cpu_in_loop indicate lock
+ * was released and re-taken when collection data from the CPUs. The
+ * value -1 is used when obtaining the main lock else this is the CPU
+ * number processed last.
+ */
+static inline void __cgroup_rstat_lock(struct cgroup *cgrp, int cpu_in_loop)
+__acquires(&cgroup_rstat_lock)
+{
+bool contended;
+
+contended = !spin_trylock_irq(&cgroup_rstat_lock);
+if (contended) {
+trace_cgroup_rstat_lock_contended(cgrp, cpu_in_loop, contended);
+spin_lock_irq(&cgroup_rstat_lock);
+}
+trace_cgroup_rstat_locked(cgrp, cpu_in_loop, contended);
+}
+
+static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
+__releases(&cgroup_rstat_lock)
+{
+trace_cgroup_rstat_unlock(cgrp, cpu_in_loop, false);
+spin_unlock_irq(&cgroup_rstat_lock);
+}
+
 /* see cgroup_rstat_flush() */
 static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
 __releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
@@ -248,10 +325,10 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)

 /* play nice and yield if necessary */
 if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
-spin_unlock_irq(&cgroup_rstat_lock);
+__cgroup_rstat_unlock(cgrp, cpu);
 if (!cond_resched())
 cpu_relax();
-spin_lock_irq(&cgroup_rstat_lock);
+__cgroup_rstat_lock(cgrp, cpu);
 }
 }
 }

@@ -273,9 +350,9 @@ __bpf_kfunc void cgroup_rstat_flush(struct cgroup *cgrp)
 {
 might_sleep();

-spin_lock_irq(&cgroup_rstat_lock);
+__cgroup_rstat_lock(cgrp, -1);
 cgroup_rstat_flush_locked(cgrp);
-spin_unlock_irq(&cgroup_rstat_lock);
+__cgroup_rstat_unlock(cgrp, -1);
 }

 /**

@@ -291,17 +368,18 @@ void cgroup_rstat_flush_hold(struct cgroup *cgrp)
 __acquires(&cgroup_rstat_lock)
 {
 might_sleep();
-spin_lock_irq(&cgroup_rstat_lock);
+__cgroup_rstat_lock(cgrp, -1);
 cgroup_rstat_flush_locked(cgrp);
 }

 /**
  * cgroup_rstat_flush_release - release cgroup_rstat_flush_hold()
+ * @cgrp: cgroup used by tracepoint
 */
-void cgroup_rstat_flush_release(void)
+void cgroup_rstat_flush_release(struct cgroup *cgrp)
 __releases(&cgroup_rstat_lock)
 {
-spin_unlock_irq(&cgroup_rstat_lock);
+__cgroup_rstat_unlock(cgrp, -1);
 }

 int cgroup_rstat_init(struct cgroup *cgrp)
@@ -533,7 +611,7 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq)
 #ifdef CONFIG_SCHED_CORE
 forceidle_time = cgrp->bstat.forceidle_sum;
 #endif
-cgroup_rstat_flush_release();
+cgroup_rstat_flush_release(cgrp);
 } else {
 root_cgroup_cputime(&bstat);
 usage = bstat.cputime.sum_exec_runtime;
kernel/cpu.c (48 changed lines)
@@ -1208,52 +1208,6 @@ void __init cpuhp_threads_init(void)
 kthread_unpark(this_cpu_read(cpuhp_state.thread));
 }

-/*
- *
- * Serialize hotplug trainwrecks outside of the cpu_hotplug_lock
- * protected region.
- *
- * The operation is still serialized against concurrent CPU hotplug via
- * cpu_add_remove_lock, i.e. CPU map protection. But it is _not_
- * serialized against other hotplug related activity like adding or
- * removing of state callbacks and state instances, which invoke either the
- * startup or the teardown callback of the affected state.
- *
- * This is required for subsystems which are unfixable vs. CPU hotplug and
- * evade lock inversion problems by scheduling work which has to be
- * completed _before_ cpu_up()/_cpu_down() returns.
- *
- * Don't even think about adding anything to this for any new code or even
- * drivers. It's only purpose is to keep existing lock order trainwrecks
- * working.
- *
- * For cpu_down() there might be valid reasons to finish cleanups which are
- * not required to be done under cpu_hotplug_lock, but that's a different
- * story and would be not invoked via this.
- */
-static void cpu_up_down_serialize_trainwrecks(bool tasks_frozen)
-{
-/*
- * cpusets delegate hotplug operations to a worker to "solve" the
- * lock order problems. Wait for the worker, but only if tasks are
- * _not_ frozen (suspend, hibernate) as that would wait forever.
- *
- * The wait is required because otherwise the hotplug operation
- * returns with inconsistent state, which could even be observed in
- * user space when a new CPU is brought up. The CPU plug uevent
- * would be delivered and user space reacting on it would fail to
- * move tasks to the newly plugged CPU up to the point where the
- * work has finished because up to that point the newly plugged CPU
- * is not assignable in cpusets/cgroups. On unplug that's not
- * necessarily a visible issue, but it is still inconsistent state,
- * which is the real problem which needs to be "fixed". This can't
- * prevent the transient state between scheduling the work and
- * returning from waiting for it.
- */
-if (!tasks_frozen)
-cpuset_wait_for_hotplug();
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 #ifndef arch_clear_mm_cpumask_cpu
 #define arch_clear_mm_cpumask_cpu(cpu, mm) cpumask_clear_cpu(cpu, mm_cpumask(mm))

@@ -1494,7 +1448,6 @@ out:
 */
 lockup_detector_cleanup();
 arch_smt_update();
-cpu_up_down_serialize_trainwrecks(tasks_frozen);
 return ret;
 }

@@ -1728,7 +1681,6 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 out:
 cpus_write_unlock();
 arch_smt_update();
-cpu_up_down_serialize_trainwrecks(tasks_frozen);
 return ret;
 }
@@ -194,8 +194,6 @@ void thaw_processes(void)
 __usermodehelper_set_disable_depth(UMH_FREEZING);
 thaw_workqueues();

-cpuset_wait_for_hotplug();
-
 read_lock(&tasklist_lock);
 for_each_process_thread(g, p) {
 /* No other threads should have PF_SUSPEND_TASK set */
@@ -4,7 +4,7 @@ CFLAGS += -Wall -pthread
 all: ${HELPER_PROGS}

 TEST_FILES := with_stress.sh
-TEST_PROGS := test_stress.sh test_cpuset_prs.sh
+TEST_PROGS := test_stress.sh test_cpuset_prs.sh test_cpuset_v1_hp.sh
 TEST_GEN_FILES := wait_inotify
 TEST_GEN_PROGS = test_memcontrol
 TEST_GEN_PROGS += test_kmem
@@ -1,7 +1,4 @@
 /* SPDX-License-Identifier: GPL-2.0 */

-#define _GNU_SOURCE
-
 #include <errno.h>
 #include <fcntl.h>
 #include <linux/limits.h>

@@ -195,10 +192,10 @@ int cg_write_numeric(const char *cgroup, const char *control, long value)
 return cg_write(cgroup, control, buf);
 }

-int cg_find_unified_root(char *root, size_t len)
+int cg_find_unified_root(char *root, size_t len, bool *nsdelegate)
 {
 char buf[10 * PAGE_SIZE];
-char *fs, *mount, *type;
+char *fs, *mount, *type, *options;
 const char delim[] = "\n\t ";

 if (read_text("/proc/self/mounts", buf, sizeof(buf)) <= 0)
@@ -211,12 +208,14 @@ int cg_find_unified_root(char *root, size_t len)
 for (fs = strtok(buf, delim); fs; fs = strtok(NULL, delim)) {
 mount = strtok(NULL, delim);
 type = strtok(NULL, delim);
-strtok(NULL, delim);
+options = strtok(NULL, delim);
 strtok(NULL, delim);
 strtok(NULL, delim);

 if (strcmp(type, "cgroup2") == 0) {
 strncpy(root, mount, len);
+if (nsdelegate)
+*nsdelegate = !!strstr(options, "nsdelegate");
 return 0;
 }
 }
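The cg_find_unified_root() change above scans /proc/self/mounts for the cgroup2 entry and checks its mount options for "nsdelegate". A standalone sketch of the same idea, independent of the selftest helpers, could look like this:

```c
/*
 * Standalone sketch: find the cgroup2 mount point and report whether
 * it was mounted with the nsdelegate option.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	char dev[256], mnt[256], type[64], opts[512];
	FILE *f = fopen("/proc/self/mounts", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}

	/* each line: device mountpoint fstype options dump pass */
	while (fscanf(f, "%255s %255s %63s %511s %*d %*d\n",
		      dev, mnt, type, opts) == 4) {
		if (strcmp(type, "cgroup2") == 0) {
			bool nsdelegate = strstr(opts, "nsdelegate") != NULL;

			printf("cgroup2 root: %s (nsdelegate: %s)\n",
			       mnt, nsdelegate ? "yes" : "no");
			break;
		}
	}
	fclose(f);
	return 0;
}
```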
@@ -18,10 +18,10 @@
 */
 static inline int values_close(long a, long b, int err)
 {
-return abs(a - b) <= (a + b) / 100 * err;
+return labs(a - b) <= (a + b) / 100 * err;
 }

-extern int cg_find_unified_root(char *root, size_t len);
+extern int cg_find_unified_root(char *root, size_t len, bool *nsdelegate);
 extern char *cg_name(const char *root, const char *name);
 extern char *cg_name_indexed(const char *root, const char *name, int index);
 extern char *cg_control(const char *cgroup, const char *control);
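The abs() to labs() switch above (and the matching changes in test_kmem.c below) matters because abs() takes an int: passing a long difference silently truncates it, which is exactly what clang's absolute-value warning flags. A small illustrative program, with a deliberately out-of-range value:

```c
/* Demonstrates why labs() is needed for long differences. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	long a = 6000000000L;	/* larger than INT_MAX */
	long b = 0;

	/* abs() converts the long argument to int, losing the high bits */
	printf("abs(a - b)  = %d\n", abs(a - b));
	/* labs() keeps the full width: prints 6000000000 */
	printf("labs(a - b) = %ld\n", labs(a - b));
	return 0;
}
```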
@@ -1,6 +1,4 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-
-#define _GNU_SOURCE
 #include <linux/limits.h>
 #include <linux/sched.h>
 #include <sys/types.h>

@@ -18,6 +16,8 @@
 #include "../kselftest.h"
 #include "cgroup_util.h"

+static bool nsdelegate;
+
 static int touch_anon(char *buf, size_t size)
 {
 int fd;

@@ -775,6 +775,9 @@ static int test_cgcore_lesser_ns_open(const char *root)
 pid_t pid;
 int status;

+if (!nsdelegate)
+return KSFT_SKIP;
+
 cg_test_a = cg_name(root, "cg_test_a");
 cg_test_b = cg_name(root, "cg_test_b");

@@ -862,7 +865,7 @@ int main(int argc, char *argv[])
 char root[PATH_MAX];
 int i, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), &nsdelegate))
 ksft_exit_skip("cgroup v2 isn't mounted\n");

 if (cg_read_strstr(root, "cgroup.subtree_control", "memory"))
@@ -1,6 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
-
-#define _GNU_SOURCE
 #include <linux/limits.h>
 #include <sys/sysinfo.h>
 #include <sys/wait.h>

@@ -237,7 +235,7 @@ run_cpucg_weight_test(
 {
 int ret = KSFT_FAIL, i;
 char *parent = NULL;
-struct cpu_hogger children[3] = {NULL};
+struct cpu_hogger children[3] = {};

 parent = cg_name(root, "cpucg_test_0");
 if (!parent)

@@ -408,7 +406,7 @@ run_cpucg_nested_weight_test(const char *root, bool overprovisioned)
 {
 int ret = KSFT_FAIL, i;
 char *parent = NULL, *child = NULL;
-struct cpu_hogger leaf[3] = {NULL};
+struct cpu_hogger leaf[3] = {};
 long nested_leaf_usage, child_usage;
 int nprocs = get_nprocs();

@@ -700,7 +698,7 @@ int main(int argc, char *argv[])
 char root[PATH_MAX];
 int i, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");

 if (cg_read_strstr(root, "cgroup.subtree_control", "cpu"))
@@ -249,7 +249,7 @@ int main(int argc, char *argv[])
 char root[PATH_MAX];
 int i, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");

 if (cg_read_strstr(root, "cgroup.subtree_control", "cpuset"))
@@ -0,0 +1,46 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test the special cpuset v1 hotplug case where a cpuset become empty of
+# CPUs will force migration of tasks out to an ancestor.
+#
+
+skip_test() {
+echo "$1"
+echo "Test SKIPPED"
+exit 4 # ksft_skip
+}
+
+[[ $(id -u) -eq 0 ]] || skip_test "Test must be run as root!"
+
+# Find cpuset v1 mount point
+CPUSET=$(mount -t cgroup | grep cpuset | head -1 | awk -e '{print $3}')
+[[ -n "$CPUSET" ]] || skip_test "cpuset v1 mount point not found!"
+
+#
+# Create a test cpuset, put a CPU and a task there and offline that CPU
+#
+TDIR=test$$
+[[ -d $CPUSET/$TDIR ]] || mkdir $CPUSET/$TDIR
+echo 1 > $CPUSET/$TDIR/cpuset.cpus
+echo 0 > $CPUSET/$TDIR/cpuset.mems
+sleep 10&
+TASK=$!
+echo $TASK > $CPUSET/$TDIR/tasks
+NEWCS=$(cat /proc/$TASK/cpuset)
+[[ $NEWCS != "/$TDIR" ]] && {
+echo "Unexpected cpuset $NEWCS, test FAILED!"
+exit 1
+}
+
+echo 0 > /sys/devices/system/cpu/cpu1/online
+sleep 0.5
+echo 1 > /sys/devices/system/cpu/cpu1/online
+NEWCS=$(cat /proc/$TASK/cpuset)
+rmdir $CPUSET/$TDIR
+[[ $NEWCS != "/" ]] && {
+echo "cpuset $NEWCS, test FAILED!"
+exit 1
+}
+echo "Test PASSED"
+exit 0
@@ -827,7 +827,7 @@ int main(int argc, char *argv[])
 char root[PATH_MAX];
 int i, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");
 for (i = 0; i < ARRAY_SIZE(tests); i++) {
 switch (tests[i].fn(root)) {
@@ -1,6 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
-#define _GNU_SOURCE
-
 #include <linux/limits.h>
 #include <sys/mman.h>
 #include <stdio.h>

@@ -214,7 +212,7 @@ int main(int argc, char **argv)
 return ret;
 }

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");

 switch (test_hugetlb_memcg(root)) {
@@ -276,7 +276,7 @@ int main(int argc, char *argv[])
 char root[PATH_MAX];
 int i, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");
 for (i = 0; i < ARRAY_SIZE(tests); i++) {
 switch (tests[i].fn(root)) {
@@ -1,6 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
-#define _GNU_SOURCE
-
 #include <linux/limits.h>
 #include <fcntl.h>
 #include <stdio.h>

@@ -192,7 +190,7 @@ static int test_kmem_memcg_deletion(const char *root)
 goto cleanup;

 sum = anon + file + kernel + sock;
-if (abs(sum - current) < MAX_VMSTAT_ERROR) {
+if (labs(sum - current) < MAX_VMSTAT_ERROR) {
 ret = KSFT_PASS;
 } else {
 printf("memory.current = %ld\n", current);

@@ -380,7 +378,7 @@ static int test_percpu_basic(const char *root)
 current = cg_read_long(parent, "memory.current");
 percpu = cg_read_key_long(parent, "memory.stat", "percpu ");

-if (current > 0 && percpu > 0 && abs(current - percpu) <
+if (current > 0 && percpu > 0 && labs(current - percpu) <
 MAX_VMSTAT_ERROR)
 ret = KSFT_PASS;
 else

@@ -420,7 +418,7 @@ int main(int argc, char **argv)
 char root[PATH_MAX];
 int i, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");

 /*
@@ -1,6 +1,4 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#define _GNU_SOURCE
-
 #include <linux/limits.h>
 #include <linux/oom.h>
 #include <fcntl.h>

@@ -716,7 +714,9 @@ static bool reclaim_until(const char *memcg, long goal)
 */
 static int test_memcg_reclaim(const char *root)
 {
-int ret = KSFT_FAIL, fd, retries;
+int ret = KSFT_FAIL;
+int fd = -1;
+int retries;
 char *memcg;
 long current, expected_usage;

@@ -1314,7 +1314,7 @@ int main(int argc, char **argv)
 char root[PATH_MAX];
 int i, proc_status, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");

 /*
@@ -1,6 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
-#define _GNU_SOURCE
-
 #include <linux/limits.h>
 #include <unistd.h>
 #include <stdio.h>

@@ -257,7 +255,7 @@ static int test_no_invasive_cgroup_shrink(const char *root)
 {
 int ret = KSFT_FAIL;
 size_t control_allocation_size = MB(10);
-char *control_allocation, *wb_group = NULL, *control_group = NULL;
+char *control_allocation = NULL, *wb_group = NULL, *control_group = NULL;

 wb_group = setup_test_group_1M(root, "per_memcg_wb_test1");
 if (!wb_group)

@@ -342,7 +340,7 @@ static int test_no_kmem_bypass(const char *root)
 struct sysinfo sys_info;
 int ret = KSFT_FAIL;
 int child_status;
-char *test_group;
+char *test_group = NULL;
 pid_t child_pid;

 /* Read sys info and compute test values accordingly */

@@ -440,7 +438,7 @@ int main(int argc, char **argv)
 char root[PATH_MAX];
 int i, ret = EXIT_SUCCESS;

-if (cg_find_unified_root(root, sizeof(root)))
+if (cg_find_unified_root(root, sizeof(root), NULL))
 ksft_exit_skip("cgroup v2 isn't mounted\n");

 if (!zswap_configured())