Merge branch 'Wait for busy refill_work when destroying bpf memory allocator'

Hou Tao says:

====================

From: Hou Tao <houtao1@huawei.com>

Hi,

The patchset aims to fix one problem of bpf memory allocator destruction
when there is PREEMPT_RT kernel or kernel with arch_irq_work_has_interrupt()
being false (e.g. 1-cpu arm32 host or mips). The root cause is that
there may be busy refill_work when the allocator is destroying and it
may incur oops or other problems as shown in patch #1. Patch #1 fixes
the problem by waiting for the completion of irq work during destroying
and patch #2 is just a clean-up patch based on patch #1. Please see
individual patches for more details.

Comments are always welcome.

Change Log:
v2:
  * patch 1: fix typos and add notes about the overhead of irq_work_sync()
  * patch 1 & 2: add Acked-by tags from sdf@google.com

v1: https://lore.kernel.org/bpf/20221019115539.983394-1-houtao@huaweicloud.com/T/#t
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This commit is contained in:
Alexei Starovoitov 2022-10-21 19:17:38 -07:00
commit bed54aeb6a
1 changed files with 16 additions and 2 deletions

View File

@ -418,14 +418,17 @@ static void drain_mem_cache(struct bpf_mem_cache *c)
/* No progs are using this bpf_mem_cache, but htab_map_free() called /* No progs are using this bpf_mem_cache, but htab_map_free() called
* bpf_mem_cache_free() for all remaining elements and they can be in * bpf_mem_cache_free() for all remaining elements and they can be in
* free_by_rcu or in waiting_for_gp lists, so drain those lists now. * free_by_rcu or in waiting_for_gp lists, so drain those lists now.
*
* Except for waiting_for_gp list, there are no concurrent operations
* on these lists, so it is safe to use __llist_del_all().
*/ */
llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu)) llist_for_each_safe(llnode, t, __llist_del_all(&c->free_by_rcu))
free_one(c, llnode); free_one(c, llnode);
llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp)) llist_for_each_safe(llnode, t, llist_del_all(&c->waiting_for_gp))
free_one(c, llnode); free_one(c, llnode);
llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist)) llist_for_each_safe(llnode, t, __llist_del_all(&c->free_llist))
free_one(c, llnode); free_one(c, llnode);
llist_for_each_safe(llnode, t, llist_del_all(&c->free_llist_extra)) llist_for_each_safe(llnode, t, __llist_del_all(&c->free_llist_extra))
free_one(c, llnode); free_one(c, llnode);
} }
@ -493,6 +496,16 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
rcu_in_progress = 0; rcu_in_progress = 0;
for_each_possible_cpu(cpu) { for_each_possible_cpu(cpu) {
c = per_cpu_ptr(ma->cache, cpu); c = per_cpu_ptr(ma->cache, cpu);
/*
* refill_work may be unfinished for PREEMPT_RT kernel
* in which irq work is invoked in a per-CPU RT thread.
* It is also possible for kernel with
* arch_irq_work_has_interrupt() being false and irq
* work is invoked in timer interrupt. So waiting for
* the completion of irq work to ease the handling of
* concurrency.
*/
irq_work_sync(&c->refill_work);
drain_mem_cache(c); drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress); rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
} }
@ -507,6 +520,7 @@ void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma)
cc = per_cpu_ptr(ma->caches, cpu); cc = per_cpu_ptr(ma->caches, cpu);
for (i = 0; i < NUM_CACHES; i++) { for (i = 0; i < NUM_CACHES; i++) {
c = &cc->cache[i]; c = &cc->cache[i];
irq_work_sync(&c->refill_work);
drain_mem_cache(c); drain_mem_cache(c);
rcu_in_progress += atomic_read(&c->call_rcu_in_progress); rcu_in_progress += atomic_read(&c->call_rcu_in_progress);
} }