That's good, because I also think it has a fatal flaw. I won't say what it is, but I'll lay out enough background that you can figure it out for yourself. Here's the idea:
- You are not allowed to block, sleep, or otherwise yield while in an "atomic" section of code (usually because it means interrupts or preemption are disabled). To safely iterate over a thread group, you need to hold either rcu_read_lock or tasklist_lock, both of which are low-level locks, and the code that holds them counts as "atomic" in this sense.
- In kernel/cpuset.c, cpuset_change_task_nodemask has two steps, with a check between them that yields in order to synchronize its internal state (as part of an open-coded replacement for a proper concurrency primitive... ugh. That comes from commit c0ff7453bb5c7c98e0885fb94279f2571946f280 in mmotm). Here I have changed it to return -EAGAIN instead of yielding, so the caller can synchronize "more appropriately"; there's a rough sketch of that change just after this list.
- get_task_struct and put_task_struct manage a thread's reference count. Once you've called get_task_struct, it is always safe to access that task's memory until you release the reference. If you call put_task_struct and yours was the last reference, it does a bunch of cleanup (interesting read here, perhaps a hint). A tiny usage sketch also follows the list.
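
To make the second bullet concrete, here's roughly the shape of the modified cpuset_change_task_nodemask. This is only a sketch, not the real mmotm code: rebind_step1, rebind_step2, and rebind_reader_active are hypothetical stand-ins for the function's actual internals, which live in kernel/cpuset.c.

/*
 * Sketch only -- not the actual kernel/cpuset.c code.  rebind_step1(),
 * rebind_step2() and rebind_reader_active() are hypothetical stand-ins
 * for the function's real internals.
 */
static int cpuset_change_task_nodemask(struct task_struct *tsk,
                                       nodemask_t *newmems)
{
        task_lock(tsk);
        rebind_step1(tsk, newmems);     /* step one of the two-step rebind */

        /*
         * Upstream, the function yields at (roughly) this point until its
         * internal state lets it finish.  With the change described above
         * it bails out instead, and the caller decides how (and whether it
         * is safe) to wait and retry.
         */
        if (rebind_reader_active(tsk)) {
                task_unlock(tsk);
                return -EAGAIN;
        }

        rebind_step2(tsk, newmems);     /* step two: commit the new nodemask */
        task_unlock(tsk);
        return 0;
}

The only part that matters for this post is the control flow: where the upstream version yields in the middle to wait out its internal synchronization, this version hands the waiting back to the caller via -EAGAIN.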
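
And for the third bullet, the refcounting pattern in its most minimal form; some_task and frob_task() are hypothetical and just stand for "a task we care about" and "do something with it":

        struct task_struct *c = some_task;      /* some_task: hypothetical */

        get_task_struct(c);     /* take a reference: the task_struct stays allocated */
        frob_task(c);           /* even if the task exits meanwhile, this memory stays valid */
        put_task_struct(c);     /* drop the reference; the *last* put runs the cleanup path */

With that background, here's the code: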
/*
 * This particular per-task operation requires being able to sleep, so
 * it can't be done in the attach_task callback, where sleeping is
 * forbidden. See cgroup_attach_proc.
 */
if (threadgroup) {
        struct task_struct *c = NULL;
again:
        rcu_read_lock();
        /* ensure safe thread-group traversal */
        if (thread_group_leader(tsk)) {
                list_for_each_entry_rcu(c, &tsk->thread_group,
                                        thread_group) {
                        get_task_struct(c);
                        ret = cpuset_change_task_nodemask(c,
                                        &cpuset_attach_nodemask_to);
                        /*
                         * on a failed nodemask change, we need to exit
                         * the atomic section and start over.
                         */
                        if (ret == -EAGAIN) {
                                rcu_read_unlock();
                                yield();
                                goto again;
                        }
                        put_task_struct(c);
                }
                rcu_read_unlock();
        } else if (c != NULL) {
                /*
                 * if racing with exec caused an abort, we need to
                 * finish the rebind operation in progress on it.
                 */
                rcu_read_unlock();
                do {
                        ret = cpuset_change_task_nodemask(c,
                                        &cpuset_attach_nodemask_to);
                        if (ret == -EAGAIN)
                                yield();
                } while (ret == -EAGAIN);
                put_task_struct(c);
        }
} else {
        do {
                ret = cpuset_change_task_nodemask(tsk,
                                &cpuset_attach_nodemask_to);
                if (ret == -EAGAIN)
                        yield();
        } while (ret == -EAGAIN);
}