2009-12-09

I learned something new today.

http://lkml.org/lkml/2009/12/9/30

One possible solution - modify cgroup_get_sb to do as follows:

1) get subsys_mutex
2) parse_cgroupfs_options()
3) release subsys_mutex
4) call sget(), which gets s->s_umount
5) get subsys_mutex again
6) verify_cgroupfs_options - also known as the "deadlock avoidance dance"
7) proceed.

Now, note, for clarification:

1508 static struct file_system_type cgroup_fs_type = {
1509         .name = "cgroup",
1510         .get_sb = cgroup_get_sb,
1511         .kill_sb = cgroup_kill_sb,
1512 };

The issue is that s->s_umount is taken in get_sb while it already has the subsys_mutex, whereas kill_sb is called with s->s_umount already held, and deadlock comes from there. There's a good question lurking here, and it is "Who ever designed an interface of seemingly symmetrical functions, one of which has to take a lock inside it while the other has the lock taken before it's called?"

So yes, there's locking order violation, but if functions were locks (which is a reasonable comparison because isn't it intuitive to have a function that takes a lock at the start and drops it at the end?) then the blame would be on whoever wrote this interface to begin with.

No comments:

Post a Comment

Followers

Contributors