On the Thundering Herd problem

From "linux and SINGLE_LISTEN_UNSERIALIZED_ACCEPT". Interesting. Quoted below.

 ---------- Forwarded message ----------
To: linux-kernel@vger.kernel.org
From: Linus Torvalds <torvalds@transmeta.com>
Subject: Re: [PATCH] Re: Negative scalability by removal of 
    lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)
Date: 3 Nov 2000 22:23:15 -0800
Organization: Transmeta Corporation

In article <3A0399CD.8B080698@uow.edu.au>,
Andrew Morton  <andrewm@uow.edu.au> wrote:
>
>neither flock() nor fcntl() serialisation are effective
>on linux 2.2 or linux 2.4.  This is because the file
>locking code still wakes up _all_ waiters.  In my testing
>with fcntl serialisation I have seen a single Apache
>instance get woken and put back to sleep 1,500 times
>before the poor thing actually got to service a request.

Indeed.

flock() is the absolute worst case, and always has been.  I guess nobody
ever actually bothered to benchmark it.

>For kernel 2.2 I recommend that Apache consider using
>sysv semaphores for serialisation. They use wake-one. 
>
>For kernel 2.4 I recommend that Apache use unserialised
>accept.

No.

Please use unserialized accept() _always_, because we can fix that. 

Even 2.2.x can be fixed to do the wake-one for accept(), if required. 
It's not going to be any worse than the current apache config, and
basically the less games apache plays, the better the kernel can try to
accommodate what apache _really_ wants done.  When playing games, you
hide what you really want done, and suddenly kernel profiles etc end up
being completely useless, because they no longer give the data we needed
to fix the problem. 

Basically, the whole serialization crap is all about the Apache people
saying the equivalent of "the OS does a bad job on something we consider
to be incredibly important, so we do something else instead to hide it".

And regardless of _what_ workaround Apache does, whether it is the sucky
fcntl() thing or using SysV semaphores, it's going to hide the real
issue and mean that it never gets fixed properly.

And in the end it will result in really really bad performance. 

Instead, if apache had just done the thing it wanted to do in the first
place, the wake-one accept() semantics would have happened a hell of a
lot earlier. 

Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play
games trying to outsmart the OS, it will just hurt Apache in the long run.

		Linus
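
Note: as a rough illustration of the two strategies Andrew and Linus are contrasting above, here is a minimal C sketch of how a pre-forked child might call accept() with and without serialization. The function names accept_unserialized and accept_serialized and the lock_fd parameter are hypothetical, made up for this note; this is not Apache's actual code. It assumes every child inherited listen_fd from the parent and that lock_fd refers to a lock file opened for writing.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Unserialized: every child blocks directly in accept() on the shared
 * listening socket, and the kernel decides which sleeper to wake. */
int accept_unserialized(int listen_fd)
{
    int conn;

    assert(listen_fd >= 0);
    do {
        conn = accept(listen_fd, NULL, NULL);
    } while (conn < 0 && errno == EINTR);   /* retry if a signal interrupted us */
    return conn;                            /* -1 with errno set on failure */
}

/* Serialized: only the child holding a whole-file fcntl() write lock may
 * sit in accept(); the other children sleep on the lock instead. */
int accept_serialized(int listen_fd, int lock_fd)
{
    struct flock lk;
    int conn, rc, saved;

    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;                 /* l_start = 0, l_len = 0: whole file */

    do {
        rc = fcntl(lock_fd, F_SETLKW, &lk); /* block until we own the lock */
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;

    conn = accept_unserialized(listen_fd);
    saved = errno;                          /* keep accept()'s errno across the unlock */

    lk.l_type = F_UNLCK;
    fcntl(lock_fd, F_SETLK, &lk);           /* let the next child in */
    errno = saved;
    return conn;
}
```

In the serialized variant the waiting children are parked on the lock rather than on the socket, so how many of them get woken per connection depends on the locking primitive and not on accept() itself. That is the behaviour Andrew measured with fcntl serialisation, and the reason Linus asks for the plain accept() loop.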



 ---------- Forwarded message ----------
Date: Sat, 4 Nov 2000 12:03:06 -0800 (PST)
From: dean gaudet <dean-list-linux-kernel@arctic.org>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Re: Negative scalability by removal of 
    lock_kernel()?(Was:Strange performance behavior of 2.4.0-test9)
X-comment: visit http://arctic.org/~dean/legal for information regarding
    copyright and disclaimer.

On Fri, 3 Nov 2000, Linus Torvalds wrote:

> Please use unserialized accept() _always_, because we can fix that.

i can unserialise the single socket case, but the multiple socket case is
not so simple.

the executive summary is that when you've got multiple sockets you have to
use select().  select is necessarily wake-all.  remember there's N
children trying to do select/accept.  if the listening socket is
non-blocking then you spin in N-1 children; if it's blocking then you
starve other sockets.

see http://www.apache.org/docs/misc/perf-tuning.html, search for "multiple
sockets" for my full analysis of the problem.

> Instead, if apache had just done the thing it wanted to do in the first
> place, the wake-one accept() semantics would have happened a hell of a
> lot earlier.

counter-example:  freebsd had wake-one semantics a few years before linux.

revision 1.237
date: 1998/09/29 01:22:57;  author: marc;  state: Exp;  lines: +1 -0
Unserialized accept() should be safe (in all versions) and efficient
(in anything vaguely recent) on FreeBSD.

ok, we done finger pointing? :)

 -dean
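
Note: the multiple-socket case dean describes can be sketched roughly as follows, again in C. The names set_nonblocking and accept_any are hypothetical and this is only an illustration under stated assumptions, not Apache's implementation: each of the N children runs the same loop over the same inherited listening sockets, and every socket has been made non-blocking beforehand.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Wait on n listening sockets and return one accepted connection,
 * or -1 with errno set on a hard error. */
int accept_any(const int *listeners, int n)
{
    assert(n > 0);
    for (;;) {
        fd_set rfds;
        int i, maxfd = -1;

        FD_ZERO(&rfds);
        for (i = 0; i < n; i++) {
            assert(listeners[i] >= 0 && listeners[i] < FD_SETSIZE);
            FD_SET(listeners[i], &rfds);
            if (listeners[i] > maxfd)
                maxfd = listeners[i];
        }

        /* Every child sleeping here is woken when any listener is readable. */
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        for (i = 0; i < n; i++) {
            int conn;

            if (!FD_ISSET(listeners[i], &rfds))
                continue;
            conn = accept(listeners[i], NULL, NULL);
            if (conn >= 0)
                return conn;                /* this child won the race */
            if (errno != EAGAIN && errno != EWOULDBLOCK &&
                errno != EINTR && errno != ECONNABORTED)
                return -1;                  /* hard error */
            /* Another child took the connection first: keep looking. */
        }
        /* Nothing left for us; go back to select(). */
    }
}
```

With non-blocking listeners, the children that lose the race get EAGAIN and loop back into select(), which is the spinning in N-1 children that dean mentions. With blocking listeners, a losing child would instead sleep inside accept() on one socket and stop watching the others, which is the starvation case.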