Fatal flaw (thundering herd), worker process count, worker type comparison, threads in place of processes

Summary:

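1. In a pre-fork model such as Gunicorn's, the master (the Arbiter in Gunicorn) forks the configured number of worker processes. All workers listen on the same port, and whichever worker accepts an incoming connection first serves it. This is also how load is balanced across the worker processes.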
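The pre-fork idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not Gunicorn's actual Arbiter code: the function name `serve_prefork` and its parameters are made up for this example, it is Unix-only because it relies on `os.fork`, and each worker simply replies with its own PID so that the caller can see which process picked up each connection.

```python
import os
import signal
import socket


def serve_prefork(num_workers=3, num_requests=6):
    """Fork workers that all accept() on one listening socket.

    Returns (worker_pids, answered), where answered[i] is the PID of the
    worker that served the i-th connection.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker process: block in accept() on the inherited socket.
            # The kernel decides which of the waiting workers gets a connection.
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)  # never fall back into the master's code path
        worker_pids.append(pid)

    # Master process: act as the client here, only to exercise the workers.
    answered = []
    for _ in range(num_requests):
        with socket.create_connection(("127.0.0.1", port)) as client:
            # The worker closes the connection after replying, so read to EOF.
            answered.append(int(client.makefile("rb").read().decode()))

    # Stop and reap the workers.
    for pid in worker_pids:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    listener.close()
    return worker_pids, answered
```

Calling `serve_prefork()` returns the worker PIDs and the PID that answered each request. How the requests are spread over the workers is up to the operating system, so the distribution can differ from run to run.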

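2. Worker process count: 4-12, starting from (2 x number of CPU cores) + 1. Together the workers can handle hundreds to thousands of requests per second.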

https://github.com/benoitc/gunicorn/blob/97a45805f85830d1f80bf769f5787704daa635d3/docs/source/design.rst (the docs themselves describe the formula as "not overly scientific")

http://docs.gunicorn.org/en/stable/design.html#how-many-threads

Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request.

Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.
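As a small sketch, the rule of thumb above can be written directly in a Gunicorn configuration file, which is plain Python. `workers` is Gunicorn's own setting name; the helper `recommended_workers` is a name introduced here only for illustration.

```python
import multiprocessing


def recommended_workers(num_cores):
    """Starting point from the docs: (2 x num_cores) + 1."""
    return 2 * num_cores + 1


# Gunicorn reads the module-level name `workers` from the config file.
workers = recommended_workers(multiprocessing.cpu_count())
```

On a 4-core machine this yields 9 workers, which sits inside the 4-12 range quoted above. It is a starting value to tune under real load, not a guarantee.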

3. A fatal flaw: the thundering herd problem

FAQ — Gunicorn 19.9.0 documentation
http://docs.gunicorn.org/en/stable/faq.html#does-gunicorn-suffer-from-the-thundering-herd-problem

fix thundering herd · Issue #792 · benoitc/gunicorn
https://github.com/benoitc/gunicorn/issues/792

http://docs.gunicorn.org/en/stable/design.html#design

Everything About Load Balancing: Summary and Reflections - xybaby - cnblogs
http://www.cnblogs.com/xybaby/p/7867735.html#_label_13

Sync Workers

The most basic and the default worker type is a synchronous worker class that handles a single request at a time. This model is the simplest to reason about as any errors will affect at most a single request. Though as we describe below only processing a single request at a time requires some assumptions about how applications are programmed.

sync worker does not support persistent connections - each connection is closed after response has been sent (even if you manually add Keep-Alive or Connection: keep-alive header in your application).

Async Workers

The asynchronous workers available are based on Greenlets (via Eventlet and Gevent). Greenlets are an implementation of cooperative multi-threading for Python. In general, an application should be able to make use of these worker classes with no changes.

Tornado Workers

There’s also a Tornado worker class. It can be used to write applications using the Tornado framework. Although the Tornado workers are capable of serving a WSGI application, this is not a recommended configuration.

AsyncIO Workers

These workers are compatible with Python 3. You have two kinds of workers.

The worker gthread is a threaded worker. It accepts connections in the main loop, and accepted connections are added to the thread pool as a connection job. On keepalive, connections are put back in the loop waiting for an event. If no event happens after the keepalive timeout, the connection is closed.

The worker gaiohttp is a full asyncio worker using aiohttp.

Note

The gaiohttp worker requires the aiohttp module to be installed. aiohttp has removed its native WSGI application support in version 2. If you want to continue to use the gaiohttp worker with your WSGI application (e.g. an application that uses Flask or Django), there are three options available:

Install aiohttp version 1.3.5 instead of version 2:
```
$ pip install aiohttp==1.3.5
```
Use aiohttp_wsgi to wrap your WSGI application. You can take a look at the example in the Gunicorn repository.
Port your application to use aiohttp’s web.Application API.
Use the aiohttp.worker.GunicornWebWorker worker instead of the deprecated gaiohttp worker.

Choosing a Worker Type

The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth. Generally this means that your application shouldn’t do anything that takes an undefined amount of time. An example of something that takes an undefined amount of time is a request to the internet. At some point the external network will fail in such a way that clients will pile up on your servers. So, in this sense, any web application which makes outgoing requests to APIs will benefit from an asynchronous worker.

This resource bound assumption is why we require a buffering proxy in front of a default configuration Gunicorn. If you exposed synchronous workers to the internet, a DOS attack would be trivial by creating a load that trickles data to the servers. For the curious, Hey is an example of this type of load.

Some examples of behavior requiring asynchronous workers:

Applications making long blocking calls (e.g., external web services)

Serving requests directly to the internet

Streaming requests and responses

Long polling

Web sockets

Comet
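For an application that falls into one of these categories, switching worker type is a configuration change. The fragment below is a minimal sketch of a Gunicorn config file, assuming the gevent package is installed. `worker_class`, `worker_connections`, and `workers` are Gunicorn settings; the values are illustrative, not recommendations.

```python
# Sketch of a Gunicorn config file (plain Python), assuming gevent is installed.

# Use the greenlet-based async worker instead of the default "sync" worker.
worker_class = "gevent"

# Maximum number of simultaneous clients per worker (1000 is the documented default).
worker_connections = 1000

# Illustrative value; tune it for the machine at hand.
workers = 4
```

With a setup like this, a worker that is waiting on a slow outgoing API call can keep serving other clients instead of blocking the whole process.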

http://docs.gunicorn.org/en/stable/settings.html#workers

Writing a custom worker

How Many Threads?

Since Gunicorn 19, a threads option can be used to process requests in multiple threads. Using threads assumes use of the gthread worker. One benefit from threads is that requests can take longer than the worker timeout while notifying the master process that it is not frozen and should not be killed. Depending on the system, using multiple threads, multiple worker processes, or some mixture, may yield the best results. For example, CPython may not perform as well as Jython when using threads, as threading is implemented differently by each. Using threads instead of processes is a good way to reduce the memory footprint of Gunicorn, while still allowing for application upgrades using the reload signal, as the application code will be shared among workers but loaded only in the worker processes (unlike when using the preload setting, which loads the code in the master process).
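A threaded setup might look like the sketch below. `worker_class`, `workers`, and `threads` are Gunicorn settings; the numbers are arbitrary, and `max_concurrent_requests` is not a Gunicorn setting, only a helper name used here to make the arithmetic visible.

```python
# Sketch of a Gunicorn config file using the threaded worker.

worker_class = "gthread"  # threaded worker; required for `threads` to take effect
workers = 2               # number of worker processes
threads = 4               # threads per worker process

# Not read by Gunicorn: each thread handles one request at a time, so this is
# the rough upper bound on requests being processed simultaneously.
max_concurrent_requests = workers * threads
```

Here 2 processes with 4 threads each give up to 8 requests in flight while loading the application code only twice, which is where the memory saving compared with 8 separate processes comes from.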

http://docs.gunicorn.org/en/stable/settings.html#threads

http://docs.gunicorn.org/en/stable/faq.html#does-gunicorn-suffer-from-the-thundering-herd-problem

Does Gunicorn suffer from the thundering herd problem?

The thundering herd problem occurs when many sleeping request handlers, which may be either threads or processes, wake up at the same time to handle a new request. Since only one handler will receive the request, the others will have been awakened for no reason, wasting CPU cycles. At this time, Gunicorn does not implement any IPC solution for coordinating between worker processes. You may experience high load due to this problem when using many workers or threads. However a work has been started to remove this issue.

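In other words: the thundering herd occurs when a large number of sleeping request handlers (processes or threads) wake up at the same time to handle a new request. Only one handler gets to accept the request, so the others are woken for nothing and waste CPU cycles. Gunicorn currently implements no inter-process communication (IPC) mechanism to coordinate its worker processes, so you may see high load when running many workers or threads.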

Serializing accept(), AKA Thundering Herd, AKA the Zeeg Problem — uWSGI 2.0 documentation
https://uwsgi-docs.readthedocs.io/en/latest/articles/SerializingAccept.html

In modern times, the vast majority of UNIX systems have evolved, and now the kernel ensures (more or less) only one process/thread is woken up on a connection event.

Attached figure

To increase the worker count by one:

$ kill -TTIN $masterpid

To decrease the worker count by one:

$ kill -TTOU $masterpid

The additional workers are forked from the master process.
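The same signals can be sent from Python. The sketch below assumes the master PID is already known (for example from a pid file). `signals_for` and `scale_workers` are names invented for this example, and the short pause between signals is a precaution rather than something the Gunicorn docs prescribe: identical standard Unix signals that are still pending can be merged into one.

```python
import os
import signal
import time


def signals_for(delta):
    """Map a change in worker count to the signals to send to the master.

    A positive delta yields SIGTTIN (add one worker each); a negative
    delta yields SIGTTOU (remove one worker each).
    """
    sig = signal.SIGTTIN if delta > 0 else signal.SIGTTOU
    return [sig] * abs(delta)


def scale_workers(master_pid, delta, pause=0.5):
    """Send one signal per worker to add or remove, pausing between sends."""
    for sig in signals_for(delta):
        os.kill(master_pid, sig)
        time.sleep(pause)
```

For example, `scale_workers(master_pid, 2)` asks the master to fork two more workers, and `scale_workers(master_pid, -1)` asks it to retire one.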