How to handle large numbers of TCP socket connections in Go, the equivalent of poll or epoll in C

https://groups.google.com/forum/#!topic/golang-nuts/I7a_3B8_9Gw

https://groups.google.com/forum/#!msg/golang-nuts/coc6bAl2kPM/ypNLG3I4mk0J

question: -----------------------

Hello,

I'm curious what the proper way of listening on multiple sockets simultaneously is.

Please consider the two implementations I currently have: http://play.golang.org/p/LOd7q3aawd

Receive1 is a rather simple implementation, spawning a goroutine per connection, and simply blocking on each one via the ReadFromUDP method of the connection.

Receive2 is a more C-style approach, using the real select system call to block until one of the sockets actually receives something, and only then trying to read.

Now, Receive2 is _supposedly_ better, since we rely on the kernel to notify the select when one of the file descriptors is ready for reading. Under ideal conditions (and with fewer goroutines), that would allow the process itself to fall asleep until data starts pouring into the socket.

Receive1 relies on the I/O read blocking until something comes along, and does so for each connection. At least that's my understanding; I'm not sure whether the connection still uses select internally. Even if it did, I don't think the Go scheduler is smart enough to put the whole process to sleep if two separate goroutines are waiting for I/O. That being said, Receive1 looks so much better than its counterpart.

If there is also a better way than either of these, please share it with me.

answer1: ------------------------------------------

Receive1 is better. Go will use asynchronous I/O (equivalent to select) under the covers for you. The Go scheduler is smart enough to "put the whole process to sleep if [all] goroutines are waiting for I/O". Don't worry about it. Just use idiomatic Go.

answer2: ------------------------------------------

Receive1 is certainly the Go way. I wonder however why you need to read from two UDP ports. UDP is connectionless, so you can support multiple clients with one open UDP port.
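To illustrate the point that one UDP socket can serve many clients, here is a minimal sketch. The names `serveUDP`, `ask` and `demo`, the loopback address and the upper-casing "work" are illustrative assumptions. Because ReadFromUDP returns the sender's address, the server can answer whoever wrote to it with WriteToUDP on the same socket:

```go
package main

import (
	"fmt"
	"net"
	"strings"
	"time"
)

// serveUDP answers every datagram arriving on one socket, whoever sent it.
// ReadFromUDP reports the sender's address, so WriteToUDP can route the
// reply back; no per-client socket is needed.
func serveUDP(conn *net.UDPConn) {
	buf := make([]byte, 2048)
	for {
		n, addr, err := conn.ReadFromUDP(buf)
		if err != nil {
			return // the socket was closed
		}
		reply := strings.ToUpper(string(buf[:n])) // stand-in for real work
		conn.WriteToUDP([]byte(reply), addr)
	}
}

// ask sends msg from its own client socket and waits up to 2 s for a reply.
func ask(server *net.UDPAddr, msg string) (string, error) {
	c, err := net.DialUDP("udp", nil, server)
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := c.Write([]byte(msg)); err != nil {
		return "", err
	}
	c.SetReadDeadline(time.Now().Add(2 * time.Second))
	buf := make([]byte, 2048)
	n, err := c.Read(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

// demo listens on a free loopback port (port 0) and lets two separate
// clients talk to the same server socket.
func demo() ([]string, error) {
	srv, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return nil, err
	}
	defer srv.Close()
	go serveUDP(srv)

	addr := srv.LocalAddr().(*net.UDPAddr)
	var replies []string
	for _, msg := range []string{"client a", "client b"} {
		r, err := ask(addr, msg)
		if err != nil {
			return nil, err
		}
		replies = append(replies, r)
	}
	return replies, nil
}

func main() {
	replies, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(replies) // [CLIENT A CLIENT B]
}
```

Each client uses its own ephemeral port, and the server tells them apart only by the address ReadFromUDP returns.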

That being said, you should know that any goroutine blocking in a system call consumes one kernel thread. This will not be a problem until you need to support thousands of connections. But at that scale the file descriptor bitmaps used by select become a performance bottleneck as well. In this situation you might want to look at epoll on Linux; on other systems poll might be an alternative. If you are in this territory, I strongly recommend having a look at Michael Kerrisk's excellent reference "The Linux Programming Interface".
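For reference, the raw epoll calls mentioned here look roughly like the following. This is a Linux-only sketch using the `syscall` package directly; the helper names are hypothetical, and it watches a pipe instead of a socket only to stay self-contained. As the later answers point out, the Go runtime already does this for network connections, so hand-written epoll is rarely needed in Go:

```go
//go:build linux

package main

import (
	"fmt"
	"syscall"
)

// waitReadable registers fd with a fresh epoll instance and blocks for up
// to timeoutMs milliseconds. It returns the descriptors epoll reported.
func waitReadable(fd int, timeoutMs int) ([]int, error) {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return nil, err
	}
	defer syscall.Close(epfd)

	// Ask to be notified when fd has data to read.
	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(fd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return nil, err
	}

	events := make([]syscall.EpollEvent, 16)
	var n int
	for {
		n, err = syscall.EpollWait(epfd, events, timeoutMs)
		if err != syscall.EINTR { // retry if a signal interrupted the wait
			break
		}
	}
	if err != nil {
		return nil, err
	}
	ready := make([]int, 0, n)
	for _, e := range events[:n] {
		ready = append(ready, int(e.Fd))
	}
	return ready, nil
}

// demo writes one byte into a pipe and checks that epoll reports the
// read end as readable.
func demo() (bool, error) {
	p := make([]int, 2)
	if err := syscall.Pipe(p); err != nil {
		return false, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	if _, err := syscall.Write(p[1], []byte("x")); err != nil {
		return false, err
	}
	ready, err := waitReadable(p[0], 1000)
	if err != nil {
		return false, err
	}
	return len(ready) == 1 && ready[0] == p[0], nil
}

func main() {
	ok, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("read end reported readable:", ok)
}
```

Unlike select, epoll keeps the interest set in the kernel, so each wait does not have to pass (and scan) a bitmap of every descriptor.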

answer3: ------------------------------------------
It's true that a goroutine blocking in a syscall consumes a kernel thread. However, Receive1 will *not* use any kernel threads while waiting in conn.ReadFromUDP, because under the covers, the Go runtime uses nonblocking I/O for all network activity. It's much better just to rely on the runtime implementation of network I/O rather than trying to roll your own. If you don't believe me, try doing syscall traces or profiling to prove it out.

answer4: ------------------------------------------

The Receive2 approach is not portable (because of the syscall package) and is more complex. Also, unless profiling can prove it, its efficiency is speculation.

answer5: ------------------------------------------

Matt, thank you, I learned something. During network access the goroutine is not blocked in a syscall, and Go already uses epoll internally (on Linux). So unless you know what you are doing, the goroutine approach will work best.

Example code from this Q&A:

package main

import (
    "fmt"
    "net"
    "os"
    "syscall"
)

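// Receive1: one goroutine per socket, each blocking in ReadFromUDP.
// While a goroutine waits, the runtime parks it on its network poller.
func Receive1(conn1, conn2 *net.UDPConn, done chan struct{}) <-chan string {
    res := make(chan string)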

    for _, conn := range []*net.UDPConn{conn1, conn2} {
        go func(conn *net.UDPConn) {
            buf := make([]byte, 2048)
            for {
                select {
                case <-done:
                    return
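                default:
                    // done is only checked between reads: a goroutine parked in
                    // ReadFromUDP stays there until a datagram arrives or conn is closed.
                    if n, _, err := conn.ReadFromUDP(buf); err == nil {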
                        fmt.Println(string(buf[:n]))
                        res <- string(buf[:n])
                    }
                }
            }
        }(conn)
    }

    return res
}

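// Receive2: C-style select(2) over the raw file descriptors.
// Not portable: syscall.Select and syscall.FdSet differ between platforms
// (this version compiles on Linux).
func Receive2(conn1, conn2 *net.UDPConn, done chan struct{}) <-chan string {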
    res := make(chan string)
    fds := &syscall.FdSet{}
    filemap := map[int]*os.File{}
    var maxfd = 0
    for _, conn := range []*net.UDPConn{conn1, conn2} {
        if file, err := conn.File(); err == nil {
            fd := int(file.Fd())
            FD_SET(fds, fd)
            filemap[fd] = file
            if fd > maxfd {
                maxfd = fd
            }
        }
    }

    go func() {
        for {
            select {
            case <-done:
                return
            default:
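                fdsetCopy := *fds // select(2) overwrites the set it is given, so pass a copy
                tv := syscall.Timeval{Sec: 5} // time out every 5 s so done is re-checked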
                if _, err := syscall.Select(maxfd+1, &fdsetCopy, nil, nil, &tv); err == nil {
                    for fd, file := range filemap {
                        if !FD_ISSET(&fdsetCopy, fd) {
                            continue
                        }

                        buf := make([]byte, 4096)
                        if n, err := file.Read(buf); err == nil {
                            fmt.Println(string(buf[:n]))
                            res <- string(buf[:n])
                        }
                    }
                }
            }
        }
    }()

    return res
}

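// FD_SET mirrors the C macro. It assumes 64-bit Linux, where
// syscall.FdSet.Bits is an array of int64.
func FD_SET(p *syscall.FdSet, i int) {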
    p.Bits[i/64] |= 1 << (uint(i) % 64)
}

// FD_ISSET reports whether descriptor i is in the set (same 64-bit assumption).
func FD_ISSET(p *syscall.FdSet, i int) bool {
    return (p.Bits[i/64] & (1 << (uint(i) % 64))) != 0
}

func main() {
    fmt.Println("Hello, playground")
}

question: -------------------------------

Hello,

It is said that the event-driven nonblocking model is not the preferred programming model in Go, so I use the "one goroutine per client" model. But is it OK to handle millions of concurrent goroutines in one server process?

And how can I "select" on millions of channels to see which goroutine has received data? The "select" statement can only select on a fixed number of channels (known when the code is written), not on a large, unpredictable set of channels. And how can I "select" on a TCP connection (which is not a channel) to see whether any data has arrived? Are there any "design patterns" for concurrent programming in Go?

Thanks in advance.

answer: ----------------------------------

Hello,

> It is said that the event-driven nonblocking model is not the preferred programming model in Go, so I use the "one goroutine per client" model. But is it OK to handle millions of concurrent goroutines in one server process?

A goroutine itself takes about 4 KB, so 1e6 goroutines would require about 4 GB of base memory, plus whatever your server needs per goroutine.

Any machine that might be handling 1e6 concurrent connections should have well over 4 GB of memory.
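The starting stack size has changed across Go releases (it was reduced to 2 KB in Go 1.4), and stacks grow on demand, so measuring on your own Go version is more reliable than quoting a number. The sketch below (the function name is hypothetical) parks many goroutines on a channel and compares `runtime.MemStats.StackInuse` before and after. It covers only the stack, not each goroutine's heap-allocated descriptor or any per-connection buffers, so treat the result as a rough estimate of the stack portion:

```go
package main

import (
	"fmt"
	"runtime"
)

// stackBytesPerGoroutine parks n goroutines on a channel and returns the
// average growth of runtime.MemStats.StackInuse per goroutine. It covers the
// stack only, not the goroutine's heap-allocated descriptor.
func stackBytesPerGoroutine(n int) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	block := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() { <-block }() // parked until block is closed
	}

	runtime.ReadMemStats(&after)
	close(block) // let the goroutines finish

	if after.StackInuse <= before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	per := stackBytesPerGoroutine(10000)
	fmt.Printf("about %d bytes of stack per goroutine\n", per)
	// Extrapolate to the question's one million connections.
	fmt.Printf("about %d MiB of stack for 1e6 goroutines\n", per*1000000/(1<<20))
}
```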

> And how can I "select" on millions of channels to see which goroutine has received data?

That's not how it works. You just try to .Read() in each goroutine, and the select is done under the hood. The select{} statement is for channel communication, specifically.

All I/O in Go is event-driven under the hood, but the code you write looks linear. The Go runtime runs a network poller (epoll, kqueue, or the platform's equivalent) and wakes up a goroutine when new data has arrived for it.
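In practice this "looks linear" style is the usual goroutine-per-connection server. Below is a minimal sketch, assuming a line-based echo protocol and hypothetical helper names. Each `handle` call is written as ordinary blocking code, and the runtime multiplexes all of them over its poller:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// handle serves one client with plain blocking reads and writes.
func handle(c net.Conn) {
	defer c.Close()
	sc := bufio.NewScanner(c)
	for sc.Scan() { // parks this goroutine until a full line arrives
		fmt.Fprintf(c, "echo: %s\n", sc.Text())
	}
}

// serve accepts connections until the listener is closed, one goroutine each.
func serve(ln net.Listener) {
	for {
		c, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go handle(c)
	}
}

// demo starts the server on a free loopback port, opens several client
// connections at once, then sends one line on each.
func demo(clients int) ([]string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()
	go serve(ln)

	// Open all connections first so the server holds them concurrently.
	conns := make([]net.Conn, 0, clients)
	defer func() {
		for _, c := range conns {
			c.Close()
		}
	}()
	for i := 0; i < clients; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return nil, err
		}
		conns = append(conns, c)
	}

	var replies []string
	for i, c := range conns {
		fmt.Fprintf(c, "client %d\n", i)
		line, err := bufio.NewReader(c).ReadString('\n')
		if err != nil {
			return nil, err
		}
		replies = append(replies, strings.TrimSuffix(line, "\n"))
	}
	return replies, nil
}

func main() {
	replies, err := demo(3)
	if err != nil {
		panic(err)
	}
	for _, r := range replies {
		fmt.Println(r) // echo: client 0, echo: client 1, echo: client 2
	}
}
```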

> The "select" statement can only select on a fixed number of channels (known when the code is written), not on a large, unpredictable set of channels. And how can I "select" on a TCP connection (which is not a channel) to see whether any data has arrived? Are there any "design patterns" for concurrent programming in Go?

These problems you anticipate simply do not exist in Go. Give it a shot!
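For the "millions of channels" part of the question, the usual Go pattern is fan-in: every per-connection goroutine sends into one shared channel, so a single receive loop covers any number of producers and no large `select` is needed. (When the set of channels really is dynamic, `reflect.Select` exists, but fan-in is usually simpler.) The sketch below simulates connections with goroutines; `event`, `fanIn` and `countSenders` are hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

// event is what each per-connection goroutine reports (a hypothetical type).
type event struct {
	conn int
	data string
}

// fanIn starts one goroutine per simulated connection. All of them send on
// the same channel, which is closed after the last sender finishes.
func fanIn(conns int) <-chan event {
	out := make(chan event)
	var wg sync.WaitGroup
	wg.Add(conns)
	for i := 0; i < conns; i++ {
		go func(id int) {
			defer wg.Done()
			// A real server would send data read from connection id here.
			out <- event{conn: id, data: fmt.Sprintf("message from %d", id)}
		}(i)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// countSenders drains the shared channel with a single range loop and
// returns how many distinct connections delivered an event.
func countSenders(conns int) int {
	seen := make(map[int]bool)
	for ev := range fanIn(conns) {
		seen[ev.conn] = true
	}
	return len(seen)
}

func main() {
	fmt.Println("distinct senders:", countSenders(10000)) // distinct senders: 10000
}
```

The `conn` field plays the role of "which goroutine has data": the receiver learns it from the message itself rather than from which channel fired.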