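[dpdk] Reading the official documentation (2)

Continuing from the previous part.

I. The documentation mentions three kernel modules, uio_pci_generic, igb_uio and vfio_pci, which I do not understand at all yet, and also dpdk-devbind.py, which shows NIC status. Running it gave me the following output: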

[root@dpdk tools]# ./dpdk-devbind.py --status
Network devices using DPDK-compatible driver
============================================
<none>
Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused= 
Other network devices
=====================
<none>
[root@dpdk tools]# 

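So first I need to learn how qemu configures NICs, adjust the virtual hardware, and then come back. (Off I go, miserably, to read man qemu...)

Until now I had only ever used qemu networking one way: a tap device on the host and a virtio NIC in the guest.

Back from the man page. The guest's NIC hardware can be emulated with "-net nic,model=xxx", but I still do not know how to do passthrough.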

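1. With virtio as the frontend driver, how to make the backend use vhost-user

I suddenly realized how complicated this topic is, so I decided to give it its own post. Moved to "[qemu] With virtio as the frontend driver, how to make the backend use vhost-user".

2. Direct device access: PCI passthrough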

http://blog.csdn.net/qq123386926/article/details/47757089

http://blog.csdn.net/halcyonbaby/article/details/37776211

http://blog.csdn.net/richardysteven/article/details/9008971

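There are two methods, pci-stub and VFIO. I only use the newer one, VFIO. My plan is to hand my physical Ethernet port to the virtual machine for direct access.

1. Make sure the CPU supports VT-d and that it is enabled in the BIOS.

My CPU does support it: http://ark.intel.com/products/85214/Intel-Core-i7-5500U-Processor-4M-Cache-up-to-3_00-GHz

2. Edit the grub configuration so that the kernel boots with intel_iommu=on. (There is a pitfall here. Keep reading: a separate part further down explains it.)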

[tong@T7 dpdk]$ zcat /proc/config.gz  |grep -i intel_iommu
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_SVM=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
[tong@T7 dpdk]$ 

3. Load the vfio-pci driver into the kernel.

[tong@T7 dpdk]$ sudo modprobe vfio-pci
[tong@T7 dpdk]$ lsmod |grep vfio
vfio_pci               36864  0
vfio_iommu_type1       20480  0
vfio_virqfd            16384  1 vfio_pci
vfio                   24576  2 vfio_iommu_type1,vfio_pci
irqbypass              16384  2 kvm,vfio_pci
[tong@T7 dpdk]$ 

4. Look at the NIC information

[root@T7 0000:00:19.0]# lspci -vv -nn -d 8086:15a3
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
        Subsystem: Lenovo Device [17aa:2227]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 20
        Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at 4080 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel modules: e1000e

5. bind / unbind

[root@T7 0000:00:19.0]# echo "0000:00:19.0" > /sys/bus/pci/devices/0000:00:19.0/driver/unbind 
[root@T7 0000:00:19.0]# echo "8086 15a3" > /sys/bus/pci/drivers/vfio-pci/new_id  

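*** Here comes the problem. Compared with what the documentation describes, something is already off: I have no iommu_group. What on earth is that... ***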

[tong@T7 dpdk]$ ls /dev/vfio/
vfio
[tong@T7 dpdk]$ dmesg |grep vfio
[20355.407062] vfio-pci: probe of 0000:00:19.0 failed with error -22
[20593.172116] vfio-pci: probe of 0000:00:19.0 failed with error -22
[20684.750370] vfio-pci: probe of 0000:00:19.0 failed with error -22
[tong@T7 dpdk]$ 

I started the VM as follows and got an error:

[tong@T7 dpdk]$ cat start.sh
sudo qemu-system-x86_64 -enable-kvm \
        -m 2G -cpu Nehalem -smp cores=2,threads=2,sockets=2 \
        -numa node,mem=1G,cpus=0-3,nodeid=0 \
        -numa node,mem=1G,cpus=4-7,nodeid=1 \
        -drive file=disk.img,if=virtio \
        -net nic,model=virtio,macaddr='00:00:00:00:00:03' \
        -device vfio-pci,host='0000:00:19.0' \
        -net tap,ifname=tap0 &
[tong@T7 dpdk]$ ./start.sh
[tong@T7 dpdk]$ qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: error no iommu_group for device
qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: Device initialization failed

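Answer:

To answer this I read the kernel documentation and also this very good article from IBM. I finally understood what an IOMMU group actually is, but I still had not found the answer.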

https://www.kernel.org/doc/Documentation/vfio.txt

https://www.ibm.com/developerworks/community/blogs/5144904d-5d75-45ed-9d2b-cf1754ee936a/entry/vfio?lang=en

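So why was there no iommu_group? Because I was being stupid! I had not added the kernel parameter intel_iommu=on in grub, as step 2 says. Why not? Because zcat /proc/config.gz already showed CONFIG_INTEL_IOMMU=y, and I took that to mean it was enabled. After I added the parameter and rebooted, zcat /proc/config.gz printed exactly the same thing. So I had completely misunderstood what that file is for. It only records the options the kernel was compiled with and says nothing about the runtime state. CONFIG_INTEL_IOMMU=y means the driver is built in, and since CONFIG_INTEL_IOMMU_DEFAULT_ON is not set, the IOMMU stays off unless intel_iommu=on is passed at boot.

With the parameter changed, this is what it looks like right after the system boots, which means it took effect: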
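A quick way to check the runtime side rather than the build-time side is to look at /proc/cmdline, which holds the command line the running kernel was booted with. The following is only a minimal sketch; has_boot_param is a hypothetical helper name I made up for illustration, not part of any tool.

```shell
# Sketch: check whether a parameter is on the running kernel's command line.
# has_boot_param is a hypothetical helper, not part of DPDK or the kernel.

# has_boot_param FILE PARAM
# Exit status 0 if PARAM appears as a whole word in FILE, non-zero otherwise.
has_boot_param() {
    # Put every space-separated word on its own line, then match whole lines.
    tr ' ' '\n' < "$1" | grep -Fxq -- "$2"
}

if [ -r /proc/cmdline ]; then
    if has_boot_param /proc/cmdline intel_iommu=on; then
        echo "intel_iommu=on is on the kernel command line"
    else
        echo "intel_iommu=on is NOT on the kernel command line"
    fi
fi
```

Unlike /proc/config.gz, this output changes after the grub edit and a reboot.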

[tong@T7 ~]$ ll /sys/bus/pci/devices/0000:00:19.0/ |grep io
lrwxrwxrwx 1 root root      0 Sep 27 23:44 iommu -> ../../virtual/iommu/dmar1
lrwxrwxrwx 1 root root      0 Sep 27 23:44 iommu_group -> ../../../kernel/iommu_groups/5
[tong@T7 ~]$ 
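The iommu_group symlink shown above is what QEMU complained about when it was missing. As a small sketch, the group number can be read from that link; iommu_group_of is a hypothetical helper name, and the optional second argument (an alternative sysfs root) exists only so the function can be tried against a fake directory tree.

```shell
# Sketch: print the IOMMU group number of a PCI device.
# iommu_group_of is a hypothetical helper, not an existing command.

# iommu_group_of PCI_ADDRESS [SYSFS_ROOT]
# Prints the group number, or returns 1 if the device has no iommu_group.
iommu_group_of() {
    [ -L "${2:-/sys}/bus/pci/devices/$1/iommu_group" ] || return 1
    # The link target ends in .../iommu_groups/<number>
    basename "$(readlink "${2:-/sys}/bus/pci/devices/$1/iommu_group")"
}

iommu_group_of 0000:00:19.0 || echo "no iommu_group for 0000:00:19.0 (IOMMU off, or no such device)"
```

On the host from this post the link points at iommu_groups/5, which is also the name of the node that later appears under /dev/vfio/.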

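With that problem popped off the stack, back to unbind / bind. The device I want to pass through to the VM is the physical NIC lan0:

Before unbind the NIC's link LED is on. State: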

[tong@T7 ~]$ lspci -vv -nn -s 00:19.0
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
        Subsystem: Lenovo Device [17aa:2227]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 46
        Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at 4080 [size=32]
        Capabilities: <access denied>
        Kernel driver in use: e1000e
        Kernel modules: e1000e

[tong@T7 ~]$ sudo ip link show dev lan0
2: lan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 50:7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
[tong@T7 ~]$ ll /sys/bus/pci/devices/0000:00:19.0/ |grep driver
lrwxrwxrwx 1 root root      0 Sep 27 23:42 driver -> ../../../bus/pci/drivers/e1000e
-rw-r--r-- 1 root root   4096 Sep 27 23:44 driver_override
[tong@T7 ~]$ 

unbind: (I do not know why the first attempt fails. Maybe someday someone who sees the commands below can tell me. It does not matter much, though.)

[tong@T7 ~]$ sudo echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind 
bash: /sys/bus/pci/devices/0000:00:19.0/driver/unbind: Permission denied
[tong@T7 ~]$ sudo su -
[root@T7 ~]# echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind
[root@T7 ~]# 
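About the Permission denied above: in "sudo echo ... > file" the redirection is set up by the calling, unprivileged shell before sudo even starts, so only echo runs as root while the file is opened as the normal user. A common workaround is to let a root process open the file, for example tee. The sketch below wraps that idea; write_attr is a hypothetical helper name, and the trailing arguments are an optional command prefix such as sudo.

```shell
# Sketch: write a value to a file, letting an optional prefix command
# (for example sudo) run the process that actually opens the file.
# write_attr is a hypothetical helper, not an existing command.

# write_attr VALUE FILE [PREFIX...]
write_attr() {
    value=$1
    file=$2
    shift 2
    # "$@" is the optional prefix; tee (not the calling shell) opens FILE.
    printf '%s\n' "$value" | "$@" tee "$file" > /dev/null
}

# On the host from this post the unbind would look like this
# (left commented out because it detaches the NIC from its driver):
# write_attr 0000:00:19.0 /sys/bus/pci/devices/0000:00:19.0/driver/unbind sudo
```

The plain one-liner form is: echo 0000:00:19.0 | sudo tee /sys/bus/pci/devices/0000:00:19.0/driver/unbind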

After a successful unbind the state compares as follows. The NIC's link LED is still on.

[root@T7 ~]# lspci -vv -nn -s 00:19.0
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
        Subsystem: Lenovo Device [17aa:2227]
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 20
        Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at 4080 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel modules: e1000e

[root@T7 ~]# ip link show dev lan0
Device "lan0" does not exist.
[root@T7 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether dc:53:60:6c:b5:7e brd ff:ff:ff:ff:ff:ff
4: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 26:4a:07:a1:4f:06 brd ff:ff:ff:ff:ff:ff
[root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/ |grep driver
-rw-r--r-- 1 root root   4096 Sep 27 23:44 driver_override
[root@T7 ~]# 

bind to vfio:

[root@T7 ~]# modprobe vfio_pci
[root@T7 ~]# lsmod |grep vfio
vfio_pci               36864  0
vfio_iommu_type1       20480  0
vfio_virqfd            16384  1 vfio_pci
vfio                   24576  2 vfio_iommu_type1,vfio_pci
irqbypass              16384  2 kvm,vfio_pci
[root@T7 ~]# echo 8086 15a3 > /sys/bus/pci/drivers/vfio-pci/new_id
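Writing a vendor/device pair to new_id tells vfio-pci to claim every device with that ID. The listings above also show a driver_override attribute on the device, which the kernel provides for steering one specific device to a named driver. What follows is a sketch of that alternative, not what I actually ran here; bind_with_override is a hypothetical helper name, and the optional third argument is an alternative sysfs root for trying it on a fake tree. It has to run as root on a real system.

```shell
# Sketch: bind one PCI device to a named driver through driver_override.
# bind_with_override is a hypothetical helper, not an existing command.

# bind_with_override PCI_ADDRESS DRIVER [SYSFS_ROOT]
bind_with_override() {
    dev="${3:-/sys}/bus/pci/devices/$1"
    # Detach the device from its current driver, if it has one.
    if [ -e "$dev/driver/unbind" ]; then
        printf '%s\n' "$1" > "$dev/driver/unbind"
    fi
    # Only the named driver may claim this device from now on.
    printf '%s\n' "$2" > "$dev/driver_override"
    # Ask the PCI core to probe the device again.
    printf '%s\n' "$1" > "${3:-/sys}/bus/pci/drivers_probe"
}

# Example for the NIC in this post (commented out: it changes driver binding):
# bind_with_override 0000:00:19.0 vfio-pci
```

The target driver module (vfio-pci here) must already be loaded, as in the modprobe step above.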

State after a successful bind:

[root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/iommu_group/devices/
total 0
lrwxrwxrwx 1 root root 0 Sep 28 00:09 0000:00:19.0 -> ../../../../devices/pci0000:00/0000:00:19.0
[root@T7 ~]# ll /dev/vfio/
total 0
crw------- 1 root root 242,   0 Sep 28 00:08 5
crw-rw-rw- 1 root root  10, 196 Sep 28 00:06 vfio
[root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/iom*
lrwxrwxrwx 1 root root 0 Sep 27 23:44 /sys/bus/pci/devices/0000:00:19.0/iommu -> ../../virtual/iommu/dmar1
lrwxrwxrwx 1 root root 0 Sep 27 23:44 /sys/bus/pci/devices/0000:00:19.0/iommu_group -> ../../../kernel/iommu_groups/5
[root@T7 ~]# dmesg |tail
... ...
[ 1027.806155] e1000e 0000:00:19.0 lan0: removed PHC
[ 1394.134555] VFIO - User Level meta-driver version: 0.3
[root@T7 ~]# lspci -vv -nn -s 00:19.0
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
        Subsystem: Lenovo Device [17aa:2227]
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 20
        Region 0: Memory at f2200000 (32-bit, non-prefetchable) [disabled] [size=128K]
        Region 1: Memory at f223e000 (32-bit, non-prefetchable) [disabled] [size=4K]
        Region 2: I/O ports at 4080 [disabled] [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: vfio-pci
        Kernel modules: e1000e

[root@T7 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether dc:53:60:6c:b5:7e brd ff:ff:ff:ff:ff:ff
4: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 26:4a:07:a1:4f:06 brd ff:ff:ff:ff:ff:ff
[root@T7 ~]# 
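According to the kernel VFIO documentation, a group is only usable when every device in it is bound to a VFIO driver or to no driver at all. In my case group 5 contains just the one NIC, so there was nothing else to take care of. For machines where a group holds several devices, here is a sketch that lists each member with its current driver; group_members is a hypothetical helper name, and the optional second argument is an alternative sysfs root.

```shell
# Sketch: list every device sharing an IOMMU group with the given device,
# together with the driver each one is currently bound to.
# group_members is a hypothetical helper, not an existing command.

# group_members PCI_ADDRESS [SYSFS_ROOT]
# Prints one "ADDRESS DRIVER" line per device in the group.
group_members() {
    for d in "${2:-/sys}/bus/pci/devices/$1/iommu_group/devices"/*; do
        [ -e "$d" ] || continue          # glob matched nothing
        if [ -L "$d/driver" ]; then
            drv=$(basename "$(readlink "$d/driver")")
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$(basename "$d")" "$drv"
    done
}

group_members 0000:00:19.0
```

Any line that shows a driver other than vfio-pci or (none) points at a device that still needs to be unbound before the group can be handed to QEMU.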

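6. Start the VM and test. Looking inside the VM, there is one more NIC. It receives the layer-2 broadcasts from the switch and can obtain an address via DHCP: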

[root@dpdk ~]# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000]
00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [8086:7010]
00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03)
00:02.0 VGA compatible controller [0300]: Device [1234:1111] (rev 02)
00:03.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
00:04.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
00:05.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001]
[root@dpdk ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 00:00:00:00:00:03 brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 50:7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
[root@dpdk ~]# tcpdump -i ens4 -nn -c 10
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens4, link-type EN10MB (Ethernet), capture size 65535 bytes
00:17:32.969547 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length 46
00:17:33.970617 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length 46

7. Can the device be shared? I will start a second VM and see.

[tong@T7 CentOS7]$ ./start.sh 
[tong@T7 CentOS7]$ qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: error opening /dev/vfio/5: Device or resource busy
qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: failed to get group 5
qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: Device initialization failed
^C
[tong@T7 CentOS7]$ 

The answer is no!

That completes PCI NIC passthrough with vfio! : )

Original post: https://www.cnblogs.com/hugetong/p/5904024.html