A directory in a Docker image that cannot be deleted

Inside a container, removing a directory fails:

bash-4.2# pwd
/home/zxcdn/ottcache/tomcat
bash-4.2# uname -a
Linux 3516b6c97679 3.10.0-327.22.2.el7.x86_64 #1 SMP Fri Sep 29 15:13:08 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
bash-4.2# whoami
root


bash-4.2# ls -alrt bin
total 8
drwxr-xr-x. 1 root root 4096 Dec  3 02:49 .
drwxr-xr-x. 1 root root 4096 Dec  4 02:28 ..


bash-4.2# rm -rf bin
bash-4.2# ls -i
33012 bin
bash-4.2# rm -rf bin
bash-4.2# ls -i
33012 bin

Relevant Docker version information:

[root@host-80-80-34-255 caq]# docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2    <-- storage driver
 Backing Filesystem: extfs  <-- backing filesystem
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version:  (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 5eda6f6fd0c2884c2c8e78a6e7119e8d0ecedb77 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: Carrier Grade Server Linux 5
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 2
Total Memory: 3.703 GiB
Name: host-80-80-34-255
ID: 4CV6:Y3Q4:NYGV:PABH:VG42:3CN7:CKET:SEIV:4SYF:63PI:HYAB:AZR2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
 0.0.0.0/0
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)

The empty directory cannot be removed. Tracing rm with strace shows the following error:

fcntl(3, F_GETFL)                       = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
getdents(3, /* 2 entries */, 32768)     = 48
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
unlinkat(AT_FDCWD, "bin", AT_REMOVEDIR) = -1 EINVAL (Invalid argument)
lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)

So it is unlinkat that fails. Probing the kernel next gives the following stack:

Returning from:  0xffffffff811ed500 : vfs_rename+0x0/0x790 [kernel]
Returning to  :  0xffffffffa039860b : ovl_do_rename+0x3b/0xa0 [overlay]
 0xffffffffa0398e4e : ovl_clear_empty+0x27e/0x2e0 [overlay]
 0xffffffffa0398f28 : ovl_check_empty_and_clear+0x78/0x90 [overlay]
 0xffffffffa039999c : ovl_do_remove+0x1ec/0x470 [overlay]
 0xffffffffa0399c36 : ovl_rmdir+0x16/0x20 [overlay]
 0xffffffff811ec738 : vfs_rmdir+0xa8/0x100 [kernel]
 0xffffffff811f16d5 : do_rmdir+0x1a5/0x200 [kernel]
 0xffffffff811f28b5 : SyS_unlinkat+0x25/0x40 [kernel]
 0xffffffff81649909 : system_call_fastpath+0x16/0x1b [kernel]

This confirms that the error comes from vfs_rename. To narrow it down, probe at a specific line number:

probe kernel.statement("vfs_rename@namei.c:4122")
{
    p_my=@cast($old_dir,"struct inode")->i_op;
    iflags=@cast($old_dir,"struct inode")->i_flags;
    printf("line 4122 flags=%u,rename2=%x,iflags=%u\n",$flags,@cast(p_my,"struct inode_operations_wrapper")->rename2,iflags);
    print_backtrace();
}

The corresponding kernel source:

int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
           struct inode *new_dir, struct dentry *new_dentry,
           struct inode **delegated_inode, unsigned int flags)
{
    ...
    rename2 = get_rename2_iop(old_dir);        /* line 4118 */
    if (!old_dir->i_op->rename && !rename2)
        return -EPERM;

    if (flags && !rename2)                     /* line 4122 */
        return -EINVAL;
    ...
}

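At first I read rename2 directly in the probe and found it was not NULL, so in theory the return at line 4122 should never have been reached. Only after a careful code walkthrough by Tan Hu (谈虎) did we find that the kernel does not read the pointer that way: it goes through get_rename2_iop, and the following check is what decided the outcome: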

static inline const struct inode_operations_wrapper *get_iop_wrapper(struct inode *inode,
                                     unsigned version)
{
    const struct inode_operations_wrapper *wrapper;
        
    if (!IS_IOPS_WRAPPER(inode))    /* this is the condition that actually took effect */
        return NULL;
    wrapper = container_of(inode->i_op, const struct inode_operations_wrapper, ops);
    if (wrapper->version < version)
        return NULL;
    return wrapper;
}

static inline iop_rename2_t get_rename2_iop(struct inode *inode)
{
    const struct inode_operations_wrapper *wrapper = get_iop_wrapper(inode, 0);
    return wrapper ? wrapper->rename2 : NULL;
}

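It appears that the overlay storage driver in this kernel version has compatibility problems with an ext3 backing filesystem. We later worked around the problem by switching to the device-mapper storage driver.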

In ext4, when ext4_iget sets up a directory inode, it sets S_IOPS_WRAPPER in the inode's i_flags:

    } else if (S_ISDIR(inode->i_mode)) {
        inode->i_op = &ext4_dir_inode_operations.ops;
        inode->i_fop = &ext4_dir_operations;
        inode->i_flags |= S_IOPS_WRAPPER;

ext3, however, does not set this flag.

My knowledge is limited; if you spot a mistake, please let me know. All rights reserved; when reposting, please include the original URL of this article.
Original article: https://www.cnblogs.com/10087622blog/p/10062918.html