Field Notes from the Tianchi Flink TPC-DS Performance Optimization Competition (Docker environment setup and compiling inside an Ubuntu container)

Wanting to sharpen my programming skills, I started tinkering with a Linux environment to take part in Alibaba Tianchi's Flink competition.

My first idea was to compile the data generator with Cygwin on Windows. Running compileTpcds.sh, found under /home/hadoop/flink-community/resource/tpcds, complained that gcc and make could not be found, so I installed gcc from the Cygwin setup program. With that in place I went back to the directory and ran the script again, and it failed.

The error was as follows:

gcc -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DYYDEBUG  -DLINUX -g -Wall   -c -o mkheader.o mkheader.c
In file included from mkheader.c:37:0:
porting.h:46:10: fatal error: values.h: No such file or directory
 #include <values.h>
          ^~~~~~~~~~
compilation terminated.
make: *** [<builtin>: mkheader.o] Error 1
compileTpcds.sh: line 40: ./dsdgen: No such file or directory
compileTpcds.sh: line 41: ./dsqgen: No such file or directory
cp: cannot stat '/cygdrive/d/javaProgram/home/hadoop/flink-community-perf/resource/tpcds/tpc-ds-tool/tools/dsdgen': No such file or directory
cp: cannot stat '/cygdrive/d/javaProgram/home/hadoop/flink-community-perf/resource/tpcds/tpc-ds-tool/tools/tpcds.idx': No such file or directory
cp: cannot stat '/cygdrive/d/javaProgram/home/hadoop/flink-community-perf/resource/tpcds/tpc-ds-tool/tools/dsqgen': No such file or directory
cp: cannot stat '/cygdrive/d/javaProgram/home/hadoop/flink-community-perf/resource/tpcds/tpc-ds-tool/tools/tpcds.idx': No such file or directory
chmod: cannot access '/cygdrive/d/javaProgram/home/hadoop/flink-community-perf/resource/tpcds/querygen/dsqgen': No such file or directory
Compile SUCCESS...

The Cygwin toolchain apparently does not ship the legacy glibc header values.h, so I gave up on that route, installed Docker Toolbox on my Windows machine, and used the virtual Docker host inside it to run an Ubuntu environment.

Pulling Docker images is painfully slow, all the more so because the Docker host itself runs inside a Windows VM. To speed things up, I SSH'd into the Docker host VM and ran the following command:

sudo sed -i "s|EXTRA_ARGS='|EXTRA_ARGS='--registry-mirror=http://f1361db2.m.daocloud.io |g" /var/lib/boot2docker/profile

(http://f1361db2.m.daocloud.io can be replaced with the accelerator address you get from your own Aliyun account.)
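
To double-check that the substitution landed, you can grep the profile on the Docker host VM (a quick sanity check of my own, not from the original post):

grep EXTRA_ARGS /var/lib/boot2docker/profile    # the --registry-mirror flag should now appear on this line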

(Some people also suggest creating /etc/docker/daemon.json inside the Docker environment with {"registry-mirrors": ["https://registry.docker-cn.com"]} and restarting the service to switch the mirror. I tried it, and, perhaps because of Toolbox, after applying it the physical machine could no longer start the Docker daemon service inside the VM.)
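
For reference, the daemon.json that this alternative approach describes would contain exactly the snippet quoted above:

{
  "registry-mirrors": ["https://registry.docker-cn.com"]
}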

Once the mirror address is configured, run docker-machine restart default on the physical machine to make it take effect. docker search gcc turns up the image rikorose/gcc-cmake; pull it with docker pull rikorose/gcc-cmake and then start a gcc container with docker run -itd -P rikorose/gcc-cmake. The full sequence is summarized below.
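
Put together, the commands for this step look like this (the docker info check is my own addition to confirm the mirror is active):

docker-machine restart default          # restart the Toolbox VM so the mirror config is loaded
docker info | grep -A1 -i mirror        # the registry mirror should be listed here
docker search gcc                       # find a usable gcc image
docker pull rikorose/gcc-cmake          # pull the gcc + cmake image
docker run -itd -P rikorose/gcc-cmake   # run it detached, interactive, with published ports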

If you need to stop the container at some point, you can later bring it back with docker start <container name> and then get a shell inside it with docker exec -it <container name> /bin/bash.
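
Using the container name that shows up later in this post as an example:

docker start serene_noyce                 # restart the stopped container
docker exec -it serene_noyce /bin/bash    # open an interactive shell inside it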

Next I needed to copy files into the container, so the first step was docker ps (use docker ps -a if the container is stopped):

docker@default:~$ docker ps
CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS               NAMES
c6b71328afa8        rikorose/gcc-cmake   "bash"              7 days ago          Up 5 hours                              serene_noyce

The output shows the container name is serene_noyce, so the next command is docker inspect -f '{{.Id}}' serene_noyce (the -f '{{.Id}}' arguments filter the output down to just the ID; without them you get a huge dump of JSON):

docker@default:~$ docker inspect -f '{{.Id}}' serene_noyce
c6b71328afa828c9e4c62ac37dd9dede538c3999355189e44218290a2ae885d3
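
The -f option takes a Go template, so other fields can be pulled out the same way; for instance (my own examples, not from the original post):

docker inspect -f '{{.State.Status}}' serene_noyce                 # e.g. "running"
docker inspect -f '{{.NetworkSettings.IPAddress}}' serene_noyce    # the container's IP address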

The next step was to copy the project out of the "Pictures" folder on the physical machine and into the container, using docker cp:

docker cp /c/Users/Administrator/Pictures/home c6b71328afa828c9e4c62ac37dd9dede538c3999355189e44218290a2ae885d3:/root

The command format is: docker cp <local path> <full container ID>:<path inside container>. To go the other way and copy a file out of the container, just swap the last two arguments.
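
For example, copying the same directory back out would look like this (the destination path is only an illustration):

# host -> container (as above)
docker cp /c/Users/Administrator/Pictures/home c6b71328afa828c9e4c62ac37dd9dede538c3999355189e44218290a2ae885d3:/root
# container -> host (arguments swapped)
docker cp c6b71328afa828c9e4c62ac37dd9dede538c3999355189e44218290a2ae885d3:/root/home /c/Users/Administrator/Pictures/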

With the project inside the container, I entered it, changed into the flink-community/resource/tpcds directory and ran compileTpcds.sh, which then complained about another missing command, yacc:

make: yacc: Command not found

So apt-get was needed again. Fortunately the package sources can also be pointed at a faster mirror, and that had to be changed first.

Because the Docker image has no vi command, I prepared the new apt sources (saved in a file named package.json) inside the Docker host VM first:

deb http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse

After preparing the file, run mv /etc/apt/sources.list /etc/apt/sources.list.bak inside the container, and then, from the Docker host VM, run:

docker cp package.json c6b71328afa828c9e4c62ac37dd9dede538c3999355189e44218290a2ae885d3:/etc/apt

Once apt reads the new sources list (it expects it at /etc/apt/sources.list, so make sure the copied file ends up under that name), package downloads will go to the new mirror.

Running apt-get install bison (on Debian/Ubuntu the yacc command is supplied by the bison package, hence the different name) returned a "Package has no installation candidate" error. I suspected apt-get simply had not been updated yet, so I ran:

apt-get update
apt-get upgrade
apt-get install flex bison bc

After that, running compileTpcds.sh in flink-community/resource/tpcds again succeeded and the build completed. (The build log is too long to include here.)

Moving into the datagen directory one level down and running generateTpcdsData.sh, I was surprised to get an error: line 96, Syntax error: "(" unexpected. Yet line 96 of the script looks perfectly normal:

#!/bin/bash
##############################################################################
# TPC-DS data Generation
##############################################################################

export JAVA_HOME=/home/hadoop/java

# set data save path
targetPath=./data/

# set work threads ,initial value is 0
workThreads=0

# set data scale
if [ $# -lt 1 ]; then
	echo "[ERROR] Insufficient # of params"
	echo "USAGE: `dirname $0`/$0 <scaleFactor>"
	exit 127
fi
scaleFactor=$1

# random seed to build data,default value is 0
rngSeed=0
if [ $# -ge 2 ]; then
    rngSeed=$2
fi


### Check for target path
if [ -z $targetPath ]; then
	echo "[ERROR] HDFS target path was not configured"
	exit 127
fi

### Init
### in workload.lst, dimension table was configured parallel value 1,and fact table  was configured bigger parallel
workFile=workloads/tpcds.workload.${scaleFactor}.lst

if [ ! -e $workFile ]; then

	echo "[INFO] generating Workload file: "$workFile
	echo "a call_center	$((scaleFactor))" >>$workFile
	echo "b catalog_page	$((scaleFactor))" >>$workFile
	echo "d catalog_sales	$((scaleFactor))" >>$workFile
	echo "e customer_address	$((scaleFactor))" >>$workFile
	echo "f customer	$((scaleFactor))" >>$workFile
    echo "g customer_demographics	$((scaleFactor))" >>$workFile
	echo "h date_dim	$((scaleFactor))" >>$workFile
    echo "i household_demographics $((scaleFactor))" >>$workFile
    echo "j income_band    $((scaleFactor))" >>$workFile
    echo "k inventory    $((scaleFactor))" >>$workFile
    echo "l item    $((scaleFactor))" >>$workFile
    echo "m promotion    $((scaleFactor))" >>$workFile
    echo "n reason    $((scaleFactor))" >>$workFile
    echo "o ship_mode    $((scaleFactor))" >>$workFile
    echo "p store    $((scaleFactor))" >>$workFile
    echo "r store_sales    $((scaleFactor))" >>$workFile
    echo "s time_dim    $((scaleFactor))" >>$workFile
    echo "t warehouse    $((scaleFactor))" >>$workFile
    echo "u web_page    $((scaleFactor))" >>$workFile
    echo "w web_sales    $((scaleFactor))" >>$workFile
    echo "x web_site    $((scaleFactor))" >>$workFile
fi

### Basic Params
echo "[INFO] Data will be generated locally on each node at a named pipe ./<tblName.tbl.<chunk#>"
echo "[INFO] Generated data will be streamingly copied to the cluster at "$targetHSDFPath
echo "[INFO] e.g. lineitem.tbl.10 --> /disk/1/tpcds/data/SF100/lineitem/lineitem.10.tbl"

#Clear existing workloads
rm -rf writeData.sh

#Check Dir on disk
targetPath=${targetPath}/SF${scaleFactor}

rm -rf ${targetPath}
mkdir -p ${targetPath}

### Init Workloads

fileName=writeData.sh
echo "#!/bin/bash" >> $fileName
echo "  "  >> $fileName
echo "ps -efww|grep dsdgen |grep -v grep|cut -c 9-15|xargs kill -9" >> $fileName
echo "ps -efww|grep FsShell |grep -v grep|cut -c 9-15|xargs kill -9" >> $fileName
echo "ps -efww|grep wait4process.sh |grep -v grep|cut -c 9-15|xargs kill -9" >> $fileName
echo "rm -rf *.dat" >> $fileName
echo "  "  >> $fileName

mkdir -p ${targetPath}/catalog_returns
mkdir -p ${targetPath}/store_returns
mkdir -p ${targetPath}/web_returns

### Generate Workloads
while read line; do
	params=( $line )
	#Extracting Parameters
	#echo ${params[*]}
	tblCode=${params[0]}
	tblName=${params[1]}
	tblParts=${params[2]}
	echo "====$tblName==="
	mkdir -p ${targetPath}/$tblName
	# Assigning workload in round-robin fashion
	partsDone=1
	while [ $partsDone -le $tblParts ]; do
		if [ $tblParts -gt 1 ]; then
			echo "rm -rf ./${tblName}_${partsDone}_${tblParts}.dat" >> writeData.sh
			echo "mkfifo ./${tblName}_${partsDone}_${tblParts}.dat" >> writeData.sh
            if [ "$tblName" = "catalog_sales" ]; then
                echo "rm -rf ./catalog_returns_${partsDone}_${tblParts}.dat" >> writeData.sh
                echo "mkfifo ./catalog_returns_${partsDone}_${tblParts}.dat" >> writeData.sh
            fi
            if [ "$tblName" = "store_sales" ]; then
                echo "rm -rf ./store_returns_${partsDone}_${tblParts}.dat" >> writeData.sh
                echo "mkfifo ./store_returns_${partsDone}_${tblParts}.dat" >> writeData.sh
            fi
            if [ "$tblName" = "web_sales" ]; then
                echo "rm -rf ./web_returns_${partsDone}_${tblParts}.dat" >> writeData.sh
                echo "mkfifo ./web_returns_${partsDone}_${tblParts}.dat" >> writeData.sh
            fi
			echo "./dsdgen -SCALE $scaleFactor -TABLE $tblName -CHILD $partsDone -PARALLEL $tblParts -FORCE Y -RNGSEED $rngSeed  &" >> writeData.sh
			echo "./copyAndDelete.sh ./${tblName}_${partsDone}_${tblParts}.dat ${targetPath}/$tblName  &" >> writeData.sh
            if [ "$tblName" = "catalog_sales" ]; then
                echo "./copyAndDelete.sh ./catalog_returns_${partsDone}_${tblParts}.dat ${targetPath}/catalog_returns  &" >> writeData.sh
            fi
            if [ "$tblName" = "store_sales" ]; then
                echo "./copyAndDelete.sh ./store_returns_${partsDone}_${tblParts}.dat ${targetPath}/store_returns &" >> writeData.sh
            fi
            if [ "$tblName" = "web_sales" ]; then
                echo "./copyAndDelete.sh ./web_returns_${partsDone}_${tblParts}.dat ${targetPath}/web_returns &" >> writeData.sh
            fi
		else
			echo "rm -rf ./${tblName}.dat" >> writeData.sh
			echo "mkfifo ./${tblName}.dat" >> writeData.sh

			if [ "$tblName" = "catalog_sales" ]; then
                echo "rm -rf ./catalog_returns.dat" >> writeData.sh
                echo "mkfifo ./catalog_returns.dat" >> writeData.sh
            fi
            if [ "$tblName" = "store_sales" ]; then
                echo "rm -rf ./store_returns.dat" >> writeData.sh
                echo "mkfifo ./store_returns.dat" >> writeData.sh
            fi
            if [ "$tblName" = "web_sales" ]; then
                echo "rm -rf ./web_returns.dat" >> writeData.sh
                echo "mkfifo ./web_returns.dat" >> writeData.sh
            fi

			echo "./dsdgen -SCALE $scaleFactor -TABLE $tblName -FORCE Y -RNGSEED $rngSeed &" >> writeData.sh
			echo "./copyAndDelete.sh ./${tblName}.dat ${targetPath}/$tblName &" >> writeData.sh

			if [ "$tblName" = "catalog_sales" ]; then
                echo "./copyAndDelete.sh ./catalog_returns.dat ${targetPath}/catalog_returns &" >> writeData.sh
            fi
            if [ "$tblName" = "store_sales" ]; then
                echo "./copyAndDelete.sh ./store_returns.dat ${targetPath}/store_returns  &" >> writeData.sh
            fi
            if [ "$tblName" = "web_sales" ]; then
                echo "./copyAndDelete.sh ./web_returns.dat ${targetPath}/web_returns &" >> writeData.sh
            fi

		fi

		let partsDone=1+$partsDone
		let workThreads=1+workThreads
	done
done <$workFile;
echo "echo "[INFO] this machine has ${workThreads} dsden thread" ">> writeData.sh
echo "echo "[INFO] Waiting until completion..." ">> writeData.sh
echo "./wait4process.sh dsdgen 0 " >> writeData.sh
echo "   " >> writeData.sh


echo "[INFO] Started Generation @ "`date +%H:%M:%S`
startTime=`date +%s`

echo "[INFO] Executing writeData.sh on "${worker}
chmod 755 writeData.sh
sh writeData.sh


endTime=`date +%s`
echo "[INFO] Completed Generation @ "`date +%H:%M:%S`
echo "[INFO] Generated and loaded SF"${scaleFactor}" in "`echo $endTime - $startTime |bc`" sec"

It turns out plenty of people have hit this problem, and it is not a simple typo. It comes down to which interpreter the sh command actually maps to; ls -l /bin/*sh will tell you. In my container it looks like this:

-rwxr-xr-x 1 root root 1099016 May 15  2017 /bin/bash
-rwxr-xr-x 1 root root  117208 Jan 24  2017 /bin/dash
lrwxrwxrwx 1 root root       4 May 15  2017 /bin/rbash -> bash
lrwxrwxrwx 1 root root       4 Jan 24  2017 /bin/sh -> dash

So the system hands sh off to dash rather than bash, and the script uses bash-only syntax (the first offender is the array assignment params=( $line ), which dash rejects with exactly this "(" unexpected error). The fix is simply not to run it with sh; run it with bash instead.
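
A minimal way to run it correctly inside the container (the scale factor 1 is just an example value):

# inside the datagen directory of the project
bash generateTpcdsData.sh 1    # invoke with bash explicitly; the argument is the scale factor
# alternatively, since the script's shebang is #!/bin/bash, this works too:
chmod +x generateTpcdsData.sh && ./generateTpcdsData.sh 1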

Original post (in Chinese): https://www.cnblogs.com/dgutfly/p/11634714.html