2020.12.07 -- 2020.12.12

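On Monday I read papers on attention, but "Attention Is All You Need" was hard to follow, so I looked it up online first. (One small trick a day: when you can't understand a paper, search for its title directly. For well-known papers especially, someone has surely read it already and written a blog post.) To understand attention, it helps to start from RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory).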

1. "Neural Machine Translation by Jointly Learning to Align and Translate"

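The basic encoder-decoder architecture uses a fixed-length vector. The output of the last encoder unit becomes the input of the decoder, so that single output has to carry the information of the whole input, which may not be expressive enough for longer sentences. This paper uses softmax attention: the hidden state h of the decoder unit is compared with the hidden state h of each encoder unit, and the closer the match, the larger the weight. The overall structure is still an RNN encoder-decoder, and both the encoder and the decoder are built from GRUs.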
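A minimal sketch of the weighting step described above, in plain Python. One loud assumption: the similarity score here is a simple dot product, chosen only to keep the example short, whereas the paper learns a small alignment network to produce the scores. The names `softmax` and `attend` are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Assumption: similarity is a plain dot product, not the learned
    alignment model used in the paper.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of all encoder states.
    context = [
        sum(w * state[k] for w, state in zip(weights, encoder_states))
        for k in range(dim)
    ]
    return weights, context


# Toy example: three encoder states; the decoder state matches the second best.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([0.0, 2.0], encoder_states)
```

Because the context vector is recomputed at every decoder step, the decoder no longer depends on a single fixed-length summary of the input.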

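After finishing this one I went back to "Attention Is All You Need" and still didn't understand it. It says it uses neither convolution nor recurrence, and applies several linear projections to the inputs Q (query), K (key), and V (value).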
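As far as I understand it, the "several linear projections" part is multi-head attention: each head projects Q, K, and V with its own matrices, runs scaled dot-product attention, and the heads are concatenated and projected once more. Below is a rough pure-Python sketch under that reading. The helper names and the toy matrices are hypothetical, and a real implementation would use learned weights and a tensor library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, one output row per query."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(q, k, v, heads, w_o):
    """heads is a list of (w_q, w_k, w_v) projection matrices, one per head."""
    outputs = [
        scaled_dot_product_attention(matmul(q, w_q), matmul(k, w_k), matmul(v, w_v))
        for w_q, w_k, w_v in heads
    ]
    # Concatenate the heads feature-wise, then apply the output projection.
    concat = [[x for out in outputs for x in out[i]] for i in range(len(q))]
    return matmul(concat, w_o)


# Toy self-attention: 2 tokens, model width 2, two heads of width 1.
x = [[1.0, 0.0], [0.0, 1.0]]
head_a = ([[1.0], [0.0]], [[1.0], [0.0]], [[1.0], [0.0]])  # projects onto feature 0
head_b = ([[0.0], [1.0]], [[0.0], [1.0]], [[0.0], [1.0]])  # projects onto feature 1
w_o = [[1.0, 0.0], [0.0, 1.0]]  # identity output projection (assumption)
result = multi_head_attention(x, x, x, [head_a, head_b], w_o)
```

Passing the same `x` as Q, K, and V gives self-attention; every token attends to every other token in one step, which is how the model gets by without recurrence.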

Then I moved on to a paper by Jiang, a senior in my lab, which uses a Binary Neural Network for hotspot detection.

2. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks"

The real values in a convolutional network are approximated with binary values, which reduces memory usage, and the original convolution is replaced by XNOR and bitcount operations.
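A small sketch of why XNOR plus bitcount can stand in for a dot product once the values are restricted to +1 and -1. Encoding +1 as bit 1 and -1 as bit 0, two elements multiply to +1 exactly when their bits agree, so the dot product is (number of agreeing bits) minus (number of disagreeing bits). The function names are hypothetical, and Python integers are used in place of machine words.

```python
def pack_bits(signs):
    """Pack a list of +1/-1 values into an int; bit i is 1 where signs[i] is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where the bits match
    matches = bin(agree).count("1")    # bitcount (popcount)
    return 2 * matches - n             # matches minus mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
binary_result = xnor_dot(pack_bits(a), pack_bits(b), len(a))
float_result = sum(x * y for x, y in zip(a, b))  # same value, computed the usual way
```

Both `binary_result` and `float_result` are 0 here; the binary version only needs one XNOR and one bitcount per packed word.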
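Binary-Weight-Networks: the weights are binary in the forward and backward passes, while the inputs and the parameter updates stay real-valued. XNOR-Net: both the input tensor and the weights are binarized. The input tensor X is approximated by H and a scaling factor beta, and the weight W is approximated by B and a scaling factor alpha.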
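A minimal sketch of that approximation for a single dot product, assuming the closed-form choice I remember from the paper: the binary part is the sign of each element and the scaling factor is the mean absolute value. The function names are hypothetical, and a real convolution computes the input scaling per spatial location rather than once per vector.

```python
def binarize(values):
    """Approximate values by scale * signs, with signs in {+1, -1}."""
    scale = sum(abs(v) for v in values) / len(values)  # mean absolute value
    signs = [1 if v >= 0 else -1 for v in values]
    return scale, signs


def approx_dot(x, w):
    """Approximate x . w as (beta * alpha) * (H . B)."""
    beta, h = binarize(x)   # X ~ beta * H
    alpha, b = binarize(w)  # W ~ alpha * B
    return beta * alpha * sum(hi * bi for hi, bi in zip(h, b))


alpha, b = binarize([0.5, -1.5, 1.0, -1.0])  # alpha = 1.0, b = [1, -1, 1, -1]
approx = approx_dot([2.0, -2.0], [0.5, -0.5])  # exact here: 2.0
```

The approximation is exact in this toy case because every element of each vector has the same magnitude; in general it only preserves the signs and the average scale.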
A block in a traditional CNN is structured as: Conv -- BatchNorm -- Activ -- Pool. In a BNN, pooling on binary input causes a large loss of information, so the block structure becomes: BatchNorm -- BinActiv -- BinConv -- Pool.
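A toy 1-D illustration of the information loss, not taken from the paper: max pooling applied to already-binarized values can only return +1 or -1, and it returns +1 whenever any element in the window is non-negative. The activation values and helper names below are made up for the example.

```python
def sign(values):
    """Binary activation: map each value to +1 or -1."""
    return [1 if v >= 0 else -1 for v in values]


def max_pool(values, size=2):
    """Non-overlapping 1-D max pooling."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


activations = [0.3, -1.2, -0.4, 0.9, -0.7, -0.1, 2.0, -3.0]

pooled_binary = max_pool(sign(activations))  # [1, 1, -1, 1]
pooled_real = max_pool(activations)          # [0.3, 0.9, -0.1, 2.0]
```

After pooling the binarized values, 0.3 and 2.0 become indistinguishable, which is one way to see why the reordered block pools after the binary convolution, where the outputs are no longer just +1 and -1.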

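The BNN used in Jiang's paper is XNOR-Net, so he modified mxnet for it. I then spent a long time setting up that environment (and still haven't got it working). Maybe I'm just not very good at this~~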

I also read his other paper, which uses NAS to optimize the network architecture for hotspot detection.