Reading MAT files with Python

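The requirement was to extract information from a mat file and send it to a back-end server in real time. MATLAB scripts are painful to work with; they would have been acceptable if all I needed was to extract the information, but there was also an engineering problem to solve, so I had to do the extraction in another language.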

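A note on this particular mat file: it has a nested structure, in which one variable contains multiple cells. The main task is to parse the information in those cells and run calculations on it.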

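I started out parsing the mat file with C#, because I had previously written a C# client that uploads files automatically on a timer, and it only needed minor changes.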

But I had taken too much for granted, and fell into a pit again.

First, the C# client. There are two key points.

1. Multi-file transfer. The core code is just four lines:

MultipartFormDataContent multipartFormDataContent = new MultipartFormDataContent(boundary);
var requestUri = "http://XXXXX:8000/imgupload";
multipartFormDataContent.Add(new ByteArrayContent(System.IO.File.ReadAllBytes(file)), "file", uploadfilename);
var result = client.PostAsync(requestUri, multipartFormDataContent).Result.Content.ReadAsStringAsync().Result;
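
For comparison, here is a minimal Python sketch of the same kind of single-file multipart/form-data upload, using only the standard library. The helper names build_multipart and upload are hypothetical and not part of the original client; the form field name "file" matches the C# call above. It is a sketch under those assumptions, not the client's actual code.

```python
import urllib.request
import uuid


def build_multipart(field, filename, data, boundary=None):
    """Return (body, content_type) for a single-file multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(url, path, upload_name):
    """POST one file to url under the form field "file" and return the response text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("file", upload_name, f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling upload(url, path, upload_name) with the server address used by the C# client would send the file in the same form field; building the body is kept in a separate function so it can be inspected without a network connection.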

2. A timer, with an interval the user can specify:

timer.Interval = Convert.ToInt32(this.textBox2.Text)*1000; // interval between runs, in milliseconds
timer.Start();
timer.Elapsed += new System.Timers.ElapsedEventHandler(Timer1_Elapsed);
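
If the client were moved to Python as well, a repeating timer could be sketched roughly as follows. The function name run_every and its parameters are hypothetical; the interval here is in seconds rather than milliseconds, and a threading.Event stands in for stopping the timer. This is one possible approach, not a port of the C# code.

```python
import threading


def run_every(interval_seconds, task, stop_event):
    """Call task() every interval_seconds until stop_event is set."""
    # Event.wait returns False when the timeout expires and True once the
    # event has been set, so the loop ends as soon as a stop is requested.
    while not stop_event.wait(interval_seconds):
        task()
```

A caller would typically parse the user's input with int(...), create a threading.Event, and start run_every in a threading.Thread so the main program stays responsive; calling set() on the event stops the loop.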

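This little tool used to upload images; now it parses the file into CSV and uploads that instead. On the back end, a Node service receives the CSV, backs up the data, and writes it to the database. If real-time requirements are high, a socket/WebSocket architecture can be used instead.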

app.post('/imgupload',function(req, res, next){
		var file = req.files;
		var savepath = mypath.parse(file['file']['path']);
		fs.rename(file['file']['path'],mypath.join(savepath.dir,file['file']['originalname']),function(err){
			if(err){
				console.log('Upload failed');
			}else{
				console.log('Upload succeeded: '+file['file']['originalname']);
			}
		});
		res.send({ret_code: '0'});
	});

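Now for the main part: using the two MathNet packages to parse the mat file. Download Math.NET Numerics with the NuGet package manager in Visual Studio; if that does not work, look up how to install it with Install-Package.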

using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.Data.Matlab;

I found that version 4.0 of MathNet.Numerics.Data.Matlab would never install properly and kept reporting errors; I could only install version 3.2.1.

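When parsing the data, I found that the cells in the mat file could not be read; all I got was a byte size. Looking at the source code, there really was only a byte size and no way to read the content. I found the source on GitHub and put it directly into my local project, and this time I finally got some content, but the content still seemed to be just the byte size... so I had to give up on this approach.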

List<MatlabMatrix> ms = MatlabReader.List(path);
Matrix<double> StartRanMatrix = MatlabReader.Unpack<double>(ms.Find(m => m.Name == "StartRan"));
MatlabMatrix TrackMatrix = ms[9];
string[] read = { "Track" };
Dictionary<string, Matrix<double>> ms1 = MatlabReader.ReadAll<double>(path, read);
// find the lowest value
double StartRan = StartRanMatrix.Row(0).First();

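I will skip the details of the process; the upshot is that Python is great, and I have to eat my words. The only thing to watch out for is how to take the mean of each column after loading the data into a pandas pd.DataFrame.

There is a very simple way, item.mean(); for details see https://blog.csdn.net/tanlangqie/article/details/78656588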
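
As a small illustration of what item.mean() returns, here is a sketch with made-up numbers (the values are not from the mat file). With the default integer column labels, the result is a Series that can be indexed by column number.

```python
import pandas as pd

# Two rows, two columns, with default integer column labels 0 and 1
df = pd.DataFrame([[1.0, 10.0], [3.0, 30.0]])

col_means = df.mean()       # one mean per column, returned as a Series
second_mean = col_means[1]  # mean of the column labeled 1
```

The per-column means of a cell's matrix can be picked out in the same way, by column label.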

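Below is the Python source that reads and parses the mat file. Take a look: isn't it elegant? In my case it also ran much faster than the native MATLAB code~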

import math
import os

import pandas as pd
from scipy import io


def transformData(path, outputpath):
    # loadmat returns a dict that maps variable names to NumPy arrays
    features_struct = io.loadmat(path)
    features = features_struct["Track"]            # cell array, indexed here as 1 x N
    StartRan = features_struct["StartRan"][0][0]   # scalar stored as a 1 x 1 matrix
    calc = []
    insectNumber = features.size
    # Iteration starts at index 1, so the first cell (index 0) is not included
    for num in range(1, insectNumber):
        dfdata = pd.DataFrame(features[0][num])
        calc.append(dfdata)
    output = []
    for count, item in enumerate(calc, start=1):
        height = item.mean()                       # per-column means of this cell
        output.append([
            count,
            math.floor((height[1] * 960 / 8192 + StartRan).real),
            math.floor(height[8].real),
        ])
    outputdata = pd.DataFrame(output)
    outputdata.to_csv(outputpath, index=False)


if __name__ == '__main__':
    rootdir = 'D:mat'
    for d in os.listdir(rootdir):
        folder = os.path.join(rootdir, d)
        if not os.path.isdir(folder):
            continue
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            base, ext = os.path.splitext(path)
            # write a .csv file next to every .mat file
            if os.path.isfile(path) and ext == '.mat':
                transformData(path, base + ".csv")
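
To see how the indexing in the code above lines up with a cell array, here is a small round-trip sketch. It writes a made-up mat file with scipy.io.savemat and reads it back; the variable names Track and StartRan mirror the ones above, but the values and the file name demo.mat are invented for illustration.

```python
import os
import tempfile

import numpy as np
from scipy import io

# savemat writes a NumPy object array as a MATLAB cell array
cells = np.empty((1, 2), dtype=object)
cells[0, 0] = np.array([[1.0, 2.0], [3.0, 4.0]])
cells[0, 1] = np.array([[5.0, 6.0]])

with tempfile.TemporaryDirectory() as tmp:
    mat_path = os.path.join(tmp, "demo.mat")
    io.savemat(mat_path, {"Track": cells, "StartRan": 100.0})
    loaded = io.loadmat(mat_path)

track = loaded["Track"]                 # object array of shape (1, 2)
second_cell = track[0][1]               # the matrix stored in the second cell
start_ran = loaded["StartRan"][0][0]    # scalars come back as 1 x 1 matrices
```

This is why the code indexes features[0][num] to reach a cell and features_struct["StartRan"][0][0] to reach the scalar.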