Random forest regression on the UCI dataset Condition Based Maintenance of Naval Propulsion Plants

Dataset reference: [1] A. Coraddu, L. Oneto, A. Ghio, S. Savio, D. Anguita, M. Figari, Machine Learning Approaches for Improving Condition-Based Maintenance of Naval Propulsion Plants, Journal of Engineering for the Maritime Environment, 2014, DOI: 10.1177/1475090214540874.

Dataset page: http://archive.ics.uci.edu/ml/datasets/condition+based+maintenance+of+naval+propulsion+plants

Dataset information:

The experiments have been carried out by means of a numerical simulator of a naval vessel (Frigate) characterized by a Gas Turbine (GT) propulsion plant. The different blocks forming the complete simulator (Propeller, Hull, GT, Gear Box and Controller) have been developed and fine-tuned over the years on several similar real propulsion plants. In view of these observations, the available data are in agreement with a possible real vessel.

In this release of the simulator it is also possible to take into account the performance decay over time of the GT components such as GT compressor and turbines. 
The propulsion system behaviour has been described with these parameters:
- Ship speed (linear function of the lever position lp). 
- Compressor degradation coefficient kMc. 
- Turbine degradation coefficient kMt. 
so that each possible degradation state can be described by a combination of the triple (lp, kMt, kMc).
The range of decay of the compressor and turbine has been sampled on a uniform grid of precision 0.001 so as to give a good granularity of representation.
In particular, for the compressor decay state discretization the kMc coefficient has been investigated in the domain [1; 0.95], and the turbine coefficient kMt in the domain [1; 0.975].
Ship speed has been investigated by sampling the range of feasible speeds from 3 knots to 27 knots with a granularity of representation equal to three knots.
A series of measures (16 features) which indirectly represent the state of the system subject to performance decay has been acquired and stored in the dataset over the parameter space.
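
As a quick check on the sampling described above, the grid can be enumerated directly. The following is a minimal sketch, not part of the original simulator: the variable names are illustrative, and it assumes that both endpoints of each range are included and that every (speed, kMc, kMt) combination appears exactly once.

```python
import itertools
import numpy as np

# Assumed grids, endpoints included (names are illustrative, not from the dataset)
speeds = np.arange(3, 28, 3)              # 3, 6, ..., 27 knots
kmc_grid = np.linspace(0.95, 1.0, 51)     # compressor decay, step 0.001
kmt_grid = np.linspace(0.975, 1.0, 26)    # turbine decay, step 0.001

# Every combination of speed and the two decay coefficients
states = list(itertools.product(speeds, kmc_grid, kmt_grid))
n_states = len(states)
print(len(speeds), len(kmc_grid), len(kmt_grid), n_states)
```

Under these assumptions the grid has 9 x 51 x 26 = 11934 states, a number that can be compared with the row count of the data file once it is loaded.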

Attribute Information:

- A 16-feature vector containing the GT measures at steady state of the physical asset: 
Lever position (lp) [ ] 
Ship speed (v) [knots] 
Gas Turbine (GT) shaft torque (GTT) [kN m] 
GT rate of revolutions (GTn) [rpm] 
Gas Generator rate of revolutions (GGn) [rpm] 
Starboard Propeller Torque (Ts) [kN] 
Port Propeller Torque (Tp) [kN] 
High Pressure (HP) Turbine exit temperature (T48) [C]
GT Compressor inlet air temperature (T1) [C] 
GT Compressor outlet air temperature (T2) [C] 
HP Turbine exit pressure (P48) [bar] 
GT Compressor inlet air pressure (P1) [bar] 
GT Compressor outlet air pressure (P2) [bar] 
GT exhaust gas pressure (Pexh) [bar] 
Turbine Injection Control (TIC) [%]
Fuel flow (mf) [kg/s] 

- GT Compressor decay state coefficient 
- GT Turbine decay state coefficient
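
When loading the file it can help to label the columns with the symbols listed above. This is a minimal sketch, assuming the columns appear in the file in the order of this list, followed by the two decay coefficients (compressor, then turbine). The short names and the `label_columns` helper are illustrative choices, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Symbols taken from the attribute list; the ordering is an assumption
FEATURES = ['lp', 'v', 'GTT', 'GTn', 'GGn', 'Ts', 'Tp', 'T48',
            'T1', 'T2', 'P48', 'P1', 'P2', 'Pexh', 'TIC', 'mf']
TARGETS = ['kMc', 'kMt']

def label_columns(frame):
    """Return a copy of an 18-column frame with symbolic column names."""
    if frame.shape[1] != len(FEATURES) + len(TARGETS):
        raise ValueError('expected 18 columns, got %d' % frame.shape[1])
    labelled = frame.copy()
    labelled.columns = FEATURES + TARGETS
    return labelled

# Demonstration on placeholder data standing in for the real file
demo = label_columns(pd.DataFrame(np.zeros((3, 18))))
print(demo[TARGETS].shape)
```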

 

Applying random forest regression to the dataset in Python:

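import os
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
# sklearn.cross_validation and sklearn.grid_search were removed in scikit-learn 0.20;
# train_test_split and GridSearchCV now live in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV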

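# data.txt has no header row; sep=r'\s+' splits on any run of whitespace
df = pd.read_csv('data.txt', header=None, sep=r'\s+')
df.shape
df.head(5)

# columns 0-15 are the 16 features, columns 16-17 are the two decay coefficients
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, :16], df.iloc[:, 16:18], test_size=0.33, random_state=42)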

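# Model 1: first target column (the compressor decay coefficient, following the attribute order above)
rgr1 = RandomForestRegressor()
rgr1.fit(X_train, y_train.iloc[:, 0])
rgr1_pre = rgr1.predict(X_test)
# evaluate against the same target column the model was trained on
mean_squared_error(y_test.iloc[:, 0], rgr1_pre)

Result reported in the original post: 0.00042135854935931014. That figure was computed against y_test.iloc[:,1], the second target column, rather than the column the model was trained on, so it does not measure this model's error. Rerun the corrected line above to obtain the MSE for the first target.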

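# Model 2: second target column (the turbine decay coefficient, following the attribute order above)
rgr2 = RandomForestRegressor()
rgr2.fit(X_train, y_train.iloc[:, 1])
rgr2_pre = rgr2.predict(X_test)
mean_squared_error(y_test.iloc[:, 1], rgr2_pre)

Result: 8.9887552213572615e-07 (the exact value varies between runs because the forest is not seeded)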
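
The two separate models can also be replaced by a single forest, since RandomForestRegressor accepts a two-column target. The sketch below is an alternative to the code above, not the original author's method. It runs on synthetic data (the arrays and coefficients are made up) so that it is self-contained, and it reports one MSE per target via multioutput='raw_values', which keeps each prediction paired with its own ground-truth column.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 16 features, 2 targets (made up)
rng = np.random.RandomState(42)
X = rng.rand(300, 16)
y = np.column_stack([X[:, 0] + 0.1 * X[:, 1], X[:, 2] - 0.1 * X[:, 3]])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# One forest fitted on both targets at once
rgr = RandomForestRegressor(n_estimators=50, random_state=42)
rgr.fit(X_train, y_train)
pred = rgr.predict(X_test)

# One MSE per target column, each compared with its own ground truth
per_target_mse = mean_squared_error(y_test, pred, multioutput='raw_values')
print(per_target_mse)
```

With the real data, X and y would be df.iloc[:, :16] and df.iloc[:, 16:18].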

Original post: https://www.cnblogs.com/gangzhuzi/p/7152869.html