航班延误分析SQL

Part 1 数据库建立

1、创建数据库 flight,并创建新表:on_time_performance

CREATE DATABASE flight;
CREATE TABLE on_time_performance;

2、航班数据检查csv ,导入到sql数据库中
使用Navicat进行可视化导入,导入时,要注意文本分隔符的选择

3、查看总行数

SELECT COUNT(*) FROM on_time_performance

结果如下:

Part 2 延误分析

  1. 在航班及时率表中,所有与延迟delay相关的字段如下,找出导致延迟次数最多的是哪一个(WeatherDelay CarrierDelay NASDelay SecurityDelay LateAircraftDelay)

    1. 首先分析表格中的数据
    SELECT WeatherDelay,CarrierDelay,NASDelay,SecurityDelay,LateAircraftDelay
    FROM on_time_performance LIMIT 20
    

    从数据可以看出,当航班没有延误时,所有列数据均为NULL;当有延误时,每一列的数字代表该原因导致的延误时长;0则表示该趟航班延误,但不是该原因导致的。

    1. 提取出每一列的延误原因,次数,延误时长总和,并按照次数排序(此处需用到组合查询)
      将每一个原因的延误次数都计算出来,再组合到一起形成一个临时表,将临时表按照次数降序排序,并选择第一个,就是延误最多次数的
    SELECT 'WeatherDelay', sum(case when WeatherDelay>0 then 1 END)as delaytime FROM on_time_performance
    UNION ALL
    SELECT 'CarrierDelay', sum(CASE WHEN CarrierDelay>0 THEN 1 END) FROM on_time_performance
    UNION ALL
    SELECT 'NASDelay', sum(CASE WHEN NASDelay>0 THEN 1 END) FROM on_time_performance
    UNION ALL
    SELECT 'SecurityDelay', sum(CASE WHEN SecurityDelay>0 THEN 1 END) FROM on_time_performance
    UNION ALL
    SELECT 'LateAircraftDelay', sum(CASE WHEN LateAircraftDelay>0 THEN 1 END) FROM on_time_performance
    ORDER BY delaytime desc limit 1;
    
  2. 统计各个原因累计的延迟时间

  3. 延迟时间倒数第二的原因及累计的延迟时间:
    将上述的表进行降序排列,并使用limit选择第排名第二的

SELECT 'WeatherDelay', sum(case when WeatherDelay>0 then 1 END)as delaytime FROM on_time_performance
UNION ALL
SELECT 'CarrierDelay', sum(CASE WHEN CarrierDelay>0 THEN 1 END) FROM on_time_performance
UNION ALL
SELECT 'NASDelay', sum(CASE WHEN NASDelay>0 THEN 1 END) FROM on_time_performance
UNION ALL
SELECT 'SecurityDelay', sum(CASE WHEN SecurityDelay>0 THEN 1 END) FROM on_time_performance
UNION ALL
SELECT 'LateAircraftDelay', sum(CASE WHEN LateAircraftDelay>0 THEN 1 END) FROM on_time_performance
ORDER BY delaytime asc limit 1,1;

Part 3 建立索引

  1. 由于时间部分查询非常多,所以在时间字段上增加一个索引,以加快查询速度;
ALTER TABLE on_time_performance ADD INDEX idx_flight_date(`FlightDate`);
# 使用下列语句验证查询的速度是否有优化
SELECT AirlineID, FlightNum, Origin, Dest, DepTime, TailNum, AirTime, Distance
FROM on_time_performance WHERE `FlightDate` = '2015-01-03';

查询时间由0.326秒降低至0.004秒,提升明显

Part 4 航班的季节性探寻

1、探究一年内飞行航班的季节性

SELECT `Quarter`, count(*) AS QuarterFlightAmount FROM on_time_performance 
GROUP BY `Quarter` WITH ROLLUP;

2、航班最多的月份,有多少个航班

SELECT `Month`,count(*) 
FROM on_time_performance 
GROUP BY `Month` ORDER BY count(*) desc limit 1;
图片名称
原文地址:https://www.cnblogs.com/talented/p/14973630.html