Python Notes, Part 5: Python Data Analysis Basics Tutorial

Python Notes, Part 5: Python Data Analysis Basics Tutorial: Installation and Checking Versions

1. About the IDE:

All examples are presented interactively using the Jupyter tool that ships with Anaconda.

2. Versions and installation:

NumPy can be installed from the following site: http://sourceforge.net/projects/numpy/files.

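We can check the relevant versions from within the environment. If Anaconda is installed on the machine, the bundled library versions are usually close to the latest.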
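One way to do this check from a notebook cell or a script is sketched below. It reports only the interpreter and NumPy; the variable names are illustrative, and the versions printed depend entirely on your own installation.

```python
import sys

import numpy as np

# Version of the running Python interpreter, e.g. "3.x.y"
python_version = sys.version.split()[0]

# Version string of the NumPy package that was actually imported
numpy_version = np.__version__

print("Python:", python_version)
print("NumPy :", numpy_version)
```

The same pattern (`package.__version__`) works for most other scientific packages, although that attribute is a convention rather than a guarantee.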

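If the libraries in Anaconda are not up to date, they can be installed and updated from the Anaconda Prompt. The following blog post walks through the steps, and it is very simple: https://blog.csdn.net/xiexu911/article/details/80282440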

3. We launch Jupyter or Spyder from Anaconda for the walkthrough.

4. First simple operation: compare the same computation in pure Python and in NumPy to observe NumPy's speed:

from datetime import datetime
import numpy as np

# Pure Python version
def pythonsum(n):
    a = []
    b = []
    c = []

    for i in range(n):
        a.append(i)
        b.append(i)
        c.append(a[i]**2 + b[i]**3)
    return c

# NumPy version
def numpysum(n):
    a = np.arange(n,dtype=object) ** 2
    b = np.arange(n,dtype=object) ** 3
    c = a + b
    return c

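# Run the comparison
size = 30000

start = datetime.now()
c1 = pythonsum(size)
delta = datetime.now()-start
print("The last 2 elements of the sum",c1[-2:])
# total_seconds() covers the whole interval; .microseconds alone drops whole seconds
print("PythonSum elapsed time in microsecond",round(delta.total_seconds() * 1e6))

start = datetime.now()
c2 = numpysum(size)
delta = datetime.now()-start
print("The last 2 elements of the sum",c2[-2:])
# total_seconds() covers the whole interval; .microseconds alone drops whole seconds
print("NumPySum elapsed time in microsecond",round(delta.total_seconds() * 1e6))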

# With size = 10000:
#The last 2 elements of the sum [999500079996, 999800010000]
#PythonSum elapsed time in microsecond 15625
#The last 2 elements of the sum [999500079996 999800010000]
#NumPySum elapsed time in microsecond 15623

# With size = 20000:
#The last 2 elements of the sum [7998000159996, 7999200020000]
#PythonSum elapsed time in microsecond 31247
#The last 2 elements of the sum [7998000159996 7999200020000]
#NumPySum elapsed time in microsecond 0

# With size = 30000:
#The last 2 elements of the sum [26995500239996, 26998200030000]
#PythonSum elapsed time in microsecond 46871
#The last 2 elements of the sum [26995500239996 26998200030000]
#NumPySum elapsed time in microsecond 0
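The readings of 0 above, and the others sitting near multiples of 15.6 ms, suggest that the clock behind `datetime.now()` was too coarse on the test machine to time a single call. A minimal sketch of a finer measurement follows, assuming the standard library's `timeit` and averaging over several runs. The list comprehension, the `dtype` parameter, the `repeats` count and `dtype=np.int64` are choices made for this sketch, not the original code.

```python
import timeit

import numpy as np


def pythonsum(n):
    # Same arithmetic as the pure Python version: i**2 + i**3 for each i
    return [i ** 2 + i ** 3 for i in range(n)]


def numpysum(n, dtype=np.int64):
    # int64 is wide enough for n = 30000; pass dtype=object to mimic the original
    a = np.arange(n, dtype=dtype)
    return a ** 2 + a ** 3


size = 30000
repeats = 20  # average over several runs to smooth out timer noise

t_py = timeit.timeit(lambda: pythonsum(size), number=repeats) / repeats
t_np = timeit.timeit(lambda: numpysum(size), number=repeats) / repeats
t_obj = timeit.timeit(lambda: numpysum(size, dtype=object), number=repeats) / repeats

print(f"pure Python        : {t_py * 1e6:.0f} microseconds per call")
print(f"NumPy dtype=int64  : {t_np * 1e6:.0f} microseconds per call")
print(f"NumPy dtype=object : {t_obj * 1e6:.0f} microseconds per call")
```

The absolute numbers depend on the machine, so none are quoted here. Comparing the int64 and object rows shows how much of the speed-up survives when `dtype=object` is used.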

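The larger the data, the more clearly NumPy's advantage shows, although the readings of 0 most likely reflect the coarse resolution of datetime.now() on this machine rather than a truly instantaneous computation. Note that we pass dtype=object to np.arange to prevent integer overflow, which many textbooks do not mention. Without it, on platforms where NumPy's default integer type is 32 bits wide (for example Windows with NumPy versions before 2.0), values such as 29999**3 exceed the range of the type and silently wrap around, so the results differ from those of the pure Python version. Keep in mind that dtype=object stores ordinary Python integers in the array and therefore gives up part of NumPy's speed; dtype=np.int64 is wide enough for the sizes used here.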
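The overflow can be reproduced on any platform by requesting a 32-bit integer type explicitly. The sketch below is an illustration rather than part of the original example: it compares `np.int32`, `np.int64` and `object` arrays against the exact value computed with Python integers, and the variable names are made up for this purpose.

```python
import numpy as np

n = 30000
expected_last = 29999 ** 2 + 29999 ** 3  # exact value using Python integers

# 32-bit integers: 29999**3 does not fit, so the result wraps around silently
a32 = np.arange(n, dtype=np.int32)
r32 = a32 ** 2 + a32 ** 3

# 64-bit integers: large enough for this n, and still fast
a64 = np.arange(n, dtype=np.int64)
r64 = a64 ** 2 + a64 ** 3

# object dtype: each element is a Python int, so it never overflows (but is slower)
aobj = np.arange(n, dtype=object)
robj = aobj ** 2 + aobj ** 3

print("expected:", expected_last)
print("int32   :", r32[-1], "(wrong)" if int(r32[-1]) != expected_last else "")
print("int64   :", r64[-1])
print("object  :", robj[-1])
```

NumPy does not raise an error or warning for the wrapped int32 array result, which is why the discrepancy is easy to miss.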

5. Second simple operation: use help to view NumPy's documentation:

import numpy as np

help(np.arange)

#Help on built-in function arange in module numpy.core.multiarray:
#
#arange(...)
#    arange([start,] stop[, step,], dtype=None)
#    
#    Return evenly spaced values within a given interval.
#    
#    Values are generated within the half-open interval ``[start, stop)``
#    (in other words, the interval including `start` but excluding `stop`).
#    For integer arguments the function is equivalent to the Python built-in
#    `range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
#    but returns an ndarray rather than a list.
#    
#    When using a non-integer step, such as 0.1, the results will often not
#    be consistent.  It is better to use ``linspace`` for these cases.
#    
#    Parameters
#    ----------
#    start : number, optional
#        Start of interval.  The interval includes this value.  The default
#        start value is 0.
#    stop : number
#        End of interval.  The interval does not include this value, except
#        in some cases where `step` is not an integer and floating point
#        round-off affects the length of `out`.
#    step : number, optional
#        Spacing between values.  For any output `out`, this is the distance
#        between two adjacent values, ``out[i+1] - out[i]``.  The default
#        step size is 1.  If `step` is specified as a position argument,
#        `start` must also be given.
#    dtype : dtype
#        The type of the output array.  If `dtype` is not given, infer the data
#        type from the other input arguments.
#    
#    Returns
#    -------
#    arange : ndarray
#        Array of evenly spaced values.
#    
#        For floating point arguments, the length of the result is
#        ``ceil((stop - start)/step)``.  Because of floating point overflow,
#        this rule may result in the last element of `out` being greater
#        than `stop`.
#    
#    See Also
#    --------
#    linspace : Evenly spaced numbers with careful handling of endpoints.
#    ogrid: Arrays of evenly spaced numbers in N-dimensions.
#    mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
#    
#    Examples
#    --------
#    >>> np.arange(3)
#    array([0, 1, 2])
#    >>> np.arange(3.0)
#    array([ 0.,  1.,  2.])
#    >>> np.arange(3,7)
#    array([3, 4, 5, 6])
#    >>> np.arange(3,7,2)
#    array([3, 5])
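The points made in the help text can be tried out directly. The short sketch below exercises the half-open interval of `np.arange` and uses `np.linspace` for a non-integer spacing, as the documentation recommends. The variable names are illustrative.

```python
import numpy as np

# Half-open interval [start, stop): 7 is excluded
odd_steps = np.arange(3, 7, 2)
print(odd_steps)        # [3 5]

# With integer arguments, arange mirrors Python's built-in range
same_as_range = np.arange(3, 7).tolist() == list(range(3, 7))
print(same_as_range)    # True

# For fractional spacing, ask linspace for a number of points instead of a step;
# both endpoints are included by default
fractions = np.linspace(0.0, 1.0, 5)
print(fractions)        # [0.   0.25 0.5  0.75 1.  ]
```

With `linspace` the number of elements is stated up front, which avoids the length ambiguity that the help text describes for non-integer steps.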