Reading and Storing File Contents in Python

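I. .csv files

Reading:

import pandas as pd

# file_path is the path to the .csv file
souce_data = pd.read_csv(file_path)

Here file_path is the path to the file; read_csv returns its contents as a DataFrame.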

Storing:

import pandas as pd

# Write souce_data to file_path in CSV format
souce_data.to_csv(file_path)

Here souce_data should be a pandas Series or DataFrame.
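One detail shows up as soon as a file is written and read back: by default to_csv also writes the row index as an unnamed first column, and read_csv then returns it as an extra column. The sketch below assumes a small made-up DataFrame and a temporary file path (both hypothetical, for illustration only) and passes index=False to avoid that.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only for illustration.
souce_data = pd.DataFrame({"name": ["a", "b"], "score": [1, 2]})

# Write to a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "demo.csv")

    # index=False keeps the row index out of the file.
    souce_data.to_csv(file_path, index=False)
    restored = pd.read_csv(file_path)

    # Without index=False the index is written as an unnamed first column.
    souce_data.to_csv(file_path)
    with_index = pd.read_csv(file_path)

print(list(restored.columns))
print(list(with_index.columns))
```

Whether to keep the index depends on the data: if it carries meaning (dates, IDs), it may be better to keep it and read it back with index_col=0.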

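II. Excel files

Reading:

import xlrd

data_excel = xlrd.open_workbook(file_path)      # open the workbook
souce_data = data_excel.sheet_by_name(sheet)    # select a worksheet by name
row_len = souce_data.nrows                      # number of rows
col_len = souce_data.ncols                      # number of columns
# Print every cell, row by row
for i in range(row_len):
    for j in range(col_len):
        print(souce_data.cell_value(i, j))

Here open_workbook(file_path) opens the workbook at file_path, and data_excel.sheet_by_name(sheet) returns the worksheet whose name is sheet, which is assigned to souce_data. souce_data.nrows and souce_data.ncols give the number of rows and columns in that worksheet. Note that xlrd 2.0 and later reads only the legacy .xls format, not .xlsx.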

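III. txt files

Reading:

Python offers three methods for reading the contents of a txt file: read(), readline(), and readlines(). Each has its own trade-offs; their usage, advantages, and disadvantages are described in turn below.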

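1.read():

read() reads the entire contents of the file in one call and returns them as a single large string held in memory.

with open(file_path) as f:
    souce_data = f.read()   # the whole file as one string
    print(souce_data)

Advantages of read(): simple and convenient; the file is read in one go into a single string, which is generally the fastest approach.

Disadvantage of read(): memory use grows with file size, so a very large file can consume too much memory.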
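When a file is too large to load at once, read() also accepts an optional size argument and returns at most that many characters per call. A minimal sketch, assuming a temporary demo file and an arbitrary chunk size of 4 characters (both chosen only for illustration):

```python
import os
import tempfile

# Create a small temporary file to read back (illustrative content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("abcdefghij")
    file_path = tmp.name

chunks = []
with open(file_path, encoding="utf-8") as f:
    while True:
        chunk = f.read(4)   # read at most 4 characters
        if not chunk:       # an empty string means end of file
            break
        chunks.append(chunk)

os.remove(file_path)
print(chunks)
```

In practice the chunk size would be much larger (for example a few megabytes); the point is only that memory use stays bounded by the chunk size rather than the file size.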

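2.readline():

readline() reads the text one line at a time; each call returns the next line as a string, and an empty string once the end of the file is reached.

with open(file_path) as f:
    line = f.readline()
    while line:              # an empty string signals end of file
        print(line)
        line = f.readline()

Advantage of readline(): low memory use, since only one line is held at a time.

Disadvantage of readline(): because the file is read line by line, reading is relatively slow.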

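3.readlines():

readlines() reads the whole text in one call and returns a list with one string per line.

with open(file_path) as f:
    for line in f.readlines():
        print(line)

With this method, each line of text ends with a '\n' newline character, which can be removed with line.rstrip('\n').

Advantage of readlines(): the text is read in one go, so it is relatively fast.

Disadvantage of readlines(): memory use grows as the text gets larger.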
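To make the difference in return types concrete, the following sketch runs all three methods on the same small temporary file (its name and contents are made up for illustration). It also shows a fourth option not covered above: iterating over the file object itself, which yields one line at a time and so keeps memory use low without an explicit readline() loop.

```python
import os
import tempfile

# Create a small temporary file with three lines (illustrative content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")
    file_path = tmp.name

with open(file_path, encoding="utf-8") as f:
    whole_text = f.read()        # one string with the entire file

with open(file_path, encoding="utf-8") as f:
    first_line = f.readline()    # one string: only the next line

with open(file_path, encoding="utf-8") as f:
    all_lines = f.readlines()    # list of strings, one per line

# Iterating over the file object yields one line at a time.
with open(file_path, encoding="utf-8") as f:
    stripped = [line.rstrip("\n") for line in f]

os.remove(file_path)

print(repr(whole_text))
print(repr(first_line))
print(all_lines)
print(stripped)
```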

Storing:

with open(file_path, 'w') as f:
    f.write(souce_data)   # souce_data must be a string
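Two details of writing are easy to miss: write() does not add newlines for you, and mode 'w' overwrites an existing file while mode 'a' appends to it. A minimal sketch, assuming a hypothetical list of lines and a temporary file path used only for illustration:

```python
import os
import tempfile

# Hypothetical data: a list of lines without trailing newlines.
lines = ["first", "second"]

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "demo.txt")

    # 'w' creates the file, or truncates it if it already exists.
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

    # 'a' appends to the end instead of overwriting.
    with open(file_path, "a", encoding="utf-8") as f:
        f.write("third\n")

    with open(file_path, encoding="utf-8") as f:
        saved = f.read()

print(repr(saved))
```

Passing encoding explicitly is a design choice here: it avoids depending on the platform's default encoding when the text contains non-ASCII characters.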

IV. Storing and reading json files

Storing:

import json

with open(file_path, 'w') as cf:
    cf.write(json.dumps(souce_data))   # serialize souce_data to a JSON string

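Reading:

import json

with open(file_path, 'r') as rf:
    souce_data = rf.read()
# Parse with json.loads rather than eval: JSON's true, false and null are not valid Python names
souce_data = json.loads(souce_data)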
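JSON text is not Python source: True and None are written as true and null, so the file contents should be parsed with the json module rather than evaluated as Python code. The sketch below assumes a made-up dictionary and a temporary file path (both hypothetical) and does a full round trip with json.dump and json.loads.

```python
import json
import os
import tempfile

# Hypothetical sample data, used only for illustration.
souce_data = {"name": "demo", "ok": True, "missing": None, "values": [1, 2.5]}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "demo.json")

    # json.dump serializes straight into the open file.
    with open(file_path, "w", encoding="utf-8") as cf:
        json.dump(souce_data, cf, ensure_ascii=False)

    with open(file_path, "r", encoding="utf-8") as rf:
        raw_text = rf.read()

# The raw text contains JSON literals such as true and null.
print(raw_text)

# json.loads turns the text back into Python objects.
restored = json.loads(raw_text)
print(restored == souce_data)
```

json.load(rf) would parse directly from the open file; reading the text first is done here only so the raw JSON can be inspected.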