Restoring a sparse matrix to the original (dense) matrix

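The sparse matrix in the source data was saved to a text file, so each sparse vector is now just text. It has to be restored to matrix form before it can be processed.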

Source data format:

[u'0b85922424fb39bb566723aa3d71c', u'7afb428b3dc75', u'1', u'42-44', u'SM-N7506V', SparseVector(5834, {0: 1.0, 9: 1.0, 11: 1.0, 14: 1.0, 29: 1.0, 42: 1.0, 59: 1.0, 180: 1.0, 617: 1.0, 639: 1.0, 1356: 1.0})]

[u'08d242d4f0baba3aa6feb9ad5ea', u'3c02f9965966c117fd3f', u'0', u'33-35', u'P7', SparseVector(5834, {11: 1.0, 45: 1.0, 249: 1.0, 363: 1.0, 405: 1.0, 456: 1.0, 710: 1.0, 802: 1.0, 1053: 1.0, 4340: 1.0})]

[u'cabee1431f8bb3cf5080851835', u'a2d6926a05cc7ff70288', u'1', u'27-29', u'OPPO R9tm', SparseVector(5834, {1: 1.0, 20: 1.0, 30: 1.0, 39: 1.0, 42: 1.0, 54: 1.0, 56: 1.0, 60: 1.0, 108: 1.0, 282: 1.0, 327: 1.0, 408: 1.0, 1795: 1.0, 1907: 1.0, 2287: 1.0})]
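
Each record is a list of five string fields followed by a SparseVector(size, {index: value, ...}) entry. A minimal sketch of parsing one such line, assuming every record sits on a single line and the wrapper always has this two-part shape. The helper name parse_record and the shortened sample record are made up for illustration.

```python
import ast

def parse_record(line):
    """Split one text record into its leading fields and sparse-vector parts."""
    # Dropping the wrapper name leaves "(size, {index: value, ...})",
    # which is an ordinary tuple literal.
    literal = line.replace("SparseVector", "")
    record = ast.literal_eval(literal)
    fields = record[:5]          # the five leading string fields
    size, entries = record[5]    # vector length and {index: value} dict
    return fields, size, entries

# Made-up, shortened record in the same shape as the source data.
sample = "[u'a1', u'b2', u'1', u'42-44', u'SM-N7506V', SparseVector(8, {0: 1.0, 3: 1.0})]"
fields, size, entries = parse_record(sample)
print(fields)
print(size, entries)
```

ast.literal_eval only accepts Python literals, so it will not run arbitrary code that happens to be in the file.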

Processing

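import ast

out = []
# Only the first three samples are read here, as a demonstration.
with open("installed_applist_sample", "r") as f_train:
    for line in f_train.readlines()[0:3]:
        # Drop the SparseVector wrapper so the line becomes a plain literal:
        # [field, field, field, field, field, (size, {index: value, ...})]
        line = line.replace("SparseVector", "")
        # literal_eval parses literals only, which is safer than eval on file input.
        samp = ast.literal_eval(line)
        out1 = list(samp[0:5])     # the five leading fields
        lenvec, dic1 = samp[5]     # vector length and {index: value} dict
        # Write 1 where an index is present and 0 elsewhere
        # (every stored value in this data is 1.0).
        for i in range(lenvec):
            out1.append(1 if i in dic1 else 0)
        out.append(out1)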
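
If the feature part is needed as an actual numeric matrix rather than a list of lists, NumPy can fill the dense rows directly. This is only a sketch under assumptions: the helper name to_dense is hypothetical, the (size, entries) pairs are made-up stand-ins for the parsed records, and all vectors are assumed to have the same length.

```python
import numpy as np

def to_dense(size, entries):
    """Expand a {index: value} dict into a dense 1-D array of length `size`."""
    dense = np.zeros(size)
    if entries:
        # Fancy indexing assigns all stored values in one step.
        dense[list(entries.keys())] = list(entries.values())
    return dense

# Made-up (size, entries) pairs standing in for parsed records.
records = [(8, {0: 1.0, 3: 1.0}), (8, {1: 1.0, 7: 1.0})]
matrix = np.vstack([to_dense(size, entries) for size, entries in records])
print(matrix.shape)
```

Unlike the loop above, this keeps the stored values instead of writing a constant 1, which only matters if the data contains values other than 1.0.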

Data after restoration:

[[u'85922424fb39bb566723aa3d71c', u'e104d981a7afb428b3dc75', u'1', u'42-44', u'SM-N7506V', 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, ...],[u'08d242d4f0baba3aa6feb9ad5ea', u'33c02f9965966c117fd3f', u'0', u'33-35', u'P7', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...] ...]
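
One way to sanity-check a restored row is to read the nonzero positions back out and compare them with the indices in the source record. A small sketch using the first 15 feature values of the first restored sample shown above; the helper name nonzero_indices is hypothetical and the two ID fields are replaced with placeholders.

```python
def nonzero_indices(row, n_fields=5):
    """Return the positions of nonzero feature values, skipping the leading fields."""
    return [i for i, v in enumerate(row[n_fields:]) if v != 0]

# Five leading fields (IDs replaced by placeholders) plus the first 15 feature values.
row = ['id1', 'id2', '1', '42-44', 'SM-N7506V',
       1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1]
print(nonzero_indices(row))  # [0, 9, 11, 14]
```

The result matches the first four keys of the first source record (0, 9, 11, 14).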

Original article: https://www.cnblogs.com/zhangbojiangfeng/p/6406546.html