Study notes for 3.1 | Study notes for 3.2

2. fastText: how to use it and how it works

https://blog.csdn.net/feilong_csdn/article/details/88655927 (to read)

3-2————————

1. Reading in the data for the IMDB project

from torch.utils.data import Dataset


class IMDBDataset(Dataset):
    # Reads a TSV file with a header row: column 1 is the label, column 2 the review text.

    def __init__(self, filename="./data/trainData.tsv"):
        with open(filename, 'rt', encoding='utf-8') as f:
            lines = f.readlines()[1:]  # skip the header row
        self.sentences = []
        self.labels = []
        for r in lines:
            line = r.strip().split('\t')
            # Store each review as a flat list of whitespace-separated tokens.
            self.sentences.append(line[2].split())
            self.labels.append(int(line[1]))
        self.len = len(self.labels)

    def __getitem__(self, index):
        # Returns (list of str tokens, int label).
        return self.sentences[index], self.labels[index]

    def __len__(self):
        return self.len
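
Because each review is a token list of a different length, the default batching in DataLoader cannot stack the sentences into one tensor. A minimal sketch of one way around this, assuming a custom collate function is acceptable: the `samples` list is a hypothetical stand-in for the dataset, and `collate_tokens` is an illustrative name, not part of the original notes.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical stand-in for the dataset: (token list, label) pairs of different lengths.
samples = [
    (["a", "great", "movie"], 1),
    (["boring"], 0),
    (["not", "bad", "at", "all"], 1),
]


def collate_tokens(batch):
    """Keep the token lists as plain Python lists; only the labels become a tensor."""
    sentences, labels = zip(*batch)
    return list(sentences), torch.tensor(labels)


# A plain list works as a map-style dataset because it has __getitem__ and __len__.
loader = DataLoader(samples, batch_size=2, shuffle=False, collate_fn=collate_tokens)

batches = list(loader)
first_sentences, first_labels = batches[0]
print(first_sentences, first_labels)
```

The tokens stay as strings here, so they still have to be mapped to integer ids before they can go into a tensor.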

3-3_________

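1. It turns out that torch.tensor cannot take str values; it only accepts numeric data (integers, floats, booleans). Passing strings raises an error: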

torch.tensor(['1'])

  File "<ipython-input-14-602ae6edb6c1>", line 1, in <module>
    torch.tensor(['1'])
ValueError: too many dimensions 'str'
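
Since the tensor has to hold numbers, the usual workaround is to map each token to an integer id first. The following is a rough sketch under assumptions: the helper names `build_vocab` and `encode` and the `<pad>`/`<unk>` conventions are illustrative choices, not something taken from these notes.

```python
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences):
    """Assign an integer id to every distinct token; ids 0 and 1 are reserved."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in sentences:
        for tok in tokens:
            # setdefault only inserts when the token is new, so ids stay stable.
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(sentences, vocab):
    """Turn token lists into equal-length lists of ids, padding on the right."""
    max_len = max(len(tokens) for tokens in sentences)
    ids = []
    for tokens in sentences:
        row = [vocab.get(tok, vocab[UNK]) for tok in tokens]
        row += [vocab[PAD]] * (max_len - len(row))
        ids.append(row)
    return ids


sentences = [["a", "great", "movie"], ["boring"]]
vocab = build_vocab(sentences)
ids = encode(sentences, vocab)
print(ids)  # [[2, 3, 4], [5, 0, 0]]
```

The result is a rectangular list of ints, which is a form that torch.tensor(ids) accepts.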

2. The final hidden output of an RNN has size (num_layers * num_directions, batch_size, hidden_size). Indexing an array of that shape with [-1] behaves like this:

a=np.random.randn(2,3,4)
a
Out[18]: 
array([[[-0.66829249, -0.76499464, -0.74026101, -0.48124549],
        [-2.178205  ,  1.60187794, -0.30890059,  0.33686713],
        [ 2.05912833,  0.37473968, -0.56136807, -0.72185835]],
       [[-0.91576137, -0.86514116,  0.03120998,  0.23659517],
        [-2.83921972,  2.10052047, -0.61340947,  1.20655193],
        [ 0.60929364,  1.22116812,  0.39065162, -0.53161084]]])
a[-1]
Out[19]: 
array([[-0.91576137, -0.86514116,  0.03120998,  0.23659517],
       [-2.83921972,  2.10052047, -0.61340947,  1.20655193],
       [ 0.60929364,  1.22116812,  0.39065162, -0.53161084]])

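Indexing with [-1] selects the last slice along the first axis, which is num_layers * num_directions, not the batch axis. The result has shape (batch_size, hidden_size): for every sequence in the batch it holds the final hidden state of the last layer (in a bidirectional RNN, the last layer's backward direction).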
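
To check these shapes against a real module, here is a small sketch. It assumes a 2-layer unidirectional nn.GRU with the default (seq_len, batch, input_size) layout; the notes do not say which RNN type was used, so the GRU and all sizes are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 5, 3, 8, 4

# 2 layers, unidirectional: the first axis of h_n has size 2 * 1 = 2.
rnn = nn.GRU(input_size, hidden_size, num_layers=2)
x = torch.randn(seq_len, batch_size, input_size)

output, h_n = rnn(x)
print(h_n.shape)     # (num_layers * num_directions, batch_size, hidden_size)

last_layer = h_n[-1]  # one row per sequence in the batch
print(last_layer.shape)

# For a unidirectional RNN, the last time step of `output` (which comes from
# the top layer) matches the last layer's final hidden state.
same = torch.allclose(output[-1], last_layer)
print(same)
```

With bidirectional=True the first axis would double, and h_n[-1] would be the backward direction of the last layer, so the comparison with output[-1] would no longer apply directly.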

Original post: https://www.cnblogs.com/BlueBlueSea/p/12392881.html