Computing node similarity in Python with the DeepWalk model

Draft notes, still to be tidied up.
GitHub: https://github.com/prateekjoshi565/DeepWalk
Method:
https://blog.csdn.net/gdh756462786/article/details/79108665/

I. Installing the dependencies straight from requirements.txt causes a problem:

ImportError: cannot import name 'Vocab' from 'gensim.models.word2vec' 

The fix is to change the gensim version to 3.8.3.
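The error is consistent with gensim 4.x no longer exposing Vocab in gensim.models.word2vec, which is why pinning a 3.x release (for example with pip install gensim==3.8.3) helps. A minimal sketch for checking which gensim is installed before running DeepWalk; the helper names is_gensim_3x and installed_gensim_version are made up for this note:

```python
from importlib import metadata


def is_gensim_3x(version: str) -> bool:
    """Return True when a version string such as '3.8.3' has major version 3."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) == 3


def installed_gensim_version():
    """Return the installed gensim version string, or None if gensim is absent."""
    try:
        return metadata.version("gensim")
    except metadata.PackageNotFoundError:
        return None


found = installed_gensim_version()
if found is None:
    print("gensim is not installed")
elif is_gensim_3x(found):
    print(f"gensim {found} is a 3.x release")
else:
    print(f"gensim {found} found; consider pinning gensim==3.8.3")
```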

 

II. Detailed steps

Download the source code
https://github.com/phanein/deepwalk

Dataset definition
http://leitang.net/social_dimension.html

Core code (size is the gensim 3.x parameter name; gensim 4.x renamed it to vector_size)

walks = graph.build_deepwalk_corpus(G, num_paths=args.number_walks, path_length=args.walk_length, alpha=0, rand=random.Random(args.seed))

print("Training...")

model = Word2Vec(walks, size=args.representation_size, window=args.window_size, min_count=0, workers=args.workers)
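To make the first line of the core code more concrete, here is a simplified stand-in for the walk-generation step. It is not the library's build_deepwalk_corpus, only a sketch of the same idea under the assumption that alpha=0 means the walks never restart: every node starts num_paths uniform random walks of at most path_length nodes, and the resulting lists of node ids are what Word2Vec treats as sentences. The function names and the toy graph are invented for illustration.

```python
import random


def random_walk(adj, start, path_length, rand):
    """Walk from `start`, moving to a uniformly random neighbour each step."""
    walk = [start]
    while len(walk) < path_length:
        neighbours = adj[walk[-1]]
        if not neighbours:  # dead end: stop the walk early
            break
        walk.append(rand.choice(neighbours))
    # Node ids are emitted as strings so they can serve as word2vec-style tokens.
    return [str(node) for node in walk]


def build_corpus(adj, num_paths, path_length, rand):
    """Start `num_paths` walks from every node, visiting nodes in shuffled order."""
    nodes = list(adj)
    walks = []
    for _ in range(num_paths):
        rand.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, path_length, rand))
    return walks


# Toy undirected graph (hypothetical data): the 4-cycle 0-1-2-3-0.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
walks = build_corpus(adj, num_paths=2, path_length=5, rand=random.Random(0))
print(len(walks), walks[0])
```

Nodes that share many walks end up with similar vectors after training, which is what makes the embeddings usable for node similarity.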


Installation

cd deepwalk-master
pip install -r requirements.txt
python setup.py install


Reproducing the experimental results
1. BlogCatalog dataset

Generate the embeddings

deepwalk --format mat --input example_graphs/blogcatalog.mat --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/blogcatalog.embeddings
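Once an embeddings file exists, node similarity (the goal in the title) can be read off the vectors. The sketch below assumes the output is in word2vec text format, that is, a header line with the node count and the dimension followed by one "node v1 ... vD" line per node, and ranks nodes by cosine similarity. The sample rows are invented 2-dimensional vectors, not real DeepWalk output, and the helper names are hypothetical.

```python
import math


def parse_embeddings(lines):
    """Parse word2vec text format: a 'count dim' header, then 'node v1 ... vD' rows."""
    rows = iter(lines)
    count, dim = (int(x) for x in next(rows).split())
    vectors = {}
    for line in rows:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {parts[0]} has {len(values)} values, expected {dim}")
        vectors[parts[0]] = values
    if len(vectors) != count:
        raise ValueError(f"header announced {count} nodes, found {len(vectors)}")
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, node, topn=3):
    """Return the `topn` other nodes ranked by cosine similarity to `node`."""
    scores = [
        (other, cosine(vectors[node], vec))
        for other, vec in vectors.items()
        if other != node
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical embeddings for three nodes, inlined instead of read from disk.
sample = ["3 2", "1 1.0 0.0", "2 0.9 0.1", "3 0.0 1.0"]
vectors = parse_embeddings(sample)
print(most_similar(vectors, "1", topn=2))
```

For a real run, pass the lines of the generated file (for example open("example_graphs/blogcatalog.embeddings")) to parse_embeddings. With gensim installed, KeyedVectors.load_word2vec_format followed by most_similar should give the same kind of ranking.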


Evaluation

python example_graphs/scoring.py --emb example_graphs/blogcatalog.embeddings --network example_graphs/blogcatalog.mat --num-shuffles 10 --all


2. Karate dataset

Generate the embeddings

--format defaults to adjlist, so a .adjlist file can be passed without the --format flag.

deepwalk --input example_graphs/karate.adjlist --max-memory-data-size 0 --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/karate.embeddings
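To run the same command on your own graph, the input has to be an adjacency list. The sketch below assumes the layout is one line per node, with the node id followed by its neighbours separated by spaces, and converts an undirected edge list into such lines. The function name and the toy edges are made up for illustration.

```python
from collections import defaultdict


def edges_to_adjlist(edges):
    """Turn undirected (u, v) pairs into adjacency-list lines: 'node nbr1 nbr2 ...'."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return [
        " ".join(str(n) for n in [node] + sorted(neighbours[node]))
        for node in sorted(neighbours)
    ]


# Hypothetical toy graph: a triangle 1-2-3 plus a pendant node 4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
lines = edges_to_adjlist(edges)
print("\n".join(lines))
```

Writing these lines to a file such as my_graph.adjlist (a placeholder name) gives something that can be passed to --input in place of karate.adjlist.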


Evaluation

--network requires a .mat file.

The options are as follows:

usage: scoring [-h] --emb EMB --network NETWORK
               [--adj-matrix-name ADJ_MATRIX_NAME]
               [--label-matrix-name LABEL_MATRIX_NAME]
               [--num-shuffles NUM_SHUFFLES] [--all]

optional arguments:
  -h, --help            show this help message and exit
  --emb EMB             Embeddings file (default: None)
  --network NETWORK     A .mat file containing the adjacency matrix and node
                        labels of the input network. (default: None)
  --adj-matrix-name ADJ_MATRIX_NAME
                        Variable name of the adjacency matrix inside the .mat
                        file. (default: network)
  --label-matrix-name LABEL_MATRIX_NAME
                        Variable name of the labels matrix inside the .mat
                        file. (default: group)
  --num-shuffles NUM_SHUFFLES
                        Number of shuffles. (default: 2)
  --all                 The embeddings are evaluated on all training percents
                        from 10 to 90 when this flag is set to true. By
                        default, only training percents of 10, 50 and 90 are
                        used. (default: False)
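The --num-shuffles and --all options describe an evaluation loop: the labelled nodes are shuffled several times, and for each shuffle a classifier is trained on a given percentage of them and tested on the rest. The following is only a rough sketch of how such splits could be generated, not the code of scoring.py; the function names, the seed parameter and the 20-node example are assumptions.

```python
import random


def training_percents(all_percents=False):
    """Training percentages: 10..90 in steps of 10 with --all, else 10, 50 and 90."""
    if all_percents:
        return list(range(10, 100, 10))
    return [10, 50, 90]


def shuffled_splits(node_ids, num_shuffles, all_percents=False, seed=0):
    """Yield (percent, train_ids, test_ids) for every shuffle and training percent."""
    rand = random.Random(seed)
    for _ in range(num_shuffles):
        order = list(node_ids)
        rand.shuffle(order)
        for percent in training_percents(all_percents):
            cut = percent * len(order) // 100  # integer maths avoids float rounding
            yield percent, order[:cut], order[cut:]


# Hypothetical example: 20 node ids, 2 shuffles, default percentages.
splits = list(shuffled_splits(range(20), num_shuffles=2))
for percent, train, test in splits:
    print(percent, len(train), len(test))
```

Averaging the scores over the shuffles for each percentage is what makes the reported numbers less sensitive to one lucky or unlucky split.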





Reference: https://blog.csdn.net/YizhuJiao/article/details/81095346

GitHub: https://github.com/phanein/deepwalk

Original article: https://www.cnblogs.com/StarZhai/p/15545387.html