Introduction to WordNet

Interface

from nltk.corpus import wordnet as wn  # use NLTK's WordNet interface

wn.synsets('dog')  # look up synsets (a synset name has the form lemma.POS.number and denotes one sense); note the difference: synsets() returns a list, synset() returns a single Synset object

  >> [Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'),

    Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
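The `lemma.POS.number` naming scheme can be taken apart with plain string handling. The following is a minimal sketch: `parse_synset_name` and `SynsetName` are hypothetical names, not part of NLTK, and the code assumes every name has exactly the three-part form seen in the output above.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    lemma: str   # head lemma, e.g. 'dog'
    pos: str     # one-letter POS tag, e.g. 'n'
    number: int  # sense number for that lemma and POS, e.g. 1


def parse_synset_name(name: str) -> SynsetName:
    """Split a name such as 'dog.n.01' into its three parts.

    Hypothetical helper (not an NLTK function). Splitting from the right
    keeps a lemma intact even if the lemma itself contains a dot.
    """
    lemma, pos, number = name.rsplit(".", 2)
    return SynsetName(lemma, pos, int(number))


print(parse_synset_name("dog.n.01"))
print(parse_synset_name("chase.v.01"))
```

Splitting with `rsplit` rather than `split` is a deliberate choice here, since only the last two fields have a fixed shape.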

wn.synset_from_pos_and_offset('n', 4543158)  # look up by POS and offset number; returns one synset (older NLTK versions exposed this only as _synset_from_pos_and_offset)

  >> Synset('wagon.n.01')

wn.synset('dog.n.01').lemma_names()  # return all lemma names of a synset

  >> ['dog', 'domestic_dog', 'Canis_familiaris']

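Relationship between lemmas and synsets: a synset groups one or more lemmas that share a meaning, and the same word can appear as a lemma in several synsets.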
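The two-way relationship can be explored by collecting, for one word, every synset it belongs to along with that synset's lemma names. This is only a sketch: `senses_of` and `load_wordnet` are hypothetical helpers, and the reader object is passed in as an argument on the assumption that it behaves like `nltk.corpus.wordnet` (`synsets()`, `name()`, `lemma_names()`).

```python
from typing import Dict, List


def senses_of(reader, word: str) -> Dict[str, List[str]]:
    """Map each synset name that contains `word` to that synset's lemma names.

    Hypothetical helper. `reader` is assumed to behave like
    `nltk.corpus.wordnet`: `reader.synsets(word)` returns Synset objects
    that expose `name()` and `lemma_names()`.
    """
    return {s.name(): list(s.lemma_names()) for s in reader.synsets(word)}


def load_wordnet():
    """Import NLTK lazily so `senses_of` stays usable without it."""
    from nltk.corpus import wordnet as wn  # needs nltk.download('wordnet')
    return wn
```

With the WordNet corpus installed, `senses_of(load_wordnet(), 'dog')` should return one entry per synset listed for 'dog' earlier, each with its own lemma names.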

  

Number of synsets:

  total:117659

  noun:82115

  verb:13767

  adjective:18156 (adjective satellites included)

  adverb:3621 (the remainder of the total)

  ( POS constants in NLTK: ADJ, ADJ_SAT, ADV, NOUN, VERB = 'a', 's', 'r', 'n', 'v' )
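Counts like these can be reproduced by tallying the POS tag of every synset. The sketch below is an assumption about how such a count could be done, not necessarily how the figures above were obtained; `count_synsets_by_pos` and `adjective_total` are hypothetical helpers, and the reader is assumed to behave like `nltk.corpus.wordnet`.

```python
from collections import Counter


def count_synsets_by_pos(reader) -> Counter:
    """Tally synsets by the POS tag each one reports.

    `reader` is assumed to behave like `nltk.corpus.wordnet`:
    `all_synsets()` yields objects with a `pos()` method. Adjective
    satellites report 's', so they are tallied separately from 'a'.
    """
    return Counter(s.pos() for s in reader.all_synsets())


def adjective_total(counts: Counter) -> int:
    """Adjectives plus adjective satellites, matching a combined adjective figure."""
    return counts["a"] + counts["s"]
```

With NLTK and the WordNet corpus installed, `count_synsets_by_pos(wn)` can be compared against the numbers listed above; the exact values depend on which WordNet version is installed.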

Relations between synsets (pair counts were measured over noun + verb + adjective synsets):

  • hypernyms, instance_hypernyms: 89089 pairs
  • hyponyms, instance_hyponyms (the inverse of hypernyms)
  • member_holonyms, substance_holonyms, part_holonyms: 12293, 797, and 9097 pairs respectively
  • member_meronyms, substance_meronyms, part_meronyms (the inverse of holonyms)
  • attributes: 1278 pairs
  • entailments: 408 pairs
  • causes: 220 pairs
  • also_sees
  • verb_groups
  • similar_tos
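Pair counts for the relations listed above can be obtained by summing the length of each relation's result over all synsets. This is a hedged sketch rather than the procedure actually used for the figures: `count_relation_pairs` and `has_inverse_link` are hypothetical helpers, and each relation name is assumed to be a zero-argument method that returns a list, as `hypernyms()` or `part_holonyms()` do on NLTK Synset objects.

```python
from typing import Iterable, Sequence


def count_relation_pairs(synsets: Iterable, relations: Sequence[str]) -> int:
    """Count (source, target) pairs over the named relation methods.

    Hypothetical helper: every name in `relations` is looked up as a
    zero-argument method on each synset and must return a list.
    """
    total = 0
    for synset in synsets:
        for relation in relations:
            total += len(getattr(synset, relation)())
    return total


def has_inverse_link(source, forward: str, backward: str) -> bool:
    """True if every target of `forward` points back at `source` via `backward`."""
    targets = getattr(source, forward)()
    return all(source in getattr(target, backward)() for target in targets)
```

For example, `count_relation_pairs(wn.all_synsets(), ['hypernyms', 'instance_hypernyms'])` would give the combined hypernym count for whatever WordNet version is installed, and `has_inverse_link(wn.synset('dog.n.01'), 'hypernyms', 'hyponyms')` checks the inverse relationship noted in the list.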

References:

http://www.nltk.org/howto/wordnet.html

http://www.nltk.org/index.html

http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html (source code of the WordNet reader in NLTK)

Original post: https://www.cnblogs.com/sbj123456789/p/9673829.html