Food Log with Speech Recognition and NLP

1. Word segmentation

For Chinese text, jieba is a widely used word segmentation library.
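As a minimal sketch of this step, the Python helper below wraps jieba's `lcut` function, which returns the segmented tokens as a list. The helper name `segment` and its injectable `cut` parameter are illustrative assumptions and not part of the original workflow.

```python
from typing import Callable, List, Optional


def segment(sentence: str,
            cut: Optional[Callable[[str], List[str]]] = None) -> List[str]:
    """Split a sentence into tokens, dropping whitespace-only tokens.

    `cut` is any function that maps a string to a list of tokens.
    When it is omitted, jieba.lcut is used (requires `pip install jieba`).
    """
    if cut is None:
        # Third-party dependency, imported lazily so that the helper
        # can also be used with a different segmenter.
        import jieba
        cut = jieba.lcut
    return [token for token in cut(sentence) if token.strip()]
```

Calling `segment("我今天晚上吃了两盘回锅肉")` returns a list of tokens. The exact split depends on jieba's dictionary, so domain words such as dish names may need to be registered with `jieba.add_word` first.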

2. Named Entity Recognition

  1. Train your own model

      

The Stanford NER FAQ covers this under "How can I train my own NER model?":

https://nlp.stanford.edu/software/crf-faq.html#a
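Training is driven by a properties file passed with `-prop`. As a sketch, the Python snippet below renders such a file from the property values that CRFClassifier echoes in the training log in this post. The names `NER_PROPERTIES`, `render_properties` and `write_prop_file` are hypothetical, and the actual prop file used for the run is not shown in the post, so treat this as a reconstruction from the log.

```python
from pathlib import Path
from typing import Dict

# Values copied from the properties echoed in the training log.
NER_PROPERTIES: Dict[str, str] = {
    "trainFile": "chinese.meal.fpp.tsv",
    "serializeTo": "ner-model.ser.gz",
    "map": "word=0,answer=1",  # column 0 is the word, column 1 is the label
    "useClassFeature": "true",
    "useWord": "true",
    "useNGrams": "true",
    "noMidNGrams": "true",
    "maxNGramLeng": "6",
    "usePrev": "true",
    "useNext": "true",
    "useSequences": "true",
    "usePrevSequences": "true",
    "maxLeft": "1",
    "useTypeSeqs": "true",
    "useTypeSeqs2": "true",
    "useTypeySequences": "true",
    "wordShape": "chris2useLC",
    "useDisjunctive": "true",
}


def render_properties(properties: Dict[str, str]) -> str:
    """Render key=value lines, one property per line."""
    return "".join(f"{key}={value}\n" for key, value in properties.items())


def write_prop_file(path: str, properties: Dict[str, str]) -> None:
    """Write the rendered properties to `path` (all values here are ASCII)."""
    Path(path).write_text(render_properties(properties), encoding="utf-8")
```

For example, `write_prop_file("chinese.meal.fpp.prop", NER_PROPERTIES)` would produce a file with the name used in the training command.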

C:my_studyMLNLPstanford-ner-2018-02-27>java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop chinese.meal.fpp.prop
Invoked on Thu Mar 22 16:34:06 CST 2018 with arguments: -prop chinese.meal.fpp.prop
usePrevSequences=true
useClassFeature=true
useTypeSeqs2=true
useSequences=true
wordShape=chris2useLC
useTypeySequences=true
useDisjunctive=true
noMidNGrams=true
serializeTo=ner-model.ser.gz
maxNGramLeng=6
useNGrams=true
usePrev=true
useNext=true
maxLeft=1
trainFile=chinese.meal.fpp.tsv
map=word=0,answer=1
useWord=true
useTypeSeqs=true
numFeatures = 564
Time to convert docs to feature indices: 0.0 seconds
numClasses: 5 [0=O,1=TIME,2=QUANTITY,3=UNIT,4=FOOD]
numDocuments: 1
numDatums: 56
numFeatures: 564
Time to convert docs to data/labels: 0.0 seconds
numWeights: 6460
QNMinimizer called on double function of 6460 variables, using M = 25.
               An explanation of the output:
Iter           The number of iterations
evals          The number of function evaluations
SCALING        <D> Diagonal scaling was used; <I> Scaled Identity
LINESEARCH     [## M steplength]  Minpack linesearch
                   1-Function value was too high
                   2-Value ok, gradient positive, positive curvature
                   3-Value ok, gradient negative, positive curvature
                   4-Value ok, gradient negative, negative curvature
               [.. B]  Backtracking
VALUE          The current function value
TIME           Total elapsed time
|GNORM|        The current norm of the gradient
{RELNORM}      The ratio of the current to initial gradient norms
AVEIMPROVE     The average improvement / current value
EVALSCORE      The last available eval score

Iter ## evals ## <SCALING> [LINESEARCH] VALUE TIME |GNORM| {RELNORM} AVEIMPROVE EVALSCORE

Iter 1 evals 1 <D> [M 1.000E-1] 9.068E2 0.04s |4.550E1| {4.995E-1} 0.000E0 -
Iter 2 evals 2 <D> [M 1.000E0] 6.222E2 0.05s |3.525E1| {3.870E-1} 2.287E-1 -
Iter 3 evals 3 <D> [M 1.000E0] 2.386E2 0.07s |5.406E1| {5.935E-1} 9.334E-1 -
Iter 4 evals 4 <D> [M 1.000E0] 9.082E1 0.08s |1.571E1| {1.724E-1} 2.246E0 -
Iter 5 evals 5 <D> [M 1.000E0] 7.031E1 0.10s |1.181E1| {1.297E-1} 2.379E0 -
Iter 6 evals 6 <D> [M 1.000E0] 5.308E1 0.11s |1.025E1| {1.125E-1} 2.681E0 -
Iter 7 evals 7 <D> [1M 2.740E-1] 2.988E1 0.14s |7.586E0| {8.328E-2} 4.193E0 -
Iter 8 evals 9 <D> [1M 1.292E-1] 2.234E1 0.16s |6.471E0| {7.105E-2} 4.949E0 -
Iter 9 evals 11 <D> [1M 1.801E-1] 1.615E1 0.18s |5.573E0| {6.118E-2} 6.127E0 -
Iter 10 evals 13 <D> [1M 1.815E-1] 1.218E1 0.24s |4.477E0| {4.915E-2} 7.346E0 -
Iter 11 evals 15 <D> [1M 3.119E-1] 8.873E0 0.30s |4.694E0| {5.154E-2} 6.912E0 -
Iter 12 evals 17 <D> [1M 4.760E-1] 6.621E0 0.31s |2.092E0| {2.296E-2} 3.504E0 -
Iter 13 evals 19 <D> [M 1.000E0] 6.093E0 0.32s |1.906E0| {2.092E-2} 1.390E0 -
Iter 14 evals 20 <D> [M 1.000E0] 5.844E0 0.33s |9.067E-1| {9.955E-3} 1.103E0 -
Iter 15 evals 21 <D> [M 1.000E0] 5.721E0 0.33s |5.774E-1| {6.339E-3} 8.279E-1 -
Iter 16 evals 22 <D> [M 1.000E0] 5.660E0 0.34s |3.535E-1| {3.881E-3} 4.279E-1 -
Iter 17 evals 23 <D> [M 1.000E0] 5.640E0 0.35s |1.946E-1| {2.137E-3} 2.961E-1 -
Iter 18 evals 24 <D> [M 1.000E0] 5.632E0 0.36s |7.832E-2| {8.599E-4} 1.868E-1 -
Iter 19 evals 25 <D> [M 1.000E0] 5.631E0 0.38s |3.559E-2| {3.907E-4} 1.163E-1 -
Iter 20 evals 26 <D> [M 1.000E0] 5.631E0 0.39s |2.149E-2| {2.359E-4} 5.758E-2 -
Iter 21 evals 27 <D> [M 1.000E0] 5.631E0 0.41s |1.027E-2| {1.128E-4} 1.758E-2 -
Iter 22 evals 28 <D> [M 1.000E0] 5.631E0 0.42s |3.631E-3| {3.986E-5} 8.218E-3 -
Iter 23 evals 29 <D> [M 1.000E0] 5.631E0 0.44s |1.629E-3| {1.789E-5} 3.791E-3 -
Iter 24 evals 30 <D> [M 1.000E0] 5.631E0 0.45s |9.548E-4| {1.048E-5} 1.596E-3 -
Iter 25 evals 31 <D> [M 1.000E0] 5.631E0 0.45s |5.724E-4| {6.284E-6} 5.196E-4 -
Iter 26 evals 32 <D> [M 1.000E0] 5.631E0 0.47s |1.578E-4| {1.732E-6} 1.686E-4 -
QNMinimizer terminated due to average improvement: | newest_val - previous_val | / |newestVal| < TOL
Total time spent in optimization: 0.49s
CRFClassifier training ... done [0.6 sec].
Serializing classifier to ner-model.ser.gz... done.
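
The `map=word=0,answer=1` property in the log above means the training file holds the token in column 0 and its label in column 1, separated by a tab. The sketch below renders labeled tokens in that layout. The helper name `to_tsv` is hypothetical, the blank line between sentences is an assumption about the column format, and the example sentence is taken from the evaluation output in this post rather than from the actual training file, which the post does not show.

```python
from typing import Iterable, List, Tuple

# Classes reported in the training log (numClasses: 5).
LABELS = {"O", "TIME", "QUANTITY", "UNIT", "FOOD"}


def to_tsv(sentences: Iterable[List[Tuple[str, str]]]) -> str:
    """Render sentences as word<TAB>label lines, blank line between sentences."""
    blocks = []
    for sentence in sentences:
        lines = []
        for word, label in sentence:
            if label not in LABELS:
                raise ValueError(f"unknown label: {label!r}")
            lines.append(f"{word}\t{label}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# One labeled sentence: "I ate two plates of twice-cooked pork this evening."
example = [
    ("我", "O"), ("今天", "O"), ("晚上", "TIME"), ("吃", "O"), ("了", "O"),
    ("两", "QUANTITY"), ("盘", "UNIT"), ("回锅肉", "FOOD"),
]
tsv_text = to_tsv([example])
```

Writing `tsv_text` to a UTF-8 file gives one token per line, which is the shape the `trainFile` and `-testFile` arguments expect under this column mapping.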

  2. Evaluate the trained model on a test file to see how well it performs.

C:my_studyMLNLPstanford-ner-2018-02-27>java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
Invoked on Thu Mar 22 16:30:48 CST 2018 with arguments: -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
testFile=chinese.meal.fpp.test.tsv
loadClassifier=ner-model.ser.gz
Loading classifier from ner-model.ser.gz ... done [0.1 sec].
我      O       O
今天    O       O
晚上    TIME    TIME
吃      O       O
了      O       O
两      QUANTITY        QUANTITY
盘      UNIT    UNIT
回锅肉  FOOD    FOOD

CRFClassifier tagged 8 words in 1 documents at 88.89 words per second.
         Entity P       R       F1      TP      FP      FN
           FOOD 1.0000  1.0000  1.0000  1       0       0
       QUANTITY 1.0000  1.0000  1.0000  1       0       0
           TIME 1.0000  1.0000  1.0000  1       0       0
           UNIT 1.0000  1.0000  1.0000  1       0       0
         Totals 1.0000  1.0000  1.0000  4       0       0

Not bad! All four entities in the test sentence are tagged correctly.
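
The scores in the table follow from the TP, FP and FN counts: P = TP / (TP + FP), R = TP / (TP + FN), and F1 = 2PR / (P + R). The sketch below recomputes them from the three-column tagged output (word, gold label, predicted label). It treats each maximal run of identical non-O labels as one entity, which is an assumption about how entities are counted, and the function names `entities` and `score` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, label)


def entities(labels: List[str]) -> Set[Span]:
    """Return spans for maximal runs of the same non-O label."""
    spans: Set[Span] = set()
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != "O":
                spans.add((start, i, labels[start]))
            start = i
    return spans


def score(rows: List[Tuple[str, str, str]]) -> Dict[str, Tuple[float, float, float]]:
    """Compute (precision, recall, F1) per label from (word, gold, predicted) rows."""
    gold = entities([g for _, g, _ in rows])
    pred = entities([p for _, _, p in rows])
    tp = Counter(label for _, _, label in gold & pred)
    fp = Counter(label for _, _, label in pred - gold)
    fn = Counter(label for _, _, label in gold - pred)
    result = {}
    for label in sorted({label for _, _, label in gold | pred}):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        result[label] = (p, r, f1)
    return result


# Rows copied from the tagged output above: word, gold label, predicted label.
rows = [
    ("我", "O", "O"), ("今天", "O", "O"), ("晚上", "TIME", "TIME"),
    ("吃", "O", "O"), ("了", "O", "O"), ("两", "QUANTITY", "QUANTITY"),
    ("盘", "UNIT", "UNIT"), ("回锅肉", "FOOD", "FOOD"),
]
scores = score(rows)
```

On these rows every label scores 1.0 for precision, recall and F1, matching the table.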

Ref:

1. Stanford NLP NER: https://nlp.stanford.edu/software/CRF-NER.html

Please credit the source when reposting: http://www.cnblogs.com/mashuai-191/
Original article: https://www.cnblogs.com/mashuai-191/p/8621413.html