[Python Study Notes-007] Checking English Words with PyEnchant

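I have recently been teaching my son phonics, and we played a word game: use simple enumeration to find two-letter words suitable for young children to learn. A manual search inevitably misses some, so here is a simple script that uses PyEnchant.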

01 - foo.py

#!/usr/bin/python3
"""
    A simple script to check whether a string is an English word

    1. download PyEnchant from https://pypi.org/project/pyenchant/
    2. save pyenchant-2.0.0.tar.gz to /tmp
    3. tar zxf pyenchant-2.0.0.tar.gz
    4. export PYTHONPATH=/tmp/pyenchant-2.0.0:$PYTHONPATH
    5. ./foo.py <string>
"""

import sys
import enchant


def is_english_word(word):
    # Look the word up in the American English dictionary.
    d_en = enchant.Dict("en_US")
    return d_en.check(word)


def get_alphabet():
    # Return the lowercase letters 'a' through 'z' as a list.
    l_alph = []
    for i in range(26):
        l_alph.append(chr(ord('a') + i))
    return l_alph


def main(argc, argv):
    if argc != 2:
        sys.stderr.write("Usage: %s <string>\n" % argv[0])
        return 1

    char_in = argv[1]

    # Pass 1: prefix + letter, as typed (lowercase in the tests below).
    l_word1 = []
    l_alph = get_alphabet()
    for char in l_alph:
        word = char_in + char
        if is_english_word(word):
            l_word1.append(word)
    print(l_word1)

    # Pass 2: the uppercase form, skipping words already found in pass 1.
    l_word2 = []
    for char in l_alph:
        word = char_in + char
        word = word.upper()
        if is_english_word(word):
            if word.lower() in l_word1:
                continue
            l_word2.append(word)
    print(l_word2)
    return 0

if __name__ == '__main__':
    sys.exit(main(len(sys.argv), sys.argv))

It is simple; the core code is just:

def is_english_word(word):
    d_en = enchant.Dict("en_US")
    return d_en.check(word)
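
The enumeration in main() can also be kept separate from the dictionary lookup. What follows is only a minimal sketch, not how foo.py is actually organized: extend_prefix and its is_word parameter are hypothetical names introduced here, a lowercase prefix is assumed, and the demo uses a tiny hand-picked word set so that it runs without PyEnchant. With PyEnchant installed, passing enchant.Dict("en_US").check as is_word should behave like foo.py while creating the dictionary object only once.

```python
import string


def extend_prefix(prefix, is_word):
    """Return (lowercase_hits, uppercase_only_hits) for prefix + one letter.

    is_word is any callable that takes a string and returns True or False,
    for example enchant.Dict("en_US").check (assumes PyEnchant is installed).
    """
    lower_hits = []
    upper_hits = []
    for letter in string.ascii_lowercase:
        word = prefix + letter
        if is_word(word):
            lower_hits.append(word)
        elif is_word(word.upper()):
            # Reported only when the lowercase form was rejected,
            # mirroring the "continue" in foo.py.
            upper_hits.append(word.upper())
    return lower_hits, upper_hits


if __name__ == "__main__":
    # Stand-in checker backed by a tiny hypothetical word set.
    demo_words = {"be", "by", "BC"}
    print(extend_prefix("b", demo_words.__contains__))
```

Taking the checker as a parameter also makes it easy to swap in another word source, such as a plain word-list file.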

02 - Testing foo.py

kaiba$ ./foo.py 'a'
['ab', 'ac', 'ad', 'ah', 'am', 'an', 'as', 'at', 'av', 'aw', 'ax']
['AA', 'AF', 'AG', 'AI', 'AK', 'AL', 'AP', 'AR', 'AU', 'AZ']
kaiba$ ./foo.py 'b'
['be', 'bf', 'bi', 'bk', 'bl', 'bu', 'bx', 'by']
['BA', 'BB', 'BC', 'BM', 'BO', 'BP', 'BR', 'BS']
kaiba$ ./foo.py 'be'
['bed', 'bee', 'beg', 'bet', 'bey']
['BEN']
kaiba$ ./foo.py 't'
['ta', 'ti', 'tn', 'to', 'tr', 'ts']
['TB', 'TC', 'TD', 'TE', 'TH', 'TL', 'TM', 'TU', 'TV', 'TX', 'TY']
kaiba$ ./foo.py 'tea'
['teak', 'teal', 'team', 'tear', 'teas', 'teat']
[]

Appendix - foo.sh (grep /usr/share/dict/words directly)

#!/bin/bash

function is_english_word
{
    typeset word=${1?"*** str, e.g. a"}
    # -x matches the whole line only; -q prints nothing and sets the exit status
    grep -Eqx -- "$word" /usr/share/dict/words 2> /dev/null
}

(( $# != 1 )) && echo "Usage: $0 <str prefix>" >&2 && exit 1
str_prefix=$1

lwords=""
uwords=""
for c in {a..z}; do
    # typeset -l / -u convert the value to lower / upper case on assignment
    typeset -l lword=$str_prefix$c
    typeset -u uword=$lword
    is_english_word "$lword" && lwords+="$lword "
    is_english_word "$uword" && uwords+="$uword "
done

# Unquoted echo trims the trailing space
lwords=$(echo $lwords)
uwords=$(echo $uwords)
rc=1
[[ -n $lwords ]] && echo $lwords && rc=0
[[ -n $uwords ]] && echo $uwords && rc=0
exit $rc
  • Running foo.sh
$ for c in {a..z}; do ./foo.sh $c; echo; done
aa ab ac ad ae af ag ah ai ak al am an ap aq ar as at av aw ax ay az
AA AB AC AD AE AF AG AH AI AJ AK AL AM AN AO AP AQ AR AS AT AU AV AW AY AZ

ba bb bd be bf bg bi bk bl bm bn bo bp br bs bt bu bv bx by bz
BA BB BC BD BE BF BG BH BI BL BM BN BO BP BR BS BT BU BV BW BX

ca cb cc cd ce cf cg ch ck cl cm co cp cq cr cs ct cu cv cy
CA CB CC CD CE CF CG CH CI CJ CL CM CN CO CP CQ CR CS CT CU CV CW CY CZ

da db dc dd de dg di dj dk dl dm dn do dp dr ds dt du dx dy dz
DA DB DC DD DE DF DG DH DI DJ DK DM DN DO DP DQ DR DS DT DU DV DW DX DZ

ea ec ed ee ef eg eh el em en eo ep eq er es et eu ew ex ey
EA EC ED EE EF EG EI EL EM EO EP EQ ER ES ET EV EW

fa fb fc fe ff fg fi fl fm fn fo fp fr fs ft fu fv fw fy fz
FA FB FC FD FE FF FI FL FM FO FP FR FS FT FV FW FX FY

ga gd ge gi gl gm gn go gp gr gs gt gu gv
GA GB GC GD GE GG GH GI GM GN GO GP GQ GR GS GT GU GW

ha hb hd he hf hg hi hl hm ho hp hq hr hs ht hv hw hy
HA HB HC HD HE HF HG HH HI HJ HK HL HM HO HP HQ HR HS HT HU HV HW HZ

ia ib ic id ie if ii ik il im in io iq ir is it iv iw ix
IA IB IC ID IE IF IG IL IM IN IO IP IQ IR IS IT IU IV IW IX

ja jg jo jr js jt
JA JC JD JI JJ JO JP JV

ka kb kc kg ki kl km kn ko kr kt kv kw ky
KB KC KD KE KG KI KN KO KP KR KS KT KV KW KY

la lb lc ld le lf lg lh li ll lm ln lo lp lr ls lt lu lv lx ly
LA LB LC LD LE LF LG LH LI LJ LL LM LO LP LR LS LT LU LV LW LZ

ma mb mc md me mf mg mh mi mk ml mm mn mo mp mr ms mt mu mv mw my
MA MB MC MD ME MF MG MH MI MJ ML MM MN MO MP MR MS MT MU MV MW MX MY

na nb nd ne ng ni nj nl nm no np nr ns nt nu nv ny
NA NB NC ND NE NF NG NH NI NJ NL NM NP NQ NS NT NU NV NW NY NZ

ob oc od oe of og oh ok ol om on op or os ot ow ox oy oz
OA OB OC OD OE OF OG OH OK OL OM ON OO OP OR OS OT OU OV OW

pa pc pd pe pf pg ph pi pk pl pm po pp pq pr ps pt pu
PA PB PC PD PE PF PG PH PI PK PL PM PN PO PP PQ PR PS PT PU PV PW PX PY

qe qh ql qm qn qp qr qs qt qu qv qy
QA QB QC QD QE QF QM QN QP QR QS QV

ra rc rd re rf rg rh rm rn ro rs rt
RA RB RC RD RE RF RH RI RJ RL RM RN RO RP RQ RR RS RT RU RV RW RX

sa sb sc sd se sf sg sh si sk sl sm sn so sp sq sr ss st su sv sw
SA SB SC SD SE SF SG SI SJ SL SM SN SO SP SR SS ST SU SV SW SX SY

ta tb tc te tg th ti tk tm tn to tp tr ts tu tv tx
TA TB TC TD TE TG TH TI TL TM TN TO TP TR TS TT TU TV TW TX

uc ug uh ui um un up ur us ut ux
UA UB UC UG UH UI UK UL UN UP UR US UT UU UV UW

va vb vc vd vg vi vl vo vp vr vs vt vv
VA VB VC VD VE VF VG VI VJ VL VM VN VO VP VR VS VT VU VV VW

wa wb wc wd we wf wg wh wi wk wl wm wo wr ws wt wy
WA WB WC WD WF WG WH WI WL WM WO WP WR WS WU WV WW WY

xc xd xi xr xs xu xw xx
XA XB XD XL XN XO XP XQ XT

ya yd ye yi ym yn yo yr ys yt
YA YB YP YT YU YV YY

za zn zo zs
ZA ZB ZD ZG ZI ZK ZT ZZ
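
foo.sh rereads /usr/share/dict/words once per candidate. As a rough Python sketch of the same lookup, the word list can be loaded into a set once and then queried. The names load_words and candidates are hypothetical, and the word-list path is an assumption (the file is missing on some systems), so the demo prints nothing when it is absent.

```python
import os
import string


def load_words(path="/usr/share/dict/words"):
    """Read one word per line into a set for fast membership tests."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return {line.strip() for line in fh if line.strip()}


def candidates(prefix, words):
    """Return (lowercase, uppercase) prefix+letter strings found in words.

    As in foo.sh, the two forms are checked independently of each other.
    """
    lower = [prefix.lower() + c for c in string.ascii_lowercase]
    lwords = [w for w in lower if w in words]
    uwords = [w.upper() for w in lower if w.upper() in words]
    return lwords, uwords


if __name__ == "__main__":
    path = "/usr/share/dict/words"  # assumed location of the word list
    if os.path.exists(path):
        lwords, uwords = candidates("a", load_words(path))
        print(" ".join(lwords))
        print(" ".join(uwords))
```

The output depends on the word list installed on the machine, so it may not match the listing above exactly.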
Original article: https://www.cnblogs.com/idorax/p/12003057.html