I just downloaded the word-frequency list for the British National Corpus (ftp://ftp.itri.bton.ac.uk/bnc/all.num.gz), in search of an easy way to divide my 3,650-word jotto dictionary into reasonable and rare sections.
Downloaded file big! Uncompress, examine, awk out frequency and word columns, grep out the 5-letter words, word-count. Over 56,000 words -- aaaah! (I did in fact shout "aaaah" at this point.)
Examine tail of file to see what I'm dealing with, and see:
[11:46pm] [bang] ~/jotto/dicts >tail all.num.tmp
1 aamer
1 aalto
1 aalge
1 aahed
1 aabcc
1 aabba
1 aaahs
1 aaahh
1 aaaaw
1 aaaah
(Well, I thought it was funny. :-P)
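For the curious, the awk/grep/word-count step went roughly like this. It's a sketch from memory, not the exact commands I typed: it assumes the uncompressed list is whitespace-separated with the frequency in column 1 and the word in column 2, `five_letter_words` is a name I made up for illustration, and the sample lines are invented stand-ins for the real data.

```shell
# Hypothetical helper: keep the frequency and word columns, then keep
# only lines whose word is exactly five lowercase letters.
# Assumption: frequency is column 1 and the word is column 2.
five_letter_words() {
    awk '{ print $1, $2 }' | grep -E '^[0-9]+ [a-z]{5}$'
}

# Made-up sample standing in for the real (much bigger) list.
sample='120 about prp 10
7 jotto nn1 2
3 cat nn1 1
1 aaaah itj 1'

printf '%s\n' "$sample" | five_letter_words           # the surviving lines
printf '%s\n' "$sample" | five_letter_words | wc -l   # how many survived

# On the real file this would be something like:
#   gunzip -c all.num.gz | five_letter_words > all.num.tmp
```

Anchoring the pattern with `^` and `$` is what keeps six-letter-and-up words from sneaking through on a partial match.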