Kana Frequency Survey

2003/05/17

Purpose

Method

Results

Kana frequency graph

Occurrence counts

                   a       i       u       e       o      ya      yu      yo   Total
(none)          3336   16189   13789    1997    4340    1522     584    2618   44375
K               8847    4996    6537    3099    4836      77     340     643   29375
S               3696    8918    6643    2541    2684     488     717    1237   26924
T               7401    2006    4075    6378    7244     294     232     355   27985
N               5712    5503      68    1511    6875      39     372       4   20084
H               5048     900    1165     655    1351       1       7     141    9268
M               6892    1733     607    1662    3825       7      17      24   14767
R               4117    3584    3707    2890    1658      35      39     673   16703
W               2051       3               3    2713                            4770
G               4827     558     490     573    2186      20       1     351    9006
Z                629    1872     638     413     322     340     391     690    5295
D               3744       5     121    6291    2371       0       2       1   12535
B               1649     734    1090     321     436       0       8      25    4263
P                417     157     374     199     283       0       6      15    1451
Small kana       129      16      37      96      20       1       2       2     303
F                126      17              12      59       0       0       0     214
V                  1       0       0       0       1       0       0       0       2
Total          58622   47191   39341   28641   41204    2824    2718    6779  227320
                                                                               13313
                                                                                5426
                                                                                2889

Percentages

                   a       i       u       e       o      ya      yu      yo   Total
(none)         1.34%   6.50%   5.54%   0.80%   1.74%   0.61%   0.23%   1.05%  17.83%
K              3.55%   2.01%   2.63%   1.24%   1.94%   0.03%   0.14%   0.26%  11.80%
S              1.48%   3.58%   2.67%   1.02%   1.08%   0.20%   0.29%   0.50%  10.82%
T              2.97%   0.81%   1.64%   2.56%   2.91%   0.12%   0.09%   0.14%  11.24%
N              2.29%   2.21%   0.03%   0.61%   2.76%   0.02%   0.15%   0.00%   8.07%
H              2.03%   0.36%   0.47%   0.26%   0.54%   0.00%   0.00%   0.06%   3.72%
M              2.77%   0.70%   0.24%   0.67%   1.54%   0.00%   0.01%   0.01%   5.93%
R              1.65%   1.44%   1.49%   1.16%   0.67%   0.01%   0.02%   0.27%   6.71%
W              0.82%   0.00%           0.00%   1.09%                           1.92%
G              1.94%   0.22%   0.20%   0.23%   0.88%   0.01%   0.00%   0.14%   3.62%
Z              0.25%   0.75%   0.26%   0.17%   0.13%   0.14%   0.16%   0.28%   2.13%
D              1.50%   0.00%   0.05%   2.53%   0.95%   0.00%   0.00%   0.00%   5.04%
B              0.66%   0.29%   0.44%   0.13%   0.18%   0.00%   0.00%   0.01%   1.71%
P              0.17%   0.06%   0.15%   0.08%   0.11%   0.00%   0.00%   0.01%   0.58%
Small kana     0.05%   0.01%   0.01%   0.04%   0.01%   0.00%   0.00%   0.00%   0.12%
F              0.05%   0.01%           0.00%   0.02%   0.00%   0.00%   0.00%   0.09%
V              0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%
Total         23.55%  18.96%  15.80%  11.50%  16.55%   1.13%   1.09%   2.72%  91.31%
                                                                               5.35%
                                                                               2.18%
                                                                               1.16%
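
The percentages appear to be each count divided by the grand total of all counted units: the grid total of 227,320 plus the three trailing values 13,313, 5,426 and 2,889. The short Ruby check below is only a sketch under that assumption; the names GRAND_TOTAL and percentage are illustrative and do not come from the original script.

```ruby
# Assumed grand total: grid total plus the three trailing rows of the count table.
GRAND_TOTAL = 227_320 + 13_313 + 5_426 + 2_889

# Share of the grand total in percent, rounded to two decimals as in the table.
def percentage(count)
  (count * 100.0 / GRAND_TOTAL).round(2)
end

puts GRAND_TOTAL          # grand total of counted units
puts percentage(3_336)    # vowel row, column a
puts percentage(227_320)  # grid total
puts percentage(13_313)   # first trailing row
```

Under this assumption the computed values agree with the corresponding table cells (1.34%, 91.31% and 5.35%).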

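Discussion

I use Japanese input almost entirely for writing email, so these statistics come fairly close to my actual usage. Some traits typical of email text also show up.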

Analysis script

A Ruby script that extracts message bodies from a mailbox and counts kana occurrences. KAKASI and TMail are required separately.
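
The counting step can be sketched roughly as follows. This is not the original script: it assumes the message text has already been extracted and converted to kana (presumably the role of TMail and KAKASI), and the method names to_hiragana and count_kana are hypothetical. A kana followed by a small ya, yu or yo is treated as one unit, in the spirit of the ya/yu/yo columns in the tables.

```ruby
# Sketch only: assumes the input is already kana text.
# Mail parsing and kanji-to-kana conversion are outside this example.

# Fold katakana into hiragana so both scripts share one tally.
def to_hiragana(text)
  text.tr("ァ-ヶ", "ぁ-ゖ")
end

# Count kana units; a kana followed by small ya/yu/yo counts as one unit.
# Non-kana characters (kanji, Latin letters, punctuation) are ignored.
def count_kana(text)
  counts = Hash.new(0)
  to_hiragana(text).scan(/[ぁ-ゖー][ゃゅょ]?/) { |unit| counts[unit] += 1 }
  counts
end

# Usage: print one unit and its count per line.
count_kana("きょうはメールをかきました").each do |unit, n|
  puts "#{unit}\t#{n}"
end
```

Grouping the resulting units into consonant rows and vowel columns would need an additional lookup table, which is left out here.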

Related work