Appendix 14: Type-Token and Letter Statistics
Frequency  Observed Freq.  Words in   Types  Tokens  % of   % of    % of word
Rank       of Rank         Frequency  Total  Total   Types  Tokens  in freq.
1 2746 2746 2746 2746 64.34 13.28 13.28
2 564 1128 3310 3874 77.55 18.74 5.46
3 255 765 3565 4639 83.53 22.44 3.70
4 165 660 3730 5299 87.39 25.63 3.19
5 103 515 3833 5814 89.81 28.12 2.49
6 58 348 3891 6162 91.17 29.81 1.68
7 38 266 3929 6428 92.06 31.09 1.29
8 41 328 3970 6756 93.02 32.68 1.59
9 24 216 3994 6972 93.58 33.73 1.04
10 28 280 4022 7252 94.24 35.08 1.35
11 19 209 4041 7461 94.68 36.09 1.01
12 19 228 4060 7689 95.13 37.19 1.10
13 13 169 4073 7858 95.43 38.01 0.82
14 10 140 4083 7998 95.67 38.69 0.68
15 9 135 4092 8133 95.88 39.34 0.65
16 10 160 4102 8293 96.11 40.12 0.77
17 7 119 4109 8412 96.27 40.69 0.58
18 8 144 4117 8556 96.46 41.39 0.70
19 10 190 4127 8746 96.70 42.31 0.92
20 1 20 4128 8766 96.72 42.40 0.10
21 11 231 4139 8997 96.98 43.52 1.12
22 3 66 4142 9063 97.05 43.84 0.32
23 4 92 4146 9155 97.14 44.28 0.45
24 3 72 4149 9227 97.21 44.63 0.35
25 4 100 4153 9327 97.31 45.12 0.48
26 4 104 4157 9431 97.40 45.62 0.50
27 3 81 4160 9512 97.47 46.01 0.39
28 4 112 4164 9624 97.56 46.55 0.54
29 4 116 4168 9740 97.66 47.11 0.56
30 4 120 4172 9860 97.75 47.70 0.58
31 2 62 4174 9922 97.80 47.99 0.30
32 1 32 4175 9954 97.82 48.15 0.15
33 4 132 4179 10086 97.91 48.79 0.64
34 4 136 4183 10222 98.01 49.45 0.66
35 2 70 4185 10292 98.06 49.78 0.34
36 3 108 4188 10400 98.13 50.31 0.52
37 1 37 4189 10437 98.15 50.49 0.18
38 1 38 4190 10475 98.17 50.67 0.18
39 3 117 4193 10592 98.24 51.24 0.57
42 1 42 4194 10634 98.27 51.44 0.20
44 4 176 4198 10810 98.36 52.29 0.85
45 2 90 4200 10900 98.41 52.73 0.44
46 1 46 4201 10946 98.43 52.95 0.22
47 2 94 4203 11040 98.48 53.40 0.45
48 4 192 4207 11232 98.57 54.33 0.93
49 3 147 4210 11379 98.64 55.04 0.71
50 1 50 4211 11429 98.66 55.28 0.24
51 1 51 4212 11480 98.69 55.53 0.25
53 2 106 4214 11586 98.73 56.04 0.51
54 1 54 4215 11640 98.76 56.31 0.26
55 2 110 4217 11750 98.81 56.84 0.53
60 1 60 4218 11810 98.83 57.13 0.29
62 1 62 4219 11872 98.85 57.43 0.30
63 3 189 4222 12061 98.92 58.34 0.91
68 1 68 4223 12129 98.95 58.67 0.33
71 2 142 4225 12271 98.99 59.36 0.69
74 1 74 4226 12345 99.02 59.72 0.36
78 1 78 4227 12423 99.04 60.09 0.38
81 1 81 4228 12504 99.06 60.48 0.39
86 1 86 4229 12590 99.09 60.90 0.42
88 2 176 4231 12766 99.13 61.75 0.85
90 1 90 4232 12856 99.16 62.19 0.44
93 1 93 4233 12949 99.18 62.64 0.45
95 1 95 4234 13044 99.20 63.10 0.46
96 1 96 4235 13140 99.23 63.56 0.46
105 1 105 4236 13245 99.25 64.07 0.51
107 1 107 4237 13352 99.27 64.59 0.52
108 1 108 4238 13460 99.30 65.11 0.52
111 1 111 4239 13571 99.32 65.65 0.54
116 1 116 4240 13687 99.34 66.21 0.56
122 2 244 4242 13931 99.39 67.39 1.18
125 1 125 4243 14056 99.41 67.99 0.60
127 1 127 4244 14183 99.44 68.61 0.61
136 1 136 4245 14319 99.46 69.26 0.66
138 1 138 4246 14457 99.48 69.93 0.67
144 1 144 4247 14601 99.51 70.63 0.70
154 1 154 4248 14755 99.53 71.37 0.74
158 1 158 4249 14913 99.55 72.14 0.76
162 1 162 4250 15075 99.58 72.92 0.78
174 1 174 4251 15249 99.60 73.76 0.84
178 1 178 4252 15427 99.63 74.62 0.86
181 1 181 4253 15608 99.65 75.50 0.88
188 2 376 4255 15984 99.70 77.32 1.82
193 1 193 4256 16177 99.72 78.25 0.93
197 1 197 4257 16374 99.74 79.20 0.95
212 1 212 4258 16586 99.77 80.23 1.03
240 1 240 4259 16826 99.79 81.39 1.16
274 1 274 4260 17100 99.81 82.72 1.33
370 2 740 4262 17840 99.86 86.30 3.58
387 1 387 4263 18227 99.88 88.17 1.87
423 1 423 4264 18650 99.91 90.21 2.05
443 1 443 4265 19093 99.93 92.36 2.14
483 1 483 4266 19576 99.95 94.69 2.34
506 1 506 4267 20082 99.98 97.14 2.45
591 1 591 4268 20673 100.00 100.00 2.86
Number of Types = 4268
Number of Tokens = 20673
Type/Token ratio = 0.206
Token/Type ratio = 4.844
Hapax Legomena = 2746
Hapax Dislegomena = 564
Hapax Legomena/Dislegomena ratio = 4.8688
Hapax Legomena/Number of Types = 0.6434
Hapax Legomena/Number of Tokens = 0.1328
Hapax Legomena cubed/Types squared = 1136.7181
Variance (S.D. squared) = 580.4314
Standard Deviation (S.D.) = 24.0921
Coefficient of skewness = 14.5953
Coefficient of kurtosis = 261.4507
Herdan's characteristic = 0.0761
Yule's characteristic = 602.9476
Carroll TTR (Types / Sqrt of 2 X Tokens) = 20.9898
Most frequent word "and" occurred 591 times
Repeat rate (Tokens / frequency of most frequent word) = 34.9797
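
The ratio statistics in the summary above follow directly from five of the reported counts, and the cumulative columns of the table follow from the first two columns. The Python sketch below is illustrative only: the function and variable names are hypothetical, and it is not the program that produced this appendix. It omits the variance, skewness, kurtosis, Herdan and Yule figures, which need the full frequency spectrum.

```python
import math

# Figures copied from the table and summary above.
TYPES = 4268      # distinct words
TOKENS = 20673    # running words
HAPAX = 2746      # words occurring exactly once
DIS = 564         # words occurring exactly twice
TOP_FREQ = 591    # occurrences of the most frequent word, "and"


def summary_ratios(types, tokens, hapax, dis, top_freq):
    """Return the ratio statistics listed in the summary (hypothetical helper)."""
    return {
        "type_token": types / tokens,
        "token_type": tokens / types,
        "hapax_dis": hapax / dis,
        "hapax_types": hapax / types,
        "hapax_tokens": hapax / tokens,
        "hapax3_types2": hapax ** 3 / types ** 2,
        "carroll_ttr": types / math.sqrt(2 * tokens),
        "repeat_rate": tokens / top_freq,
    }


def cumulative_rows(spectrum, types, tokens):
    """Rebuild table rows from (frequency, number of types) pairs."""
    rows = []
    cum_types = cum_tokens = 0
    for freq, n_types in spectrum:
        words = freq * n_types          # tokens contributed by this frequency
        cum_types += n_types
        cum_tokens += words
        rows.append((
            freq, n_types, words, cum_types, cum_tokens,
            round(100 * cum_types / types, 2),
            round(100 * cum_tokens / tokens, 2),
            round(100 * words / tokens, 2),
        ))
    return rows


ratios = summary_ratios(TYPES, TOKENS, HAPAX, DIS, TOP_FREQ)
first_rows = cumulative_rows([(1, 2746), (2, 564), (3, 255)], TYPES, TOKENS)

for name, value in ratios.items():
    print(f"{name:14s} {value:.4f}")
for row in first_rows:
    print(*row)
```

Only the first three rows of the spectrum are passed in here; feeding in every (frequency, types) pair from the table would rebuild all of its rows the same way.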
Word Length Statistics
----------------------
Word Freq. % Percentage
Len 10 20 30 40 50
+----+----+----+----+----+----+----+----+----+----+
1 705 3.41 |***
2 3523 17.04 |*****************
3 4022 19.46 |*******************
4 4494 21.74 |**********************
5 3293 15.93 |****************
6 1921 9.29 |*********
7 1261 6.10 |******
8 772 3.73 |****
9 369 1.78 |**
10 188 0.91 |*
11 59 0.29 |
12 52 0.25 |
13 8 0.04 |
14 3 0.01 |
15 2 0.01 |
16 1 0.00 |
Total letters (Tokens) = 87453
Total Words (Types) = 20673
Type/Token ratio = 0.2364
Mean word length = 4.2303
Variance (S.D. squared) = 3.9467
Standard Deviation (S.D.)= 1.9866
Herdan's characteristic = 0.0033
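
The word-length summary can be recomputed from the length distribution above. The Python sketch below makes two assumptions that are inferred from the reported figures, not stated in the output: the variance uses the sample (n - 1) denominator, and Herdan's characteristic is taken as S.D. / (mean x sqrt(n)), where n is the number of words. All names are hypothetical.

```python
import math

# Word length -> number of words, copied from the table above.
LENGTH_FREQ = {
    1: 705, 2: 3523, 3: 4022, 4: 4494, 5: 3293, 6: 1921, 7: 1261, 8: 772,
    9: 369, 10: 188, 11: 59, 12: 52, 13: 8, 14: 3, 15: 2, 16: 1,
}


def length_stats(freq):
    """Summary statistics of a {value: count} distribution (hypothetical helper)."""
    n = sum(freq.values())                               # number of words
    total = sum(x * f for x, f in freq.items())          # number of letters
    mean = total / n
    # Assumption: sample variance (n - 1), inferred from the reported 3.9467.
    var = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
    sd = math.sqrt(var)
    # Assumption: Herdan's characteristic as S.D. / (mean * sqrt(n)).
    herdan = sd / (mean * math.sqrt(n))
    return {"words": n, "letters": total, "ratio": n / total,
            "mean": mean, "var": var, "sd": sd, "herdan": herdan}


stats = length_stats(LENGTH_FREQ)
for name, value in stats.items():
    print(f"{name:8s} {value:.4f}")
```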
First letter in words statistics
--------------------------------
Letter Freq. % Percentage
10 20 30 40 50
+----+----+----+----+----+----+----+----+----+----+
a 1747 8.45 |********
b 1193 5.77 |******
c 560 2.71 |***
d 789 3.82 |****
e 408 1.97 |**
_ 0 0.00 |
f 914 4.42 |****
g 373 1.80 |**
h 1020 4.93 |*****
i 1424 6.89 |*******
j 0 0.00 |
k 119 0.58 |*
l 769 3.72 |****
m 1336 6.46 |******
n 615 2.97 |***
o 1043 5.05 |*****
_ 0 0.00 |
p 572 2.77 |***
q 30 0.15 |
r 329 1.59 |**
s 1816 8.78 |*********
t 3403 16.46 |****************
u 0 0.00 |
v 298 1.44 |*
w 1447 7.00 |*******
x 0 0.00 |
y 353 1.71 |**
z 1 0.00 |
0 0 0.00 |
1 1 0.00 |
2 22 0.11 |
3 21 0.10 |
4 13 0.06 |
5 12 0.06 |
6 11 0.05 |
7 11 0.05 |
8 11 0.05 |
9 12 0.06 |
Sorted by frequency
Letter Freq. % Percentage
10 20 30 40 50
+----+----+----+----+----+----+----+----+----+----+
t 3403 16.46 |****************
s 1816 8.78 |*********
a 1747 8.45 |********
w 1447 7.00 |*******
i 1424 6.89 |*******
m 1336 6.46 |******
b 1193 5.77 |******
o 1043 5.05 |*****
h 1020 4.93 |*****
f 914 4.42 |****
d 789 3.82 |****
l 769 3.72 |****
n 615 2.97 |***
p 572 2.77 |***
c 560 2.71 |***
e 408 1.97 |**
g 373 1.80 |**
y 353 1.71 |**
r 329 1.59 |**
v 298 1.44 |*
k 119 0.58 |*
q 30 0.15 |
2 22 0.11 |
3 21 0.10 |
4 13 0.06 |
5 12 0.06 |
9 12 0.06 |
7 11 0.05 |
6 11 0.05 |
8 11 0.05 |
z 1 0.00 |
1 1 0.00 |
Total initial letters (Tokens) = 20673
Total different letters (Types) = 38
Type/Token ratio = 0.0018
Arithmetic Mean = 544.0263
Standard Deviation (S.D.) = 732.5250
Herdan's characteristic = 0.2184
Repeat rate for initial letter "t" = 6.07
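
The initial-letter summary treats each of the 38 character rows, including those with a count of zero, as one "type". The Python sketch below recomputes the summary from the counts in the table. It assumes a sample (n - 1) standard deviation and Herdan's characteristic as S.D. / (mean x sqrt(n)); both are inferred from the reported values. The names are hypothetical, and the two "_" rows are kept exactly as they appear in the table.

```python
import math

# (character, count) rows copied in table order from the initial-letter table.
INITIAL = [
    ("a", 1747), ("b", 1193), ("c", 560), ("d", 789), ("e", 408), ("_", 0),
    ("f", 914), ("g", 373), ("h", 1020), ("i", 1424), ("j", 0), ("k", 119),
    ("l", 769), ("m", 1336), ("n", 615), ("o", 1043), ("_", 0), ("p", 572),
    ("q", 30), ("r", 329), ("s", 1816), ("t", 3403), ("u", 0), ("v", 298),
    ("w", 1447), ("x", 0), ("y", 353), ("z", 1), ("0", 0), ("1", 1),
    ("2", 22), ("3", 21), ("4", 13), ("5", 12), ("6", 11), ("7", 11),
    ("8", 11), ("9", 12),
]


def category_stats(rows):
    """Summary statistics over per-character counts (hypothetical helper)."""
    counts = [count for _, count in rows]
    n = len(counts)                    # "types": number of character rows
    tokens = sum(counts)               # "tokens": number of words counted
    mean = tokens / n
    # Assumption: sample standard deviation (n - 1 denominator).
    sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / (n - 1))
    top_char, top_count = max(rows, key=lambda row: row[1])
    return {
        "types": n,
        "tokens": tokens,
        "ratio": n / tokens,
        "mean": mean,
        "sd": sd,
        "herdan": sd / (mean * math.sqrt(n)),   # assumed definition
        "top": top_char,
        "repeat_rate": tokens / top_count,
    }


initial_stats = category_stats(INITIAL)
for name, value in initial_stats.items():
    print(name, value)
```

Passing the final-letter counts to the same function should reproduce the final-letter summary in the same way.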
Final letter in words statistics
--------------------------------
Letter Freq. % Percentage
10 20 30 40 50
+----+----+----+----+----+----+----+----+----+----+
a 220 1.06 |*
b 9 0.04 |
c 3 0.01 |
d 1777 8.60 |*********
e 5323 25.75 |**************************
_ 1 0.00 |
f 525 2.54 |***
g 493 2.38 |**
h 927 4.48 |****
i 385 1.86 |**
j 0 0.00 |
k 57 0.28 |
l 636 3.08 |***
m 226 1.09 |*
n 1145 5.54 |******
o 893 4.32 |****
_ 0 0.00 |
p 40 0.19 |
q 0 0.00 |
r 1044 5.05 |*****
s 2196 10.62 |***********
t 2505 12.12 |************
u 362 1.75 |**
v 1 0.00 |
w 281 1.36 |*
x 3 0.01 |
y 1454 7.03 |*******
z 0 0.00 |
0 16 0.08 |
1 0 0.00 |
2 27 0.13 |
3 25 0.12 |
4 21 0.10 |
5 15 0.07 |
6 14 0.07 |
7 15 0.07 |
8 17 0.08 |
9 17 0.08 |
Sorted by frequency
Letter Freq. % Percentage
10 20 30 40 50
+----+----+----+----+----+----+----+----+----+----+
e 5323 25.75 |**************************
t 2505 12.12 |************
s 2196 10.62 |***********
d 1777 8.60 |*********
y 1454 7.03 |*******
n 1145 5.54 |******
r 1044 5.05 |*****
h 927 4.48 |****
o 893 4.32 |****
l 636 3.08 |***
f 525 2.54 |***
g 493 2.38 |**
i 385 1.86 |**
u 362 1.75 |**
w 281 1.36 |*
m 226 1.09 |*
a 220 1.06 |*
k 57 0.28 |
p 40 0.19 |
2 27 0.13 |
3 25 0.12 |
4 21 0.10 |
8 17 0.08 |
9 17 0.08 |
0 16 0.08 |
5 15 0.07 |
7 15 0.07 |
6 14 0.07 |
b 9 0.04 |
x 3 0.01 |
c 3 0.01 |
_ 1 0.00 |
v 1 0.00 |
Total final letters (Tokens) = 20673
Total different letters (Types) = 38
Type/Token ratio = 0.0018
Arithmetic Mean = 544.0263
Standard Deviation (S.D.) = 1025.4043
Herdan's characteristic = 0.3058
Repeat rate for final letter "e" = 3.88
All letters in words statistics
-------------------------------
Letter Freq. % in all Initial % in all Final % in all
a 5822 6.66 1747 30.01 220 3.78
b 1421 1.62 1193 83.95 9 0.63
c 1614 1.85 560 34.70 3 0.19
d 3321 3.80 789 23.76 1777 53.51
e 12354 14.13 408 3.30 5323 43.09
_ 1 0.00 0 0.00 1 100.00
f 1950 2.23 914 46.87 525 26.92
g 1601 1.83 373 23.30 493 30.79
h 5883 6.73 1020 17.34 927 15.76
i 5851 6.69 1424 24.34 385 6.58
j 0 0.00 0 0.00 0 0.00
k 693 0.79 119 17.17 57 8.23
l 3664 4.19 769 20.99 636 17.36
m 2429 2.78 1336 55.00 226 9.30
n 5382 6.15 615 11.43 1145 21.27
o 6618 7.57 1043 15.76 893 13.49
_ 1 0.00 0 0.00 0 0.00
p 1255 1.44 572 45.58 40 3.19
q 59 0.07 30 50.85 0 0.00
r 4953 5.66 329 6.64 1044 21.08
s 5896 6.74 1816 30.80 2196 37.25
t 8357 9.56 3403 40.72 2505 29.97
u 3360 3.84 0 0.00 362 10.77
v 342 0.39 298 87.13 1 0.29
w 2201 2.52 1447 65.74 281 12.77
x 78 0.09 0 0.00 3 3.85
y 2022 2.31 353 17.46 1454 71.91
z 25 0.03 1 4.00 0 0.00
0 27 0.03 0 0.00 16 59.26
1 1 0.00 1 100.00 0 0.00
2 48 0.05 22 45.83 27 56.25
3 46 0.05 21 45.65 25 54.35
4 41 0.05 13 31.71 21 51.22
5 32 0.04 12 37.50 15 46.88
6 25 0.03 11 44.00 14 56.00
7 25 0.03 11 44.00 15 60.00
8 27 0.03 11 40.74 17 62.96
9 28 0.03 12 42.86 17 60.71
Sorted by frequency
Letter Freq. % Percentage
10 20 30 40 50
+----+----+----+----+----+----+----+----+----+----+
e 12354 14.13 |**************
t 8357 9.56 |**********
o 6618 7.57 |********
s 5896 6.74 |*******
h 5883 6.73 |*******
i 5851 6.69 |*******
a 5822 6.66 |*******
n 5382 6.15 |******
r 4953 5.66 |******
l 3664 4.19 |****
u 3360 3.84 |****
d 3321 3.80 |****
m 2429 2.78 |***
w 2201 2.52 |***
y 2022 2.31 |**
f 1950 2.23 |**
c 1614 1.85 |**
g 1601 1.83 |**
b 1421 1.62 |**
p 1255 1.44 |*
k 693 0.79 |*
v 342 0.39 |
x 78 0.09 |
q 59 0.07 |
2 48 0.05 |
3 46 0.05 |
4 41 0.05 |
5 32 0.04 |
9 28 0.03 |
8 27 0.03 |
0 27 0.03 |
7 25 0.03 |
z 25 0.03 |
6 25 0.03 |
_ 1 0.00 |
_ 1 0.00 |
1 1 0.00 |
Total all letters (Tokens) = 87453
Total different letters (Types) = 38
Type/Token ratio = 0.0004
Arithmetic Mean = 2301.3947
Standard Deviation (S.D.) = 2939.7315
Herdan's characteristic = 0.2072
Repeat rate for letter "e" (all positions) = 7.08
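
In the "All letters in words" table, the first "% in all" column is a letter's share of all 87453 letters, while the second and third give the share of that letter's own occurrences that are word-initial and word-final. The Python sketch below recomputes these columns for three rows copied from the table; the function name is hypothetical.

```python
TOTAL_LETTERS = 87453  # "Total all letters (Tokens)" from the summary above

# letter -> (all occurrences, word-initial, word-final), copied from the table.
ROWS = {
    "a": (5822, 1747, 220),
    "e": (12354, 408, 5323),
    "v": (342, 298, 1),
    "j": (0, 0, 0),
}


def position_shares(freq, initial, final, total=TOTAL_LETTERS):
    """Return (% of all letters, % word-initial, % word-final) for one letter."""
    def pct(part, whole):
        # A letter that never occurs gets 0.00, as in the table's "j" row.
        return round(100 * part / whole, 2) if whole else 0.0

    return pct(freq, total), pct(initial, freq), pct(final, freq)


shares = {letter: position_shares(*counts) for letter, counts in ROWS.items()}
repeat_rate_e = TOTAL_LETTERS / ROWS["e"][0]

for letter, row in shares.items():
    print(letter, *row)
print("repeat rate for e:", round(repeat_rate_e, 2))
```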