# Learning BitxBit – Class 4 – N-Grams

First off, here are some stats for 3-grams based on different texts. The text files were downloaded from Project Gutenberg.

Children’s Story of War, Volume 3 by James Edward Parrott:

# 3-grams in model: 114186
NGRAM frequencies:
371.0 - * * *
286.0 - , and the
105.0 - . It was
99.0 - [ Illustration :
95.0 - . On the
91.0 - . [ Illustration
89.0 - , however ,
88.0 - . * *
81.0 - of the enemy
72.0 - enemy ' s
69.0 - the enemy '
66.0 - , and they
64.0 - . The Germans

Unfortunately, this text contains a lot of illustration markers, which show up among the most frequent 3-grams.

The Kamasutra:

# 3-grams in model: 47126
NGRAM frequencies:
119.0 - , and the
97.0 - , it is
87.0 - it is called
74.0 - ) . When
68.0 - . When a
67.0 - , she should
61.0 - is called the
60.0 - , he should
56.0 - called the "
55.0 - is called a
52.0 - ] [ Footnote
48.0 - . ] [
47.0 - go - between
47.0 - . In the
46.0 - as follows :
43.0 - , viz .
41.0 - , and should
41.0 - . ” (
38.0 - , and then
38.0 - . A woman
36.0 - . She should
34.0 - on account of
33.0 - her , and
31.0 - on the subject
31.0 - of the man
31.0 - the man should
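
For reference, frequency tables like the two above can be produced with a few lines of Python. This is a minimal sketch, not the code used for the class: the `tokenize` and `ngram_counts` helpers and the sample sentence are assumptions made for illustration, and the tokenizer simply treats every punctuation mark as its own token, which is what the listings suggest.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: runs of word characters, or single punctuation marks.
    # "enemy's" becomes ["enemy", "'", "s"], similar to the listings above.
    return re.findall(r"\w+|[^\w\s]", text)


def ngram_counts(tokens, n):
    # Slide a window of n tokens over the text and count every window.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical sample text; in practice this would be a whole book.
sample = "It was dark. It was cold, and the wind blew, and the rain fell."
counts = ngram_counts(tokenize(sample), 3)

print("# 3-grams in model:", len(counts))
print("NGRAM frequencies:")
for gram, freq in counts.most_common(5):
    print(f"{float(freq)} - {' '.join(gram)}")
```

To run it on a full book, read the file into a string and pass that to `tokenize` instead of the sample sentence.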

Now we generate new text by chaining words according to how likely they are to follow one another in the source. I am using “Huckleberry Finn” as the corpus. Here are some results from unigrams:

Always calf jawing . the never it said . he

text trees ‘ in and forked half times it for

Of course, if we want the output to make more sense, we need to work with longer sequences of words. A few bigrams:

distance across the woods

pretty keen . He had the raft , and so I
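
The generation step can be sketched roughly as follows. This is an assumed implementation, not the class code: `build_model` and `generate` are hypothetical names, the tiny corpus is made up, and the approach shown (store every observed follower for each context of n-1 tokens, then pick one at random) is just one common way to sample by frequency.

```python
import random
import re
from collections import defaultdict


def tokenize(text):
    # Assumed tokenizer: words and single punctuation marks as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def build_model(tokens, n):
    # Map each context of n-1 tokens to the list of tokens that followed it.
    # Duplicates are kept, so a random pick is weighted by observed frequency.
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model


def generate(model, n, length, rng=random):
    # Start from a randomly chosen context seen in the corpus.
    context = rng.choice(list(model.keys()))
    output = list(context)
    while len(output) < length:
        followers = model.get(context)
        if not followers:
            break  # dead end: this context was never followed by anything
        output.append(rng.choice(followers))
        # The new context is the last n-1 tokens (empty for unigrams).
        context = tuple(output[len(output) - (n - 1):])
    return " ".join(output)


# Hypothetical miniature corpus; a real run would use a whole book.
corpus = tokenize("the cat sat on the mat . the cat ran to the door .")
print(generate(build_model(corpus, 1), 1, 10))  # unigrams: no context at all
print(generate(build_model(corpus, 2), 2, 10))  # bigrams: one word of context
```

With a small `n` there are many possible followers per context, so the output wanders. With a large `n` most contexts have only one follower, so the output tends to reproduce passages of the corpus verbatim.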

Now let’s jump right ahead to 8-grams. I will also let the model generate more words:

looking up at the moon , and tears running down her cheeks ; and she had an open letter in one hand with black sealing wax showing on one edge of it , and she was mashing a locket with a chain to it against her mouth.

From Pride and Prejudice:

my fancy , it is only because he does not rattle away like other young men .

When all of the house that was open to general inspection had been seen , they returned downstairs , and , taking leave of the housekeeper , were consigned over to the gardener , who met them at the hall.

From the Kamasutra:

them make him acquainted of her devotion to him . In religious ceremonies she should be a leader , as also in vows and fasts , and should not hold too good an opinion of herself . When her husband is lying on his bed she should only go near

Kamasutra 6-Gram:

wife who is liked most by her husband , and annoyed and distressed by his other wives , should associate with the wife who is liked most by her husband , and annoyed and distressed by his other wives , should associate

Kamasutra 4-Grams:

have done good work , in order to show her love by outward signs and motions , as described in the previous chapters . Such is the mode of temporary marriage among courtesans

composed of the asparagus racemosus , and the other is stretched out , it is called the ” congress of a collection of facts , told in plain and simple language , it must be remembered

And finally, a good 3-gram:

While anxious to please her with his lips , the man should begin to enjoy her .

