Introduction to statistical analysis in the context of Liber Primus
This article explains how we know Liber Primus does not use some of the most frequently suggested ciphers (the ones that get shot down almost immediately), and also gives you the data and helps you interpret some of it. For more information on the methods as a whole, please see the resources section at the bottom of the page. If you are just here for the data and already know what everything means, suffer and scroll. Your pain brings me joy.
What is frequency analysis?
Relative frequency of letters in the English language:
Letter | Texts (%) | Dictionaries (%) |
---|---|---|
A | 8.2 | 7.8 |
B | 1.5 | 2.0 |
C | 2.8 | 4.0 |
D | 4.3 | 3.8 |
E | 12.7 | 11.0 |
F | 2.2 | 1.4 |
G | 2.0 | 3.0 |
H | 6.1 | 2.3 |
I | 7.0 | 8.6 |
J | 0.15 | 0.21 |
K | 0.77 | 0.97 |
L | 4.0 | 5.3 |
M | 2.4 | 2.7 |
N | 6.7 | 7.2 |
O | 7.5 | 6.1 |
P | 1.9 | 2.8 |
Q | 0.095 | 0.19 |
R | 6.0 | 7.3 |
S | 6.3 | 8.7 |
T | 9.1 | 6.7 |
U | 2.8 | 3.3 |
V | 0.98 | 1.0 |
W | 2.4 | 0.91 |
X | 0.15 | 0.27 |
Y | 2.0 | 1.6 |
Z | 0.074 | 0.44 |
Frequency analysis is one of many techniques used in cryptography to decipher encrypted messages. It relies on the fact that certain letters or combinations of letters occur more frequently in a given language. By analyzing the frequency of letters or patterns in the encrypted text, it is possible to make educated guesses about the corresponding letters or patterns in the original message.
The process involves counting the occurrences of each letter or pattern in the encrypted text and comparing them to the expected frequencies in the language being used. For example, 'e' is the most commonly used letter in English, so if a text was encrypted with a simple substitution, the letter that appears most frequently in the ciphertext is likely to correspond to 'e' in the original message. Letter frequency varies by language. However, since we can assume the plaintext of the unsolved portion of Liber Primus will be in English, we use English as our point of reference from here on out.
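As a rough sketch of the counting step described above, the following Python snippet tallies letters and converts the counts to percentages. The function name `letter_frequencies` and the sample sentence are illustrative choices, not part of any existing Liber Primus tooling, and a real analysis would count runes rather than Latin letters.

```python
from collections import Counter

def letter_frequencies(text):
    """Return {letter: percentage} for the A-Z letters in text, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    # most_common() sorts by descending count, so the dict keeps that order
    return {letter: 100 * n / total for letter, n in counts.most_common()}

sample = "an instruction command your own self"
freqs = letter_frequencies(sample)
for letter, pct in list(freqs.items())[:3]:
    print(f"{letter}: {pct:.1f}%")
```

The resulting percentages can then be compared side by side with the reference table above.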
What is a bigram? Why won't everyone shut up about IOC?
This is how we arrive at bigrams and the Index of Coincidence, abbreviated IC or IOC. A bigram is a pair of adjacent letters, and a bigram whose two letters are the same is called a doublet. The IC measures how likely it is that two letters picked at random from a given text turn out to be the same letter.
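A minimal sketch of the IC calculation, assuming the usual definition (the chance that two symbols drawn without replacement match). The optional `alphabet_size` argument is an assumption on my part: multiplying by the alphabet size gives the normalized form in which uniformly random text scores about 1, which appears to be the convention used for the IC figures on this page.

```python
from collections import Counter

def index_of_coincidence(symbols, alphabet_size=None):
    """Probability that two symbols drawn at random (without replacement) match.

    If alphabet_size is given, the result is multiplied by it so that
    uniformly random text scores roughly 1.
    """
    counts = Counter(symbols)
    n = sum(counts.values())
    if n < 2:
        return 0.0
    ic = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return ic * alphabet_size if alphabet_size else ic

print(index_of_coincidence("AAAA"))  # every pair matches
print(index_of_coincidence("ABCD"))  # no pair matches
```

For the runes, `symbols` would be a list of rune indices and `alphabet_size` would be 29.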
You may also see the term 'n-grams' floating around. The concept is the same as bigrams, except that instead of two characters it can refer to any number of characters (trigram, quadgram, and so on).
This is a list of character 1-grams (note that IO is a single rune):
A,N, I,N,S,T,R,U,C,T,IO,N, C,O,M,M,A,N,D, Y,O,U,R, O,W,N, S,E,L,F
We could also have character bigrams, word-1-grams, word bigrams, etc:
(A,N),(N,I),(I,N),(N,S),(S,T),(T,R),(R,U),(U,C),(C,T),(T,IO),(IO,N),(N,C),(C,O),(O,M),(M,M),(M,A),(A,N),(N,D),(D,Y),(Y,O),(O,U),(U,R),(R,O),(O,W),(W,N),(N,S),(S,E),(E,L),(L,F)
AN, INSTRUCTION, COMMAND, YOUR, OWN, SELF
(AN,INSTRUCTION), (INSTRUCTION,COMMAND), (COMMAND,YOUR), (YOUR,OWN), (OWN,SELF)
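The lists above can be reproduced with a few lines of Python. This is only a sketch: the helper name `ngrams` and the dash-separated input format are my own conventions, chosen so that a multi-letter rune such as IO stays a single token.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Runes inside a word are separated by dashes, words by spaces
phrase = "A-N I-N-S-T-R-U-C-T-IO-N C-O-M-M-A-N-D Y-O-U-R O-W-N S-E-L-F"
words = phrase.split()

runes = [rune for word in words for rune in word.split("-")]
word_tokens = [word.replace("-", "") for word in words]

char_bigrams = ngrams(runes, 2)        # (A,N), (N,I), (I,N), ...
word_bigrams = ngrams(word_tokens, 2)  # (AN,INSTRUCTION), ...

print(len(runes), len(char_bigrams), len(word_bigrams))  # 30 29 5
```

A text of N tokens always yields N - n + 1 n-grams, which is why 30 runes give 29 bigrams.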
Why is this important? How do we know my idea won't work?
Because letter occurrence is not random, these measures help reveal patterns in a text that tell us which encryption method has been used (or, in this case, which ones haven't). Frequency analysis and IC can be effective against substitution ciphers such as Caesar and Vigenère, where each letter is replaced by another letter or symbol. They become less effective against more complex methods, and modern encryption is designed to resist being broken this way. Things also get slightly more complicated when you remember that shifts need to be applied to the runes themselves and not their Latin character equivalents. However, we can still glean some interesting things through their use.
We know Liber Primus is not encrypted using Caesar, Vigenère, Rot13, etc. because of the distribution of characters. With Caesar or Rot13, for example, the shape of the distribution would remain the same as normal English (and so would the IC), even though the most frequently occurring letter would be different, as mentioned above. With Vigenère there is the Kasiski–Kerckhoffs method which, to make a long, boring, mathematical explanation brief, uses the distances between repeated n-grams to determine the length of the key and from there decrypt the ciphertext.
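To illustrate why a Caesar-style shift cannot hide the distribution, here is a small sketch that shifts a list of rune indices modulo 29 and compares the sorted symbol counts before and after. The function names are hypothetical, and the sample is just the indices of D-J-U-B-E-I and B-M-R-N-M from the word repeats table, used here as arbitrary input.

```python
from collections import Counter

ALPHABET_SIZE = 29  # number of runes

def shift(indices, key):
    """Caesar-style shift applied to rune indices rather than Latin letters."""
    return [(i + key) % ALPHABET_SIZE for i in indices]

def sorted_counts(indices):
    """Symbol counts from most to least common, ignoring which symbol is which."""
    return sorted(Counter(indices).values(), reverse=True)

sample = [23, 11, 1, 17, 18, 10, 17, 19, 4, 9, 19]
shifted = shift(sample, 13)

# The symbols change, but the shape of the distribution does not
print(sorted_counts(sample) == sorted_counts(shifted))  # True
```

Since the IC depends only on these counts, it is unchanged by any single shift, so a Caesar-encrypted English text would still score like English rather than like random text.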
Methods of statistical analysis that haven't been done on Liber Primus are few and far between nowadays, but at least once a year someone crawls out of the woodwork with a new one. I will do my best to keep this page as up to date as possible, but do keep in mind the best resource for real time solving updates is our Discord server. The author of this article would like to thank everyone there for their hard work doing nerd things on Liber Primus so that they can steal it and make a mid article about it.
Bigram Distribution

Bigram distribution of Liber Primus as a whole
Bigram analysis shows the only deviation from random text: a lower than expected number of doublets. This can be seen in the image as the blue line across the diagonal. Doublets are not completely absent, but they are much rarer than expected. This is likely a direct result of the key or cipher used.
See the table below. The chapters are separated by artwork, as it is commonly believed that the encryption method changes with each section of marginalia.
Chapter | Pages | Number of runes | IC | Number of doublets | Doublet occurrence rate |
---|---|---|---|---|---|
Random ciphertext | None | None | 1 | None | 3.45% |
Cross | 0-2 | 729 | 0.988 | 4 | 0.549% |
Spirals | 3-7 | 1145 | 1.004 | 6 | 0.524% |
Branches | 8-14 | 1729 | 0.999 | 9 | 0.520% |
Möbius | 15-22 | 1903 | 1.000 | 10 | 0.525% |
Mayfly | 23-26 | 1021 | 0.993 | 11 | 1.078% |
Wing/Tree | 27-32 | 1433 | 0.991 | 13 | 0.907% |
Cuneiform | 33-39 | 1680 | 0.996 | 12 | 0.714% |
Spiral/Branches | 40-53 | 3008 | 1.001 | 18 | 0.598% |
Hollow | 54-55 | 308 | 0.980 | 3 | 0.977% |
Total | 0-55 | 12956 | 0.999 | 86 | 0.663% |
The doublet occurrence rate of random text over the 29 runes is 1/29 = 3.45%. The IC values here are normalized so that random text scores 1; plaintext English is expected to score about 1.73.
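The arithmetic behind these figures can be checked with a short sketch. The function name `doublet_rate` is hypothetical, and it divides by the number of adjacent pairs, which may differ in the last digit from a rate computed against the rune count.

```python
def doublet_rate(symbols):
    """Return (number of doublets, fraction of adjacent pairs that are doublets)."""
    pairs = len(symbols) - 1
    if pairs < 1:
        return 0, 0.0
    doublets = sum(1 for a, b in zip(symbols, symbols[1:]) if a == b)
    return doublets, doublets / pairs

print(doublet_rate([1, 1, 2, 3, 3, 3]))  # (3, 0.6)

# Figures from the table: 12956 runes, 86 doublets, 29-rune alphabet
expected_random = (12956 - 1) / 29
print(f"random rate:   {1 / 29:.2%}")          # 3.45%
print(f"observed rate: {86 / 12956:.2%}")      # 0.66%
print(f"expected doublets if random: {expected_random:.1f}")  # 446.7
```

In other words, uniformly random text of this length would be expected to contain roughly 447 doublets, against the 86 actually counted.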
Low doublet counts traditionally point to some form of autokey (also called autoclave) cipher. Alternatively, 3301 may have created their own cipher or used autokey together with another encryption method. The rest of the bigrams are distributed relatively evenly, as seen in the image.
Not only does this show that Liber Primus is not completely random, but it also points directly away from ciphers mentioned in the first section of this article.
Word repeats
Ciphertext in Latin | D-J-U-B-E-I | B-M-R-N-M | O-U-N-W-M | O-F-L-OE-ING | I-M-ING-Y-A |
---|---|---|---|---|---|
Indices | 23-11-1-17-18-10 | 17-19-4-9-19 | 3-1-9-7-19 | 3-0-20-22-21 | 10-19-21-26-24 |
Starting position | 6555, 12950 | 5448, 12001 | 6985, 8016 | 7393, 12385 | 10671, 12764 |
Difference between starting positions | 6395 | 6553 | 1031 | 4992 | 2093 |
Ciphertext in runes | ᛞᛄᚢᛒᛖᛁ | ᛒᛗᚱᚾᛗ | ᚩᚢᚾᚹᛗ | ᚩᚠᛚᛟᛝ | ᛁᛗᛝᚣᚪ |
Words containing sequence | ᛒᚠ-ᛞᛄᚢ-ᛒᛖᛁ-ᚫᚠ; ᚳᛠᛁᛗᚳᛉ-ᛞᛄᚢ-ᛒᛖᛁ | ᚹᛒᛗᚱᚾᛗᚻᛗᛁᚾᚪᛞ; ᛗᛁᛄᛒᛗᚱᚾᛗ | ᛠᛈᛄᛞᚾᛟᚩᚢᚾᚹᛗ; ᚩᚢᚾᚹᛗᛚ | ᚩᚠᛚ-ᛟᛝᛈ; ᚩᚷᛗᚩ-ᚠᛚᛟᛝᚦᛠ | ᛗᚠᛝᛉᛞᛁ-ᛗᛝᚣᚪᛝᚠᛉᛁᛟᚷᛚ; ᛏᛝᛁ-ᛗᛝᚣᚪᚫ |
Chapters | Wing/Tree; Hollow | Möbius; Spiral/Branches | Wing/Tree; Cuneiform | Wing/Tree; Spiral/Branches | Spiral/Branches; Hollow |
Pages | 27 (second row); 55 (last two words) | 22 (row 4, word 2); 48 (penultimate row, first letters) | 28 (third last row, last word); 33 (row 4, third word) | 30 (row 5, words 2 and 3); 52 (row 9, third word) | 43 (last row, word 1 and 2); 54 (row 6 word 5) |
Starting position in chapter | 28, 302 | 1845, 2361 | 458, 56 | 866, 2745 | 1031, 116 |
These repeats are worth noting because repeated words or phrases are often used in cryptanalysis to detect a repeating key or estimate its length.
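A minimal sketch of how repeats like these can be located, in the spirit of a Kasiski-style search: record every position at which each fixed-length sequence starts, keep the sequences that occur more than once, and look at the gaps between occurrences. The function names and the toy input are illustrative only and this is not the tool that produced the table.

```python
from collections import defaultdict

def repeated_sequences(symbols, length):
    """Map each sequence of the given length that occurs more than once
    to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(symbols) - length + 1):
        positions[tuple(symbols[i:i + length])].append(i)
    return {seq: pos for seq, pos in positions.items() if len(pos) > 1}

def gaps(positions):
    """Distances between consecutive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]

toy = list("ABCXXABCYABC")
for seq, pos in repeated_sequences(toy, 3).items():
    print("".join(seq), pos, gaps(pos))  # ABC [0, 5, 9] [5, 4]
```

With a repeating key, the gaps between genuine repeats tend to be multiples of the key length, which is why the differences between starting positions are listed in the table.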
Analysis of n-gram frequencies
This section contains a short analysis of the n-gram frequencies, comparing random text with the unsolved pages. "Counted unique n-grams" is the number of distinct n-grams in the text. "Number of repeated n-grams" is the number of distinct n-grams that appear at least twice. "Total number of repeated n-grams" is the sum of the occurrence counts of those repeated n-grams, that is, every n-gram occurrence that is not a one-off.
---------------------------------------------
LP text:
Counted unique bigrams: 840
Number of repeated bigrams: 837
Total number of repeated bigrams: 12952
Random text:
Mean counted unique bigrams: 840.9997
Std counted unique bigrams: 0.01731790980459247
Mean number of repeated bigrams: 840.9963
Std number of repeated bigrams: 0.060714989911882546
Mean total number of repeated bigrams: 12954.9966
Std total number of repeated bigrams: 0.0582103083654433
---------------------------------------------
LP text:
Counted unique trigrams: 9945
Number of repeated trigrams: 2508
Total number of repeated trigrams: 5517.0
Random text:
Mean counted unique trigrams: 10050.1294
Std counted unique trigrams: 38.91237406841171
Mean number of repeated trigrams: 2433.5345
Std number of repeated trigrams: 31.238108293397026
Mean total number of repeated trigrams: 5337.4051
Std total number of repeated trigrams: 67.1163422274337
---------------------------------------------
LP text:
Counted unique quadgrams: 12825
Number of repeated quadgrams: 127
Total number of repeated quadgrams: 255
Random text:
Mean counted unique quadgrams: 12835.0622
Std counted unique quadgrams: 11.02112204632541
Mean number of repeated quadgrams: 117.2225
Std number of repeated quadgrams: 10.925364696430046
Mean total number of repeated quadgrams: 235.1603
Std total number of repeated quadgrams: 21.93048116002018
---------------------------------------------
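The three counts reported above, and a random baseline to compare them against, can be sketched as follows. This is not the script that produced the figures: the function names, the number of trials, the seed, the use of a uniform distribution over 29 symbols, and the choice of population standard deviation are all assumptions made for illustration.

```python
import random
import statistics
from collections import Counter

def ngram_stats(symbols, n):
    """Return (unique n-grams, n-grams seen at least twice,
    total occurrences of those repeated n-grams)."""
    counts = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    unique = len(counts)
    repeated = sum(1 for c in counts.values() if c > 1)
    total_repeated = sum(c for c in counts.values() if c > 1)
    return unique, repeated, total_repeated

def random_baseline(length, n, alphabet_size=29, trials=100, seed=0):
    """Mean and standard deviation of ngram_stats over uniformly random texts."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        text = [rng.randrange(alphabet_size) for _ in range(length)]
        results.append(ngram_stats(text, n))
    columns = list(zip(*results))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

print(ngram_stats(list("ABABAB"), 2))  # (2, 2, 5)
means, stds = random_baseline(length=500, n=2, trials=20)
print(means, stds)
```

To mirror the comparison above, `length` would be set to the number of runes in the unsolved pages and `symbols` would be their index sequence; a larger `trials` value gives steadier means at the cost of run time.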