Printable Shavian dictionary of the 1650 most common English words
Published by Arun Isaac on
I am trying to learn the Shavian alphabet[1], but I miss having a good paper dictionary. Flitting back and forth to my computer when I want to be writing away from it is a no-go. So, I made this printable dictionary of the 1650[2] most common English words.
I used the Kingsley Read lexicon[3] with some Unix-fu to produce this printable Shavian dictionary. Here’s the code.
wget https://github.com/Shavian-info/readlex/raw/refs/heads/main/kingsleyreadlexicon.tsv
# The number of lines (1650, in this case) must be a multiple of
# lines_per_page * sections_per_page (55 * 3 = 165), as described in
# fold.awk.
# Combine parts of speech that are spelt the same, sort by word
# frequency, take the first 1650 words, extract only the Latin and
# Shavian spellings, sort alphabetically, and "fold" columns for
# printing.
awk -f collapse.awk kingsleyreadlexicon.tsv \
    | sort -nrk 3,3 | head -n1650 \
    | cut -f1,2 | sort -k 1,1 \
    | awk -f fold.awk > shavian-1650.tsv
where collapse.awk is
BEGIN {
    OFS = "\t"
}
{
    latin = $1
    shavian = $2
    frequency = $5
}
NR == 1 {
    # Initialize state.
    previous_latin = latin
    previous_shavian = shavian
    accumulated_frequency = frequency
}
NR > 1 {
    # Output line and clear accumulator if this is a different word.
    if ((latin != previous_latin) || (shavian != previous_shavian)) {
        print previous_latin, previous_shavian, accumulated_frequency
        accumulated_frequency = 0
    }
    # Keep state.
    accumulated_frequency += frequency
    previous_latin = latin
    previous_shavian = shavian
}
END {
    # This is the end. Output line unconditionally.
    print previous_latin, previous_shavian, accumulated_frequency
}
and fold.awk is
# This script only works if the total number of lines is a multiple of
# lines_per_page * sections_per_page. This is when pages are
# perfectly filled, and there is no empty space left.
BEGIN {
    lines_per_page = 55;
    sections_per_page = 3;
}
# Accumulate lines in a matrix.
{
    page_line = (NR - 1) % (lines_per_page * sections_per_page)
    lines[page_line % lines_per_page, int(page_line / lines_per_page)] = $0
}
# Dump accumulated matrix once end of page is reached. Then, clear the
# matrix so the next page can be built up.
(page_line == lines_per_page*sections_per_page - 1) {
    for (i=0; i<lines_per_page; i++) {
        for (j=0; j<sections_per_page; j++) {
            printf (j == 0) ? "%s" : "\t%s", lines[i, j]
            delete lines[i, j]
        }
        printf "\n"
    }
}
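To see what the folding does without printing ten pages, here is a small sketch of the same logic with a tiny page geometry. The fold_columns shell function is a hypothetical helper of mine, not part of the scripts above; it takes the page geometry as arguments instead of hard-coding it in BEGIN, and I have written the printf slightly differently.

```shell
# Hypothetical helper: the folding logic of fold.awk, with the page
# geometry passed in as arguments ($1 = lines per page, $2 = columns
# per page) rather than fixed in a BEGIN block.
fold_columns() {
    awk -v lines_per_page="$1" -v sections_per_page="$2" '
    # Accumulate lines in a matrix indexed by (row, column).
    {
        page_line = (NR - 1) % (lines_per_page * sections_per_page)
        lines[page_line % lines_per_page, int(page_line / lines_per_page)] = $0
    }
    # Dump the matrix row by row once a page is full, then clear it.
    page_line == lines_per_page * sections_per_page - 1 {
        for (i = 0; i < lines_per_page; i++) {
            for (j = 0; j < sections_per_page; j++) {
                printf "%s%s", (j == 0 ? "" : "\t"), lines[i, j]
                delete lines[i, j]
            }
            printf "\n"
        }
    }'
}

# Eight input lines, 2 lines per column, 2 columns per page: two full pages.
printf '%s\n' a b c d e f g h | fold_columns 2 2
```

The four output rows are "a c", "b d", "e g" and "f h" (tab-separated), so each page reads down the first column and then down the second. Feeding it five lines instead of eight prints only the first page: a partially filled last page is never dumped, which is why the line count has to be an exact multiple.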
Finally, I printed shavian-1650.tsv using LibreOffice Calc. LibreOffice took care of neatly aligning the columns.
On a related note, you may also be interested in this Shavian alphabet chart from Omniglot.
UPDATE on 30 March, 2025: Based on a suggestion on the fediverse, I added collapse.awk to combine parts of speech that are spelt the same.
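Here is a small sketch of what that collapsing does. The collapse shell function is a condensed restatement of collapse.awk that I am using only for illustration, and the input rows are made up: the Shavian column holds placeholders (SHAW-A and so on), and the part-of-speech tags, the "x" column and the frequencies are invented, not taken from the lexicon.

```shell
# Hypothetical helper: a condensed restatement of collapse.awk.
# Input columns: Latin, Shavian, part of speech, (unused), frequency.
collapse() {
    awk '
    BEGIN { OFS = "\t" }
    # Flush the previous entry when the (Latin, Shavian) pair changes.
    NR > 1 && ($1 != latin || $2 != shavian) {
        print latin, shavian, total
        total = 0
    }
    # Remember the current entry and add its frequency to the total.
    { latin = $1; shavian = $2; total += $5 }
    END { if (NR > 0) print latin, shavian, total }'
}

# Made-up rows: "record" has two different (placeholder) Shavian
# spellings, while "run" is spelt the same for both parts of speech.
printf '%s\t%s\t%s\t%s\t%s\n' \
    record SHAW-A NN x 10 \
    record SHAW-B VB x 4 \
    run SHAW-C NN x 3 \
    run SHAW-C VB x 7 | collapse
```

The two "run" rows merge into a single entry with frequency 10, while the two "record" rows stay separate because their Shavian spellings differ. Like collapse.awk, this only merges adjacent rows, so it assumes entries for the same word sit next to each other in the input.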
Footnotes:
[1] Thanks to indieterminacy for nudging me in this direction a long time ago.
[2] Why 1650, you say? I was going for something above 1000, and 1650 fit perfectly into 10 pages.
[3] The Kingsley Read Lexicon, and therefore my printable dictionary, is copyright shavian.info 2020-2022 under a Creative Commons Attribution-ShareAlike 4.0 International Licence.