Wordle starter words

I was late to Wordle, but decided (like many, I’m sure) to find the ideal words to start with.

I started with a list of English words, filtered it down to only five-letter words, and counted the number of occurrences of each letter:

e: 1681          c: 655           k: 298
a: 1557          u: 613           f: 274
r: 1127          y: 588           w: 230
o: 1056          h: 541           v: 193
l: 1005          d: 516           z: 72
s: 960           m: 467           x: 67
t: 952           p: 466           j: 58
i: 946           b: 423           q: 38
n: 870           g: 410

Ideally we want to start with a word that uses as many of the most common letters as possible (I think??).

So we loop through all the five-letter words, and award points according to the list above to each letter in each word, ignoring duplicates. So, e.g. quiet gets 38 points from ‘q’, plus 613 from ‘u’ and so on. This gives the following top-scoring words:

arose: 6381           later: 6322
orate: 6373           alert: 6322
erato: 6373           alter: 6322
lares: 6330           stare: 6277

orate means “to speak in an elevated and often pompous manner”. erato and lares are Greek gods, and probably not valid. later, alert and alter all use the same letters. So we have arose, orate, later, and stare as good starter words.

(The lowest-scoring words, if you want to play on hard mode, are kudzu and fuzzy.)

Then we can go a step further, and find the best word to use second, to get as much coverage as possible. So we do the same thing as above, but don’t award any points to letters that are already in our first word.

I calculated the 10 best first words, and for each one the 10 best second words to use.

arose:	clint until glint flint tunic multi tulip unity built guilt
orate:	linus sling slink slimy sibyl sybil shiny indus cling pliny
alert:	sonic scion noisy simon minos bison pious sound monic synod
aster:	login lingo logic monic folic limbo mobil cling ilona pliny
irate:	locus lousy scold sound synod bonus scowl solon salon olson
learn:	stoic moist posit scout foist south shout optic topic spout
snare:	pilot clout cloth pluto optic topic logic multi tulip eliot
stale:	irony rhino minor robin groin curio choir doric crony corny
aisle:	north thorn court torch crony corny front tudor huron round
alien:	torus story short storm strom sport strop scour court stork

So a good strategy might be to use arose in the first row, and until (ignoring clint, as it’s probably a name) in the second. That’s what I did this morning, to mediocre success.

And if you really, really don’t trust your instincts, I also calculated the best third word to use! Spit these out and inshallah you’ll have all the info you need to get it on word four!

The second words are to the right of the colon, and the third words are underneath their respective second words.

arose →  clint until glint flint
           ↓     ↓     ↓     ↓
         dumpy psych chump chump
         bumpy champ dumpy dumpy
         mushy chump bumpy bumpy

orate →  linus sling slink slimy
           ↓     ↓     ↓     ↓
         psych chump chump punch
         champ dumpy dumpy bunch
         chump bumpy bumpy chunk

later →  scion sonic noisy minos
           ↓     ↓     ↓     ↓
         dumpy dumpy chump dutch
         bumpy bumpy dutch coypu
         mushy mushy mulch cubby

stare →  lingo login logic monic
           ↓     ↓     ↓     ↓
         chump chump bundy glyph
         dumpy dumpy nymph bulky
         bumpy bumpy punky dully

So you might use orate in the first row, slink in the second, and bumpy in the third!

Here’s the code:

import sys
from typing import Optional
from collections import defaultdict

from english_words import english_words_lower_alpha_set


def count_letters(words: list) -> dict[str, int]:
    letters = defaultdict(int)
    for wo in words:
        for ch in wo:
            letters[ch] += 1
    return dict(sorted(letters.items(), key=lambda x: x[1], reverse=True))


def calc_word_scores(
    words: list, letters: dict, options: int = 3, exclude: Optional[str] = None
) -> list:
    scores = defaultdict(int)
    for word in words:
        for ch in set(word):
            if not exclude or ch not in exclude:
                scores[word] += letters[ch]
    scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [word[0] for word in scores][:options]


def main(letters=5, options=4):
    words = [w for w in english_words_lower_alpha_set if len(w) == letters]

    letters = count_letters(words)
    best_letters = [let[0] for let in letters.items()]
    print(f"Ordered letter frequency:\n{' '.join(best_letters)}\n")

    print("First word, then →  second words, then ↓ third words")
    word_scores_1 = calc_word_scores(words, letters, options)
    for word1 in word_scores_1:
        word_scores_2 = calc_word_scores(words, letters, options, word1)
        word_scores_3 = zip(*[
            calc_word_scores(words, letters, options, word1 + word2)
            for word2 in word_scores_2
        ])

        print(f"{word1}{' '.join(word_scores_2)}")
        print(f"{'':>9}{' '.join(['  ↓  ' for _ in range(options)])}")
        for row in word_scores_3:
            print(f"{'':>9}{' '.join(row)}")
        print()


if __name__ == "__main__":
    letters, options = int(sys.argv[1]), int(sys.argv[2])
    main(letters, options)