Crossword Clone

Intro

In this blog we will create a NY Times style dense crossword. I will focus more on generating the crossword than the presentation. I use Go programming language because finding the word combinations is a little computation heavy. Go also has built in HTTP server that I can use to serve the API.

I will use the word lists I generated previously. If you want to see how to generate a word list, check out my post Create word list.

Wave Function Collapse (WFC)

We will use a simplified version of wave function collapse. If you are not familiar with the algorithm, check out this article by Robert Heaton: https://robertheaton.com/2018/12/17/wavefunction-collapse-algorithm/

So, how do we use the WFC for our crossword? There are three things we need to take note of in WFC:

  1. The variables we are trying to fill. This would be entries for our crossword.
  2. The domain which has all possible for the variables. This would be our word lists.
  3. The constraints which narrow down the possible values for the variables. This would be existing letters in the crossword. For example a five letter entry with no letters filled will have about 20k choices, but a file letter entry with the middle letter filled to ‘A’ will only 2k choices.

We get a list of all unfilled entries from the crossword and find then number of all possible choices for those entries, still staying true to the constraints. For the entry with the lowest number of entries, we will fill it with a random word from possible choices. We will keep repeating this until all the entries are filled.

We might run into situations where there might not be any words that fit into an entry because of the constraints. To solve this issue, we will save the crossword state at each iteration and revert back to the previous state and try with a different choice. We will also blacklist the previous choice for that state, so we don’t pick it again if we have to revert back to this state again.

Code

Crossword

We will start with creating a Go module

go mod init generator
touch crossword.go

In crossword.go create crossword package and Crossword struct

package crossword

type Crossword struct {
	Height int
	Width  int
	Cells  [][]byte
}

[!TIP]

You can use [][]rune instead of [][]byte if you are not worried about memory and performance. You might need to make some changes to other parts of the code if you use [][]rune. I will use [][]byte since I am only using ASCII characters.

We will create a function to create a new crossword. We will also initialize the cell values to . (dot) (ASCII 46). This will be useful when using regular expressions to find matches in word lists.

func New(height, width int) *Crossword {
	c := Crossword{Height: height, Width: width}
	c.Cells = make([][]byte, height)
	for i := 0; i < height; i++ {
		c.Cells[i] = make([]byte, width)
	}
	for i := range c.Height {
		for j := range c.Width {
			c.Cells[i][j] = 46
		}
	}
	return &c
}

Create functions to export crossword as byte array and initialize from byte array

func (c *Crossword) ToByteArray() []byte {
	return bytes.Join(c.Cells, nil)
}

func (c *Crossword) FromByteArray(b []byte) {
	for row := range c.Height {
		copy(c.Cells[row], b[row*c.Width:(row+1)*c.Width])
	}
}

Create functions to get the words at starting at a row and column. We will use - (hyphen) (ASCII 45) to denote black squares

func (c *Crossword) getWordAcross(row, col int) string {
	b := make([]byte, c.Width)

	i := col
	for i < c.Width && c.Cells[row][i] != 45 {
		b[i] = c.Cells[row][i]
		i++
	}

	return string(b[col:i])
}

Create a function to get entries

type CrosswordEntry struct {
	Row    int
	Column int
	Across bool
	Word   string
}

func (c *Crossword) GetEntries() []CrosswordEntry {
	entries := make([]CrosswordEntry, 0)

	for row := range c.Height {
		for col := range c.Width {
			if c.Cells[row][col] != 45 {
				if col == 0 || c.Cells[row][col-1] == 45 {
					word := c.getWordAcross(row, col)
					if len(word) > 1 {
						entries = append(entries, CrosswordEntry{Row: row, Column: col, Across: true, Word: word})
					}
				}
				if row == 0 || c.Cells[row-1][col] == 45 {
					word := c.getWordDown(row, col)
					if len(word) > 1 {
						entries = append(entries, CrosswordEntry{Row: row, Column: col, Across: false, Word: word})
					}
				}
			}
		}
	}

	return entries
}

Create a type State to store the crossword state. cbytes is the flattened cells array. target is the word that will be filled in the current iteration. picked is the word chosen from the word lists. kaput is an array of words that led to a dead end in the later iterations and have been blacklisted.

type Target struct {
	Row    int
	Column int
	Across bool
}

type State struct {
	cbytes []byte
	target Target
	picked string
	kaput  []string
}

Let’s create the last function to fill the crossword. Start by creating a state array with an initial state

func (c *Crossword) Fill(wordList *WordList) bool {
	sa := []State{}
	sa = append(sa, State{c.ToByteArray(), Target{2, 0, true}, "", []string{}})
}

Next we setup the loop that will fill the crossword. We can limit the number of iterations if needed. The wordList is an array of list of words of length equal to the index of the array

type WordList []string

func (c *Crossword) Fill(wordList *WordList) bool {
	sa := []State{}
	sa = append(sa, State{c.ToByteArray(), Target{2, 0, true}, "", []string{}})

	var done = false
	for iterations := 0; !done && iterations < 1000; iterations++ {
  }
  
  return done
}

Every iteration of the loop begins with using the most recent state in the state array and ends with adding an empty state to the state array

func (c *Crossword) Fill(wordList *WordList) bool {
	...
  
	for iterations := 0; !done && iterations < 1000; iterations++ {
		s := &sa[len(sa)-1]
    	
		...
      
		newState := State{c.ToByteArray(), Target{0, 0, true}, "", []string{}}
		sa = append(sa, newState)
  }
  
  ...
}

Inside the loop, we get the crossword entries and then find the number of words matching each entry. We only try to search words for entries that have a . (dot) in it. The regular expression is simply the entry word since we initialized them to a . (dot).

At the same time, we also save the minimum number of choices found for the entries and the index of the entries

entries := c.GetEntries()
numMatches := make([]int, len(entries))
for i := range numMatches {
  numMatches[i] = math.MaxInt
}
minMatch := math.MaxInt
minMatchIndex := -1

for i, entry := range entries {
  if strings.Contains(entry.Word, ".") {
    re := regexp.MustCompile(entry.Word)
    numMatches[i] = len(re.FindAllStringIndex((*wordList)[len(entry.Word)], -1))
    if numMatches[i] < minMatch {
      minMatch = numMatches[i]
      minMatchIndex = i
    }
  }
}

If there are no more matches found, check if the crossword is complete and break out if it is. If it is not complete, delete the current state and go back to the previous state and add the previous state chosen word to the kaput array and continue the loop. This handles the dead ends by backtracking to the previous state and blacklisting the word that led to the dead end.

if minMatchIndex == -1 {
  correct := true
  for _, entry := range entries {
    re := regexp.MustCompile(entry.Word)
    found := re.FindAllString((*wordList)[len(entry.Word)], -1)
    if len(found) == 0 {
      correct = false
      break
    }
  }

  if correct {
    done = true
    break
  } else {
    sa = sa[:len(sa)-1]
    s = &sa[len(sa)-1]
    s.kaput = append(s.kaput, s.picked)
    c.FromByteArray(s.cbytes)
    continue
  }
}

If there are matches found, we will grab the matches and find the usable array (difference between matches found and the kaput array)

entry := entries[minMatchIndex]

matches := regexp.MustCompile(entry.Word).FindAllString((*wordList)[len(entry.Word)], -1)

usable := slices.DeleteFunc(matches, func(match string) bool {
  return slices.Contains(s.kaput, match)
})

Like before, if there are no usable words, revert to previous state

if len(usable) == 0 {
  sa = sa[:len(sa)-1]
  s = &sa[len(sa)-1]
  s.kaput = append(s.kaput, s.picked)
  c.FromByteArray(s.cbytes)
  continue
}

If there are words we can use, pick a random one

s.picked = usable[rand.Intn(len(usable))]
s.target = Target{entry.Row, entry.Column, entry.Across}
c.SetWord(entry.Row, entry.Column, entry.Across, []byte(s.picked))

OpenAI Clue Generation

We can OpenAI for generating clues. The OpenAI official Go library is still in beta, so this code might change later.

Initialize the OpenAI Client. This code uses environment variable OPENAI_API_KEY which has the API key generated from the OpenAI dashboard.

client := openai.NewClient(
  option.WithAPIKey(
    os.Getenv("OPENAI_API_KEY"),
  ),
)

Create an array words from entries

entries := c.GetEntries()
answers := []string{}

for _, entry := range entries {
  answers = append(answers, entry.Word)
}

Use the chat completion endpoint to generate the hints. We provide a generic system prompt and an example of the clues generated. We also ask OpenAI to respond with JSON

response, err := client.Chat.Completions.New(
  context.Background(),
  openai.ChatCompletionNewParams{
    Model: openai.F(openai.ChatModelGPT4o),
    Messages: openai.F([]openai.ChatCompletionMessageParamUnion{
      openai.SystemMessage("You are a crossword clue generator. You will read a comma separated list of words and generate clues for them. You will respond in json format with the each word as name and its clue as value. Some words are a combination of multiple words. Pay attention to prefixes, suffixes and plurals. Try to be accurate and concise. Do not use the word or the form of the abbreviations in the clue."),
      openai.UserMessage(`DOG,AG,ETC`),
      openai.AssistantMessage(`{"DOG":"Man's best friend","AG":"Symbol for silver","ETC":"Other similar things (abbr)"}`),
      openai.UserMessage(strings.Join(answers, ",")),
    }),
  },
)

if err != nil {
  log.Fatal(err)
}
clues := map[string]string{}
json.Unmarshal([]byte(response.Choices[0].Message.Content), &clues)

Server

The last part is to use HTTP server to create the endpoint for new crossword generation.

We start with reading the word lists so they can be used for the lifetime of the server

func initWordList() {
	wordList = make([]string, 5+1)
	for i := 2; i < 5+1; i++ {
		tmp, err := os.ReadFile("../generator/words-" + strconv.Itoa(i) + ".txt")
		if err != nil {
			log.Fatal(err)
		}
		wordList[i] = string(tmp)
	}
}

func main() {
	initWordList()
  ...
}

Add handler for /new url path

function main() {
  ...
  http.HandleFunc("/new", func(w http.ResponseWriter, r *http.Request) {
    ...
  })
}

In the handler, we will create the crossword, set some black cells, and then fill it using the WFC algorithm. If the fill fails then we send HTTP status 500.

function main() {
  ...
  http.HandleFunc("/new", func(w http.ResponseWriter, r *http.Request) {
    ...
		numBlanks := rand.Intn(2)

		for i := 0; i < numBlanks+1; i++ {
			x := rand.Intn(5)
			y := rand.Intn(5)
			c.SetCell(x, y, 45)
			c.SetCell(4-x, 4-y, 45)
		}

		if !c.Fill(&wordList) {
			fmt.Println("Failed to fill crossword")
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
    ...
  })
}

Next we generate clues as shown in the previous section, format the output and return JSON reponse

function main() {
  ...
  http.HandleFunc("/new", func(w http.ResponseWriter, r *http.Request) {
    ...
		for _, entry := range c.GetEntries() {
			if entry.Across {
				across = append(across, []interface{}{entry.Row, entry.Column, len(entry.Word), clues[entry.Word]})
			} else {
				down = append(down, []interface{}{entry.Row, entry.Column, len(entry.Word), clues[entry.Word]})
			}
		}

		data := NewCrosswordResponse{Rows: c.Height, Columns: c.Width, Grid: c.ToByteArray(), Across: across, Down: down}

		j, err := json.Marshal(data)
		if err != nil {
			log.Fatal(err)
		}

		w.Header().Set("Access-Control-Allow-Origin", "*")
		w.Header().Set("Content-Type", "application/json")
		w.Write(j)
    ...
  })
}

Finally, we can start the server with

http.ListenAndServe(":8080", nil)

I will create a separate post for the frontend for the crossword. The code for the frontend is also in the same Github repo.