A Dataset of Cryptic Crossword Clues

A Dataset of Cryptic Crossword Clues is a dataset of cryptic crossword clues, collected from various blogs and publicly available digital archives. I originally started this project to practice my web scraping and data engineering skills, but as it’s evolved I hope it can be a resource to solvers and constructors of cryptic crosswords.

If you’re new to cryptic crosswords, rejoice! A whole new world awaits you! The New Yorker has an excellent introduction to cryptic crosswords, and Matt Gritzmacher has a daily newsletter with links to crosswords.

The project scrapes several blogs and digital archives for cryptic crosswords. From the collected web pages, the clues, answers, clue numbers, blogger’s explanation and commentary, puzzle title and publication date are parsed and extracted into a tabular dataset. The result (as of September 2021) is a little over half a million clues from cryptic crosswords published over the past twelve years, which makes for a rich and peculiar dataset.
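To make the shape of one row concrete, here is a minimal sketch of how the fields listed above might be read from a downloaded CSV. The column names (`clue`, `answer`, `clue_number`, `explanation`, `puzzle_title`, `publication_date`) and the `ClueRecord` and `read_clues` names are assumptions made for illustration; check the datasheet for the actual schema.

```python
import csv
import io
from dataclasses import dataclass, fields
from typing import Iterator


@dataclass
class ClueRecord:
    # Hypothetical column names; the real dataset may name these differently.
    clue: str
    answer: str
    clue_number: str
    explanation: str
    puzzle_title: str
    publication_date: str


def read_clues(csv_text: str) -> Iterator[ClueRecord]:
    """Yield one ClueRecord per CSV row, using "" for missing values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Keep only the assumed columns and ignore any extras in the file.
        yield ClueRecord(**{f.name: row.get(f.name) or "" for f in fields(ClueRecord)})
```

All fields are kept as strings here, since scraped values such as clue numbers and dates may not be uniformly formatted.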

Currently the sources for clues are:

The data can be viewed online and downloaded for free (CSV, JSON, SQLite, advanced). Detailed documentation can be found on the datasheet and the source code for creating the dataset is available on GitHub.

The CSV request will only return the first 1000 rows, click here to stream all rows (this will take a while). The JSON request is paginated with 100 rows per page.
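Because the JSON request returns 100 rows per page, collecting more than one page means following the pages in turn. The sketch below shows that loop under stated assumptions: each page is taken to be a JSON object with a `rows` list and a `next_url` field pointing at the following page. Both key names, and the `iter_rows` and `fetch_page` names, are hypothetical, so inspect a real response before relying on them.

```python
from typing import Callable, Iterator, Optional


def iter_rows(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield rows from every page, following `next_url` until it is absent.

    `fetch_page` takes a URL and returns the decoded JSON page as a dict.
    The "rows" and "next_url" keys are assumed, not confirmed by the source.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch_page(url)
        yield from page.get("rows", [])
        url = page.get("next_url")  # None or missing ends the loop
```

Passing the fetcher in as a function keeps the loop independent of any particular HTTP library. For example, `fetch_page` could wrap `urllib.request.urlopen` and `json.load`.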

Send all comments, suggestions and complaints to george[æ]

Please share and enjoy!

~ George Ho