Parse all the words in your language
Link to an English wordlist (made by linguist John M. Lawler at the University of Michigan).[1]
If you're looking for a wordlist in another language, you can compile your own by adapting the dictionaries currently available for LibreOffice, which is exactly the topic of this blog post. I wrote the following code for my Paraulògic solver: having a wordlist stored on my PC allowed me to run many queries for specific sets of words.
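As an illustration of the kind of query a local wordlist makes cheap, here is a minimal sketch that finds every word built only from a given set of letters and containing one required letter, roughly the shape of a Paraulògic puzzle. The function names, the file name wordlist.txt (one word per line, UTF-8) and the three-letter minimum are assumptions for this example, not taken from the Paraulogicked code.

```python
from typing import Iterable, List


def load_wordlist(path: str) -> List[str]:
    """Read one word per line from a UTF-8 text file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def matching_words(
    words: Iterable[str], letters: str, required: str, min_length: int = 3
) -> List[str]:
    """Words made only of `letters` (repeats allowed) that contain `required`."""
    allowed = set(letters.lower())
    required = required.lower()
    results = []
    for word in words:
        lowered = word.lower()
        if len(lowered) < min_length or required not in lowered:
            continue
        # Accents are not normalised here: "à" only matches if "à" is in letters.
        if set(lowered) <= allowed:
            results.append(word)
    return results


# Hypothetical usage with a wordlist compiled as described in this post:
# matching_words(load_wordlist("wordlist.txt"), "acelrst", "a")
```

Using a set comparison (set(lowered) <= allowed) checks that every letter of the word is available without caring how often it repeats.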
A quick note, though: some of the results will be words that make little sense. This is due to the way these dictionaries are built: they usually contain many place names, acronyms, and specialised words that are only used in a handful of fields.
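Since place names and acronyms are usually stored capitalised in these dictionaries, a rough first pass is to keep only lowercase entries made of letters. This is a heuristic sketch, not something the dictionaries guarantee; the set of extra allowed characters (hyphen, apostrophe and the Catalan middle dot in "l·l") is an assumption you should adapt to your language.

```python
from typing import Iterable, List

# Characters besides letters that may legitimately appear inside a word;
# the middle dot covers Catalan "l·l". Adjust for your language (assumption).
ALLOWED_EXTRA = {"-", "'", "·"}


def looks_like_common_word(word: str) -> bool:
    """Heuristic: keep lowercase entries made only of letters (plus ALLOWED_EXTRA)."""
    if not word or word != word.lower():
        return False  # capitalised place names (Girona) and acronyms (ONU)
    return all(ch.isalpha() or ch in ALLOWED_EXTRA for ch in word)


def filter_words(words: Iterable[str]) -> List[str]:
    """Drop entries that look like proper nouns, acronyms or codes."""
    return [word for word in words if looks_like_common_word(word)]
```

This cannot catch specialised vocabulary written in lowercase, and it will also drop any legitimate word the dictionary happens to store capitalised, so treat its output as a starting point.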
Steps to compiling your own wordlist
- Download the set of LibreOffice dictionaries: on this webpage, look for the SDK and Sourcecode section and choose the file named libreoffice-dictionaries-[version].tar.xz.
- Extract the archive and find the .dic file for your language. Keep in mind that language names are abbreviated: for example, Catalan is abbreviated as ca. You can expect a file structure similar to this one:

    libreoffice-7.3.0.3/
    ├─ dictionaries/
    │  ├─ af_ZA/
    │  ├─ an_ES/
    │  ├─ .../
    │  ├─ ca/
    │  │  ├─ dictionaries/
    │  │  │  ├─ ca-valencia.aff
    │  │  │  ├─ ca_valencia.dic
    │  │  │  ├─ ca.aff
    │  │  │  ├─ ca.dic
    │  │  ├─ images/
    │  │  ├─ META-INF/
    │  │  ├─ LICENSES-EN.txt

  Here, the most important files are ca.dic and LICENSES-EN.txt (read the latter to make sure your use complies with the dictionary's license, in this case the GPL).
- Copy [your language].dic into another folder and run the cleaning script on it (filename: cleaner.py).
- ???
- Profit
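Here is a minimal sketch of what a cleaner like cleaner.py might do, assuming the usual Hunspell .dic layout: the first line holds the approximate number of entries, and every other line is a word optionally followed by /FLAGS (affix flags) and whitespace-separated morphological fields. The function names and the UTF-8 default are assumptions, not the repository's code; the real encoding of a dictionary is declared on the SET line of its .aff file.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def find_dic_files(root: Path, lang: str) -> List[Path]:
    """Locate <lang>.dic (e.g. ca.dic) anywhere under the extracted archive."""
    return sorted(root.rglob(f"{lang}.dic"))


def parse_dic_line(line: str) -> Optional[str]:
    """Return the bare word of one .dic entry, or None for a blank line."""
    fields = line.split()
    if not fields:
        return None
    # "word/FLAGS": keep the part before the first slash, drop affix flags.
    # (Escaped slashes, written "\/", are not handled in this sketch.)
    word = fields[0].split("/", 1)[0]
    return word or None


def clean_dic_lines(lines: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated words of a .dic file's lines."""
    words = set()
    for index, line in enumerate(lines):
        if index == 0 and line.strip().isdigit():
            continue  # header line: approximate number of entries
        word = parse_dic_line(line)
        if word:
            words.add(word)
    # Plain code-point order; use locale-aware sorting if accents matter.
    return sorted(words)


def clean_dic_file(dic_path: Path, out_path: Path, encoding: str = "utf-8") -> int:
    """Write one word per line to out_path and return how many were written."""
    with dic_path.open(encoding=encoding) as source:
        words = clean_dic_lines(source)
    out_path.write_text("\n".join(words) + "\n", encoding=encoding)
    return len(words)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python cleaner.py ca.dic wordlist.txt
    count = clean_dic_file(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"Wrote {count} words to {sys.argv[2]}")
```

Stripping the flags keeps only the base forms stored in the dictionary; inflected forms generated by the affix rules in the .aff file are not included. Hunspell ships tools such as unmunch that expand entries using those rules, if you need every form.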
Endnotes
This concludes the tutorial for this week!
If you find any other ready-made wordlists, feel free to send me an email and I'll add them at the start of this article.
You can find the code for this post in the Paraulogicked repository, the solver I made for the Paraulògic online game.
To learn more about this project, you can read my previous blog post here.
Footnotes
[1] The solution this author came up with is interesting in and of itself: using only Bash shell commands, the author parsed English words from texts and sorted them into a dictionary. This may be a good approach to creating your own wordlist, but it is not easy to correctly identify which words should be parsed.
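For comparison with that shell-based approach, here is a minimal Python sketch of the same idea: pull words out of running text and keep a sorted, de-duplicated, lowercase list. The regular expression (runs of Unicode letters, optionally joined by an apostrophe or hyphen) is an assumption; as the footnote says, deciding what counts as a word is the hard part.

```python
import re
from typing import List

# Runs of letters, optionally joined by apostrophes or hyphens (an assumption).
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def words_from_text(text: str) -> List[str]:
    """Return the sorted, lowercase, de-duplicated words found in `text`."""
    return sorted({match.lower() for match in WORD_PATTERN.findall(text)})
```

The character class [^\W\d_] matches any Unicode letter, so accented words are kept whole while digits and underscores are excluded.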