Hence, I set about making one for myself: Collective nouns in English.
These are the steps I took to do this. Note that I was doing these things with the express intention of making it a quick and dirty, run once and throw-away script. There probably are better ways to do it, and if you know one, please tell me.
Step 1: Extract data
I used Google Spreadsheets for this task. It has a friendly
ImportHtml function which can import almost any HTML table structure directly.
I did an import on
B2 on this spreadsheet.
Step 2: Realize limitations
I wanted to create cloze deletion cards for each card. However, I noticed that there were several nouns possible for the same animal and the same noun may be used for more than one animal. Hence, there would be an ambiguity about which animal/noun was the correct answer. Since Anki does not support multiple answers for a single gap in Cloze Deletions, I decided to add minimal hints to the questions to remove ambiguity of which answer was needed.
One of the simplest ways of doing it was by leaving just enough characters in the prefix such that the answer was unambiguous.
For example, since a group of both ducks and dolphins can be called a team, the cloze deletion would read:
team of do[…]
team of du[…]
To remove the ambiguity. The python code for finding such a suffix is simple:
However, in some cases, it is not possible without giving away the whole answer. For example, you can use either
trip as a collective noun for goats. In this case, we will have to give trip as the hint if we want to disambiguate it from tribe. There were three such cases:
- tribe/trip of goats
- dole of turtles/turtle doves
- rake/rag of colts
These had to dropped from the deck.
Step 3: Script it
I then wrote a python script to do the menial tasks of reading the csv file, removing the crud (e.g. the single character alphabets) and outputting a single CSV file.
Then I opened Anki, and imported the file to create the deck:
The import work smoothly. I synced with AnkiWeb, and shared the deck.
The entire process took about an hour of work. Almost half the time was spent figuring out which tool to use (I dabbled with
BeautifulSoup before using Google spreadsheets) and the algorithm I wanted to use for disambiguation. Then another hour or so writing this blog post about it.