To do:
- Rework the TSV writer in terms of objects, so that all the functionality is wrapped into each group of words
I pulled lists of the most common Greek words in a corpus of web pages from SketchEngine, then used WiktionaryParser to pull their definitions from Wiktionary. I packaged the results as a TSV file that can be imported into Anki, a flashcard app.
- anki.tsv is the Anki flashcard list
- db.json is a database with the words, their frequencies, and their definitions
- words.txt is just a list of the words included in the lists
- raw/ contains un-tracked files downloaded from SketchEngine that are parsed
- parse_html.py turns the html files in raw/ into database entries
- fetch_definitions.py populates the database with Wiktionary definitions
- make_anki_tsv.py translates the database into the Anki-ready file
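
The last step, going from the database to the Anki-ready file, could look roughly like the sketch below. It is not the code in make_anki_tsv.py. It assumes a hypothetical database layout in which each word maps to a `frequency` number and a list of `definitions` strings, and the function names `build_rows` and `write_tsv` are made up for illustration. The actual db.json schema may differ.

```python
import csv
import io


def build_rows(db):
    """Return (front, back) card rows, most frequent word first.

    Assumed (hypothetical) layout:
    {word: {"frequency": int, "definitions": [str, ...]}}
    """
    ordered = sorted(db.items(), key=lambda item: item[1]["frequency"], reverse=True)
    rows = []
    for word, entry in ordered:
        definitions = entry.get("definitions", [])
        if not definitions:
            continue  # no definition was found, so there is nothing for the card back
        rows.append((word, "; ".join(definitions)))
    return rows


def write_tsv(rows, fp):
    """Write one tab-separated line per card to an open text file object."""
    writer = csv.writer(fp, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the database, for illustration only.
    sample_db = {
        "και": {"frequency": 100, "definitions": ["and"]},
        "είμαι": {"frequency": 50, "definitions": ["to be", "to exist"]},
    }
    buffer = io.StringIO()
    write_tsv(build_rows(sample_db), buffer)
    print(buffer.getvalue(), end="")
```

To write a real file, open it with `encoding="utf-8"` and `newline=""` so the Greek text and the line endings come through unchanged.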
I considered scraping a Greek-English dictionary website (e.g., dict.com or Word Reference) but ultimately did not pursue it, partly because the pages are difficult to parse reliably and partly because of licensing concerns.