A web app for easily retrieving SARS-CoV-2 test results from the Arztruf Hamburg.
The KVHH has recently launched a new website, and with that it has become apparent that the format of the PDF files is not consistent. I've looked into extending the parsing, but the format simply varies too much from file to file.
Even for files from the same testing site (e.g. the lab in Hamburg-Altona), the structure differs depending on the date at which the results were uploaded.
Pull requests that fix the PDF parsing are more than welcome. But be aware that there's some heavy lifting involved, given that even tabula-java has its difficulties making sense of the data.
Unfortunately, even with the new website, it is still not possible for test subjects to easily look up their results.
The app consists of three parts:
- A Node.js script (`download-pdf-files.js`) that scrapes the website and downloads all PDF files
- A Bash script (`convert-and-combine-all-pdf-files.sh`) that uses tabula-java to extract the tables from the PDF files. It then combines the data into one single CSV file (see the sketch below)
- The web app that consumes the combined CSV file and allows users to search it
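For orientation, the conversion step essentially amounts to running tabula-java once per PDF and appending the output. A minimal sketch, assuming the PDFs sit in a `pdf/` directory (the actual logic lives in `convert-and-combine-all-pdf-files.sh`):

```bash
#!/usr/bin/env bash
# Rough sketch only -- the real script is convert-and-combine-all-pdf-files.sh.
# Assumes tabula.jar sits next to the script and the PDFs were downloaded into ./pdf/.
set -euo pipefail

rm -f all.csv

for pdf in pdf/*.pdf; do
  # tabula-java writes the extracted tables as CSV to stdout by default
  java -jar tabula.jar --pages all --format CSV "$pdf" >> all.csv
done
```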
- Download the latest release (the `.jar` file) of tabula-java
- Rename it to `tabula.jar`. (Otherwise, the script will not be able to find it.)
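If you prefer the command line, the download and rename can look something like this (the release URL and file name are placeholders; check the tabula-java releases page for the current version):

```bash
# Placeholder version -- pick the latest "jar-with-dependencies" build from
# https://github.com/tabulapdf/tabula-java/releases
curl -LO https://github.com/tabulapdf/tabula-java/releases/download/v<version>/tabula-<version>-jar-with-dependencies.jar
mv tabula-<version>-jar-with-dependencies.jar tabula.jar
```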
In order to generate the large all-in-one CSV file, follow these steps.
- Download the PDF files: `node download-pdf-files`
- Convert the PDF files into one single CSV file: `./convert-and-combine-all-pdf-files.sh`
You can now browse all test results from `all.csv`. Get well soon!
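If you just want to check a single result without starting the web app, grepping the combined file is enough (the identifier below is a made-up placeholder; use whatever code is printed on your test slip):

```bash
# Search the combined CSV for a specific result code (placeholder value)
grep "ABC123" all.csv
```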
Configuring the app for Uberspace is pretty easy:
- On your Uberspace, create the directory where the app will be installed to (see the `post-receive` script)
- Register the supervisord service, using the `.ini` file that comes with this repository
- Set up an empty repository on your Uberspace (`git init --bare`)
- Add the `post-receive` hook to that repository (a sketch follows below)
- Add the repository as a remote to your clone of this repository
- Configure the subdomain and the web backend
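Conceptually, the `post-receive` hook checks the pushed code out into the app directory and restarts the service. A rough sketch, assuming the app directory from the cron jobs below and a made-up supervisord service name (use the hook and `.ini` file shipped with this repository):

```bash
#!/usr/bin/env bash
# Sketch of a post-receive hook -- the real one ships with this repository.
# APP_DIR matches the cron jobs below; the service name is a made-up example.
APP_DIR=/home/juitblip/apps/covid.milchbartstrasse.de

# Check the pushed revision out into the app directory
git --work-tree="$APP_DIR" checkout -f

# Install dependencies and restart the supervisord service
cd "$APP_DIR" && npm install --production
supervisorctl restart covid-milchbartstrasse
```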
In order to automatically update the data, set up a cron job:
```
# Download PDF files
0 */1 * * * /usr/bin/node /home/juitblip/apps/covid.milchbartstrasse.de/download-pdf-files >> /home/juitblip/apps/covid.milchbartstrasse.de/cronjob-download.log 2>&1

# Five minutes later, convert PDF to CSV
5 */1 * * * /home/juitblip/apps/covid.milchbartstrasse.de/convert-and-combine-all-pdf-files.sh >> /home/juitblip/apps/covid.milchbartstrasse.de/cronjob-convert.log 2>&1
```
This downloads the PDF files every hour and generates the CSV file five minutes later.
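The entries go into your user's crontab on the Uberspace:

```bash
# Open the crontab for editing and paste the two entries above
crontab -e
```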