How to best model the data? #5
dpriskorn announced in Announcements
I created this data model today for the Riksdagen open data to sentences project and I would like some feedback from the community.
Basically, the idea is to analyze all 160k documents and store every unique rawtoken and sentence in a database.
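For concreteness, here is a minimal sketch of what two such tables could look like; the actual tables are the ones in the UML diagram linked below, and the names and columns here are only an illustration, not the project's real schema.

```python
# Minimal, hypothetical sketch of the core tables -- not the actual datamodel
# from the repository, just an illustration of the deduplication idea.
import sqlite3

schema = """
CREATE TABLE IF NOT EXISTS sentence (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL UNIQUE        -- each unique sentence is stored once
);

CREATE TABLE IF NOT EXISTS rawtoken (
    id         INTEGER PRIMARY KEY,
    text       TEXT NOT NULL UNIQUE, -- the token exactly as it appears in the text
    normalized TEXT NOT NULL         -- normalized (e.g. lowercased) form
);

-- many-to-many link: which rawtokens occur in which sentences
CREATE TABLE IF NOT EXISTS rawtoken_sentence (
    rawtoken_id INTEGER NOT NULL REFERENCES rawtoken(id),
    sentence_id INTEGER NOT NULL REFERENCES sentence(id),
    PRIMARY KEY (rawtoken_id, sentence_id)
);
"""

conn = sqlite3.connect("riksdagen_sentences.db")  # placeholder filename
conn.executescript(schema)
conn.commit()
```

The join table is what keeps tokens and sentences deduplicated while still recording where each token occurs.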
This is going to be a huge database, and I'm not sure ToolsDB can handle it (WMF recommends Trove for databases larger than 125 GB).
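To get a feel for whether the 125 GB threshold is actually a problem, a back-of-envelope estimate helps; all the per-document and per-row numbers below are assumptions picked for illustration, not measurements.

```python
# Back-of-envelope size estimate. Every number below is an assumption;
# replace with measured averages from a sample of the corpus.
documents = 160_000           # from the project description
tokens_per_document = 5_000   # assumed average number of tokens per document
sentences_per_document = 300  # assumed average number of sentences per document
bytes_per_token_row = 100     # assumed row size incl. text, normalized form, indexes
bytes_per_sentence_row = 400  # assumed row size incl. sentence text and indexes

# Worst case with no deduplication at all; storing only *unique* tokens and
# sentences should shrink this considerably.
token_rows = documents * tokens_per_document
sentence_rows = documents * sentences_per_document

total_gb = (token_rows * bytes_per_token_row
            + sentence_rows * bytes_per_sentence_row) / 1e9
print(f"worst-case estimate: ~{total_gb:.0f} GB")  # ~99 GB with these numbers
```

Measuring the real averages on a few hundred documents first would show how close the deduplicated database actually gets to that limit.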
I want to store normalized tokens and later link the raw tokens to Wikidata Lexeme Form IDs.
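As a sketch of how that linking step could work, form IDs for a given representation can be looked up through the Wikidata Query Service; the snippet below assumes Swedish lexemes (Q9027) and exact matching on the form representation, which is just one possible approach, not necessarily how the project will do it.

```python
# Hedged sketch: look up Wikidata lexeme form IDs whose representation matches
# a token. Assumes Swedish lexemes (Q9027) and exact string matching.
import requests

WDQS = "https://query.wikidata.org/sparql"

def lexeme_form_ids(token: str) -> list[str]:
    query = """
    SELECT ?form WHERE {
      ?lexeme dct:language wd:Q9027 ;
              ontolex:lexicalForm ?form .
      ?form ontolex:representation ?rep .
      FILTER(STR(?rep) = "%s")
    }
    """ % token
    response = requests.get(
        WDQS,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "riksdagen_sentences-sketch/0.1"},
    )
    response.raise_for_status()
    bindings = response.json()["results"]["bindings"]
    # Form URIs look like http://www.wikidata.org/entity/L1234-F1 -> keep the ID
    return [b["form"]["value"].rsplit("/", 1)[-1] for b in bindings]

print(lexeme_form_ids("och"))  # prints a list of form IDs such as ['L...-F2']
```

The resulting IDs could then live in a separate link table (or a nullable column) on the rawtoken side, so tokens that have no matching lexeme form yet simply stay unlinked.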
I'm curious to hear what you think.
The different tables are explained in the UML here:
https://github.com/dpriskorn/riksdagen_sentences/blob/save_to_database/diagrams/datamodel.puml