Create and update an MLB stats database in BigQuery.

anthonydelage/mlb-db

mlb-db

Loads a range of baseball data into Google BigQuery.

Gets data from four main sources:

  • Baseball Savant (Statcast): uses the Statcast Search tool to collect pitch-by-pitch logs for every team and player. Sample download here.
  • Crunchtime Baseball player maps: a full table of current MLB players by MLBAM ID, mapped to their IDs in other "systems". Sample download here.
  • Baseball Prospectus player maps: a table containing current and retired MLB players. Not as complete as the Crunchtime Baseball maps. Sample download here.
  • Bill Petti's weather (hosted on Box): a table containing weather for every game. Sample here.
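Because the two player maps overlap, one way to think about combining them is a merge keyed on MLBAM ID, where the more complete Crunchtime map takes priority and the Baseball Prospectus map fills in retired players and missing IDs. The sketch below is only an illustration of that idea, not this repository's code: the function name and the column names (mlbam_id, name, fg_id, bp_id) are assumptions, and the sample rows are made up.

```python
def merge_player_maps(primary_rows, fallback_rows, key="mlbam_id"):
    """Merge two player-map tables into one dict keyed on `key`.

    Rows are plain dicts (for example from csv.DictReader). Non-empty values
    from `primary_rows` win; `fallback_rows` only fill in what is missing.
    The column names used here are hypothetical.
    """
    merged = {}
    # Start from the fallback map so the primary map can override it.
    for row in fallback_rows:
        if row.get(key):
            merged[row[key]] = dict(row)
    # Overlay the primary map, skipping empty cells so they do not erase
    # values that only the fallback map provides.
    for row in primary_rows:
        if row.get(key):
            non_empty = {k: v for k, v in row.items() if v not in ("", None)}
            merged.setdefault(row[key], {}).update(non_empty)
    return merged


if __name__ == "__main__":
    # Made-up rows purely for illustration.
    crunchtime = [{"mlbam_id": "1001", "name": "Player A", "fg_id": "f1"}]
    prospectus = [
        {"mlbam_id": "1001", "name": "Player A", "bp_id": "b1"},
        {"mlbam_id": "1002", "name": "Player B", "bp_id": "b2"},
    ]
    for player in merge_player_maps(crunchtime, prospectus).values():
        print(player)
```

Keeping the merge keyed on a single ID makes it easy to see which source contributed each field before anything is loaded to BigQuery.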

Requirements

  • Python 3.6+ (versions 3.5 and earlier haven't been tested)

Setup

Before using any of this tool's features, create a BigQuery project and dataset, and make sure their details and credentials match those in config.yaml.

For a quick introduction to Google BigQuery, have a look at their tutorials here.
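If you prefer to create the dataset from Python rather than the web console, the following is a minimal sketch using the google-cloud-bigquery client library. It assumes that library is installed and that application credentials are already configured; the project ID, dataset name, and location shown are placeholders, not values taken from this repository's config.yaml.

```python
def full_dataset_id(project_id, dataset_name):
    """Build the 'project.dataset' identifier that BigQuery expects."""
    return f"{project_id}.{dataset_name}"


def create_dataset(project_id, dataset_name, location="US"):
    """Create the dataset if it does not exist yet and return it.

    Requires the google-cloud-bigquery package and valid credentials.
    """
    # Imported here so the helper above can be used without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(full_dataset_id(project_id, dataset_name))
    dataset.location = location
    # exists_ok=True makes the call safe to repeat.
    return client.create_dataset(dataset, exists_ok=True)


if __name__ == "__main__":
    # Placeholder names: replace with the values from your own config.yaml.
    created = create_dataset("my-gcp-project", "mlb")
    print(f"Dataset ready: {created.full_dataset_id}")
```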

Usage

To set up the repository's virtual environment, run:

> make venv

To initialize the BigQuery tables, run:

> make tables

To run a standard database update (all events for the current year, plus the players table), run:

> make data

To make more granular updates, refer to the documentation in the src/data.py file. For example, to update all events from 2016 without updating the players table, run:

> python src/update.py --year=2016 --no-players
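For readers curious how flags like --year and --no-players might be handled, here is a hedged sketch using Python's standard argparse module. It is not the repository's actual update script: the function name, defaults, and help text are assumptions chosen to mirror the command above.

```python
import argparse
from datetime import date


def build_parser():
    """Build a parser for hypothetical --year and --no-players flags."""
    parser = argparse.ArgumentParser(
        description="Update events (and optionally players) for one season."
    )
    parser.add_argument(
        "--year",
        type=int,
        default=date.today().year,  # assumed default: the current year
        help="Season to update events for.",
    )
    parser.add_argument(
        "--no-players",
        dest="players",
        action="store_false",  # players stays True unless the flag is given
        help="Skip updating the players table.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Updating events for {args.year}; update players: {args.players}")
```

With this layout, omitting both flags corresponds to the standard update described above: the current year's events plus the players table.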

Useful Resources
