Code for Git Analytics
Copyright Doug Williams, 2015
Development notes and summary results can be found in the _README.ipynb notebook (needs update).
Overall Estimator Performance:
- Estimator performance vs probability for Nova
- Build Script for Estimator Results
- Estimator Performance vs Feature Reduction
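As a rough illustration of what a performance-versus-probability curve measures, the sketch below computes precision and recall at several probability thresholds. It is not taken from the project's notebooks: the function name `precision_recall_at`, the labels, and the scores are hypothetical toy values chosen only for the example.

```python
def precision_recall_at(labels, probs, threshold):
    """Return (precision, recall) when predicting positive for probs >= threshold."""
    predicted = [p >= threshold for p in probs]
    tp = sum(1 for y, hit in zip(labels, predicted) if hit and y == 1)
    fp = sum(1 for y, hit in zip(labels, predicted) if hit and y == 0)
    fn = sum(1 for y, hit in zip(labels, predicted) if not hit and y == 1)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (made up for illustration): 1 = positive class, 0 = negative class.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# One (threshold, precision, recall) point per threshold.
curve = [(t, *precision_recall_at(labels, probs, t)) for t in (0.25, 0.5, 0.75)]
for t, p, r in curve:
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

Sweeping the threshold over a finer grid and plotting recall against precision would give a curve of the kind the notebooks report. The actual notebooks may compute it differently.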
Sample analysis data can be found at williamsdoug/GitAnalyticsDatasets
- Various Dataset Sizes
- See the IPython notebook OpenStack_Sample_Data.ipynb for examples of each record format.
Analysis Notebooks for Various OpenStack Projects:
Performance of individual solvers:
- AdaBoost
- DecisionTree
- ExtraTrees
- GradientTreeBoosting
- Logistic Regression
- Naive Bayes - Gaussian
- Random Forest
- Stochastic Gradient Descent
- Support Vector Machines
- Neural Networks using Theano
Others, may be a bit rough:
- Composite learner using boosting
- Neural Networks using Multi-Layer Perceptron: First Attempt, Second Attempt, and Various topologies using Theano
Python source code located in: ./dev
Configuration file located at: ./dev/git_analysis_config.py
Dataset build script: Build_All_Datasets.ipynb
3/12/2015: Major update
5/28/2015: Clean up notebooks; add new Python-aware diff routine, language-specific features, Theano-based NN
6/10/2015: Add notebook with prediction probability curves
6/19/2015: Add Precision/Recall curves