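I put this repository together purely out of laziness, as a collection for my own convenience; the growing number of stars came as a bit of a surprise. Our group (homepage) is doing research in this area, and code for related papers will continue to be released under my account fangvv. I hope we can all exchange ideas and learn from each other.
Please note that I only collect these links from their original sites for research purposes. You are welcome to join us in discussing interesting ideas on efficient DNN training/inference.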
https://zhuanlan.zhihu.com/p/58705979
http://blog.csdn.net/wspba/article/details/75671573
https://www.ctolib.com/ZhishengWang-Embedded-Neural-Network.html
https://blog.csdn.net/touch_dream/article/details/78441332
https://zhuanlan.zhihu.com/p/28439056
https://blog.csdn.net/QcloudCommunity/article/details/77719498
https://www.cnblogs.com/zhonghuasong/p/7493475.html
https://blog.csdn.net/jackytintin/article/details/53445280
https://zhuanlan.zhihu.com/p/27747628
https://blog.csdn.net/shuzfan/article/category/6271575
https://blog.csdn.net/cookie_234
https://www.jianshu.com/u/f5c90c3856bb
https://github.com/sun254/awesome-model-compression-and-acceleration
- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Dynamic Capacity Networks
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- Xception: Deep Learning with Depthwise Separable Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Residual Attention Network for Image Classification
- SEP-Nets: Small and Effective Pattern Networks
- Deep Networks with Stochastic Depth
- Learning Infinite Layer Networks Without the Kernel Trick
- Coordinating Filters for Faster Deep Neural Networks
- ResBinNet: Residual Binary Neural Network
- Squeezedet: Unified, small, low power fully convolutional neural networks
- Efficient Sparse-Winograd Convolutional Neural Networks
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- Dark knowledge
- FitNets: Hints for Thin Deep Nets
- Net2net: Accelerating learning via knowledge transfer
- Distilling the Knowledge in a Neural Network
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- Sequence-Level Knowledge Distillation
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
- Local Binary Convolutional Neural Networks
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Quantize weights and activations in Recurrent Neural Networks
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- Quantized Convolutional Neural Networks for Mobile Devices
- Compressing Deep Convolutional Networks using Vector Quantization
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- Loss-aware Binarization of Deep Networks
- Towards the Limit of Network Quantization
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Trained Ternary Quantization
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Pruning Filters for Efficient ConvNets
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Soft Weight-Sharing for Neural Network Compression
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Learning both Weights and Connections for Efficient Neural Networks
- Dynamic Network Surgery for Efficient DNNs
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Convolutional neural networks with low-rank regularization
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Speeding up convolutional neural networks with low rank expansions
https://github.com/memoiry/Awesome-model-compression-and-acceleration
Some papers I collected and consider well worth reading, and which I am about to read myself. Please raise a PR or issue if you have any suggestions regarding the list. Thank you.
- A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
- MobilenetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [arXiv '18, Google]
- NasNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv '17, Google]
- DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI'18, Samsung]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv '17, Megvii]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv '17, Google]
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv '17]
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video [arXiv '17]
- Shift-based Primitives for Efficient Convolutional Neural Networks [WACV'18]
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
- Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
- Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
- Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
- Loss-aware Binarization of Deep Networks [ICLR'17]
- Towards the Limit of Network Quantization [ICLR'17]
- Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
- Training and Inference with Integers in Deep Neural Networks [ICLR'18]
- Deep Learning with Limited Numerical Precision [ICML'15]
- Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
- Pruning Filters for Efficient ConvNets [ICLR'17]
- Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
- Soft Weight-Sharing for Neural Network Compression [ICLR'17]
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
- Dynamic Network Surgery for Efficient DNNs [NIPS'16]
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
- To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Learning Structured Sparsity in Deep Neural Networks
- Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Channel pruning for accelerating very deep neural networks [ICCV'17]
- Amc: Automl for model compression and acceleration on mobile devices [ECCV'18]
- RePr: Improved Training of Convolutional Filters [arXiv'18]
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
- Accelerating Very Deep Convolutional Networks for Classification and Detection (Extended version of above one)
- Convolutional neural networks with low-rank regularization [arXiv'15]
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
- High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
- Speeding up convolutional neural networks with low rank expansions
- Dark knowledge
- FitNets: Hints for Thin Deep Nets
- Net2net: Accelerating learning via knowledge transfer
- Distilling the Knowledge in a Neural Network
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- Sequence-Level Knowledge Distillation
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
- DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys '17]
- DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys '17]
- MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL '17]
- DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys '16]
- DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN '16]
- EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA '16]
- MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys '16]
- DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE '16]
- Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys ’16]
- An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App ’15]
- CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM '16]
- fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS '17]
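- Eliminate redundant computation
- Unroll loops
- Use SIMD instructions
- OpenMP
- Convert to fixed-point arithmetic
- Avoid non-contiguous memory reads and writes
- An Overview of Lightweight Convolutional Neural Networks: SqueezeNet, MobileNet, ShuffleNet, Xception
- An Introduction to different Types of Convolutions in Deep Learning
- A Roundup of the Many Varieties of Convolution in CNNs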
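
As a rough illustration of the fixed-point conversion (定点化) tip in the optimization list above, here is a minimal sketch in plain Python. It assumes symmetric linear quantization to the int8 range with a single shared scale per vector, which is only one of many schemes covered by the quantization papers in this collection. The function names `quantize_int8` and `int_dot` are hypothetical, chosen for this example only.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to the int8 range [-127, 127].

    Assumption for this sketch: one shared scale per vector.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into the representable range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int_dot(qa, scale_a, qb, scale_b):
    """Dot product accumulated in integers, rescaled to float once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))
    return acc * scale_a * scale_b


weights = [0.5, -1.0, 0.25, 0.75]
inputs = [1.0, 2.0, -0.5, 0.0]

qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(inputs)

approx = int_dot(qw, sw, qx, sx)
exact = sum(w * x for w, x in zip(weights, inputs))
print("exact:", exact, "fixed-point approximation:", approx)
```

The multiply-accumulate loop touches only small integers, and the two float scales are applied once per dot product. On real hardware the same idea is usually combined with the SIMD and memory-layout tips from the list, which this sketch does not attempt to show.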
https://github.com/chester256/Model-Compression-Papers
Papers for neural network compression and acceleration. Partly based on link.
- Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
- A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
- Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
- Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
- Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
- Loss-aware Binarization of Deep Networks [ICLR'17]
- Towards the Limit of Network Quantization [ICLR'17]
- Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
- Training and Inference with Integers in Deep Neural Networks [ICLR'18]
- Deep Learning with Limited Numerical Precision [ICML'15]
- Model compression via distillation and quantization [ICLR '18]
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy [ICLR '18]
- On the Universal Approximability of Quantized ReLU Neural Networks [arXiv '18]
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference [CVPR '18]
- Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
- Pruning Filters for Efficient ConvNets [ICLR'17]
- Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
- Soft Weight-Sharing for Neural Network Compression [ICLR'17]
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
- Dynamic Network Surgery for Efficient DNNs [NIPS'16]
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
- To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
- Data-Driven Sparse Structure Selection for Deep Neural Networks [arXiv '17]
- Learning Structured Sparsity in Deep Neural Networks [NIPS '16]
- Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism [ISCA '17]
- Channel Pruning for Accelerating Very Deep Neural Networks [ICCV '17]
- Learning Efficient Convolutional Networks through Network Slimming [ICCV '17]
- NISP: Pruning Networks using Neuron Importance Score Propagation [CVPR '18]
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers [ICLR '18]
- MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks [arXiv '17]
- Efficient Sparse-Winograd Convolutional Neural Networks [ICLR '18]
- “Learning-Compression” Algorithms for Neural Net Pruning [CVPR '18]
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 [NIPS '16]
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks [ECCV '16]
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration [CVPR '17]
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
- Accelerating Very Deep Convolutional Networks for Classification and Detection (Extended version of above one)
- Convolutional neural networks with low-rank regularization [arXiv'15]
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
- High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
- Speeding up convolutional neural networks with low rank expansions
- Coordinating Filters for Faster Deep Neural Networks [ICCV '17]
- Dark knowledge
- FitNets: Hints for Thin Deep Nets [ICLR '15]
- Net2net: Accelerating learning via knowledge transfer [ICLR '16]
- Distilling the Knowledge in a Neural Network [NIPS '15]
- MobileID: Face Model Compression by Distilling Knowledge from Neurons [AAAI '16]
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer [arXiv '17]
- Deep Model Compression: Distilling Knowledge from Noisy Teachers [arXiv '16]
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer [ICLR '17]
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer [arXiv '17]
- Learning Efficient Object Detection Models with Knowledge Distillation [NIPS '17]
- Data-Free Knowledge Distillation For Deep Neural Networks [NIPS '17]
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning [CVPR '17]
- Moonshine: Distilling with Cheap Convolutions [arXiv '17]
- Model compression via distillation and quantization [ICLR '18]
- Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy [ICLR '18]
- Beyond Filters: Compact Feature Map for Portable Deep Model [ICML '17]
- SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization [ICML '17]
https://github.com/ZhishengWang/Embedded-Neural-Network
- This is a collection of papers on reducing model size or building ASIC/FPGA accelerators for machine learning, especially deep neural network applications. (Inspired by Neural-Networks-on-Silicon)
- Tutorials:
- Our Contributions
- Network Compression
- Parameter Sharing
- Teacher-Student Mechanism (Distilling)
- Fixed-precision training and storage
- Sparsity regularizers & Pruning
- Tensor Decomposition
- Conditional (Adaptive) Computing
- Hardware Accelerator
- Benchmark and Platform Analysis
- Recurrent Neural Networks
- Conference Papers
- TODO
This field is changing rapidly, so the entries below may be somewhat out of date.
- structured matrices
- Structured Convolution Matrices for Energy-efficient Deep learning. (IBM Research–Almaden)
- Structured Transforms for Small-Footprint Deep Learning. (Google Inc)
- An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank.
- Hashing
- Functional Hashing for Compressing Neural Networks. (Baidu Inc)
- Compressing Neural Networks with the Hashing Trick. (Washington University + NVIDIA)
- Learning compact recurrent neural networks. (University of Southern California + Google)
- Distilling the Knowledge in a Neural Network. (Google Inc)
- Sequence-Level Knowledge Distillation. (Harvard University)
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer. (TuSimple)
- Binary/Ternary Neural Networks
- XNOR-Net, Ternary Weight Networks (TWNs), Binary-net and their variants.
- Deep neural networks are robust to weight binarization and other non-linear distortions. (IBM Research–Almaden)
- Recurrent Neural Networks With Limited Numerical Precision. (ETH Zurich + Montréal@Yoshua Bengio)
- Neural Networks with Few Multiplications. (Montréal@Yoshua Bengio)
- 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs. (Tsinghua University + Microsoft)
- Towards the Limit of Network Quantization. (Samsung US R&D Center)
- Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights. (Intel Labs China)
- Loss-aware Binarization of Deep Networks. (Hong Kong University of Science and Technology)
- Trained Ternary Quantization. (Tsinghua University + Stanford University + NVIDIA)
- Learning both Weights and Connections for Efficient Neural Networks. (SongHan, Stanford University)
- Deep Compression, EIE. (SongHan, Stanford University)
- Dynamic Network Surgery for Efficient DNNs. (Intel)
- Compression of Neural Machine Translation Models via Pruning. (Stanford University)
- Accelerating Deep Convolutional Networks using low-precision and sparsity. (Intel)
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning. (Intel)
- Exploring Sparsity in Recurrent Neural Networks. (Baidu Research)
- Pruning Convolutional Neural Networks for Resource Efficient Inference. (NVIDIA)
- Pruning Filters for Efficient ConvNets. (University of Maryland + NEC Labs America)
- Soft Weight-Sharing for Neural Network Compression. (University of Amsterdam, reddit discussion)
- Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks. (McGill University)
- Training Compressed Fully-Connected Networks with a Density-Diversity Penalty. (University of Washington)
- Bayesian Compression
- Bayesian Sparsification of Recurrent Neural Networks
- Bayesian Compression for Deep Learning
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications. (Samsung, etc)
- Learning compact recurrent neural networks. (University of Southern California + Google)
- Tensorizing Neural Networks. (Skolkovo Institute of Science and Technology, etc)
- Ultimate tensorization: compressing convolutional and FC layers alike. (Moscow State University, etc)
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks. (@CVPR2015)
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. (New York University, etc.)
- Convolutional neural networks with low-rank regularization. (Princeton University, etc.)
- Learning with Tensors: Why Now and How? (Tensor-Learn Workshop @ NIPS'16)
- Adaptive Computation Time for Recurrent Neural Networks. (Google DeepMind@Alex Graves)
- Variable Computation in Recurrent Neural Networks. (New York University + Facebook AI Research)
- Spatially Adaptive Computation Time for Residual Networks. (github link, Google, etc.)
- Hierarchical Multiscale Recurrent Neural Networks. (Montréal)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. (Google Brain, etc.)
- Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
- Dynamic Deep Neural Networks: Optimizing Accuracy-Efficiency Trade-offs by Selective Execution. (University of Michigan)
- Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation. (@Yoshua Bengio)
- Multi-Scale Dense Convolutional Networks for Efficient Prediction. (Cornell University, etc)
- Fathom: Reference Workloads for Modern Deep Learning Methods. (Harvard University)
- DeepBench: Open-Source Tool for benchmarking DL operations. (svail.github.io-Baidu)
- BENCHIP: Benchmarking Intelligence Processors.
- DAWNBench: An End-to-End Deep Learning Benchmark and Competition. (Stanford)
- MLPerf: A broad ML benchmark suite for measuring performance of ML software frameworks, ML hardware accelerators, and ML cloud platforms.
- FPGA-based Low-power Speech Recognition with Recurrent Neural Networks. (Seoul National University)
- Accelerating Recurrent Neural Networks in Analytics Servers: Comparison of FPGA, CPU, GPU, and ASIC. (Intel)
- ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA. (FPGA 2017, Best Paper Award)
- DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for GeneralPurpose Deep Neural Networks. (KAIST, ISSCC 2017)
- Hardware Architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition. (University of Kaiserslautern, etc)
- Efficient Hardware Mapping of Long Short-Term Memory Neural Networks for Automatic Speech Recognition. (Master Thesis@Georgios N. Evangelopoulos)
- Hardware Accelerators for Recurrent Neural Networks on FPGA. (Purdue University, ISCAS 2017)
- Accelerating Recurrent Neural Networks: A Memory Efficient Approach. (Nanjing University)
- A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications.
- An Energy-Efficient Reconfigurable Architecture for RNNs Using Dynamically Adaptive Approximate Computing.
- A Systolically Scalable Accelerator for Near-Sensor Recurrent Neural Network Inference.
- A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications
- E-PUR: An Energy-Efficient Processing Unit for Recurrent Neural Networks
- C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs (FPGA 2018, Peking Univ, Syracuse Univ, CUNY)
- DeltaRNN: A Power-efficient Recurrent Neural Network Accelerator. (FPGA 2018, ETHZ, BenevolentAI)
- Towards Memory Friendly Long-Short Term Memory Networks (LSTMs) on Mobile GPUs (MICRO 2018)
- E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs (HPCA 2019)
- Please refer to Neural-Networks-on-Silicon
- Dynamic Network Surgery for Efficient DNNs. (Intel Labs China)
- Memory-Efficient Backpropagation Through Time. (Google DeepMind)
- PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions. (Moscow State University, etc.)
- Learning Structured Sparsity in Deep Neural Networks. (University of Pittsburgh)
- LightRNN: Memory and Computation-Efficient Recurrent Neural Networks. (Nanjing University + Microsoft Research)
- LogNet: Energy-Efficient Neural Networks Using Logarithmic Computation. (Stanford University)
- Extended Low Rank Plus Diagonal Adaptation for Deep and Recurrent Neural Networks. (Microsoft)
- Fixed-Point Optimization of Deep Neural Networks with Adaptive Step Size Retraining. (Seoul National University)
- Implementation of Efficient, Low Power Deep Neural Networks on Next-Generation Intel Client Platforms (Demos). (Intel)
- Knowledge Distillation for Small-Footprint Highway Networks. (TTI-Chicago, etc)
- Automatic Node Selection for Deep Neural Networks Using Group Lasso Regularization. (Doshisha University, etc)
- Accelerating Deep Convolutional Networks Using Low-Precision and Sparsity. (Intel Labs)
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. (MIT)
- Network Sketching: Exploiting Binary Structure in Deep CNNs. (Intel Labs China + Tsinghua University)
- Spatially Adaptive Computation Time for Residual Networks. (Google, etc)
- A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation. (University of Pittsburgh, etc)
- Deep Tensor Convolution on Multicores. (MIT)
- Beyond Filters: Compact Feature Map for Portable Deep Model. (Peking University + University of Sydney)
- Combined Group and Exclusive Sparsity for Deep Neural Networks. (UNIST)
- Delta Networks for Optimized Recurrent Network Computation. (Institute of Neuroinformatics, etc)
- MEC: Memory-efficient Convolution for Deep Neural Network. (IBM Research)
- Deciding How to Decide: Dynamic Routing in Artificial Neural Networks. (California Institute of Technology)
- Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning. (ETH Zurich, etc)
- Analytical Guarantees on Numerical Precision of Deep Neural Networks. (University of Illinois at Urbana-Champaign)
- Variational Dropout Sparsifies Deep Neural Networks. (Skoltech, etc)
- Adaptive Neural Networks for Fast Test-Time Prediction. (Boston University, etc)
- Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank. (The City University of New York, etc)
- Channel Pruning for Accelerating Very Deep Neural Networks. (Xi’an Jiaotong University + Megvii Inc.)
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression. (Nanjing University, etc)
- Learning Efficient Convolutional Networks through Network Slimming. (Intel Labs China, etc)
- Performance Guaranteed Network Acceleration via High-Order Residual Quantization. (Shanghai Jiao Tong University + Peking University)
- Coordinating Filters for Faster Deep Neural Networks. (University of Pittsburgh + Duke University, etc, github link)
- Towards Accurate Binary Convolutional Neural Network. (DJI)
- Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations. (ETH Zurich)
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. (Duke University, etc, github link)
- Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks. (Intel)
- Bayesian Compression for Deep Learning. (University of Amsterdam, etc)
- Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon. (Nanyang Technological Univ)
- Training Quantized Nets: A Deeper Understanding. (University of Maryland)
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise. (Yandex, etc)
- Runtime Neural Pruning. (Tsinghua University)
- The Reversible Residual Network: Backpropagation Without Storing Activations. (University of Toronto, github link)
- Compression-aware Training of Deep Networks. (Toyota Research Institute + EPFL)
- Oral
- Training and Inference with Integers in Deep Neural Networks. (Tsinghua University)
- Poster
- Learning Sparse NNs Through L0 Regularization
- Learning Intrinsic Sparse Structures within Long Short-Term Memory
- Variational Network Quantization
- Alternating Multi-BIT Quantization for Recurrent Neural Networks
- Mixed Precision Training
- Multi-Scale Dense Networks for Resource Efficient Image Classification
- Efficient Sparse-Winograd CNNs
- Compressing Word Embedding via Deep Compositional Code Learning
- Mixed Precision Training of Convolutional Neural Networks using Integer Operations
- Adaptive Quantization of Neural Networks
- Espresso: Efficient Forward Propagation for Binary Deep Neural Networks
- WRPN: Wide Reduced-Precision Networks
- Deep Rewiring: Training very sparse deep networks
- Loss-aware Weight Quantization of Deep Network
- Learning to share: simultaneous parameter tying and sparsification in deep learning
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
- Large scale distributed neural network training through online distillation
- Learning Discrete Weights Using the Local Reparameterization Trick
- Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
- Training wide residual networks for deployment using a single bit for each weight
- The High-Dimensional Geometry of Binary Neural Networks
- Workshop
- To Prune or Not to Prune: Exploring the Efficacy of Pruning for Model Compression
- Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
- BlockDrop: Dynamic Inference Paths in Residual Networks
- SYQ: Learning Symmetric Quantization for Efficient Deep Neural Networks
- Two-Step Quantization for Low-Bit Neural Networks
- Towards Effective Low-Bitwidth Convolutional Neural Networks
- Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks
- CLIP-Q: Deep Network Compression Learning by In-Parallel Pruning-Quantization
- “Learning-Compression” Algorithms for Neural Net Pruning
- Wide Compression: Tensor Ring Nets
- NestedNet: Learning Nested Sparse Structures in Deep Neural Networks
- Interleaved Structured Sparse Convolutional Neural Networks
- NISP: Pruning Networks Using Neuron Importance Score Propagation
- Learning Compact Recurrent Neural Networks With Block-Term Tensor Decomposition
- HydraNets: Specialized Dynamic Architectures for Efficient Inference
- Learning Time/Memory-Efficient Deep Architectures With Budgeted Super Networks
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers
- Learning Compression from Limited Unlabeled Data
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices
- Training Binary Weight Networks via Semi-Binary Decomposition
- Clustering Convolutional Kernels to Compress Deep Neural Networks
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Coreset-Based Neural Network Compression
- Convolutional Networks with Adaptive Inference Graphs
- Value-aware Quantization for Training and Inference of Neural Networks
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- Deep Expander Networks: Efficient Deep Networks from Graph Theory
- Extreme Network Compression via Filter Group Approximation
- Constraint-Aware Deep Neural Network Compression
- Compressing Neural Networks using the Variational Information Bottleneck
- DCFNet: Deep Neural Network with Decomposed Convolutional Filters
- Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions
- Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
- High Performance Zero-Memory Overhead Direct Convolutions
- Kronecker Recurrent Units
- Learning Compact Neural Networks with Regularization
- StrassenNets: Deep Learning with a Multiplication Budget
- Weightless: Lossy Weight Encoding for Deep Neural Network Compression
- WSNet: Compact and Efficient Networks Through Weight Sampling
- workshops
- Scalable Methods for 8-bit Training of Neural Networks
- Frequency-Domain Dynamic Pruning for Convolutional Neural Networks
- Sparsified SGD with Memory
- Training Deep Neural Networks with 8-bit Floating Point Numbers
- KDGAN: Knowledge Distillation with Generative Adversarial Networks
- Knowledge Distillation by On-the-Fly Native Ensemble
- Multiple Instance Learning for Efficient Sequential Data Classification on Resource-Constrained Devices
- Moonshine: Distilling with Cheap Convolutions
- HitNet: Hybrid Ternary Recurrent Neural Network
- FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
- Training DNNs with Hybrid Block Floating Point
- Reversible Recurrent Neural Networks
- Norm Matters: Efficient and Accurate Normalization Schemes in Deep Networks
- Synaptic Strength for Convolutional Neural Network
- TETRIS: TilE-matching the TRemendous Irregular Sparsity
- Learning Sparse Neural Networks via Sensitivity-Driven Regularization
- Pelee: A Real-Time Object Detection System on Mobile Devices
- Learning Versatile Filters for Efficient Convolutional Neural Networks
- Multi-Task Zipping via Layer-wise Neuron Sharing
- A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
- GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
- ATOMO: Communication-Efficient Learning via Atomic Sparsification
- Gradient Sparsification for Communication-Efficient Distributed Optimization
- Poster:
- SNIP: Single-Shot Network Pruning Based on Connection Sensitivity
- Rethinking the Value of Network Pruning
- Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach
- Dynamic Channel Pruning: Feature Boosting and Suppression
- Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking
- Slimmable Neural Networks
- RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks
- Dynamic Sparse Graph for Efficient Deep Learning
- Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
- Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds
- Learning Recurrent Binary/Ternary Weights
- Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
- Relaxed Quantization for Discretized Neural Networks
- Integer Networks for Data Compression with Latent-Variable Models
- Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
- A Systematic Study of Binary Neural Networks' Optimisation
- Analysis of Quantized Models
- Oral:
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning
- T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor
- Fully Learnable Group Convolution for Acceleration of Deep Neural Networks
- others to be added
https://github.com/cedrickchee/awesome-ml-model-compression
An awesome style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools and more. PRs are welcome!
- A Survey of Model Compression and Acceleration for Deep Neural Networks
- Model compression as constrained optimization, with application to neural nets. Part I: general framework
- Model compression as constrained optimization, with application to neural nets. Part II: quantization
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
- MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- Xception: Deep Learning with Depthwise Separable Convolutions
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video
- AddressNet: Shift-based Primitives for Efficient Convolutional Neural Networks
- ResNeXt: Aggregated Residual Transformations for Deep Neural Networks
- ResBinNet: Residual Binary Neural Network
- Residual Attention Network for Image Classification
- SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
- SEP-Nets: Small and Effective Pattern Networks
- Dynamic Capacity Networks
- Learning Infinite Layer Networks Without the Kernel Trick
- Efficient Sparse-Winograd Convolutional Neural Networks
- DSD: Dense-Sparse-Dense Training for Deep Neural Networks
- Coordinating Filters for Faster Deep Neural Networks
- Deep Networks with Stochastic Depth
- Quantized Convolutional Neural Networks for Mobile Devices
- Towards the Limit of Network Quantization
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- Compressing Deep Convolutional Networks using Vector Quantization
- Trained Ternary Quantization
- The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning
- ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
- Deep Learning with Low Precision by Half-wave Gaussian Quantization
- Loss-aware Binarization of Deep Networks
- Quantize weights and activations in Recurrent Neural Networks
- Fixed-Point Performance Analysis of Recurrent Neural Networks
- Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- Local Binary Convolutional Neural Networks
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Faster CNNs with Direct Sparse Convolutions and Guided Pruning
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Pruning Convolutional Neural Networks for Resource Efficient Inference
- Pruning Filters for Efficient ConvNets
- Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning
- Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
- Fine-Pruning: Joint Fine-Tuning and Compression of a Convolutional Network with Bayesian Optimization
- Learning both Weights and Connections for Efficient Neural Networks
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
- Data-Driven Sparse Structure Selection for Deep Neural Networks
- Soft Weight-Sharing for Neural Network Compression
- Dynamic Network Surgery for Efficient DNNs
- Channel pruning for accelerating very deep neural networks
- AMC: AutoML for model compression and acceleration on mobile devices
- ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
- Distilling the Knowledge in a Neural Network
- Deep Model Compression: Distilling Knowledge from Noisy Teachers
- Learning Efficient Object Detection Models with Knowledge Distillation
- Data-Free Knowledge Distillation For Deep Neural Networks
- Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
- Moonshine: Distilling with Cheap Convolutions
- Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
- Sequence-Level Knowledge Distillation
- Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
- Dark knowledge
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
- FitNets: Hints for Thin Deep Nets
- MobileID: Face Model Compression by Distilling Knowledge from Neurons
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
- Speeding up convolutional neural networks with low rank expansions
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications
- Convolutional neural networks with low-rank regularization
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
- Accelerating Very Deep Convolutional Networks for Classification and Detection
- Efficient and Accurate Approximations of Nonlinear Convolutional Networks
Content published on the Web.
- Why the Future of Machine Learning is Tiny
- Deep Learning Model Compression for Image Analysis: Methods and Architectures
- TensorFlow Model Optimization Toolkit. Accompanied blog post, TensorFlow Model Optimization Toolkit — Pruning API
To the extent possible under law, Cedric Chee has waived all copyright and related or neighboring rights to this work.
https://github.com/jnjaby/Model-Compression-Acceleration
- Product Quantization for Nearest Neighbor Search, TPAMI, 2011 [paper]
- Compressing Deep Convolutional Networks using Vector Quantization, ICLR, 2015 [paper]
- Deep Learning with Limited Numerical Precision, ICML, 2015 [paper]
- Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks, ArXiv, 2016 [paper]
- Fixed Point Quantization of Deep Convolutional Networks, ICML, 2016 [paper]
- Quantized Convolutional Neural Networks for Mobile Devices, CVPR, 2016 [paper]
- Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights, ICLR, 2017 [paper]
- BinaryConnect: Training Deep Neural Networks with binary weights during propagations, NIPS, 2015 [paper]
- BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1, ArXiv, 2016 [paper]
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks, ECCV, 2016 [paper]
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, ArXiv, 2016 [paper]
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, ArXiv, 2016 [paper]
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR, 2016 [paper]
- Optimal Brain Damage, NIPS, 1990 [paper]
- Learning both Weights and Connections for Efficient Neural Networks, NIPS, 2015 [paper]
- Pruning Filters for Efficient ConvNets, ICLR, 2017 [paper]
- Sparsifying Neural Network Connections for Face Recognition, CVPR, 2016 [paper]
- Learning Structured Sparsity in Deep Neural Networks, NIPS, 2016 [paper]
- Pruning Convolutional Neural Networks for Resource Efficient Inference, ICLR, 2017 [paper]
- Distilling the Knowledge in a Neural Network, ArXiv, 2015 [paper]
- FitNets: Hints for Thin Deep Nets, ICLR, 2015 [paper]
- Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR, 2017 [paper]
- Face Model Compression by Distilling Knowledge from Neurons, AAAI, 2016 [paper]
- In Teacher We Trust: Learning Compressed Models for Pedestrian Detection, ArXiv, 2016 [paper]
- Like What You Like: Knowledge Distill via Neuron Selectivity Transfer, ArXiv, 2017 [paper]
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size, ArXiv, 2016 [paper]
- Convolutional Neural Networks at Constrained Time Cost, CVPR, 2015 [paper]
- Flattened Convolutional Neural Networks for Feedforward Acceleration, ArXiv, 2014 [paper]
- Going deeper with convolutions, CVPR, 2015 [paper]
- Rethinking the Inception Architecture for Computer Vision, CVPR, 2016 [paper]
- Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial "Bottleneck" Structure, ArXiv, 2016 [paper]
- Xception: Deep Learning with Depthwise Separable Convolutions, ArXiv, 2017 [paper]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, ArXiv, 2017 [paper]
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, ArXiv, 2017 [paper]
- Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation, NIPS, 2014 [paper]
- Speeding up Convolutional Neural Networks with Low Rank Expansions, BMVC, 2014 [paper]
- Deep Fried Convnets, ICCV, 2015 [paper]
- Accelerating Very Deep Convolutional Networks for Classification and Detection, TPAMI, 2016 [paper]
- Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition, ICLR, 2015 [paper]
https://github.com/mapleam/model-compression-and-acceleration-4-DNN (go in and take a look)