DNN Training, from theory to practice

Overview

This repository complements the deep learning training lesson given at Mines ParisTech on the 11th of March 2022 as part of the Large Scale Machine Learning class.

The slides of the class are available here.

Requirements

To get started, clone the repository and set up a new virtual environment.

git clone https://github.com/adefossez/dnn_theo_practice
cd dnn_theo_practice
python3 -m venv env
source env/bin/activate
python3 -m pip install -r requirements.txt

Note: it can be safer to install PyTorch through a conda environment to make sure the proper versions of all CUDA-related libraries are installed and used. We use pip here for simplicity.

Basic training pipeline

To get started, you can run

python -m basic.train

You can tweak some hyperparameters:

python -m basic.train --lr 0.1 --epochs 30 --model mobilenet_v2

This basic pipeline provides all the essential tools for training a neural network:

  • automatic experiment naming,
  • logging and metric dumping,
  • checkpointing with automatic resume.
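To make the list above concrete, here is a minimal sketch of experiment naming and checkpointing with automatic resume. It is not the code from basic/train.py: the helper names, the JSON checkpoint format and the fake training loop are assumptions made for illustration, and it uses only the standard library. In a real PyTorch pipeline, the model and optimizer states would be saved with torch.save and restored with torch.load.

```python
import argparse
import json
import tempfile
from pathlib import Path


def experiment_name(args: argparse.Namespace, defaults: dict) -> str:
    """Build a readable name from the arguments that differ from their defaults."""
    changed = {k: v for k, v in sorted(vars(args).items()) if defaults.get(k) != v}
    return ",".join(f"{k}={v}" for k, v in changed.items()) or "default"


def train(folder: Path, epochs: int) -> list:
    """Run a fake training loop that resumes from `folder` if a checkpoint exists."""
    folder.mkdir(parents=True, exist_ok=True)
    checkpoint = folder / "checkpoint.json"
    history = []
    if checkpoint.exists():
        # Automatic resume: reload the metrics of the epochs already completed.
        history = json.loads(checkpoint.read_text())["history"]
    for epoch in range(len(history), epochs):
        # Placeholder for the real training and validation code.
        history.append({"epoch": epoch, "loss": 1.0 / (epoch + 1)})
        # Write to a temporary file, then rename it, so that an interruption
        # never leaves a half-written checkpoint behind.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps({"history": history}))
        tmp.replace(checkpoint)
    return history


# Hypothetical flags, loosely modelled on the command line shown above.
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--epochs", type=int, default=10)
defaults = vars(parser.parse_args([]))
args = parser.parse_args(["--lr", "0.1", "--epochs", "3"])

name = experiment_name(args, defaults)
root = Path(tempfile.mkdtemp())
first = train(root / name, epochs=2)              # simulate a run stopped after 2 epochs
resumed = train(root / name, epochs=args.epochs)  # resumes at epoch 2
print(name, [entry["epoch"] for entry in resumed])
```

Because the experiment folder is derived from the arguments, relaunching the same command finds the existing checkpoint and continues from it, while changing a hyperparameter starts a fresh experiment.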

Looking at basic/train.py you will see that 90% of the code is not deep learning but pure engineering. Some frameworks like PyTorch Lightning can save you some of this trouble, at the cost of losing control and understanding over what happens. In any case it is good to have an idea of how things work under the hood!

PyTorch-Lightning training pipeline

Inside the pl_hydra folder, I provide the same pipeline, but using PyTorch-Lightning along with Hydra as an alternative to argparse. Have a look at pl_hydra/train.py to see the differences with the previous implementation.

python -m pl_hydra.train optim.lr=0.1 model=mobilenet_v2
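The overrides in the command above use dotted keys to address nested configuration entries. The sketch below illustrates that idea with plain dictionaries. It is not Hydra's implementation or API: the apply_overrides and parse_value helpers and the default configuration (including the resnet18 and momentum entries) are hypothetical, and Hydra's real override grammar is richer than this.

```python
import copy


def parse_value(raw: str):
    """Interpret a command line value as an int, a float, or a plain string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_overrides(config: dict, overrides: list) -> dict:
    """Return a copy of `config` with `a.b=value` style overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk down the nested dictionaries, creating them if needed.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return result


# Hypothetical defaults, standing in for what would live in a YAML file.
default_config = {
    "model": "resnet18",
    "epochs": 10,
    "optim": {"lr": 0.01, "momentum": 0.9},
}

config = apply_overrides(default_config, ["optim.lr=0.1", "model=mobilenet_v2"])
print(config)
```

Only the overridden entries change; everything else keeps its default value, which is what makes this style convenient for large configurations.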

Using existing frameworks:

This is a good time to introduce a few frameworks you might want to use for your projects.

Hydra

Hydra handles logging and configuration parsing (based on YAML files, which is a bit nicer than argparse, especially for large projects), and also supports some grid search scheduling with a dedicated language. It also supports meta-optimizers like Nevergrad (see below).

Nevergrad

Nevergrad is a framework for gradient-free optimization. It can be used to automatically tune your model or optimization hyperparameters with smart random search.
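To give an idea of what gradient-free hyperparameter tuning looks like, here is a sketch of the simplest baseline, plain random search. It does not use Nevergrad, whose optimizers are smarter than this. The validation_loss function is a toy stand-in for an actual training run, and the search ranges are arbitrary assumptions.

```python
import math
import random


def validation_loss(lr: float, momentum: float) -> float:
    """Toy stand-in for a training run, minimal at lr=0.1 and momentum=0.9."""
    return (math.log10(lr) + 1.0) ** 2 + (momentum - 0.9) ** 2


def random_search(budget: int, seed: int = 0):
    """Sample `budget` configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    for _ in range(budget):
        params = {
            "lr": 10 ** rng.uniform(-4, 0),      # log-uniform in [1e-4, 1]
            "momentum": rng.uniform(0.0, 0.99),  # uniform in [0, 0.99]
        }
        # Only the loss value is used: no gradient with respect to the
        # hyperparameters is ever needed.
        loss = validation_loss(**params)
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params


best_loss, best_params = random_search(budget=200)
print(best_loss, best_params)
```

Sampling the learning rate on a log scale is a common choice, since its useful values span several orders of magnitude.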

PyTorch-Lightning

PyTorch Lightning takes care of logging, distributed training, checkpointing and many other boilerplate parts of a deep learning research project. It is powerful but also quite complex, and you will lose some control over the training pipeline.

Dora

Dora is an experiment management framework:

  • Grid searches are expressed as pure Python.
  • Each experiment is automatically assigned a signature based on its arguments.
  • Experiments defined in grid files are kept in sync with those running on the cluster.
  • Basic terminal-based reporting of job states, metrics, etc.

Dora allows you to scale up to hundreds of experiments without losing your sanity.
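The first two ideas in the list above, grids written as ordinary Python and signatures derived from the arguments, can be sketched with the standard library alone. This is not Dora's API or its actual hashing scheme: the signature and grid helpers, and the resnet18 model name, are hypothetical.

```python
import hashlib
import itertools
import json


def signature(args: dict) -> str:
    """Short, stable identifier derived from the experiment arguments."""
    # Sorting the keys makes the signature independent of argument order.
    payload = json.dumps(args, sort_keys=True).encode()
    return hashlib.sha1(payload).hexdigest()[:8]


def grid(**axes):
    """Yield one argument dict per point in the Cartesian product of `axes`."""
    names = list(axes)
    for values in itertools.product(*axes.values()):
        yield dict(zip(names, values))


# A grid search expressed as plain Python: 2 learning rates x 2 models.
experiments = {
    signature(xp): xp
    for xp in grid(lr=[0.1, 0.01], model=["resnet18", "mobilenet_v2"])
}
for sig, xp in experiments.items():
    print(sig, xp)
```

Since the signature depends only on the arguments, the same experiment always maps to the same identifier, which makes it possible to tell which points of a grid have already been launched.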

Plotting and monitoring utilities

While it is always good to have basic metric reporting inside logs, it can be more convenient to track experimental progress through a web browser. TensorBoard, initially developed for TensorFlow, provides just that. A fully hosted alternative is Wandb. Finally, HiPlot is a lightweight package to easily make sense of the impact of hyperparameters on the metrics of interest.

Unix tools

It is a good idea to master the standard Unix/Linux tools! For large scale machine learning, you will often have to run experiments on a remote cluster, with only SSH access. tmux is a must-have, as is knowing at least one terminal-based text editor (nano is the simplest, emacs and vim are more complex but quite powerful). Take some time to learn about tuning your bashrc, setting up aliases for frequently used commands, etc.

You will probably need tools like grep, less, find or ack. I personally really enjoy fd, an alternative to find with a more intuitive interface. Similarly, ag is a nice way to quickly look through a codebase in the terminal. If you need to go through a lot of logs, you will enjoy ripgrep.

License

The code in this repository is released into the public domain. You can freely reuse any part of it and you don't even need to say where you found it! See the LICENSE for more information.

The slides are released under Creative Commons CC-BY-NC.

Owner
Alexandre Défossez