Github Link: https://github.com/dingguanglei/jdit

Jdit is a research-oriented framework based on PyTorch that lets you focus on your ideas. You don't need to write long, boring boilerplate code just to run a deep learning project and verify your ideas.

You only need to implement your ideas; the training framework, multi-GPU support, checkpointing, process visualization, performance evaluation and so on are handled for you.

If you have any problems or find bugs, you can contact the author.

E-mail: dingguanglei.bupt@qq.com

Parallel Trainer

This part assumes that you already know how to build a complete jdit trainer object and train a model with it. If not, please read the guide and build a jdit trainer first.

A single trainer handles a single task. If you want to run the same task with different hyper-parameters, this quickly becomes tedious: you have to change the hyper-parameters manually and run the script again and again.
The parallel trainer deals with this problem. It supplies a simple class that runs a set of tasks and distributes them across different devices with little effort.
You only need to decide which groups of hyper-parameters you want to test. Then just go to sleep; the results of all these tasks will be waiting for you the next morning.

Usage SupParallelTrainer

There are only two parameters you need to pass to SupParallelTrainer: unfixed_params_list and train_func.

SupParallelTrainer(unfixed_params_list, train_func)
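Together they are used like this (a minimal sketch; unfixed_params_list and train_func are explained below, and tp.train() starts all tasks, as in the full example at the end of this part):

tp = SupParallelTrainer(unfixed_params_list, train_func)
tp.train()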

unfixed_params_list

This contains the hyper-parameters which you want to test.
For example:

 unfixed_params = [
        {'depth': 4, 'lr': 1e-3},
        {'depth': 8, 'lr': 1e-2},
        {'depth': 4, 'lr': 1e-2},
        {'depth': 8, 'lr': 1e-3},
        ]

Obviously, you want to test four groups of hyper-parameters.
Besides these, there is another key you must not forget: 'task_id'. It controls the training order and whether tasks run in parallel or sequentially.

time    'task_id': 1                'task_id': 2
t       {'depth': 4, 'lr': 1e-3}    {'depth': 4, 'lr': 1e-2}
t+1     {'depth': 8, 'lr': 1e-2}
t+2     {'depth': 8, 'lr': 1e-3}

Tasks that share the same task_id (the same column) are executed one by one, sequentially.
Tasks with different task_id values (different columns) are executed in parallel.

Now we change unfixed_params and add a task_id to each group:

 unfixed_params = [
        {'depth': 4, 'lr': 1e-3, 'task_id':1},
        {'depth': 8, 'lr': 1e-2, 'task_id':1},
        {'depth': 4, 'lr': 1e-2, 'task_id':2},
        {'depth': 8, 'lr': 1e-3, 'task_id':1},
        ]
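
To make the scheduling concrete, here is a small sketch (plain Python, not part of jdit) that groups the list above by 'task_id'. Each resulting group is one sequential queue, and the queues run in parallel:

from collections import defaultdict

def group_by_task_id(unfixed_params):
    # One queue per task_id; queues run in parallel, each queue runs its tasks in order.
    queues = defaultdict(list)
    for params in unfixed_params:
        queues[params['task_id']].append(params)
    return dict(queues)

# For the list above this prints {1: [three task dicts], 2: [one task dict]}.
print(group_by_task_id(unfixed_params))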

That's it! But there is still a problem: all of these tasks run on the CPU, because you haven't passed a parameter that controls the training device.
Assume that you have 2 GPUs with ids 0 and 1. Use 'gpu_ids_abs' to control the device ids.
The groups of hyper-parameters become:

 unfixed_params = [
        {'depth': 4, 'lr': 1e-3, 'task_id':1, 'gpu_ids_abs':[]},
        {'depth': 8, 'lr': 1e-2, 'task_id':1, 'gpu_ids_abs':[0]},
        {'depth': 4, 'lr': 1e-2, 'task_id':2, 'gpu_ids_abs':[1]},
        {'depth': 8, 'lr': 1e-3, 'task_id':1, 'gpu_ids_abs':[0]},
        ]

This means the first task runs on the CPU, the second and last tasks run on GPU 0, and the third task runs on GPU 1.
With that, the unfixed_params_list parameter is complete.
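
If you don't want to hard-code the device ids, a small helper like the following (an illustration, not part of jdit) assigns 'gpu_ids_abs' round-robin over the GPUs that torch reports, falling back to the CPU when none are visible:

import torch

def assign_gpus(unfixed_params):
    # Spread tasks over the available GPUs; an empty list means "run on CPU".
    n_gpus = torch.cuda.device_count()
    for i, params in enumerate(unfixed_params):
        params['gpu_ids_abs'] = [i % n_gpus] if n_gpus > 0 else []
    return unfixed_params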

train_func

This is a function whose input is one dict from unfixed_params_list, so it contains the hyper-parameters of one task.
'logdir' is the log directory; it will be generated automatically if you don't pass one.
Use these parameters to build your jdit trainer and return it.

For example:

def build_task_trainer(unfixed_params):
    # Unfixed hyper-parameters, queried from the dict passed in by SupParallelTrainer.
    logdir = unfixed_params['logdir']
    gpu_ids_abs = unfixed_params["gpu_ids_abs"]
    depth = unfixed_params["depth"]
    lr = unfixed_params["lr"]

    # Fixed hyper-parameters, shared by every task.
    batch_size = 32
    opt_name = "RMSprop"
    lr_decay = 0.94
    decay_position = 1
    decay_type = "epoch"
    weight_decay = 2e-5
    momentum = 0
    nepochs = 100
    num_class = 10
    torch.backends.cudnn.benchmark = True
    # Build the dataset, model, optimizer and trainer from these hyper-parameters.
    mnist = FashionMNIST(root="datasets/fashion_data", batch_size=batch_size, num_workers=2)
    net = Model(SimpleModel(depth), gpu_ids_abs=gpu_ids_abs, init_method="kaiming", verbose=False)
    opt = Optimizer(net.parameters(), opt_name, lr_decay, decay_position, decay_type,
                    lr=lr, weight_decay=weight_decay, momentum=momentum)
    Trainer = FashingClassTrainer(logdir, nepochs, gpu_ids_abs, net, opt, mnist, num_class)
    return Trainer

Some hyper-parameters, such as batch_size, are not being tested, so we simply fix them. The hyper-parameters that we do want to test are in unfixed_params, so we query them from that dict.
Finally, return the trainer.
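
Before launching all tasks, you can sanity-check your train_func by calling it directly with a single parameter dict (just an illustration; 'logdir' is normally generated for you, so here we pass one explicitly, and 'log/debug' is an arbitrary path):

trainer = build_task_trainer({'logdir': 'log/debug', 'gpu_ids_abs': [],
                              'depth': 4, 'lr': 1e-3})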

Example

Here is the complete code of an example using the classification trainer.

Quick start:

from jdit.trainer.instances import start_fashingClassPrarallelTrainer
start_fashingClassPrarallelTrainer()

Code:

# coding=utf-8
import torch
import torch.nn as nn
import torch.nn.functional as F
from jdit.trainer.classification import ClassificationTrainer
from jdit.model import Model
from jdit.optimizer import Optimizer
from jdit.dataset import FashionMNIST
from jdit.parallel import SupParallelTrainer


class SimpleModel(nn.Module):
    def __init__(self, depth=64, num_class=10):
        super(SimpleModel, self).__init__()
        self.num_class = num_class
        self.layer1 = nn.Conv2d(1, depth, 3, 1, 1)
        self.layer2 = nn.Conv2d(depth, depth * 2, 4, 2, 1)
        self.layer3 = nn.Conv2d(depth * 2, depth * 4, 4, 2, 1)
        self.layer4 = nn.Conv2d(depth * 4, depth * 8, 4, 2, 1)
        self.layer5 = nn.Conv2d(depth * 8, num_class, 4, 1, 0)

    def forward(self, input):
        out = F.relu(self.layer1(input))
        out = F.relu(self.layer2(out))
        out = F.relu(self.layer3(out))
        out = F.relu(self.layer4(out))
        out = self.layer5(out)
        out = out.view(-1, self.num_class)
        return out


class FashingClassTrainer(ClassificationTrainer):
    def __init__(self, logdir, nepochs, gpu_ids, net, opt, dataset, num_class):
        super(FashingClassTrainer, self).__init__(logdir, nepochs, gpu_ids, net, opt, dataset, num_class)

    def compute_loss(self):
        var_dic = {}
        var_dic["CEP"] = loss = nn.CrossEntropyLoss()(self.output, self.labels.squeeze().long())
        _, predict = torch.max(self.output.detach(), 1)  # 0100=>1  0010=>2
        total = predict.size(0) * 1.0
        labels = self.labels.squeeze().long()
        correct = predict.eq(labels).cpu().sum().float()
        acc = correct / total
        var_dic["ACC"] = acc
        return loss, var_dic

    def compute_valid(self):
        var_dic = {}
        var_dic["CEP"] = nn.CrossEntropyLoss()(self.output, self.labels.squeeze().long())

        _, predict = torch.max(self.output.detach(), 1)  # 0100=>1  0010=>2
        total = predict.size(0) * 1.0
        labels = self.labels.squeeze().long()
        correct = predict.eq(labels).cpu().sum().float()
        acc = correct / total
        var_dic["ACC"] = acc
        return var_dic


def build_task_trainer(unfixed_params):
    logdir = unfixed_params['logdir']
    gpu_ids_abs = unfixed_params["gpu_ids_abs"]
    depth = unfixed_params["depth"]
    lr = unfixed_params["lr"]

    batch_size = 32
    opt_name = "RMSprop"
    lr_decay = 0.94
    decay_position = 1
    decay_type = "epoch"
    weight_decay = 2e-5
    momentum = 0
    nepochs = 100
    num_class = 10
    torch.backends.cudnn.benchmark = True
    mnist = FashionMNIST(root="datasets/fashion_data", batch_size=batch_size, num_workers=2)
    net = Model(SimpleModel(depth), gpu_ids_abs=gpu_ids_abs, init_method="kaiming", verbose=False)
    opt = Optimizer(net.parameters(), opt_name, lr_decay, decay_position, decay_type,
                    lr=lr, weight_decay=weight_decay, momentum=momentum)
    Trainer = FashingClassTrainer(logdir, nepochs, gpu_ids_abs, net, opt, mnist, num_class)
    return Trainer


def trainerParallel():
    unfixed_params = [
        {'task_id': 1, 'gpu_ids_abs': [],
         'depth': 4, 'lr': 1e-3,
         },
        {'task_id': 1, 'gpu_ids_abs': [],
         'depth': 8, 'lr': 1e-2,
         },

        {'task_id': 2, 'gpu_ids_abs': [],
         'depth': 4, 'lr': 1e-2,
         },
        {'task_id': 2, 'gpu_ids_abs': [],
         'depth': 8, 'lr': 1e-3,
         },
        ]
    tp = SupParallelTrainer(unfixed_params, build_task_trainer)
    return tp


def start_fashingClassPrarallelTrainer():
    tp = trainerParallel()
    tp.train()

if __name__ == '__main__':
    start_fashingClassPrarallelTrainer()

More