How to Create Your Custom PyTorch Model
The Fed-BioMed framework allows you to perform model training without changing your PyTorch model class completely. It only requires extra attributes and methods to train your model based on a federated approach. In this tutorial, you will learn how to write/define your model in Fed-BioMed using the PyTorch framework.
Note: Before starting this tutorial, we highly recommend that you follow the previous tutorial to understand the basics of Fed-BioMed.
In this tutorial, you will learn:
- What Fed-BioMed's training plan is
- How to initialize your custom model
- How to create your forward method
- What the training_data method is and how to make your custom PyTorch DataLoader to use in training_data
- How to prepare your model file to make it executable by the nodes
During this tutorial, we will be working on the CelebA (CelebFaces) dataset. You can see details of the dataset here. In the following sections, you will find the instructions for downloading and configuring the CelebA dataset for the Fed-BioMed framework.
1. Fed-BioMed Training Plan
In this section, you are going to learn how to write your custom training plan.
What is a Training Plan?
The training plan is the class that will be constructed by each node during every round of training. In short, it defines the attributes and methods of your network that are needed to train your model. It has been designed with the model class of the PyTorch framework in mind: your training plan inherits from the class called TorchTrainingPlan, which has been created for extending PyTorch's nn.Module. For more details, you can visit the documentation for the training plan. The following code snippet shows a basic training plan of Fed-BioMed for PyTorch.
class Net(TorchTrainingPlan):
    def __init__(self, kwargs):
        # ....
        pass

    def forward(self, x):
        # ...
        return

    def training_data(self, batch_size = 48):
        # ...
        return

    def training_step(self, data, target):
        # ...
        return
__init__ Method of Training Plan
The __init__ method of the training plan is where you initialize your neural network layers, the same as in PyTorch. This is also where you can initialize model arguments for defining the layers of the neural network. In addition, you can define extra dependencies that you will need in your model class using the add_dependency method, which comes from TorchTrainingPlan.
As mentioned before, we will be working on a classification model on the CelebA image dataset. The model will be able to predict if the person smiles or not. Therefore, you need to define the network's layers for this classification problem.
def __init__(self, kwargs):
    super(Net, self).__init__()
    # Convolutional layers
    self.conv1 = nn.Conv2d(3, 32, 3, 1)
    self.conv2 = nn.Conv2d(32, 32, 3, 1)
    self.conv3 = nn.Conv2d(32, 32, 3, 1)
    self.conv4 = nn.Conv2d(32, 32, 3, 1)
    self.dropout1 = nn.Dropout(0.25)
    self.dropout2 = nn.Dropout(0.5)
    # Classifier
    self.fc1 = nn.Linear(3168, 128)
    self.fc2 = nn.Linear(128, 2)
    # Here we define the custom dependencies that will be needed by our custom Dataloader
    deps = ["from torch.utils.data import Dataset, DataLoader",
            "from torchvision import transforms",
            "import pandas as pd",
            "from PIL import Image",
            "import os",
            "import numpy as np"]
    self.add_dependency(deps)
forward() Method
Next, you should define the forward method using the layers that are defined in __init__. In the forward method, we define the forward pass from the input layer to the output layer of the network.
def forward(self, x):
    x = self.conv1(x)
    x = F.max_pool2d(x, 2)
    x = F.relu(x)
    x = self.conv2(x)
    x = F.max_pool2d(x, 2)
    x = F.relu(x)
    x = self.conv3(x)
    x = F.max_pool2d(x, 2)
    x = F.relu(x)
    x = self.conv4(x)
    x = F.max_pool2d(x, 2)
    x = F.relu(x)
    x = self.dropout1(x)
    x = torch.flatten(x, 1)
    x = self.fc1(x)
    x = F.relu(x)
    x = self.dropout2(x)
    x = self.fc2(x)
    output = F.log_softmax(x, dim=1)
    return output
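The input size of 3168 used for fc1 follows from the image size and the four convolution/pooling stages. The sketch below reproduces that arithmetic in plain Python. It assumes the aligned CelebA images are 218 pixels high and 178 pixels wide (an assumption about the input, not something the training plan enforces), and the helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1):
    """Output length of an unpadded convolution along one dimension."""
    return (size - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Output length of a max-pooling layer (stride equal to kernel)."""
    return size // kernel


def flattened_features(height, width, channels=32, n_blocks=4):
    """Number of values fed to the first linear layer after flattening."""
    for _ in range(n_blocks):
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
    return channels * height * width


# Assumed input size: 218 x 178 pixels (height x width)
print(flattened_features(218, 178))  # prints 3168
```

If you feed images of another size, the same computation gives the value to use in nn.Linear for fc1.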
training_data() and Custom Dataset
training_data is an important part of the model class. Since the training plan will be executed on different nodes, training_data should process and return the data stored on the node. During each round of training, every node builds your model, gets the data using the training_data method, and performs the training_step.
The dataset that we propose to use for training is a custom image dataset. Therefore, you need to define a custom Dataset for PyTorch. To do so, a new class has to be created in the training plan using PyTorch's Dataset module.
Thanks to the Dataset module, we do not load all the images at once; each image is retrieved on demand by __getitem__. This uses much less RAM than loading every image of the dataset up front.
class CelebaDataset(Dataset):
    """Custom Dataset for loading CelebA face images"""

    def __init__(self, txt_path, img_dir, transform=None):
        # Read the csv file that includes classes for each image
        df = pd.read_csv(txt_path, sep="\t", index_col=0)
        self.img_dir = img_dir
        self.txt_path = txt_path
        self.img_names = df.index.values
        self.y = df['Smiling'].values
        self.transform = transform

    def __getitem__(self, index):
        img = np.asarray(Image.open(os.path.join(self.img_dir, self.img_names[index])))
        img = transforms.ToTensor()(img)
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]
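The on-demand loading idea can be illustrated without PyTorch. The following is a minimal sketch, not the Fed-BioMed or PyTorch implementation: LazyDataset and fake_loader are hypothetical names, and the loader is a stand-in for opening an image file.

```python
class LazyDataset:
    """Index-based dataset that loads an item only when it is requested."""

    def __init__(self, names, labels, loader):
        self.names = names      # e.g. image file names
        self.labels = labels    # one label per name
        self.loader = loader    # callable that turns a name into data
        self.load_count = 0     # how many items were actually loaded

    def __getitem__(self, index):
        self.load_count += 1
        return self.loader(self.names[index]), self.labels[index]

    def __len__(self):
        return len(self.labels)


def fake_loader(name):
    # Hypothetical stand-in for opening an image from disk
    return f"pixels of {name}"


dataset = LazyDataset(["a.jpg", "b.jpg", "c.jpg"], [1, 0, 1], fake_loader)
sample, label = dataset[1]
print(len(dataset), dataset.load_count, sample, label)
```

Only the requested item is loaded; the other entries stay on disk until a DataLoader (or any other caller) asks for them.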
Now, you need to define a training_data method that will create a PyTorch DataLoader using the custom CelebaDataset class.
def training_data(self, batch_size = 48):
    # The training_data creates the Dataloader to be used for training in the general class Torchnn of Fed-BioMed
    dataset = self.CelebaDataset(self.dataset_path + "/target.csv", self.dataset_path + "/data/")
    train_kwargs = {'batch_size': batch_size, 'shuffle': True}
    data_loader = DataLoader(dataset, **train_kwargs)
    return data_loader
training_step()
The last method that needs to be defined is training_step. This method is responsible for executing the forward method and calculating the loss value for the backward pass of the network.
def training_step(self, data, target):
    output = self.forward(data)
    loss = torch.nn.functional.nll_loss(output, target)
    return loss
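To see what this loss computes, the following plain-Python sketch reproduces the combination of log-softmax and negative log-likelihood on a tiny batch. It is an illustration of the math only (the function names mirror the PyTorch ones but are re-implemented here), and the example scores are made up.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one list of scores."""
    highest = max(logits)
    log_sum = highest + math.log(sum(math.exp(v - highest) for v in logits))
    return [v - log_sum for v in logits]


def nll_loss(log_probs_batch, targets):
    """Mean negative log-likelihood of the target classes."""
    picked = [log_probs[t] for log_probs, t in zip(log_probs_batch, targets)]
    return -sum(picked) / len(picked)


# Two samples, two classes (0 = not smiling, 1 = smiling); made-up scores
batch_logits = [[0.0, 0.0], [2.0, 0.0]]
targets = [1, 0]
log_probs = [log_softmax(row) for row in batch_logits]
loss = nll_loss(log_probs, targets)
print(round(loss, 4))
```

The targets are used as indices into the log-probabilities, which is why the labels must be valid class indices (0 or 1 here).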
You are now ready to create your training plan class. All you need to do is to put together every method that has been explained in the previous sections. In the next steps we will:
- download the CelebA dataset and deploy it on the nodes
- define our complete model and save it as a python file
- create an experiment and run it
- evaluate our model using a test dataset
2. Configuring Nodes
We will be working with the CelebA (CelebFaces) dataset. Therefore, please visit here and download the files img/img_align_celeba.zip and Anno/list_attr_celeba.txt. After the download is completed:
- Go to ./notebooks/data/Celeba in the Fed-BioMed project.
- Create the Celeba_raw/raw directory and copy the list_attr_celeba.txt file into it.
- Extract the zip file img_align_celeba.zip.
Your folder should look like the tree below:
Celeba
    README.md
    create_node_data.py
    .gitignore
    Celeba_raw
        raw
            list_attr_celeba.txt
            img_align_celeba.zip
            img_align_celeba
                lots of images
Now, the dataset has to be processed and split to create three distinct datasets for Node 1, Node 2, and Node 3. You can do it easily by running the following script in your notebook. Please make sure that you start your notebook in the notebooks directory of fedbiomed. Otherwise, the paths defined in the following script may not work. If you are working in a different directory, please make sure that you define the correct paths in the following example.
Running the following script might take some time, please be patient.
import os
import numpy as np
import pandas as pd
import shutil

# Celeba folder
parent_dir = os.path.join(".", "data", "Celeba")
celeba_raw_folder = os.path.join("Celeba_raw", "raw")
img_dir = os.path.join(parent_dir, celeba_raw_folder, 'img_align_celeba') + os.sep
out_dir = os.path.join(".", "data", "Celeba", "celeba_preprocessed")

# Read the attribute file and only load the Smiling column
df = pd.read_csv(os.path.join(parent_dir, celeba_raw_folder, 'list_attr_celeba.txt'),
                 sep=r"\s+", skiprows=1, usecols=['Smiling'])

# Labels are 1 if the person is smiling, -1 otherwise. We map -1 to 0 so that
# the labels are valid class indices (0 or 1) for the loss function
df.loc[df['Smiling'] == -1, 'Smiling'] = 0

# Split the table in 3 parts
length = len(df)
data_node_1 = df.iloc[:int(length/3)]
data_node_2 = df.iloc[int(length/3):int(length/3) * 2]
data_node_3 = df.iloc[int(length/3) * 2:]

# Create a folder for each node
if not os.path.exists(os.path.join(out_dir, "data_node_1")):
    os.makedirs(os.path.join(out_dir, "data_node_1", "data"))
if not os.path.exists(os.path.join(out_dir, "data_node_2")):
    os.makedirs(os.path.join(out_dir, "data_node_2", "data"))
if not os.path.exists(os.path.join(out_dir, "data_node_3")):
    os.makedirs(os.path.join(out_dir, "data_node_3", "data"))

# Save each node's target CSV to the correct folder
data_node_1.to_csv(os.path.join(out_dir, 'data_node_1', 'target.csv'), sep='\t')
data_node_2.to_csv(os.path.join(out_dir, 'data_node_2', 'target.csv'), sep='\t')
data_node_3.to_csv(os.path.join(out_dir, 'data_node_3', 'target.csv'), sep='\t')

# Copy all images of each node to the correct folder
for im in data_node_1.index:
    shutil.copy(img_dir+im, os.path.join(out_dir, "data_node_1", "data", im))
print("data for node 1 successfully created")
for im in data_node_2.index:
    shutil.copy(img_dir+im, os.path.join(out_dir, "data_node_2", "data", im))
print("data for node 2 successfully created")
for im in data_node_3.index:
    shutil.copy(img_dir+im, os.path.join(out_dir, "data_node_3", "data", im))
print("data for node 3 successfully created")
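The split and relabeling logic of the script can be checked on a small example before running it on the full dataset. This is a sketch in plain Python; split_three and relabel are hypothetical helper names that mirror the slicing and the -1 to 0 mapping used in the script.

```python
def split_three(n_rows):
    """Return (start, stop) row ranges for three consecutive parts."""
    third = int(n_rows / 3)
    return [(0, third), (third, third * 2), (third * 2, n_rows)]


def relabel(values):
    """Map the attribute encoding (-1 / 1) to class indices (0 / 1)."""
    return [0 if v == -1 else v for v in values]


print(split_three(10))           # [(0, 3), (3, 6), (6, 10)]
print(relabel([1, -1, -1, 1]))   # [1, 0, 0, 1]
```

When the number of rows is not divisible by three, the last part receives the remaining rows.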
Now if you go to the ${FEDBIOMED_DIR}/notebooks/data/Celeba directory, you can see the folder called celeba_preprocessed. It contains three folders, each holding the image dataset for one of the 3 nodes. The next step is configuring the nodes and adding these datasets. We will configure only two nodes. The dataset for the third node is going to be used for testing.
Create 2 nodes for training :
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config node1.ini start
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config node2.ini start
Add data to each node :
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config node1.ini add
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config node2.ini add
Note: ${FEDBIOMED_DIR} is the path to the base directory of the cloned Fed-BioMed repository. You can set it by running the command export FEDBIOMED_DIR=/path/to/fedbiomed. This is not required for Fed-BioMed to work but enables you to run the tutorials more easily.
2.1. Configuration Steps
You need to configure at least one node beforehand:
- Run ${FEDBIOMED_DIR}/scripts/fedbiomed_run node config (ini file) add
- Select option 3 (images) to add an image dataset to the node
- Add a name and the tag for the dataset (the tag should contain '#celeba' as it is the tag used for this training) and finally add the description
- Pick a data folder from the 3 generated datasets inside data/Celeba/celeba_preprocessed (e.g. data_node_1)
- The data should now be added (if you get a warning saying that data must be unique, it is because it has already been added)
- Check that your data has been added by executing ${FEDBIOMED_DIR}/scripts/fedbiomed_run node config (ini file) list
- Run the node using ${FEDBIOMED_DIR}/scripts/fedbiomed_run node config <ini file> start. Wait until you get Starting task manager. It means you are online.
After these steps, you are ready to train your classification model over two different nodes.
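If adding a dataset does not work as expected, it can help to check that the chosen folder has the layout the training plan reads: a target.csv file and a data directory. The helper below is a hypothetical convenience written for this tutorial, not part of Fed-BioMed, and the demonstration runs on a temporary folder rather than the real dataset.

```python
from pathlib import Path
import tempfile


def check_node_folder(folder):
    """Return a list of problems found in a preprocessed node folder."""
    folder = Path(folder)
    problems = []
    if not (folder / "target.csv").is_file():
        problems.append("missing target.csv")
    if not (folder / "data").is_dir():
        problems.append("missing data directory")
    return problems


# Demonstration on a temporary folder instead of the real dataset
with tempfile.TemporaryDirectory() as tmp:
    node_dir = Path(tmp) / "data_node_1"
    (node_dir / "data").mkdir(parents=True)
    before = check_node_folder(node_dir)
    (node_dir / "target.csv").write_text("\tSmiling\n")
    after = check_node_folder(node_dir)

print(before, after)
```

On your machine you would call check_node_folder on one of the folders inside data/Celeba/celeba_preprocessed; an empty list means both expected entries are present.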
3. Defining Custom PyTorch Model
You should set a file path where you want to save your model file. By default, the path defined in fedbiomed.researcher.environ is the 'tmp' folder in the base fedbiomed directory.
from fedbiomed.researcher.environ import environ
import tempfile
import os
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+os.sep)
model_file = os.path.join(tmp_dir_model.name, 'CelebaClass.py') # name of the model class
Now, it is time to create our Net class based on the methods that have been explained in the previous section. Please do not forget to add the %%writefile "$model_file" command at the beginning of the following cell. This command writes the content of the cell into the file. Thanks to that, the experiment can access the model file and upload it to the file repository to make it accessible to the nodes. The nodes get the model file from the file repository and do the training based on the model defined in the Net class.
%%writefile "$model_file"
import torch
import torch.nn as nn
from fedbiomed.common.torchnn import TorchTrainingPlan
import torch.nn.functional as F
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import numpy as np
from PIL import Image
import os

class Net(TorchTrainingPlan):
    def __init__(self):
        super(Net, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 32, 3, 1)
        self.conv3 = nn.Conv2d(32, 32, 3, 1)
        self.conv4 = nn.Conv2d(32, 32, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        # Classifier
        self.fc1 = nn.Linear(3168, 128)
        self.fc2 = nn.Linear(128, 2)
        deps = ["from torch.utils.data import Dataset, DataLoader",
                "from torchvision import transforms",
                "import pandas as pd",
                "from PIL import Image",
                "import os",
                "import numpy as np"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.conv3(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.conv4(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

    class CelebaDataset(Dataset):
        """Custom Dataset for loading CelebA face images"""

        def __init__(self, txt_path, img_dir, transform=None):
            df = pd.read_csv(txt_path, sep="\t", index_col=0)
            self.img_dir = img_dir
            self.txt_path = txt_path
            self.img_names = df.index.values
            self.y = df['Smiling'].values
            self.transform = transform
            print("celeba dataset finished")

        def __getitem__(self, index):
            img = np.asarray(Image.open(os.path.join(self.img_dir,
                                                     self.img_names[index])))
            img = transforms.ToTensor()(img)
            label = self.y[index]
            return img, label

        def __len__(self):
            return self.y.shape[0]

    def training_data(self, batch_size = 48):
        # The training_data creates the Dataloader to be used for training in the general class Torchnn of fedbiomed
        dataset = self.CelebaDataset(os.path.join(self.dataset_path, "target.csv"),
                                     os.path.join(self.dataset_path, "data")+os.sep)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = DataLoader(dataset, **train_kwargs)
        return data_loader

    def training_step(self, data, target):
        # This function must return the loss so that it can be backpropagated
        output = self.forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
This group of arguments corresponds respectively to:
- model_args: a dictionary with the arguments related to the model (e.g. number of layers, features, etc.). This will be passed to the model class on the node-side.
- training_args: a dictionary containing the arguments for the training routine (e.g. batch size, learning rate, epochs, etc.). This will be passed to the routine on the node-side.
Note: Typos and/or lack of positional (required) arguments might raise an error.
training_args = {
    'batch_size': 32,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
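The effect of batch_maxnum can be worked out with a short calculation. The sketch below assumes, as the comment in training_args describes, that batch_maxnum caps the number of batches used per epoch; samples_per_epoch is a hypothetical helper and the dataset size of 60000 is a made-up figure for illustration.

```python
def samples_per_epoch(dataset_size, batch_size, batch_maxnum=None):
    """Upper bound on the samples one node processes in one epoch."""
    if batch_maxnum is None:
        return dataset_size
    return min(dataset_size, batch_size * batch_maxnum)


# Hypothetical node holding 60000 images, with the settings used above
print(samples_per_epoch(60000, batch_size=32, batch_maxnum=100))  # 3200
print(samples_per_epoch(60000, batch_size=32))                    # 60000
```

With these settings each node would train on at most 3200 samples per epoch, which keeps development runs short.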
4. Training Federated Model
To orchestrate the training over two nodes, we need to define an experiment. The experiment:
- searches for nodes serving data for the tags,
- defines the local training on the nodes with the model saved in model_path, and federates all local updates at each round with the aggregator,
- runs the training for rounds.
You can visit the user guide to learn more about experiments.
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
tags = ['#celeba']
rounds = 3
exp = Experiment(tags=tags,
                 model_path=model_file,
                 model_class='Net',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)
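To give an intuition of what an averaging aggregator does with the local updates, here is a minimal sketch of generic federated averaging in plain Python, where each node's parameters are weighted by its number of samples. This is an illustration of the general idea only; the actual FedAverage class in Fed-BioMed may differ in its details, and the function name, parameter name, and values below are made up.

```python
def federated_average(node_params, node_sample_counts):
    """Average parameter dictionaries, weighting each node by its sample count."""
    total = sum(node_sample_counts)
    averaged = {}
    for name in node_params[0]:
        n_values = len(node_params[0][name])
        averaged[name] = [
            sum(params[name][i] * count / total
                for params, count in zip(node_params, node_sample_counts))
            for i in range(n_values)
        ]
    return averaged


# Made-up parameters of one layer as returned by two nodes
node_1 = {"fc2.bias": [0.0, 1.0]}
node_2 = {"fc2.bias": [1.0, 3.0]}
result = federated_average([node_1, node_2], [100, 300])
print(result)
```

The node with more samples pulls the average towards its own values.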
Let's start the experiment.
By default, this function doesn't stop until all the rounds are done for all the nodes. While the experiment runs, you can open the terminals where you have started the nodes and see the training progress. In addition, the loss values obtained from each node during the training will be printed as output in real time. Since we are working on an image dataset, training might take some time.
exp.run()
Loading Training Parameters
After all the rounds have been completed, you retrieve the aggregated parameters from the last round and load them.
fed_model = exp.model_instance
fed_model.load_state_dict(exp.aggregated_params[rounds - 1]['params'])
5. Testing Federated Model
We will define a testing routine to extract the accuracy metrics on the testing dataset. We will use the dataset that has been extracted into data_node_3.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import numpy as np
from PIL import Image
import os

def testing_Accuracy(model, data_loader):
    model.eval()
    test_loss = 0
    correct = 0
    n_samples = 0  # number of samples actually evaluated
    device = "cpu"
    loader_size = len(data_loader)
    with torch.no_grad():
        for idx, (data, target) in enumerate(data_loader):
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()
            n_samples += target.size(0)
            # Only uses about 10% of the dataset, results are similar but faster
            if idx >= loader_size / 10:
                break
    # Average over the samples that were evaluated, not over the full dataset
    test_loss /= n_samples
    accuracy = 100 * correct / n_samples
    return test_loss, accuracy
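When only a fraction of the test batches is evaluated, the accuracy should be divided by the number of samples actually seen. The plain-Python sketch below illustrates this counting on made-up predictions; partial_accuracy is a hypothetical helper and does not involve the model.

```python
def partial_accuracy(batches, fraction=0.1):
    """Accuracy in percent over roughly the first `fraction` of the batches."""
    correct = 0
    seen = 0
    for idx, (predictions, targets) in enumerate(batches):
        correct += sum(p == t for p, t in zip(predictions, targets))
        seen += len(targets)
        if idx >= len(batches) * fraction:
            break
    return 100 * correct / seen


# 20 identical made-up batches: 3 of 4 predictions are correct in each
batches = [([1, 0, 1, 1], [1, 0, 0, 1])] * 20
print(partial_accuracy(batches))  # 75.0
```

Counting the samples while iterating also stays correct when the last batch is smaller than the batch size.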
We also need to define a custom Dataset class for the test dataset in order to load it using PyTorch's DataLoader. This will be the same class that has already been defined in the training plan.
test_dataset_path = "./data/Celeba/celeba_preprocessed/data_node_3"

class CelebaDataset(Dataset):
    """Custom Dataset for loading CelebA face images"""

    def __init__(self, txt_path, img_dir, transform=None):
        df = pd.read_csv(txt_path, sep="\t", index_col=0)
        self.img_dir = img_dir
        self.txt_path = txt_path
        self.img_names = df.index.values
        self.y = df['Smiling'].values
        self.transform = transform
        print("celeba dataset finished")

    def __getitem__(self, index):
        img = np.asarray(Image.open(os.path.join(self.img_dir,
                                                 self.img_names[index])))
        img = transforms.ToTensor()(img)
        label = self.y[index]
        return img, label

    def __len__(self):
        return self.y.shape[0]

dataset = CelebaDataset(os.path.join(test_dataset_path, "target.csv"),
                        os.path.join(test_dataset_path, "data") + os.sep)
train_kwargs = {'batch_size': 128, 'shuffle': True}
data_loader = DataLoader(dataset, **train_kwargs)
acc_federated = testing_Accuracy(fed_model, data_loader)
acc_federated[1]
Conclusions
In this tutorial, running a custom model on Fed-BioMed using the PyTorch framework has been explained. Because the examples are designed for the development environment, we have been running the nodes on the same host machine. In production, the nodes that you use to train your model will run on remote servers. Since Fed-BioMed is still in the development phase, there might be future updates to the functions and methods used in these tutorials. Therefore, please keep up to date with our GitLab repository.