Training with Approved Model Files¶
Fed-BioMed offers a feature to run only pre-approved models on the nodes. The nodes that you will be sending your model to might require approved models. Therefore, the model files sent by a researcher should be approved by the node side in advance. In this workflow, the approval process is done by a real user/person who reviews the code contained in the model file. The reviewer makes sure the model doesn't contain any code that might cause privacy issues or harm the node.
In this tutorial, we will create a node with the model approval option activated and run the getting-started MNIST example.
Setting Up a Node¶
Enabling model approval can be done from the configuration file or from the Fed-BioMed CLI (Command Line Interface) while starting the node. The process of creating and starting a node with the model approval option is not very different from setting up a normal node. By default, if no option is specified for the CLI, the node disables the model approval option. The default security section of the configuration file looks like the configuration below (under the [security] sub-section).
[security]
hashing_algorithm = SHA256
allow_default_models = True
model_approval = False
The Fed-BioMed CLI accepts two additional parameters, --enable-model-approval and --allow-default-models, to activate model approval:
--enable-model-approval: Enables model approval for the node. If there is no config file for the node when running the CLI, it creates a new config file with model approval enabled (model_approval = True).
--allow-default-models: Allows default models for train requests. These are the models that come with the Fed-BioMed tutorials, for example the model for the MNIST dataset that we will be using in this tutorial. If default models are allowed, the node updates/registers the model files located in the envs/developments/default_models directory while starting up.
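For example, both options can be combined when starting a node (shown here with the config file that we will create in the next section):
$ {FEDBIOMED_DIR}/scripts/fedbiomed_run node config config-n1.ini --enable-model-approval --allow-default-models start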
You can visit the documentation for the model manager for more information about managing models on the nodes.
Adding the MNIST Dataset to the Node¶
In this section, we will add the MNIST dataset to a new node. While adding the dataset through the CLI, we'll also specify the --enable-model-approval and --allow-default-models options. Now, let's run the following command.
$ {FEDBIOMED_DIR}/scripts/fedbiomed_run node config config-n1.ini --enable-model-approval --allow-default-models add
The CLI will ask you to select the dataset type. Since we will be working on the MNIST dataset, please select 2 (default), continue by typing y at the next prompt, and select the folder where you want to store the MNIST dataset. Afterward, if you go to the etc directory of Fed-BioMed, you will see the config-n1.ini file.
The above shell command will create a new config-n1.ini file with the following configuration:
[security]
hashing_algorithm = SHA256
allow_default_models = True
model_approval = True
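You can verify the change from a terminal (assuming the default etc layout of Fed-BioMed mentioned above):
$ cat {FEDBIOMED_DIR}/etc/config-n1.ini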
Starting the Node¶
Now you can start your node by running the following command:
$ {FEDBIOMED_DIR}/scripts/fedbiomed_run node config config-n1.ini start
Since the config file has been configured to enable model approval mode, you do not need to specify any extra parameters while starting the node. However, it is also possible to start the node with --enable-model-approval and --allow-default-models, or with --disable-model-approval and --disable-default-models. If you start your node with --disable-model-approval, it will disable model approval even if it is enabled in the config file.
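For instance, the following command starts the node with model approval disabled, regardless of what the config file says:
$ {FEDBIOMED_DIR}/scripts/fedbiomed_run node config config-n1.ini --disable-model-approval start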
Creating an Experiment with an Approved Model File¶
In this section, we will be using the default MNIST model which has already been registered by the node. We'll create an experiment and check whether the model file is approved or not.
from fedbiomed.researcher.environ import environ
import tempfile
import os
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+os.sep)
model_file = os.path.join(tmp_dir_model.name, 'class_export_mnist.py')
The following model is the one that will be sent to the node for training. Since model files are processed by the Experiment to configure dependencies, the part that imports modules might differ from this one. Therefore, it is important to get the final model after initializing the experiment.
%%writefile "$model_file"

import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transforms from torchvision
        deps = ["from torchvision import datasets, transforms",
                "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size=48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader

    def training_step(self, data, target):
        output = self.forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
To be able to see the final model file, we first need to initialize the experiment.
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
tags = ['#MNIST', '#dataset']
rounds = 2
model_args = {}
training_args = {
    'batch_size': 48,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100  # Fast pass for development: only use (batch_maxnum * batch_size) samples
}
exp = Experiment(tags=tags,
                 #nodes=None,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='MyTrainingPlan',
                 training_args=training_args,
                 rounds=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)
Getting the Final Model File From the Experiment¶
The Experiment's model_file() method displays the model file that will be sent to the nodes. Even if the experiment couldn't find any node to train your model, you should still be able to get your final model.
exp.model_file()
# or, to get only the path where the model file is saved:
# exp.model_file(display=False)
Checking the Status of the Model¶
The exp.check_model_status() method sends a request to the nodes to check whether the model is approved or not. The request is sent only to the nodes that were found during the dataset search.
status = exp.check_model_status()
The logs should indicate that the model is approved. You can also get the status objects from the return value of check_model_status(): it returns a list of status objects, one per node. For this example, it will contain a single status object since we have launched only one node.
approval_obligation: Indicates whether the model approval option is enabled on the node.
is_approved: Indicates whether the model has been approved by the node or not.
status
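If you want to inspect the result programmatically instead of just displaying it, here is a minimal sketch (assuming each entry of the returned list exposes the two fields described above as dictionary keys):
# Iterate over the per-node status objects returned by check_model_status()
for node_status in status:
    print(node_status['approval_obligation'], node_status['is_approved'])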
Changing the Model and Testing the Model Approval Status¶
Let's change our previous default model and test whether it is approved or not. We will be changing the network structure.
from fedbiomed.researcher.environ import environ
import tempfile, os
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+os.sep)
model_file_2 = os.path.join(tmp_dir_model.name, 'class_export_mnist_2.py')
%%writefile "$model_file_2"

import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 5, 1, 2)
        self.conv2 = nn.Conv2d(16, 32, 5, 1, 2)
        self.fc1 = nn.Linear(32 * 7 * 7, 10)

        deps = ["from torchvision import datasets, transforms",
                "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size=48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader

    def training_step(self, data, target):
        output = self.forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
tags = ['#MNIST', '#dataset']
rounds = 2
model_args = {}
training_args = {
    'batch_size': 48,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100  # Fast pass for development: only use (batch_maxnum * batch_size) samples
}
exp2 = Experiment(tags=tags,
                  model_path=model_file_2,
                  model_args=model_args,
                  model_class='MyTrainingPlan',
                  training_args=training_args,
                  rounds=rounds,
                  aggregator=FedAverage(),
                  node_selection_strategy=None)
Since we changed the model architecture (we removed the dropouts and one dense layer, fc2) in this experiment, the output of the following method should say that the model is not approved by the node, and the is_approved key of the result object should be equal to False.
status = exp2.check_model_status()
status
Since the model is not approved, you won't be able to train it on the node.
exp2.run()
In that case, you should contact the node owner and ask for model approval.
Registering/Approving the Model¶
To register/approve the model that was created in the previous section, we can use the Fed-BioMed CLI. You do not need to stop your node to register new models; you can perform the registration process in a different terminal window. However, first we need to create another experiment as exp3 and get the model file.
In the previous notebook cells, we tried to run a model which is not approved by the node. Therefore, your notebook kernel should have been killed, and you might need to restart it to be able to run your experiment. After restarting, please follow the tutorial directly from this section.
from fedbiomed.researcher.environ import environ
import tempfile, os
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+os.sep)
model_file_3 = os.path.join(tmp_dir_model.name, 'class_export_mnist_3.py')
%%writefile "$model_file_3"

import torch
import torch.nn as nn
import torch.nn.functional as F
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 5, 1, 2)
        self.conv2 = nn.Conv2d(16, 32, 5, 1, 2)
        self.fc1 = nn.Linear(32 * 7 * 7, 10)

        deps = ["from torchvision import datasets, transforms",
                "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size=48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader

    def training_step(self, data, target):
        output = self.forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage
tags = ['#MNIST', '#dataset']
rounds = 2
model_args = {}
training_args = {
    'batch_size': 48,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100  # Fast pass for development: only use (batch_maxnum * batch_size) samples
}
exp3 = Experiment(tags=tags,
                  model_path=model_file_3,
                  model_args=model_args,
                  model_class='MyTrainingPlan',
                  training_args=training_args,
                  rounds=rounds,
                  aggregator=FedAverage(),
                  node_selection_strategy=None)
exp3.model_file()
The output of exp3.model_file() is a file path that shows where the final model is saved. It also prints the content of the model file. You can either get the content of the model from the output cell or from the path where it is saved. Either way, you need to create a new txt file and copy the model content into it. You can create a new directory in Fed-BioMed called my_approved_model and, inside it, create a new my_model.txt file and copy the model content into it.
$ mkdir {FEDBIOMED_DIR}/my_approved_model
$ cp <model_path_file> {FEDBIOMED_DIR}/my_approved_model/my_model.txt
Where <model_path_file> is the path of the model file output by exp3.model_file(display=False).
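If you prefer to stay in Python, you can also copy the file with the standard library (a minimal sketch; it assumes exp3.model_file(display=False) returns the model file path as described above, and that FEDBIOMED_DIR is set in your environment):
import os, shutil

model_path = exp3.model_file(display=False)  # path of the final model file
dest_dir = os.path.join(os.environ['FEDBIOMED_DIR'], 'my_approved_model')
os.makedirs(dest_dir, exist_ok=True)         # create the target folder if missing
shutil.copyfile(model_path, os.path.join(dest_dir, 'my_model.txt'))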
Afterward, please run the following command in another terminal to register the model file.
$ {FEDBIOMED_DIR}/scripts/fedbiomed_run node config config-n1.ini --register-model
You should type a unique name for your model, e.g. 'MyTestModel-1', and a description. The CLI will ask you to select the model file you want to register. Select the file that you saved and continue.
Now, you should be able to train your model.
exp3.check_model_status()
exp3.run()