In Depth Experiment Configuration¶
Introduction¶
The Experiment class provides an interface for managing your experiment while preserving backward compatibility. Even after your Experiment has been built, you can still reconfigure its parameters, and notebooks created with previous Fed-BioMed versions (<3.4) will keep running. This gives you more control over your experiment, even after it has run for several rounds. This tutorial explains the experiment interface using the basic MNIST example.
1. Configuring Fed-BioMed Environment¶
Before running this notebook, you need to configure your environment by completing the following steps:
1.1. Starting the Network Component¶
Please run the following command to start the Network component, which provides communication between your notebook and the node:
{FEDBIOMED_DIR}/scripts/fedbiomed_run network
This command will launch docker containers. Therefore, please make sure that your Docker engine is up and running.
1.2. Deploying MNIST Dataset in the Node¶
Please run the command below to add the MNIST dataset to your node. It deploys the MNIST dataset in your default node, whose config file, config_node.ini, is located in the {FEDBIOMED_DIR}/etc directory. When prompted, select data type 2) default, use the default tags, and select the folder where the MNIST dataset will be saved.
{FEDBIOMED_DIR}/scripts/fedbiomed_run node add
1.3. Starting the Node¶
After you have successfully completed the previous step, please run the following command to start your node.
{FEDBIOMED_DIR}/scripts/fedbiomed_run node start
2. Creating a Model¶
Before declaring an experiment, the model that will be used for federated training should be defined. The model used here is exactly the same as the one created in the Basic MNIST tutorial. We recommend following the Basic MNIST tutorial on the PyTorch framework to understand the steps below.
import os
import tempfile
from fedbiomed.researcher.environ import environ
tmp_dir_model = tempfile.TemporaryDirectory(dir=environ['TMP_DIR'])
model_file = os.path.join(tmp_dir_model.name, 'class_export_mnist.py')
%%writefile "$model_file"
import torch
import torch.nn as nn
import torch.nn.functional as F  # needed for F.relu, F.max_pool2d, F.log_softmax
from fedbiomed.common.torchnn import TorchTrainingPlan
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Here we define the model to be used.
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):
    def __init__(self):
        super(MyTrainingPlan, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        # Here we define the custom dependencies that will be needed by our custom Dataloader
        # In this case, we need the torch DataLoader classes
        # Since we will train on MNIST, we need datasets and transform from torchvision
        deps = ["from torchvision import datasets, transforms",
                "from torch.utils.data import DataLoader"]
        self.add_dependency(deps)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output

    def training_data(self, batch_size=48):
        # Custom torch Dataloader for MNIST data
        transform = transforms.Compose([transforms.ToTensor(),
                                        transforms.Normalize((0.1307,), (0.3081,))])
        dataset1 = datasets.MNIST(self.dataset_path, train=True, download=False, transform=transform)
        train_kwargs = {'batch_size': batch_size, 'shuffle': True}
        data_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
        return data_loader

    def training_step(self, data, target):
        # Forward pass followed by the negative log-likelihood loss
        output = self.forward(data)
        loss = torch.nn.functional.nll_loss(output, target)
        return loss
After running the cells above, the model code containing your TrainingPlan will be saved to the path stored in the variable model_file. This path will be used when declaring the experiment.
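The input size of fc1 (9216) follows from the layer shapes of the model rather than being a free choice. The short check below reproduces that number with plain arithmetic. It assumes 28x28 MNIST images, and the helper conv_out is a hypothetical name introduced only for this illustration:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                   # MNIST images are 28x28 pixels
side = conv_out(side, kernel=3)             # conv1 (3x3, stride 1): 28 -> 26
side = conv_out(side, kernel=3)             # conv2 (3x3, stride 1): 26 -> 24
side = conv_out(side, kernel=2, stride=2)   # max_pool2d(x, 2):      24 -> 12

channels = 64                               # output channels of conv2
flat_features = channels * side * side      # size after torch.flatten(x, 1)
print(flat_features)                        # 9216
```

If you change the kernel sizes or the input resolution, the first argument of nn.Linear for fc1 has to change accordingly.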
3. Creating an Experiment Step by Step¶
The Experiment class can be instantiated without passing any arguments. This just builds an empty experiment object. Afterwards, you can define its arguments using the setters provided by the Experiment class.
You can also create a fully configured experiment by passing all arguments at initialization, or pass only some of the arguments and set the others afterwards.
3.1. Building an Empty Experiment¶
After building an empty experiment you won't be able to perform federated training, since it is not fully configured. That's why the output of the initialization of Experiment
will always remind you that the experiment is not fully configured.
from fedbiomed.researcher.experiment import Experiment
exp = Experiment()
3.2. Displaying Current Status of Experiment¶
In addition to the output of the initialization, you can find out more about the current status of the experiment by calling the info() method of your experiment object. This method prints information about your experiment and what you still need to complete before you can start federated training.
exp.info()
Based on the output, some arguments are defined with default values, while others are not. Model arguments, training arguments, tags, round limit, training data, etc. have no default value and therefore must be set in order to run an experiment. However, these arguments are related to each other. For example, to define your federated training data you need to define the tags first; then, when you set the training data argument, the experiment can send a search request to the nodes to receive information about their datasets. These relations between the arguments are explained in the following steps.
3.3. Setting Model for The Experiment¶
The model to be used for training can be set in the experiment using the methods set_model_path and set_model_class. The model_path is the path where your model is saved as a Python script. As you may remember, in the previous section the model class was created and saved to the path stored in the variable model_file. However, the experiment also needs to know your class name. Hence, when working in a Jupyter notebook, you can set your class name as a string with set_model_class. Since the file is a Python script (module), the class name is used for the import operation at the back end. Therefore, it is best to define both arguments one after the other.
If you are not running your code in a Jupyter notebook (IPython kernel), you can pass your class directly to set_model_class() (as a Python class, not a string). The experiment will be able to extract the source of your class, and you won't need to provide the argument model_path.
exp.set_model_class(model_class="MyTrainingPlan")
exp.set_model_path(model_path=model_file)
If you set your model path first, the setter will log a debug message informing you that the model is not defined yet, because the model class has not been set.
3.4. Setting Model and Training Arguments¶
In the previous step, the model has been defined for your experiment. Now, you can define your model arguments and training arguments that will be used respectively for building your model class and training your model on the node side. The methods set_model_args
and set_training_args
of the experiment class will allow you to set these arguments.
There is no required order between defining the model class and the model/training arguments. It is also possible to define the model/training arguments first and the model class afterwards.
# Model arguments should be an empty Dict, since our model does not require
# any argument for initialization
model_args = {}
# Training Arguments
training_args = {
    'batch_size': 48,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100  # Fast pass for development: only use (batch_maxnum * batch_size) samples
}
exp.set_model_args(model_args=model_args)
exp.set_training_args(training_args=training_args)
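As a quick check of the batch_maxnum comment above, the cap on the number of samples used on each node is simple arithmetic. The helper name max_samples is hypothetical, and the sketch only restates the formula from the comment; it does not call Fed-BioMed:

```python
training_args = {
    'batch_size': 48,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 100,
}


def max_samples(args: dict) -> int:
    """Upper bound on samples used when batch_maxnum limits the number of batches."""
    return args['batch_maxnum'] * args['batch_size']


print(max_samples(training_args))  # 4800
```

With these values, at most 4800 of the 60,000 MNIST training images are used, which keeps development runs short.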
3.5. Setting Tags¶
The tags for the dataset search request can be set using the set_tags method of the experiment object.
Setting tags does not send a dataset search request. The search request is sent when setting the training data; `tags` is the argument required for that request.
The tags argument of the set_tags method should be either an array of tags of type string or a single tag of type string.
tags = ['#MNIST', '#dataset']
exp.set_tags(tags = tags)
To see the tags that are set, you can call the tags() method of the experiment object.
exp.tags()
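The "single string or list of strings" rule for tags can be sketched as a small normalization step. The function normalize_tags below is a hypothetical helper written for illustration; it is not part of Fed-BioMed and does not claim to mirror how set_tags is implemented:

```python
from typing import List, Union


def normalize_tags(tags: Union[str, List[str]]) -> List[str]:
    """Return tags as a list of strings, also accepting a single string."""
    if isinstance(tags, str):
        return [tags]
    if isinstance(tags, list) and all(isinstance(t, str) for t in tags):
        return list(tags)
    raise TypeError("tags must be a string or a list of strings")


print(normalize_tags('#MNIST'))                # ['#MNIST']
print(normalize_tags(['#MNIST', '#dataset']))  # ['#MNIST', '#dataset']
```

Validating the type early like this gives a clear error at configuration time instead of a failed search request later.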
3.6. Setting Nodes¶
The nodes argument indicates the nodes that are going to be used for the experiment. By default, it is None, which means that every node that is up and running will be part of the experiment, as long as it has the dataset that is going to be used for training (registered under the given tags). If the nodes argument has been set in advance when configuring the Experiment, the dataset search request will be sent only to the indicated nodes. You can set nodes using the method exp.set_nodes(nodes=nodes). This method takes a nodes argument, which should be an array of node ids of type string, or a single node id passed as a string.
Since each node id is generated randomly when the node is configured, we won't set nodes for this experiment, so that this notebook can run regardless of the environment.
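The interaction between tags and nodes can be pictured with a toy search over an in-memory registry. Everything here is hypothetical: the registry NODE_DATASETS, the node ids, and the function search are invented for illustration, and the sketch assumes that a dataset matches only when it carries all requested tags:

```python
from typing import Dict, List, Optional

# Hypothetical registry: node id -> tags of the dataset held by that node
NODE_DATASETS: Dict[str, List[str]] = {
    "node_a": ["#MNIST", "#dataset"],
    "node_b": ["#MNIST", "#dataset"],
    "node_c": ["#other"],
}


def search(tags: List[str], nodes: Optional[List[str]] = None) -> List[str]:
    """Return ids of nodes whose dataset carries all requested tags.

    nodes=None means every known node is queried; otherwise only the
    listed nodes are considered.
    """
    if nodes is None:
        candidates = NODE_DATASETS
    else:
        candidates = {n: NODE_DATASETS[n] for n in nodes if n in NODE_DATASETS}
    return sorted(n for n, held in candidates.items() if set(tags) <= set(held))


print(search(["#MNIST", "#dataset"]))                              # ['node_a', 'node_b']
print(search(["#MNIST", "#dataset"], nodes=["node_b", "node_c"]))  # ['node_b']
```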
3.7. Setting Training Data¶
Training data is a FederatedDataSet instance, which comes from the module fedbiomed.researcher.datasets. There are several ways to define your training data.
- You can run set_training_data(training_data=None, from_tags=True). This sends a search request to the nodes to get dataset information, using the tags defined earlier.
- You can provide a training_data argument that is an instance of FederatedDataSet.
- You can provide the training_data argument as a Python dict, and the setter will create a FederatedDataSet object by itself.
When using the last option, please make sure that your `dict` object follows the `FederatedDataSet` schema. Otherwise, you might get an error while running your experiment.
If you run set_training_data(training_data=None), no training data is defined yet for the experiment (training_data is set to None).
training_data = exp.set_training_data(training_data=None, from_tags=True)
Since this sends a search request to the nodes, the output will inform you about the nodes selected for training. These are the nodes that have the dataset and are able to train your model.
set_training_data returns a FederatedDataSet object. You can use either the return value of the setter or the getter for training data, which is training_data().
training_data = exp.training_data()
To inspect the result in detail, you can call the data() method of the FederatedDataSet object. It returns a Python dictionary with information about the datasets that have been found on the nodes.
training_data.data()
As mentioned before, setting the training data once doesn't mean that you can't change it: you can create a new FederatedDataSet from a dict that includes the information about the datasets. This allows you to select the datasets that will be used for federated training.
Since the dataset information is provided directly, there is no need to send a request to the nodes.
from fedbiomed.researcher.datasets import FederatedDataSet
tr_data = training_data.data()
federated_dataset = FederatedDataSet(tr_data)
exp.set_training_data(training_data = federated_dataset)
Or, you can directly use tr_data
in set_training_data()
exp.set_training_data(training_data = tr_data)
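Selecting a subset of the discovered datasets comes down to filtering that dictionary before handing it back to the setter. The sketch below assumes that the dictionary maps node ids to a list of dataset descriptions; the sample entries, node ids, and the helper keep_nodes are hypothetical and only illustrate the filtering step:

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the result of training_data.data()
tr_data_example: Dict[str, List[dict]] = {
    "node_a": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]}],
    "node_b": [{"name": "MNIST", "tags": ["#MNIST", "#dataset"]}],
}


def keep_nodes(data: Dict[str, List[dict]], node_ids: Set[str]) -> Dict[str, List[dict]]:
    """Return a new dict containing only the entries of the chosen nodes."""
    return {nid: datasets for nid, datasets in data.items() if nid in node_ids}


subset = keep_nodes(tr_data_example, {"node_a"})
print(list(subset))  # ['node_a']
```

A filtered dictionary of this kind could then be passed to set_training_data in the same way as tr_data, provided it still follows the FederatedDataSet schema.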
If you change the tags for the dataset with set_tags while training data is already defined in your experiment object, you have to update your training data by running exp.set_training_data(training_data=None).
3.8. Setting an Aggregator¶
An aggregator is one of the required arguments for the experiment. It is used for aggregating the model parameters received from the nodes after every round (i.e., once training is done on each node). By default, when the experiment is initialized without an aggregator, it automatically uses the default FedAverage aggregator class. However, it is also possible to set a different aggregation algorithm with the method set_aggregator. Currently, Fed-BioMed provides only FedAverage, but it is possible to create custom aggregator classes.
You can get the current aggregator by running exp.aggregator(). It returns the aggregator object that will be used for aggregation.
exp.aggregator()
Let's suppose that you have created your own aggregator. You can then set it as follows (FedAverage stands in for your class here):
from fedbiomed.researcher.aggregators.fedavg import FedAverage
exp.set_aggregator(aggregator=FedAverage)
If your aggregator class needs initialization parameters, you can instantiate your class and pass the instance instead.
fed_average = FedAverage()
exp.set_aggregator(aggregator=fed_average)
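To make the aggregation step concrete, here is a minimal sketch of federated averaging on plain Python floats. It is not Fed-BioMed's FedAverage implementation: the function weighted_average and the example values are hypothetical, and weighting each node by its number of training samples is an assumption of this sketch:

```python
from typing import Dict, List


def weighted_average(params: List[Dict[str, float]], n_samples: List[int]) -> Dict[str, float]:
    """Average per-node parameters, weighting each node by its sample count."""
    total = sum(n_samples)
    weights = [n / total for n in n_samples]
    return {
        key: sum(w * p[key] for w, p in zip(weights, params))
        for key in params[0]
    }


node_params = [
    {"w": 1.0, "b": 0.0},   # parameters returned by a node with 100 samples
    {"w": 3.0, "b": 2.0},   # parameters returned by a node with 300 samples
]
aggregated = weighted_average(node_params, n_samples=[100, 300])
print(aggregated)  # {'w': 2.5, 'b': 1.5}
```

In a real model each entry would be a tensor rather than a float, but the weighting logic stays the same.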
3.9. Setting Node Selection Strategy¶
The node selection strategy is also one of the required arguments for the experiment. It is used for selecting nodes before each round of training. Since the strategy selects among the nodes that hold training data, the training data should already be set before setting any strategy. The strategy will then be able to select, for training, the nodes that are currently available with respect to their datasets.
By default, set_strategy(node_selection_strategy=None) uses DefaultStrategy. It is the default strategy in Fed-BioMed and selects all available nodes for training, regardless of their datasets. However, it is also possible to set different strategies. Currently, Fed-BioMed only provides DefaultStrategy, but you can create your own custom strategy classes.
exp.set_strategy(node_selection_strategy=None)
Or, you can directly pass DefaultStrategy
(or any Strategy class) as an argument
from fedbiomed.researcher.strategies.default_strategy import DefaultStrategy
exp.set_strategy(node_selection_strategy=DefaultStrategy)
# To make sure the strategy has been set
exp.strategy()
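The difference between selecting every node and a custom selection rule can be sketched with two small functions. Both are hypothetical illustrations of selection logic only; a real custom strategy in Fed-BioMed would be a Strategy class, whose interface is not shown in this tutorial:

```python
import random
from typing import List


def select_all(node_ids: List[str]) -> List[str]:
    """Mimics the idea of a default strategy: every available node trains."""
    return list(node_ids)


def sample_nodes(node_ids: List[str], k: int, seed: int) -> List[str]:
    """Hypothetical alternative: train on k randomly chosen nodes per round."""
    rng = random.Random(seed)          # seeded for reproducible selection
    return sorted(rng.sample(list(node_ids), k))


available = ["node_a", "node_b", "node_c", "node_d"]
print(select_all(available))
print(sample_nodes(available, k=2, seed=0))
```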
3.10. Setting Round Limit¶
The round_limit argument is the maximum number of training rounds. By default, it is None, and it needs to be set before running your experiment. You can set the round limit with the method set_round_limit. round_limit can be changed after running one or several rounds of training. You can always execute exp.round_limit() to see the current round limit.
exp.set_round_limit(round_limit=2)
exp.round_limit()
3.11. Setting Job to Manage Federated Training Rounds¶
Job is a class that manages federated training rounds. Before setting the job, the node selection strategy, the model, and the training data should be set, so please make sure that they are all defined first. The method set_job creates the Job instance and does not take any arguments.
exp.set_job()
exp.job()
3.12. Controlling Experiment Status Before Starting Training Rounds¶
Now, let's see if our experiment is ready for the training.
exp.info()
If the experiment is ready, you will see the message Experiment can be run now (fully defined) at the bottom of the output. We can now run the experiment.
4. Running The Experiment¶
As long as info() says that the experiment is fully defined, you will be able to run your experiment. Experiment has two methods for running training rounds: run() and run_once().
- run() runs the experiment rounds from the current round to the round limit. If the round limit has already been reached, it reports this instead of running. The method run() takes two optional arguments, rounds and increase.
  - rounds is an integer that indicates the number of rounds to run. If the experiment is at round 0, the round limit is 4, and you pass rounds as 3, it will run the experiment for only 3 rounds.
  - increase is a boolean that indicates whether the round limit should be increased if the given rounds would go past the round limit. For example, if the current round is 3, the round limit is 4, and the rounds argument is 2, the experiment will increase the round limit to 5.
- run_once() runs the experiment for a single round of training. If the round limit has been reached, it reports this instead of running. However, if it is executed as run_once(increase=True) when the round limit has been reached, it increases the round limit by one round.
4.1. Running the Experiment Once
exp.run_once()
After running the experiment once, you can check the current round. It returns 1, which means only one round has been run.
exp.round_current()
Now, let's run the experiment with run_once()
again.
exp.run_once()
Since the round limit was set to 2, the round limit has now been reached. If you try to run run() or run_once(), the experiment will indicate that the round limit has been reached.
exp.run_once()
exp.run()
After this point, if you would like to run the experiment you can increase round limit with set_round_limit(round)
exp.set_round_limit(4)
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
The round limit of the experiment has been set to 4
and the completed number of rounds is 2
. It means if you run the experiment with method run()
without passing any argument, it will run the experiment for 2
rounds.
exp.run()
Let's check the current round status of the experiment.
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
Another way to run your experiment once the round limit has been reached is to pass the rounds argument to the method run(). For example, the following cell will run the experiment for 2 more rounds.
exp.run(rounds=2, increase=True) # increase is True by default
If the argument increase
is False
, it will not increase the round limit automatically.
exp.run(rounds=2, increase=False)
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
It is also possible to increase the number of rounds while running the experiment with run_once(), by passing the increase argument as True.
exp.run_once(increase=True)
print('Round Limit : ' , exp.round_limit())
print('Current Round : ' , exp.round_current())
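The round bookkeeping walked through in this section can be summarized with a toy counter that runs without any node. RoundCounter is a hypothetical class, not the Fed-BioMed Experiment. It assumes that a request going past the limit with increase=False runs no round at all, and the default values of increase in the sketch are assumptions as well, so the calls below pass it explicitly wherever it matters:

```python
from typing import Optional


class RoundCounter:
    """Toy model of round_current / round_limit; performs no training."""

    def __init__(self, round_limit: int) -> None:
        self.round_limit = round_limit
        self.round_current = 0

    def run(self, rounds: Optional[int] = None, increase: bool = False) -> int:
        """Return the number of rounds that were 'run'."""
        if rounds is None:
            # No argument: run from the current round up to the limit
            rounds = self.round_limit - self.round_current
        elif self.round_current + rounds > self.round_limit:
            if not increase:
                return 0                      # limit would be exceeded: skip
            self.round_limit = self.round_current + rounds
        self.round_current += rounds
        return rounds

    def run_once(self, increase: bool = False) -> int:
        return self.run(rounds=1, increase=increase)


exp_rounds = RoundCounter(round_limit=2)
exp_rounds.run_once()                        # round 1
exp_rounds.run_once()                        # round 2, limit reached
skipped = exp_rounds.run_once()              # nothing runs, returns 0
exp_rounds.round_limit = 4                   # like set_round_limit(4)
exp_rounds.run()                             # runs rounds 3 and 4
exp_rounds.run(rounds=2, increase=True)      # limit raised to 6, runs 2 rounds
exp_rounds.run(rounds=2, increase=False)     # skipped, limit stays at 6
exp_rounds.run_once(increase=True)           # limit raised to 7, runs 1 round
print(exp_rounds.round_current, exp_rounds.round_limit)  # 7 7
```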
4.2. Changing Training Arguments for the Next Round¶
The method set_training_args()
allows you to change the training arguments even if you've already run your experiment several times. Thanks to the method set_training_args()
you will be able to configure your training from one round to another. For example, we can change our batch_size
to 64
and batch_maxnum
to 50
for the next round.
# Training Arguments
training_args = {
    'batch_size': 64,
    'lr': 1e-3,
    'epochs': 1,
    'dry_run': False,
    'batch_maxnum': 50
}
exp.set_training_args(training_args=training_args)
exp.run_once(increase=True)
Conclusions¶
The Experiment class is the interface to, and the orchestrator of, the whole process behind federated training on the researcher side. It allows you to manage your federated training experiment easily. It has been extended with setter and getter methods to ease its declaration, which also gives you more control before, during, and after the training rounds. The purpose of the Experiment class is to provide a robust interface that lets end users easily perform federated training on Fed-BioMed nodes.