Implementing other Scikit-Learn models for Federated Learning
In this tutorial, you will learn how to define and run any Scikit-Learn supervised or unsupervised model, as well as data reduction methods, in Fed-BioMed.
1. Introduction
As in the previous tutorials with PyTorch, you can implement custom Scikit-Learn models in Fed-BioMed. This tutorial summarizes all the steps needed to set up a Scikit-Learn model in Fed-BioMed.
Current Scikit-Learn methods implemented in Fed-BioMed
- Classifiers:
- Regressors:
- Clustering:
- Coming soon!
Check out our User Guide for further information about the Scikit-Learn models available in Fed-BioMed.
2. Scikit-Learn training plan
As you have seen in the previous tutorials on Scikit-Learn, you should define a "Scikit-Learn training plan". We provide here a template for creating a Scikit-Learn TrainingPlan. As with the PyTorch training plan, every Scikit-Learn training plan class should inherit from the `SGDSkLearnModel` class.
Please note that the `training_data` method returns different values depending on whether the learning algorithm is supervised or unsupervised.
2.1 Training Plan for supervised Learning (Regressor and Classifier)
Below is a template of a supervised learning training plan for Scikit-Learn models.
```python
from fedbiomed.common.fedbiosklearn import SGDSkLearnModel

class SkLearnTrainingPlan(SGDSkLearnModel):
    def __init__(self, model_args: dict = {}):
        super(SkLearnTrainingPlan, self).__init__(model_args)
        # declare here your dependencies (any NumPy, PyTorch or Scikit-Learn
        # module that should be imported for the methods defined in the
        # training plan)
        self.add_dependency(["from sklearn.linear_model import MyModel"])

    def training_data(self):
        # Define here how data are handled and/or shuffled.
        # You MUST return a tuple containing the training data `X_train`
        # and the targets `Y_train`.
        return X_train, Y_train
```
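As a concrete illustration, the shuffle-and-split logic that a supervised `training_data` method typically performs can be sketched standalone with NumPy. The function name, array shapes, and the last-column-is-label convention below are illustrative assumptions, not part of the Fed-BioMed API:

```python
import numpy as np

def make_training_data(data: np.ndarray, target_column: int, shuffle: bool = True):
    """Split a raw dataset into features X_train and targets Y_train,
    optionally shuffling rows first -- the kind of handling a
    supervised `training_data` method is expected to implement."""
    if shuffle:
        rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
        data = data[rng.permutation(len(data))]
    # every column except the target column is a feature
    X_train = np.delete(data, target_column, axis=1)
    Y_train = data[:, target_column]
    return X_train, Y_train

# toy dataset: 4 samples, 2 features, and a binary label in the last column
raw = np.array([[0.1, 0.2, 0],
                [0.3, 0.4, 1],
                [0.5, 0.6, 0],
                [0.7, 0.8, 1]])
X_train, Y_train = make_training_data(raw, target_column=2)
```

Inside a real training plan, the raw array would instead come from the dataset deployed on the node, and the tuple `(X_train, Y_train)` would be returned from `training_data`.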
2.2 Training Plan for unsupervised Learning (Clustering) and Data Reduction Methods
Below is a template for unsupervised algorithms or data reduction methods provided by Scikit-Learn.
```python
from fedbiomed.common.fedbiosklearn import SGDSkLearnModel

class SkLearnTrainingPlan(SGDSkLearnModel):
    def __init__(self, model_args: dict = {}):
        super(SkLearnTrainingPlan, self).__init__(model_args)
        # declare here your dependencies (any NumPy, PyTorch or Scikit-Learn
        # module that should be imported for the methods defined in the
        # training plan)
        self.add_dependency(["from sklearn.linear_model import MyModel"])

    def training_data(self):
        # Define here how data are handled and/or shuffled.
        # You MUST return only the training data `X_train`
        # (there are no targets in the unsupervised setting).
        return X_train
```
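For the unsupervised case, a common kind of handling inside `training_data` is shuffling plus feature standardization before clustering. A minimal standalone sketch with NumPy, where the function name and the choice of standardization are illustrative assumptions:

```python
import numpy as np

def make_training_data(data: np.ndarray, shuffle: bool = True) -> np.ndarray:
    """Prepare an unlabelled dataset: shuffle rows, standardize each
    feature to zero mean / unit variance, and return X_train only."""
    if shuffle:
        rng = np.random.default_rng(seed=0)
        data = data[rng.permutation(len(data))]
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    return (data - mean) / std

# toy unlabelled dataset: 4 samples, 2 features on very different scales
raw = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])
X_train = make_training_data(raw)
```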
3. Federated Scikit-Learn model training
Training a Scikit-Learn model is very similar to training a PyTorch model. The only difference is the selection of the model hyperparameters (contained in `model_args`) and the training parameters (in `training_args`). Initializing the `Experiment` class allows the researcher to search for active nodes tagged with the defined tags.
```python
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags = ['#MNIST', '#dataset']

# select nodes participating in this experiment
exp = Experiment(tags=tags,
                 model_path=model_file,
                 model_args=model_args,
                 model_class='SkLearnTrainingPlan',
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None)
```
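The dictionaries passed to `Experiment` above can be sketched as follows. This is a minimal sketch, assuming an `SGDClassifier` on a 28x28-image dataset such as MNIST; the model name, feature count, and extra hyperparameters are illustrative, not prescribed by Fed-BioMed:

```python
# model hyperparameters: 'model', 'n_features' and 'n_classes' are the
# mandatory fields for a Scikit-Learn model in Fed-BioMed; any other keys
# (illustrative here) are regular scikit-learn model parameters
model_args = {
    'model': 'SGDClassifier',
    'n_features': 28 * 28,   # flattened 28x28 images -> 784 features
    'n_classes': 10,         # ten digit classes
    'max_iter': 1000,
    'tol': 1e-3,
}

# training parameters: for the moment, only the local epoch count
training_args = {
    'epochs': 5,
}

rounds = 3  # number of global aggregation rounds
```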
3.1 Arguments for model definition and model training
- `tags`: a list containing the tags that will be used for finding models. Same as for PyTorch models.
- `model_file`: the file in which the model is defined, written in text format. Same as for PyTorch models.
- `model_args`: a Python dictionary containing all arguments related to the model (i.e. all Scikit-Learn model parameters). In addition, it MUST include the following fields:
  - `model`: the name of the Scikit-Learn model used.
  - `n_features`: the number of features in the dataset.
  - `n_classes`: the number of classes (for classification or clustering algorithms only; ignored if a regression algorithm is used).
- `model_class`: the name of the Scikit-Learn training plan class. Same as for PyTorch models.
- `training_args`: a dictionary containing the training parameters. For the moment, it contains a single entry:
  - `epochs`: the number of epochs to be performed locally (i.e. on each node).
- `rounds`: the number of rounds (i.e. global aggregations) to be performed. Same as for PyTorch models.
- `aggregator`: the aggregation strategy, here Federated Averaging. More information in User Guide/Aggregators. Same as for PyTorch models.
- `node_selection_strategy`: how to select/sample nodes among all available nodes. Same as for PyTorch models.
3.2 Training the model
Calling the `run` method of the `Experiment` class will train the federated model.
```python
exp.run()
```
3.3 Retrieve model weights for each federated round
The history of each round can be accessed through the `aggregated_params()` method of the `Experiment` class. The aggregated model of each round is stored in a dictionary whose keys correspond to the rounds: each key maps to the aggregated model obtained at that round.
To extract the whole history, enter:
```python
exp.aggregated_params()
```
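To make the round-keyed structure concrete, here is a sketch of how such a dictionary can be navigated to retrieve the final weights. The dictionary contents below are hypothetical stand-ins (the real keys and parameter names depend on the model trained), only the round-indexed shape is taken from the text above:

```python
# hypothetical shape of the history returned by exp.aggregated_params():
# one entry per round, each mapping to the aggregated model of that round
aggregated_params = {
    0: {'params': {'coef_': [0.10, 0.20], 'intercept_': [0.00]}},
    1: {'params': {'coef_': [0.15, 0.25], 'intercept_': [0.01]}},
}

# retrieve the weights of the final model (i.e. the last round)
last_round = max(aggregated_params)
final_params = aggregated_params[last_round]['params']
```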
More algorithms from Scikit-Learn are coming soon. Stay tuned!