Federated 2d image classification with MONAI¶

Introduction¶

This tutorial shows how to deploy in Fed-BioMed the 2d image classification example provided by the MONAI project (https://monai.io/):

https://github.com/Project-MONAI/tutorials/blob/master/2d_classification/mednist_tutorial.ipynb

Since MONAI is based on PyTorch, its deployment within Fed-BioMed seamlessly follows the same general structure as any PyTorch training plan.

Following the MONAI example, this tutorial is based on the MedNIST dataset.

Creating MedNIST nodes¶

MedNIST provides an artificial 2d classification dataset created by gathering different medical imaging datasets from TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset. The dataset is kindly made available by Dr. Bradley J. Erickson M.D., Ph.D. (Department of Radiology, Mayo Clinic) under the Creative Commons CC BY-SA 4.0 license.

To proceed with the tutorial, we created an iid partitioning of the MedNIST dataset between 3 clients. Each client has 3000 image samples for each class. The training partitions are available at the following link:

https://drive.google.com/file/d/1vLIcBdtdAhh6K-vrgCFy_0Y55dxOWZwf/view
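The partitioning script itself is not part of this tutorial. As an illustration only, an iid split can be sketched as follows: the images of each class are shuffled and cut into equally sized, disjoint shards, one per client, so that every client gets the same class balance. The function iid_partition and its arguments are hypothetical, and file names are toy placeholders.

```python
import random

def iid_partition(files_per_class, n_clients, per_client, seed=0):
    """Hypothetical helper: split each class into n_clients disjoint shards.

    files_per_class: dict mapping class name -> list of image paths.
    per_client: number of images of each class given to every client.
    Returns a list of n_clients dicts with the same class keys.
    """
    rng = random.Random(seed)
    clients = [dict() for _ in range(n_clients)]
    for cls, files in files_per_class.items():
        if len(files) < n_clients * per_client:
            raise ValueError(f"class {cls!r} has too few images")
        shuffled = files[:]  # copy, so the caller's list is left untouched
        rng.shuffle(shuffled)
        for c in range(n_clients):
            clients[c][cls] = shuffled[c * per_client:(c + 1) * per_client]
    return clients

# Toy example: 10 images per class, 3 clients with 3 images of each class
toy = {name: [f"{name}/{i:05d}.jpeg" for i in range(10)] for name in ["CXR", "Hand"]}
parts = iid_partition(toy, n_clients=3, per_client=3)
print([len(p["CXR"]) for p in parts])  # [3, 3, 3]
```

Images left over after the split (here 1 per class) can be kept aside, for instance for an independent test set.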

The dataset owned by each client has structure:

└── client_*/
    ├── AbdomenCT/
    ├── BreastMRI/
    ├── CXR/
    ├── ChestCT/
    ├── Hand/
    └── HeadCT/
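The class label of an image is given by its sub-folder. In the training plan defined below, parse_data sorts the sub-folder names and uses their position as the label, so every node gets the same label mapping as long as it holds the same six folders. A minimal stdlib sketch of that mapping on an empty copy of the folder layout (class_index is a hypothetical helper name):

```python
import os
import tempfile

def class_index(path):
    """Map each sub-folder of `path` to an integer label, in sorted order."""
    class_names = sorted(x for x in os.listdir(path)
                         if os.path.isdir(os.path.join(path, x)))
    return {name: i for i, name in enumerate(class_names)}

# Recreate the (empty) client_* folder layout in a temporary directory
with tempfile.TemporaryDirectory() as root:
    for name in ["HeadCT", "Hand", "ChestCT", "CXR", "BreastMRI", "AbdomenCT"]:
        os.makedirs(os.path.join(root, name))
    class_to_label = class_index(root)

print(class_to_label)
# {'AbdomenCT': 0, 'BreastMRI': 1, 'CXR': 2, 'ChestCT': 3, 'Hand': 4, 'HeadCT': 5}
```

Note that the sort is case-sensitive, which is why "CXR" comes before "ChestCT".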

To create the federated dataset, we follow the standard Fed-BioMed procedure for creating and populating nodes. After activating the Fed-BioMed network with the commands

source ./scripts/fedbiomed_environment network

and

./scripts/fedbiomed_run network

we create a first node by using the commands

source ./scripts/fedbiomed_environment node

./scripts/fedbiomed_run node start

We then populate the node with the data of the first client:

./scripts/fedbiomed_run node add

We select option 3 (images) to add the MedNIST partition of client 1, simply by picking the folder of client 1. When asked, assign the tags #MEDNIST and #dataset to the data.

We can further check that the data has been added by executing ./scripts/fedbiomed_run node list

Following the same procedure, we create the other two nodes with the datasets of client 2 and client 3 respectively.

Running Fed-BioMed Researcher¶

We are now ready to start the researcher environment with the command source ./scripts/fedbiomed_environment researcher, and open the Jupyter notebook.

We can first query the network for the MedNIST dataset. In this case, the nodes share their respective partitions using the same tags #MEDNIST and #dataset:

from fedbiomed.researcher.requests import Requests
req = Requests()
req.list(verbose=True)
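Both tags matter. Our understanding is that a node answers a search only with the datasets that carry every requested tag. The hypothetical helper below only illustrates that matching rule with plain Python sets. It is not a Fed-BioMed function.

```python
def dataset_matches(dataset_tags, requested_tags):
    """Hypothetical rule: select a dataset only if it carries all requested tags."""
    return set(requested_tags) <= set(dataset_tags)

requested = ['#MEDNIST', '#dataset']
print(dataset_matches(['#MEDNIST', '#dataset'], requested))  # True
print(dataset_matches(['#MEDNIST'], requested))              # False
```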

Create an experiment to train a model on the data found¶

The code for the network and the data loader of the MONAI tutorial can now be deployed in Fed-BioMed. We first import the necessary modules from the fedbiomed and monai libraries:

We can now define the training plan. Note that we can simply subclass the standard TorchTrainingPlan natively provided by Fed-BioMed. We reuse the MedNISTDataset class defined in the original MONAI tutorial. The method training_data parses the images found in the node's dataset_path, builds a MedNISTDataset from them and returns it wrapped in a DataManager. Following the MONAI tutorial, the model is a DenseNet121.

import os
import numpy as np
import torch
import torch.nn as nn
from fedbiomed.common.training_plans import TorchTrainingPlan
from fedbiomed.common.data import DataManager
from torchvision import datasets, transforms

from monai.apps import download_and_extract
from monai.config import print_config
from monai.data import decollate_batch
from monai.metrics import ROCAUCMetric
from monai.networks.nets import DenseNet121
from monai.transforms import (
    Activations,
    AddChannel,
    AsDiscrete,
    Compose,
    LoadImage,
    RandFlip,
    RandRotate,
    RandZoom,
    ScaleIntensity,
    EnsureType,
)
from monai.utils import set_determinism



# Here we define the training plan to be used. 
# You can use any class name (here 'MyTrainingPlan')
class MyTrainingPlan(TorchTrainingPlan):

    # Declare dependencies
    def init_dependencies(self):
        deps = ["import numpy as np",
                "import os",
                "from monai.apps import download_and_extract",
                "from monai.config import print_config",
                "from monai.data import decollate_batch",
                "from monai.metrics import ROCAUCMetric",
                "from monai.networks.nets import DenseNet121",
                "from monai.transforms import ( Activations, AddChannel, AsDiscrete, Compose, LoadImage, RandFlip, RandRotate, RandZoom, ScaleIntensity, EnsureType, )",
                "from monai.utils import set_determinism"]
        
        return deps
    
    # Define and return model
    def init_model(self):

        model = DenseNet121(spatial_dims=2, in_channels=1,
                    out_channels = self.model_args()["num_class"])
        
        return model 
        
    class MedNISTDataset(torch.utils.data.Dataset):
        def __init__(self, image_files, labels, transforms):
            self.image_files = image_files
            self.labels = labels
            self.transforms = transforms

        def __len__(self):
            return len(self.image_files)

        def __getitem__(self, index):
            return self.transforms(self.image_files[index]), self.labels[index]
    
    def parse_data(self, path):
        
        class_names = sorted(x for x in os.listdir(path)
                     if os.path.isdir(os.path.join(path, x)))
        num_class = len(class_names)
        image_files = [
                        [
                            os.path.join(path, class_names[i], x)
                            for x in os.listdir(os.path.join(path, class_names[i]))
                        ]
                        for i in range(num_class)
                      ]
        
        return image_files, num_class
    
    def training_data(self, batch_size = 48):
        # Parse the node's dataset folder (one sub-folder per class)
        self.image_files, num_class = self.parse_data(self.dataset_path)
        
        if self.model_args()["num_class"] != num_class:
            raise ValueError('number of available classes does not match declared classes')
        
        # Flatten the per-class file lists into parallel lists of files and labels
        num_each = [len(self.image_files[i]) for i in range(num_class)]
        image_files_list = []
        image_class = []
        for i in range(num_class):
            image_files_list.extend(self.image_files[i])
            image_class.extend([i] * num_each[i])
        
        # Shuffle the samples; all of them are used for local training
        # (no local validation split is kept)
        indices = np.arange(len(image_files_list))
        np.random.shuffle(indices)

        train_x = [image_files_list[i] for i in indices]
        train_y = [image_class[i] for i in indices]

        # Training-time augmentation, as in the MONAI tutorial
        train_transforms = Compose(
            [
                LoadImage(image_only=True),
                AddChannel(),
                ScaleIntensity(),
                RandRotate(range_x=np.pi / 12, prob=0.5, keep_size=True),
                RandFlip(spatial_axis=0, prob=0.5),
                RandZoom(min_zoom=0.9, max_zoom=1.1, prob=0.5),
                EnsureType(),
            ]
        )

        train_ds = self.MedNISTDataset(train_x, train_y, train_transforms)
        
        return DataManager(dataset=train_ds, batch_size=batch_size, shuffle=True)
    
    def training_step(self, data, target):
        output = self.model().forward(data)
        loss   = torch.nn.functional.cross_entropy(output, target)
        return loss
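The core of training_data turns the per-class lists returned by parse_data into two parallel lists, one of file paths and one of integer labels, and then shuffles them. The stdlib-only sketch below shows that step on toy file names. The helper flatten_with_labels is hypothetical. Labels stay aligned with their files because the same permutation indexes both lists.

```python
import random

def flatten_with_labels(image_files, seed=None):
    """Turn [[files of class 0], [files of class 1], ...] into shuffled (files, labels)."""
    image_files_list, image_class = [], []
    for label, files in enumerate(image_files):
        image_files_list.extend(files)
        image_class.extend([label] * len(files))
    order = list(range(len(image_files_list)))
    random.Random(seed).shuffle(order)  # one permutation, applied to both lists
    return [image_files_list[i] for i in order], [image_class[i] for i in order]

toy_files = [["a0.jpeg", "a1.jpeg"], ["b0.jpeg"], ["c0.jpeg", "c1.jpeg", "c2.jpeg"]]
train_x, train_y = flatten_with_labels(toy_files, seed=0)
print(list(zip(train_x, train_y)))
```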

We now set the model and training parameters. Note that we use only 1 epoch for this experiment. Through batch_maxnum, each node trains on at most 250 batches of 20 samples (5000 samples) per round, which is roughly a quarter of its locally available training data.

model_args = {
    'num_class': 6,  
}

training_args = {
    'batch_size': 20, 
    'optimizer_args': {
        'lr': 1e-5
    }, 
    'epochs': 1, 
    'dry_run': False,  
    'batch_maxnum':250 # Fast pass for development : only use ( batch_maxnum * batch_size ) samples
}
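The share of local data used in each round follows from training_args. This sketch assumes that batch_maxnum caps the number of batches per epoch, as the comment in training_args indicates. It also takes the 3000 images per class stated in the partitioning description at face value, so the exact fraction depends on the actual size of each partition.

```python
batch_size = 20      # training_args['batch_size']
batch_maxnum = 250   # training_args['batch_maxnum']
n_classes = 6        # model_args['num_class']
per_class = 3000     # assumed number of images per class on each node

samples_used = batch_maxnum * batch_size   # per epoch, i.e. per round with 1 epoch
local_samples = n_classes * per_class
local_fraction = samples_used / local_samples
print(samples_used, local_samples, f"{local_fraction:.1%}")  # 5000 18000 27.8%
```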

The experiment can now be defined by providing the tags #MEDNIST and #dataset, and by running the local training on the nodes with the training plan class MyTrainingPlan, the standard aggregator (FedAverage) and the default node selection strategy (node_selection_strategy=None, all nodes used). Federated learning is performed over 3 optimization rounds.

WARNING:¶

To run this experiment, you need a computer with the following specifications:

more than 16 GB of RAM
2.5 GHz processor or higher, with at least 4 cores

If your computer's specifications are lower, you can reduce the amount of data used for local training (set batch_maxnum from 250 to 25) and the number of rounds (from 3 to 1), but model performance may decrease dramatically.
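The size of that reduction can be worked out directly. The function below computes an upper bound on the number of samples each node processes over the whole experiment. It assumes that batch_maxnum caps the batches per epoch. The name samples_per_node is hypothetical.

```python
def samples_per_node(batch_maxnum, batch_size, rounds, epochs=1):
    """Upper bound on the training samples one node processes over all rounds."""
    return batch_maxnum * batch_size * epochs * rounds

default = samples_per_node(batch_maxnum=250, batch_size=20, rounds=3)
reduced = samples_per_node(batch_maxnum=25, batch_size=20, rounds=1)
print(default, reduced)  # 15000 500
```

The reduced setting processes 30 times fewer samples per node, which explains the expected drop in performance.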

from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

tags =  ['#MEDNIST', '#dataset']
rounds = 3

exp = Experiment(tags=tags,
                 model_args=model_args,
                 training_plan_class=MyTrainingPlan,
                 training_args=training_args,
                 round_limit=rounds,
                 aggregator=FedAverage(),
                 node_selection_strategy=None
                )
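To our understanding, FedAverage aggregates the node updates as an average weighted by the number of samples each node trained on. The toy version below works on plain Python lists instead of PyTorch tensors. It only illustrates the weighting rule and is not Fed-BioMed's implementation. The name fed_average is hypothetical.

```python
def fed_average(node_params, n_samples):
    """Sample-size-weighted average of per-node parameter dicts (name -> list of floats)."""
    total = sum(n_samples)
    weights = [n / total for n in n_samples]
    averaged = {}
    for name in node_params[0]:
        averaged[name] = [
            sum(w * params[name][k] for w, params in zip(weights, node_params))
            for k in range(len(node_params[0][name]))
        ]
    return averaged

nodes = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
result = fed_average(nodes, n_samples=[100, 100, 200])
print(result)  # {'w': [3.5, 4.5]}
```

With the iid partitioning used here, all nodes hold about the same amount of data, so the weighted average is close to a plain mean.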

Let's start the experiment.

By default, this function does not return until all round_limit rounds have been completed on all the nodes.

exp.run()

Testing¶

Once the federated model is obtained, it is possible to test it locally on an independent testing partition. The test dataset is available at this link:

https://drive.google.com/file/d/1YbwA0WitMoucoIa_Qao7IC1haPfDp-XD/

!pip install gdown

import os
import shutil
import tempfile
import PIL
import torch
import numpy as np
from sklearn.metrics import classification_report

from monai.config import print_config
from monai.data import decollate_batch
from monai.metrics import ROCAUCMetric
from monai.networks.nets import DenseNet121
import zipfile
from monai.transforms import (
    Activations,
    AddChannel,
    AsDiscrete,
    Compose,
    LoadImage,
    RandFlip,
    RandRotate,
    RandZoom,
    ScaleIntensity,
    EnsureType,
)
from monai.utils import set_determinism

print_config()

Download the testing dataset into a local temporary folder.

import gdown
import zipfile
import tempfile
import os
from fedbiomed.researcher.environ import environ

tmp_dir = tempfile.TemporaryDirectory(dir=environ['TMP_DIR']+os.sep)

resource = "https://drive.google.com/uc?id=1YbwA0WitMoucoIa_Qao7IC1haPfDp-XD"
base_dir = tmp_dir.name
test_file = os.path.join(base_dir, "MedNIST_testing.zip")

gdown.download(resource, test_file, quiet=False)

# Extract the archive; the context manager closes the zip file afterwards
with zipfile.ZipFile(test_file) as zf:
    zf.extractall(base_dir)
    
data_dir = os.path.join(base_dir, "MedNIST_testing")

Parse the data and create the testing data loader:

class_names = sorted(x for x in os.listdir(data_dir)
                     if os.path.isdir(os.path.join(data_dir, x)))
num_class = len(class_names)
image_files = [
    [
        os.path.join(data_dir, class_names[i], x)
        for x in os.listdir(os.path.join(data_dir, class_names[i]))
    ]
    for i in range(num_class)
]

num_each = [len(image_files[i]) for i in range(num_class)]
image_files_list = []

image_class = []
for i in range(num_class):
    image_files_list.extend(image_files[i])
    image_class.extend([i] * num_each[i])
num_total = len(image_class)
image_width, image_height = PIL.Image.open(image_files_list[0]).size

print(f"Total image count: {num_total}")
print(f"Image dimensions: {image_width} x {image_height}")
print(f"Label names: {class_names}")
print(f"Label counts: {num_each}")

length = len(image_files_list)
indices = np.arange(length)
np.random.shuffle(indices)


test_split = int(0.1 * length)  # evaluate on a random 10% of the test images
test_indices = indices[:test_split]

test_x = [image_files_list[i] for i in test_indices]
test_y = [image_class[i] for i in test_indices]

val_transforms = Compose(
    [LoadImage(image_only=True), AddChannel(), ScaleIntensity(), EnsureType()])

y_pred_trans = Compose([EnsureType(), Activations(softmax=True)])
y_trans = Compose([EnsureType(), AsDiscrete(to_onehot=num_class)])

class MedNISTDataset(torch.utils.data.Dataset):
    def __init__(self, image_files, labels, transforms):
        self.image_files = image_files
        self.labels = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, index):
        return self.transforms(self.image_files[index]), self.labels[index]


test_ds = MedNISTDataset(test_x, test_y, val_transforms)
test_loader = torch.utils.data.DataLoader(
    test_ds, batch_size=300)

Define the testing metric:

auc_metric = ROCAUCMetric()
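By default, MONAI's ROCAUCMetric computes a macro-averaged, one-vs-rest ROC AUC. Its inputs are per-class probabilities (y_pred_trans applies a softmax) and one-hot labels (y_trans). The stdlib sketch below reproduces that definition on toy numbers. For each class, the AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting one half. The per-class values are then averaged. The helper names are hypothetical and this is not MONAI's code.

```python
def binary_auc(scores, positives):
    """P(random positive outscores random negative); ties count 1/2.

    Requires at least one positive and one negative sample.
    """
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(probs, labels, num_class):
    """Mean of one-vs-rest AUCs; probs[i][c] is the probability of class c for sample i."""
    aucs = [binary_auc([p[c] for p in probs], [y == c for y in labels])
            for c in range(num_class)]
    return sum(aucs) / num_class

toy_probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.15, 0.5, 0.35]]
toy_labels = [0, 1, 2, 0]
auc = macro_ovr_auc(toy_probs, toy_labels, num_class=3)
print(auc)
```

Here class 0 gets an AUC of 0.75, because its second sample scores below one negative, while classes 1 and 2 are perfectly ranked.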

To test the federated model, we retrieve the model instance from the training plan and load into it the parameters aggregated at the last federated optimization round.

model = exp.training_plan().model()
model.load_state_dict(exp.aggregated_params()[rounds - 1]['params'])
model.eval()  # evaluation mode: batch normalization uses its running statistics

Compute the testing performance:

y_true = []
y_pred = []
with torch.no_grad():
    for test_data in test_loader:
        test_images, test_labels = (
            test_data[0],
            test_data[1],
        )
        pred = model(test_images).argmax(dim=1)
        for i in range(len(pred)):
            y_true.append(test_labels[i].item())
            y_pred.append(pred[i].item())

print(classification_report(
    y_true, y_pred, target_names=class_names, digits=4))
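The report lists precision, recall and F1 for each class. The stdlib sketch below spells out those per-class definitions on toy labels. It is not scikit-learn's code, and it sets undefined ratios (division by zero) to 0. The name per_class_report is hypothetical.

```python
def per_class_report(y_true, y_pred, class_names):
    """Precision, recall and F1 for each class from true and predicted labels."""
    report = {}
    for c, name in enumerate(class_names):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[name] = (precision, recall, f1)
    return report

y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 1, 1, 1, 2, 0]
toy_report = per_class_report(y_true_toy, y_pred_toy, ["A", "B", "C"])
for name, (p, r, f) in toy_report.items():
    print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```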

Despite the relatively limited training performed on the data shared by the 3 nodes, the federated model seems to perform pretty well. Well done!
