fedbiomed.common.data

Module: fedbiomed.common.data

Classes that simplify imports from fedbiomed.common.data

Classes

DataLoadingBlock

CLASS
DataLoadingBlock()

Bases: ABC

The building blocks of a DataLoadingPlan.

A DataLoadingBlock describes an intermediary layer between the researcher and the node's filesystem. It allows the node to specify a customization in the way data is "perceived" by the data loaders during training.

A DataLoadingBlock is identified by its type_id attribute. Thus, this attribute should be unique among all DataLoadingBlockTypes in the same DataLoadingPlan. Moreover, we may test equality between a DataLoadingBlock and a string by checking its type_id, as a means of easily testing whether a DataLoadingBlock is contained in a collection.

Correct usage of this class requires creating ad-hoc subclasses. The DataLoadingBlock class is not intended to be instantiated directly.

Subclasses of DataLoadingBlock must respect the following conditions (a sketch of a conforming subclass follows the list):

  1. implement a default constructor
  2. the implemented constructor must call super().__init__()
  3. extend the serialize(self) and the deserialize(self, load_from: dict) functions
  4. both serialize and deserialize must call super's serialize and deserialize respectively
  5. the deserialize function must always return self
  6. the serialize function must update the dict returned by super's serialize
  7. implement an apply function that takes arbitrary arguments and applies the logic of the loading_block
  8. update the _validation_scheme to define rules for all new fields returned by the serialize function
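Below is a minimal sketch of a conforming subclass. The PrefixBlock name, its prefix field, and the exact validation-scheme format are illustrative assumptions for this example, not part of the library:

from fedbiomed.common.data import DataLoadingBlock


class PrefixBlock(DataLoadingBlock):
    """Hypothetical loading block that prepends a fixed prefix to a value."""

    def __init__(self):
        super().__init__()  # condition 2: call super's constructor
        self.prefix = ''
        # condition 8: declare a rule for the new field (scheme format assumed)
        self._serialization_validator.update_validation_scheme(
            {'prefix': {'rules': [str], 'required': True}})

    def serialize(self) -> dict:
        ret = super().serialize()            # condition 4: call super's serialize
        ret.update({'prefix': self.prefix})  # condition 6: update super's dict
        return ret

    def deserialize(self, load_from: dict):
        super().deserialize(load_from)       # condition 4: call super's deserialize
        self.prefix = load_from['prefix']
        return self                          # condition 5: always return self

    def apply(self, value: str) -> str:      # condition 7: the loading block logic
        return self.prefix + value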

Attributes:

Name Type Description
__serialization_id

(str) identifies one serialized instance of the DataLoadingBlock

Source code in fedbiomed/common/data/_data_loading_plan.py
def __init__(self):
    self.__serialization_id = 'serialized_dlb_' + str(uuid.uuid4())
    self._serialization_validator = SerializationValidation()
    self._serialization_validator.update_validation_scheme(SerializationValidation.dlb_default_scheme())

Functions

apply(*args, **kwargs)
abstractmethod

Abstract method representing an application of the DataLoadingBlock

Source code in fedbiomed/common/data/_data_loading_plan.py
@abstractmethod
def apply(self, *args, **kwargs):
    """Abstract method representing an application of the DataLoadingBlock
    """
    pass
deserialize(load_from)

Reconstruct the DataLoadingBlock from a serialized version.

Parameters:

Name Type Description Default
load_from dict

a dictionary as obtained by the serialize function.

required

Returns:

Type Description
TDataLoadingBlock

the self instance

Source code in fedbiomed/common/data/_data_loading_plan.py
def deserialize(self, load_from: dict) -> TDataLoadingBlock:
    """Reconstruct the DataLoadingBlock from a serialized version.

    Args:
        load_from (dict): a dictionary as obtained by the serialize function.
    Returns:
        the self instance
    """
    self._serialization_validator.validate(load_from, FedbiomedLoadingBlockValueError)
    self.__serialization_id = load_from['dlb_id']
    return self
get_serialization_id()

Expose serialization id as read-only

Source code in fedbiomed/common/data/_data_loading_plan.py
def get_serialization_id(self):
    """Expose serialization id as read-only"""
    return self.__serialization_id
instantiate_class(loading_block)
staticmethod

Instantiate one DataLoadingBlock object of the type defined in the arguments.

Uses the loading_block_module and loading_block_class fields of the loading_block argument to identify the type of DataLoadingBlock to be instantiated, then calls its default constructor. Note that this function does not call deserialize.
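Reconstructing a block from its serialized metadata is therefore a two-step process. A short sketch, assuming dlb is an existing DataLoadingBlock instance:

meta = dlb.serialize()                               # metadata dict
new_dlb = DataLoadingBlock.instantiate_class(meta)   # default-constructed, fields not yet restored
new_dlb = new_dlb.deserialize(meta)                  # restores the serialized fields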

Parameters:

Name Type Description Default
loading_block dict

DataLoadingBlock metadata in the format returned by the serialize function.

required

Returns:

Type Description
TDataLoadingBlock

A default-constructed instance of a DataLoadingBlock of the type defined in the metadata.

Raises:

Type Description
FedbiomedLoadingBlockError

if the instantiation process raised any exception.

Source code in fedbiomed/common/data/_data_loading_plan.py
@staticmethod
def instantiate_class(loading_block: dict) -> TDataLoadingBlock:
    """Instantiate one [DataLoadingBlock][fedbiomed.common.data._data_loading_plan.DataLoadingBlock]
    object of the type defined in the arguments.

    Uses the `loading_block_module` and `loading_block_class` fields of the loading_block argument to
    identify the type of [DataLoadingBlock][fedbiomed.common.data._data_loading_plan.DataLoadingBlock]
    to be instantiated, then calls its default constructor.
    Note that this function **does not call deserialize**.

    Args:
        loading_block (dict): [DataLoadingBlock][fedbiomed.common.data._data_loading_plan.DataLoadingBlock]
            metadata in the format returned by the serialize function.
    Returns:
        A default-constructed instance of a
            [DataLoadingBlock][fedbiomed.common.data._data_loading_plan.DataLoadingBlock]
            of the type defined in the metadata.
    Raises:
       FedbiomedLoadingBlockError: if the instantiation process raised any exception.
    """
    try:
        dlb_module = import_module(loading_block['loading_block_module'])
        dlb = eval(f"dlb_module.{loading_block['loading_block_class']}()")
    except Exception as e:
        msg = f"{ErrorNumbers.FB614.value}: could not instantiate DataLoadingBlock from the following metadata: " +\
              f"{loading_block} because of {type(e).__name__}: {e}"
        logger.debug(msg)
        raise FedbiomedLoadingBlockError(msg)
    return dlb
instantiate_key(key_module, key_classname, loading_block_key_str)
staticmethod

Imports and loads a DataLoadingBlockTypes key based on the passed arguments

Parameters:

Name Type Description Default
key_module str

name of the module where the DataLoadingBlockTypes subclass is defined

required
key_classname str

name of the DataLoadingBlockTypes subclass to load

required
loading_block_key_str str

string value of the enum member to reconstruct

required

Raises:

Type Description
FedbiomedDataLoadingPlanError

if the key could not be imported and reconstructed from the given arguments

Returns:

Name Type Description
DataLoadingBlockTypes DataLoadingBlockTypes

the reconstructed loading block key

Source code in fedbiomed/common/data/_data_loading_plan.py
@staticmethod
def instantiate_key(key_module: str, key_classname: str, loading_block_key_str: str) -> DataLoadingBlockTypes:
    """Imports and loads [DataLoadingBlockTypes][fedbiomed.common.constants.DataLoadingBlockTypes]
    regarding the passed arguments

    Args:
        key_module (str): _description_
        key_classname (str): _description_
        loading_block_key_str (str): _description_

    Raises:
        FedbiomedDataLoadingPlanError: _description_

    Returns:
        DataLoadingBlockTypes: _description_
    """
    try:
        keys = import_module(key_module)
        loading_block_key = eval(f"keys.{key_classname}('{loading_block_key_str}')")
    except Exception as e:
        msg = f"{ErrorNumbers.FB615.value} Error deserializing loading block key " + \
              f"{loading_block_key_str} with path {key_module}.{key_classname} " + \
              f"because of {type(e).__name__}: {e}"
        logger.debug(msg)
        raise FedbiomedDataLoadingPlanError(msg)
    return loading_block_key
serialize()

Serializes the class in a format similar to json.

Returns:

Type Description
dict

a dictionary of key-value pairs sufficient for reconstructing the DataLoadingBlock.

Source code in fedbiomed/common/data/_data_loading_plan.py
def serialize(self) -> dict:
    """Serializes the class in a format similar to json.

    Returns:
        a dictionary of key-value pairs sufficient for reconstructing
        the DataLoadingBlock.
    """
    return dict(
        loading_block_class=self.__class__.__qualname__,
        loading_block_module=self.__module__,
        dlb_id=self.__serialization_id
    )

DataLoadingPlan

CLASS
DataLoadingPlan(*args, **kwargs)

Bases: Dict[DataLoadingBlockTypes, DataLoadingBlock]

Customizations to the way the data is loaded and presented for training.

A DataLoadingPlan is a dictionary of {name: DataLoadingBlock} pairs. Each DataLoadingBlock represents a customization to the way data is loaded and presented to the researcher. These customizations are defined by the node, but they operate on a Dataset class, which is defined by the library and instantiated by the researcher.

To exploit this functionality, a Dataset must be modified to accept the customizations provided by the DataLoadingPlan. To simplify this process, we provide the DataLoadingPlanMixin class below.

The DataLoadingPlan class should be instantiated directly; no subclassing is needed. A DataLoadingPlan is a dict and exposes the same interface as a dict.
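For illustration, a sketch of building and serializing a plan with a MapperBlock (the mapping values are illustrative, and the import path of MedicalFolderLoadingBlockTypes is an assumption):

from fedbiomed.common.data import DataLoadingPlan, MapperBlock
from fedbiomed.common.data import MedicalFolderLoadingBlockTypes  # import path assumed

dlb = MapperBlock()
dlb.map = {'T1': 'T1_weighted_images'}  # illustrative mapping
dlp = DataLoadingPlan({MedicalFolderLoadingBlockTypes.MODALITIES_TO_FOLDERS: dlb})
dlp.desc = 'Map modality names to folder names'
serialized_plan, serialized_blocks = dlp.serialize()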

Attributes:

Name Type Description
dlp_id

str representing a unique plan id (auto-generated)

desc

str representing an optional user-friendly short description

target_dataset_type

a DatasetTypes enum representing the type of dataset targeted by this DataLoadingPlan

Source code in fedbiomed/common/data/_data_loading_plan.py
def __init__(self, *args, **kwargs):
    super(DataLoadingPlan, self).__init__(*args, **kwargs)
    self.dlp_id = 'dlp_' + str(uuid.uuid4())
    self.desc = ""
    self.target_dataset_type = DatasetTypes.NONE
    self._serialization_validation = SerializationValidation()
    self._serialization_validation.update_validation_scheme(SerializationValidation.dlp_default_scheme())

Attributes

desc instance-attribute
desc = ''
dlp_id instance-attribute
dlp_id = 'dlp_' + str(uuid.uuid4())
target_dataset_type instance-attribute
target_dataset_type = DatasetTypes.NONE

Functions

deserialize(serialized_dlp, serialized_loading_blocks)

Reconstruct the DataLoadingPlan from a serialized version.

Warning: calling this function will clear the contained DataLoadingBlockTypes. This function may not be used to "update" nor to "append to" a DataLoadingPlan.

Parameters:

Name Type Description Default
serialized_dlp dict

a dictionary of data loading plan metadata, as obtained from the first output of the serialize function

required
serialized_loading_blocks List[dict]

a list of dictionaries of loading_block metadata, as obtained from the second output of the serialize function

required

Returns:

Type Description
TDataLoadingPlan

the self instance

Source code in fedbiomed/common/data/_data_loading_plan.py
def deserialize(self, serialized_dlp: dict, serialized_loading_blocks: List[dict]) -> TDataLoadingPlan:
    """Reconstruct the DataLoadingPlan][fedbiomed.common.data._data_loading_plan.DataLoadingPlan] from a serialized version.

    !!! warning "Calling this function will *clear* the contained [DataLoadingBlockTypes]."
        This function may not be used to "update" nor to "append to"
        a [DataLoadingPlan][fedbiomed.common.data._data_loading_plan.DataLoadingPlan].

    Args:
        serialized_dlp: a dictionary of data loading plan metadata, as obtained from the first output of the
            serialize function
        serialized_loading_blocks: a list of dictionaries of loading_block metadata, as obtained from the
            second output of the serialize function
    Returns:
        the self instance
    """
    self._serialization_validation.validate(serialized_dlp, FedbiomedDataLoadingPlanValueError)

    self.clear()
    self.dlp_id = serialized_dlp['dlp_id']
    self.desc = serialized_dlp['dlp_name']
    self.target_dataset_type = DatasetTypes(serialized_dlp['target_dataset_type'])
    for loading_block_key_str, dlb_id in serialized_dlp['loading_blocks'].items():
        key_module, key_classname = serialized_dlp['key_paths'][loading_block_key_str]
        loading_block_key = DataLoadingBlock.instantiate_key(key_module, key_classname, loading_block_key_str)
        loading_block = next(filter(lambda x: x['dlb_id'] == dlb_id,
                                    serialized_loading_blocks))
        dlb = DataLoadingBlock.instantiate_class(loading_block)
        self[loading_block_key] = dlb.deserialize(loading_block)
    return self
infer_dataset_type(dataset)
staticmethod

Infer the type of a given dataset.

This function provides the mapping between a dataset's class and the DatasetTypes enum. If the dataset exposes the correct interface (i.e. the get_dataset_type method) then it directly calls that, otherwise it tries to apply some heuristics to guess the type of dataset.

Parameters:

Name Type Description Default
dataset Any

the dataset whose type we want to infer.

required

Returns:

Type Description
DatasetTypes

a DatasetTypes enum element which identifies the type of the dataset.

Raises:

Type Description
FedbiomedDataLoadingPlanValueError

if the dataset does not have a get_dataset_type method and moreover the type could not be guessed.

Source code in fedbiomed/common/data/_data_loading_plan.py
@staticmethod
def infer_dataset_type(dataset: Any) -> DatasetTypes:
    """Infer the type of a given dataset.

    This function provides the mapping between a dataset's class and the DatasetTypes enum. If the dataset exposes
    the correct interface (i.e. the get_dataset_type method) then it directly calls that, otherwise it tries to
    apply some heuristics to guess the type of dataset.

    Args:
        dataset: the dataset whose type we want to infer.
    Returns:
        a DatasetTypes enum element which identifies the type of the dataset.
    Raises:
        FedbiomedDataLoadingPlanValueError: if the dataset does not have a `get_dataset_type` method and moreover
            the type could not be guessed.
    """
    if hasattr(dataset, 'get_dataset_type'):
        return dataset.get_dataset_type()
    elif dataset.__class__.__name__ == 'ImageFolder':
        # ImageFolder could be both an images type or mednist. Try to identify mednist with some heuristic.
        if hasattr(dataset, 'classes') and \
                all([x in dataset.classes for x in ['AbdomenCT', 'BreastMRI', 'CXR', 'ChestCT', 'Hand', 'HeadCT']]):
            return DatasetTypes.MEDNIST
        else:
            return DatasetTypes.IMAGES
    elif dataset.__class__.__name__ == 'MNIST':
        return DatasetTypes.DEFAULT
    msg = f"{ErrorNumbers.FB615.value} Trying to infer dataset type of {dataset} is not supported " + \
        f"for datasets of type {dataset.__class__.__qualname__}"
    logger.debug(msg)
    raise FedbiomedDataLoadingPlanValueError(msg)
serialize()

Serializes the class in a format similar to json.

Returns:

Type Description
Tuple[dict, List]

a tuple sufficient for reconstructing the DataLoadingPlan. It includes:

  - a dictionary of key-value pairs with the DataLoadingPlan parameters
  - a list of dicts containing the data for reconstructing all the DataLoadingBlocks of the DataLoadingPlan

Source code in fedbiomed/common/data/_data_loading_plan.py
def serialize(self) -> Tuple[dict, List]:
    """Serializes the class in a format similar to json.

    Returns:
        a tuple sufficient for reconstructing the DataLoadingPlan. It includes:
            - a dictionary of key-value pairs with the
            [DataLoadingPlan][fedbiomed.common.data._data_loading_plan.DataLoadingPlan] parameters.
            - a list of dicts containing the data for reconstructing all the DataLoadingBlocks
                of the [DataLoadingPlan][fedbiomed.common.data._data_loading_plan.DataLoadingPlan]
    """
    return dict(
        dlp_id=self.dlp_id,
        dlp_name=self.desc,
        target_dataset_type=self.target_dataset_type.value,
        loading_blocks={key.value: dlb.get_serialization_id() for key, dlb in self.items()},
        key_paths={key.value: (f"{key.__module__}", f"{key.__class__.__qualname__}") for key in self.keys()}
    ), [dlb.serialize() for dlb in self.values()]

DataLoadingPlanMixin

CLASS
DataLoadingPlanMixin()

Utility class to enable DLP functionality in a dataset.

Any Dataset class that inherits from DataLoadingPlanMixin will have the basic tools necessary to support a DataLoadingPlan. Typically, the logic of each specific DataLoadingBlock in the DataLoadingPlan will be implemented in the form of hooks that are called within the Dataset's implementation using the helper function apply_dlb defined below.

Source code in fedbiomed/common/data/_data_loading_plan.py
def __init__(self):
    self._dlp = None

Functions

apply_dlb(default_ret_value, dlb_key, *args, **kwargs)

Apply one DataLoadingBlock identified by its key.

Note that we want to easily support the case where the DataLoadingPlan is not activated, or the requested loading block is not contained in the DataLoadingPlan. This is achieved by providing a default return value to be returned when the above conditions are met. Hence, most of the calls to apply_dlb will look like this:

value = self.apply_dlb(value, 'my-loading-block', my_apply_args)

This will ensure that value is not changed if the DataLoadingPlan is not active. (Note that in practice dlb_key must be a DataLoadingBlockTypes member rather than a plain string, otherwise apply_dlb raises an error.)
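For illustration, a minimal dataset exposing a hook through the mixin might look like the sketch below (MyLoadingBlockTypes, TRANSFORM_SAMPLE and MyDataset are hypothetical names):

from enum import Enum

from fedbiomed.common.constants import DataLoadingBlockTypes
from fedbiomed.common.data import DataLoadingPlanMixin


class MyLoadingBlockTypes(DataLoadingBlockTypes, Enum):
    TRANSFORM_SAMPLE: str = 'transform_sample'


class MyDataset(DataLoadingPlanMixin):
    def __init__(self, samples):
        super().__init__()  # initializes self._dlp to None
        self._samples = samples

    def __getitem__(self, idx):
        value = self._samples[idx]
        # returns `value` unchanged when no plan is set or the block is absent
        return self.apply_dlb(value, MyLoadingBlockTypes.TRANSFORM_SAMPLE, value)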

Parameters:

Name Type Description Default
default_ret_value Any

the value to be returned in case that the dlp functionality is not required

required
dlb_key DataLoadingBlockTypes

the key of the DataLoadingBlock to be applied

required
*args Optional[Any]

forwarded to the DataLoadingBlock's apply function

()
**kwargs Optional[Any]

forwarded to the DataLoadingBlock's apply function

{}

Returns:

Type Description
Any

the output of the DataLoadingBlock's apply function, or the default_ret_value when dlp is None or it does not contain the requested loading block

Source code in fedbiomed/common/data/_data_loading_plan.py
def apply_dlb(self, default_ret_value: Any, dlb_key: DataLoadingBlockTypes,
              *args: Optional[Any], **kwargs: Optional[Any]) -> Any:
    """Apply one DataLoadingBlock identified by its key.

    Note that we want to easily support the case where the DataLoadingPlan
    is not activated, or the requested loading block is not contained in the
    DataLoadingPlan. This is achieved by providing a default return value
    to be returned when the above conditions are met. Hence, most of the
    calls to apply_dlb will look like this:
    ```
    value = self.apply_dlb(value, 'my-loading-block', my_apply_args)
    ```
    This will ensure that value is not changed if the DataLoadingPlan is
    not active.

    Args:
        default_ret_value: the value to be returned in case that the dlp
            functionality is not required
        dlb_key: the key of the DataLoadingBlock to be applied
        *args: forwarded to the DataLoadingBlock's apply function
        **kwargs: forwarded to the DataLoadingBlock's apply function
    Returns:
        the output of the DataLoadingBlock's apply function, or
            the default_ret_value when dlp is None or it does not contain
            the requested loading block
    """
    if not isinstance(dlb_key, DataLoadingBlockTypes):
        raise FedbiomedDataLoadingPlanValueError(f"Key {dlb_key} is not of enum type DataLoadingBlockTypes"
                                                 f" in DataLoadingPlanMixin.apply_dlb")
    if self._dlp is not None and dlb_key in self._dlp:
        return self._dlp[dlb_key].apply(*args, **kwargs)
    else:
        return default_ret_value
clear_dlp()
Source code in fedbiomed/common/data/_data_loading_plan.py
def clear_dlp(self):
    self._dlp = None
set_dlp(dlp)

Sets the dlp if the target dataset type is appropriate

Source code in fedbiomed/common/data/_data_loading_plan.py
def set_dlp(self, dlp: DataLoadingPlan):
    """Sets the dlp if the target dataset type is appropriate"""
    if not isinstance(dlp, DataLoadingPlan):
        msg = f"{ErrorNumbers.FB615.value} Trying to set a DataLoadingPlan but the argument is of type " + \
              f"{type(dlp).__name__}"
        logger.debug(msg)
        raise FedbiomedDataLoadingPlanValueError(msg)

    dataset_type = DataLoadingPlan.infer_dataset_type(self)  # `self` here will refer to the Dataset instance
    if dlp.target_dataset_type != DatasetTypes.NONE and dataset_type != dlp.target_dataset_type:
        raise FedbiomedDataLoadingPlanValueError(f"Trying to set {dlp} on dataset of type {dataset_type.value} but "
                                                 f"the target type is {dlp.target_dataset_type}")
    elif dlp.target_dataset_type == DatasetTypes.NONE:
        dlp.target_dataset_type = dataset_type
    self._dlp = dlp

DataManager

CLASS
DataManager(dataset, target=None, **kwargs)

Bases: object

Factory class that builds different data loaders/datasets based on the type of the dataset. For PyTorch training, the dataset argument should be provided as a torch.utils.data.Dataset object.
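A sketch of typical usage with array-like data; the TrainingPlans import path is an assumption, and batch_size stands for any loader argument:

import numpy as np

from fedbiomed.common.constants import TrainingPlans  # import path assumed
from fedbiomed.common.data import DataManager

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)
dm = DataManager(dataset=X, target=y, batch_size=10)  # extra kwargs go to the data loader
dm.load(TrainingPlans.TorchTrainingPlan)              # wraps the arrays in a TabularDataset internally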

Parameters:

Name Type Description Default
dataset Union[np.ndarray, pd.DataFrame, pd.Series, Dataset]

Dataset object. It can be a np.ndarray, pd.DataFrame, pd.Series, or PyTorch Dataset instance.

required
target Union[np.ndarray, pd.DataFrame, pd.Series]

Target variable or variables.

None
**kwargs dict

Additional parameters that will be passed to the data loader

{}
Source code in fedbiomed/common/data/_data_manager.py
def __init__(self,
             dataset: Union[np.ndarray, pd.DataFrame, pd.Series, Dataset],
             target: Union[np.ndarray, pd.DataFrame, pd.Series] = None,
             **kwargs: dict) -> None:

    """Constructor of DataManager,

    Args:
        dataset: Dataset object. It can be a np.ndarray, pd.DataFrame, pd.Series, or PyTorch Dataset instance.
        target: Target variable or variables.
        **kwargs: Additional parameters that will be passed to the data loader
    """

    # TODO: Improve datamanager for auto loading by given dataset_path and other information
    # such as inputs variable indexes and target variables indexes

    self._dataset = dataset
    self._target = target
    self._loader_arguments = kwargs
    self._data_manager_instance = None

Functions

load(tp_type)

Loads the proper DataManager based on the given training plan type and the dataset and target attributes.

Parameters:

Name Type Description Default
tp_type TrainingPlans

Enumeration instance of TrainingPlans that stands for type of training plan.

required

Raises:

Type Description
FedbiomedDataManagerError

If requested DataManager does not match with given arguments.

Source code in fedbiomed/common/data/_data_manager.py
def load(self, tp_type: TrainingPlans):
    """Loads proper DataManager based on given TrainingPlan and
    `dataset`, `target` attributes.

    Args:
        tp_type: Enumeration instance of TrainingPlans that stands for type of training plan.

    Raises:
        FedbiomedDataManagerError: If requested DataManager does not match with given arguments.

    """

    # Training plan is of type TorchTrainingPlan
    if tp_type == TrainingPlans.TorchTrainingPlan:
        if self._target is None and isinstance(self._dataset, Dataset):
            # Create Dataset for pytorch
            self._data_manager_instance = TorchDataManager(dataset=self._dataset, **self._loader_arguments)
        elif isinstance(self._dataset, (pd.DataFrame, pd.Series, np.ndarray)) and \
                isinstance(self._target, (pd.DataFrame, pd.Series, np.ndarray)):
            # If `dataset` and `target` attributes are array-like object
            # create TabularDataset object to instantiate a TorchDataManager
            torch_dataset = TabularDataset(inputs=self._dataset, target=self._target)
            self._data_manager_instance = TorchDataManager(dataset=torch_dataset, **self._loader_arguments)
        else:
            raise FedbiomedDataManagerError(f"{ErrorNumbers.FB607.value}: Invalid arguments for torch based "
                                            f"training plan, either provide the argument `dataset` as a PyTorch "
                                            f"Dataset instance, or provide `dataset` and `target` arguments as "
                                            f"instances of one of pd.DataFrame, pd.Series or np.ndarray ")

    elif tp_type == TrainingPlans.SkLearnTrainingPlan:
        # Try to convert `torch.utils.Data.Dataset` to SkLearnBased dataset/datamanager
        if self._target is None and isinstance(self._dataset, Dataset):
            torch_data_manager = TorchDataManager(dataset=self._dataset)
            try:
                self._data_manager_instance = torch_data_manager.to_sklearn()
            except Exception as e:
                raise FedbiomedDataManagerError(f"{ErrorNumbers.FB607.value}: PyTorch based `Dataset` object "
                                                "has been instantiated with DataManager. An error occurred while "
                                                "trying to convert torch.utils.data.Dataset to numpy based "
                                                f"dataset: {str(e)}")

        # For scikit-learn based training plans, the arguments `dataset` and `target` should be an instance
        # one of `pd.DataFrame`, `pd.Series`, `np.ndarray`
        elif isinstance(self._dataset, (pd.DataFrame, pd.Series, np.ndarray)) and \
                isinstance(self._target, (pd.DataFrame, pd.Series, np.ndarray)):
            # Create Dataset for SkLearn training plans
            self._data_manager_instance = SkLearnDataManager(inputs=self._dataset, target=self._target,
                                                             **self._loader_arguments)
        else:
            raise FedbiomedDataManagerError(f"{ErrorNumbers.FB607.value}: The arguments `dataset` and `target` "
                                            f"should be instances of pd.DataFrame, pd.Series or np.ndarray ")
    else:
        raise FedbiomedDataManagerError(f"{ErrorNumbers.FB607.value}: Undefined training plan")

FlambyDataset

CLASS
FlambyDataset()

Bases: DataLoadingPlanMixin, Dataset

A federated Flamby dataset.

A FlambyDataset is a wrapper around a flamby FedClass instance, adding functionalities and interfaces that are specific to Fed-BioMed.

A FlambyDataset is always created in an empty state, and it requires a DataLoadingPlan to be finalized to a correct state. The DataLoadingPlan must contain at least the following DataLoadingBlock key-value pair:

  - FlambyLoadingBlockTypes.FLAMBY_DATASET_METADATA : FlambyDatasetMetadataBlock

The lifecycle of the DataLoadingPlan and the wrapped FedClass are tightly interlinked: when the DataLoadingPlan is set, the wrapped FedClass is initialized and instantiated. When the DataLoadingPlan is cleared, the wrapped FedClass is also cleared. Hence, an invariant of this class is that the self._dlp and self.__flamby_fed_class should always be either both None, or both set to some value.
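A sketch of bringing a FlambyDataset to a usable state (the fed_ixi dataset name and center id 0 are illustrative, and flamby must be installed for the wrapped FedClass to be initialized):

from fedbiomed.common.data import (DataLoadingPlan, FlambyDataset,
                                   FlambyDatasetMetadataBlock, FlambyLoadingBlockTypes)

metadata_block = FlambyDatasetMetadataBlock()
metadata_block.metadata = {'flamby_dataset_name': 'fed_ixi',  # illustrative
                           'flamby_center_id': 0}             # illustrative
dlp = DataLoadingPlan({FlambyLoadingBlockTypes.FLAMBY_DATASET_METADATA: metadata_block})

dataset = FlambyDataset()
dataset.set_dlp(dlp)  # initializes the wrapped FedClass from the metadata
center_id = dataset.get_center_id()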

Attributes:

Name Type Description
_transform

a transform function of type MonaiTransform or TorchTransform that will be applied to every sample when data is loaded.

__flamby_fed_class

a private instance of the wrapped Flamby FedClass

Source code in fedbiomed/common/data/_flamby_dataset.py
def __init__(self):
    super().__init__()
    self.__flamby_fed_class = None
    self._transform = None

Functions

clear_dlp()

Clears dlp and automatically clears the FedClass

Tries to guarantee some semblance of integrity by also clearing the FedClass, since setting the dlp initializes it.

Source code in fedbiomed/common/data/_flamby_dataset.py
def clear_dlp(self):
    """Clears dlp and automatically clears the FedClass

    Tries to guarantee some semblance of integrity by also clearing the FedClass, since setting the dlp
    initializes it.
    """
    super().clear_dlp()
    self._clear()
get_center_id()

Returns the center id. Requires that the DataLoadingPlan has already been set.

Returns:

Type Description
int

the center id (int).

Raises:

Type Description
FedbiomedDatasetError

in one of the two scenarios below:

  - if the data loading plan is not set or is malformed
  - if the wrapped FedClass is not initialized but the dlp exists

Source code in fedbiomed/common/data/_flamby_dataset.py
@_check_fed_class_initialization_status(require_initialized=True,
                                        require_uninitialized=False,
                                        message="Flamby dataset is in an inconsistent state: a Data Loading Plan "
                                                "is set but the wrapped FedClass was not initialized.")
@_requires_dlp
def get_center_id(self) -> int:
    """Returns the center id. Requires that the DataLoadingPlan has already been set.

    Returns:
        the center id (int).
    Raises:
        FedbiomedDatasetError: in one of the two scenarios below
            - if the data loading plan is not set or is malformed.
            - if the wrapped FedClass is not initialized but the dlp exists
    """
    return self.apply_dlb(None, FlambyLoadingBlockTypes.FLAMBY_DATASET_METADATA)['flamby_center_id']
get_dataset_type()
staticmethod

Returns the Flamby DatasetType

Source code in fedbiomed/common/data/_flamby_dataset.py
@staticmethod
def get_dataset_type() -> DatasetTypes:
    """Returns the Flamby DatasetType"""
    return DatasetTypes.FLAMBY
get_flamby_fed_class()

Returns the instance of the wrapped Flamby FedClass

Source code in fedbiomed/common/data/_flamby_dataset.py
def get_flamby_fed_class(self):
    """Returns the instance of the wrapped Flamby FedClass"""
    return self.__flamby_fed_class
get_transform()

Gets the transform attribute

Source code in fedbiomed/common/data/_flamby_dataset.py
def get_transform(self):
    """Gets the transform attribute"""
    return self._transform
init_transform(transform)

Initializes the transform attribute. Must be called before initialization of the wrapped FedClass.

Parameters:

Name Type Description Default
transform Union[MonaiCompose, TorchCompose]

a composed transform of type torchvision.transforms.Compose or monai.transforms.Compose

required

Raises:

Type Description
FedbiomedDatasetError

if the wrapped FedClass was already initialized.

FedbiomedDatasetValueError

if the input is not of the correct type.

Source code in fedbiomed/common/data/_flamby_dataset.py
@_check_fed_class_initialization_status(require_initialized=False,
                                        require_uninitialized=True,
                                        message="Calling init_transform is not allowed if the wrapped FedClass "
                                                "has already been initialized. At your own risk, you may call "
                                                "clear_dlp to reset the full FlambyDataset")
def init_transform(self, transform: Union[MonaiCompose, TorchCompose]) -> Union[MonaiCompose, TorchCompose]:
    """Initializes the transform attribute. Must be called before initialization of the wrapped FedClass.

    Arguments:
        transform: a composed transform of type torchvision.transforms.Compose or monai.transforms.Compose

    Raises:
        FedbiomedDatasetError: if the wrapped FedClass was already initialized.
        FedbiomedDatasetValueError: if the input is not of the correct type.
    """
    if not isinstance(transform, (MonaiCompose, TorchCompose)):
        msg = f"{ErrorNumbers.FB618.value}. FlambyDataset transform must be of type " \
              f"torchvision.transforms.Compose or monai.transforms.Compose"
        logger.critical(msg)
        raise FedbiomedDatasetValueError(msg)

    self._transform = transform
    return self._transform
set_dlp(dlp)

Sets the Data Loading Plan and ensures that the flamby_fed_class is initialized

Overrides the set_dlp function from the DataLoadingPlanMixin to make sure that self._init_flamby_fed_class is also called immediately after.

Source code in fedbiomed/common/data/_flamby_dataset.py
def set_dlp(self, dlp):
    """Sets the Data Loading Plan and ensures that the flamby_fed_class is initialized

    Overrides the set_dlp function from the DataLoadingPlanMixin to make sure that self._init_flamby_fed_class
    is also called immediately after.
    """
    super().set_dlp(dlp)
    try:
        self._init_flamby_fed_class()
    except FedbiomedDatasetError as e:
        # clean up
        super().clear_dlp()
        raise FedbiomedDatasetError from e
shape()

Returns the shape of the flamby_fed_class

Source code in fedbiomed/common/data/_flamby_dataset.py
@_check_fed_class_initialization_status(require_initialized=True,
                                        require_uninitialized=False,
                                        message="Cannot compute shape because FedClass was not initialized.")
def shape(self) -> List[int]:
    """Returns the shape of the flamby_fed_class"""
    return [len(self)] + list(self.__getitem__(0)[0].shape)

FlambyDatasetMetadataBlock

CLASS
FlambyDatasetMetadataBlock()

Bases: DataLoadingBlock

Metadata about a Flamby Dataset.

Includes information on:

  - the identity of the type of flamby dataset (e.g. fed_ixi, fed_heart, etc.)
  - the ID of the center of the flamby dataset

Source code in fedbiomed/common/data/_flamby_dataset.py
def __init__(self):
    super().__init__()
    self.metadata = {
        "flamby_dataset_name": None,
        "flamby_center_id": None
    }
    self._serialization_validator.update_validation_scheme(
        FlambyDatasetMetadataBlock._extra_validation_scheme())

Attributes

metadata instance-attribute
metadata = {
    "flamby_dataset_name": None,
    "flamby_center_id": None,
}

Functions

apply()

Returns a dictionary of dataset metadata.

The metadata dictionary contains:

  - flamby_dataset_name: (str) the name of the selected flamby dataset.
  - flamby_center_id: (int) the center id selected at dataset add time.

Note that the flamby_dataset_name will be the same as the module name required to instantiate the FedClass. However, it will not contain the full module path, hence to properly import this module it must be prepended with flamby.datasets, for example import flamby.datasets.flamby_dataset_name
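For example, a consumer of this metadata might perform the import as in this sketch (block stands for an already populated FlambyDatasetMetadataBlock):

from importlib import import_module

metadata = block.apply()
module = import_module(f"flamby.datasets.{metadata['flamby_dataset_name']}")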

Returns:

Type Description
dict

this data loading block's metadata

Source code in fedbiomed/common/data/_flamby_dataset.py
def apply(self) -> dict:
    """Returns a dictionary of dataset metadata.

    The metadata dictionary contains:
    - flamby_dataset_name: (str) the name of the selected flamby dataset.
    - flamby_center_id: (int) the center id selected at dataset add time.

    Note that the flamby_dataset_name will be the same as the module name required to instantiate the FedClass.
    However, it will not contain the full module path, hence to properly import this module it must be
    prepended with `flamby.datasets`, for example `import flamby.datasets.flamby_dataset_name`

    Returns:
        this data loading block's metadata
    """
    if any([v is None for v in self.metadata.values()]):
        msg = f"{ErrorNumbers.FB316}. Attempting to read Flamby dataset metadata, but " \
              f"the {[k for k,v in self.metadata.items() if v is None]} keys were not previously set."
        logger.critical(msg)
        raise FedbiomedLoadingBlockError(msg)
    return self.metadata
deserialize(load_from)

Reconstruct the DataLoadingBlock from a serialized version.

Parameters:

Name Type Description Default
load_from dict

a dictionary as obtained by the serialize function.

required

Returns:

Type Description
DataLoadingBlock

the self instance

Source code in fedbiomed/common/data/_flamby_dataset.py
def deserialize(self, load_from: dict) -> DataLoadingBlock:
    """Reconstruct the DataLoadingBlock from a serialized version.

    Args:
        load_from: a dictionary as obtained by the serialize function.
    Returns:
        the self instance
    """
    super().deserialize(load_from)
    self.metadata['flamby_dataset_name'] = load_from['flamby_dataset_name']
    self.metadata['flamby_center_id'] = load_from['flamby_center_id']
    return self
serialize()

Serializes the class in a format similar to json.

Returns:

Type Description
dict

a dictionary of key-value pairs sufficient for reconstructing the DataLoadingBlock.

Source code in fedbiomed/common/data/_flamby_dataset.py
def serialize(self) -> dict:
    """Serializes the class in a format similar to json.

    Returns:
         a dictionary of key-value pairs sufficient for reconstructing
         the DataLoadingBlock.
    """
    ret = super().serialize()
    ret.update({'flamby_dataset_name': self.metadata['flamby_dataset_name'],
                'flamby_center_id': self.metadata['flamby_center_id']
                })
    return ret

FlambyLoadingBlockTypes

Bases: DataLoadingBlockTypes, Enum

Additional DataLoadingBlockTypes specific to Flamby data

Attributes

FLAMBY_DATASET_METADATA class-attribute
FLAMBY_DATASET_METADATA: str = 'flamby_dataset_metadata'

MapperBlock

CLASS
MapperBlock()

Bases: DataLoadingBlock

A DataLoadingBlock for mapping values.

This DataLoadingBlock can be used whenever an "indirect mapping" is needed. For example, it can be used to implement a correspondence between a set of "logical" abstract names and a set of folder names on the filesystem.

The apply function of this DataLoadingBlock takes a "key" as input (a str) and returns the mapped value corresponding to map[key]. Note that while the constructor of this class sets a value for type_id, it is recommended that developers set a more meaningful value that better speaks to their application.

Multiple instances of this loading_block may be used in the same DataLoadingPlan, provided that they are given different type_id via the constructor.
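A short usage sketch (the mapping values are illustrative):

from fedbiomed.common.data import MapperBlock

dlb = MapperBlock()
dlb.map = {'T1': 'T1_weighted_images', 'label': 'segmentations'}
dlb.apply('T1')     # returns 'T1_weighted_images'
dlb.apply('T2')     # raises FedbiomedLoadingBlockError: key not in the mapping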

Source code in fedbiomed/common/data/_data_loading_plan.py
def __init__(self):
    super(MapperBlock, self).__init__()
    self.map = {}
    self._serialization_validator.update_validation_scheme(MapperBlock._extra_validation_scheme())

Attributes

map instance-attribute
map = {}

Functions

apply(key)

Returns the value mapped to the key, if it exists.

Raises:

Type Description
FedbiomedLoadingBlockError

if map is not a dict or the key does not exist.

Source code in fedbiomed/common/data/_data_loading_plan.py
def apply(self, key):
    """Returns the value mapped to the key, if it exists.

    Raises:
        FedbiomedLoadingBlockError: if map is not a dict or the key does not exist.
    """
    if not isinstance(self.map, dict) or key not in self.map:
        msg = f"{ErrorNumbers.FB614.value} Mapper block error: no key '{key}' in mapping dictionary"
        logger.debug(msg)
        raise FedbiomedLoadingBlockError(msg)
    return self.map[key]
deserialize(load_from)

Reconstruct the DataLoadingBlock from a serialized version.

Parameters:

Name Type Description Default
load_from dict

a dictionary as obtained by the serialize function.

required

Returns:

Type Description
DataLoadingBlock

the self instance

Source code in fedbiomed/common/data/_data_loading_plan.py
def deserialize(self, load_from: dict) -> DataLoadingBlock:
    """Reconstruct the [DataLoadingBlock][fedbiomed.common.data._data_loading_plan.DataLoadingBlock]
    from a serialized version.

    Args:
        load_from (dict): a dictionary as obtained by the serialize function.
    Returns:
        the self instance
    """
    super(MapperBlock, self).deserialize(load_from)
    self.map = load_from['map']
    return self
serialize()

Serializes the class in a format similar to json.

Returns:

Type Description
dict

a dictionary of key-value pairs sufficient for reconstructing the DataLoadingBlock.

Source code in fedbiomed/common/data/_data_loading_plan.py
def serialize(self) -> dict:
    """Serializes the class in a format similar to json.

    Returns:
        a dictionary of key-value pairs sufficient for reconstructing
        the [DataLoadingBlock][fedbiomed.common.data._data_loading_plan.DataLoadingBlock].
    """
    ret = super(MapperBlock, self).serialize()
    ret.update({'map': self.map})
    return ret

MedicalFolderBase

CLASS
MedicalFolderBase(root=None)

Bases: DataLoadingPlanMixin

Controller class for Medical Folder dataset.

Contains methods to validate the MedicalFolder folder hierarchy and extract folder-based metadata such as the available modalities, the number of subjects, etc.
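A sketch of typical usage (the root path is hypothetical):

from fedbiomed.common.data import MedicalFolderBase

base = MedicalFolderBase(root='/path/to/medical/folder')  # validates the folder structure
modalities, modality_folders = base.modalities()
subjects = base.subjects_with_imaging_data_folders()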

Parameters:

Name Type Description Default
root Union[str, Path, None]

path to Medical Folder root folder.

None
Source code in fedbiomed/common/data/_medical_datasets.py
def __init__(self, root: Union[str, Path, None] = None):
    """Constructs MedicalFolderBase

    Args:
        root: path to Medical Folder root folder.
    """
    super(MedicalFolderBase, self).__init__()

    if root is not None:
        root = self.validate_MedicalFolder_root_folder(root)

    self._root = root

Attributes

default_modality_names class-attribute
default_modality_names = ['T1', 'T2', 'label']

Functions

available_subjects(subjects_from_index, subjects_from_folder=None)

Checks missing subject folders and missing entries in demographics

Parameters:

Name Type Description Default
subjects_from_index Union[list, pd.Series]

Given subject folder names in demographics

required
subjects_from_folder list

List of subject folder names to intersect with subjects_from_index

None

Returns:

Name Type Description
available_subjects list[str]

subjects that have an imaging data folder and are also present in the demographics file

missing_subject_folders list[str]

subjects that are in the demographics file but do not have an imaging data folder

missing_entries list[str]

subjects that have an imaging data folder but are not present in the demographics file

Source code in fedbiomed/common/data/_medical_datasets.py
def available_subjects(self,
                       subjects_from_index: Union[list, pd.Series],
                       subjects_from_folder: list = None) -> tuple[list[str], list[str], list[str]]:
    """Checks missing subject folders and missing entries in demographics

    Args:
        subjects_from_index: Given subject folder names in demographics
        subjects_from_folder: List of subject folder names to get intersection of given subject_from_index

    Returns:
        available_subjects: subjects that have an imaging data folder and are also present in the demographics file
        missing_subject_folders: subjects that are in the demographics file but do not have an imaging data folder
        missing_entries: subjects that have an imaging data folder but are not present in the demographics file
    """

    # Select all subject folders if it is not given
    if subjects_from_folder is None:
        subjects_from_folder = self.subjects_with_imaging_data_folders()

    # Missing subject that will cause warnings
    missing_subject_folders = list(set(subjects_from_index) - set(subjects_from_folder))

    # Missing entries that will cause errors
    missing_entries = list(set(subjects_from_folder) - set(subjects_from_index))

    # Intersection
    available_subjects = list(set(subjects_from_index).intersection(set(subjects_from_folder)))

    return available_subjects, missing_subject_folders, missing_entries
complete_subjects(subjects, modalities)

Retrieves subjects that have all the given modalities.

Parameters:

Name Type Description Default
subjects List[str]

List of subject folder names

required
modalities List[str]

List of required modalities

required

Returns:

Type Description
List[str]

List of subject folder names that have required modalities

Source code in fedbiomed/common/data/_medical_datasets.py
def complete_subjects(self, subjects: List[str], modalities: List[str]) -> List[str]:
    """Retrieves subjects that have given all the modalities.

    Args:
        subjects: List of subject folder names
        modalities: List of required modalities

    Returns:
        List of subject folder names that have required modalities
    """
    return [subject for subject in subjects if all(self.is_modalities_existing(subject, modalities))]
demographics_column_names(path)
staticmethod
Source code in fedbiomed/common/data/_medical_datasets.py
@staticmethod
def demographics_column_names(path: Union[str, Path]):
    return MedicalFolderBase.read_demographics(path).columns.values
get_dataset_type()
staticmethod
Source code in fedbiomed/common/data/_medical_datasets.py
@staticmethod
def get_dataset_type() -> DatasetTypes:
    return DatasetTypes.MEDICAL_FOLDER
is_modalities_existing(subject, modalities)

Checks whether the given modalities exist in the subject directory

Parameters:

Name Type Description Default
subject str

Subject ID or subject folder name

required
modalities List[str]

List of modalities to check

required

Returns:

Type Description
List[bool]

List of bool indicating, for each modality, whether it exists in the subject directory.

Raises:

Type Description
FedbiomedDatasetError

bad argument type

Source code in fedbiomed/common/data/_medical_datasets.py
def is_modalities_existing(self, subject: str, modalities: List[str]) -> List[bool]:
    """Checks whether given modalities exists in the subject directory

    Args:
        subject: Subject ID or subject folder name
        modalities: List of modalities to check

    Returns:
        List of `bool` that represents whether modality is existing respectively for each of modality.

    Raises:
        FedbiomedDatasetError: bad argument type
    """
    if not isinstance(subject, str):
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Expected string for subject folder/ID, "
                                    f"but got {type(subject)}")
    if not isinstance(modalities, list):
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Expected a list for modalities, "
                                    f"but got {type(modalities)}")
    if not all([type(m) is str for m in modalities]):
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Expected a list of string for modalities, "
                                    f"but some modalities are "
                                    f"{' '.join([str(type(m)) for m in modalities if type(m) != str])}")
    are_modalities_existing = list()
    for modality in modalities:
        modality_folder = self._subject_modality_folder(subject, modality)
        are_modalities_existing.append(bool(modality_folder) and
                                       self._root.joinpath(subject, modality_folder).is_dir())
    return are_modalities_existing
modalities()

Gets all modalities based either on all possible candidates or those provided by the DataLoadingPlan.

Returns:

Type Description
list

List of unique available modalities

list

List of all encountered modality folders in each subject folder, appearing once per folder

Source code in fedbiomed/common/data/_medical_datasets.py
def modalities(self) -> Tuple[list, list]:
    """Gets all modalities based either on all possible candidates or those provided by the DataLoadingPlan.

    Returns:
         List of unique available modalities
         List of all encountered modality folders in each subject folder, appearing once per folder
    """
    modality_candidates, modality_folders_list = self.modalities_candidates_from_subfolders()
    if self._dlp is not None and MedicalFolderLoadingBlockTypes.MODALITIES_TO_FOLDERS in self._dlp:
        modalities = list(self._dlp[MedicalFolderLoadingBlockTypes.MODALITIES_TO_FOLDERS].map.keys())
        return modalities, modality_folders_list
    else:
        return modality_candidates, modality_folders_list
modalities_candidates_from_subfolders()

Gets all possible modality folders under root directory

Returns:

Type Description
list

List of unique available modality folders appearing at least once

list

List of all encountered modality folders in each subject folder, appearing once per folder

Source code in fedbiomed/common/data/_medical_datasets.py
def modalities_candidates_from_subfolders(self) -> Tuple[list, list]:
    """ Gets all possible modality folders under root directory

    Returns:
         List of unique available modality folders appearing at least once
         List of all encountered modality folders in each subject folder, appearing once per folder
    """

    # Accept only folders that don't start with "." and "_"
    modalities = [f.name for f in self._root.glob("*/*") if f.is_dir() and not f.name.startswith((".", "_"))]
    return sorted(list(set(modalities))), modalities
read_demographics(path, index_col=None)
staticmethod

Read demographics tabular file for Medical Folder dataset

Raises:

Type Description
FedbiomedDatasetError

bad file format

Source code in fedbiomed/common/data/_medical_datasets.py
@staticmethod
def read_demographics(path: Union[str, Path], index_col: Optional[int] = None):
    """ Read demographics tabular file for Medical Folder dataset

    Raises:
        FedbiomedDatasetError: bad file format
    """
    path = Path(path)
    if not path.is_file() or path.suffix.lower() not in [".csv", ".tsv"]:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Demographics should be CSV or TSV files")

    return pd.read_csv(path, index_col=index_col, engine='python')
root()
writable property

Root property of MedicalFolderController

Source code in fedbiomed/common/data/_medical_datasets.py
@property
def root(self):
    """Root property of MedicalFolderController"""
    return self._root
subjects_with_imaging_data_folders()

Retrieves subject folder names under Medical Folder root directory.

Returns:

Type Description
List[str]

subject folder names under Medical Folder root directory.

Source code in fedbiomed/common/data/_medical_datasets.py
def subjects_with_imaging_data_folders(self) -> List[str]:
    """Retrieves subject folder names under Medical Folder root directory.

    Returns:
        subject folder names under Medical Folder root directory.
    """
    return [f.name for f in self._root.iterdir() if f.is_dir() and not f.name.startswith(".")]
validate_MedicalFolder_root_folder(path)
staticmethod

Validates Medical Folder root directory by checking folder structure

Parameters:

Name Type Description Default
path Union[str, Path]

path to root directory

required

Returns:

Type Description
Path

Path to root folder of Medical Folder dataset

Raises:

Type Description
FedbiomedDatasetError

  - If path is not an instance of str or pathlib.Path
  - If path is not a directory
Source code in fedbiomed/common/data/_medical_datasets.py
@staticmethod
def validate_MedicalFolder_root_folder(path: Union[str, Path]) -> Path:
    """ Validates Medical Folder root directory by checking folder structure

    Args:
        path: path to root directory

    Returns:
        Path to root folder of Medical Folder dataset

    Raises:
        FedbiomedDatasetError: - If path is not an instance of `str` or `pathlib.Path`
                               - If path is not a directory
    """
    if not isinstance(path, (Path, str)):
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: The argument root should be an instance of "
                                    f"`Path` or `str`, but got {type(path)}")

    if not isinstance(path, Path):
        path = Path(path)

    path = Path(path).expanduser().resolve()

    if not path.exists():
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Folder or file {path} not found on system")
    if not path.is_dir():
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Root for Medical Folder dataset "
                                    f"should be a directory.")

    directories = [f for f in path.iterdir() if f.is_dir()]
    if len(directories) == 0:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Root folder of Medical Folder should "
                                    f"contain subject folders, but no sub folder has been found. ")

    modalities = [f for f in path.glob("*/*") if f.is_dir()]
    if len(modalities) == 0:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value} Subject folders for Medical Folder should "
                                    f"contain modalities as folders. Folder structure should be "
                                    f"root/<subjects>/<modalities>")

    return path

MedicalFolderController

CLASS
MedicalFolderController(root=None)

Bases: MedicalFolderBase

Utility class to construct and verify Medical Folder datasets without knowledge of the experiment.

The purpose of this class is to enable key functionalities related to the MedicalFolderDataset at the time of dataset deployment, i.e. when the data is being added to the node's database.

Specifically, the MedicalFolderController class can be used to (a short usage sketch follows the list):

  - construct a MedicalFolderDataset with all available data modalities, without knowing which ones will be used as targets or features during an experiment
  - validate that the proper folder structure has been respected by the data managers preparing the data
  - identify which subjects have which modalities
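A short usage sketch (paths are hypothetical):

from fedbiomed.common.data import MedicalFolderController

controller = MedicalFolderController(root='/path/to/medical/folder')
dataset = controller.load_MedicalFolder(tabular_file='/path/to/medical/folder/demographics.csv',
                                        index_col=0)
status = controller.subject_modality_status()  # dict with 'columns', 'data' and 'index' keys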

Parameters:

Name Type Description Default
root str

Folder path to dataset. Defaults to None.

None
Source code in fedbiomed/common/data/_medical_datasets.py
def __init__(self, root: str = None):
    """Constructs MedicalFolderController

    Args:
        root: Folder path to dataset. Defaults to None.
    """
    super(MedicalFolderController, self).__init__(root=root)

Functions

load_MedicalFolder(tabular_file=None, index_col=None)

Load Medical Folder dataset with given tabular_file and index_col

Parameters:

Name Type Description Default
tabular_file Union[str, Path]

File path to demographics data set

None
index_col Union[str, int]

Column index that represents subject folder names

None

Returns:

Type Description
MedicalFolderDataset

MedicalFolderDataset object

Raises:

Type Description
FedbiomedDatasetError

If Medical Folder dataset is not successfully loaded

Source code in fedbiomed/common/data/_medical_datasets.py
def load_MedicalFolder(self,
                       tabular_file: Union[str, Path] = None,
                       index_col: Union[str, int] = None) -> MedicalFolderDataset:
    """ Load Medical Folder dataset with given tabular_file and index_col

    Args:
        tabular_file: File path to demographics data set
        index_col: Column index that represents subject folder names

    Returns:
        MedicalFolderDataset object

    Raises:
        FedbiomedDatasetError: If Medical Folder dataset is not successfully loaded
    """
    if self._root is None:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Can not load Medical Folder dataset without "
                                    f"declaring root directory. Please set root or build MedicalFolderController "
                                    f"by providing the `root` argument")

    modalities, _ = self.modalities()

    try:
        dataset = MedicalFolderDataset(root=self._root,
                                       tabular_file=tabular_file,
                                       index_col=index_col,
                                       data_modalities=modalities,
                                       target_modalities=modalities)
    except FedbiomedError as e:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Can not create Medical Folder dataset. {e}")

    if self._dlp is not None:
        dataset.set_dlp(self._dlp)
    return dataset
subject_modality_status(index=None)

Scans subjects and checks which modalities exist for each subject

Parameters:

Name Type Description Default
index Union[List, pd.Series]

Array-like index that comes from reference csv file of Medical Folder dataset. It represents subject folder names. Defaults to None.

None

Returns:

Type Description
Dict

Modality status for each subject that indicates which modalities are available

Source code in fedbiomed/common/data/_medical_datasets.py
def subject_modality_status(self, index: Union[List, pd.Series] = None) -> Dict:
    """Scans subjects and checks which modalities are existing for each subject

    Args:
        index: Array-like index that comes from reference csv file of Medical Folder dataset. It represents subject
            folder names. Defaults to None.
    Returns:
        Modality status for each subject that indicates which modalities are available
    """

    modalities, _ = self.modalities()
    subjects = self.subjects_with_imaging_data_folders()
    modality_status = {"columns": [*modalities], "data": [], "index": []}

    if index is not None:
        _, missing_subjects, missing_entries = self.available_subjects(subjects_from_index=index)
        modality_status["columns"].extend(["in_folder", "in_index"])

    for subject in subjects:
        modality_report = self.is_modalities_existing(subject, modalities)
        status_list = [status for status in modality_report]
        if index is not None:
            status_list.append(False if subject in missing_subjects else True)
            status_list.append(False if subject in missing_entries else True)

        modality_status["data"].append(status_list)
        modality_status["index"].append(subject)

    return modality_status
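
The returned dict follows pandas' "split" orientation (columns/data/index), so it can be turned into a DataFrame for inspection. A minimal sketch, with a hypothetical root path:

import pandas as pd
from fedbiomed.common.data import MedicalFolderController

controller = MedicalFolderController(root='/data/medical-folder')  # hypothetical path
status = controller.subject_modality_status()
report = pd.DataFrame(status["data"], columns=status["columns"], index=status["index"])
print(report)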

MedicalFolderDataset

CLASS
MedicalFolderDataset(
    root,
    data_modalities="T1",
    transform=None,
    target_modalities="label",
    target_transform=None,
    demographics_transform=None,
    tabular_file=None,
    index_col=None,
)

Bases: Dataset, MedicalFolderBase

Torch dataset following the Medical Folder Structure.

The Medical Folder structure is loosely inspired by the BIDS standard [1]. It should respect the following pattern:

└─ MedicalFolder_root/
    ├─ demographics.csv
    └─ sub-01/
        ├─ T1/
        │  └─ sub-01_xxx.nii.gz
        └─ T2/
            └─ sub-01_xxx.nii.gz
where the first-level subfolders of the root correspond to the subjects, and each subject's folder contains subfolders for each imaging modality. Images should be in NIfTI format, with either the .nii or .nii.gz extension. Finally, within the root folder there should also be a demographics file containing at least one index column with the names of the subject folders. This column will be used to explore the data and load the images. The demographics file may contain additional information about each subject and will be loaded alongside the images by our framework.

[1] https://bids.neuroimaging.io/
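
For illustration, a demographics file matching the structure above could look like the following (all column names are hypothetical; the index column, here subject_id, must hold the subject folder names):

subject_id,age,sex
sub-01,54,F
sub-02,61,M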

Parameters:

Name Type Description Default
root Union[str, PathLike, Path]

Root folder containing all the subject directories.

required
data_modalities str, Iterable

Modality or modalities to be used as data sources.

'T1'
transform Union[Callable, Dict[str, Callable]]

A function or dict of function transform(s) that preprocess each data source.

None
target_modalities Optional[Union[str, Iterable[str]]]

Modality or modalities to be used as target sources.

'label'
target_transform Union[Callable, Dict[str, Callable]]

A function or dict of function transform(s) that preprocess each target source.

None
demographics_transform Optional[Callable]

Optional transform applied to the demographics data.

None
tabular_file Union[str, PathLike, Path, None]

Path to a CSV or Excel file containing the demographic information from the patients.

None
index_col Union[int, str, None]

Column name in the tabular file containing the subject ids which must match the folder names.

None
Source code in fedbiomed/common/data/_medical_datasets.py
def __init__(self,
             root: Union[str, PathLike, Path],
             data_modalities: Optional[Union[str, Iterable[str]]] = 'T1',
             transform: Union[Callable, Dict[str, Callable]] = None,
             target_modalities: Optional[Union[str, Iterable[str]]] = 'label',
             target_transform: Union[Callable, Dict[str, Callable]] = None,
             demographics_transform: Optional[Callable] = None,
             tabular_file: Union[str, PathLike, Path, None] = None,
             index_col: Union[int, str, None] = None,
             ):
    """Constructor for class `MedicalFolderDataset`.

    Args:
        root: Root folder containing all the subject directories.
        data_modalities (str, Iterable): Modality or modalities to be used as data sources.
        transform: A function or dict of function transform(s) that preprocess each data source.
        target_modalities: Modality or modalities to be used as target sources.
        target_transform: A function or dict of function transform(s) that preprocess each target source.
        demographics_transform: Optional transform applied to the demographics data.
        tabular_file: Path to a CSV or Excel file containing the demographic information from the patients.
        index_col: Column name in the tabular file containing the subject ids which must match the folder names.
    """
    super(MedicalFolderDataset, self).__init__(root=root)

    self._tabular_file = tabular_file
    self._index_col = index_col

    self._data_modalities = [data_modalities] if isinstance(data_modalities, str) else data_modalities
    self._target_modalities = [target_modalities] if isinstance(target_modalities, str) else target_modalities

    self._transform = self._check_and_reformat_transforms(transform, data_modalities)
    self._target_transform = self._check_and_reformat_transforms(target_transform, target_modalities)
    self._demographics_transform = demographics_transform if demographics_transform is not None else lambda x: {}

    # Image loader
    self._reader = Compose([
        LoadImage(ITKReader(), image_only=True),
        ToTensor()
    ])
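
A minimal instantiation sketch (paths and column name are hypothetical). Since this is a torch Dataset, items can be retrieved by index; each item is expected to follow the ((data, demographics), target) structure shown in get_nontransformed_item below:

from fedbiomed.common.data import MedicalFolderDataset

dataset = MedicalFolderDataset(
    root='/data/medical-folder',                           # hypothetical path
    data_modalities=['T1', 'T2'],
    target_modalities='label',
    tabular_file='/data/medical-folder/demographics.csv',  # hypothetical path
    index_col='subject_id',                                # hypothetical column name
)
(data, demographics), target = dataset[0]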

Attributes

ALLOWED_EXTENSIONS class-attribute
ALLOWED_EXTENSIONS = ['.nii', '.nii.gz']

Functions

demographics()
cached property

Loads the tabular data file (supports Excel, CSV, TSV and colon-separated value files).

Source code in fedbiomed/common/data/_medical_datasets.py
@property
@cache
def demographics(self) -> pd.DataFrame:
    """Loads tabular data file (supports excel, csv, tsv and colon separated value files)."""

    if self._tabular_file is None or self._index_col is None:
        # If there is no tabular file or index column, return None
        return None

    # Read demographics CSV
    try:
        demographics = self.read_demographics(self._tabular_file, self._index_col)
    except Exception as e:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Can not load demographics tabular file. "
                                    f"Error message is: {e}")

    # Keep the first one in duplicated subjects
    return demographics.loc[~demographics.index.duplicated(keep="first")]
get_nontransformed_item(item)
Source code in fedbiomed/common/data/_medical_datasets.py
def get_nontransformed_item(self, item):
    # For the first item retrieve complete subject folders
    subjects = self.subject_folders()

    if not subjects:
        # case where subjects is an empty list (subject folders have not been found)
        raise FedbiomedDatasetError(
            f"{ErrorNumbers.FB613.value}: Cannot find complete subject folders with all the modalities")
    # Get subject folder
    subject_folder = subjects[item]

    # Load data modalities
    data = self.load_images(subject_folder, modalities=self._data_modalities)

    # Load target modalities
    targets = self.load_images(subject_folder, modalities=self._target_modalities)

    # Demographics
    demographics = self._get_from_demographics(subject_id=subject_folder.name)
    return (data, demographics), targets
index_col()
writable property

Getter/setter of the column containing folder's name (in the tabular file)

Source code in fedbiomed/common/data/_medical_datasets.py
@property
def index_col(self):
    """Getter/setter of the column containing folder's name (in the tabular file)"""
    return self._index_col
load_images(subject_folder, modalities)

Loads modality images in given subject folder

Parameters:

Name Type Description Default
subject_folder Path

Subject folder where modalities are stored

required
modalities list

List of available modalities

required

Returns:

Type Description
Dict[str, torch.Tensor]

Subject image data as a dictionary where keys represent each modality.

Source code in fedbiomed/common/data/_medical_datasets.py
def load_images(self, subject_folder: Path, modalities: list) -> Dict[str, torch.Tensor]:
    """Loads modality images in given subject folder

    Args:
        subject_folder: Subject folder where modalities are stored
        modalities: List of available modalities

    Returns:
        Subject image data as a dictionary where keys represent each modality.
    """
    subject_data = {}

    for modality in modalities:
        modality_folder = self._subject_modality_folder(subject_folder, modality)
        image_folder = subject_folder.joinpath(modality_folder)
        nii_files = [p.resolve() for p in image_folder.glob("**/*")
                     if ''.join(p.suffixes) in self.ALLOWED_EXTENSIONS]

        # Load the first, we assume there is going to be a single image per modality for now.
        img_path = nii_files[0]
        img = self._reader(img_path)
        subject_data[modality] = img

    return subject_data
set_dataset_parameters(parameters)

Sets dataset parameters.

Parameters:

Name Type Description Default
parameters dict

Parameters to initialize

required

Raises:

Type Description
FedbiomedDatasetError

If given parameters are not of dict type

Source code in fedbiomed/common/data/_medical_datasets.py
def set_dataset_parameters(self, parameters: dict):
    """Sets dataset parameters.

    Args:
        parameters: Parameters to initialize

    Raises:
        FedbiomedDatasetError: If given parameters are not of `dict` type
    """
    if not isinstance(parameters, dict):
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Expected type for `parameters` is `dict, "
                                    f"but got {type(parameters)}`")

    for key, value in parameters.items():
        if hasattr(self, key):
            setattr(self, key, value)
        else:
            raise FedbiomedDatasetError(f"{ErrorNumbers.FB613.value}: Trying to set non existing attribute '{key}'")
shape()

Retrieves shape information for modalities and demographics csv

Source code in fedbiomed/common/data/_medical_datasets.py
def shape(self) -> dict:
    """Retrieves shape information for modalities and demographics csv"""

    # Get all modalities
    data_modalities = list(set(self._data_modalities))
    target_modalities = list(set(self._target_modalities))
    modalities = list(set(self._data_modalities + self._target_modalities))
    (image, _), targets = self.get_nontransformed_item(0)

    result = {modality: list(image[modality].shape) for modality in data_modalities}

    result.update({modality: list(targets[modality].shape) for modality in target_modalities})
    num_modalities = len(modalities)
    demographics_shape = self.demographics.shape if self.demographics is not None else None
    result.update({"demographics": demographics_shape, "num_modalities": num_modalities})

    return result
subject_folders()

Retrieves the folder names of only those subjects who have all the required modalities

Returns:

Type Description
List[Path]

List of subject directories that have all requested modalities

Source code in fedbiomed/common/data/_medical_datasets.py
def subject_folders(self) -> List[Path]:
    """Retrieves subject folder names of only those who have their complete modalities

    Returns:
        List of subject directories that has all requested modalities
    """

    # If demographics are present
    if self._tabular_file and self._index_col is not None:
        complete_subject_folders = self.subjects_registered_in_demographics
    else:
        complete_subject_folders = self.subjects_has_all_modalities

    return [self._root.joinpath(folder) for folder in complete_subject_folders]
subjects_has_all_modalities()
property

Gets only the subjects that have all required modalities

Source code in fedbiomed/common/data/_medical_datasets.py
@property
def subjects_has_all_modalities(self):
    """Gets only the subjects that have all required modalities"""

    all_modalities = list(set(self._data_modalities + self._target_modalities))
    subject_folder_names = self.subjects_with_imaging_data_folders()

    # Get subject that has all requested modalities
    complete_subjects = self.complete_subjects(subject_folder_names, all_modalities)

    return complete_subjects
subjects_registered_in_demographics()
cached property

Gets only those subjects who are present in the demographics file.

Source code in fedbiomed/common/data/_medical_datasets.py
@property
@cache
def subjects_registered_in_demographics(self):
    """Gets the subject only those who are present in the demographics file."""

    complete_subject_folders, *_ = self.available_subjects(
        subjects_from_folder=self.subjects_has_all_modalities,
        subjects_from_index=self.demographics.index)

    return complete_subject_folders
tabular_file()
writable property
Source code in fedbiomed/common/data/_medical_datasets.py
@property
def tabular_file(self):
    return self._tabular_file

MedicalFolderLoadingBlockTypes

Bases: DataLoadingBlockTypes, Enum

Attributes

MODALITIES_TO_FOLDERS class-attribute
MODALITIES_TO_FOLDERS: str = 'modalities_to_folders'

NIFTIFolderDataset

CLASS
NIFTIFolderDataset(
    root, transform=None, target_transform=None
)

Bases: Dataset

A generic class for loading NIfTI images, using the folder structure as the target classes' labels.

Supported formats: NIfTI and compressed NIfTI files (.nii, .nii.gz)

This Dataset is useful in classification tasks. Its usage is simple and quite similar to torchvision.datasets.ImageFolder: images must be contained in first-level sub-folders (level 2+ sub-folders are ignored) named after the target class they belong to (the target class label is the name of the folder).

nifti_dataset_root_folder
├── control_group
│   ├── subject_1.nii
│   └── subject_2.nii
│   └── ...
└── disease_group
    ├── subject_3.nii
    └── subject_4.nii
    └── ...

In this example, there are 4 samples (one from each *.nii file) and 2 target classes, with labels control_group and disease_group. subject_1.nii has class label control_group, subject_3.nii has class label disease_group, etc.

Parameters:

Name Type Description Default
root Union[str, PathLike, Path]

folder where the data is located.

required
transform Union[Callable, None]

transforms to be applied on data.

None
target_transform Union[Callable, None]

transforms to be applied on target indexes.

None

Raises:

Type Description
FedbiomedDatasetError

bad argument type

FedbiomedDatasetError

bad root path

Source code in fedbiomed/common/data/_medical_datasets.py
def __init__(self, root: Union[str, PathLike, Path],
             transform: Union[Callable, None] = None,
             target_transform: Union[Callable, None] = None
             ):
    """Constructor of the class

    Args:
        root: folder where the data is located.
        transform: transforms to be applied on data.
        target_transform: transforms to be applied on target indexes.

    Raises:
        FedbiomedDatasetError: bad argument type
        FedbiomedDatasetError: bad root path
    """
    # check parameters type
    for tr, trname in ((transform, 'transform'), (target_transform, 'target_transform')):
        if not callable(tr) and tr is not None:
            raise FedbiomedDatasetError(f"{ErrorNumbers.FB612.value}: Parameter {trname} has incorrect "
                                        f"type {type(tr)}, cannot create dataset.")

    if not isinstance(root, str) and not isinstance(root, PathLike) and not isinstance(root, Path):
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB612.value}: Parameter `root` has incorrect type "
                                    f"{type(root)}, cannot create dataset.")

    # initialize object variables
    self._files = []
    self._class_labels = []
    self._targets = []

    try:
        self._root_dir = Path(root).expanduser()
    except RuntimeError as e:
        raise FedbiomedDatasetError(
            f"{ErrorNumbers.FB612.value}: Cannot expand path {root}, error message is: {e}")

    self._transform = transform
    self._target_transform = target_transform
    self._reader = Compose([
        LoadImage(ITKReader(), image_only=True),
        ToTensor()
    ])

    self._explore_root_folder()
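
A usage sketch with a hypothetical root path; the dataset can be wrapped in a standard torch DataLoader, assuming each item is an (image, target) pair as in torchvision.datasets.ImageFolder:

from torch.utils.data import DataLoader
from fedbiomed.common.data import NIFTIFolderDataset

dataset = NIFTIFolderDataset('/data/nifti_dataset_root_folder')  # hypothetical path
print(dataset.labels())  # e.g. ['control_group', 'disease_group']
loader = DataLoader(dataset, batch_size=4, shuffle=True)
for images, targets in loader:
    pass  # training loop goes here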

Functions

files()

Retrieves the paths to the sample images.

Gives the same order as when retrieving the sample images (e.g. self.files[0] is the path to self.__getitem__[0])

Returns:

Type Description
List[Path]

List of the absolute paths to the sample images

Source code in fedbiomed/common/data/_medical_datasets.py
def files(self) -> List[Path]:
    """Retrieves the paths to the sample images.

    Gives the same order as when retrieving the sample images (e.g. `self.files[0]`
    is the path to `self.__getitem__[0]`)

    Returns:
        List of the absolute paths to the sample images
    """
    return self._files
labels()

Retrieves the labels of the target classes.

Target label index is the index of the corresponding label in this list.

Returns:

Type Description
List[str]

List of the labels of the target classes.

Source code in fedbiomed/common/data/_medical_datasets.py
def labels(self) -> List[str]:
    """Retrieves the labels of the target classes.

    Target label index is the index of the corresponding label in this list.

    Returns:
        List of the labels of the target classes.
    """
    return self._class_labels

NPDataLoader

CLASS
NPDataLoader(
    dataset,
    target,
    batch_size=1,
    shuffle=False,
    random_seed=None,
    drop_last=False,
)

DataLoader for a Numpy dataset.

This data loader encapsulates a dataset composed of numpy arrays and presents an Iterable interface. One design principle was to try to make the interface as similar as possible to a torch.DataLoader.

Attributes:

Name Type Description
_dataset

(np.ndarray) a 2d array of features

_target

(np.ndarray) an optional array of target values

_batch_size

(int) the number of elements in one batch

_shuffle

(bool) if True, shuffle the data at the beginning of every epoch

_drop_last

(bool) if True, drop the last batch if it does not contain batch_size elements

_rng

(np.random.Generator) the random number generator for shuffling

Parameters:

Name Type Description Default
dataset np.ndarray

2D Numpy array

required
target np.ndarray

Numpy array of target values

required
batch_size int

batch size for each iteration

1
shuffle bool

shuffle before iteration

False
random_seed Optional[int]

an optional integer to set the numpy random seed for shuffling. If it equals None, then no attempt will be made to set the random seed.

None
drop_last bool

whether to drop the last batch in case it does not fill the whole batch size

False
Source code in fedbiomed/common/data/_sklearn_data_manager.py
def __init__(self,
             dataset: np.ndarray,
             target: np.ndarray,
             batch_size: int = 1,
             shuffle: bool = False,
             random_seed: Optional[int] = None,
             drop_last: bool = False):
    """Construct numpy data loader

    Args:
        dataset: 2D Numpy array
        target: Numpy array of target values
        batch_size: batch size for each iteration
        shuffle: shuffle before iteration
        random_seed: an optional integer to set the numpy random seed for shuffling. If it equals
            None, then no attempt will be made to set the random seed.
        drop_last: whether to drop the last batch in case it does not fill the whole batch size
    """

    if not isinstance(dataset, np.ndarray) or not isinstance(target, np.ndarray):
        msg = f"{ErrorNumbers.FB609.value}. Wrong input type for `dataset` or `target` in NPDataLoader. " \
              f"Expected type np.ndarray for both, instead got {type(dataset)} and" \
              f"{type(target)} respectively."
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    # If the researcher gave a 1-dimensional dataset, we expand it to 2 dimensions
    if dataset.ndim == 1:
        logger.info(f"NPDataLoader expanding 1-dimensional dataset to become 2-dimensional.")
        dataset = dataset[:, np.newaxis]

    # If the researcher gave a 1-dimensional target, we expand it to 2 dimensions
    if target.ndim == 1:
        logger.info(f"NPDataLoader expanding 1-dimensional target to become 2-dimensional.")
        target = target[:, np.newaxis]

    if dataset.ndim != 2 or target.ndim != 2:
        msg = f"{ErrorNumbers.FB609.value}. Wrong shape for `dataset` or `target` in NPDataLoader. " \
              f"Expected 2-dimensional arrays, instead got {dataset.ndim}-dimensional " \
              f"and {target.ndim}-dimensional arrays respectively."
        logger.error(msg)
        raise FedbiomedValueError(msg)

    if len(dataset) != len(target):
        msg = f"{ErrorNumbers.FB609.value}. Inconsistent length for `dataset` and `target` in NPDataLoader. " \
              f"Expected same length, instead got len(dataset)={len(dataset)}, len(target)={len(target)}"
        logger.error(msg)
        raise FedbiomedValueError(msg)

    if not isinstance(batch_size, int):
        msg = f"{ErrorNumbers.FB609.value}. Wrong type for `batch_size` parameter of NPDataLoader. Expected a " \
              f"non-zero positive integer, instead got type {type(batch_size)}."
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    if batch_size <= 0:
        msg = f"{ErrorNumbers.FB609.value}. Wrong value for `batch_size` parameter of NPDataLoader. Expected a " \
              f"non-zero positive integer, instead got value {batch_size}."
        logger.error(msg)
        raise FedbiomedValueError(msg)

    if not isinstance(shuffle, bool):
        msg = f"{ErrorNumbers.FB609.value}. Wrong type for `shuffle` parameter of NPDataLoader. Expected `bool`, " \
              f"instead got {type(shuffle)}."
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    if not isinstance(drop_last, bool):
        msg = f"{ErrorNumbers.FB609.value}. Wrong type for `drop_last` parameter of NPDataLoader. " \
              f"Expected `bool`, instead got {type(drop_last)}."
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    if random_seed is not None and not isinstance(random_seed, int):
        msg = f"{ErrorNumbers.FB609.value}. Wrong type for `random_seed` parameter of NPDataLoader. " \
              f"Expected int or None, instead got {type(random_seed)}."
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    self._dataset = dataset
    self._target = target
    self._batch_size = batch_size
    self._shuffle = shuffle
    self._drop_last = drop_last
    self._rng = np.random.default_rng(random_seed)
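
A minimal usage sketch with synthetic data, assuming the loader yields (features, target) batches as its torch-like interface suggests:

import numpy as np
from fedbiomed.common.data import NPDataLoader

features = np.random.randn(10, 3)
target = np.random.randn(10)  # 1-dimensional target is expanded to shape (10, 1)
loader = NPDataLoader(features, target, batch_size=4, shuffle=True, random_seed=42, drop_last=True)
for batch_features, batch_target in loader:
    print(batch_features.shape, batch_target.shape)  # (4, 3) (4, 1)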

Functions

batch_size()

Returns the batch size

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def batch_size(self) -> int:
    """Returns the batch size"""
    return self._batch_size
dataset()
property

Returns the encapsulated dataset

This needs to be a property to harmonize the API with torch.DataLoader, enabling us to write generic code for both DataLoaders.

Source code in fedbiomed/common/data/_sklearn_data_manager.py
@property
def dataset(self) -> np.ndarray:
    """Returns the encapsulated dataset

    This needs to be a property to harmonize the API with torch.DataLoader, enabling us to write
    generic code for both DataLoaders.
    """
    return self._dataset
drop_last()

Returns the boolean drop_last attribute

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def drop_last(self) -> bool:
    """Returns the boolean drop_last attribute"""
    return self._drop_last
n_remainder_samples()

Returns the remainder of the division between dataset length and batch size.

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def n_remainder_samples(self) -> int:
    """Returns the remainder of the division between dataset length and batch size."""
    return len(self._dataset) % self._batch_size
rng()

Returns the random number generator

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def rng(self) -> np.random.Generator:
    """Returns the random number generator"""
    return self._rng
shuffle()

Returns the boolean shuffle attribute

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def shuffle(self) -> bool:
    """Returns the boolean shuffle attribute"""
    return self._shuffle
target()
property

Returns the array of target values

This has been made a property to have a homogeneous interface with the dataset property above.

Source code in fedbiomed/common/data/_sklearn_data_manager.py
@property
def target(self) -> np.ndarray:
    """Returns the array of target values

    This has been made a property to have a homogeneous interface with the dataset property above.
    """
    return self._target

SerializationValidation

CLASS
SerializationValidation()

Provides validation capabilities for serializing/deserializing a [DataLoadingBlock] or [DataLoadingPlan].

When a developer inherits from [DataLoadingBlock] to define a custom loading block, they are required to call the _serialization_validator.update_validation_scheme function with a dictionary argument containing the rules to validate all the additional fields that will be used in the serialization of their loading block.

These rules must follow the syntax explained in the SchemeValidator class.

For example

    class MyLoadingBlock(DataLoadingBlock):
        def __init__(self):
            super().__init__()
            self.my_custom_data = {}
            self._serialization_validator.update_validation_scheme({
                'custom_data': {
                    'rules': [dict, ...any other rules],
                    'required': True
                }
            })
        def serialize(self):
            serialized = super().serialize()
            serialized.update({'custom_data': self.my_custom_data})
            return serialized
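
For completeness, a sketch of the matching deserialize, following the subclassing rules of DataLoadingBlock (it calls super's deserialize and returns self; the 'custom_data' key matches the serialize example above):

        def deserialize(self, load_from: dict):
            # Restore the base-class state first, then our custom field
            super().deserialize(load_from)
            self.my_custom_data = load_from['custom_data']
            return self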

Attributes:

Name Type Description
_validation_scheme

(dict) an extensible set of rules to validate the DataLoadingBlock metadata.

Source code in fedbiomed/common/data/_data_loading_plan.py
def __init__(self):
    self._validation_scheme = {}

Functions

dlb_default_scheme()
classmethod

The dictionary of default validation rules for a serialized [DataLoadingBlock].

Source code in fedbiomed/common/data/_data_loading_plan.py
@classmethod
def dlb_default_scheme(cls) -> Dict:
    """The dictionary of default validation rules for a serialized [DataLoadingBlock]."""
    return {
        'loading_block_class': {
            'rules': [str, cls._identifier_validation_hook],
            'required': True,
        },
        'loading_block_module': {
            'rules': [str, cls._identifier_validation_hook],
            'required': True,
        },
        'dlb_id': {
            'rules': [str, cls._serial_id_validation_hook],
            'required': True,
        },
    }
dlp_default_scheme()
classmethod

The dictionary of default validation rules for a serialized [DataLoadingPlan].

Source code in fedbiomed/common/data/_data_loading_plan.py
@classmethod
def dlp_default_scheme(cls) -> Dict:
    """The dictionary of default validation rules for a serialized [DataLoadingPlan]."""
    return {
        'dlp_id': {
            'rules': [str],
            'required': True,
        },
        'dlp_name': {
            'rules': [str],
            'required': True,
        },
        'target_dataset_type': {
            'rules': [str, cls._target_dataset_type_validator],
            'required': True,
        },
        'loading_blocks': {
            'rules': [dict, cls._loading_blocks_types_validator],
            'required': True
        },
        'key_paths': {
            'rules': [dict, cls._key_paths_validator],
            'required': True
        }
    }
update_validation_scheme(new_scheme)

Updates the validation scheme.

Parameters:

Name Type Description Default
new_scheme dict

(dict) new dict of rules

required
Source code in fedbiomed/common/data/_data_loading_plan.py
def update_validation_scheme(self, new_scheme: dict) -> None:
    """Updates the validation scheme.

    Args:
        new_scheme: (dict) new dict of rules
    """
    self._validation_scheme.update(new_scheme)
validate(dlb_metadata, exception_type, only_required=True)

Validate a dict of dlb_metadata according to the _validation_scheme.

Parameters:

Name Type Description Default
dlb_metadata dict

the [DataLoadingBlock] metadata, as returned by serialize or as loaded from the node database.

required
exception_type Type[FedbiomedError]

the type of the exception to be raised when validation fails.

required
only_required bool

see SchemeValidator.populate_with_defaults

True

Raises:

Type Description
exception_type

if the validation fails.

Source code in fedbiomed/common/data/_data_loading_plan.py
def validate(self,
             dlb_metadata: Dict,
             exception_type: Type[FedbiomedError],
             only_required: bool = True) -> None:
    """Validate a dict of dlb_metadata according to the _validation_scheme.

    Args:
        dlb_metadata (dict) : the [DataLoadingBlock] metadata, as returned by serialize or as loaded from the
            node database.
        exception_type (Type[FedbiomedError]): the type of the exception to be raised when validation fails.
        only_required (bool) : see SchemeValidator.populate_with_defaults
    Raises:
        exception_type: if the validation fails.
    """
    try:
        sc = SchemeValidator(self._validation_scheme)
    except RuleError as e:
        msg = ErrorNumbers.FB614.value + f": {e}"
        logger.critical(msg)
        raise exception_type(msg)

    try:
        dlb_metadata = sc.populate_with_defaults(dlb_metadata,
                                                 only_required=only_required)
    except ValidatorError as e:
        msg = ErrorNumbers.FB614.value + f": {e}"
        logger.critical(msg)
        raise exception_type(msg)

    try:
        sc.validate(dlb_metadata)
    except ValidateError as e:
        msg = ErrorNumbers.FB614.value + f": {e}"
        logger.critical(msg)
        raise exception_type(msg)

SkLearnDataManager

CLASS
SkLearnDataManager(inputs, target, kwargs)

Bases: object

Wrapper for pd.DataFrame, pd.Series and np.ndarray datasets.

Manages datasets for scikit-learn based model training. It is responsible for managing the inputs and target variables provided through training_data of scikit-learn based training plans.

The loader arguments will be passed to the [fedbiomed.common.data.NPDataLoader] classes instantiated when split is called. They may include batch_size, shuffle, drop_last, and others. Please see the [fedbiomed.common.data.NPDataLoader] class for more details.

Parameters:

Name Type Description Default
inputs Union[np.ndarray, pd.DataFrame, pd.Series]

Independent variables (inputs, features) for model training

required
target Union[np.ndarray, pd.DataFrame, pd.Series]

Dependent variable/s (target) for model training and validation

required
**kwargs dict

Loader arguments

{}
Source code in fedbiomed/common/data/_sklearn_data_manager.py
def __init__(self,
             inputs: Union[np.ndarray, pd.DataFrame, pd.Series],
             target: Union[np.ndarray, pd.DataFrame, pd.Series],
             **kwargs: dict):

    """ Construct a SkLearnDataManager from an array of inputs and an array of targets.

    The loader arguments will be passed to the [fedbiomed.common.data.NPDataLoader] classes instantiated
    when split is called. They may include batch_size, shuffle, drop_last, and others. Please see the
    [fedbiomed.common.data.NPDataLoader] class for more details.

    Args:
        inputs: Independent variables (inputs, features) for model training
        target: Dependent variable/s (target) for model training and validation
        **kwargs: Loader arguments
    """

    if not isinstance(inputs, (np.ndarray, pd.DataFrame, pd.Series)) or \
            not isinstance(target, (np.ndarray, pd.DataFrame, pd.Series)):
        msg = f"{ErrorNumbers.FB609.value}. Parameters `inputs` and `target` for " \
              f"initialization of {self.__class__.__name__} should be one of np.ndarray, pd.DataFrame, pd.Series"
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    # Convert pd.DataFrame or pd.Series to np.ndarray for `inputs`
    if isinstance(inputs, (pd.DataFrame, pd.Series)):
        self._inputs = inputs.to_numpy()
    else:
        self._inputs = inputs

    # Convert pd.DataFrame or pd.Series to np.ndarray for `target`
    if isinstance(target, (pd.DataFrame, pd.Series)):
        self._target = target.to_numpy()
    else:
        self._target = target

    # Additional loader arguments
    self._loader_arguments = kwargs

    # Subset None means that train/validation split has not been performed
    self._subset_test: Union[Tuple[np.ndarray, np.ndarray], None] = None
    self._subset_train: Union[Tuple[np.ndarray, np.ndarray], None] = None
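
A short usage sketch with synthetic data; the loader arguments given here are passed through to the NPDataLoader instances created when split is called:

import numpy as np
from fedbiomed.common.data import SkLearnDataManager

manager = SkLearnDataManager(inputs=np.random.randn(100, 5),
                             target=np.random.randn(100),
                             batch_size=10, shuffle=True)
train_loader, test_loader = manager.split(test_ratio=0.2)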

Functions

dataset()

Gets the entire registered dataset.

This method returns the whole dataset as-is, without any split.

Returns:

Name Type Description
inputs np.ndarray

Input variables for model training

targets np.ndarray

Target variable for model training

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def dataset(self) -> Tuple[np.ndarray, np.ndarray]:
    """Gets the entire registered dataset.

    This method returns the whole dataset as-is, without any split.

    Returns:
         inputs: Input variables for model training
         targets: Target variable for model training
    """
    return self._inputs, self._target
split(test_ratio)

Splits np.ndarray dataset into train and validation.

Parameters:

Name Type Description Default
test_ratio float

Ratio for the validation set partition. The rest of the samples will be used for training

required

Raises:

Type Description
FedbiomedTypeError

If the test_ratio is not a float between 0 and 1

Returns:

Name Type Description
train_loader NPDataLoader

NPDataLoader for the training partition

test_loader NPDataLoader

NPDataLoader for the validation partition

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def split(self, test_ratio: float) -> Tuple[NPDataLoader, NPDataLoader]:
    """Splits `np.ndarray` dataset into train and validation.

    Args:
         test_ratio: Ratio for the validation set partition. The rest of the samples will be used for training

    Raises:
        FedbiomedTypeError: If the `test_ratio` is not a `float` between 0 and 1

    Returns:
         train_loader: NPDataLoader for the training partition
         test_loader: NPDataLoader for the validation partition
    """
    if not isinstance(test_ratio, float):
        msg = f'{ErrorNumbers.FB609.value}: The argument `test_ratio` should be of type `float`, not {type(test_ratio)}'
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    if test_ratio < 0. or test_ratio > 1.:
        msg = f'{ErrorNumbers.FB609.value}: The argument `test_ratio` should be between 0 and 1 (inclusive), ' \
             f'not {test_ratio}'
        logger.error(msg)
        raise FedbiomedTypeError(msg)

    empty_subset = (np.array([]), np.array([]))

    if test_ratio <= 0.:
        self._subset_train = (self._inputs, self._target)
        self._subset_test = empty_subset
    elif test_ratio >= 1.:
        self._subset_train = empty_subset
        self._subset_test = (self._inputs, self._target)
    else:
        x_train, x_test, y_train, y_test = train_test_split(self._inputs, self._target, test_size=test_ratio)
        self._subset_test = (x_test, y_test)
        self._subset_train = (x_train, y_train)

    test_batch_size = max(1, len(self._subset_test[0]))
    return self._subset_loader(self._subset_train, **self._loader_arguments), \
        self._subset_loader(self._subset_test, batch_size=test_batch_size)
subset_test()

Gets the subset of the dataset for the validation partition.

Returns:

Name Type Description
test_inputs np.ndarray

Input variables of validation subset for model validation

test_target np.ndarray

Target variable of validation subset for model validation

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def subset_test(self) -> Tuple[np.ndarray, np.ndarray]:
    """Gets Subset of dataset for validation partition.

    Returns:
        test_inputs: Input variables of validation subset for model validation
        test_target: Target variable of validation subset for model validation
    """
    return self._subset_test
subset_train()

Gets the subset of the dataset for the train partition.

Returns:

Name Type Description
train_inputs np.ndarray

Input variables of training subset for model training

train_target np.ndarray

Target variable of training subset for model training

Source code in fedbiomed/common/data/_sklearn_data_manager.py
def subset_train(self) -> Tuple[np.ndarray, np.ndarray]:

    """Gets Subset for train partition.

    Returns:
        train_inputs: Input variables of training subset for model training
        train_target: Target variable of training subset for model training
    """

    return self._subset_train

TabularDataset

CLASS
TabularDataset(inputs, target)

Bases: Dataset

Torch-based Dataset object to create a torch Dataset from given numpy or dataframe types of input and target variables

Parameters:

Name Type Description Default
inputs Union[np.ndarray, pd.DataFrame, pd.Series]

Input variables that will be passed to network

required
target Union[np.ndarray, pd.DataFrame, pd.Series]

Target variable for output layer

required

Raises:

Type Description
FedbiomedDatasetError

If input variables and target variable do not have equal length/size

Source code in fedbiomed/common/data/_tabular_dataset.py
def __init__(self,
             inputs: Union[np.ndarray, pd.DataFrame, pd.Series],
             target: Union[np.ndarray, pd.DataFrame, pd.Series]):
    """Constructs PyTorch dataset object

    Args:
        inputs: Input variables that will be passed to network
        target: Target variable for output layer

    Raises:
        FedbiomedDatasetError: If input variables and target variable do not have
            equal length/size
    """

    # Inputs and target variable should be converted to the torch tensors
    # PyTorch provides `from_numpy` function to convert numpy arrays to
    # torch tensor. Therefore, if the arguments `inputs` and `target` are
    # instances of `pd.DataFrame` or `pd.Series`, they should be converted to
    # numpy arrays
    if isinstance(inputs, (pd.DataFrame, pd.Series)):
        self.inputs = inputs.to_numpy()
    elif isinstance(inputs, np.ndarray):
        self.inputs = inputs
    else:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB610.value}: The argument `inputs` should be "
                                                f"an instance one of np.ndarray, pd.DataFrame or pd.Series")
    # Configuring self.target attribute
    if isinstance(target, (pd.DataFrame, pd.Series)):
        self.target = target.to_numpy()
    elif isinstance(target, np.ndarray):
        self.target = target
    else:
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB610.value}: The argument `target` should be "
                                                f"an instance one of np.ndarray, pd.DataFrame or pd.Series")

    # The lengths should be equal
    if len(self.inputs) != len(self.target):
        raise FedbiomedDatasetError(f"{ErrorNumbers.FB610.value}: Length of input variables and target "
                                                f"variable does not match. Please make sure that they have "
                                                f"equal size while creating the method `training_data` of "
                                                f"TrainingPlan")

    # Convert `inputs` and `target` to Torch floats
    self.inputs = from_numpy(self.inputs).float()
    self.target = from_numpy(self.target).float()
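
A minimal sketch with synthetic data; as a torch Dataset, the result can be wrapped in a standard DataLoader:

import numpy as np
from torch.utils.data import DataLoader
from fedbiomed.common.data import TabularDataset

dataset = TabularDataset(inputs=np.random.randn(8, 4),
                         target=np.random.randn(8, 1))
loader = DataLoader(dataset, batch_size=2)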

Attributes

inputs instance-attribute
inputs = from_numpy(self.inputs).float()
target instance-attribute
target = from_numpy(self.target).float()

Functions

get_dataset_type()
staticmethod
Source code in fedbiomed/common/data/_tabular_dataset.py
@staticmethod
def get_dataset_type() -> DatasetTypes:
    return DatasetTypes.TABULAR

TorchDataManager

CLASS
TorchDataManager(dataset, kwargs)

Bases: object

Wrapper for a PyTorch Dataset to manage loading operations for training and validation.

Parameters:

Name Type Description Default
dataset Dataset

Dataset object for torch.utils.data.DataLoader

required
**kwargs dict

Arguments for PyTorch DataLoader

{}

Raises:

Type Description
FedbiomedTorchDataManagerError

If the argument dataset is not an instance of torch.utils.data.Dataset

Source code in fedbiomed/common/data/_torch_data_manager.py
def __init__(self, dataset: Dataset, **kwargs: dict):
    """Construct  of class

    Args:
        dataset: Dataset object for torch.utils.data.DataLoader
        **kwargs: Arguments for PyTorch `DataLoader`

    Raises:
        FedbiomedTorchDataManagerError: If the argument `dataset` is not an instance of `torch.utils.data.Dataset`
    """

    # TorchDataManager should get `dataset` argument as an instance of torch.utils.data.Dataset
    if not isinstance(dataset, Dataset):
        raise FedbiomedTorchDataManagerError(
            f"{ErrorNumbers.FB608.value}: The attribute `dataset` should an instance "
            f"of `torch.utils.data.Dataset`, please use `Dataset` as parent class for"
            f"your custom torch dataset object")

    self._dataset = dataset
    self._loader_arguments = kwargs
    self._subset_test: Union[Subset, None] = None
    self._subset_train: Union[Subset, None] = None

Functions

dataset()
property

Gets dataset.

Returns:

Type Description
Dataset

PyTorch dataset instance

Source code in fedbiomed/common/data/_torch_data_manager.py
@property
def dataset(self) -> Dataset:
    """Gets dataset.

    Returns:
        PyTorch dataset instance
    """
    return self._dataset
load_all_samples()

Loads all samples as a PyTorch DataLoader without splitting.

Returns:

Type Description
DataLoader

DataLoader for the entire dataset. DataLoader arguments will be retrieved from the **kwargs defined while initializing the class

Source code in fedbiomed/common/data/_torch_data_manager.py
def load_all_samples(self) -> DataLoader:
    """Loading all samples as PyTorch DataLoader without splitting.

    Returns:
        Dataloader for entire datasets. `DataLoader` arguments will be retrieved from the `**kwargs` which
            is defined while initializing the class
    """
    return self._create_torch_data_loader(self._dataset, **self._loader_arguments)
split(test_ratio)

Splits a PyTorch Dataset into train and validation.

Parameters:

Name Type Description Default
test_ratio float

Split ratio for the validation set. The rest of the samples will be used for training

required

Raises:

Type Description
FedbiomedTorchDataManagerError

If the ratio is not of a valid type or value

Returns:

Name Type Description
train_loader Union[DataLoader, None]

DataLoader for training subset. None if the test_ratio is 1

test_loader Union[DataLoader, None]

DataLoader for validation subset. None if the test_ratio is 0

Source code in fedbiomed/common/data/_torch_data_manager.py
def split(self, test_ratio: float) -> Tuple[Union[DataLoader, None], Union[DataLoader, None]]:
    """ Splitting PyTorch Dataset into train and validation.

    Args:
         test_ratio: Split ratio for the validation set. The rest of the samples will be used for training
    Raises:
        FedbiomedTorchDataManagerError: If the ratio is not of a valid type or value

    Returns:
         train_loader: DataLoader for training subset. `None` if the `test_ratio` is `1`
         test_loader: DataLoader for validation subset. `None` if the `test_ratio` is `0`
    """

    # Check the argument `ratio` is of type `float`
    if not isinstance(test_ratio, (float, int)):
        raise FedbiomedTorchDataManagerError(f'{ErrorNumbers.FB608.value}: The argument `test_ratio` should be '
                                             f'of type `float` or `int`, not {type(test_ratio)}')

    # Check ratio is valid for splitting
    if test_ratio < 0 or test_ratio > 1:
        raise FedbiomedTorchDataManagerError(f'{ErrorNumbers.FB608.value}: The argument `test_ratio` should be '
                                             f'between 0 and 1 (inclusive), not {test_ratio}')

    # Check that the dataset implements `__len__`, so the number of
    # samples can be computed
    if not hasattr(self._dataset, '__len__'):
        raise FedbiomedTorchDataManagerError(f"{ErrorNumbers.FB608.value}: Can not get number of samples from "
                                             f"{str(self._dataset)} without `__len__`.  Please make sure "
                                             f"that `__len__` method has been added to custom dataset. "
                                             f"This method should return total number of samples.")

    try:
        samples = len(self._dataset)
    except AttributeError as e:
        raise FedbiomedTorchDataManagerError(f"{ErrorNumbers.FB608.value}: Can not get number of samples from "
                                             f"{str(self._dataset)} due to undefined attribute, {str(e)}")
    except TypeError as e:
        raise FedbiomedTorchDataManagerError(f"{ErrorNumbers.FB608.value}: Can not get number of samples from "
                                             f"{str(self._dataset)}, {str(e)}")

    # Calculate number of samples for train and validation subsets
    test_samples = math.floor(samples * test_ratio)
    train_samples = samples - test_samples

    self._subset_train, self._subset_test = random_split(self._dataset, [train_samples, test_samples])

    loaders = (self._subset_loader(self._subset_train, **self._loader_arguments),
               self._subset_loader(self._subset_test, batch_size=len(self._subset_test)))

    return loaders
subset_test()

Gets validation subset of the dataset.

Returns:

Type Description
Subset

Validation subset

Source code in fedbiomed/common/data/_torch_data_manager.py
def subset_test(self) -> Subset:
    """Gets validation subset of the dataset.

    Returns:
        Validation subset
    """

    return self._subset_test
subset_train()

Gets train subset of the dataset.

Returns:

Type Description
Subset

Train subset

Source code in fedbiomed/common/data/_torch_data_manager.py
def subset_train(self) -> Subset:
    """Gets train subset of the dataset.

    Returns:
        Train subset
    """
    return self._subset_train
to_sklearn()

Converts the PyTorch Dataset to a Fed-BioMed sklearn data manager.

Returns:

Type Description
SkLearnDataManager

Data manager to use in SkLearn base training plans

Source code in fedbiomed/common/data/_torch_data_manager.py
def to_sklearn(self) -> SkLearnDataManager:
    """Converts PyTorch `Dataset` to sklearn data manager of Fed-BioMed.

    Returns:
        Data manager to use in SkLearn base training plans
    """

    loader = self._create_torch_data_loader(self._dataset, batch_size=len(self._dataset))
    # Retrieve input and target variables from a single full-size batch
    batch = next(iter(loader))
    inputs = batch[0].numpy()
    target = batch[1].numpy()

    return SkLearnDataManager(inputs=inputs, target=target, **self._loader_arguments)

Functions

discover_flamby_datasets()

Automatically discover the available Flamby datasets based on the contents of the flamby.datasets module.

Returns:

Type Description
Dict[int, str]

A dictionary {index: dataset_name} where index is an int and dataset_name is the name of a flamby module corresponding to a dataset, represented as str. To import said module one must prepend the correct path: import flamby.datasets.dataset_name.

Source code in fedbiomed/common/data/_flamby_dataset.py
def discover_flamby_datasets() -> Dict[int, str]:
    """Automatically discover the available Flamby datasets based on the contents of the flamby.datasets module.

    Returns:
        a dictionary {index: dataset_name} where index is an int and dataset_name is the name of a flamby module
        corresponding to a dataset, represented as str. To import said module one must prepend the correct
        path: `import flamby.datasets.dataset_name`.

    """
    dataset_list = [name for _, name, ispkg in pkgutil.iter_modules(flamby_datasets_module.__path__) if ispkg]
    return {i: name for i, name in enumerate(dataset_list)}
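
A usage sketch (requires flamby to be installed; the dataset names shown are illustrative):

import importlib
from fedbiomed.common.data import discover_flamby_datasets

datasets = discover_flamby_datasets()
print(datasets)  # e.g. {0: 'fed_heart_disease', 1: 'fed_ixi'}
module = importlib.import_module(f"flamby.datasets.{datasets[0]}")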