dataset module
Dataset module.
This module provides the specifications for the loading, preprocessing, splitting, and sequencing of the available input data.
- class dataset.Bearingset(data_path: str, experiments: List[str], is_target: bool = False, signals: List[str] = ['x', 'y'], scale: bool = False, scaler_path: str = './scaler.joblib', sequence_length: int = 10)
Bases: Dataset
Bearingset.
This class inherits from the PyTorch Dataset class and represents a dataset of bearing sensor data and health index (HI) values.
If the dataset is the source dataset, the sensor data are standardized and the fitted scaler is saved (when scaling is enabled). Otherwise, the scaler is loaded from the saved dump and used to scale the target data.
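For illustration, the source/target scaling behavior could be sketched as follows, assuming a scikit-learn StandardScaler persisted with joblib (both are assumptions, as is the helper name; the actual class may use different components):

```python
from joblib import dump, load
from sklearn.preprocessing import StandardScaler


def scale_sensor_data(data, is_target, scaler_path):
    """Standardize `data` (shape: n_samples x n_signals).

    Source data: fit a new scaler and save it to `scaler_path`.
    Target data: load the scaler fitted on the source and apply it.
    """
    if is_target:
        scaler = load(scaler_path)       # reuse the source-fitted scaler
        return scaler.transform(data)
    scaler = StandardScaler()
    scaled = scaler.fit_transform(data)  # fit on the source data
    dump(scaler, scaler_path)            # persist for the target dataset
    return scaled
```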
__init__.
- Parameters:
data_path (str) – path of the directory containing parquet files of sensor data and HI values for different experiments
experiments (List[str]) – list of names of the experiments to include in the dataset
is_target (bool) – True if the dataset is used as the target dataset for adaptation. Default is False
signals (List[str]) – signals to include in the dataset, chosen from ‘x’ and ‘y’. Default is [‘x’, ‘y’]
scale (bool) – True to scale the sensor data with a standard scaler. Default is False
scaler_path (str) – path of the standard scaler dump. If the dataset is the target dataset, the scaler will be loaded from this path; otherwise, the scaler will be fitted and saved to it. Default is ‘./scaler.joblib’
sequence_length (int) – sequence length for the LSTM model input. Default is 10
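For illustration, source and target datasets might be constructed as below (the data directory and experiment names are placeholders):

```python
from dataset import Bearingset

# Source dataset: fits the standard scaler and saves it to scaler_path.
source_ds = Bearingset(
    data_path="./data",            # placeholder directory of parquet files
    experiments=["exp1", "exp2"],  # placeholder experiment names
    scale=True,
    scaler_path="./scaler.joblib",
    sequence_length=10,
)

# Target dataset: loads the saved scaler and applies it to the target data.
target_ds = Bearingset(
    data_path="./data",
    experiments=["exp3"],          # placeholder experiment name
    is_target=True,
    scale=True,
    scaler_path="./scaler.joblib",
)
```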
- get_dataloader(batch_size: int = 32, shuffle: bool = False) → DataLoader
get_dataloader.
This method returns a PyTorch DataLoader object that can be used to iterate over the dataset in batches.
- Parameters:
batch_size (int) – size of the batches to return from the DataLoader. Default is 32
shuffle (bool) – True to shuffle the dataset before returning batches. Shuffling is recommended for training. Default is False
- Return type:
DataLoader
- Returns:
DataLoader object that can be used to iterate over the dataset in batches.
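A minimal usage sketch; the batch structure below (sequence/HI pairs and their shapes) is an assumption and should be verified against the dataset’s actual __getitem__ implementation:

```python
loader = source_ds.get_dataloader(batch_size=32, shuffle=True)

for batch in loader:
    # Assumed structure: (sequences, hi_values), with sequences of shape
    # (batch_size, sequence_length, n_signals).
    sequences, hi_values = batch
    ...
```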