This article uses ResNet50 image-classification training as an example to show how KSpeed accelerates image data loading for CV workloads. The ResNet50 model is based on NVIDIA's official open-source implementation in DeepLearningExamples. Using KSpeed requires small changes to the original code; these changes can be applied to the ResNet50 model as a git patch, and they are briefly described in the "Key modules for KSpeed integration" section at the end of this article.
Code preparation
Base repository:
https://github.com/NVIDIA/DeepLearningExamples/commit/174b3d40bfc26f2adcf252676d38d6d5ffa7cbdc
git clone https://github.com/NVIDIA/DeepLearningExamples.git
cd DeepLearningExamples
git checkout master
git reset --hard 174b3d40bfc26f2adcf252676d38d6d5ffa7cbdc
Apply the KSpeed patch
# Run the following from the DeepLearningExamples directory
wget http://kspeed-release.oss-cn-beijing.aliyuncs.com/kspeed_resnet50.patch
git apply kspeed_resnet50.patch
Runtime environment setup
Start the training container with the following command:
docker run -it --gpus all --name=resnet50_kspeed_test --net=host --ipc host --device=/dev/infiniband/ --ulimit memlock=-1:-1 -v /{path-to-imagenet}:/{path-to-imagenet-in-docker} -v /{path-to-DeepLearningExamples}:/{path-to-DeepLearningExamples-in-docker} eflo-registry.cn-beijing.cr.aliyuncs.com/eflo/ngc-pytorch-kspeed-22.05-py38:v2.2.0
In the command above:
{path-to-imagenet}: path to the imagenet dataset on the host machine;
{path-to-imagenet-in-docker}: path the dataset is mapped to inside the container;
{path-to-DeepLearningExamples}: path to the model training code on the host machine;
{path-to-DeepLearningExamples-in-docker}: path the training code is mapped to inside the container.
Set these paths according to your own environment.
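For example, if the imagenet dataset lives at /data/imagenet and the code at /home/user/DeepLearningExamples on the host (these paths are purely illustrative; substitute your own), the command becomes:

```shell
docker run -it --gpus all --name=resnet50_kspeed_test --net=host --ipc host \
    --device=/dev/infiniband/ --ulimit memlock=-1:-1 \
    -v /data/imagenet:/data/imagenet \
    -v /home/user/DeepLearningExamples:/workspace/DeepLearningExamples \
    eflo-registry.cn-beijing.cr.aliyuncs.com/eflo/ngc-pytorch-kspeed-22.05-py38:v2.2.0
```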
The imagenet dataset directory structure is as follows:
imagenet
├── train
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ └── ......
│ ├── n01443537
│ └── ......
└── val
├── n01440764
│ ├── ILSVRC2012_val_00000293.JPEG
│ ├── ILSVRC2012_val_00002138.JPEG
│ └── ......
├── n01443537
└── ......
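The layout above is the standard torchvision ImageFolder convention: every subdirectory of train/ and val/ is one class, and integer labels come from sorting the class directory names (KSpeedImageFolder presumably follows the same convention, given its name). A minimal, self-contained sketch of that mapping, using a throwaway temp directory with hypothetical class names:

```python
import os
import tempfile

# Build a tiny imagenet-style tree: one subdirectory per class.
root = tempfile.mkdtemp()
for cls in ["n01443537", "n01440764"]:
    d = os.path.join(root, "train", cls)
    os.makedirs(d)
    open(os.path.join(d, cls + "_0001.JPEG"), "w").close()

# ImageFolder-style mapping: sorted class directories -> integer labels.
train_dir = os.path.join(root, "train")
classes = sorted(e.name for e in os.scandir(train_dir) if e.is_dir())
class_to_idx = {c: i for i, c in enumerate(classes)}

# Each sample is an (image path, integer label) pair.
samples = [
    (os.path.join(train_dir, c, f), class_to_idx[c])
    for c in classes
    for f in sorted(os.listdir(os.path.join(train_dir, c)))
]
print(class_to_idx)  # {'n01440764': 0, 'n01443537': 1}
```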
Run model training
# Starting from the DeepLearningExamples directory
cd ./PyTorch/Classification/ConvNets
# Single node, 8 GPUs: baseline
bash ./resnet50v1.5/training/AMP/DGXA100_resnet50_AMP_multi.sh pytorch {path-to-imagenet-in-docker}
# Single node, 8 GPUs: kspeed
bash ./resnet50v1.5/training/AMP/DGXA100_resnet50_AMP_multi.sh kspeed {path-to-imagenet-in-docker}
# Single node, 8 GPUs: dali+kspeed
bash ./resnet50v1.5/training/AMP/DGXA100_resnet50_AMP_multi.sh dali-kspeed {path-to-imagenet-in-docker}
In the commands above, {path-to-imagenet-in-docker} is the path of the imagenet dataset inside the container; it must match the path set when the container was started.
Before running the KSpeed tests, make sure the KSpeed service has been deployed.
Key modules for KSpeed integration
Added kspeeddataloader module
The new file DeepLearningExamples/PyTorch/Classification/ConvNets/image_classification/kspeeddataloader.py implements a KSpeed-based Pytorch Dataloader and a KSpeed-based Dali Dataloader.
Pytorch Dataloader based on KSpeed
To implement a Pytorch Dataloader on top of KSpeed, only the Dataset needs to change; it then plugs into Pytorch's native Sampler and Dataloader. The core code is as follows.
Import the kspeeddataset module:
import kspeed.utils.data.kspeeddataset as KSpeedDataset
Replace torchvision.datasets.ImageFolder with KSpeedDataset.KSpeedImageFolder to enable KSpeed's accelerated data loading:
train_dataset = KSpeedDataset.KSpeedImageFolder(
    traindir, None, workers, kspeed_iplist,
    "admin", "admin", transforms.Compose(transforms_list),
)
val_dataset = KSpeedDataset.KSpeedImageFolder(
    valdir, None, workers, kspeed_iplist,
    "admin", "admin",
    transforms.Compose(
        [
            transforms.Resize(
                image_size + crop_padding, interpolation=interpolation
            ),
            transforms.CenterCrop(image_size),
        ]
    ),
)
Implement the get_kspeed_train_loader and get_kspeed_val_loader functions; see lines 16-72 and 74-128 of kspeeddataloader.py.
Dali Dataloader based on KSpeed
To implement a Dali Dataloader on top of KSpeed, only the Dali pipeline's input data source needs to change to an external source, KSpeedCallable. The core code is as follows.
KSpeedCallable
The KSpeedCallable class inherits from KSpeedDataset.KSpeedFolder. In lines 164-179 of kspeeddataloader.py (specifically line 176), it reads imagenet dataset samples via self.dataset.getBIN(path):
def __call__(self, sample_info):
    if self.dataset is None:
        self.load()
    if sample_info.iteration >= self.full_iters:
        raise StopIteration()
    if self.last_seen_epoch != sample_info.epoch_idx:
        self.last_seen_epoch = sample_info.epoch_idx
        self.perm = np.random.default_rng(seed=42 + sample_info.epoch_idx).permutation(len(self.files))
    idx = self.perm[sample_info.idx_in_epoch + self.shard_offset]
    path = os.path.join(self.root, self.files[idx])
    dout = self.dataset.getBIN(path)
    sample = np.frombuffer(dout, dtype=np.uint8)
    label = np.int32([self.labels[idx]])
    return sample, label
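The seeded permutation in __call__ is what keeps shuffling consistent across shards: every shard seeds the generator with 42 plus the epoch index, so all ranks independently compute the identical epoch order and simply read it at different shard offsets. A standalone sketch of that property (NumPy only; the toy sizes are illustrative):

```python
import numpy as np

num_files = 16  # toy dataset size

def epoch_perm(epoch_idx):
    # Same seed formula as in __call__ above: 42 + epoch index.
    return np.random.default_rng(seed=42 + epoch_idx).permutation(num_files)

# Every shard computes the permutation independently, yet all agree,
# because the seed depends only on the epoch index.
assert np.array_equal(epoch_perm(0), epoch_perm(0))

# Two shards then read disjoint slices of the same epoch order.
perm = epoch_perm(0)
shard_size = num_files // 2
shard0 = perm[:shard_size]   # shard_offset = 0
shard1 = perm[shard_size:]   # shard_offset = shard_size
assert set(shard0) & set(shard1) == set()
assert set(shard0) | set(shard1) == set(range(num_files))
```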
Dali Pipeline based on KSpeedCallable
In lines 223-229 of kspeeddataloader.py, KSpeedCallable is used as the Dali pipeline's external data source to fetch dataset samples:
if kspeed:
    images, labels = fn.external_source(source=kscallable,
                                        num_outputs=2,
                                        batch=False,
                                        parallel=True,
                                        dtype=[types.UINT8, types.INT32],
                                        device='cpu')
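The StopIteration guard in __call__ depends on full_iters, the number of per-shard iterations in one epoch. A plain-Python sketch of how that bound and the shard offset are typically derived; the names and exact formulas here are assumptions for illustration and may differ from kspeeddataloader.py:

```python
# Hypothetical shard bookkeeping for a DALI external-source callable.
dataset_size = 1281167   # ImageNet-1k training set
batch_size = 256
num_shards = 8           # e.g. one shard per GPU

# Samples owned by each shard (ragged remainder dropped).
shard_size = dataset_size // num_shards
# Full batches per shard per epoch; __call__ raises StopIteration
# once sample_info.iteration reaches this bound.
full_iters = shard_size // batch_size

def shard_offset(shard_id):
    # Start of this shard's slice inside the epoch permutation.
    return shard_id * shard_size

print(shard_size, full_iters, shard_offset(1))  # 160145 625 160145
```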
Add options to DATA_BACKEND_CHOICES
At line 40 of DeepLearningExamples/PyTorch/Classification/ConvNets/image_classification/dataloaders.py, change the original DATA_BACKEND_CHOICES = ["pytorch", "syntetic"] to:
DATA_BACKEND_CHOICES = ["pytorch", "syntetic", "kspeed", "dali-kspeed", "dali"]
Add args.data_backend options
In lines 512-520 of DeepLearningExamples/PyTorch/Classification/ConvNets/main.py, add the following branches to the args.data_backend dispatch:
elif args.data_backend == "kspeed":
    get_train_loader = get_kspeed_train_loader
    get_val_loader = get_kspeed_val_loader
elif args.data_backend == "dali":
    get_train_loader = get_dali_kspeed_train_loader(dali_cpu=True, kspeed=False)
    get_val_loader = get_dali_kspeed_val_loader(dali_cpu=True, kspeed=False)
elif args.data_backend == "dali-kspeed":
    get_train_loader = get_dali_kspeed_train_loader(dali_cpu=True)
    get_val_loader = get_dali_kspeed_val_loader(dali_cpu=True)