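PAI-EasyVision provides image feature extraction and supports multi-machine distributed execution. With PAI-EasyVision, you can read images from OSS and write the extracted features back to OSS files, or read images from table data and write the extracted features back to a table. This topic uses the OSS I/O path as an example to describe the image feature extraction process.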

Data format

See the description of the annotation file format.

Image feature extraction

Given an existing data file list, you can extract image features by running a PAI command. For example:
pai -name ev_predict_ext
             -Dmodel_path='oss://pai-vision-data-sh/pretrained_models/saved_models/resnet_v1_50/'
             -Dmodel_type='feature_extractor'
             -Dinput_oss_file='oss://path/to/your/filelist.txt'
             -Doutput_oss_file='oss://path/to/your/result.txt'
             -Dimage_type='url'
             -Dfeature_name='resnet_v1_50/block4'
             -Dnum_worker=2
             -DcpuRequired=800
             -DgpuRequired=100
             -Dbuckets='oss://pai-vision-data-sh/'
             -Darn='your_role_arn'
             -DossHost='oss-cn-shanghai-internal.aliyuncs.com'
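
When the same command has to be issued for several file lists, it can help to assemble it from a parameter mapping. The following is a minimal sketch that only formats text. The helper `build_pai_command` is hypothetical and is not part of PAI-EasyVision, and how the resulting string is submitted is left out. The parameter names and values are the ones from the command above.

```python
def build_pai_command(name, params):
    """Format a PAI command string from a name and a parameter mapping.

    String values are wrapped in single quotes and numbers are left bare,
    mirroring the command shown above. No escaping is attempted, so this
    sketch assumes that values contain no single quotes.
    """
    parts = ["pai -name {}".format(name)]
    for key, value in params.items():
        if isinstance(value, str):
            parts.append("-D{}='{}'".format(key, value))
        else:
            parts.append("-D{}={}".format(key, value))
    # One parameter per line, indented under the command name.
    return "\n    ".join(parts)


command = build_pai_command("ev_predict_ext", {
    "model_path": "oss://pai-vision-data-sh/pretrained_models/saved_models/resnet_v1_50/",
    "model_type": "feature_extractor",
    "input_oss_file": "oss://path/to/your/filelist.txt",
    "output_oss_file": "oss://path/to/your/result.txt",
    "image_type": "url",
    "feature_name": "resnet_v1_50/block4",
    "num_worker": 2,
    "cpuRequired": 800,
    "gpuRequired": 100,
    "buckets": "oss://pai-vision-data-sh/",
    "arn": "your_role_arn",
    "ossHost": "oss-cn-shanghai-internal.aliyuncs.com",
})
print(command)
```

Keeping the parameters in a mapping makes it easy to change only `input_oss_file`, `output_oss_file`, or `feature_name` between runs.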

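Output

Each line of the result file holds the feature extraction result for one image and consists of the file path followed by a JSON string. For example: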
oss://path/to/your/image1.jpg,  {"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}
oss://path/to/your/image1.jpg,  {"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}
oss://path/to/your/image1.jpg,  {"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}
The JSON string contains a single key-value pair: the value of feature is the image feature, a one-dimensional vector.
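
As a minimal sketch of how a line in this format might be consumed, the following Python snippet splits a result line at the first comma and decodes the JSON part. The function name `parse_result_line` is hypothetical, and the sketch assumes that the file path itself contains no comma.

```python
import json


def parse_result_line(line):
    """Split one result line into (image_path, feature_vector).

    Assumes the path contains no comma, so the first comma separates
    the path from the JSON string.
    """
    path, _, payload = line.partition(",")
    record = json.loads(payload.strip())
    return path.strip(), record["feature"]


sample = ('oss://path/to/your/image1.jpg,  '
          '{"feature": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4583122730255127, 0.0]}')
path, feature = parse_result_line(sample)
print(path, len(feature))
```

Splitting only at the first comma matters because the JSON string itself contains commas between the vector elements.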

Model addresses and model outputs

resnet_v1_50 address: oss://pai-vision-data-sh/pretrained_models/saved_models/resnet_v1_50. The model outputs are as follows.
resnet_v1_50/block1 shape: [None, 56, 56, 256] type: <dtype: 'float32'>
resnet_v1_50/block2 shape: [None, 28, 28, 512] type: <dtype: 'float32'>
resnet_v1_50/block3 shape: [None, 14, 14, 1024] type: <dtype: 'float32'>
resnet_v1_50/block4 shape: [None, 7, 7, 2048] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
resnet_v1_50/logits shape: [None, 1, 1, 1000] type: <dtype: 'float32'>
predictions shape: [None] type: <dtype: 'int32'>
class shape: [None] type: <dtype: 'int32'>
preprocessed_images shape: [None, 224, 224, 3] type: <dtype: 'float32'>
resnet_v1_50/conv1 shape: [None, 112, 112, 64] type: <dtype: 'float32'>
logits shape: [None, 1000] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
resnet_v1_50/spatial_squeeze shape: [None, 1000] type: <dtype: 'float32'>
resnet_v1_101 address: oss://pai-vision-data-sh/pretrained_models/saved_models/resnet_v1_101. The model outputs are as follows.
resnet_v1_101/block4 shape: [None, 7, 7, 2048] type: <dtype: 'float32'>
resnet_v1_101/logits shape: [None, 1, 1, 1000] type: <dtype: 'float32'>
resnet_v1_101/block2 shape: [None, 28, 28, 512] type: <dtype: 'float32'>
resnet_v1_101/conv1 shape: [None, 112, 112, 64] type: <dtype: 'float32'>
resnet_v1_101/block1 shape: [None, 56, 56, 256] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
resnet_v1_101/spatial_squeeze shape: [None, 1000] type: <dtype: 'float32'>
predictions shape: [None] type: <dtype: 'int32'>
preprocessed_images shape: [None, 224, 224, 3] type: <dtype: 'float32'>
logits shape: [None, 1000] type: <dtype: 'float32'>
resnet_v1_101/block3 shape: [None, 14, 14, 1024] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
inception_v3 address: oss://pai-vision-data-sh/pretrained_models/saved_models/inception_v3. The model outputs are as follows.
preprocessed_images shape: [None, 299, 299, 3] type: <dtype: 'float32'>
Conv2d_1a_3x3 shape: [None, 149, 149, 32] type: <dtype: 'float32'>
Conv2d_2a_3x3 shape: [None, 147, 147, 32] type: <dtype: 'float32'>
Conv2d_2b_3x3 shape: [None, 147, 147, 64] type: <dtype: 'float32'>
MaxPool_3a_3x3 shape: [None, 73, 73, 64] type: <dtype: 'float32'>
Conv2d_3b_1x1 shape: [None, 73, 73, 80] type: <dtype: 'float32'>
Conv2d_4a_3x3 shape: [None, 71, 71, 192] type: <dtype: 'float32'>
MaxPool_5a_3x3 shape: [None, 35, 35, 192] type: <dtype: 'float32'>
Mixed_5b shape: [None, 35, 35, 256] type: <dtype: 'float32'>
Mixed_5c shape: [None, 35, 35, 288] type: <dtype: 'float32'>
Mixed_5d shape: [None, 35, 35, 288] type: <dtype: 'float32'>
Mixed_6a shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6b shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6c shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6d shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_6e shape: [None, 17, 17, 768] type: <dtype: 'float32'>
Mixed_7a shape: [None, 8, 8, 1280] type: <dtype: 'float32'>
Mixed_7b shape: [None, 8, 8, 2048] type: <dtype: 'float32'>
Mixed_7c shape: [None, 8, 8, 2048] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
PreLogits shape: [None, 1, 1, 2048] type: <dtype: 'float32'>
Logits shape: [None, 1001] type: <dtype: 'float32'>
Predictions shape: [None, 1001] type: <dtype: 'float32'>
logits shape: [None, 1001] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
predictions shape: [None] type: <dtype: 'int32'>
original_image shape: [None, None, None, 3] type: <dtype: 'float32'>
original_image_shape shape: [None, 3] type: <dtype: 'int32'>
inception_v4 address: oss://pai-vision-data-sh/pretrained_models/saved_models/inception_v4. The model outputs are as follows.
preprocessed_images shape: [None, 299, 299, 3] type: <dtype: 'float32'>
Conv2d_1a_3x3 shape: [None, 149, 149, 32] type: <dtype: 'float32'>
Conv2d_2a_3x3 shape: [None, 147, 147, 32] type: <dtype: 'float32'>
Conv2d_2b_3x3 shape: [None, 147, 147, 64] type: <dtype: 'float32'>
Mixed_3a shape: [None, 73, 73, 160] type: <dtype: 'float32'>
Mixed_4a shape: [None, 71, 71, 192] type: <dtype: 'float32'>
Mixed_5a shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5b shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5c shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5d shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_5e shape: [None, 35, 35, 384] type: <dtype: 'float32'>
Mixed_6a shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6b shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6c shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6d shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6e shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6f shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6g shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_6h shape: [None, 17, 17, 1024] type: <dtype: 'float32'>
Mixed_7a shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
Mixed_7b shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
Mixed_7c shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
Mixed_7d shape: [None, 8, 8, 1536] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 1536] type: <dtype: 'float32'>
PreLogitsFlatten shape: [None, 1536] type: <dtype: 'float32'>
Logits shape: [None, 1001] type: <dtype: 'float32'>
Predictions shape: [None, 1001] type: <dtype: 'float32'>
logits shape: [None, 1001] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
predictions shape: [None] type: <dtype: 'int32'>
original_image shape: [None, None, None, 3] type: <dtype: 'float32'>
original_image_shape shape: [None, 3] type: <dtype: 'int32'>
mobilenet_v2 address: oss://pai-vision-data-sh/pretrained_models/saved_models/mobilenet_v2_1.0_224. The model outputs are as follows.
preprocessed_images shape: [None, 224, 224, 3] type: <dtype: 'float32'>
layer_1 shape: [None, 112, 112, 32] type: <dtype: 'float32'>
layer_2 shape: [None, 112, 112, 16] type: <dtype: 'float32'>
layer_3 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_4 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_5 shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_6 shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_7 shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_8 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_9 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_10 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_11 shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_12 shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_13 shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_14 shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_15 shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_16 shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_17 shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_18 shape: [None, 7, 7, 320] type: <dtype: 'float32'>
layer_19 shape: [None, 7, 7, 1280] type: <dtype: 'float32'>
layer_2/depthwise_output shape: [None, 112, 112, 32] type: <dtype: 'float32'>
layer_2/output shape: [None, 112, 112, 16] type: <dtype: 'float32'>
layer_3/expansion_output shape: [None, 112, 112, 96] type: <dtype: 'float32'>
layer_3/depthwise_output shape: [None, 56, 56, 96] type: <dtype: 'float32'>
layer_3/output shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_4/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
layer_4/depthwise_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
layer_4/output shape: [None, 56, 56, 24] type: <dtype: 'float32'>
layer_5/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
layer_5/depthwise_output shape: [None, 28, 28, 144] type: <dtype: 'float32'>
layer_5/output shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_6/expansion_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_6/depthwise_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_6/output shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_7/expansion_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_7/depthwise_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_7/output shape: [None, 28, 28, 32] type: <dtype: 'float32'>
layer_8/expansion_output shape: [None, 28, 28, 192] type: <dtype: 'float32'>
layer_8/depthwise_output shape: [None, 14, 14, 192] type: <dtype: 'float32'>
layer_8/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_9/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_9/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_9/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_10/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_10/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_10/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_11/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_11/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_11/output shape: [None, 14, 14, 64] type: <dtype: 'float32'>
layer_12/expansion_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_12/depthwise_output shape: [None, 14, 14, 384] type: <dtype: 'float32'>
layer_12/output shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_13/expansion_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_13/depthwise_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_13/output shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_14/expansion_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_14/depthwise_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_14/output shape: [None, 14, 14, 96] type: <dtype: 'float32'>
layer_15/expansion_output shape: [None, 14, 14, 576] type: <dtype: 'float32'>
layer_15/depthwise_output shape: [None, 7, 7, 576] type: <dtype: 'float32'>
layer_15/output shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_16/expansion_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_16/depthwise_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_16/output shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_17/expansion_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_17/depthwise_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_17/output shape: [None, 7, 7, 160] type: <dtype: 'float32'>
layer_18/expansion_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_18/depthwise_output shape: [None, 7, 7, 960] type: <dtype: 'float32'>
layer_18/output shape: [None, 7, 7, 320] type: <dtype: 'float32'>
AvgPool_1a shape: [None, 1, 1, 1280] type: <dtype: 'float32'>
Logits shape: [None, 1001] type: <dtype: 'float32'>
Predictions shape: [None, 1001] type: <dtype: 'float32'>
logits shape: [None, 1001] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
predictions shape: [None] type: <dtype: 'int32'>
original_image shape: [None, None, None, 3] type: <dtype: 'float32'>
original_image_shape shape: [None, 3] type: <dtype: 'int32'>
efficientnet_b0 address: oss://pai-vision-data-sh/pretrained_models/saved_models/efficientnet-b0. The model outputs are as follows.
stem shape: [None, 112, 112, 32] type: <dtype: 'float32'>
block_0/expansion_output shape: [None, 112, 112, 32] type: <dtype: 'float32'>
block_0 shape: [None, 112, 112, 16] type: <dtype: 'float32'>
reduction_1/expansion_output shape: [None, 112, 112, 32] type: <dtype: 'float32'>
reduction_1 shape: [None, 112, 112, 16] type: <dtype: 'float32'>
block_1/expansion_output shape: [None, 56, 56, 96] type: <dtype: 'float32'>
block_1 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
block_2/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
block_2 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
reduction_2/expansion_output shape: [None, 56, 56, 144] type: <dtype: 'float32'>
reduction_2 shape: [None, 56, 56, 24] type: <dtype: 'float32'>
block_3/expansion_output shape: [None, 28, 28, 144] type: <dtype: 'float32'>
block_3 shape: [None, 28, 28, 40] type: <dtype: 'float32'>
block_4/expansion_output shape: [None, 28, 28, 240] type: <dtype: 'float32'>
block_4 shape: [None, 28, 28, 40] type: <dtype: 'float32'>
reduction_3/expansion_output shape: [None, 28, 28, 240] type: <dtype: 'float32'>
reduction_3 shape: [None, 28, 28, 40] type: <dtype: 'float32'>
block_5/expansion_output shape: [None, 14, 14, 240] type: <dtype: 'float32'>
block_5 shape: [None, 14, 14, 80] type: <dtype: 'float32'>
block_6/expansion_output shape: [None, 14, 14, 480] type: <dtype: 'float32'>
block_6 shape: [None, 14, 14, 80] type: <dtype: 'float32'>
block_7/expansion_output shape: [None, 14, 14, 480] type: <dtype: 'float32'>
block_7 shape: [None, 14, 14, 80] type: <dtype: 'float32'>
block_8/expansion_output shape: [None, 14, 14, 480] type: <dtype: 'float32'>
block_8 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
block_9/expansion_output shape: [None, 14, 14, 672] type: <dtype: 'float32'>
block_9 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
block_10/expansion_output shape: [None, 14, 14, 672] type: <dtype: 'float32'>
block_10 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
reduction_4/expansion_output shape: [None, 14, 14, 672] type: <dtype: 'float32'>
reduction_4 shape: [None, 14, 14, 112] type: <dtype: 'float32'>
block_11/expansion_output shape: [None, 7, 7, 672] type: <dtype: 'float32'>
block_11 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_12/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_12 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_13/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_13 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_14/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_14 shape: [None, 7, 7, 192] type: <dtype: 'float32'>
block_15/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
block_15 shape: [None, 7, 7, 320] type: <dtype: 'float32'>
reduction_5/expansion_output shape: [None, 7, 7, 1152] type: <dtype: 'float32'>
reduction_5 shape: [None, 7, 7, 320] type: <dtype: 'float32'>
features shape: [None, 7, 7, 320] type: <dtype: 'float32'>
head_1x1 shape: [None, 7, 7, 1280] type: <dtype: 'float32'>
pooled_features shape: [None, 1280] type: <dtype: 'float32'>
global_pool shape: [None, 1280] type: <dtype: 'float32'>
class shape: [None] type: <dtype: 'int32'>
head shape: [None, 1000] type: <dtype: 'float32'>
logits shape: [None, 1000] type: <dtype: 'float32'>
probs shape: [None, 1001] type: <dtype: 'float32'>
predictions shape: [None] type: <dtype: 'int32'>
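
The listings above share one line format, so they can be inspected programmatically, for instance when choosing a value for feature_name. The sketch below parses such a line and computes the number of elements per image for a given output. The helper names are hypothetical. It is also an assumption, not stated in this topic, that a multi-dimensional output is flattened into the one-dimensional feature vector; under that assumption the computed product would be the vector length.

```python
import ast
import re
from math import prod  # available in Python 3.8 and later

# Matches lines such as:
#   resnet_v1_50/block4 shape: [None, 7, 7, 2048] type: <dtype: 'float32'>
LINE_RE = re.compile(r"^(\S+) shape: (\[.*?\]) type: <dtype: '(\w+)'>$")


def parse_output_line(line):
    """Return (name, shape, dtype) for one line of a model output listing."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized listing line: {!r}".format(line))
    name, shape_text, dtype = match.groups()
    # The shape is a Python list literal in which None marks an unknown size.
    shape = ast.literal_eval(shape_text)
    return name, shape, dtype


def elements_per_image(shape):
    """Multiply all dimensions after the leading batch dimension.

    Returns None when any of those dimensions is unknown (None).
    """
    dims = shape[1:]
    if any(d is None for d in dims):
        return None
    return prod(dims)


name, shape, dtype = parse_output_line(
    "resnet_v1_50/block4 shape: [None, 7, 7, 2048] type: <dtype: 'float32'>")
print(name, shape, dtype, elements_per_image(shape))
```

For resnet_v1_50/block4 the product is 7 × 7 × 2048 = 100352, while a pooled output such as AvgPool_1a yields 2048 elements per image.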