Read OSS objects with the File-Like interface of Python SDK V2

This topic describes how to use the File-Like interface, new in Python SDK V2, to access objects in a bucket.

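Usage notes

The sample code in this topic uses the region ID cn-hangzhou of the China (Hangzhou) region and the public endpoint by default. To access OSS from another Alibaba Cloud service in the same region as OSS, use the internal endpoint. For the mapping between OSS regions and endpoints, see OSS regions and endpoints.
The examples in this topic read access credentials from environment variables. For how to configure access credentials, see Configure access credentials.
To download objects, you must have the oss:GetObject permission. For details, see Grant a custom permission policy to a RAM user.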
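
As a rough illustration of the oss:GetObject permission mentioned above, a custom RAM policy that allows only object reads on one bucket could look like the following sketch. The bucket name examplebucket is a placeholder and not a value from this topic; adjust the Resource entry to your own bucket or prefix.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "examplebucket"

# Minimal custom policy sketch: allow reading objects in a single bucket.
policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["oss:GetObject"],
            "Resource": [f"acs:oss:*:*:{BUCKET}/*"],
        }
    ],
}

# Serialize the policy to the JSON document form used for custom policies.
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```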

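Method definition

Python SDK V2 adds the File-Like interface, which accesses an object in a bucket as a read-only file (ReadOnlyFile).

Two modes are available: single-stream, and concurrent with prefetch. You can adjust the concurrency to suit your scenario and improve read speed.
The interface reconnects internally when a connection drops, which makes it more robust in complex network environments.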

class ReadOnlyFile:
    ...


def open_file(self, bucket: str, key: str, version_id: Optional[str] = None, request_payer: Optional[str] = None, **kwargs) -> ReadOnlyFile:
    ...

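Request parameters

Parameter	Type	Description
bucket	str	The name of the bucket.
key	str	The name of the object.
version_id	str	The version ID of the object. Takes effect only when versioning is enabled.
request_payer	str	Set to 'requester' when pay-by-requester is enabled.
**kwargs	Any	(Optional) Additional options, collected as a dict of keyword arguments.

The kwargs options are described below:

Parameter	Type	Description
enable_prefetch	bool	Whether to enable prefetch mode. Disabled by default.
prefetch_num	int	The number of prefetch chunks. Default: 3. Takes effect only when prefetch mode is enabled.
chunk_size	int	The size of each prefetch chunk. Default: 6 MiB. Takes effect only when prefetch mode is enabled.
prefetch_threshold	int	The number of bytes that must be read sequentially before prefetch mode starts. Default: 20 MiB. Takes effect only when prefetch mode is enabled.
block_size	int	The block size. Default: None.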
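
The prefetch options above can be collected in one place before calling open_file. The sketch below is only an illustration: prefetch_options is a hypothetical helper, not part of the SDK, and it assumes that chunk_size and prefetch_threshold are expressed in bytes. The bucket and object names in the final comment are placeholders.

```python
MIB = 1024 * 1024


def prefetch_options(prefetch_num=3, chunk_mib=6, threshold_mib=20):
    """Build keyword arguments for open_file with prefetch mode enabled.

    The defaults mirror the documented defaults. Sizes are taken in MiB
    and converted to bytes (the byte unit is an assumption).
    """
    if prefetch_num < 1 or chunk_mib < 1 or threshold_mib < 0:
        raise ValueError("invalid prefetch settings")
    return {
        "enable_prefetch": True,
        "prefetch_num": prefetch_num,
        "chunk_size": chunk_mib * MIB,
        "prefetch_threshold": threshold_mib * MIB,
    }


opts = prefetch_options(prefetch_num=4, chunk_mib=8)
print(opts)

# With a configured client (not created in this sketch), the options
# would be passed through **kwargs:
# f = client.open_file(bucket="examplebucket", key="exampleobject", **opts)
```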

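Return values

Return value	Type	Description
file	ReadOnlyFile	An instance of the read-only file.

Commonly used methods of ReadOnlyFile:

Method	Description
close(self)	Closes the file handle and releases resources such as memory and active sockets.
read(self, n=None)	Reads up to n bytes from the object and returns them as bytes. If n is omitted, reads the remaining data.
seek(self, pos, whence=0)	Sets the offset for the next read. Values of whence: 0: relative to the start, 1: relative to the current offset, 2: relative to the end.
Stat() (os.FileInfo, error)	Gets information about the object, including its size, last modified time, and metadata.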
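
The three whence values can be tried out without network access. The sketch below uses io.BytesIO as a stand-in for ReadOnlyFile, on the assumption that both follow the same seek and read conventions; read_tail is a hypothetical helper name.

```python
import io
import os


def read_tail(f, n):
    """Return the last n bytes of a seekable, readable file-like object.

    The caller is expected to pass an n no larger than the object size.
    """
    # whence=2 (os.SEEK_END): the offset is relative to the end.
    f.seek(-n, os.SEEK_END)
    return f.read()


# Stand-in for an object opened with open_file.
stand_in = io.BytesIO(b"hello, oss file-like")

# whence=0 (os.SEEK_SET): 7 bytes from the start.
stand_in.seek(7, os.SEEK_SET)
first = stand_in.read(3)

# whence=1 (os.SEEK_CUR): skip one byte from the current offset.
stand_in.seek(1, os.SEEK_CUR)
rest = stand_in.read()

tail = read_tail(stand_in, 9)
print(first, rest, tail)
```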

Important

When prefetch mode is enabled, multiple out-of-order reads cause the file to fall back to single-stream mode automatically.

Sample code

Read an entire object in single-stream mode

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser for the sample's inputs
parser = argparse.ArgumentParser(description="open file sample")

# --region: the region in which the bucket is located (required)
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# --bucket: the name of the bucket (required)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# --endpoint: the domain name that other services use to access OSS (optional)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# --key: the name of the object, that is, its path in the bucket (required)
parser.add_argument('--key', help='The name of the object.', required=True)


def main():
    # Parse the command-line arguments
    args = parser.parse_args()

    # Load credentials (AccessKeyId and AccessKeySecret) from environment variables
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Load the SDK's default configuration
    cfg = oss.config.load_default()

    # Set the credentials provider
    cfg.credentials_provider = credentials_provider

    # Set the region in which the bucket is located
    cfg.region = args.region

    # If a custom endpoint was supplied, apply it to the configuration
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Initialize the OSS client with the configuration
    client = oss.Client(cfg)

    # Call open_file to open the object in the bucket
    result = client.open_file(
        bucket=args.bucket,          # Name of the target bucket
        key=args.key,                # Name (path) of the target object
    )

    # Read the whole object, decode it to a string, and print it
    print(f'content: {result.read().decode()}')

    # Close the file object to release resources
    result.close()


if __name__ == "__main__":
    # Entry point: run main
    main()
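
The sample above loads the whole object into memory with a single read() call. For large objects, reading in fixed-size chunks is usually gentler on memory. The sketch below computes a SHA-256 digest chunk by chunk; it uses io.BytesIO as a stand-in for the object returned by open_file and assumes that read(n) returns empty bytes at the end of the object. sha256_of_stream is a hypothetical helper name.

```python
import hashlib
import io


def sha256_of_stream(f, chunk_size=1024 * 1024):
    """Hash a readable file-like object without loading it all at once."""
    digest = hashlib.sha256()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            # Empty bytes signals the end of the stream.
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for: client.open_file(bucket=..., key=...)
stand_in = io.BytesIO(b"example object content")
print(sha256_of_stream(stand_in, chunk_size=8))
```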

Read an entire object with prefetch mode enabled

import argparse
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser for the sample's inputs
parser = argparse.ArgumentParser(description="open file sample")

# --region: the region in which the bucket is located (required)
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# --bucket: the name of the bucket (required)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# --endpoint: the domain name that other services use to access OSS (optional)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# --key: the name of the object, that is, its path in the bucket (required)
parser.add_argument('--key', help='The name of the object.', required=True)


def main():
    # Parse the command-line arguments
    args = parser.parse_args()

    # Load credentials (AccessKeyId and AccessKeySecret) from environment variables
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Load the SDK's default configuration
    cfg = oss.config.load_default()

    # Set the credentials provider
    cfg.credentials_provider = credentials_provider

    # Set the region in which the bucket is located
    cfg.region = args.region

    # If a custom endpoint was supplied, apply it to the configuration
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Initialize the OSS client with the configuration
    client = oss.Client(cfg)

    # Call open_file to open the object in the bucket
    result = client.open_file(
        bucket=args.bucket,          # Name of the target bucket
        key=args.key,                # Name (path) of the target object
        enable_prefetch=True,        # Enable prefetch mode (disabled by default)
    )

    # Read the whole object, decode it to a string, and print it
    print(f'content: {result.read().decode()}')

    # Close the file object to release resources
    result.close()


if __name__ == "__main__":
    # Entry point: run main
    main()
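
Because repeated out-of-order reads make a prefetch-enabled file fall back to single-stream mode, prefetch suits access patterns that read front to back. The sketch below copies a readable object to a destination strictly in order using shutil.copyfileobj from the standard library. io.BytesIO stands in for both the OSS object and the local file, and download_sequentially is a hypothetical helper name; the 6 MiB default simply matches the documented chunk_size default.

```python
import io
import shutil


def download_sequentially(src, dst, chunk_size=6 * 1024 * 1024):
    """Copy src to dst front to back using fixed-size reads."""
    # copyfileobj calls src.read(chunk_size) repeatedly and never seeks,
    # so the reads stay strictly sequential.
    shutil.copyfileobj(src, dst, chunk_size)


# Stand-ins for client.open_file(..., enable_prefetch=True) and a local file.
src = io.BytesIO(b"x" * 100)
dst = io.BytesIO()
download_sequentially(src, dst, chunk_size=16)
print(len(dst.getvalue()))
```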

Use the seek method to read the remaining data from a specified position

import argparse
import os
import io
import alibabacloud_oss_v2 as oss

# Create a command-line argument parser for the sample's inputs
parser = argparse.ArgumentParser(description="open file sample")

# --region: the region in which the bucket is located (required)
parser.add_argument('--region', help='The region in which the bucket is located.', required=True)

# --bucket: the name of the bucket (required)
parser.add_argument('--bucket', help='The name of the bucket.', required=True)

# --endpoint: the domain name that other services use to access OSS (optional)
parser.add_argument('--endpoint', help='The domain names that other services can use to access OSS')

# --key: the name of the object, that is, its path in the bucket (required)
parser.add_argument('--key', help='The name of the object.', required=True)


def main():
    # Parse the command-line arguments
    args = parser.parse_args()

    # Load credentials (AccessKeyId and AccessKeySecret) from environment variables
    credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

    # Load the SDK's default configuration
    cfg = oss.config.load_default()

    # Set the credentials provider
    cfg.credentials_provider = credentials_provider

    # Set the region in which the bucket is located
    cfg.region = args.region

    # If a custom endpoint was supplied, apply it to the configuration
    if args.endpoint is not None:
        cfg.endpoint = args.endpoint

    # Initialize the OSS client with the configuration
    client = oss.Client(cfg)

    # Placeholder for the read-only file object
    rf: oss.ReadOnlyFile = None

    # Open the object in a with statement so it is closed automatically afterwards
    with client.open_file(args.bucket, args.key) as f:
        rf = f  # Keep a reference to the file object in rf

        # Move the file pointer to offset 1, relative to the start of the file
        f.seek(1, os.SEEK_SET)

        # Read the remaining data into an in-memory byte stream (BytesIO)
        copied_stream = io.BytesIO(rf.read())

        # Print the number of bytes written to the byte stream
        print(f'written: {len(copied_stream.getvalue())}')

        # Print the data that was read (shown as a bytes literal, not decoded)
        print(f'read: {copied_stream.getvalue()}')


if __name__ == "__main__":
    # Entry point: run main
    main()
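
The sample above reads everything after the seek position. When only a bounded slice is needed, seek can be combined with read(n). The sketch below reads length bytes starting at offset and loops in case a single read(n) call returns fewer bytes than requested, which is a defensive assumption rather than documented behavior. io.BytesIO stands in for ReadOnlyFile, and read_range is a hypothetical helper name.

```python
import io
import os


def read_range(f, offset, length):
    """Read up to length bytes starting at offset from a file-like object."""
    f.seek(offset, os.SEEK_SET)
    parts = []
    remaining = length
    while remaining > 0:
        chunk = f.read(remaining)
        if not chunk:
            # Reached the end of the object before length bytes were read.
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


# Stand-in for: client.open_file(bucket, key)
stand_in = io.BytesIO(b"0123456789")
print(read_range(stand_in, 3, 4))
```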
