Referencing file resources in a Python UDF (MaxCompute)

This topic uses the MaxCompute client as an example to show how a Python UDF can reference a file resource.

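Prerequisites

Make sure that the following tasks are complete:

The MaxCompute client is installed and configured.
For more information, see Install and configure the MaxCompute client.
The file to be referenced has been added to the MaxCompute project as a resource.
The sample file resource used in this topic is test_distcache.txt, which contains the following data.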
```
1 a
2 b
3 c
4 d
```
For more information about adding resources, see Add resources.

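Code development and usage procedure

1. Code development

The following Python UDF loads the referenced file resource (test_distcache.txt in this example) and returns the value whose key matches the input argument.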

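from odps.udf import annotate
from odps.distcache import get_cache_file


@annotate('bigint->string')
class DistCacheExample(object):
    def __init__(self):
        # Open the file resource listed in the using clause at registration time.
        cache_file = get_cache_file('test_distcache.txt')
        kv = {}
        for line in cache_file:
            line = line.strip()
            # Skip blank lines.
            if not line:
                continue
            # Each line holds an integer key and a value separated by whitespace.
            k, v = line.split()
            kv[int(k)] = v
        cache_file.close()
        self.kv = kv

    def evaluate(self, arg):
        # Return the value for the key, or None (NULL in SQL) if the key is absent.
        return self.kv.get(arg)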

Save the code above as a Python script file (for example, file.py) and place it in the bin directory of the MaxCompute client.
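Before uploading, the parsing logic can be checked locally without the MaxCompute runtime. The following is a minimal sketch under stated assumptions: `load_kv` is a hypothetical helper name (not part of the odps package), and an in-memory `io.StringIO` object stands in for the file object that `get_cache_file` returns inside a UDF.

```python
import io


def load_kv(lines):
    """Parse 'key value' lines into a dict keyed by integer, skipping blank lines.

    Hypothetical helper that mirrors the loop in DistCacheExample.__init__.
    """
    kv = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        k, v = line.split()
        kv[int(k)] = v
    return kv


# Stand-in for the contents of test_distcache.txt (assumption: same four rows).
sample = io.StringIO("1 a\n2 b\n3 c\n4 d\n")
kv = load_kv(sample)
print(kv)
```

Keeping the parsing in a plain function like this makes it easy to test malformed or empty lines before the script is registered as a resource.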

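2. Upload the resource and register the function

After you develop and debug the UDF code, use the MaxCompute client to upload the resource to MaxCompute and register the function.

Run the following command to upload the Python script file as a MaxCompute resource.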
```
add py file.py;
```
The following result is returned.
```
OK: Resource 'file.py' have been created.
```
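For more information about the command for adding resources, see Add resources.
Run the following command to register the Python UDF as a function.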
```
create function file_udf as 'file.DistCacheExample' using 'file.py, test_distcache.txt';
```
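In this command:
- file_udf is the name of the registered Python UDF, which is the function name to call later in SQL statements.
- In file.DistCacheExample, file is the name of the file.py script file without its extension, and DistCacheExample is the class defined in file.py.
The following result is returned.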
```
Success: Function 'file_udf' have been created.
```
For more information about registering functions, see Register functions.
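The naming rule described above (script name without the extension, a dot, then the class name) can be illustrated with a small sketch. `build_create_function` is a hypothetical helper written only for this illustration. It is not part of the MaxCompute client and only assembles the statement text shown earlier.

```python
import os


def build_create_function(func_name, script_file, class_name, extra_resources=()):
    """Assemble a 'create function' statement string (illustrative helper only)."""
    # 'file.py' -> 'file': the module part of the class path.
    module = os.path.splitext(os.path.basename(script_file))[0]
    # The using clause lists the script plus every resource the UDF reads.
    resources = ", ".join([script_file] + list(extra_resources))
    return "create function {} as '{}.{}' using '{}';".format(
        func_name, module, class_name, resources
    )


stmt = build_create_function(
    "file_udf", "file.py", "DistCacheExample", ["test_distcache.txt"]
)
print(stmt)
```

Note that the file resource read by the UDF (test_distcache.txt) appears in the using clause alongside the script itself, as in the command above.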

3. Usage example

After the UDF is registered, run the following statements to create test data and call the registered function.

-- Create a test table.
create table file_table (arg bigint);
-- Insert data.
insert into file_table values (1), (4), (15), (123), (7995);
-- Call the newly registered function in a SQL statement to return the matching data from the file resource.
select file_udf(arg) from file_table;

The following result is returned.

+------+
| _c0  |
+------+
| a    |
| d    |
| NULL |
| NULL |
| NULL |
+------+
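The result can be reproduced locally as a sanity check. This is a sketch under assumptions: `LocalLookup` is a hypothetical stand-in for DistCacheExample that takes the dictionary directly instead of calling `get_cache_file`, and Python `None` is taken to correspond to the NULL values in the output above.

```python
class LocalLookup(object):
    """Hypothetical local stand-in for DistCacheExample, without the odps runtime."""

    def __init__(self, kv):
        self.kv = kv

    def evaluate(self, arg):
        # dict.get returns None for a missing key, which appears as NULL in SQL output.
        return self.kv.get(arg)


# Same key-value pairs as test_distcache.txt.
udf = LocalLookup({1: "a", 2: "b", 3: "c", 4: "d"})

# Same inputs as the rows inserted into file_table.
results = [udf.evaluate(arg) for arg in [1, 4, 15, 123, 7995]]
print(results)
```

Only the keys 1 and 4 exist in the file, so the other three inputs produce no match.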