Use MaxCompute Notebook

MaxCompute Notebook is a fully managed, interactive module for data analytics and data mining. It provides a web-based interactive development environment for data engineers, data analysts, and data scientists. You can use SQL, PyODPS, and Python to analyze and explore data, gain insights from data, and develop integrated big data and AI applications. This topic describes how to use the Notebook feature.

Version guide

The MaxCompute Notebook feature is in public preview. Each tenant can start a maximum of five Notebook instances, and each instance provides 2 CUs of free computing resources for development.

MaxCompute Notebook is available in the following regions: China (Hangzhou), China (Beijing), China (Shanghai), China (Shenzhen), and China (Ulanqab).
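The preview limits amount to simple arithmetic: at most five instances per tenant with 2 CUs each gives a tenant at most 10 CUs of free development compute. The Python sketch below shows this calculation for illustration only. The constant and function names are hypothetical and are not part of any MaxCompute SDK.

```python
# Public preview limits stated in this topic. The names are hypothetical;
# they are not part of any MaxCompute SDK.
MAX_INSTANCES_PER_TENANT = 5
FREE_CU_PER_INSTANCE = 2


def free_cu_in_use(running_instances: int) -> int:
    """Return the free development CUs provided by the running instances."""
    if not 0 <= running_instances <= MAX_INSTANCES_PER_TENANT:
        raise ValueError("a tenant can run 0 to 5 Notebook instances")
    return running_instances * FREE_CU_PER_INSTANCE


def can_start_another(running_instances: int) -> bool:
    """Return True if the tenant is still below the instance limit."""
    return running_instances < MAX_INSTANCES_PER_TENANT


print(free_cu_in_use(MAX_INSTANCES_PER_TENANT))  # 10
print(can_start_another(5))                      # False
```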

Note

If you have questions about using MaxCompute Notebook, you can search for and join the DingTalk group 29455027568 for support.

Prerequisites

  • A MaxCompute project is created. For more information, see Create a MaxCompute project.

  • A network connection is created between MaxCompute and a VPC. This allows the Notebook instance to access data in MaxCompute. For more information, see Create a network connection.

Notes

  • The network connection, file system, and MaxCompute project must be in the same region as the Notebook instance to ensure network connectivity.

  • Exercise caution when you delete a network connection. Before you delete a connection, make sure that no Notebook instance uses it. Otherwise, the Notebook instances that use the connection fail to start.
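You can check both notes before you create or clean up resources. The following is a minimal Python sketch with assumed inputs. `NotebookResources`, `region_mismatches`, and `safe_to_delete` are hypothetical helpers, not a MaxCompute API. The region IDs and connection IDs are example values that you would collect from the console yourself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NotebookResources:
    """Hypothetical record of the resources used by one Notebook instance."""
    instance_region: str
    network_connection_region: str
    file_system_region: str
    project_region: str


def region_mismatches(res: NotebookResources) -> List[str]:
    """Return the resources that are not in the same region as the instance."""
    others = {
        "network connection": res.network_connection_region,
        "file system": res.file_system_region,
        "MaxCompute project": res.project_region,
    }
    return [name for name, region in others.items() if region != res.instance_region]


def safe_to_delete(connection_id: str, connection_by_instance: Dict[str, str]) -> bool:
    """Return True only if no Notebook instance uses the network connection."""
    return connection_id not in connection_by_instance.values()


res = NotebookResources("cn-hangzhou", "cn-hangzhou", "cn-hangzhou", "cn-beijing")
print(region_mismatches(res))                       # ['MaxCompute project']
print(safe_to_delete("nc-1", {"nb-demo": "nc-1"}))  # False
```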

Benefits

MaxCompute Notebook is built on open source JupyterLab with extensive optimizations. Combined with the data processing capabilities of MaxCompute, it lets you perform data analytics, data mining, and data exploration in one place.

  • Support for multiple engines

    MaxCompute Notebook supports multiple Python development methods, such as PyODPS and MaxFrame. You can quickly start data analytics and data mining without changing your existing development methods.

  • Deep integration with MaxCompute

    You can create instances directly on your existing MaxCompute computing resources (quota groups). No complex configuration is required to start a MaxCompute Notebook instance.

  • Built-in function libraries

    MaxCompute Notebook includes built-in libraries for data analytics, data mining, and visualization, such as pandas, NumPy, Matplotlib (including its pyplot module), and pyecharts. You do not need to prepare a development environment before you start routine data mining and visual analytics.

  • Security

    MaxCompute Notebook uses bearer tokens for user authentication. When you connect to a MaxCompute cluster through MaxCompute Notebook, you do not need to configure AccessKey pairs. This reduces the risk of AccessKey pair leaks.
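To confirm which of these libraries are available in an instance, you can run a short check in a notebook cell. The following Python sketch uses only the standard library (`importlib.util.find_spec`). The list covers the libraries named in this topic. It assumes that PyODPS is imported as `odps` and MaxFrame as `maxframe`. A missing entry means only that the package cannot be imported in the current environment.

```python
import importlib.util

# Import names of the libraries mentioned in this topic. PyODPS is imported as
# "odps" and MaxFrame as "maxframe"; pyplot ships inside "matplotlib".
LIBRARIES = ["pandas", "numpy", "matplotlib", "pyecharts", "odps", "maxframe"]


def check_libraries(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, available in check_libraries(LIBRARIES).items():
    print(f"{name}: {'available' if available else 'missing'}")
```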

Quick Start

Step 1: Create an instance template

An instance template stores reusable instance settings. You can create a template on the Notebook page and then reference it when you create Notebook instances.

  1. Go to the Notebook list page: log on to the MaxCompute console and choose Workspace > Notebook in the navigation pane. On the Instance Template tab, click Create Instance Template.

  2. In the Create Instance Template dialog box, configure the following parameters.

    • Instance Template Name: The name of the Notebook instance template.

    • Description: A description of the instance template.

    • Automatic Release Settings: Specifies whether the instance is automatically released.

      • No: The instance is not automatically released.

      • Yes: The instance is automatically released after the number of hours that you specify.

    • Select Compute Engine: The MaxFrame software development kit (SDK) is built in and ready to use.

  3. Click OK to create the instance template.
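The Automatic Release Settings option comes down to a time calculation. The Python sketch below is illustrative and makes two assumptions: the countdown starts when the instance is created, and the hours value can be any positive number. This topic does not specify either. `auto_release_time` is a hypothetical helper. `None` stands for the No option.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def auto_release_time(created_at: datetime,
                      release_after_hours: Optional[float]) -> Optional[datetime]:
    """Return when an instance would be released, or None if auto-release is off.

    None models "Automatic Release Settings: No"; a positive number models
    "Yes" with that many hours, counted from created_at (an assumption).
    """
    if release_after_hours is None:
        return None
    if release_after_hours <= 0:
        raise ValueError("release_after_hours must be positive")
    return created_at + timedelta(hours=release_after_hours)


created = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
print(auto_release_time(created, 8))     # 2024-05-01 17:00:00+00:00
print(auto_release_time(created, None))  # None
```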

Step 2: Create a Notebook instance

  1. Log on to the MaxCompute console. In the navigation pane, choose Workspace > Notebook to go to the Notebook list page.

  2. On the Instance Management tab, click Create Instance and configure the following parameters.

    • Instance Name: The name of the Notebook instance.

    • Description: A description of the instance.

    • Associated Project: The MaxCompute project to associate with the instance. You can then perform data operations on the associated project without configuring an AccessKey pair (AccessKey ID and AccessKey secret).

    • Instance Creation Method: Create the instance from scratch or from an instance template.

    • Select Instance Template: Required when Instance Creation Method is set to From Instance Template. Select an existing instance template or create one. For more information, see Step 1: Create an instance template.

    • Automatic Release Settings: Required when Instance Creation Method is set to From Scratch.

      • No: The instance is not automatically released.

      • Yes: The instance is automatically released after the time that you specify.

    • Computing Resources: Select a quota group.

      Note

      Only pay-as-you-go quota groups are currently supported.

    • Storage Configuration: Select an existing data storage. A data storage mounts a NAS file system that you created so that your script files persist. You can also click Create Data Storage to create one. For more information, see Appendix: Create a data storage.

    • Compute Engine: The MaxFrame SDK is built in and ready to use.

    • Is this instance shared?

      • Visible to Tenant: All users within the tenant can see the instance.

      • Visible to Me Only: Only you and the administrator can see the instance.

  3. Click OK. The instance is ready to use when its Status changes to Running.

    Note

    After the instance is created, you can change its auto-release configuration. In the Actions column of the target instance, open the additional actions menu and select Auto-release Settings.
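Several parameters in the Create Instance form depend on one another. The Python sketch below encodes the rules stated in this step so that they are easier to see at a glance. `CreateInstanceForm` and `validate` are hypothetical and do not model any MaxCompute API. The project and instance names are placeholders, and the billing value is a plain label chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

CREATION_METHODS = ("From Scratch", "From Instance Template")
VISIBILITY_OPTIONS = ("Visible to Tenant", "Visible to Me Only")


@dataclass
class CreateInstanceForm:
    """Hypothetical model of the Create Instance form; not a MaxCompute API."""
    instance_name: str
    associated_project: str
    creation_method: str
    quota_group_billing: str             # billing method of the selected quota group
    visibility: str
    instance_template: Optional[str] = None
    auto_release: Optional[bool] = None  # Automatic Release Settings: Yes/No


def validate(form: CreateInstanceForm) -> List[str]:
    """Return the rule violations described in this topic (empty if valid)."""
    problems = []
    if form.creation_method not in CREATION_METHODS:
        problems.append("unknown Instance Creation Method")
    if form.creation_method == "From Instance Template" and not form.instance_template:
        problems.append("Select Instance Template is required")
    if form.creation_method == "From Scratch" and form.auto_release is None:
        problems.append("Automatic Release Settings is required")
    if form.quota_group_billing != "pay-as-you-go":
        problems.append("only pay-as-you-go quota groups are supported")
    if form.visibility not in VISIBILITY_OPTIONS:
        problems.append("unknown visibility option")
    return problems


form = CreateInstanceForm(
    instance_name="nb-demo",
    associated_project="my_project",
    creation_method="From Scratch",
    quota_group_billing="pay-as-you-go",
    visibility="Visible to Me Only",
)
print(validate(form))  # ['Automatic Release Settings is required']
```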

Step 3: Develop in Notebook

To help you get started quickly, MaxCompute Notebook provides a demo script. The script is based on MaxFrame and demonstrates distributed data processing with a pandas-style API. It covers data preparation, data analytics, data exploration, and distributed data processing. Click product_sales_demo_nb.ipynb to download the script.

  1. Go to the Notebook instance page and upload the demo script.

    In the Actions column of the target instance, click Enter. On the instance page, click the upload icon in the file browser on the left to upload the demo script.

  2. Enter the project information, execute the script file, and generate the visualization.

    1. In the pane on the left, double-click product_sales_demo_nb.ipynb to open it. Follow the instructions in the script to set PROJECT_NAME. You can use the project associated with the Notebook instance or another project.

      • Use the associated project:

        In the script file, find the Create ODPS object code block in the 2.2 Prepare data section. Replace project=PROJECT_NAME with project=os.getenv('ODPS_PROJECT_NAME'). In this case, you do not need to specify PROJECT_NAME in the 2.1 Prepare project section.

      • Use another project:

        Set PROJECT_NAME to the name of the MaxCompute project that you want to use for computation.

    2. At the top of the script file, click the icon that restarts the kernel and runs all code blocks. In the Restart Kernel? dialog box, click Restart to execute the script. Execution is complete when the kernel status indicator (the circle icon in the upper-right corner of the page) becomes an empty circle. The run is successful if no errors are reported in the code blocks.

    3. View the Matplotlib chart that visualizes the data mining and analysis results.
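The choice between the associated project and another project can be summarized in a small helper. This Python sketch is hypothetical: `resolve_project_name` is not in the demo script. It reads the `ODPS_PROJECT_NAME` environment variable only when no explicit project name is given, which mirrors the `project=os.getenv('ODPS_PROJECT_NAME')` replacement described above. You would pass the returned value as `project=` in the Create ODPS object code block.

```python
import os
from typing import Optional


def resolve_project_name(project_name: Optional[str] = None) -> str:
    """Choose the project for the demo script's Create ODPS object step.

    A non-empty project_name corresponds to "Use another project"; otherwise
    the associated project is read from ODPS_PROJECT_NAME.
    """
    if project_name:
        return project_name
    associated = os.getenv("ODPS_PROJECT_NAME")
    if not associated:
        raise RuntimeError("set PROJECT_NAME or associate a project with the instance")
    return associated


# "my_project" is a placeholder for the project you want to use.
print(resolve_project_name("my_project"))  # my_project
```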

Step 4 (Optional): Release the Notebook instance

After development is complete, go to the Instance Management tab on the Notebook page. In the Actions column of the target instance, click Stop and then click Delete to release the Notebook instance.
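The release procedure follows a fixed order: Stop first, then Delete. The Python sketch below models that order as a small state machine. It assumes, based only on the order of the console steps, that Delete requires a stopped instance. `NotebookInstance` is a hypothetical class, not an SDK type.

```python
class NotebookInstance:
    """Hypothetical lifecycle model of a Notebook instance (not an SDK class).

    Assumes Delete is only allowed after Stop, matching the order of the
    console steps in this topic.
    """

    def __init__(self, name: str):
        self.name = name
        self.status = "Running"

    def stop(self) -> None:
        if self.status == "Deleted":
            raise RuntimeError("the instance has already been deleted")
        self.status = "Stopped"

    def delete(self) -> None:
        if self.status != "Stopped":
            raise RuntimeError("stop the instance before deleting it")
        self.status = "Deleted"


inst = NotebookInstance("nb-demo")
inst.stop()
inst.delete()
print(inst.status)  # Deleted
```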

Appendix: Create a data storage

  1. On the Create Data Storage panel, configure the following parameters.

    • Data Storage Name: A custom name for the data storage.

    • Select Data Storage: Select Alibaba Cloud File Storage (General-purpose NAS file system).

    • Select File System: Select an existing file system. You can also click Create to create a General-purpose NAS file system in the File Storage console. For more information about how to create a file system, see Create a file system.

      Important

      When you create the file system, use the same VPC and vSwitch as the network connection.

    • File System Mount Target: The address of the mount target of the file system. For more information about how to obtain the address, see Manage mount targets.

    • Select Security Group: Select the same security group as the one used by the network connection.

    • File System Path: An existing path in the NAS file system, for example, /.

    • Default Mount Path: The path at which the file system is mounted in the Notebook instance, for example, /mnt/data.

  2. Click OK.
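File System Path and Default Mount Path together decide where a notebook file ends up in NAS. The Python sketch below shows that mapping, along with a check that the data storage uses the same network settings as the network connection. It assumes that the directory at File System Path is exposed at Default Mount Path and that only files under the mount path persist. `nas_path_for`, `network_matches`, and the dictionary keys are hypothetical names used for illustration.

```python
import posixpath
from typing import Dict


def nas_path_for(local_path: str, mount_path: str = "/mnt/data",
                 file_system_path: str = "/") -> str:
    """Map a path inside the Notebook instance to its location in the NAS file system.

    Assumes only files under mount_path are persisted to NAS.
    """
    local = posixpath.normpath(local_path)
    mount = posixpath.normpath(mount_path)
    prefix = mount.rstrip("/") + "/"
    if local != mount and not local.startswith(prefix):
        raise ValueError(f"{local_path} is outside {mount_path}; it is not stored in NAS")
    relative = posixpath.relpath(local, mount)
    return posixpath.normpath(posixpath.join(file_system_path, relative))


def network_matches(data_storage: Dict[str, str], connection: Dict[str, str]) -> bool:
    """Check that the data storage uses the same VPC, vSwitch, and security group
    as the MaxCompute network connection."""
    keys = ("vpc_id", "vswitch_id", "security_group_id")
    return all(k in data_storage and data_storage[k] == connection.get(k) for k in keys)


print(nas_path_for("/mnt/data/work/product_sales_demo_nb.ipynb"))
# /work/product_sales_demo_nb.ipynb
```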