MaxCompute Notebook is a fully managed, interactive module for data analytics and data mining. It provides a web-based interactive development environment for data engineers, data analysts, and data scientists. You can use SQL, PyODPS, and Python to analyze and explore data, gain insights from data, and develop integrated big data and AI applications. This topic describes how to use the Notebook feature.
Version guide
The MaxCompute Notebook feature is in public preview. Each tenant can start a maximum of five Notebook instances. Each instance provides 2 CU of free computing resources for development.
MaxCompute Notebook is available in the following regions: China (Hangzhou), China (Beijing), China (Shanghai), China (Shenzhen), and China (Ulanqab).
If you have questions about using MaxCompute Notebook, you can search for and join the DingTalk group 29455027568 for support.
Prerequisites
A MaxCompute project is created. For more information, see Create a MaxCompute project.
A network connection is created between MaxCompute and a VPC. This allows the Notebook instance to access data in MaxCompute. For more information, see Create a network connection.
Notes
The network connection, file system, and MaxCompute project must be in the same region as the Notebook instance to ensure network connectivity.
Exercise caution when you delete a network connection. Before you delete a connection, make sure that it is not being used by any Notebook instance. Otherwise, the instance will fail to start.
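The same-region requirement in the notes above can be checked before you create an instance. The following is a minimal sketch and not part of any MaxCompute SDK: the function name find_region_mismatches and the resource labels are hypothetical, and the region names are illustrative values taken from the list of supported regions.

```python
def find_region_mismatches(instance_region, resource_regions):
    """Return the resources whose region differs from the Notebook instance's region.

    resource_regions maps a (hypothetical) resource label to its region name.
    """
    return {
        name: region
        for name, region in resource_regions.items()
        if region != instance_region
    }


# Illustrative values only; replace them with your own resources.
resources = {
    "network connection": "China (Hangzhou)",
    "file system": "China (Shanghai)",
    "MaxCompute project": "China (Hangzhou)",
}
mismatches = find_region_mismatches("China (Hangzhou)", resources)
for name, region in mismatches.items():
    print(f"{name} is in {region}, not in the region of the Notebook instance")
```

An empty result means that every listed resource is in the same region as the instance.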
Benefits
MaxCompute Notebook is built on the open source JupyterLab and optimized for MaxCompute. It draws on the data processing capabilities of MaxCompute so that you can perform data analytics, data mining, and data exploration in a single location.
- Support for multiple engines
  MaxCompute Notebook supports multiple Python development methods, such as PyODPS and MaxFrame. You can quickly start data analytics and data mining without changing your existing development methods.
- Deep integration with MaxCompute
  You can quickly create instances based on your existing MaxCompute computing resource pools. You do not need to perform complex configurations to start a MaxCompute Notebook instance.
- Built-in function libraries
  MaxCompute Notebook includes many built-in extension libraries for data analytics, data mining, and visualization, such as pandas, numpy, pyplot, pyecharts, and matplotlib. This saves you the time required to prepare a development environment and meets your daily needs for data mining and visual analytics.
- Security
  MaxCompute Notebook uses bearer tokens for user authentication. When you connect to a MaxCompute cluster through MaxCompute Notebook, you do not need to configure AccessKey pairs. This reduces the risk of AccessKey pair leaks.
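To confirm which of the built-in libraries listed above are present in a given instance, you can run a quick check in a notebook cell. This is a minimal sketch that uses only the Python standard library. The import names in LIBRARIES are assumptions based on the library names above (pyplot is a submodule of matplotlib, so it is not listed separately), and check_libraries is a hypothetical helper.

```python
import importlib.util

# Import names assumed from the libraries mentioned above.
LIBRARIES = ["pandas", "numpy", "matplotlib", "pyecharts"]


def check_libraries(names):
    """Map each import name to True if it can be found in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


availability = check_libraries(LIBRARIES)
for name, found in availability.items():
    print(f"{name}: {'available' if found else 'not found'}")
```

The check only locates each package and does not import it, so it stays fast even for large libraries.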
Quick Start
Step 1: Create an instance template
You can create an instance template on the Notebook page. You can then reference this template when you create Notebook instances.
Go to the Notebook list page. On the Instance Template tab, click Create Instance Template.
In the Create Instance Template dialog box, configure the following parameters.
Parameter: Description
Instance Template Name: The name of the Notebook instance template to be created.
Description: A description of the instance template.
Automatic Release Settings: The time when the instance is automatically released.
  No: The instance is not automatically released.
  Yes: The instance is automatically released after a specified number of hours.
Select Compute Engine: The MaxFrame software development kit (SDK) is built-in and ready to use.
Click OK to create the instance template.
Step 2: Create a Notebook instance
Log on to the MaxCompute console. In the navigation pane, choose Workspace > Notebook to go to the Notebook list page.
On the Instance Management tab, click Create Instance and configure the parameters.
Parameter: Description
Instance Name: The name of the Notebook instance to be created.
Description: A description of the instance.
Associated Project: The MaxCompute project to associate with the instance. You can then perform subsequent data operations on the associated project without an AccessKey pair (AK/SK).
Instance Creation Method: You can create an instance from scratch or from an instance template.
Select Instance Template: This parameter is required when you set Instance Creation Method to From Instance Template. Select an existing instance template or create one. For more information, see Step 1: Create an instance template.
Automatic Release Settings: This parameter is required when you set Instance Creation Method to From Scratch.
  No: The instance is not automatically released.
  Yes: Specify the time after which the instance is automatically released.
Computing Resources: Select a quota group.
  Note: Currently, only pay-as-you-go quota groups are supported.
Storage Configuration: Select an existing data storage. You can mount a user-created NAS file system for script file persistence. You can also click Create Data Storage to create a new one. For more information, see Appendix: Create a data storage.
Compute Engine: The MaxFrame SDK is built in and ready to use.
Is this instance shared?
  Visible to Tenant: The instance is visible to all users within the tenant.
  Visible to Me Only: The instance is visible only to you and the administrator.
Click OK. The instance is ready to use when its Status changes to Running.
Note: After the instance is created, you can also change the auto-release configuration by using the Auto-release Settings option in the Actions column of the target instance.
Step 3: Develop in Notebook
To help you quickly get started with development, MaxCompute Notebook provides a demo script. The script is based on MaxFrame and shows how to perform distributed processing with Pandas. It covers data preparation, data analytics, data exploration, and distributed data processing. You can click product_sales_demo_nb.ipynb to download the script.
- Go to the Notebook instance page and upload the demo script.
  In the Actions column of the target instance, click Enter. On the instance page, click the upload icon on the left to upload the demo script.
- Enter the project information, execute the script file, and generate the visualization.
  - In the pane on the left, double-click the product_sales_demo_nb.ipynb script to open it. Follow the instructions in the script to set PROJECT_NAME. You can use the project associated with the Notebook instance or another project.
    Use the associated project: In the script file, find the Create ODPS object code block in the 2.2 Prepare data section. Replace project=PROJECT_NAME with project=os.getenv('ODPS_PROJECT_NAME'). In this case, you do not need to specify PROJECT_NAME in the 2.1 Prepare project section.
    Use another project: Set PROJECT_NAME to the name of the MaxCompute project that you want to use for computation.
  - At the top of the script file, click the icon that runs all code blocks. In the Restart Kernel? dialog box, click Restart. The script is then executed. The execution is successful when the circle icon in the upper-right corner of the page changes to an empty circle and no errors are reported in the code.
  - View the chart generated by Matplotlib to visualize the data mining and analysis results.
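The two ways of setting the project in Step 3 can be combined into a single lookup. The sketch below is one possible way to structure this and is not code from the demo script: resolve_project_name is a hypothetical helper, while ODPS_PROJECT_NAME is the environment variable named in the instructions above.

```python
import os


def resolve_project_name(project_name=None):
    """Return project_name if it is set, else the project associated with the instance."""
    # ODPS_PROJECT_NAME is the environment variable referenced in Step 3.
    name = project_name or os.getenv("ODPS_PROJECT_NAME")
    if not name:
        raise ValueError(
            "Set PROJECT_NAME or associate a project with the Notebook instance."
        )
    return name
```

With this helper, an explicitly set PROJECT_NAME takes precedence, and the associated project is used only when no name is given. For example, resolve_project_name("my_project") returns "my_project" regardless of the environment.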
Step 4 (Optional): Release the Notebook instance
After development is complete, go to the Instance Management tab on the Notebook page. In the Actions column of the target instance, click Stop and then click Delete to release the Notebook instance.
Appendix: Create a data storage
On the Create Data Storage panel, configure the following parameters.
Parameter: Description
Data Storage Name: You can specify a custom name.
Select Data Storage: Alibaba Cloud File Storage (General-purpose NAS file system).
Select File System: Select an existing file system. You can also click Create to create a General-purpose NAS file system in the File Storage console. For more information about how to create a file system, see Create a file system.
  Important: When you create the file system, the VPC and vSwitch must be the same as those used for the network connection.
File System Mount Target: The address of the file system mount target. For more information about how to obtain the address, see Manage mount targets.
Select Security Group: Use the same security group as the one used for the network connection.
File System Path: The existing storage path in NAS. For example, /.
Default Mount Path: The mount path for the file system. For example, /mnt/data.
Click OK.
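After the data storage is mounted, files written under the mount path are stored on the NAS file system rather than on the instance itself. The following is a minimal sketch of saving a file there. It assumes the example mount path /mnt/data from the table above, and save_to_mount is a hypothetical helper, not part of any MaxCompute library.

```python
from pathlib import Path


def save_to_mount(relative_path, content, mount_root="/mnt/data"):
    """Write text under the mounted file system and return the resulting path.

    mount_root defaults to the example mount path; use your own Default Mount Path.
    """
    target = Path(mount_root) / relative_path
    # Create intermediate directories so nested paths work on first use.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

For example, save_to_mount("scripts/notes.txt", "draft") would write the file to /mnt/data/scripts/notes.txt, provided that the mount path exists and is writable in your instance.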