This document guides you through the process of deploying a model and calling its API on the FunModel platform. You will learn how to select and configure compute instances, manage service credentials, make inference requests, and perform basic troubleshooting. This guide helps you integrate the AI model capabilities of the FunModel platform into your applications.
Preparations
Before you begin, make sure that you have a valid Alibaba Cloud account and are logged on to the FunModel console.
Switch to the new console: If you are using the old version, click New Console in the upper-right corner of the page.
Complete authorization: The first time you log on, follow the on-screen instructions to configure RAM role authorization.
Deploy and call a model service
The following steps describe how to deploy a model as an online service and call it. This process applies both to traditional models, such as OCR and speech recognition, and to large language models (LLMs).
Step 1: Select a model
In the Model Marketplace, select a model that fits your business scenario. For example:
Traditional model: iic/cv_convnextTiny_ocr-recognition-general_damo (OCR).
Large language model (LLM): Qwen/Qwen3-8B (Qwen 8B model).
Step 2 (Optional for some model services): Test the model
Before deployment, you can use the Quick Experience feature to check if the model's performance meets your expectations.
Select a model to open its details page.
In the Quick Experience section, click Run Test. This runs an inference using preset test data.
Review the output to determine whether the model's features meet your needs.
Step 3: Configure and deploy the model
In this step, you will deploy the model as an online service and allocate the required compute resources.
On the model details page, click Deploy Now.
On the configuration page, the core configuration items are Instance Type and GPU Specification. These specifications determine the performance and cost of the service. For more information about instance types, see Instance types and specifications.
Instance type descriptions and recommendations:
Instance type | vCPU | Memory | GPU memory | Scenarios
Basic GPU | 4 cores | 16 GB | 8 GB | Feature validation for traditional models, low-frequency calls
Advanced GPU | 8 cores | 32 GB | 16 GB | Traditional models in production environments, lightweight LLMs
Performance GPU | 8 cores | 64 GB | 48 GB | LLM inference, image generation, and other GPU-intensive tasks
Performance GPU (Multi-card) | 16 cores | 128 GB | 48 GB × 2 | High-performance inference for large-scale LLMs
Click Deploy Now. Once the service deployment is complete, the page automatically redirects to the service details page.
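As a rough aid for choosing among the instance types listed above, the following sketch estimates the GPU memory needed for model weights and picks the smallest instance type that fits. It is a back-of-the-envelope estimate, not a FunModel sizing rule: the 2 bytes per parameter (16-bit weights) and the 1.2 overhead factor are assumptions, and the function names are hypothetical. Actual requirements also depend on context length, batch size, and the inference engine.

```python
# GPU memory per instance type, in GB, taken from the table in this guide.
INSTANCE_TYPES = [
    ("Basic GPU", 8),
    ("Advanced GPU", 16),
    ("Performance GPU", 48),
    ("Performance GPU (Multi-card)", 96),  # 48 GB x 2
]


def estimate_weight_memory_gb(params_billion, bytes_per_param=2):
    """Approximate memory for weights only: 1e9 params * bytes / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def pick_instance(params_billion, bytes_per_param=2, overhead=1.2):
    """Return the smallest instance type whose GPU memory covers the estimate.

    `overhead` is an assumed safety margin for activations and cache.
    Returns None if no listed instance type is large enough.
    """
    needed = estimate_weight_memory_gb(params_billion, bytes_per_param) * overhead
    for name, gpu_gb in INSTANCE_TYPES:
        if gpu_gb >= needed:
            return name
    return None
```

Under these assumptions, an 8-billion-parameter model with 16-bit weights needs about 16 GB for weights alone, so `pick_instance(8)` lands on Performance GPU rather than Advanced GPU.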
Step 4: Call the model service
After the service is deployed, you can interact with the model in one of two ways.
Method 1: Online call
Use this method to quickly verify in the console that the deployed service is processing input and generating output correctly.
On the service details page, click the Online Debugging tab.
The system automatically populates the request with sample parameters, which you can modify as needed.
Click Send Request.
In the Response section on the right, you can view the model's response.
Method 2: API call
Use this method to integrate the model's capabilities into your application by making standard HTTP requests.
Obtain the service credentials and endpoint
Before making an API call, you need to obtain two key pieces of information from the section on the model details page:
API endpoint: The exclusive access URL for the service.
Bearer Token: Used for identity authentication in API calls.
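One way to keep these two values out of source code is to read them from environment variables. The sketch below assumes the variable names FUNMODEL_ENDPOINT and FUNMODEL_TOKEN, which are hypothetical and not defined by the platform, and builds the request headers from the token.

```python
import os


def load_credentials(env=os.environ):
    """Read the endpoint and token from environment variables.

    FUNMODEL_ENDPOINT and FUNMODEL_TOKEN are hypothetical names chosen
    for this example; use whatever names fit your deployment.
    """
    endpoint = env.get("FUNMODEL_ENDPOINT", "").rstrip("/")
    token = env.get("FUNMODEL_TOKEN", "")
    if not endpoint or not token:
        raise RuntimeError("Set FUNMODEL_ENDPOINT and FUNMODEL_TOKEN first")
    return endpoint, token


def auth_headers(token):
    """Build the HTTP headers for an authenticated JSON request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```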
Note: To ensure service security, FunModel recommends that you enable authenticated access. With authentication enabled, you must include a valid Authorization: Bearer <YOUR_TOKEN> header in the HTTP request to prevent unauthorized access. Disabling authentication means anyone who knows your service's API endpoint can call it. This poses a security risk and is only recommended for temporary testing in a trusted internal network.
Construct and send the request
Models on FunModel may follow different API specifications.
Large language models (LLMs) that are compatible with the OpenAI API
LLM services that are deployed on FunModel provide an API endpoint that is compatible with the OpenAI v1/chat/completions interface. This compatibility makes it easy to migrate existing applications.
The following curl example shows how to call the Qwen/Qwen3-8B model. Replace the url and Authorization values with your service's information.
curl --request POST \
  --url https://YOUR_SERVICE_URL/v1/chat/completions \
  --header 'Authorization: Bearer YOUR_BEARER_TOKEN' \
  --header 'content-type: application/json' \
  --data '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello, please tell me about Hangzhou."}
    ],
    "stream": false,
    "temperature": 0.8,
    "max_tokens": 1024
  }'
Key request parameter descriptions:
Parameter | Type | Required | Description
model | string | Yes | Specifies the ID of the model to call. It must match the model you deployed.
messages | array | Yes | The conversation history, containing role and content.
stream | boolean | No | Specifies whether to return a stream. The default is false.
temperature | float | No | Controls the randomness of the generated text. The value is between 0 and 2. A higher value makes the response more creative.
max_tokens | integer | No | Controls the maximum length of a single generated response.
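The same request can be built from application code. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the response parsing assumes the service returns the usual OpenAI chat completions shape (choices[0].message.content), which this guide implies through its compatibility statement but does not show.

```python
import json
import urllib.request


def build_chat_request(service_url, token, user_text,
                       model="Qwen/Qwen3-8B", temperature=0.8, max_tokens=1024):
    """Build a POST request that mirrors the curl example (non-streaming)."""
    payload = {
        "model": model,  # must match the deployed model
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "stream": False,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text, assuming an OpenAI-style response dict."""
    return response_body["choices"][0]["message"]["content"]


# Usage (performs a real network call; replace the placeholders first):
# req = build_chat_request("https://YOUR_SERVICE_URL", "YOUR_BEARER_TOKEN",
#                          "Hello, please tell me about Hangzhou.")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(extract_reply(json.load(resp)))
```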
Traditional models
The API format for traditional models is typically simpler. The following curl example shows how to call an OCR model.
curl --request POST \
  --url https://YOUR_OCR_SERVICE_URL/ \
  --header 'Authorization: Bearer YOUR_BEARER_TOKEN' \
  --header 'content-type: application/json' \
  --data '{"input":{"image":"http://modelscope.oss-cn-beijing.aliyuncs.com/demo/images/image_ocr_recognition.jpg"}}'
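The OCR call can be sketched the same way in Python. The function name is hypothetical, and because this guide does not document the response schema for traditional models, the example only builds the request and leaves the response as raw JSON.

```python
import json
import urllib.request


def build_ocr_request(service_url, token, image_url):
    """Build a POST request that mirrors the OCR curl example."""
    payload = {"input": {"image": image_url}}
    return urllib.request.Request(
        url=service_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (performs a real network call; replace the placeholders first):
# req = build_ocr_request(
#     "https://YOUR_OCR_SERVICE_URL/",
#     "YOUR_BEARER_TOKEN",
#     "http://modelscope.oss-cn-beijing.aliyuncs.com/demo/images/image_ocr_recognition.jpg",
# )
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.load(resp))  # response schema depends on the model
```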
Billing information
The FunModel platform is free to use. However, you are charged for the underlying cloud resources that are consumed when you deploy and call services. These fees are settled in your Alibaba Cloud account and primarily include the following:
Function Compute (FC) fees: These are the core computing costs for running the model. You are billed on a pay-as-you-go basis according to the instance type and running time that you select.
Apsara File Storage NAS (NAS) fees: Model files are stored in NAS. You are billed on a pay-as-you-go basis according to the amount of storage space that is used.
Simple Log Service (SLS) fees: Service logs are collected in SLS for querying. You are billed based on usage.
To prevent service interruptions due to overdue payments, make sure that your Alibaba Cloud account has a sufficient balance. The related Alibaba Cloud services typically offer a free quota. Usage that exceeds the free quota is billed on a pay-as-you-go basis. For more information, see the official pricing documentation for each cloud product.
Troubleshooting
If you encounter a problem, logs are the primary source of information for troubleshooting. Always check the logs first to identify the cause of a problem.
Deployment failures
If a model deployment fails, on the model details page, click to view detailed error messages.
OOMKilled (Out of Memory): This error indicates insufficient memory or GPU memory. It typically occurs when the selected instance type is too small for a large model. To resolve this issue, try upgrading to an instance with a higher configuration.
ImagePullBackoff / ErrImagePull: The image failed to pull. Check your network configuration or contact technical support.
Download timeout: The model file download timed out. This error is typically caused by a large model file or network fluctuations. Try to deploy the model again.
Call failures
If a model service call fails, first check the returned HTTP status code. Then, use the request ID (x-fc-request-id) to find the detailed logs on the model details page.
403 Forbidden: This error indicates that authentication failed, which typically means your API key (Bearer Token) is invalid. Check the following:
  The format of the Authorization request header must be Bearer sk-xxxxxxxx.
  The provided Bearer Token must be complete, correct, and not have expired.
  Check the Message field in the response body. It provides the specific reason for the failure, such as access denied due to invalid bearer token.
429 Too Many Requests: The call frequency exceeds the service's concurrency limit. To resolve this issue, increase the number of instances in the service's advanced settings or optimize your call logic.
502 Bad Gateway / 504 Gateway Timeout: A backend service error or timeout occurred. Check the operational logs for information about program crashes or inference timeouts.
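On the client side, one way to "optimize your call logic" is to retry transient failures with exponential backoff and to surface the request ID when giving up. The sketch below is an illustration under assumptions, not platform-provided code: the function names, the retry count, and the delay values are hypothetical, and it assumes x-fc-request-id is returned as a response header.

```python
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 502, 504}  # status codes worth retrying

HINTS = {
    403: "check the Authorization header and the Bearer Token",
    429: "reduce call frequency or increase the number of instances",
    502: "check the service logs for program crashes",
    504: "check the service logs for inference timeouts",
}


def backoff_delays(max_retries, base=1.0, cap=30.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(max_retries)]


def call_with_retry(req, max_retries=3,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Send `req`, retrying 429/502/504 and raising with the request ID otherwise."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            with opener(req, timeout=60) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code in RETRYABLE and attempt < max_retries:
                sleep(delays[attempt])
                continue
            request_id = err.headers.get("x-fc-request-id") if err.headers else None
            hint = HINTS.get(err.code, "check the service logs")
            raise RuntimeError(
                f"HTTP {err.code} (request id: {request_id}): {hint}"
            ) from err
```

A 403 is not retried because a repeated call with the same token would fail the same way.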
Best practices
Cost control: Before deployment, use the Quick Experience feature to verify the model's performance and avoid unnecessary resource overhead. For compute-intensive tasks, such as large model inference, select a suitable compute-optimized instance to balance cost and efficiency.
Performance monitoring: On the Monitoring tab of the service details page, track core metrics such as Function Invocations, Function Running Time, Memory Usage, and GPU Memory Usage. You can also configure alert rules to promptly detect and address performance issues.
Continuous optimization and configuration adjustment: Dynamically adjust service configurations based on your business load and monitoring data to balance performance and cost. For example, you can increase the number of instances to handle high concurrency or upgrade the instance type to reduce inference latency.
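When deciding how many instances to configure for a given load, a rough starting point is Little's law: the average number of in-flight requests equals the request rate multiplied by the average latency. The sketch below applies it. The function name is hypothetical, and the per-instance concurrency value is an assumption you would need to confirm against your own service settings and monitoring data.

```python
import math


def estimate_instances(requests_per_second, avg_latency_seconds,
                       per_instance_concurrency=1):
    """Estimate the instance count needed to absorb the average load.

    in-flight requests = rate * latency (Little's law), divided by the
    assumed number of requests one instance handles at the same time.
    """
    in_flight = requests_per_second * avg_latency_seconds
    return max(1, math.ceil(in_flight / per_instance_concurrency))
```

For example, 5 requests per second at 2 seconds each means about 10 requests in flight, so 10 instances at a concurrency of 1, or 3 instances at a concurrency of 4. Treat the result as a baseline and add headroom for traffic peaks.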