DeepSeek-OCR quick start guide

This topic describes how to use the DevPod feature of FunModel to quickly launch a cloud-based GPU development environment. This environment is pre-configured with the DeepSeek-OCR model, its dependencies, and sample code. You can use this environment to validate model features, perform custom development, or run performance tests.

Prerequisites

Before you begin, make sure that you have an active Alibaba Cloud account and are logged on to the FunModel console.

  1. Switch to the new console: If you are using the old version, click New Console in the upper-right corner of the page.

  2. Complete authorization: When you log on for the first time, follow the on-screen instructions to grant permissions to a RAM role and complete other configurations.

Create a DevPod development environment

  1. In the FunModel console, click Custom Development and select Custom Environment.

  2. Configure the development environment parameters:

    • Container Image:

      • Regions in China: serverless-registry.cn-hangzhou.cr.aliyuncs.com/functionai/devpod-presets:deepseek-ocr-v1

      • Regions outside China: serverless-registry.ap-southeast-1.cr.aliyuncs.com/functionai/devpod-presets:deepseek-ocr-v1

    • Model Name: Enter a name for the development environment, for example, deepseek-ocr-dev. This name is used as your workspace name and determines the storage path for the model file on NAS (/mnt/deepseek-ocr-dev).

    • Model Source > Model ID: deepseek-ai/DeepSeek-OCR.

    • Startup Command and Listener Port: Keep the default settings.

    • Instance Type: Select Elastic Instance > GPU Compute-optimized Instance.

      • Recommendation: To ensure that the model runs stably, select a GPU instance with at least 16 GB of GPU memory.

      • Cost note: The GPU instance is billed on a pay-as-you-go basis while the DevPod is running. This can be expensive. To save costs, shut down the development environment when it is not in use.

    • Role Name: Select AliyunFCDefaultRole.

  3. Launch the development environment:

    Click DevPod Development and Debugging.

Configuration and testing

After the environment starts, the VS Code WebIDE interface opens automatically. In this interface, you can upload test images and run sample scripts to perform inference.

Verify the environment

  • Run the following commands in the terminal at the bottom of the IDE to check if the environment is ready.

    # 1. Check if the GPU is available. A list of GPU information should appear.
    nvidia-smi
    
    # 2. Check if the model file is downloaded to NAS.
    # Replace deepseek-ocr-dev with the Model Name that you set when you created the environment.
    ls -l /mnt/deepseek-ocr-dev

    The model file is pre-downloaded to the NAS drive and stored at the fixed path /mnt/{name}, where {name} is the Model Name that you entered during creation.
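The two checks above can also be scripted. The following is a minimal sketch, not part of the preset image: the function names gpu_visible and model_files are hypothetical, and the sketch assumes only that nvidia-smi is on the PATH and that the model is stored under /mnt/{name} as described above.

```python
import shutil
import subprocess
from pathlib import Path


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on the PATH and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def model_files(model_name: str, nas_root: str = "/mnt") -> list[str]:
    """List the entries under <nas_root>/<model_name>; empty if the directory is missing."""
    model_dir = Path(nas_root) / model_name
    if not model_dir.is_dir():
        return []
    return sorted(p.name for p in model_dir.iterdir())


if __name__ == "__main__":
    name = "deepseek-ocr-dev"  # replace with your Model Name
    print("GPU visible:", gpu_visible())
    files = model_files(name)
    print(f"/mnt/{name}:", files if files else "missing or empty")
```

If the GPU is not visible or the directory is missing or empty, see the FAQ section for possible causes.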

Hugging Face example

  1. Open the terminal and go to the Hugging Face (HF) sample directory:

    cd /workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-hf

  2. (Optional) Upload your own test image to replace input/test.png.

  3. Run inference:

    python run_dpsk_ocr.py

  4. The terminal prints the detected text and saves the result file to the output/ folder.
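If you replace input/test.png often, you may want to keep the bundled sample. The following is a minimal sketch under stated assumptions: replace_test_image is a hypothetical helper, the backup name test.png.bak is arbitrary, and the new image is assumed to already be a PNG file because the sketch copies the file without converting it.

```python
import shutil
from pathlib import Path


def replace_test_image(new_image: str, sample_dir: str) -> Path:
    """Copy new_image over <sample_dir>/input/test.png, backing up the original once."""
    source = Path(new_image)
    if not source.is_file():
        raise FileNotFoundError(f"Image not found: {source}")
    target = Path(sample_dir) / "input" / "test.png"
    target.parent.mkdir(parents=True, exist_ok=True)
    backup = target.with_name("test.png.bak")
    # Back up only the first time so that the bundled sample is never overwritten.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(source, target)
    return target


# Example (the source path is hypothetical):
# replace_test_image("/workspace/my_scan.png",
#                    "/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-hf")
```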

vLLM example

The vLLM sample directory provides optimized scripts for different tasks, such as single-image, PDF, and batch image inference. All tasks are configured in the config.py file.

Overview of operations:

  1. Go to the vLLM sample directory:

    cd /workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm

  2. Confirm the configuration: Open config.py and set INPUT_PATH and OUTPUT_PATH for the task, or keep the sample paths that are already in the code.

  3. Execute the script: Run the corresponding .py file for the task.
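Editing config.py in the WebIDE is usually enough. If you switch between tasks often, the path update can also be scripted. The following is a minimal sketch: set_config_paths is a hypothetical helper, and it assumes that config.py assigns INPUT_PATH and OUTPUT_PATH at the start of a line, as in the sample paths in this topic.

```python
import re
from pathlib import Path


def set_config_paths(config_file: str, input_path: str, output_path: str) -> str:
    """Rewrite the INPUT_PATH and OUTPUT_PATH assignments in config_file."""
    text = Path(config_file).read_text(encoding="utf-8")
    for name, value in (("INPUT_PATH", input_path), ("OUTPUT_PATH", output_path)):
        # Match the whole assignment line, for example: INPUT_PATH = '...'
        pattern = re.compile(rf"^{name}\s*=.*$", flags=re.MULTILINE)
        new_line = f"{name} = {value!r}"
        # A function replacement avoids backslash handling in the new value.
        text, count = pattern.subn(lambda _match: new_line, text, count=1)
        if count == 0:
            raise ValueError(f"{name} is not assigned in {config_file}")
    Path(config_file).write_text(text, encoding="utf-8")
    return text


# Example, run from the DeepSeek-OCR-vllm directory:
# set_config_paths("config.py", "input_pdf/test.pdf", "output_run_dpsk_ocr_pdf")
```

The sketch leaves every other line of config.py unchanged and raises an error instead of appending a new assignment, so a typo in a variable name does not go unnoticed.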

The following sections describe the specific configuration and command for each task.

Single-image inference

Sample path:

INPUT_PATH = '/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/input_image/test.png'
OUTPUT_PATH = '/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/output_run_dpsk_ocr_image'

Command:

python run_dpsk_ocr_image.py
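To see which result files a run produced, you can wrap the command. The following is a minimal sketch: run_and_list_outputs is a hypothetical helper that runs a script with the current Python interpreter and reports the file names that are new in the output directory. It makes no assumption about the names of the result files, and it does not detect files that a run overwrites.

```python
import subprocess
import sys
from pathlib import Path


def run_and_list_outputs(script: str, workdir: str, output_dir: str) -> list[str]:
    """Run script inside workdir and return the names of new entries in output_dir."""
    out = Path(output_dir)
    before = {p.name for p in out.iterdir()} if out.is_dir() else set()
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run([sys.executable, script], cwd=workdir, check=True)
    after = {p.name for p in out.iterdir()} if out.is_dir() else set()
    return sorted(after - before)


# Example:
# base = "/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm"
# print(run_and_list_outputs("run_dpsk_ocr_image.py", base,
#                            base + "/output_run_dpsk_ocr_image"))
```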

PDF inference

Sample path:

INPUT_PATH = '/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/input_pdf/test.pdf'
OUTPUT_PATH = '/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/output_run_dpsk_ocr_pdf'

Command:

python run_dpsk_ocr_pdf.py
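Before you run PDF inference on your own file, a quick pre-flight check can catch a wrong path or a file that is not a PDF. The following is a minimal sketch: looks_like_pdf is a hypothetical helper that relies on the fact that a PDF file normally starts with the bytes %PDF-. A file with leading junk bytes fails this check even if some PDF readers can open it.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if path is an existing file that starts with the PDF header."""
    pdf = Path(path)
    if not pdf.is_file():
        return False
    with pdf.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Example:
# looks_like_pdf("/workspace/DeepSeek-OCR/DeepSeek-OCR-master/"
#                "DeepSeek-OCR-vllm/input_pdf/test.pdf")
```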

Batch image inference

Sample path:

# /workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/config.py
INPUT_PATH = '/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/input_image/'
OUTPUT_PATH = '/workspace/DeepSeek-OCR/DeepSeek-OCR-master/DeepSeek-OCR-vllm/output_run_dpsk_ocr_eval_batch/'

Command:

python run_dpsk_ocr_eval_batch.py

Note: All image files in the input path are processed automatically. The results are saved to OUTPUT_PATH.
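Because every image in the input path is processed, it can help to preview the directory before a long batch run. The following is a minimal sketch: list_images is a hypothetical helper, and the suffix set is an assumption. The suffixes that the batch script actually accepts are not stated in this topic, so check the script if your files use other formats.

```python
from pathlib import Path

# Assumption: common image suffixes. The batch script may accept a different set.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(input_dir: str) -> list[str]:
    """Return the sorted names of files in input_dir that have an image suffix."""
    root = Path(input_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Example:
# print(list_images("/workspace/DeepSeek-OCR/DeepSeek-OCR-master/"
#                   "DeepSeek-OCR-vllm/input_image/"))
```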

Manage the development environment and costs

  • Stop the environment: On the DevPod list page, click Shutdown to stop GPU billing.

  • Delete the environment: On the DevPod list page, click Delete to permanently delete the GPU instance.

Note

Stopping or deleting the environment does not affect the data on NAS. To delete the data, you must run the rm command yourself on the NAS directory, such as /mnt/{name}. Use this command with caution.
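One way to reduce the risk of deleting the wrong directory is to add a guard and a dry run. The following is a minimal sketch, not an official cleanup tool: remove_model_data is a hypothetical helper that uses shutil.rmtree as a Python counterpart to rm -r, accepts only a plain directory name under /mnt, and deletes nothing unless dry_run is set to False.

```python
import shutil
from pathlib import Path


def remove_model_data(model_name: str, nas_root: str = "/mnt", dry_run: bool = True) -> Path:
    """Delete <nas_root>/<model_name>. With dry_run=True, only print the target."""
    # Reject empty names and anything that could escape nas_root.
    if not model_name or "/" in model_name or model_name in {".", ".."}:
        raise ValueError(f"Unsafe model name: {model_name!r}")
    target = Path(nas_root) / model_name
    if not target.is_dir():
        raise FileNotFoundError(f"No such directory: {target}")
    if dry_run:
        print(f"Dry run: would delete {target}")
    else:
        shutil.rmtree(target)
    return target


# Example:
# remove_model_data("deepseek-ocr-dev")                 # prints the target only
# remove_model_data("deepseek-ocr-dev", dry_run=False)  # deletes the directory
```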

FAQ

Q: What should I do if the DevPod fails to start or remains in the "Creating" state for a long time?

A: Check the following:

  1. Verify that the required permissions are granted to the RAM sub-account.

  2. Check whether your Alibaba Cloud account has an overdue payment.

  3. GPU resources may be insufficient in the selected region. Try again after you change the region or instance type.

Q: What should I do if an error is reported or the command is not found when I run the nvidia-smi command?

A: Make sure that you selected a GPU Compute-optimized Instance when you created the environment. If you selected this instance type but the error persists, try to restart the development environment in the console.

Q: What should I do if the model fails to download and the /mnt/{name} directory is empty?

A: This may be caused by temporary network fluctuations or issues accessing the model community. Try deleting the current DevPod and creating a new one.

Q: How do I install additional Python packages in the environment?

A: In the terminal of the WebIDE, run the pip install <package_name> command.
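If a script depends on an extra package, it can check for the package before installing it. The following is a minimal sketch: ensure_package is a hypothetical helper that takes a top-level module name, and it runs pip through the current interpreter (python -m pip) so that the package is installed into the environment that runs the script.

```python
import importlib.util
import subprocess
import sys
from typing import Optional


def ensure_package(module_name: str, package_name: Optional[str] = None) -> bool:
    """Return True if module_name is already importable; otherwise install it with pip."""
    if importlib.util.find_spec(module_name) is not None:
        return True
    # The pip package name can differ from the import name, so allow an override.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", package_name or module_name],
        check=True,
    )
    return False


# Example: the import name is "yaml", but the pip package name is "PyYAML".
# ensure_package("yaml", "PyYAML")
```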