SLS provides a fully managed, scalable, and highly available data transformation (new version) service for data normalization, extraction, cleansing, filtering, and distribution across multiple Logstores.
How it works
Data transformation (new version) processes log data in real time through hosted consumption jobs and SPL rules. For details, see SPL syntax and Standard consumption overview.
Data transformation uses the SLS real-time consumption API and does not depend on the source Logstore index configuration.

Scheduling mechanism
The scheduler starts one or more instances per job to process data concurrently. Each instance consumes data from one or more shards of the source Logstore. Instance count adjusts based on resource usage and processing progress for elastic scaling. Maximum concurrency per job equals the shard count of the source Logstore.
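The concurrency cap described above can be illustrated with a minimal sketch. It is not the SLS scheduler: the function name `plan_instances` and the round-robin assignment are assumptions made for illustration. The only behavior taken from the text is that each instance reads one or more shards and that the instance count never exceeds the shard count.

```python
def plan_instances(shard_ids, desired_instances):
    """Assign shards to instances round-robin (hypothetical policy).

    The instance count is capped at the number of shards, mirroring the
    rule that maximum concurrency equals the source Logstore shard count.
    """
    if not shard_ids:
        return []
    count = max(1, min(desired_instances, len(shard_ids)))
    plan = [[] for _ in range(count)]
    for index, shard in enumerate(sorted(shard_ids)):
        plan[index % count].append(shard)
    return plan


if __name__ == "__main__":
    # Two instances share four shards.
    print(plan_instances([0, 1, 2, 3], 2))   # [[0, 2], [1, 3]]
    # Asking for ten instances still yields one instance per shard.
    print(plan_instances([0, 1, 2, 3], 10))  # [[0], [1], [2], [3]]
```

Under this model, adding shards to the source Logstore is what raises the ceiling on parallelism; requesting more instances than shards has no effect.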
Running instances
Based on the job's SPL rules and destination Logstore configurations, each instance consumes source log data from allocated shards, then distributes and writes results to the destination Logstore. Instances automatically save shard checkpoints, so a restarted job resumes from the last checkpoint.
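The consume, transform, write, and checkpoint cycle above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the service's implementation: shards are plain in-memory lists, a checkpoint is a list offset, `run_instance` and `max_records` are hypothetical names, and progress is saved after every record (a real service may save checkpoints less frequently).

```python
def run_instance(shard_logs, checkpoints, transform, sink, max_records=None):
    """Process allocated shards, starting each from its saved checkpoint.

    shard_logs:  dict of shard id -> list of raw logs (stand-in for a shard)
    checkpoints: dict of shard id -> next offset to read (mutated in place)
    transform:   callable returning a result, or None to drop the log
    sink:        list standing in for the destination Logstore
    max_records: optional limit used here to simulate an unexpected stop
    """
    processed = 0
    for shard_id, logs in shard_logs.items():
        cursor = checkpoints.get(shard_id, 0)
        while cursor < len(logs):
            if max_records is not None and processed >= max_records:
                return processed  # simulated stop before the job finishes
            result = transform(logs[cursor])
            if result is not None:
                sink.append(result)
            cursor += 1
            checkpoints[shard_id] = cursor  # save progress after the write
            processed += 1
    return processed


if __name__ == "__main__":
    shards = {0: ["a", "b", "c"], 1: ["d", "e"]}
    checkpoints, sink = {}, []
    run_instance(shards, checkpoints, str.upper, sink, max_records=2)  # stops early
    run_instance(shards, checkpoints, str.upper, sink)                 # resumes
    print(sink)  # ['A', 'B', 'C', 'D', 'E']
```

Because the second run starts from the stored offsets, no log is written twice and none is skipped in this model.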
Stop and resume jobs
- Automatic stop: If an end time is configured, the job stops after it has processed all logs before that time. Without an end time, the job runs continuously.
- Resumable processing: After an unexpected stop, a restarted job resumes from the last saved shard checkpoint by default, which helps ensure data consistency.
View job running status
SLS supports monitoring for data transformation jobs. For details, see Observe and monitor data transformation (new version) jobs.
Scenarios
Data transformation supports the following scenarios:
- Data normalization and information extraction: Extract fields from inconsistently formatted logs and normalize formats to produce structured data for stream processing and data warehouse analysis.
- Data forwarding and distribution:
  - Collect mixed log types into a single Logstore, then distribute them to downstream Logstores by characteristic (source module, business component) for data isolation and scenario-specific computing.
  - For multi-region deployments, aggregate logs from each region into a central region with cross-region acceleration for centralized management.
- Data cleansing and filtering: Remove invalid entries or unused fields, then write filtered key information to downstream Logstores for focused analysis.
- Data masking: Mask sensitive data such as passwords, phone numbers, and addresses.
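To make the scenarios above concrete, the following sketch chains normalization, cleansing, masking, and distribution in plain Python. In the service, this logic would be expressed as SPL rules; the sketch only models the data flow. All names here are assumptions for illustration: the field names (`module`, `password`, `phone`, `debug`), the destination names, and the 11-digit phone pattern.

```python
import json
import re

# Assumed phone format: 11 digits; the middle four are masked.
PHONE = re.compile(r"\b(\d{3})\d{4}(\d{4})\b")


def normalize(raw):
    """Extract fields from a JSON line or a 'key=value key=value' line."""
    raw = raw.strip()
    if raw.startswith("{"):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return None  # invalid entry
        return {str(k): str(v) for k, v in parsed.items()}
    fields = dict(p.split("=", 1) for p in raw.split() if "=" in p)
    return fields or None


def mask(record):
    """Return a copy with sensitive values masked."""
    out = dict(record)
    if "password" in out:
        out["password"] = "******"
    if "phone" in out:
        out["phone"] = PHONE.sub(r"\1****\2", out["phone"])
    return out


def transform(raw_lines, routes, default="other"):
    """Normalize, cleanse, mask, then group records by destination."""
    outputs = {}
    for raw in raw_lines:
        record = normalize(raw)
        if record is None:          # cleansing: drop invalid entries
            continue
        record.pop("debug", None)   # filtering: drop an unused field
        record = mask(record)
        dest = routes.get(record.get("module"), default)
        outputs.setdefault(dest, []).append(record)
    return outputs


if __name__ == "__main__":
    lines = [
        '{"module": "auth", "password": "p@ss", "phone": "13812345678", "debug": "x"}',
        "module=pay amount=12",
        "garbage",
    ]
    print(transform(lines, {"auth": "auth-logs", "pay": "pay-logs"}))
```

Each key of the returned dictionary stands in for one destination Logstore, so a single source stream fans out into per-module outputs.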
Features
- SPL provides a unified language for data collection, query, and processing, reducing the need to learn multiple syntaxes.
- Line-by-line debugging and code hinting deliver an IDE-like SPL coding experience.
- Real-time processing with data visibility within seconds, elastic scaling, and high throughput.
- Built-in data processing instructions and SQL functions for log analysis.
- Real-time observability metrics, dashboards, and custom monitoring rules based on runtime metrics.
- Fully managed and maintenance-free, with Alibaba Cloud big data and open-source ecosystem integrations.
Billing
- Pay-by-ingested-data Logstores: Data transformation (new version) is free. Outbound traffic generated when data is pulled or written over the SLS public endpoint is charged based on the compressed data volume. For details, see Billable items for the pay-by-ingested-data mode.
- Pay-by-feature Logstores: You are charged for the compute and network resources consumed by data transformation (new version). For details, see Billable items for the pay-by-feature mode.