Frequently asked questions about encoding in EdgeRoutine (ER), including supported encoding methods, pass-through behavior, and UTF-16 surrogate handling.
What encoding methods does ER support?
ER supports only UTF-8 encoding.
Does ER affect data transmission in pass-through mode?
No. In pass-through mode, ER does not read the request body. It forwards the body unmodified as a network stream and changes only the request headers. The stream does not pass through the JavaScript virtual machine.
By default, the Fetch API decompresses compressed streams, and ER follows this behavior. To pass data through without modification, set the decompress parameter to manual.
Are JavaScript strings encoded in UTF-16?
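Yes. A JavaScript string is a sequence of UTF-16 code units. UTF-16 is not byte-compatible with ASCII, and it represents characters outside the Basic Multilingual Plane, such as most emojis, as surrogate pairs of two code units. If a web page contains such characters, string operations that split a surrogate pair can produce garbled characters.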
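The following minimal sketch illustrates the difference between UTF-16 code units and Unicode code points using only standard string methods. The emoji U+1F600 is an arbitrary example of a character outside the Basic Multilingual Plane.

```typescript
// U+1F600 (grinning face) is outside the Basic Multilingual Plane, so JavaScript
// stores it as a surrogate pair: two UTF-16 code units.
const emoji: string = "\u{1F600}";

console.log(emoji.length);                       // 2 (length counts UTF-16 code units)
console.log(Array.from(emoji).length);           // 1 (the string iterator walks code points)
console.log(emoji.charCodeAt(0).toString(16));   // d83d (high surrogate)
console.log(emoji.charCodeAt(1).toString(16));   // de00 (low surrogate)
console.log(emoji.codePointAt(0)!.toString(16)); // 1f600 (the full code point)

// An ASCII character needs only one code unit, with the same value as in ASCII.
console.log("A".length, "A".charCodeAt(0));      // 1 65
```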
String.substring extracts characters by UTF-16 code unit index. Because a surrogate pair occupies two code units, a substring boundary can fall between them and leave an unpaired surrogate. When the string is encoded as UTF-8, an unpaired surrogate is replaced with U+FFFD REPLACEMENT CHARACTER (65533), so the browser displays the replacement character instead of the intended character.
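A minimal sketch of the problem and one way to avoid it, assuming that ER's `TextEncoder` and `TextDecoder` follow the WHATWG Encoding Standard. `truncateSafely` is a hypothetical helper written for this example, not an ER API.

```typescript
// Hypothetical helper (not an ER API): keeps at most `maxUnits` UTF-16 code units
// of `s`, but never ends the result with half of a surrogate pair.
function truncateSafely(s: string, maxUnits: number): string {
  if (maxUnits >= s.length) return s;
  let end = Math.max(0, maxUnits);
  const last = end > 0 ? s.charCodeAt(end - 1) : -1;
  // 0xD800-0xDBFF is the high-surrogate range. If the cut would keep a high
  // surrogate without its low surrogate, drop the whole pair instead.
  if (last >= 0xd800 && last <= 0xdbff) end -= 1;
  return s.substring(0, end);
}

const text = "Hi \u{1F600}";      // "Hi " (3 code units) + surrogate pair (2) = 5 code units
const cut = text.substring(0, 4); // keeps only the high surrogate \uD83D

// TextEncoder converts the lone surrogate to U+FFFD (UTF-8 bytes EF BF BD),
// so decoding the bytes again yields a replacement character, not the emoji.
const viaUtf8 = new TextDecoder().decode(new TextEncoder().encode(cut));
console.log(viaUtf8 === "Hi \uFFFD");                 // true
console.log(JSON.stringify(truncateSafely(text, 4))); // "Hi "
```

The helper only guards the end of the cut. A start index that lands on a low surrogate would break a pair in the same way, so code that slices from arbitrary offsets needs the matching check at the start.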
How can I modify content?
- Buffer the data first by using methods such as text(), arrayBuffer(), and json(), and then modify it.
- If you process the data as a stream, make sure that surrogate pairs remain intact. If a surrogate pair is split, ER cannot determine the correct substring range. If your web pages do not contain surrogate-pair characters, such as emojis, you can generally ignore this consideration.
- Alibaba Cloud plans to launch an HTML parser for more efficient HTML code modification. For more information, see the announcements on the Alibaba Cloud International site.
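The first two approaches can be sketched against the standard WHATWG `Response`, `Headers`, and `TextDecoder` APIs. This sketch assumes that ER's implementations behave like the standard ones. `rewriteBuffered` and `decodeChunks` are hypothetical helper names, and the simple search-and-replace is only a placeholder for your actual modification logic.

```typescript
// Approach 1 (buffering): read the whole body as text, modify it, and build a new Response.
async function rewriteBuffered(res: Response, search: string, replacement: string): Promise<Response> {
  const body = await res.text(); // buffers the whole body and decodes it as UTF-8
  const headers = new Headers(res.headers);
  // The new body is uncompressed text that may differ in length, so drop stale headers.
  headers.delete("content-length");
  headers.delete("content-encoding");
  return new Response(body.split(search).join(replacement), {
    status: res.status,
    statusText: res.statusText,
    headers,
  });
}

// Approach 2 (streaming): decode chunks one at a time. With { stream: true }, TextDecoder
// holds back an incomplete UTF-8 sequence at the end of a chunk until the next chunk
// arrives, so a 4-byte emoji split across chunks still becomes an intact surrogate pair.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const chunk of chunks) {
    out += decoder.decode(chunk, { stream: true });
  }
  return out + decoder.decode(); // flush any bytes still held by the decoder
}

// U+1F600 is encoded in UTF-8 as F0 9F 98 80; here it is split across two chunks.
const chunks = [new Uint8Array([0x48, 0x69, 0xf0, 0x9f]), new Uint8Array([0x98, 0x80])];
console.log(decodeChunks(chunks) === "Hi\u{1F600}"); // true

// Decoding each chunk independently breaks the character into replacement characters.
const perChunk = chunks.map((c) => new TextDecoder().decode(c)).join("");
console.log(perChunk.includes("\uFFFD")); // true
```

Buffering is simpler but holds the entire body in memory. Streaming keeps memory use bounded but requires the chunk-boundary care shown in `decodeChunks`.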
How do I encode a string as a UTF-8 ArrayBuffer or decode a UTF-8 ArrayBuffer to a string?
Use TextEncoder to encode a string into UTF-8 bytes, and use TextDecoder to decode UTF-8 bytes into a string.
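A minimal round-trip sketch using the standard `TextEncoder` and `TextDecoder` APIs. The helper names `stringToArrayBuffer` and `arrayBufferToString` are hypothetical. `TextEncoder.encode` returns a `Uint8Array`, so the sketch copies the bytes into a standalone `ArrayBuffer`.

```typescript
// Encode a string as UTF-8 and return the bytes in a standalone ArrayBuffer.
function stringToArrayBuffer(s: string): ArrayBuffer {
  const bytes = new TextEncoder().encode(s); // TextEncoder always produces UTF-8
  const buf = new ArrayBuffer(bytes.byteLength);
  new Uint8Array(buf).set(bytes); // copy so the result does not share memory with `bytes`
  return buf;
}

// Decode UTF-8 bytes from an ArrayBuffer back into a JavaScript string.
// With fatal: true, malformed input throws a TypeError instead of producing U+FFFD.
function arrayBufferToString(buf: ArrayBuffer): string {
  return new TextDecoder("utf-8", { fatal: true }).decode(buf);
}

const encoded = stringToArrayBuffer("h\u00e9llo \u{1F600}");
console.log(encoded.byteLength); // 11 (é takes 2 bytes and the emoji takes 4)
console.log(arrayBufferToString(encoded) === "h\u00e9llo \u{1F600}"); // true
```

Omitting `fatal: true` makes decoding lenient: invalid bytes are silently replaced with U+FFFD, which matches the behavior described for unpaired surrogates.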