Optimizing the metadata requests that ossfs 2.0 sends to OSS reduces API call costs, improves concurrency, and speeds up read/write operations on mount points.
Basic principles
ossfs 2.0 is built on the FUSE (Filesystem in Userspace) framework. It translates file system metadata operations into OSS requests, allowing you to access OSS storage resources through standard file system interfaces.
|
Command |
Interface conversion rules |
|
|
When executing If the GetObjectMeta request returns a 404 response (indicating the object does not exist), it will further send a ListObject(max-keys=1) request to query whether a virtual folder object with the same name exists. |
|
|
|
|
|
When executing Note that ossfs 2.0 enables the |
|
|
Scenario analysis
Accessing a file through a file system differs significantly from accessing the corresponding object directly in OSS.
File access methods
ossfs resolves file paths top-down from the root directory. For example, to obtain the attributes of /dir/object, the stat /dir/object command executes as follows:
- ossfs first looks up /dir by sending a GetObjectMeta request for dir. A 404 Not Found response means that no such object exists, so ossfs then sends a ListObject (max-keys=1) request for dir/. A 200 OK response indicates that a corresponding virtual folder exists.
- ossfs then looks up /dir/object by sending a GetObjectMeta request for dir/object. A 200 OK response returns the object's attributes.
A single stat /dir/object command results in two GetObjectMeta requests and one ListObject request. Because each path component requires its own metadata lookups, the number of OSS requests grows with the file depth, which degrades performance.
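The lookup sequence described above can be sketched as a small simulation. This is only an illustration of the request pattern, not the ossfs implementation: the function name count_lookup_requests and the in-memory set of object keys are hypothetical, and the sketch assumes that nothing is cached.

```python
def count_lookup_requests(path, object_keys):
    """List the OSS requests an uncached stat of `path` would trigger.

    `object_keys` is a hypothetical set of object keys stored in the bucket.
    """
    requests = []
    key = ""
    for component in path.strip("/").split("/"):
        key = f"{key}/{component}" if key else component
        # Each path component is first probed as an object.
        requests.append(("GetObjectMeta", key))
        if key in object_keys:
            continue
        # On 404, fall back to a one-key listing to detect a virtual folder.
        requests.append(("ListObject(max-keys=1)", key + "/"))
        if not any(k.startswith(key + "/") for k in object_keys):
            break  # neither an object nor a virtual folder: the lookup fails here
    return requests


bucket = {"dir/object", "a/b/c/file"}
print(len(count_lookup_requests("/dir/object", bucket)))  # 3
print(len(count_lookup_requests("/a/b/c/file", bucket)))  # 7
```

Under these assumptions, every additional virtual folder in the path adds two requests, which is why deeply nested files are the most expensive to look up without a cache.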
Impact of file metadata caching
ossfs 2.0 enables file metadata caching by default, with a default cache validity period of 60 seconds. The metadata cache is implemented on top of the FUSE low-level API, so its capacity is not fixed: the operating system kernel decides when to evict entries. Machines with more memory can typically cache more metadata.
The following example shows how metadata caching affects performance when reading attributes of 100 child files in the /dir/ directory.
- Without metadata caching
  - Accessing files with a known file list: When you run the stat /dir/object-<i> command in a loop, each stat operation is converted into one GetObjectMeta request. In total, 100 GetObjectMeta requests are sent to OSS to obtain the file attributes, and this volume of metadata requests degrades performance.
  - Accessing files with an unknown file list: When you run the ls command, it is converted into one ListObject request that is sent to OSS to obtain the file list. The stat /dir/object-<i> command is then run in a loop over that list to obtain the file attributes. In total, one ListObject request and 100 GetObjectMeta requests are sent to OSS, which again degrades performance.
- With metadata caching
  - Accessing files with a known file list: When you run the stat /dir/object-<i> command in a loop, each stat operation would otherwise be converted into one GetObjectMeta request, for 100 requests in total. Within the cache validity period, these lookups hit the local metadata cache instead, which effectively reduces the number of requests sent to OSS.
  - Accessing files with an unknown file list: When you run the ls command, it is converted into one ListObject request that is sent to OSS, and the response also updates the local metadata cache. When the stat /dir/object-<i> command is then run in a loop, the metadata is already in the local cache, so no additional OSS requests are sent.
Metadata caching effectively reduces repeated requests to OSS. When traversing all files in a folder, running ls first preloads the cache and eliminates subsequent per-file OSS requests.
Optimization methods
Use the following methods to reduce metadata requests to OSS and improve performance:
Extend metadata cache time
If your data is immutable after upload or changes infrequently relative to the cache duration, increase the attr_timeout mount option to extend the metadata cache validity period and reduce repeated requests.
- Business scenario: In a data annotation scenario, the system reads a batch of previously collected raw data, processes it, and then generates a new batch of data. In this scenario, the raw data is not modified after it is uploaded to OSS.
- Mount configuration: In the ossfs 2.0 configuration file, set the metadata cache validity period to 7200 seconds.
# Bucket Endpoint (region node)
--oss_endpoint=https://oss-cn-hangzhou-internal.aliyuncs.com
# Bucket name
--oss_bucket=bucketName
# Metadata cache validity period
--attr_timeout=7200
# Access keys AccessKey ID and AccessKey Secret (optional for ossfs 2.0.1 and later versions)
--oss_access_key_id=LTAI******************
--oss_access_key_secret=8CE4**********************
Operate after obtaining file list
Before accessing individual files in a directory, run the ls command or send a ListObject request to preload all file metadata into the local cache. Combined with a longer cache validity period, this eliminates repeated per-file requests to OSS.
You can replace the ls command with any program that reads directory contents. The following examples list files in the /mnt/data/ directory.
Python
import os
os.listdir('/mnt/data/')
Go
entries, err := os.ReadDir("/mnt/data/")
C
#include <dirent.h>

DIR *dir = opendir("/mnt/data/");
if (dir != NULL) {
    struct dirent *entry;
    /* Read every entry; the entries themselves are not used here. */
    while ((entry = readdir(dir)) != NULL) {}
    closedir(dir);
}
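The listing step is most useful when it is followed directly by the per-file access. The following Python sketch shows that pattern under simple assumptions: the helper name stat_all is hypothetical, and the code only uses standard library calls, so it does not control or inspect the ossfs cache itself.

```python
import os


def stat_all(directory):
    """List `directory` first, then read the attributes of every entry."""
    attributes = {}
    # The listing comes first so that the per-file stat calls follow it.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        attributes[name] = os.stat(path)
    return attributes
```

For example, stat_all('/mnt/data/') returns a dictionary that maps each file name to its os.stat_result, from which fields such as st_size and st_mtime can be read.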
Use a negative cache to accelerate file creation
To create a new file, the file system performs two operations in sequence: lookup and create.
- The lookup operation determines whether the file exists. In ossfs 2.0, this operation is translated into a GetObjectMeta request and a ListObjects request.
- If a 404 Not Found error is returned, ossfs creates the file using the create operation. When ossfs 2.0 executes create, it also sends a GetObjectMeta request and a ListObjects request to query whether the file exists in OSS.
Therefore, the process of creating a new file involves four OSS metadata query operations.
ossfs 2.0 can cache the 404 responses returned by OSS to reduce subsequent duplicate requests. To enable this feature, specify the following options when you mount the file system:
- --oss_negative_cache_timeout=30 (The default value is 0 seconds. We recommend that you set this value to be less than the value of attr_timeout.)
- --oss_negative_cache_size=10000 (Default value: 10000)
When the OSS negative cache is enabled, the 404 response to the lookup operation for a new file is cached. As a result, the subsequent query during the create operation hits the negative cache, and no request is sent to OSS. This reduces the number of OSS requests for the file creation process from four to two.
After you enable the OSS negative cache, if a 404 result for a file named object-A is cached, the file becomes visible at the mount point only after the cache entry expires, even if object-A is created in OSS immediately afterward. The cache validity period is specified by oss_negative_cache_timeout. We do not recommend that you enable this feature in scenarios that require high data consistency.
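The effect of the negative cache on file creation can be illustrated with a toy model. The NegativeCache class and count_create_requests function below are hypothetical and are not ossfs internals. The sketch assumes that lookup and create each issue one GetObjectMeta and one ListObjects request unless a cached 404 answers them, and it uses a simplified eviction rule.

```python
import time


class NegativeCache:
    """Remembers keys that recently returned 404, for `timeout` seconds."""

    def __init__(self, timeout, max_size=10000, clock=time.monotonic):
        self.timeout = timeout
        self.max_size = max_size
        self.clock = clock
        self.expiry = {}  # key -> time at which the cached 404 expires

    def is_known_missing(self, key):
        expires_at = self.expiry.get(key)
        if expires_at is None:
            return False
        if self.clock() >= expires_at:
            del self.expiry[key]  # entry expired: ask OSS again
            return False
        return True

    def remember_missing(self, key):
        if self.timeout <= 0 or self.max_size <= 0:
            return  # a timeout of 0 disables the cache
        if key not in self.expiry and len(self.expiry) >= self.max_size:
            # Simplification: evict the oldest-inserted entry.
            self.expiry.pop(next(iter(self.expiry)))
        self.expiry[key] = self.clock() + self.timeout


def count_create_requests(key, cache):
    """Count metadata requests sent to OSS while creating a nonexistent file."""
    sent = 0
    for _operation in ("lookup", "create"):
        if cache.is_known_missing(key):
            continue  # answered locally, nothing is sent to OSS
        sent += 2  # GetObjectMeta + ListObjects, both report "not found"
        cache.remember_missing(key)
    return sent


print(count_create_requests("object-A", NegativeCache(timeout=0)))   # 4
print(count_create_requests("object-B", NegativeCache(timeout=30)))  # 2
```

The same model also shows the consistency trade-off: until the cached entry expires, is_known_missing keeps returning True for the key, regardless of what has since been written to the bucket.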
Performance comparison
Test method: Mount an OSS bucket with ossfs 2.0 on an ECS instance in the same region, using an internal endpoint and with metadata caching enabled. Then read the metadata of 10,000 files in the mounted directory.
Test results
| Operation | Time consumed |
| Without preloading metadata cache (reading file metadata in the folder directly without executing the | 111 seconds |
| With preloading metadata cache (executing the | 18 seconds |
Test conclusion: Preloading the metadata cache before bulk file access, combined with an appropriate cache validity period, significantly reduces OSS metadata requests and improves overall performance.
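In the results above, preloading reduces the elapsed time from 111 seconds to 18 seconds, which is roughly a sixfold improvement (111 / 18 ≈ 6.2). A comparable measurement could be taken with a small timing helper such as the following sketch. The function name time_metadata_reads is hypothetical, and this is not the harness that produced the published numbers.

```python
import os
import time


def time_metadata_reads(directory, names, preload):
    """Return the seconds spent reading the attributes of `names` in `directory`."""
    start = time.perf_counter()
    if preload:
        os.listdir(directory)  # one listing before the per-file reads
    for name in names:
        os.stat(os.path.join(directory, name))
    return time.perf_counter() - start
```

To keep the two measurements independent, run each variant against a state in which the metadata is not already cached, for example after the cache validity period has elapsed. Otherwise the second run benefits from entries loaded by the first.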