Build a meteorological grid data system

Store and query five-dimensional meteorological grid data with Tablestore-Grid, which supports arbitrary-dimension slicing and multi-condition dataset retrieval at petabyte scale.

Background

Meteorological grid data is multi-dimensional spatiotemporal data. Daily volumes range from tens to hundreds of terabytes and continue to grow, demanding high storage scalability and flexible query capabilities.

A grid dataset is a five-dimensional structure: variables (temperature, humidity, wind speed, and others), time, altitude, longitude, and latitude. Weather forecasting and analysis require slicing queries across these dimensions in various combinations.

  • Query a latitude-longitude plane of grid data. For example, retrieve global surface temperatures for the next three hours.

  • Query time-series data at a specific grid point. For example, retrieve the forecast temperature at a location for lead times from 3 hours to 72 hours.

  • Query data for different variables. For example, retrieve all variables at a specific time, altitude, and point.

  • Query data produced by different model systems. For example, compare forecast data from different meteorological agencies.

Traditional solutions combine a relational database with a file system to store grid data, but both scalability and query performance degrade as data volume grows.

Tablestore-Grid is a sample project for storing and querying five-dimensional grid data. With automatic horizontal scaling and search index-based retrieval, it supports petabyte-scale storage and millisecond-level multi-dimensional queries.

Data model

A five-dimensional grid dataset (GridDataSet) consists of the following dimensions.

| Dimension | Description |
| --- | --- |
| variable | The physical quantity, such as temperature, humidity, and wind speed. |
| time | The time dimension. For example, the forecast lead time (next 3 hours, 6 hours, and so on). |
| z | The z-axis, which represents altitude. |
| x | The x-axis, which represents longitude or latitude. |
| y | The y-axis, which represents longitude or latitude. |

GridDataSet = F(variable, time, z, x, y). Each dataset also contains the following metadata.

| Field | Description |
| --- | --- |
| GridDataSetId | The unique identifier of the dataset. |
| Attributes | Custom attributes such as creation time, data source, forecast type, and resolution. Create a search index on custom attributes to retrieve datasets by combined conditions. |

This solution uses two tables: a meta table for dataset metadata and a data table for grid data.

meta table: The primary key is GridDataSetId. Attribute columns store dimension information (size and data type of each dimension) and custom attributes. A search index on the meta table enables combined queries and sorting by source, resolution, status, and creation time.

data table: Four primary key columns locate each row of data.

| Primary key | Description |
| --- | --- |
| GridDataSetId | The dataset ID, which uniquely identifies a dataset. |
| Variable | The variable name, which corresponds to the first dimension of the five-dimensional model. |
| Time | The time, which corresponds to the second dimension of the five-dimensional model. |
| Z | The altitude, which corresponds to the third dimension of the five-dimensional model. |

These four primary key columns locate a single row that stores the two-dimensional plane formed by the remaining dimensions (x and y). To balance full-scan and localized query performance, the plane data is split into blocks and stored across multiple attribute columns. A query reads only the blocks that cover the target area.
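
As an illustration of the block layout described above, the following minimal sketch computes which blocks of a plane a rectangular query touches. The class name PlaneBlocks, the block size of 180 × 360, and the "(row,col)" block labels are assumptions made for this example. They are not taken from the Tablestore-Grid source, which may choose block sizes and column names differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps a rectangular query onto the blocks of one x-y plane. */
public class PlaneBlocks {
    private final int xSize;   // number of grid points along x
    private final int ySize;   // number of grid points along y
    private final int blockX;  // block extent along x (assumed value)
    private final int blockY;  // block extent along y (assumed value)

    public PlaneBlocks(int xSize, int ySize, int blockX, int blockY) {
        if (xSize <= 0 || ySize <= 0 || blockX <= 0 || blockY <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.xSize = xSize;
        this.ySize = ySize;
        this.blockX = blockX;
        this.blockY = blockY;
    }

    /** Number of blocks needed to tile the whole plane (edge blocks may be partial). */
    public int totalBlocks() {
        int blocksAlongX = (xSize + blockX - 1) / blockX;
        int blocksAlongY = (ySize + blockY - 1) / blockY;
        return blocksAlongX * blocksAlongY;
    }

    /** Labels of all blocks that intersect the rectangle [x0, x0+xLen) x [y0, y0+yLen). */
    public List<String> blocksCovering(int x0, int y0, int xLen, int yLen) {
        if (x0 < 0 || y0 < 0 || xLen <= 0 || yLen <= 0
                || x0 + xLen > xSize || y0 + yLen > ySize) {
            throw new IllegalArgumentException("query rectangle is outside the plane");
        }
        int firstBx = x0 / blockX;
        int lastBx = (x0 + xLen - 1) / blockX;
        int firstBy = y0 / blockY;
        int lastBy = (y0 + yLen - 1) / blockY;

        List<String> blocks = new ArrayList<>();
        for (int bx = firstBx; bx <= lastBx; bx++) {
            for (int by = firstBy; by <= lastBy; by++) {
                blocks.add("(" + bx + "," + by + ")");
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        PlaneBlocks plane = new PlaneBlocks(720, 1440, 180, 360);
        System.out.println(plane.totalBlocks());                    // 16
        System.out.println(plane.blocksCovering(100, 200, 50, 80)); // [(0,0)]
        System.out.println(plane.blocksCovering(170, 350, 20, 20)); // [(0,0), (0,1), (1,0), (1,1)]
    }
}
```

With these assumed sizes, a 720 × 1440 plane is tiled by 16 blocks. A 50 × 80 query starting at (100, 200) falls inside a single block, while a small query that straddles a block corner touches four.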

Quick start

The sample project for the meteorological grid data system covers table creation, data ingestion, slicing queries, and multi-condition retrieval. To run it, complete the following steps in order.

  • Download the sample code: grid.demo.zip.

  • Create a configuration file named tablestoreConf.json in your home directory. Specify the endpoint, AccessKey ID, AccessKey Secret, and instance name.

  • Prepare a NetCDF test data file and set the file path in ExampleConfig.java.

  • Run the examples in order: CreateStoreExample (create tables and indexes), DataImportExample (ingest data), DataFetchExample (slicing queries), and MetaQueryExample (multi-condition retrieval).

Implementation

The following steps use the Tablestore-Grid sample project to build a meteorological grid data system.

Note

Tablestore-Grid is a sample project that encapsulates table creation, data ingestion, and query operations for the meta table and data table. It provides three core interfaces: GridStore, GridDataWriter, and GridDataFetcher. The source code is included in the sample code package and requires Tablestore SDK for Java 5.17.4 or later and the NetCDF Java library.

Step 1: Initialize the client and create the tables

Initialize the client with TableStoreGrid, call createStore to create the meta table and data table, and create a search index for multi-condition dataset retrieval.

// Initialize TableStoreGrid
TableStoreGridConfig config = new TableStoreGridConfig();
config.setTableStoreEndpoint(endpoint);
config.setAccessId(accessKeyId);
config.setAccessKey(accessKeySecret);
config.setTableStoreInstance(instanceName);
config.setDataTableName("grid_data_table");
config.setMetaTableName("grid_meta_table");

TableStoreGrid tableStoreGrid = new TableStoreGrid(config);

// Create meta table and data table
tableStoreGrid.createStore();

// Create search index on meta table for multi-condition query
IndexSchema indexSchema = new IndexSchema();
indexSchema.setFieldSchemas(Arrays.asList(
        new FieldSchema("status", FieldType.KEYWORD).setIndex(true).setEnableSortAndAgg(true),
        new FieldSchema("source", FieldType.KEYWORD).setIndex(true).setEnableSortAndAgg(true),
        new FieldSchema("accuracy", FieldType.KEYWORD).setIndex(true).setEnableSortAndAgg(true),
        new FieldSchema("create_time", FieldType.LONG).setIndex(true).setEnableSortAndAgg(true)
));
tableStoreGrid.createMetaIndex("grid_meta_index", indexSchema);

Step 2: Ingest grid data

Data ingestion involves three steps: write dataset metadata, use GridDataWriter to ingest grid data layer by layer as two-dimensional planes, and update the metadata to mark ingestion as complete. The following example reads a NetCDF weather data file and ingests it into Tablestore.

// Step 1: Write dataset metadata
GridDataSetMeta meta = new GridDataSetMeta(
        "forecast_20260611",           // GridDataSetId
        DataType.FLOAT,                // Data type
        Arrays.asList("temperature"),  // Variables
        24, 10, 720, 1440,            // Dimensions: time=24, z=10, x=720, y=1440
        new StoreOptions(StoreOptions.StoreType.SLICE));

Map<String, Object> attributes = new HashMap<>();
attributes.put("source", "ECMWF");
attributes.put("accuracy", "0.25deg");
attributes.put("status", "INIT");
attributes.put("create_time", System.currentTimeMillis());
meta.setAttributes(attributes);
tableStoreGrid.putDataSetMeta(meta);

// Step 2: Import data from the NetCDF file, one x-y plane per (variable, time, z)
GridDataWriter writer = tableStoreGrid.getDataWriter(meta);
// try-with-resources closes the NetCDF file even if a read or write fails
try (NetcdfFile ncFile = NetcdfFile.open("forecast_data.nc")) {
    for (Variable variable : ncFile.getVariables()) {
        if (!meta.getVariables().contains(variable.getShortName())) {
            continue;  // skip variables that are not part of this dataset
        }
        for (int t = 0; t < meta.gettSize(); t++) {
            for (int z = 0; z < meta.getzSize(); z++) {
                // Read one full plane: origin (t, z, 0, 0), shape (1, 1, xSize, ySize)
                Array array = variable.read(
                        new int[]{t, z, 0, 0},
                        new int[]{1, 1, meta.getxSize(), meta.getySize()});
                Grid2D grid2D = new Grid2D(
                        array.getDataAsByteBuffer(),
                        variable.getDataType(),
                        new int[]{0, 0},
                        new int[]{meta.getxSize(), meta.getySize()});
                writer.writeGrid2D(variable.getShortName(), t, z, grid2D);
            }
        }
    }
}

// Step 3: Update metadata status to DONE
attributes.put("status", "DONE");
meta.setAttributes(attributes);
tableStoreGrid.updateDataSetMeta(meta);
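
To make the ingestion loop above concrete, the following sketch works out how many planes it writes and how much raw data each plane carries for the 24 × 10 × 720 × 1440 FLOAT dataset in the example. GridSizeEstimate is a hypothetical helper written for this illustration. The figures count raw values only and ignore any storage or encoding overhead.

```java
/** Hypothetical helper: raw data volume of a grid dataset, ignoring storage overhead. */
public class GridSizeEstimate {

    /** One row is written per (time, z) pair of a variable. */
    public static long rowsPerVariable(int tSize, int zSize) {
        return (long) tSize * zSize;
    }

    /** Raw size in bytes of one x-y plane. */
    public static long planeBytes(int xSize, int ySize, int bytesPerValue) {
        return (long) xSize * ySize * bytesPerValue;
    }

    /** Raw size in bytes of all planes of one variable. */
    public static long variableBytes(int tSize, int zSize, int xSize, int ySize,
                                     int bytesPerValue) {
        return rowsPerVariable(tSize, zSize) * planeBytes(xSize, ySize, bytesPerValue);
    }

    public static void main(String[] args) {
        int bytesPerFloat = Float.BYTES; // 4 bytes for DataType.FLOAT values

        System.out.println(rowsPerVariable(24, 10));                          // 240
        System.out.println(planeBytes(720, 1440, bytesPerFloat));             // 4147200
        System.out.println(variableBytes(24, 10, 720, 1440, bytesPerFloat));  // 995328000
    }
}
```

Each plane in this example holds roughly 4 MB of raw values, and one variable adds up to just under 1 GB across its 240 planes.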

Step 3: Query grid data

GridDataFetcher supports slicing queries across arbitrary dimensions. Call setVariablesToGet to specify the variables to retrieve. For the remaining four dimensions, set the origin and shape to define the sub-space to query.

GridDataFetcher fetcher = tableStoreGrid.getDataFetcher(
        tableStoreGrid.getDataSetMeta("forecast_20260611"));

// Query 1: Get a latitude-longitude plane (fixed time and height)
fetcher.setVariablesToGet(Arrays.asList("temperature"));
fetcher.setOriginShape(
        new int[]{0, 0, 0, 0},       // origin: time=0, z=0, x=0, y=0
        new int[]{1, 1, 720, 1440});  // shape: one full plane
Grid4D grid4D = fetcher.fetch().getVariable("temperature");
Array planeData = grid4D.toArray();

// Query 2: Get a time series at a specific point and height
fetcher.setOriginShape(
        new int[]{0, 0, 360, 720},    // origin: time=0, z=0, specific point
        new int[]{24, 1, 1, 1});      // shape: all 24 time steps, single point
Grid4D timeSeries = fetcher.fetch().getVariable("temperature");

// Query 3: Get an arbitrary 4D sub-space
fetcher.setOriginShape(
        new int[]{6, 2, 100, 200},    // origin: time=6, z=2, x=100, y=200
        new int[]{12, 5, 50, 80});    // shape: 12 time steps, 5 heights, 50x80 area
Grid4D subSpace = fetcher.fetch().getVariable("temperature");
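
The origin and shape arrays in the queries above follow the order (time, z, x, y). The following sketch, which does not call Tablestore, checks that a requested sub-space fits inside the dataset and counts the values and (time, z) planes it selects. SubSpace and its methods are hypothetical names used for illustration. The actual GridDataFetcher may validate its arguments differently.

```java
/** Hypothetical helper: sanity checks for an (origin, shape) request in (time, z, x, y) order. */
public class SubSpace {

    /** True if the sub-space lies entirely inside a dataset with the given dimension sizes. */
    public static boolean fits(int[] dims, int[] origin, int[] shape) {
        if (dims.length != 4 || origin.length != 4 || shape.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            // Use long arithmetic so that origin + shape cannot overflow.
            if (origin[i] < 0 || shape[i] <= 0
                    || (long) origin[i] + shape[i] > dims[i]) {
                return false;
            }
        }
        return true;
    }

    /** Number of grid values selected per variable. */
    public static long valueCount(int[] shape) {
        long count = 1;
        for (int extent : shape) {
            count *= extent;
        }
        return count;
    }

    /** Number of (time, z) planes touched per variable; each plane is one data table row. */
    public static long planeCount(int[] shape) {
        return (long) shape[0] * shape[1];
    }

    public static void main(String[] args) {
        int[] dims = {24, 10, 720, 1440};   // time, z, x, y sizes of the example dataset

        int[] origin = {6, 2, 100, 200};
        int[] shape = {12, 5, 50, 80};
        System.out.println(fits(dims, origin, shape));  // true
        System.out.println(valueCount(shape));          // 240000
        System.out.println(planeCount(shape));          // 60

        // A request that runs past the end of the time dimension is rejected.
        System.out.println(fits(dims, new int[]{20, 0, 0, 0}, new int[]{8, 1, 1, 1})); // false
    }
}
```

The plane count is a useful rough cost indicator: the time-series query reads one point from each of 24 planes, whereas the full-plane query reads every value of a single plane.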

Step 4: Retrieve datasets by conditions

With a search index on the meta table, use QueryBuilder to retrieve datasets by combined conditions. The following example retrieves datasets that have finished ingestion, were created within the last 24 hours, come from ECMWF or NMC, and have a 0.25-degree resolution. Results are sorted by creation time in descending order, and the first 10 matches are returned.

// Query: (status == DONE) AND (create_time > last 24h)
//        AND (accuracy == "0.25deg")
//        AND (source == "ECMWF" OR source == "NMC")
QueryGridDataSetResult result = tableStoreGrid.queryDataSets(
        "grid_meta_index",
        QueryBuilder.and()
                .equal("status", "DONE")
                .greaterThan("create_time", System.currentTimeMillis() - 86_400_000L)  // 24 hours in milliseconds
                .equal("accuracy", "0.25deg")
                .query(QueryBuilder.or()
                        .equal("source", "ECMWF")
                        .equal("source", "NMC")
                        .build())
                .build(),
        new QueryParams(0, 10, new Sort(
                Arrays.<Sort.Sorter>asList(
                        new FieldSort("create_time", SortOrder.DESC)))));

for (GridDataSetMeta meta : result.getGridDataSetMetas()) {
    System.out.println("DataSet: " + meta.getGridDataSetId()
            + ", Attributes: " + meta.getAttributes());
}
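
For readers who want to check the boolean logic, the following sketch applies the same condition to in-memory attribute maps with plain Java collections. It is only a model of the query semantics, assuming attributes are held as Map<String, Object>. MetaFilter, its "id" key, and the sample records are hypothetical, and no search index is involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of the combined metadata condition. */
public class MetaFilter {
    private static final long DAY_MILLIS = 86_400_000L;

    /** (status == DONE) AND (create_time > now - 24h) AND (accuracy == 0.25deg) AND (source in {ECMWF, NMC}). */
    public static boolean matches(Map<String, Object> attrs, long now) {
        Object createTime = attrs.get("create_time");
        return "DONE".equals(attrs.get("status"))
                && createTime instanceof Long && (Long) createTime > now - DAY_MILLIS
                && "0.25deg".equals(attrs.get("accuracy"))
                && ("ECMWF".equals(attrs.get("source")) || "NMC".equals(attrs.get("source")));
    }

    /** Filters, sorts by create_time descending, and keeps at most 'limit' records. */
    public static List<Map<String, Object>> select(List<Map<String, Object>> all,
                                                   long now, int limit) {
        return all.stream()
                .filter(attrs -> matches(attrs, now))
                .sorted(Comparator.comparingLong(
                        (Map<String, Object> attrs) -> (Long) attrs.get("create_time")).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    /** Builds one sample attribute record. */
    public static Map<String, Object> attrs(String id, String status, String source,
                                            String accuracy, long createTime) {
        Map<String, Object> m = new HashMap<>();
        m.put("id", id);
        m.put("status", status);
        m.put("source", source);
        m.put("accuracy", accuracy);
        m.put("create_time", createTime);
        return m;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<Map<String, Object>> all = new ArrayList<>();
        all.add(attrs("a", "DONE", "NMC", "0.25deg", now - 5_000));            // matches
        all.add(attrs("b", "INIT", "ECMWF", "0.25deg", now - 1_000));          // still ingesting
        all.add(attrs("c", "DONE", "ECMWF", "0.25deg", now - 1_000));          // matches, newest
        all.add(attrs("d", "DONE", "ECMWF", "0.25deg", now - 2 * DAY_MILLIS)); // too old

        for (Map<String, Object> hit : select(all, now, 10)) {
            System.out.println(hit.get("id"));  // prints c, then a
        }
    }
}
```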

Resource cleanup

Important

The following operations delete all tables and indexes related to meteorological grid data. Deleted data cannot be recovered. Make sure you no longer need the data before you proceed.

// If a TableStoreGrid instance is still open, close it before deleting the tables
tableStoreGrid.close();

// Delete the search index first, then drop the meta table and the data table
SyncClient syncClient = new SyncClient(endpoint, accessKeyId, accessKeySecret, instanceName);
syncClient.deleteSearchIndex(new DeleteSearchIndexRequest("grid_meta_table", "grid_meta_index"));
syncClient.deleteTable(new DeleteTableRequest("grid_meta_table"));
syncClient.deleteTable(new DeleteTableRequest("grid_data_table"));
syncClient.shutdown();