This topic describes how to use INSERT statements to import data into StarRocks.
Scenarios
-
To import a few test records, use the INSERT INTO VALUES statement.
-
To transform data from a source table using extract, transform, and load (ETL) and import it into a new StarRocks table, use the INSERT INTO SELECT statement.
-
To overwrite a few test records, use the INSERT OVERWRITE VALUES statement.
-
To transform data from a source table using ETL and overwrite data in a new StarRocks table, use the INSERT OVERWRITE SELECT statement.
The source table can be one or more internal tables or foreign tables. The target table must be an internal StarRocks table.
Considerations
-
By default, StarRocks fails an INSERT operation if any data does not match the target table format—for example, if a string exceeds the allowed length. To filter out invalid data and continue the INSERT operation, set the session variable
enable_insert_stricttofalse. -
Frequently using INSERT statements to import small batches of data creates too many data versions, which degrades query performance. Do not use INSERT statements frequently for data import or as a routine daily import task in a production environment. If your scenario requires streaming or frequent small-batch imports, use Stream Load or Routine Load.
-
When you run an INSERT OVERWRITE statement, the system creates temporary partitions for the target partitions, writes data into them, and then atomically replaces the target partitions with the temporary ones. All these steps run on the Leader FE node. If the Leader FE node breaks down during this process, the INSERT OVERWRITE operation fails, and the system deletes all temporary partitions created during the operation.
Syntax
INSERT { INTO | OVERWRITE } [db_name.]<table_name>
[ PARTITION (<partition_name> [, ...] ) ]
[ TEMPORARY PARTITION (<temporary_partition_name> [, ...] ) ]
[ WITH LABEL <label>]
[ (<column_name>[, ...]) ]
{ VALUES ( { <expression> | DEFAULT } [, ...] ) | <query> }
The following table describes the parameters.
|
Parameter |
Description |
|
INTO |
Appends data to the target table. |
|
OVERWRITE |
Overwrites data in the target table. |
|
table_name |
Name of the target table for data import. Use the format db_name.table_name. |
|
PARTITION |
Target partitions for data import. These partitions must exist in the target table. Separate multiple partition names with commas (,). If you specify this parameter, data imports only into the listed partitions. If you omit it, data imports into all partitions of the target table by default. |
|
TEMPORARY PARTITION |
Specifies temporary partitions for data import. For details, see temporary partition. |
|
label |
Unique identifier for the import task within the database. Specify a label.
|
|
column_name |
Target columns for data import. These columns must exist in the target table. Column mapping depends on position, not column names. If you omit this parameter, all columns in the target table are used by default. If a source column does not exist in the target, the system inserts the default value. If no default value exists, the import task fails. If the result column type from the query does not match the target column type, StarRocks attempts implicit conversion. If conversion fails, the INSERT INTO statement returns a parsing error. |
|
VALUES |
Inserts one or more rows of data using the VALUES clause. Important
Use VALUES only to import a few rows for demonstration purposes. Do not use it in any test or production environment. StarRocks is not designed for single-row imports. Use INSERT INTO SELECT for batch imports. |
|
expression |
Expression used to assign a value to the corresponding column. |
|
DEFAULT |
Assigns the default value to the corresponding column. |
|
query |
Query statement whose results are imported into the target table. Supports all SQL query syntax provided by StarRocks. |
Related configurations
You can configure the following settings for INSERT import tasks.
FE configuration
|
FE configuration |
Description |
|
insert_load_default_timeout_second |
Timeout for INSERT import tasks, in seconds. If an INSERT task does not complete within this time, the system cancels it and sets its state to CANCELLED. This parameter applies globally to all INSERT tasks. You cannot set a timeout for individual tasks. The default is 3600 seconds (1 hour). If your task cannot finish within this time, increase the timeout by adjusting this parameter. |
Session variables
|
Session variable |
Description |
|
enable_insert_strict |
Determines whether the INSERT operation tolerates rows with invalid data.
|
|
query_timeout |
Timeout for SQL commands, in seconds. Since INSERT is an SQL command, it is subject to this session variable. Set it using |
Preparations
Check permissions
The INSERT operation requires INSERT permission on the target table. If your user account lacks this permission, grant it using GRANT. The syntax is GRANT INSERT ON TABLE <table_name> IN DATABASE <database_name> TO { ROLE <role_name> | USER <user_identity>}.
Create database and tables
Create a database named load_test in StarRocks, and create two tables inside it: the target table insert_wiki_edit and the source table source_wiki_edit.
All examples in this topic use the target table insert_wiki_edit and the source table source_wiki_edit. If you use your own tables and data, skip this step and adjust the data to import based on your scenario.
CREATE DATABASE IF NOT EXISTS load_test;
USE load_test;
CREATE TABLE insert_wiki_edit
(
event_time DATETIME,
channel VARCHAR(32) DEFAULT '',
user VARCHAR(128) DEFAULT '',
is_anonymous TINYINT DEFAULT '0',
is_minor TINYINT DEFAULT '0',
is_new TINYINT DEFAULT '0',
is_robot TINYINT DEFAULT '0',
is_unpatrolled TINYINT DEFAULT '0',
delta INT DEFAULT '0',
added INT DEFAULT '0',
deleted INT DEFAULT '0'
)
DUPLICATE KEY(
event_time,
channel,
user,
is_anonymous,
is_minor,
is_new,
is_robot,
is_unpatrolled
)
PARTITION BY RANGE(event_time)(
PARTITION p06 VALUES LESS THAN ('2015-09-12 06:00:00'),
PARTITION p12 VALUES LESS THAN ('2015-09-12 12:00:00'),
PARTITION p18 VALUES LESS THAN ('2015-09-12 18:00:00'),
PARTITION p24 VALUES LESS THAN ('2015-09-13 00:00:00')
)
DISTRIBUTED BY HASH(user);
CREATE TABLE source_wiki_edit
(
event_time DATETIME,
channel VARCHAR(32) DEFAULT '',
user VARCHAR(128) DEFAULT '',
is_anonymous TINYINT DEFAULT '0',
is_minor TINYINT DEFAULT '0',
is_new TINYINT DEFAULT '0',
is_robot TINYINT DEFAULT '0',
is_unpatrolled TINYINT DEFAULT '0',
delta INT DEFAULT '0',
added INT DEFAULT '0',
deleted INT DEFAULT '0'
)
DUPLICATE KEY(
event_time,
channel,user,
is_anonymous,
is_minor,
is_new,
is_robot,
is_unpatrolled
)
PARTITION BY RANGE(event_time)(
PARTITION p06 VALUES LESS THAN ('2015-09-12 06:00:00'),
PARTITION p12 VALUES LESS THAN ('2015-09-12 12:00:00'),
PARTITION p18 VALUES LESS THAN ('2015-09-12 18:00:00'),
PARTITION p24 VALUES LESS THAN ('2015-09-13 00:00:00')
)
DISTRIBUTED BY HASH(user);
Import using INSERT INTO VALUES
Use the INSERT INTO VALUES statement to directly import data into a specified table. Separate multiple rows with commas (,). For details, see INSERTINSERT.
Use INSERT INTO VALUES only to import a small amount of data for verification demos. Do not use it for large-scale testing or in production environments. For large-scale imports, choose another method.
The following example uses the label insert_load_wikipedia to import two rows into the source table source_wiki_edit.
INSERT INTO load_test.source_wiki_edit
WITH LABEL insert_load_wikipedia
VALUES
("2015-09-12 00:00:00","#en.wikipedia","AustinFF",0,0,0,0,0,21,5,0),
("2015-09-12 00:00:00","#ca.wikipedia","helloSR",0,1,0,1,0,3,23,0);
|
Parameter |
Description |
|
table_name |
Target table for data import. Use the format |
|
label |
Unique identifier for the import task within the database. If you do not specify a label, StarRocks automatically generates one. Specify a label so you can confirm whether the operation succeeded if a network error prevents the result from returning. If you specify a label, run |
|
values |
Inserts one or more rows using the VALUES syntax. Separate multiple rows with commas (,). |
Import using INSERT INTO SELECT
Use the INSERT INTO SELECT statement to transform data from a source table using ETL and import it into an internal StarRocks table. The source table can be one or more internal or foreign tables. The target table must be an internal StarRocks table. After running this statement, the system imports the results of the SELECT query into the target table.
The following examples show importing from internal tables. The process is identical for foreign tables, so those cases are not repeated.
-
The following example uses the label
insert_load_wikipedia_1to import data from the source table into the target table.INSERT INTO load_test.insert_wiki_edit WITH LABEL insert_load_wikipedia_1 SELECT * FROM load_test.source_wiki_edit; -
The following example uses the label
insert_load_wikipedia_2to import data into partitionsp06andp12of the target table. If you do not specify target partitions, data imports into the entire table. If you specify partitions, data imports only into those partitions.INSERT INTO load_test.insert_wiki_edit PARTITION(p06, p12) WITH LABEL insert_load_wikipedia_2 SELECT * FROM load_test.source_wiki_edit;If you clear partitions
p06andp12, previously inserted data in those partitions disappears.TRUNCATE TABLE load_test.insert_wiki_edit PARTITION(p06, p12); SELECT * FROM load_test.insert_wiki_edit; -
The following example uses the label
insert_load_wikipedia_3to import only theevent_timeandchannelcolumns from the source table into the corresponding columns of the target table. Other columns receive their default values.INSERT INTO load_test.insert_wiki_edit WITH LABEL insert_load_wikipedia_3 ( event_time, channel ) SELECT event_time, channel FROM load_test.source_wiki_edit;
Overwrite using INSERT OVERWRITE VALUES
Use the INSERT OVERWRITE VALUES statement to overwrite data in a specified table. Separate multiple rows with commas (,). For details, see INSERTINSERT.
Use INSERT OVERWRITE VALUES only to import a small amount of data for verification demos. Do not use it for large-scale testing or in production environments. For large-scale imports, choose another method.
Query the source table to confirm existing data.
SELECT * FROM load_test.source_wiki_edit;
The system returns the following result.
The result contains two records: the first has event_time 2015-09-12, channel #ca.wikipedia, user helloSR, and delta/added/deleted values of 3/23/0; the second has event_time 2015-09-12, channel #en.wikipedia, user AustinFF, and delta/added/deleted values of 21/5/0.
The following example uses the label insert_load_wikipedia_ow to overwrite two rows in the source table source_wiki_edit.
INSERT OVERWRITE load_test.source_wiki_edit
WITH LABEL insert_load_wikipedia_ow
VALUES
("2015-09-12 00:00:00","#cn.wikipedia","GELongstreet",0,0,0,0,0,36,36,0),
("2015-09-12 00:00:00","#fr.wikipedia","PereBot",0,1,0,1,0,17,17,0);
Query the source table again. The result shows the data was successfully overwritten. The new result contains two records: the first has event_time 2015-09-12 00:00:00, channel #cn.wikipedia, user GELongstreet, and delta/added/deleted values of 36/36/0; the second has event_time 2015-09-12 00:00:00, channel #fr.wikipedia, user PereBot, and delta/added/deleted values of 17/17/0.
Overwrite using INSERT OVERWRITE SELECT
Use the INSERT OVERWRITE SELECT statement to overwrite data in the target table with transformed data from the source table. This statement performs ETL on data from the source table and overwrites data in an internal StarRocks table. The source table can be one or more internal or foreign tables. The target table must be an internal StarRocks table. After running this statement, the system overwrites the target table with the results of the SELECT query. For details, see INSERTINSERT.
The following examples show importing from internal tables. The process is identical for foreign tables, so those cases are not repeated.
-
The following example uses the label
insert_load_wikipedia_ow_1to overwrite the target table with data from the source table.INSERT OVERWRITE load_test.insert_wiki_edit WITH LABEL insert_load_wikipedia_ow_1 SELECT * FROM load_test.source_wiki_edit; -
The following example uses the label
insert_load_wikipedia_ow_2to overwrite partitionsp06andp12of the target table. If you do not specify target partitions, data overwrites the entire table. If you specify partitions, data overwrites only those partitions.INSERT OVERWRITE load_test.insert_wiki_edit PARTITION(p06, p12) WITH LABEL insert_load_wikipedia_ow_2 SELECT * FROM load_test.source_wiki_edit;If you clear partitions
p06andp12, previously inserted data in those partitions disappears.TRUNCATE TABLE load_test.insert_wiki_edit PARTITION(p06, p12); SELECT * FROM load_test.insert_wiki_edit;NoteFor tables partitioned by list expressions (
PARTITION BY column), INSERT OVERWRITE can create new partitions in the target table by specifying partition key values. Existing partitions are overwritten normally.The following example creates a partitioned table
activityand automatically creates a new partition during data import:CREATE TABLE load_test.activity ( id INT NOT NULL, dt VARCHAR(10) NOT NULL ) ENGINE=OLAP DUPLICATE KEY(`id`) PARTITION BY (`id`, `dt`) DISTRIBUTED BY HASH(`id`); INSERT OVERWRITE load_test.activity PARTITION(id='4', dt='2022-01-01') WITH LABEL insert_activity_auto_partition VALUES ('4', '2022-01-01'); -
The following example uses the label
insert_load_wikipedia_ow_3to overwrite only theevent_timeandchannelcolumns in the target table with data from the source table. Other columns receive their default values.INSERT OVERWRITE load_test.insert_wiki_edit WITH LABEL insert_load_wikipedia_ow_3 ( event_time, channel ) SELECT event_time, channel FROM source_wiki_edit;
View import tasks
Use one of the following methods to view import tasks.
-
Method 1: View import tasks in EMR StarRocks Manager. For details, see View import task details.
-
Method 2: View import progress using the
loadsview in the StarRocks Information Schema database. This feature is available starting from version 3.1.SELECT * FROM information_schema.loads;For details about fields in the
loadsview, seeloads.If you submitted multiple import tasks, filter by
LABELto find a specific task. Example:SELECT * FROM information_schema.loads WHERE LABEL = 'insert_load_wikipedia_ow';In the result, the
STATEfield shows the task status.-
If the state is
CANCELLED, check theERROR_MSGfield to identify the cause. -
If the state is
FINISHED, the task succeeded.
-