Working with static columns
Defining static columns
Consider a scenario where you use a Cassandra table to store basic user information, such as email and password, along with user status updates. Basic user information rarely changes, but user status updates are frequent. Including the basic information with every status update wastes significant storage space.
To solve this problem, Cassandra introduced static columns. Columns declared as static have only one value for each partition key. This means the value is stored only once per partition.
Define a static column
You can define a column as STATIC by adding the STATIC keyword after the column definition, as follows:
CREATE TABLE "iteblog_users_with_status_updates" (
"username" text,
"id" timeuuid,
"email" text STATIC,
"encrypted_password" blob STATIC,
"body" text,
PRIMARY KEY ("username", "id")
);
The preceding command sets the email and encrypted_password columns to STATIC. This means that for a given username, only one email and one encrypted_password value exist.
Static column limitations
The STATIC keyword is not supported in all cases. Static columns have the following limitations:
The table does not have clustering columns (also known as clustering keys). For example:
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY ("username")
... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static columns are only useful (and thus allowed) if the table has at least one clustering column"
The iteblog_users_with_status_updates_invalid table has only a PRIMARY KEY and no clustering columns. Therefore, it does not support static columns. This is because static columns are most efficient when multiple rows exist for the same partition key, and the benefit increases with the number of rows. If no clustering columns are defined, only one row can exist for each PRIMARY KEY in a partition. The data is inherently static, so there is no need for static columns.
The table was created with COMPACT STORAGE. For example:
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY ("username", "id")
... )WITH COMPACT STORAGE;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static columns are not supported in COMPACT STORAGE tables"
The column is part of the partition key or clustering columns. For example:
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid STATIC,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY ("username", "id")
... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static column id cannot be part of the PRIMARY KEY"
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY (("username", "id"), email)
... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static column email cannot be part of the PRIMARY KEY"
Insert data into a table with static columns
Inserting data into a table with static columns is similar to inserting data into a regular table. For example, you can insert data into the iteblog_users_with_status_updates table as follows:
cqlsh:iteblog_keyspace> INSERT INTO "iteblog_users_with_status_updates"
... ("username", "id", "email", "encrypted_password", "body")
... VALUES (
... 'iteblog',
... NOW(),
... 'iteblog_hadoop@iteblog.com',
... 0x877E8C36EFA827DBD4CAFBC92DD90D76,
... 'Learning Cassandra!'
... );
cqlsh:iteblog_keyspace> select username, email, encrypted_password, body from iteblog_users_with_status_updates;
username | email | encrypted_password | body
----------+----------------------------+------------------------------------+---------------------
iteblog | iteblog_hadoop@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | Learning Cassandra!
(1 rows)
The output shows that one record was inserted successfully. The preceding statement performs two actions:
The email and encrypted_password for all data with the username 'iteblog' are set to iteblog_hadoop@iteblog.com and 0x877e8c36efa827dbd4cafbc92dd90d76.
A new record with the body 'Learning Cassandra!' is added to the partition for 'iteblog'. Now, insert another record into the table as follows:
cqlsh:iteblog_keyspace> INSERT INTO "iteblog_users_with_status_updates"
... ("username", "id", "body")
... VALUES ('iteblog', NOW(), 'I love Cassandra!');
cqlsh:iteblog_keyspace> select username, email, encrypted_password, body from iteblog_users_with_status_updates;
username | email | encrypted_password | body
----------+----------------------------+------------------------------------+---------------------
iteblog | iteblog_hadoop@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | Learning Cassandra!
iteblog | iteblog_hadoop@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | I love Cassandra!
(2 rows)
cqlsh:iteblog_keyspace>
Notice that the email and encrypted_password columns were not specified in this INSERT statement. However, the query results show that the email and encrypted_password values for the new row are the same as the previous one.
Now, the user changes their email. For example:
cqlsh:iteblog_keyspace> UPDATE iteblog_users_with_status_updates SET email = 'iteblog@iteblog.com'
... WHERE username = 'iteblog';
cqlsh:iteblog_keyspace> select username, email, encrypted_password, body from iteblog_users_with_status_updates;
username | email | encrypted_password | body
----------+---------------------+------------------------------------+---------------------
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | Learning Cassandra!
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | I love Cassandra!
(2 rows)
The query output shows that the email for all rows with the username 'iteblog' has been updated. This demonstrates the power of static columns.
The table now contains the user's mailbox and password. If your frontend application allows users to modify their mailbox and password, the backend system must first retrieve the current values, as follows:
cqlsh:iteblog_keyspace> SELECT "username", "email", "encrypted_password"
... FROM "iteblog_users_with_status_updates"
... WHERE "username" = 'iteblog';
username | email | encrypted_password
----------+---------------------+------------------------------------
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76
(2 rows)
The query returns the mailbox and password for each row that has the username 'iteblog'. This is not the desired result. To fix this, add the DISTINCT keyword to your query. For example:
cqlsh:iteblog_keyspace> SELECT DISTINCT "username", "email", "encrypted_password"
... FROM "iteblog_users_with_status_updates"
... WHERE "username" = 'iteblog';
username | email | encrypted_password
----------+---------------------+------------------------------------
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76
(1 rows)
This ensures that only one row is returned, regardless of how many rows exist for the username 'iteblog'.
Although you use the DISTINCT keyword, Cassandra does not fetch all rows for the username 'iteblog' and then remove duplicates. This is because the static column data is stored only once at a low level, so deduplication is not required.