Strict mode

Strict mode is an optional property that you can configure for data loads. It affects the loading behavior and the final loaded data.

This topic introduces what strict mode is and how to set strict mode.

Understand strict mode

During data loading, the data types of the source columns may not be completely consistent with the data types of the destination columns. In such cases, CelerData performs conversions on the source column values that have inconsistent data types. Data conversions may fail due to various issues such as unmatched field data types and field length overflows. Source column values that fail to be properly converted are unqualified column values, and source rows that contain unqualified column values are referred to as "unqualified rows". Strict mode is used to control whether to filter out unqualified rows during data loading.

Strict mode works as follows:

  • If strict mode is enabled, CelerData loads only qualified rows. It filters out unqualified rows and returns details about the unqualified rows.
  • If strict mode is disabled, CelerData converts unqualified column values into NULL and loads unqualified rows that contain these NULL values together with qualified rows.

Note the following points:

  • In actual business scenarios, both qualified and unqualified rows may contain NULL values. If the destination columns do not allow NULL values, CelerData reports errors and filters out the rows that contain NULL values.

  • The maximum percentage of unqualified rows that can be filtered out for a Stream Load, Broker Load, or Routine Load job is controlled by an optional job property max_filter_ratio. INSERT does not support setting the max_filter_ratio property.

For example, you want to load four rows that hold \N (\N denotes a NULL value), abc, 2000, and 1 values respectively in a column from a CSV-formatted data file into a CelerData table, and the data type of the destination CelerData table column is TINYINT [-128, 127].

  • The source column value \N is processed into NULL upon conversion to TINYINT.

    • NOTE

      \N is always processed into NULL upon conversion regardless of the destination data type.

  • The source column value abc is processed into NULL, because its data type is not TINYINT and the conversion fails.

  • The source column value 2000 is processed into NULL, because it is beyond the range supported by TINYINT and the conversion fails.

  • The source column value 1 can be properly converted to a TINYINT-type value 1.

If strict mode is disabled, CelerData loads all the four rows.

If strict mode is enabled, CelerData loads only the rows that hold \N or 1 and filters out the rows that hold abc or 2000. The rows filtered out are counted against the maximum percentage of rows that can be filtered out due to inadequate data quality as specified by the max_filter_ratio parameter.

Final loaded data with strict mode disabled

Source column valueColumn value upon conversion to TINYINTLoad result when destination column allows NULL valuesLoad result when destination column does not allow NULL values
\NNULLThe value NULL is loaded.An error is reported.
abcNULLThe value NULL is loaded.An error is reported.
2000NULLThe value NULL is loaded.An error is reported.
11The value 1 is loaded.The value 1 is loaded.

Final loaded data with strict mode enabled

Source column valueColumn value upon conversion to TINYINTLoad result when destination column allows NULL valuesLoad result when destination column does not allow NULL values
\NNULLThe value NULL is loaded.An error is reported.
abcNULLThe value NULL is not allowed and therefore is filtered out.An error is reported.
2000NULLThe value NULL is not allowed and therefore is filtered out.An error is reported.
11The value 1 is loaded.The value 1 is loaded.

Set strict mode

If you run a Stream Load, Broker Load, or Routine Load job to load data, use the strict_mode parameter to set strict mode for the load job. Valid values are true and false. The default value is false. The value true enables strict mode, and the value false disables strict mode.

If you execute INSERT to load data, use the enable_insert_strict session variable to set strict mode. Valid values are true and false. The default value is true. The value true enables strict mode, and the value false disables strict mode.

Examples are as follows:

Stream Load

curl --location-trusted -u <username>:<password> \
    -H "strict_mode: {true | false}" \
    -T <file_name> -XPUT \
    http://<fe_host>/api/<database_name>/<table_name>/_stream_load

For detailed syntax and parameters about Stream Load, see STREAM LOAD.

Broker Load

LOAD LABEL [<database_name>.]<label_name>
(
    DATA INFILE ("<file_path>"[, "<file_path>" ...])
    INTO TABLE <table_name>
)
WITH BROKER
(
    "username" = "<hdfs_username>",
    "password" = "<hdfs_password>"
)
PROPERTIES
(
    "strict_mode" = "{true | false}"
)

The preceding code snippet uses HDFS as an example. For detailed syntax and parameters about Broker Load, see BROKER LOAD.

Routine Load

CREATE ROUTINE LOAD [<database_name>.]<job_name> ON <table_name>
PROPERTIES
(
    "strict_mode" = "{true | false}"
) 
FROM KAFKA
(
    "kafka_broker_list" ="<kafka_broker1_ip>:<kafka_broker1_port>[,<kafka_broker2_ip>:<kafka_broker2_port>...]",
    "kafka_topic" = "<topic_name>"
)

The preceding code snippet uses Apache Kafka® as an example. For detailed syntax and parameters about Routine Load, see CREATE ROUTINE LOAD.

INSERT

SET enable_insert_strict = {true | false};
INSERT INTO <table_name> ...

For detailed syntax and parameters about INSERT, see INSERT.