SQL-based ingestion known issues

Multi-stage query task runtime

Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.
Worker task stage outputs are stored in the working directory given by druid.indexer.task.baseDir. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an UnknownError with a message including "No space left on device".
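
One way to catch this early is to check free space on the volume that holds the working directory before submitting a large query. The following is a minimal sketch, not part of Druid: the helper names and the 20% threshold are hypothetical, and the path you pass in is whatever druid.indexer.task.baseDir resolves to on your workers.

```python
import shutil


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path, min_free=0.2):
    """Return True if at least `min_free` of the volume is free.

    The 0.2 default is an arbitrary illustration, not a Druid recommendation.
    """
    return free_fraction(path) >= min_free
```

For example, has_headroom("/path/to/baseDir") could gate a scheduled ingestion job; the path shown is a placeholder.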

GROUPING SETS are not implemented. Queries that use GROUPING SETS return a QueryNotSupported error.

INSERT and REPLACE statements with column lists, like INSERT INTO tbl (a, b, c) SELECT ..., are not implemented.
INSERT ... SELECT and REPLACE ... SELECT insert columns from the SELECT statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.
INSERT and REPLACE do not support every option available in ingestion specs. Unsupported options include the createBitmapIndex and multiValueHandling dimension properties and the indexSpec tuningConfig property.
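
Because columns are matched by name, the alias on each SELECT expression decides which column the value lands in, and the order of the SELECT list does not. The sketch below generates an INSERT statement in which every output column carries an explicit alias. The helper build_insert and the datasource and column names are hypothetical, chosen only for illustration.

```python
def build_insert(datasource, columns, source, granularity="DAY"):
    """Build an INSERT ... SELECT in which every output column has an explicit alias.

    `columns` maps each target column name to the SQL expression that fills it.
    """
    select_list = ",\n  ".join(
        f'{expr} AS "{name}"' for name, expr in columns.items()
    )
    return (
        f'INSERT INTO "{datasource}"\n'
        f"SELECT\n  {select_list}\n"
        f"FROM {source}\n"
        f"PARTITIONED BY {granularity}"
    )


# Hypothetical datasource and column names.
sql = build_insert(
    "wikipedia_example",
    {"__time": 'TIME_PARSE("timestamp")', "page": '"page"', "added": '"added"'},
    '"wikipedia_raw"',
)
print(sql)
```

Reordering the entries in the mapping changes the order of the SELECT list but not which target column each expression populates.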

The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the signature parameter of the EXTERN function.
EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task.
EXTERN refers to external files. Use FROM to access Druid input sources.
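
Since every column and type has to be listed in the signature, it can help to generate the EXTERN call from a single column list. The following is a sketch under stated assumptions: the helper extern_call is hypothetical, the URI is a placeholder, and the input source and input format objects follow the usual JSON shape for an HTTP source with JSON data.

```python
import json


def extern_call(input_source, input_format, columns):
    """Return a TABLE(EXTERN(...)) expression with an explicit signature.

    `columns` is a list of (name, type) pairs, for example ("added", "long").
    """
    signature = [{"name": name, "type": col_type} for name, col_type in columns]
    json_args = [json.dumps(arg) for arg in (input_source, input_format, signature)]
    # Each argument is a SQL string literal, so single quotes inside it are doubled.
    quoted = ", ".join("'" + arg.replace("'", "''") + "'" for arg in json_args)
    return f"TABLE(EXTERN({quoted}))"


# Placeholder URI and hypothetical column names.
source = extern_call(
    {"type": "http", "uris": ["https://example.com/data.json.gz"]},
    {"type": "json"},
    [("timestamp", "string"), ("page", "string"), ("added", "long")],
)
print(source)
```

The resulting expression can be used in the FROM clause of an INSERT or REPLACE statement.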

A window cannot contain more than 100,000 elements.
To avoid leaf operators in the MSQ task engine, window functions add an extra scan stage after the window stage in cases where the native engine has a non-empty leafOperator.

The following known issues and limitations affect automatic compaction with the MSQ task engine:

The metricSpec field is only supported for certain aggregators. For more information, see Supported aggregators.
Only dynamic and range-based partitioning are supported.
Set rollup to true if and only if metricSpec is neither null nor empty.
You can only partition on string dimensions. However, multi-valued string dimensions are not supported.
The maxTotalRows config is not supported in DynamicPartitionsSpec. Use maxRowsPerSegment instead.
Segments can only be sorted on __time as the first column.
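
Several of the rules above can be checked mechanically before a compaction config is submitted. The following is a minimal sketch, not a Druid API: the function name is hypothetical, and the key names (tuningConfig.partitionsSpec, granularitySpec.rollup, metricsSpec) are assumptions about the compaction config layout that you should verify against your Druid version. It does not check aggregator support, dimension types, or sort order.

```python
def check_msq_compaction_config(config):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []

    # Assumed location of the partitioning settings.
    spec = (config.get("tuningConfig") or {}).get("partitionsSpec") or {}
    spec_type = spec.get("type")
    if spec and spec_type not in ("dynamic", "range"):
        problems.append(f"unsupported partitionsSpec type: {spec_type!r}")
    if spec_type == "dynamic" and spec.get("maxTotalRows") is not None:
        problems.append("maxTotalRows is not supported; use maxRowsPerSegment")

    # rollup must be true exactly when a non-empty metrics spec is present.
    rollup = bool((config.get("granularitySpec") or {}).get("rollup"))
    has_metrics = bool(config.get("metricsSpec"))
    if rollup != has_metrics:
        problems.append("rollup must be true if and only if metricsSpec is non-empty")

    return problems


# Hypothetical configs for illustration.
ok_config = {
    "tuningConfig": {"partitionsSpec": {"type": "dynamic", "maxRowsPerSegment": 5000000}},
    "granularitySpec": {"rollup": False},
}
bad_config = {
    "tuningConfig": {"partitionsSpec": {"type": "hashed"}},
    "granularitySpec": {"rollup": True},
}
print(check_msq_compaction_config(ok_config))
print(check_msq_compaction_config(bad_config))
```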