SQL-based ingestion known issues
This page describes SQL-based batch ingestion using the druid-multi-stage-query
extension, new in Druid 24.0. Refer to the ingestion methods table to determine which
ingestion method is right for you.
Multi-stage query task runtime
- Fault tolerance is partially implemented. Workers get relaunched when they are killed unexpectedly. The controller does not get relaunched if it is killed unexpectedly.
- Worker task stage outputs are stored in the working directory given by druid.indexer.task.baseDir. Stages that generate a large amount of output data may exhaust all available disk space. In this case, the query fails with an UnknownError with a message including "No space left on device".
SELECT Statement
GROUPING SETS are not implemented. Queries that use this feature return a QueryNotSupported error.
INSERT and REPLACE Statements
- The INSERT and REPLACE statements with column lists, like INSERT INTO tbl (a, b, c) SELECT ..., are not implemented.
- INSERT ... SELECT and REPLACE ... SELECT insert columns from the SELECT statement based on column name. This differs from SQL standard behavior, where columns are inserted based on position.
- INSERT and REPLACE do not support all options available in ingestion specs, including the createBitmapIndex and multiValueHandling dimension properties, and the indexSpec tuningConfig property.
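As a rough sketch of the name-based matching described above, the query below omits the column list and aliases each SELECT expression to its target column name instead. The table name tbl and the columns a, b, and c come from the example above; source_table and its columns x, y, and z are hypothetical.

```sql
-- Not implemented: INSERT INTO tbl (a, b, c) SELECT ...
-- Instead, alias each SELECT expression to the target column name.
-- Columns are matched by name, so listing c before a and b is harmless here.
INSERT INTO "tbl"
SELECT
  "__time",
  "z" AS "c",
  "x" AS "a",
  "y" AS "b"
FROM "source_table"  -- hypothetical existing datasource
PARTITIONED BY DAY
```

Because matching is by name, an expression left without an alias would be written under whatever name the SELECT gives it, so aliasing every expression explicitly is the safer habit.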
EXTERN Function
- The schemaless dimensions feature is not available. All columns and their types must be specified explicitly using the signature parameter of the EXTERN function.
- EXTERN with input sources that match large numbers of files may exhaust available memory on the controller task.
- EXTERN refers to external files. Use FROM to access druid input sources.
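The two cases above might look like the following sketch. The datasource names (events, events_copy), the URI, and the column names are placeholders chosen for illustration, not values from this page. The first statement lists every column and type in the EXTERN signature; the second reads an existing datasource with a plain FROM.

```sql
-- External data: the third EXTERN argument is the signature,
-- which must name every column and its type explicitly.
INSERT INTO "events"
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "page",
  "added"
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://example.com/events.json.gz"]}',
    '{"type":"json"}',
    '[{"name":"timestamp","type":"string"},{"name":"page","type":"string"},{"name":"added","type":"long"}]'
  )
)
PARTITIONED BY DAY

-- Druid data: submit this as a separate query.
-- It reads the datasource with FROM rather than EXTERN.
REPLACE INTO "events_copy" OVERWRITE ALL
SELECT "__time", "page", "added"
FROM "events"
PARTITIONED BY DAY
```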
WINDOW Function
- The maximum number of elements in a window cannot exceed a value of 100,000.
- To avoid leafOperators in the MSQ engine, window functions have an extra scan stage after the window stage for cases where the native engine has a non-empty leafOperator.
Automatic compaction
The following known issues and limitations affect automatic compaction with the MSQ task engine:
- The metricSpec field is only supported for certain aggregators. For more information, see Supported aggregators.
- Only dynamic and range-based partitioning are supported.
- Set rollup to true if and only if metricSpec is not empty or null.
- You can only partition on string dimensions. However, multi-valued string dimensions are not supported.
- The maxTotalRows config is not supported in DynamicPartitionsSpec. Use maxRowsPerSegment instead.
- Segments can only be sorted on __time as the first column.