Spark SQL supports stream-batch, batch-batch, and stream-stream JOINs. The semantics are the same as those of a traditional batch JOIN.

Syntax

tableReference [, tableReference ]* | tableexpression
[ joinType ] JOIN tableexpression [ joinCondition ];

Constraints

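Some JOIN types are not supported when one or both inputs are streaming data. See the official Spark documentation for details; the main combinations are summarized here: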
Left table   Right table   Join type     Supported?
Stream       Static        Inner         Supported, not stateful
Stream       Static        Left Outer    Supported, not stateful
Stream       Static        Right Outer   Not supported
Stream       Static        Full Outer    Not supported
Static       Stream        Inner         Supported, not stateful
Static       Stream        Left Outer    Not supported
Static       Stream        Right Outer   Supported, not stateful
Static       Stream        Full Outer    Not supported
Stream       Stream        Inner         Supported, optionally specify watermark on both sides + time constraints for state cleanup.
Stream       Stream        Left Outer    Conditionally supported, must specify watermark on right + time constraints for correct results, optionally specify watermark on left for all state cleanup.
Stream       Stream        Right Outer   Conditionally supported, must specify watermark on left + time constraints for correct results, optionally specify watermark on right for all state cleanup.
Stream       Stream        Full Outer    Not supported
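
The support matrix above can also be written as a small lookup, which is handy for validating a query plan before submitting it. The following is a minimal sketch in plain Python, not part of Spark: the names `JoinSupport`, `SUPPORT`, and `check_join` are hypothetical, and the entries are transcribed from the table above, so they reflect only what this page states.

```python
from typing import NamedTuple


class JoinSupport(NamedTuple):
    """Hypothetical record: whether a join is allowed, plus the table's note."""
    supported: bool
    note: str


# Keys are (left input, right input, join type), all lowercase.
# Values are transcribed from the constraints table on this page.
SUPPORT = {
    ("stream", "static", "inner"): JoinSupport(True, "not stateful"),
    ("stream", "static", "left outer"): JoinSupport(True, "not stateful"),
    ("stream", "static", "right outer"): JoinSupport(False, "not supported"),
    ("stream", "static", "full outer"): JoinSupport(False, "not supported"),
    ("static", "stream", "inner"): JoinSupport(True, "not stateful"),
    ("static", "stream", "left outer"): JoinSupport(False, "not supported"),
    ("static", "stream", "right outer"): JoinSupport(True, "not stateful"),
    ("static", "stream", "full outer"): JoinSupport(False, "not supported"),
    ("stream", "stream", "inner"): JoinSupport(
        True, "optional watermark on both sides + time constraints for state cleanup"),
    ("stream", "stream", "left outer"): JoinSupport(
        True, "requires watermark on right + time constraints"),
    ("stream", "stream", "right outer"): JoinSupport(
        True, "requires watermark on left + time constraints"),
    ("stream", "stream", "full outer"): JoinSupport(False, "not supported"),
}


def check_join(left: str, right: str, join_type: str) -> JoinSupport:
    """Look up a combination; raise ValueError if the table does not list it."""
    key = (left.strip().lower(), right.strip().lower(), join_type.strip().lower())
    if key not in SUPPORT:
        raise ValueError(f"combination not listed in the table: {key}")
    return SUPPORT[key]


if __name__ == "__main__":
    result = check_join("Stream", "Static", "Right Outer")
    print(result.supported, "-", result.note)
```

Static-static (batch-batch) joins are deliberately left out of the lookup because the table does not list them; they carry none of the streaming restrictions.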