branch-4.0: [pick](pr) #58396 #60540 #60684 #62320
Open
zhangstar333 wants to merge 5 commits into apache:branch-4.0
Problem Summary:

Previously (see apache#24788), CLUSTER BY was used to define sort columns, but with limited syntax (ASC only, no control over sort order). This PR changes it to ORDER BY, which is more intuitive and flexible. Users can now explicitly specify the sort direction and nulls order for each column. The default remains ASC with NULLS FIRST. The ORDER BY clause is also supported when creating Iceberg tables:

```
CREATE TABLE `test_table2` (
  `id` int NULL,
  `name` text NULL,
  `score` double NULL,
  `create_time` datetimev2(6) NULL
) ENGINE=ICEBERG_EXTERNAL_TABLE
ORDER BY (`id` ASC NULLS FIRST, `score` DESC NULLS LAST)
LOCATION 's3a://warehouse/wh/test_with_sr/test_table2'
PROPERTIES (
  "write-format" = "ORC",
  "doris.version" = "doris-0.0.0-2fa88d38b0",
  "write.parquet.compression-codec" = "zstd"
);
```
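To make the ordering semantics concrete, here is a minimal Python sketch of how a sort spec like `ORDER BY (id ASC NULLS FIRST, score DESC NULLS LAST)` orders rows. It is not Doris code: `SORT_SPEC` and `compare_rows` are hypothetical names, and the sketch assumes that null placement is applied independently of the ASC/DESC direction.

```python
from functools import cmp_to_key

# Hypothetical sort spec mirroring:
#   ORDER BY (id ASC NULLS FIRST, score DESC NULLS LAST)
SORT_SPEC = [
    ("id", "ASC", "NULLS FIRST"),
    ("score", "DESC", "NULLS LAST"),
]


def compare_rows(a, b, spec=SORT_SPEC):
    """Compare two rows (dicts) column by column according to the spec."""
    for column, direction, nulls in spec:
        x, y = a[column], b[column]
        if x is None and y is None:
            continue  # tie on this column, look at the next one
        if x is None or y is None:
            # Assumption: null placement does not depend on ASC/DESC.
            nulls_first = nulls == "NULLS FIRST"
            if x is None:
                return -1 if nulls_first else 1
            return 1 if nulls_first else -1
        if x != y:
            result = -1 if x < y else 1
            return result if direction == "ASC" else -result
    return 0


rows = [
    {"id": 2, "score": 1.0},
    {"id": None, "score": 5.0},
    {"id": 1, "score": None},
    {"id": 1, "score": 3.0},
    {"id": 1, "score": 9.0},
]
sorted_rows = sorted(rows, key=cmp_to_key(compare_rows))
```

In this sketch the row with a null `id` sorts first, and within `id = 1` the scores run 9.0, 3.0, then null.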
…les (apache#58396)

### What problem does this PR solve?

### Proposed changes

This PR implements static partition overwrite functionality for Iceberg external tables, allowing users to precisely overwrite specific partitions using the `INSERT OVERWRITE ... PARTITION (col='value', ...)` syntax.

### Background

Before this PR, Doris supports:

- ✅ `INSERT INTO` with dynamic partition for Iceberg tables
- ✅ `INSERT OVERWRITE` for full table replacement
- ❌ `INSERT OVERWRITE ... PARTITION (...)` for static partition overwrite

### New Features

1. **Full Static Partition Mode**: Overwrite a specific partition when all partition columns are specified

   ```sql
   INSERT OVERWRITE TABLE iceberg_db.tbl
   PARTITION (dt='2025-01-25', region='bj')
   SELECT id, name FROM source_table;
   ```

2. **Hybrid Partition Mode**: Partial static + partial dynamic partition

   ```sql
   -- dt is static, region comes from SELECT dynamically
   INSERT OVERWRITE TABLE iceberg_db.tbl
   PARTITION (dt='2025-01-25')
   SELECT id, name, region FROM source_table;
   ```

### Implementation Details

#### FE Changes

- **Parser** (`DorisParser.g4`, `LogicalPlanBuilder.java`): Extended partition spec parsing to support `PARTITION (col='value', ...)` syntax
- **InsertPartitionSpec**: New unified data structure to represent partition modes (auto-detect, dynamic, static)
- **UnboundIcebergTableSink**: Added `staticPartitionKeyValues` field to carry static partition info
- **BindSink**: Added validation for static partition columns and generate constant expressions for static partition values
- **IcebergTransaction**: Implemented `commitStaticPartitionOverwrite()` using Iceberg's `OverwriteFiles.overwriteByRowFilter()` API
- **IcebergUtils**: Added `parsePartitionValueFromString()` utility for partition value type conversion

#### BE Changes

- **VIcebergTableWriter**:
  - Support full static partition mode (all data goes to single partition)
  - Support hybrid partition mode (static columns from config, dynamic columns from data)
  - Added `_is_full_static_partition` and `_dynamic_partition_column_indices` for mode detection

#### Thrift Changes

- Added `static_partition_values` field to `TIcebergTableSink` for passing static partition info from FE to BE
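The mode detection described above can be sketched in a few lines of Python. This is an illustration only, not the PR's Java or C++ code: the function names are hypothetical, and the assumption that the overwrite filter is a conjunction of equality predicates on the static columns is mine, not something the PR text states.

```python
def classify_partition_mode(partition_columns, static_values):
    """Return 'dynamic', 'hybrid', or 'full_static' for an INSERT OVERWRITE.

    partition_columns: ordered list of the table's partition column names.
    static_values: dict of column -> value taken from PARTITION (col='value', ...).
    """
    unknown = [c for c in static_values if c not in partition_columns]
    if unknown:
        # Mirrors the idea of validating static partition columns.
        raise ValueError(f"not partition columns: {unknown}")
    if not static_values:
        return "dynamic"
    if all(c in static_values for c in partition_columns):
        return "full_static"
    return "hybrid"


def dynamic_column_indices(partition_columns, static_values):
    """Indices of partition columns whose values must come from the data."""
    return [i for i, c in enumerate(partition_columns) if c not in static_values]


def row_filter(static_values):
    """Assumed filter shape: equality on every static column, joined by AND."""
    return " AND ".join(f"{c} = {v!r}" for c, v in static_values.items())


columns = ["dt", "region"]
full = {"dt": "2025-01-25", "region": "bj"}
partial = {"dt": "2025-01-25"}

mode_full = classify_partition_mode(columns, full)
mode_hybrid = classify_partition_mode(columns, partial)
```

With the two example statements from the PR, `full` corresponds to the full static case and `partial` to the hybrid case, where `region` (index 1) is still read from the `SELECT` output.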
…pache#60540)

Problem Summary:

Support writing to Iceberg tables that have a sort order. The written data is sorted locally, and lower/upper bounds metadata is added for each data file, so that Iceberg scan planning can use it to prune data files.

**Notes**: this is only a local sort, not a global sort. If the Iceberg writer runs with higher parallelism, you may see overlapping lower/upper bounds between files. If you need a global sort, you could add an ORDER BY clause to the INSERT SQL.

You could create the table, and then alter the table, e.g.:

```
CREATE TABLE test_table2 (
  id INT,
  name STRING,
  score DOUBLE,
  create_time datetime
)
ORDER BY (id ASC NULLS FIRST, score DESC NULLS LAST)
PROPERTIES (
  'write-format'='ORC'
);
```
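The note about local versus global sorting can be illustrated with a small Python sketch. It is a toy model, not the writer implementation: rows are plain integers, the round-robin distribution across writers is an assumption made for the example, and all function names are hypothetical.

```python
def write_locally_sorted(rows, num_writers):
    """Each writer gets a slice of the input and sorts only its own slice."""
    buckets = [rows[i::num_writers] for i in range(num_writers)]  # assumed round-robin
    files = []
    for bucket in buckets:
        data = sorted(bucket)
        if data:
            files.append({"rows": data, "lower": data[0], "upper": data[-1]})
    return files


def write_globally_sorted(rows, num_writers):
    """Sort everything first, then cut the sorted run into contiguous files."""
    ordered = sorted(rows)
    if not ordered:
        return []
    size = -(-len(ordered) // num_writers)  # ceiling division
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [{"rows": c, "lower": c[0], "upper": c[-1]} for c in chunks]


def files_to_scan(files, value):
    """Files whose [lower, upper] range may contain the value."""
    return [f for f in files if f["lower"] <= value <= f["upper"]]


ids = [5, 1, 2, 8, 7, 3, 4, 6]
local_files = write_locally_sorted(ids, 2)
global_files = write_globally_sorted(ids, 2)

local_hits = len(files_to_scan(local_files, 5))
global_hits = len(files_to_scan(global_files, 5))
```

In this toy input, the two locally sorted files have bounds 2..7 and 1..8, so a lookup for `5` cannot skip either file. After a global sort the bounds are 1..4 and 5..8, and only one file needs to be read.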
What problem does this PR solve?
cherry-pick from master #58396 (#60540) #60684
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)