Describe the bug
Adding "replicated_deduplication_window=0" setting creates duplicates in DBT entities with a ReplicatedMergeTree engine. The table is not incremental, and I've added allow_automatic_deduplication: "True" to profiles.yml but it doesn't help.
In case I'm launching the same DBT-generated script for table creation manually with SETTINGS replicated_deduplication_window=0 removed - it works fine returning 2 records from table as a result
Steps to reproduce
- Create test model "test_dbt_model"
with source_data as (
select 1 as id
union all
select null as id
)
select *
from source_data
- Set it's config:
engine: ReplicatedMergeTree()
materialized: table
- Run dbt model
dbt run -s test_dbt_model
- The final result shows 4 records inserted, 2 pairs of similar records:
default> select *
from dbt.test_dbt_model
limit 1000
[2023-12-04 22:30:59] 4 rows retrieved starting from 1 in 110 ms (execution: 100 ms, fetching: 10 ms)
Expected behaviour
DBT creates a test_dbt_model table with 2 rows: id = 1 and id is null
Code examples, such as models or profile settings
- dbt_project.yml
models:
batch_etl:
example:
+materialized: table
+engine: ReplicatedMergeTree()
- profiles.yml
prod:
type: clickhouse
schema: dbt
host: "{{ env_var('WH_DEV_HOST') }}"
port: 8123
user: "{{ env_var('WH_USER') }}"
password: "{{ env_var('WH_PW') }}"
cluster: "{{ env_var('CH_CLUSTER_NAME') }}"
check_exchange: False
secure: False
allow_automatic_deduplication: True
dbt and/or ClickHouse server logs
Generated SQL on insert:
Configuration
Environment
- dbt version: 1.6.9
- dbt-clickhouse version: 1.6.0
- clickhouse-driver version (if using native)
- clickhouse-connect version (if using http):
- Python version: 3.8.10
- Operating system: Windows 10
ClickHouse server
- ClickHouse Server version: 22.3.15.33
- ClickHouse Server non-default settings, if any:
CREATE TABLE statements for tables involved:
create table dbt.test_dbt_model__dbt_backup
ON CLUSTER "test_v2"
engine = ReplicatedMergeTree()
order by (tuple())
SETTINGS replicated_deduplication_window=0
as (
with source_data as (
select 1 as id
union all
select null as id
)
select *
from source_data
)
Describe the bug
Adding "replicated_deduplication_window=0" setting creates duplicates in DBT entities with a ReplicatedMergeTree engine. The table is not incremental, and I've added allow_automatic_deduplication: "True" to profiles.yml but it doesn't help.
In case I'm launching the same DBT-generated script for table creation manually with SETTINGS replicated_deduplication_window=0 removed - it works fine returning 2 records from table as a result
Steps to reproduce
dbt run -s test_dbt_modelExpected behaviour
DBT creates a test_dbt_model table with 2 rows: id = 1 and id is null
Code examples, such as models or profile settings
dbt and/or ClickHouse server logs
Generated SQL on insert:
Configuration
Environment
ClickHouse server
CREATE TABLEstatements for tables involved: