Skip to content

replicated_deduplication_window=0 setting produces duplicates in DBT #216

Description

@Timev

Describe the bug

Adding "replicated_deduplication_window=0" setting creates duplicates in DBT entities with a ReplicatedMergeTree engine. The table is not incremental, and I've added allow_automatic_deduplication: "True" to profiles.yml but it doesn't help.
In case I'm launching the same DBT-generated script for table creation manually with SETTINGS replicated_deduplication_window=0 removed - it works fine returning 2 records from table as a result

Steps to reproduce

  1. Create test model "test_dbt_model"
with source_data as (
    select 1 as id
    union all
    select null as id
)
select *
from source_data
  1. Set it's config:
engine: ReplicatedMergeTree()
materialized: table
  1. Run dbt model
    dbt run -s test_dbt_model
  2. The final result shows 4 records inserted, 2 pairs of similar records:
default> select * 
         from dbt.test_dbt_model
         limit 1000
[2023-12-04 22:30:59] 4 rows retrieved starting from 1 in 110 ms (execution: 100 ms, fetching: 10 ms)

Expected behaviour

DBT creates a test_dbt_model table with 2 rows: id = 1 and id is null

Code examples, such as models or profile settings

  1. dbt_project.yml
models:
  batch_etl:
    example:
      +materialized: table
      +engine: ReplicatedMergeTree()
  1. profiles.yml
    prod:
      type: clickhouse
      schema: dbt
      host: "{{ env_var('WH_DEV_HOST') }}"
      port: 8123
      user: "{{ env_var('WH_USER') }}"
      password: "{{ env_var('WH_PW') }}"
      cluster: "{{ env_var('CH_CLUSTER_NAME') }}"
      check_exchange: False
      secure: False
      allow_automatic_deduplication: True

dbt and/or ClickHouse server logs

Generated SQL on insert:

Configuration

Environment

  • dbt version: 1.6.9
  • dbt-clickhouse version: 1.6.0
  • clickhouse-driver version (if using native)
  • clickhouse-connect version (if using http):
  • Python version: 3.8.10
  • Operating system: Windows 10

ClickHouse server

  • ClickHouse Server version: 22.3.15.33
  • ClickHouse Server non-default settings, if any:
  • CREATE TABLE statements for tables involved:
        create table dbt.test_dbt_model__dbt_backup
        
  
    
    ON CLUSTER "test_v2" 
        
  engine = ReplicatedMergeTree()
        
      order by (tuple())
        
        
        SETTINGS  replicated_deduplication_window=0

          as (
            with source_data as (
    select 1 as id
    union all
    select null as id
)
select *
from source_data
          )

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions