Support multi-catalog #105

Merged
ueshin merged 21 commits into databricks:main from ueshin:multi-catalog
Jun 14, 2022

Conversation

@ueshin
Contributor

@ueshin ueshin commented May 26, 2022

resolves #95

Description

Supports multi-catalog.

Enables catalog or database config to use a different catalog for models.

  • model alternative_catalog
{{ config(
    catalog = 'alternative',
    materialized = 'table'
) }}

select * from {{ ref('seed') }}

Also enables running cross-catalog queries.

select * from {{ ref('seed') }}
union all select * from {{ ref('alternative_catalog')}}

Note: mixing Unity Catalog and Hive metastore tables is not recommended; it can fail with errors such as:

org.apache.spark.sql.AnalysisException: Non-Unity-Catalog object hive_metastore.schema_1.table_a can't be referenced in Unity Catalog objects

org.apache.spark.sql.AnalysisException: Create a persistent view that references both unity catalog and Hive metastore objects is not supported in Unity Catalog
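
To make the note above concrete, here is a minimal sketch of a pre-flight check that flags a set of relations spanning `hive_metastore` and another catalog. It is not part of this PR or of dbt-databricks: the helper names `catalog_of` and `mixes_hive_and_unity` are hypothetical, the sketch assumes plain dotted names without backticks, and it treats every catalog other than `hive_metastore` as a Unity Catalog catalog, which is a simplification.

```python
from typing import Iterable, Set

# Catalog name as it appears in the error message above.
HIVE_METASTORE = "hive_metastore"


def catalog_of(relation: str, default_catalog: str = HIVE_METASTORE) -> str:
    """Return the catalog part of a dotted relation name.

    Assumption: identifiers contain no backticks or embedded dots.
    Names with fewer than three parts fall back to `default_catalog`.
    """
    parts = relation.split(".")
    return parts[0] if len(parts) == 3 else default_catalog


def mixes_hive_and_unity(relations: Iterable[str]) -> bool:
    """True if the relations span hive_metastore and at least one other catalog."""
    catalogs: Set[str] = {catalog_of(r) for r in relations}
    return HIVE_METASTORE in catalogs and len(catalogs) > 1
```

For example, `mixes_hive_and_unity(["hive_metastore.schema_1.table_a", "alternative.schema_1.seed"])` is true, while two relations that both live in non-Hive catalogs are not flagged.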

@ueshin
Contributor Author

ueshin commented May 26, 2022

This is based on #94.

@ueshin ueshin marked this pull request as ready for review June 9, 2022 21:22
Comment thread dbt/adapters/databricks/impl.py Outdated
current_catalog: Optional[str] = None
try:
    if catalog is not None:
        current_catalog = self.execute_macro(CURRENT_CATALOG_MACRO_NAME)[0][0]
Collaborator

Just curious: is the result of current_catalog cached somewhere?

Contributor Author

No, it's not cached.
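
For readers following this thread, the save-and-restore pattern under discussion could be sketched roughly as below. This is an illustration under stated assumptions, not the code in `impl.py`: `FakeSession` and `switched_catalog` are hypothetical names, and the session only records the SQL it would issue (`SELECT current_catalog()` and `USE CATALOG ...`) instead of talking to a cluster. As in the answer above, the current catalog is looked up on every entry and is not cached.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional


class FakeSession:
    """Hypothetical stand-in for a connection; records the statements it would run."""

    def __init__(self, catalog: str = "hive_metastore") -> None:
        self.catalog = catalog
        self.statements: List[str] = []

    def current_catalog(self) -> str:
        self.statements.append("SELECT current_catalog()")
        return self.catalog

    def use_catalog(self, catalog: str) -> None:
        self.statements.append(f"USE CATALOG {catalog}")
        self.catalog = catalog


@contextmanager
def switched_catalog(session: FakeSession, catalog: Optional[str]) -> Iterator[None]:
    """Temporarily switch to `catalog`, restoring the previous one on exit."""
    previous: Optional[str] = None
    try:
        if catalog is not None:
            # Queried on each entry; nothing is cached between calls.
            previous = session.current_catalog()
            session.use_catalog(catalog)
        yield
    finally:
        # Restore only if we actually switched.
        if previous is not None:
            session.use_catalog(previous)
```

Passing `None` makes the context manager a no-op, so callers that have no catalog configured issue no extra statements.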

Collaborator

@allisonwang-db allisonwang-db left a comment

Looks good! Added a few comments.

Comment thread dbt/adapters/databricks/impl.py Outdated
yield as_dict

@contextmanager
def _catalog(self, catalog: Optional[str]) -> Iterator[None]:
Collaborator

nit: maybe we can add a comment here?

Comment thread dbt/include/databricks/macros/catalog.sql Outdated


class TestMultiCatalogTableModels(TestMultiCatalog):
    @use_profile("databricks_uc_cluster")
Collaborator

How would the test behave if we run it against a non-uc cluster?

Contributor Author

It will just be skipped.
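
As a rough illustration of profile-gated skipping in general, the sketch below skips a test unless a named profile is active. It does not show how `use_profile` is implemented in this repository: the decorator `requires_profile` and the environment variable `SKETCH_TEST_PROFILE` are hypothetical and exist only for this example.

```python
import functools
import os
import unittest
from typing import Any, Callable


def requires_profile(profile: str) -> Callable:
    """Hypothetical decorator: skip the test unless `profile` is the active one."""

    def decorator(test_func: Callable) -> Callable:
        @functools.wraps(test_func)
        def wrapper(self: unittest.TestCase, *args: Any, **kwargs: Any) -> Any:
            # The active profile is read at run time from a made-up variable.
            active = os.environ.get("SKETCH_TEST_PROFILE")
            if active != profile:
                raise unittest.SkipTest(f"profile {profile!r} is not active")
            return test_func(self, *args, **kwargs)

        return wrapper

    return decorator


class ExampleMultiCatalogTest(unittest.TestCase):
    @requires_profile("databricks_uc_cluster")
    def test_runs_only_on_uc_cluster(self) -> None:
        self.assertTrue(True)
```

With `SKETCH_TEST_PROFILE` unset, the test is reported as skipped, not failed; with it set to `databricks_uc_cluster`, the test body runs.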

ueshin and others added 2 commits June 13, 2022 22:13
Co-authored-by: allisonwang-db <allison.wang@databricks.com>
@ueshin ueshin requested a review from allisonwang-db June 14, 2022 18:34
Collaborator

@allisonwang-db allisonwang-db left a comment

Looks good!

@ueshin
Contributor Author

ueshin commented Jun 14, 2022

Thanks! merging.

@ueshin ueshin merged commit 0c48ecb into databricks:main Jun 14, 2022
@ueshin ueshin deleted the multi-catalog branch June 14, 2022 23:46
ueshin added a commit to ueshin/dbt-databricks that referenced this pull request Jun 14, 2022
ueshin added a commit that referenced this pull request Jun 15, 2022
Development

Successfully merging this pull request may close these issues.

Support for Databricks CATALOG as a DATABASE in DBT compilations

2 participants