Implementing metadata freshness checks#481
Conversation
|
BTW, there may be additional tests required as the guide calls out two more functional tests, but since those tests had a lot of snowflake specific comments in them (and so far are only in the snowflake adapter), I'm waiting until they're part of core before implementing here. I wanted to get this code up for people to look at while I'm away. |
| def get_relation_type(cls) -> Type[DatabricksRelationType]: | ||
| return DatabricksRelationType | ||
|
|
||
| def information_schema(self, view_name=None) -> InformationSchema: |
There was a problem hiding this comment.
This is mostly copied from core, but I had to override so that it would use an information schema that used "`" for quoting, since InformationSchema inherits from BaseRelation.
| project.adapter.drop_schema(relation) | ||
|
|
||
| def test_get_relation_last_modified(self, project, set_env_vars, custom_schema): | ||
| project.run_sql( |
There was a problem hiding this comment.
This test is currently hand copied from dbt-snowflake. Nothing I do in the test is specific to databricks, so hopefully it gets pulled into core.
There was a problem hiding this comment.
Essentially the test just says, were we able to run the metadata commands and not warn/error. We definitely need another test that asserts that something useful happens. Will add after I'm back.
There was a problem hiding this comment.
Updated the test to ensure the freshness tests passed, as opposed to not having a warning or error. @dataders might be worth considering this form for core; to me at least its more obvious what the test is doing (as opposed to the probe function). Also, this file did not match the format I found in the documentation for sources.json, btw.
|
@benc-db you da real MVP for posting this. I'll definitely flag this for the adapters eng team to look at once we're back from Coalesce |
susodapop
left a comment
There was a problem hiding this comment.
LGTM with a few comments. Agree we need a test of the true output of the freshness check. But the smoke test is a good start. Nice work
| {Capability.TableLastModifiedMetadata: CapabilitySupport(support=Support.Full)} | ||
| ) | ||
|
|
||
| @available.parse(lambda *a, **k: 0) |
There was a problem hiding this comment.
Is the first time our dialect has needed to use _capabilities? Or had we just inherited the _capabilities from dbt-spark? Not a blocker for this PR but I wonder if more capabilities should be reflected explicitly here.
There was a problem hiding this comment.
This is new in 1.7
| quote_character: str = "`" | ||
|
|
||
| def is_hive_metastore(self): | ||
| return self.database is None or self.database == "hive_metastore" |
There was a problem hiding this comment.
Just confirming: does dbt's database == Databricks' schema?
There was a problem hiding this comment.
database == catalog.
| {%- for relation in relations -%} | ||
| select '{{ relation.schema }}' as schema, | ||
| '{{ relation.identifier }}' as identifier, | ||
| max(timestamp) as last_modified, |
There was a problem hiding this comment.
just confirming: the output of a Databricks SQL query that includes a max() aggregation is never more than a single row, yes? That's why there is no GROUP BY present here?
There was a problem hiding this comment.
with the absence of group by, it will be a single row.
Co-authored-by: Jesse <jesse.whitehouse@databricks.com>
Description
Addresses the 'metadata freshness changes' part of dbt-labs/dbt-core#8307
Doing this in a way that can work for hive and unity was interesting. I'm a little concerned about the accuracy/perf around the hive implementation, but its the best I can do, and I don't have a great way to insert a check for hive metastore earlier in the call chain.
@dataders any concern that if someone has a view as a source (as is the case in my bigger transform jobs), then this mechanism doesn't work? I didn't see if there was any logic that only called down to this path if the relation is a table. Both using information schema in this way and describe history are table-only features. I mean we could look at information_schema.views, but last altered doesn't give the same sort of information for views as it does for tables.
Checklist
CHANGELOG.mdand added information about my change to the "dbt-databricks next" section.