Adding tables to Find MoJ data from Create a Derived Table (CaDeT)
Find MoJ data uses the Create a Derived Table (CaDeT) service as a source of metadata about the Analytical Platform. CaDeT uses dbt, a Python package.
By default, all models and sources are ingested into the Datahub catalogue, but they are not shown in the Find MoJ data service.
Make a model or source visible
To make a model or source visible in Find MoJ data, add the dc_display_in_catalogue tag to it. Configuration of CaDeT models is described in the CaDeT documentation.
For example, in dbt_project.yml you can include:
models:
  courts:
    some_subdirectory:
      common_platform_derived:
        +tags:
          - dc_display_in_catalogue
For sources, include the tag in the properties file (models/sources/xyz.yml) instead:
sources:
  - description: "..."
    meta:
      # ...
    name: "..."
    tags:
      - dc_display_in_catalogue
    tables:
      # ...
This tag should be used for sources and derived tables that users are expected to work with directly. Don’t add it to intermediate/staging tables.
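To check which entries will be displayed, you can express the rule above as a filter over dbt's compiled metadata. The sketch below assumes the project has already been parsed into a dict shaped like dbt's manifest.json, with top-level `nodes` and `sources` mappings whose entries carry `resource_type` and `tags`. The function `catalogue_visible` and the example entries are hypothetical. They illustrate the tagging rule and do not describe how Find MoJ data performs ingestion.

```python
DISPLAY_TAG = "dc_display_in_catalogue"


def catalogue_visible(manifest: dict) -> list[str]:
    """Return the unique IDs of models and sources that carry the display tag.

    Assumes a manifest.json-like dict (hypothetical helper, for illustration).
    """
    visible = []
    for section in ("nodes", "sources"):
        for unique_id, entry in manifest.get(section, {}).items():
            # Only models and sources are candidates; tests, seeds etc. are skipped.
            if entry.get("resource_type") not in ("model", "source"):
                continue
            if DISPLAY_TAG in entry.get("tags", []):
                visible.append(unique_id)
    return sorted(visible)


# Hypothetical, hand-written manifest fragment: one tagged model, one untagged
# staging model and one tagged source table.
example_manifest = {
    "nodes": {
        "model.mojap_derived_tables.common_platform_derived": {
            "resource_type": "model",
            "tags": ["dc_display_in_catalogue"],
        },
        "model.mojap_derived_tables.stg_hearings": {
            "resource_type": "model",
            "tags": [],
        },
    },
    "sources": {
        "source.mojap_derived_tables.alpha_vcms_data.cases": {
            "resource_type": "source",
            "tags": ["dc_display_in_catalogue"],
        },
    },
}

print(catalogue_visible(example_manifest))
```

The untagged staging model is left out. This matches the advice not to tag intermediate tables.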
Set required metadata
When adding new entities to the catalogue, you must also specify some additional metadata in dbt. For example, in dbt_project.yml:
models:
  courts:
    +meta:
      dc_slack_channel_name: "#ask-data-engineering"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C8X3PP1TN
      dc_owner: Joe.Bloggs
For sources, add the additional metadata under meta in the properties file:
sources:
  - description: "..."
    meta:
      location: ""
      number_of_tables: 42
      source_file_last_updated: "..."
      dc_slack_channel_name: "#ask-data-engineering"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C8X3PP1TN
      dc_owner: Joe.Bloggs
This metadata can be set at domain level, where it applies to all tables in that domain, or individually for each table.
The required fields are as follows:
field name | description | example |
---|---|---|
dc_slack_channel_name | The name of a Slack channel to be used as a contact point for users of the catalogue service, including the leading ‘#’. Note: this is not the same as the owner channel for notifications. | #data-engineering |
dc_slack_channel_url | The URL of the Slack channel | https://moj.enterprise.slack.com/archives/C8X3PP1TN |
dc_owner | The Datahub user ID of the data owner, usually in the form FirstName.LastName. This is the senior individual accountable for the data, not a data custodian. This is not the same as the dbt owner. | Joe.Bloggs |
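You can check the requirements in this table before opening a pull request. The following is a minimal sketch. It assumes the `meta` block has already been loaded into a Python dict. The helper `metadata_problems` and its `slack.com` host check are assumptions, and they may be stricter or looser than anything Find MoJ data actually checks. The sketch also catches a common YAML pitfall. An unquoted value such as `dc_slack_channel_name: #ask-data-engineering` is read as a comment, so the field silently ends up empty (`None`).

```python
from urllib.parse import urlparse

REQUIRED_FIELDS = ("dc_slack_channel_name", "dc_slack_channel_url", "dc_owner")


def metadata_problems(meta: dict) -> list[str]:
    """Return human-readable problems with the required catalogue metadata."""
    problems = []
    for field in REQUIRED_FIELDS:
        # None covers both absent keys and YAML values lost to an unquoted '#'.
        if not meta.get(field):
            problems.append(f"{field} is missing or empty")

    name = meta.get("dc_slack_channel_name")
    if name and not str(name).startswith("#"):
        problems.append("dc_slack_channel_name should include the leading '#'")

    url = meta.get("dc_slack_channel_url")
    if url:
        parsed = urlparse(str(url))
        # Assumption: channel links are https URLs on a slack.com host.
        if parsed.scheme != "https" or not parsed.netloc.endswith("slack.com"):
            problems.append("dc_slack_channel_url does not look like a Slack URL")
    return problems


good = {
    "dc_slack_channel_name": "#ask-data-engineering",
    "dc_slack_channel_url": "https://moj.enterprise.slack.com/archives/C8X3PP1TN",
    "dc_owner": "Joe.Bloggs",
}
# What an unquoted #ask-data-engineering value looks like after YAML parsing.
unquoted = dict(good, dc_slack_channel_name=None)

print(metadata_problems(good))
print(metadata_problems(unquoted))
```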
Additional metadata
field name | description | example |
---|---|---|
dc_where_to_access_dataset | An enum representing how end users can access the data, e.g. one of [“AnalyticalPlatform”, “CourtsAPI”]. For dbt, this always defaults to AnalyticalPlatform. | AnalyticalPlatform |
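Because the field is an enum with a default, a consumer only has to handle a fixed set of values. The following is a minimal sketch. It assumes the two values listed in the table are the complete set, although the real list may be longer. It uses a hypothetical `AccessLocation` class:

```python
from enum import Enum


class AccessLocation(Enum):
    """Hypothetical mirror of dc_where_to_access_dataset; values from the table above."""

    ANALYTICAL_PLATFORM = "AnalyticalPlatform"
    COURTS_API = "CourtsAPI"


def where_to_access(meta: dict) -> AccessLocation:
    """Read dc_where_to_access_dataset, falling back to the dbt default."""
    raw = meta.get("dc_where_to_access_dataset", AccessLocation.ANALYTICAL_PLATFORM.value)
    # Enum lookup by value raises ValueError for anything outside the allowed set.
    return AccessLocation(raw)


print(where_to_access({}))
print(where_to_access({"dc_where_to_access_dataset": "CourtsAPI"}))
```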
Full example dbt_project.yml file
models:
  mojap_derived_tables:
    +materialized: table
    +group: default
    +meta:
      # Metadata to send to Find MoJ data. Can be overridden
      # per domain/model/source
      dc_slack_channel_name: "#ask-data-modelling"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C03J21VFHQ9
      dc_where_to_access_dataset: AnalyticalPlatform
    bold:
      +meta:
        dc_owner: jane.doe
      +group: bold
      bold_rr_pnc_ids:
        +tags:
          - bold_daily
          - dc_display_in_catalogue
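Settings in this file cascade down the project's directory hierarchy. dbt's configuration documentation describes `meta` dictionaries as merged, so a more specific key replaces a less specific one, and `tags` as additive. The sketch below imitates that behaviour to show the metadata that `bold_rr_pnc_ids` ends up with. It illustrates the behaviour under that assumption and is not dbt's own implementation. `effective_config` is a hypothetical helper.

```python
def effective_config(models: dict, path: list[str]) -> tuple[dict, list[str]]:
    """Walk from the project root down `path`, merging +meta and collecting +tags."""
    meta: dict = {}
    tags: list[str] = []
    level = models
    for part in path:
        level = level[part]
        # More specific levels override keys set higher up.
        meta.update(level.get("+meta", {}))
        # Tags accumulate rather than replace; keep first-seen order, no duplicates.
        for tag in level.get("+tags", []):
            if tag not in tags:
                tags.append(tag)
    return meta, tags


# The dbt_project.yml example above, written as a Python dict.
models = {
    "mojap_derived_tables": {
        "+materialized": "table",
        "+group": "default",
        "+meta": {
            "dc_slack_channel_name": "#ask-data-modelling",
            "dc_slack_channel_url": "https://moj.enterprise.slack.com/archives/C03J21VFHQ9",
            "dc_where_to_access_dataset": "AnalyticalPlatform",
        },
        "bold": {
            "+meta": {"dc_owner": "jane.doe"},
            "+group": "bold",
            "bold_rr_pnc_ids": {
                "+tags": ["bold_daily", "dc_display_in_catalogue"],
            },
        },
    },
}

meta, tags = effective_config(models, ["mojap_derived_tables", "bold", "bold_rr_pnc_ids"])
print(meta)
print(tags)
```

Under this reading, everything under `bold` inherits the Slack channel from the project level and picks up `dc_owner` from the domain level. Only `bold_rr_pnc_ids` is tagged for display.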
Full example properties file
sources:
  - description: ""
    meta:
      location: ""
      number_of_tables: 62
      source_file_last_updated: "2024-09-15"
      # Metadata to send to Find MoJ data. Can be overridden
      # per domain/model/source
      dc_slack_channel_name: "#ask-data-engineering"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C8X3PP1TN
      dc_owner: Joe.Bloggs
    name: alpha_vcms_data
    tags:
      - dc_display_in_catalogue
Ensure the data owner has an account in Datahub
The owner’s Datahub account must exist before you set dc_owner. An account is created automatically the first time they log into Datahub.
The user ID is visible in the URL of the user’s page in Datahub. For example, the user ID in this URL is Joe.Bloggs:
https://datahub-catalogue-dev.apps.live.cloud-platform.service.justice.gov.uk/user/urn:li:corpuser:Joe.Bloggs/owner%20of
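If you are collecting IDs for several owners, you can read the ID from the URL programmatically. The following is a minimal sketch. It assumes the page URL contains a path segment of the form `urn:li:corpuser:<user ID>`, as in the example above, and that this segment may be percent-encoded. `datahub_user_id` is a hypothetical helper and is not part of any Datahub client library.

```python
from urllib.parse import unquote, urlparse

CORPUSER_PREFIX = "urn:li:corpuser:"


def datahub_user_id(user_page_url: str) -> str:
    """Extract the user ID (e.g. FirstName.LastName) from a Datahub user page URL."""
    for segment in urlparse(user_page_url).path.split("/"):
        # Decode each path segment separately, in case the URN is percent-encoded.
        decoded = unquote(segment)
        if decoded.startswith(CORPUSER_PREFIX):
            return decoded[len(CORPUSER_PREFIX):]
    raise ValueError(f"no corpuser URN found in {user_page_url!r}")


url = (
    "https://datahub-catalogue-dev.apps.live.cloud-platform.service.justice.gov.uk"
    "/user/urn:li:corpuser:Joe.Bloggs/owner%20of"
)
print(datahub_user_id(url))  # prints Joe.Bloggs
```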
Speak to the Find MoJ data team if you would like us to add a set of users manually, without them needing to log in.