Adding tables to Find MoJ data from Create a Derived Table (CaDeT)

Find MoJ data uses the Create a Derived Table service (CaDeT) as a source of metadata about the Analytical Platform. CaDeT uses a Python package called dbt.

By default, all models and sources will be ingested into the Datahub catalogue, but they will not be shown in the Find MoJ data service.

Make a model or source visible

To make a model or source visible in Find MoJ data, add the dc_display_in_catalogue tag to that model or source. Configuration of CaDeT models is described in the CaDeT documentation.

For example, in dbt_project.yml you can include:

models:
  courts:
    some_subdirectory:
      common_platform_derived:
        +tags:
          - dc_display_in_catalogue

For sources, include the tag in the properties file (models/sources/xyz.yml) instead:

sources:
  - description: "..."
    meta:
      # ...
    name: "..."
    tags:
      - dc_display_in_catalogue
    tables:
      # ...

This tag should be used for sources and derived tables that users are expected to work with directly. Don’t add it to intermediate/staging tables.

Set required metadata

When adding new entities to the catalogue, we require that you specify some additional metadata in dbt. For example:

models:
  courts:
    +meta:
      # Quote the value: an unquoted # starts a YAML comment
      dc_slack_channel_name: "#ask-data-engineering"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C8X3PP1TN
      dc_owner: Joe.Bloggs

For sources, add the additional metadata to meta in the properties file:

sources:
  - description: "..."
    meta:
      location: ""
      number_of_tables: 42
      source_file_last_updated: "..."
      # Quote the value: an unquoted # starts a YAML comment
      dc_slack_channel_name: "#ask-data-engineering"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C8X3PP1TN
      dc_owner: Joe.Bloggs

This metadata can be set at domain level, where it applies to all tables in that domain, or individually for each table.
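
As a rough sketch of how the two levels might be combined in dbt_project.yml, the fragment below sets the contact details for a whole domain and then overrides dc_owner for a single table. The courts and some_subdirectory names follow the earlier example; some_table and Jane.Doe are hypothetical placeholders, and the exact nesting depends on where the model sits in the CaDeT directory structure.

models:
  courts:
    +meta:
      # Domain level: applies to every table in the courts domain
      dc_slack_channel_name: "#ask-data-engineering"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C8X3PP1TN
      dc_owner: Joe.Bloggs
    some_subdirectory:
      some_table:  # hypothetical table name
        +meta:
          # Table level: overrides the domain-level owner for this table only
          dc_owner: Jane.Doe  # hypothetical owner

The Slack channel values are not repeated at table level here, on the assumption that the table keeps the domain-level values for any field it does not override.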

The required fields are as follows:

| Field name | Description | Example |
| --- | --- | --- |
| dc_slack_channel_name | The name of a Slack channel to be used as a contact point for users of the catalogue service, including the leading ‘#’. Note: this is not the same as the owner channel for notifications. | #data-engineering |
| dc_slack_channel_url | The URL of the Slack channel. | https://moj.enterprise.slack.com/archives/C8X3PP1TN |
| dc_owner | The Datahub user ID for the data owner, usually in the form FirstName.LastName. This is the senior individual accountable for the data, not a data custodian. This is not the same as the dbt owner. | Joe.Bloggs |

Additional metadata

| Field name | Description | Example |
| --- | --- | --- |
| dc_where_to_access_dataset | An enum representing how the data can be accessed by end users, e.g. a choice of [“AnalyticalPlatform”, “CourtsAPI”]. For dbt, this always defaults to AnalyticalPlatform. | AnalyticalPlatform |

Full example dbt_project.yml file

models:
  mojap_derived_tables:
    +materialized: table
    +group: default
    +meta:
      # Metadata to send to Find MoJ data. Can be overridden
      # per domain/model/source
      dc_slack_channel_name: "#ask-data-modelling"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C03J21VFHQ9
      dc_where_to_access_dataset: AnalyticalPlatform
    bold:
      +meta:
        dc_owner: jane.doe
      +group: bold
      bold_rr_pnc_ids:
        +tags:
          - bold_daily
          - dc_display_in_catalogue

Full example properties file

sources:
  - description: ""
    meta:
      location: ""
      number_of_tables: 62
      source_file_last_updated: "2024-09-15"
      # Metadata to send to Find MoJ data. Can be overridden
      # per domain/model/source
      dc_slack_channel_name: "#ask-data-engineering"
      dc_slack_channel_url: https://moj.enterprise.slack.com/archives/C8X3PP1TN
      dc_owner: Joe.Bloggs
    name: alpha_vcms_data
    tags:
      - dc_display_in_catalogue

Ensure the data owner has an account in Datahub

The owner’s Datahub account must exist before you set dc_owner. The account is created automatically the first time they log into Datahub.

The user ID is visible in the URL of a user page in Datahub, e.g.

https://datahub-catalogue-dev.apps.live.cloud-platform.service.justice.gov.uk/user/urn:li:corpuser:Joe.Bloggs/owner%20of
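
For the URL above, and assuming the user ID is the part that follows urn:li:corpuser: (an inference from this example, not a documented rule), the matching metadata entry would look something like:

+meta:
  # Taken from .../user/urn:li:corpuser:Joe.Bloggs/...
  dc_owner: Joe.Bloggs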

Speak to the Find MoJ data team if you would like us to manually add a set of users without them logging in.

This page was last reviewed on 29 July 2024. It needs to be reviewed again on 29 January 2025 by the page owner #data-catalogue.