Adding metadata from Airflow to Find MoJ data

AP users running data workloads on Airflow will generally register their outputs with the AWS Glue catalogue, so that they are accessible for analysis via AWS Athena. Find MoJ data can ingest metadata from AWS Glue; please follow our instructions to do so.

To define Airflow metadata in code, so that it is not stored only in the Glue catalogue, use awswrangler.s3.store_parquet_metadata to attach metadata to the Parquet files created by your Airflow job. Here’s an example of this being done.
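As a minimal sketch of what this could look like inside an Airflow task: awswrangler.s3.store_parquet_metadata infers the schema from Parquet files in S3 and registers the table in Glue, and accepts description, parameters and columns_comments arguments for the metadata. The helper names (build_table_metadata, register_output) and the "owner" parameter key are assumptions made for illustration only; they are not confirmed as what Find MoJ data reads.

```python
from typing import Dict


def build_table_metadata(
    description: str,
    owner: str,
    column_comments: Dict[str, str],
) -> Dict[str, object]:
    """Collect the metadata keyword arguments in one place (hypothetical helper)."""
    return {
        "description": description,
        # Glue table parameters are free-form string key/value pairs.
        # The "owner" key is an illustrative assumption.
        "parameters": {"owner": owner},
        "columns_comments": dict(column_comments),
    }


def register_output(
    path: str,
    database: str,
    table: str,
    metadata: Dict[str, object],
) -> None:
    """Register the Parquet output at `path` in Glue with the given metadata."""
    # Imported inside the task function so the DAG file can be parsed
    # without awswrangler being installed on the scheduler.
    import awswrangler as wr

    wr.s3.store_parquet_metadata(
        path=path,          # S3 prefix containing the Parquet files
        database=database,  # existing Glue database
        table=table,        # Glue table to create or overwrite
        dataset=True,       # treat the prefix as a (possibly partitioned) dataset
        **metadata,
    )
```

In a task, you would build the metadata once, for example build_table_metadata("Daily case counts", "my-team", {"case_id": "Unique case identifier"}), and pass the result to register_output along with your own S3 path, database and table name. All of those values are placeholders.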

This page was last reviewed on 24 October 2024. It needs to be reviewed again on 24 April 2025 by the page owner #data-catalogue.