Adding metadata from Airflow to Find MoJ data
AP users who run data workloads in Airflow will generally register their outputs in the AWS Glue catalog so that they can be queried with AWS Athena. Find MoJ data can ingest metadata directly from AWS Glue; please follow our instructions to set this up.
To define your table metadata in code, rather than only editing it in the Glue catalog, use awswrangler.s3.store_parquet_metadata. It infers the schema from the parquet files your Airflow job writes to S3 and stores it in the Glue catalog, together with any description and column comments you pass in. Here’s an example of this being done.
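The following is a minimal sketch of calling awswrangler.s3.store_parquet_metadata from an Airflow task. It assumes awswrangler is installed in the task's environment and that the task has AWS credentials with S3 and Glue access. The helper names build_metadata_kwargs and store_metadata are hypothetical; only store_parquet_metadata and its description, columns_comments and parameters arguments come from awswrangler itself.

```python
from typing import Dict, Optional


def build_metadata_kwargs(
    description: str,
    columns_comments: Dict[str, str],
    parameters: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper: collect the metadata keyword arguments accepted by
    awswrangler.s3.store_parquet_metadata, so they live in the DAG's code
    rather than being edited by hand in the Glue console."""
    kwargs: Dict[str, object] = {
        "description": description,
        # Maps column name -> comment shown against that column in Glue.
        "columns_comments": dict(columns_comments),
    }
    if parameters:
        # Glue table parameters are string-to-string pairs.
        kwargs["parameters"] = {str(k): str(v) for k, v in parameters.items()}
    return kwargs


def store_metadata(path: str, database: str, table: str, metadata: Dict[str, object]):
    """Hypothetical helper: infer the schema of the parquet files under `path`
    and write it, with `metadata`, to the Glue table `database.table`."""
    # Imported lazily so build_metadata_kwargs can be used without awswrangler.
    import awswrangler as wr

    return wr.s3.store_parquet_metadata(
        path=path,
        database=database,
        table=table,
        dataset=True,  # treat `path` as a dataset prefix rather than a single file
        **metadata,
    )
```

In a DAG you might call build_metadata_kwargs and then store_metadata from a task that runs after the parquet output has been written, passing your own S3 prefix, database and table names. Keeping the description and column comments in the pipeline's repository means they are version-controlled with the code and re-applied to Glue each time the task runs.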
This page was last reviewed on 24 October 2024. It needs to be reviewed again on 24 April 2025 by the page owner #data-catalogue.