T370851-T376721 Use external task sensor and update paths

Andrew McAllister (WMDE) requested to merge T370851-T376721-wmde-dag-fixes into main

Contributor checklist

Note: Tests were run for the previous MRs and do not need to be rerun, as there are no changes to the job queries.

  • I have tested the included DAGs in my local database using the process outlined in TEST_AIRFLOW_DAGS.md and the test variable files provided for each DAG

    • Box is checked for tests for T370851

Note: T376721 only changes some file paths, so going to production is OK: in the worst case, the HQL file that generates the CSV isn't accessed and thus no data is exported.

  • All Hive tables that are needed by the included DAG jobs have been created and are accessible by the analytics-wmde Airflow user
    • All tables were created for prior MRs

Description

This MR contains work for two different tasks:

  • T370851: The wd_query_segments_daily DAG was found not to pick up data for every hour within the partitioned database, because the Wikidata Query Service may receive no requests in a given hour. We thus need to switch the base sensor over to an ExternalTaskSensor.
  • T376721: Some of the identifiers for the previously deployed wd_rest_api_metrics_daily and wd_item_sitelink_segments_weekly DAG jobs need to be standardized. Specifically, TASK_ID_gen_csv_DAG_INTERVAL and create_DAG_ID_table should be renamed to gen_csv_DAG_ID and create_table_DAG_ID respectively, to ensure that the DAG ID is always unbroken and that the names of jobs outside the main DAG job query are always prepended to the DAG ID.
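The T376721 renaming can be sketched as follows. This is a minimal illustration, not code from this MR: the helper names gen_csv_identifier and create_table_identifier are hypothetical, and it assumes the new identifiers are formed by prepending a fixed prefix to the full DAG ID.

```python
# Hypothetical helpers illustrating the naming convention from T376721.
# They are not part of the repository; they only show how the new
# identifiers keep the DAG ID intact as a suffix.


def gen_csv_identifier(dag_id: str) -> str:
    """Return the CSV-generation identifier: gen_csv_DAG_ID."""
    return f"gen_csv_{dag_id}"


def create_table_identifier(dag_id: str) -> str:
    """Return the table-creation identifier: create_table_DAG_ID."""
    return f"create_table_{dag_id}"


if __name__ == "__main__":
    # The two DAGs named in this MR.
    for dag_id in ("wd_rest_api_metrics_daily", "wd_item_sitelink_segments_weekly"):
        print(gen_csv_identifier(dag_id))
        print(create_table_identifier(dag_id))
```

Because the DAG ID always appears whole at the end of the identifier, searching for a DAG ID finds every job that belongs to it.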

Test outputs

Jobs are not changed, so testing isn't required.

Edited by Andrew McAllister (WMDE)
