Bump up Spark config for anomaly detection DAGs
Some jobs have hit OOM errors and failed over the last few days. 2 of the 3 anomaly detection DAGs currently running query 1 full day of pageview_hourly, which probably needs a bit more than the default Spark config. This change bumps up the resources for all anomaly detection DAGs.
This is not a definitive fix, since each anomaly detection DAG should be able to specify its own Spark config.
However, I argue in favor of tackling that when we refactor the anomaly detection DAGs to use datasets.yaml, get_easy_dag and DagProperties. When we do that, I'd recommend getting rid of the DAG template and having each DAG specify all of its operators.
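
For reference, a minimal sketch of what per-DAG Spark config could look like after that refactor. The names DEFAULT_SPARK_CONF and spark_conf_for are hypothetical and the memory values are illustrative only; none of them come from this change or from the existing DAG template.

```python
# Hypothetical shared defaults for all anomaly detection DAGs.
# The values are illustrative, not the ones set in this change.
DEFAULT_SPARK_CONF = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "8g",
    "spark.executor.cores": "2",
}


def spark_conf_for(dag_overrides=None):
    """Return a copy of the shared defaults with per-DAG overrides applied on top."""
    conf = dict(DEFAULT_SPARK_CONF)
    conf.update(dag_overrides or {})
    return conf


# A DAG that scans a full day of pageview_hourly could request more executor memory,
# while a lighter DAG keeps the shared defaults.
heavy_conf = spark_conf_for({"spark.executor.memory": "16g"})
light_conf = spark_conf_for()
```

With something like this, the heavy DAGs would get more resources without raising them for every anomaly detection DAG.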