Refine staging: Add more parallelization
Currently, the process is too slow: running the daily DAG in production takes almost an hour, even with only 2 task retries and a 15-minute task timeout.
This patch parameterizes max_active_tis_per_dag, which caps the number of active task instances (TIs) of a task across all runs of a DAG. It should be set higher than max_active_tasks, the limit that applies within a single DAG run.
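
A minimal sketch of how such a parameter could be wired in, not the actual patch. The environment variable name, the helper staging_task_kwargs, and the default of twice max_active_tasks are assumptions made for illustration; mapping the 15-minute task timeout to execution_timeout is also an assumption. The returned dict is meant to be passed as keyword arguments to an Airflow operator.

```python
import os
from datetime import timedelta

# Hypothetical variable name; the patch may read the value from elsewhere.
ENV_VAR = "STAGING_MAX_ACTIVE_TIS_PER_DAG"


def staging_task_kwargs(max_active_tasks, env=None):
    """Build operator keyword arguments for the staging tasks.

    max_active_tasks is the limit configured for a single DAG run.
    """
    env = os.environ if env is None else env
    # Assumed default: twice the per-run limit when the variable is unset.
    limit = int(env.get(ENV_VAR, 2 * max_active_tasks))
    # Enforce the rule stated above: the cross-run cap should exceed
    # the single-run limit.
    if limit <= max_active_tasks:
        raise ValueError(
            f"{ENV_VAR}={limit} should exceed max_active_tasks={max_active_tasks}"
        )
    return {
        "max_active_tis_per_dag": limit,
        "retries": 2,
        "execution_timeout": timedelta(minutes=15),
    }
```

Usage would look like SomeOperator(task_id="stage", **staging_task_kwargs(8)), where SomeOperator stands in for whichever operator the staging tasks use.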