search: process_sparql_query workaround oom issues (take 2) (!791)

This job has been failing more and more often. Make a few changes to try to get it running more consistently:

Increase output partitions to 4. A review of a few days of data shows typical outputs of 1.5-3GB. Split the output into 4 partitions so we don't try to do all of that work in one place.
Add some memory overhead. This was previously at the default of 10%, or ~1.6GB. Increase it to 3GB, since YARN is killing executions that overrun their memory limit.
Disable adaptive query execution (AQE). Not entirely sure this is necessary, but in testing from a Jupyter notebook the job would regularly fail with AQE enabled and pass with AQE disabled. This job is very simple and shouldn't need anything fancy from AQE.
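
For reference, the three changes above could be expressed roughly as follows. This is a minimal sketch, not the code in this merge request: the helper names, `SPARK_CONF`, and `OUTPUT_PARTITIONS` are hypothetical, and the 16GB executor memory is only inferred from the "10%, or ~1.6GB" figure. The two property names (`spark.executor.memoryOverhead`, `spark.sql.adaptive.enabled`) are standard Spark settings.

```python
# Hypothetical sketch; names are illustrative and not taken from the DAG.

# Spark's default executor memory overhead is 10% of executor memory,
# with a floor of 384 MiB.
MIN_OVERHEAD_MB = 384
DEFAULT_OVERHEAD_FACTOR = 0.10


def default_memory_overhead_mb(executor_memory_mb: int) -> int:
    """Overhead Spark would request when memoryOverhead is left unset."""
    return max(MIN_OVERHEAD_MB, int(executor_memory_mb * DEFAULT_OVERHEAD_FACTOR))


def spark_submit_conf_args(conf: dict[str, str]) -> list[str]:
    """Flatten a conf mapping into `--conf key=value` arguments for spark-submit."""
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Settings described above: a fixed 3GB overhead and AQE turned off.
SPARK_CONF = {
    "spark.executor.memoryOverhead": "3g",
    "spark.sql.adaptive.enabled": "false",
}

# Number of output partitions; how the job applies it is an assumption here.
OUTPUT_PARTITIONS = 4

if __name__ == "__main__":
    # Assumed 16GB executors: the old default overhead is about 1.6GB.
    print(default_memory_overhead_mb(16 * 1024), "MB default overhead")
    print(" ".join(spark_submit_conf_args(SPARK_CONF)))
```

In a PySpark job the partition count would typically be applied with something like `df.repartition(OUTPUT_PARTITIONS)` before the write, though whether this job uses `repartition` or another mechanism is not stated here.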
