search: process_sparql_query workaround oom issues (take 2)

Ebernhardson requested to merge work/ebernhardson/process-sparql-query-hourly into main

This job has been failing more and more often. Make a couple of changes to try to get things running more consistently:

  • Increase output partitions to 4. A review of a few days of data shows typical outputs of 1.5-3GB. Split the output into 4 partitions so we don't try to do as much work all in the same place.

  • Increase the memory overhead. This was previously at the default of 10%, or ~1.6GB. Increase it to 3GB, since YARN is killing executions that overrun their memory allocation.

  • Disable adaptive query execution (AQE). Not entirely sure that this is necessary, but in testing from a Jupyter notebook the job would regularly fail with AQE enabled and pass with it disabled. This job is very simple and shouldn't need anything fancy from AQE.
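
For reference, the three changes above might translate into Spark settings roughly as follows. This is a sketch, not the actual diff: the property names `spark.executor.memoryOverhead` and `spark.sql.adaptive.enabled` are standard Spark settings, but the helper names and the 16GB executor size (inferred from "10% ... ~1.6GB") are assumptions, and the job may set these values differently.

```python
# Sketch only: the merge request description does not show the job's real
# configuration. Helper names and the 16 GiB executor size are assumptions.

# From the description: write the output as 4 partitions, which in PySpark
# would typically be done with something like df.repartition(OUTPUT_PARTITIONS).
OUTPUT_PARTITIONS = 4

# Standard Spark properties matching the second and third bullets.
SPARK_CONF = {
    "spark.executor.memoryOverhead": "3g",   # raised from the 10% default
    "spark.sql.adaptive.enabled": "false",   # disable AQE
}


def default_overhead_mib(executor_memory_mib, factor=0.10, minimum_mib=384):
    """Spark's default overhead: a fraction of executor memory, with a 384 MiB floor."""
    return max(int(executor_memory_mib * factor), minimum_mib)


def to_submit_args(conf):
    """Render a conf mapping as spark-submit style `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Assumed 16 GiB executor, which is consistent with the ~1.6GB default above.
    print(default_overhead_mib(16 * 1024), "MiB default overhead")
    print(" ".join(to_submit_args(SPARK_CONF)))
```

With a 16 GiB executor the default overhead works out to roughly 1.6GB, so the explicit 3GB setting nearly doubles the headroom YARN allows beyond the JVM heap.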
