Fine-tune Dumps 2.0 backfill and event ingestion.

A couple of spark.sql.shuffle.partitions changes to make Dumps 2.0 backfill
and event ingestion more efficient:

- When doing the backfill MERGE, spark.sql.shuffle.partitions=5120 creates
  far fewer files while still generating enough tasks to keep Spark busy.
- When doing the event MERGE, spark.sql.shuffle.partitions=64 also creates
  far fewer files per hour, and it doesn't affect performance much.
- We also introduce a helper function,
  util.dict_add_or_append_string_value(), to update appendable
  configurations.

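
For illustration only, a helper of this kind might look roughly like the sketch below. The signature, the separator argument, and the example conf dicts are assumptions made for this sketch, not the actual code in util; only the two spark.sql.shuffle.partitions values come from this change.

```python
def dict_add_or_append_string_value(dictionary, key, value, separator=" "):
    """Set dictionary[key] to value, or append value to the existing string.

    Hypothetical signature: the real helper in util may differ.
    """
    existing = dictionary.get(key)
    if existing:
        # Key already holds a non-empty string: append rather than overwrite.
        dictionary[key] = f"{existing}{separator}{value}"
    else:
        # Key is missing (or empty): just set it.
        dictionary[key] = value
    return dictionary


# Illustrative Spark conf dicts. The JVM options are made-up examples of an
# "appendable" setting (a space-separated string of flags).
backfill_conf = {"spark.sql.shuffle.partitions": "5120"}
event_conf = {
    "spark.sql.shuffle.partitions": "64",
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC",
}

# Adds the key for backfill_conf, appends to the existing value for event_conf.
dict_add_or_append_string_value(
    backfill_conf, "spark.driver.extraJavaOptions", "-Dexample.flag=true"
)
dict_add_or_append_string_value(
    event_conf, "spark.driver.extraJavaOptions", "-Dexample.flag=true"
)
```

The point of appending instead of assigning is that settings such as spark.driver.extraJavaOptions accumulate flags from several places, so a plain overwrite would silently drop whatever was configured earlier.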
Bug: T340863