Use new mediawiki-jdbc datasource
Get an optimized, partitioned DataFrame by passing in only the table name, a partition column, and a boundary query.
Tested by pulling 22 million records from enwiki.revision, which completed quickly.
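
For reviewers unfamiliar with range-partitioned JDBC reads, here is a minimal sketch of the general idea, not the mediawiki-jdbc implementation itself. It assumes the boundary query returns the minimum and maximum of an integer partition column (for example, something like `SELECT MIN(rev_id), MAX(rev_id) FROM revision`) and splits that range into one WHERE clause per partition. The function name `partition_predicates` and its parameters are hypothetical.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Split the inclusive range [lower, upper] into contiguous half-open
    ranges and return one SQL predicate string per partition.

    Hypothetical helper for illustration only; the real datasource may
    compute its partitions differently.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")

    span = upper - lower + 1
    # Never create more partitions than there are distinct values.
    n = min(num_partitions, span)

    # Integer arithmetic keeps the boundaries exact and evenly spread.
    bounds = [lower + (span * i) // n for i in range(n + 1)]

    return [
        f"{column} >= {lo} AND {column} < {hi}"
        for lo, hi in zip(bounds, bounds[1:])
    ]


if __name__ == "__main__":
    # Boundaries here are made up; in practice they would come from
    # the boundary query.
    for predicate in partition_predicates("rev_id", 1, 100, 4):
        print(predicate)
```

Each predicate can then be read by a separate task, so no single query has to scan the whole table. Rows with a NULL partition column would not match any predicate in this sketch.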
Bug: T372677