Packaging ImageMatching as a Python wheel
This MR refactors ImageMatching to make it more compliant with Python’s packaging tooling and practices (setuptools).
The problem I’m trying to solve is drawing a clear boundary between upstream (research) code and how we use it downstream in product features.
How to test
We can install algorunner in a virtual environment (e.g. on the stats machines) and launch the PySpark/papermill job with:
$ (venv) pip install algorunner --extra-index-url https://gitlab.wikimedia.org/api/v4/projects/40/packages/pypi/simple
$ algorunner.py 2021-07-26 hywiki Output
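Before launching the job, it can help to confirm that the install put things where we expect. The snippet below is a minimal sketch, not part of this MR: `check_install` is a hypothetical helper, and it assumes the wheel exposes an importable `ima` package and an `algorunner.py` script on PATH, as described in this MR.

```python
import importlib.util
import shutil
from typing import Dict


def check_install(package: str = "ima", script: str = "algorunner.py") -> Dict[str, bool]:
    """Report whether `package` is importable and `script` is found on PATH.

    Both default names are taken from this MR's description; the helper
    itself is hypothetical and only meant for a quick manual check.
    """
    return {
        # find_spec returns None when a top-level package cannot be located.
        "package_importable": importlib.util.find_spec(package) is not None,
        # shutil.which returns None when no executable with that name is on PATH.
        "script_on_path": shutil.which(script) is not None,
    }
```

Running `print(check_install())` inside the activated venv should report both values as true if the wheel installed as intended.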
Changes
- The ImageMatching repo now contains only notebooks, an nbconverted script, and papermill runners.
- I moved all ETL and test infra to the platform-airflow-dags repo.
- I created an ima package. Right now it contains only notebooks; in the future it could host a library that the notebooks can import.
- setuptools (setup.py) is configured to package notebooks and scripts in a wheel and install them into the environment's site-packages and bin directories (e.g. ./venv/lib/python3.7/site-packages/ima and ./venv/bin/). Scripts are also added to PATH.
- CI builds and deploys the wheel to the GitLab PyPI package registry (https://gitlab.wikimedia.org/gmodena/ImageMatching/-/pipelines/670).
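For readers unfamiliar with how setuptools ships non-Python files, the packaging described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual setup.py from this MR: the version string and the `notebooks/*.ipynb` glob are placeholders, and `build_setup_kwargs` and `main` are hypothetical names. Only the names `algorunner`, `ima`, and `algorunner.py` come from the description.

```python
from typing import Any, Dict


def build_setup_kwargs() -> Dict[str, Any]:
    """Return keyword arguments for setuptools.setup() (illustrative values)."""
    return {
        "name": "algorunner",  # distribution name used with `pip install`
        "version": "0.0.1",    # assumption: placeholder version
        "packages": ["ima"],   # ends up under site-packages/ima
        # Assumption: notebooks live in ima/notebooks/. package_data makes
        # setuptools include these non-Python files in the wheel.
        "package_data": {"ima": ["notebooks/*.ipynb"]},
        # Files listed in `scripts` are copied to the environment's bin/
        # directory, which is on PATH once the venv is activated.
        "scripts": ["algorunner.py"],
    }


def main() -> None:
    # Imported lazily so the configuration can be inspected without setuptools.
    from setuptools import setup

    setup(**build_setup_kwargs())
```

In a real setup.py, `main()` would be called at the bottom of the file. Keeping the arguments in a plain function is only a convenience for inspecting the configuration here.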