Prepare for Airflow deployment
Highlights:
- fuse scored topics with null ones
- test final output
- add pipeline docstring with a link to the expected output schema
- no explicit output partitioning: it worsens performance and isn't needed, since downstream jobs are expected to read the full dataset every time
- make I/O HDFS paths relative to a working directory
- optionally pass the working directory to the CLI, defaulting to
  section_topics relative to the current user home
- update CLI
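
For illustration only, a minimal sketch of how the working-directory option could look. The flag name `--work-dir`, the helper names, and the `output` file name are hypothetical and not taken from the actual pipeline; only the `section_topics` default comes from the notes above. It assumes relative HDFS paths are resolved by HDFS against the current user's home directory, so the default is left relative here.

```python
import argparse
import posixpath

# Default working directory named in the commit notes. It is kept relative,
# on the assumption that HDFS resolves it against the user's home directory.
DEFAULT_WORK_DIR = "section_topics"


def parse_args(argv=None):
    """Parse the (hypothetical) CLI arguments of the pipeline."""
    parser = argparse.ArgumentParser(description="Sketch of the pipeline CLI")
    parser.add_argument(
        "--work-dir",  # hypothetical flag name
        default=DEFAULT_WORK_DIR,
        help="HDFS working directory; all I/O paths are relative to it",
    )
    return parser.parse_args(argv)


def io_path(work_dir, name):
    """Build an I/O path relative to the working directory (POSIX-style)."""
    return posixpath.join(work_dir, name)


if __name__ == "__main__":
    args = parse_args()
    print(io_path(args.work_dir, "output"))  # "output" is a placeholder name
```

Keeping the default relative means the same command works for any user, for example the account an Airflow task runs under, without hard-coding a home directory.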