2 Working with a Project
Attention
We assume you have already cut a project by following the instructions, and you are in the project directory, /{ base_folder }/{{ cookiecutter.repo_slug }}.
2.1 Overview
A project cut from cookiecutter-ds-docker consists of a docker-compose stack with the services below:
- A customized Jupyter service with a starter Python package installed. It runs on Python 3.7.
- An mlflow tracking server to log experiments.
- A postgresql database, which stores the mlflow tracking information.
We mount several folders from the host into these services:
- The project base folder, ./, is mounted on the Jupyter docker container so that all modifications are synchronized immediately.
- The folder ./data/artifacts, where the artifacts logged by mlflow are stored by default, is mounted on both the Jupyter and mlflow services.
- The postgresql data folder, /var/lib/postgresql/data inside the container, is mounted locally on ./data/db/ to keep the database intact after stopping the stack.
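The mounts above could be sketched in docker-compose form roughly as follows. This is an illustrative fragment, not the project's actual docker-compose.yml; the container-side paths for the Jupyter and mlflow services are assumptions.

```yaml
# Illustrative sketch of the bind mounts described above; the cut
# project's docker-compose.yml is the authoritative definition.
services:
  jupyter:
    volumes:
      - ./:/home/jovyan/work                 # whole project, synced live (path assumed)
      - ./data/artifacts:/mlflow/artifacts   # shared artifact store (path assumed)
  mlflow:
    volumes:
      - ./data/artifacts:/mlflow/artifacts   # same host folder as the Jupyter service
  postgres:
    volumes:
      - ./data/db:/var/lib/postgresql/data   # keeps the database across restarts
```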
Note
The project also includes these supplementary, standalone Docker images:
- for building Sphinx documentation (See Documentation)
- for testing Python code (See Python Tests)
2.1.1 Makefile
Makefile commands are used extensively to interact with the project. For a list of available commands, run the following in the terminal:
make help
2.1.2 Python development
The project comes with a Python starter package called {{ cookiecutter.package_name }}, which is located at ./src/. The package is pip installed into the Jupyter docker service in editable mode while the Docker stack is being built.
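As a sketch of what such an editable install looks like during the image build; the Dockerfile paths below are assumptions, not the project's actual Dockerfile:

```dockerfile
# Hypothetical excerpt from the Jupyter service's Dockerfile:
# install the starter package in editable mode, so that changes
# under ./src are picked up without reinstalling.
COPY ./src /home/jovyan/work/src
RUN pip install -e /home/jovyan/work/src
```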
2.2 Setup
If you want to build the stack from the cut project without starting it, run:
make build
The above command will build these images:
| Service | Image name |
|---|---|
| jupyter | {{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}/jupyter:0.1.0 |
| mlflow | {{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}/mlflow:0.1.0 |
| postgres | {{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}/postgres:0.1.0 |
Note
The version tag of the docker images in a new project starts from 0.1.0, which is read from the VERSION variable in the .env file.
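The wiring between the .env file and the image tags might look roughly like this; the fragment is illustrative, with only the VERSION variable taken from the text above:

```yaml
# In .env (a dotenv file read by docker-compose):
#   VERSION=0.1.0
# docker-compose.yml can then interpolate it into each image tag, e.g.:
services:
  jupyter:
    image: "{{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}/jupyter:${VERSION}"
```

Bumping VERSION in .env and rebuilding would then retag all three images at once.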
If you need to make a clean start:
make clean-all
2.3 Running the Docker Stack
To build and run the Docker stack in a cut project, run:
make
For convenience, the above command stops any running stack, cleans, (re)builds, and starts the services.
Note
Accessing Jupyter UI
Once the stack is up and running, you will see a link on the terminal, e.g., http://127.0.0.1:8888/?token=3c321..., which you can follow to access the JupyterLab interface from your browser.
Note
Accessing mlflow UI
You can reach the mlflow UI at http://localhost:5000. For a simple example of how to track a run, please refer to notebooks/mlflow_example.ipynb.
For in-depth tutorials, please refer to the official mlflow documentation.
2.3.1 Additional Run Options
By default, the Jupyter service is based on the official scipy-notebook image. You can also build and run from the tensorflow or pyspark notebook images with:
make tensorflow
make pyspark
If you want to use classic Jupyter notebooks, run instead:
make notebook
2.4 Documentation
The project comes with basic documentation, which is located at {{ cookiecutter.repo_slug }}/docs. You can use Sphinx to build the documentation locally by running:
make sphinx-html
The above command builds a docker image called {{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}/sphinx. It then starts a container from the image and renders the documentation (including automatic Python API documentation from docstrings).
Afterward, you can access the documentation by opening ./docs/_build/html/index.html in your browser.
Note
By default, {{ cookiecutter.package_name }} follows the numpy docstring style. If you would like to use Google style docstrings instead, please swap the values of the napoleon_google_docstring and napoleon_numpy_docstring variables inside {{ cookiecutter.repo_slug }}/docs/conf.py.
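For reference, a numpy-style docstring (the convention napoleon parses by default here) looks roughly like this; the function itself is only an illustration:

```python
def scale(values, factor=2.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        The numbers to scale.
    factor : float, optional
        Multiplier applied to each element (default 2.0).

    Returns
    -------
    list of float
        The scaled values.
    """
    return [value * factor for value in values]
```

Sphinx renders the Parameters and Returns sections into structured API documentation when the package is documented with autodoc.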
2.5 Testing
2.5.1 Python
Build checks, code style and linting checks, and unit tests of the starter Python package are automated using tox in a docker environment. You can run these tests with:
make tox
This command builds a docker image called {{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}/python-dev. It then starts a container from the image and runs the Python tests.
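A tox configuration of this kind typically looks something like the sketch below; the environment names, dependencies, and commands are illustrative, not the project's actual tox.ini:

```ini
; Hypothetical tox.ini sketch: one env for style/lint checks,
; one for running the unit tests under Python 3.7.
[tox]
envlist = lint, py37

[testenv:lint]
deps =
    flake8
commands =
    flake8 src

[testenv:py37]
deps =
    pytest
commands =
    pytest tests
```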
2.5.2 Docker Stack
You can test the integration of the Docker services (e.g., sending log requests to the mlflow tracking server from the Jupyter service) by running the docker-compose stack in “test” mode:
make test
2.6 Online Services
2.6.1 Github
Github is a popular code hosting platform with (git) version control (and many other complementary services).
To host the project in Github, follow the steps below:
1. Create an empty repository (do not initialize readme, license, or .gitignore files). See the official Github documentation for detailed instructions.
Note
Your Github username and repository name should match {{ cookiecutter.github_username }} and {{ cookiecutter.repo_slug }}, respectively.
2. Initialize git and make the first commit, e.g.:
git init
git add .
git commit -m "First commit"
3. Push the project to Github, e.g. using an https connection:
git remote add origin https://github.com/{{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}.git
git push -u origin master
For more information on the Github ecosystem, please refer to the official help and guides.
2.6.2 Travis CI
Travis CI is a continuous integration service to build and test projects hosted in Github. The project comes with a pre-made Travis CI configuration located at .travis.yml.
Important
You need to host the project in Github to use Travis CI.
Please follow the official Travis CI documentation for instructions to grant Travis CI access to the repository.
Once enabled, Travis CI runs all of the tests mentioned above automatically after each push. You can view the results at:
https://travis-ci.com/github/{{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}
Travis CI also generates code coverage reports for the starter Python package, which can be viewed at codecov:
https://codecov.io/gh/{{ cookiecutter.github_username }}/{{ cookiecutter.repo_slug }}
Note
Please refer to the official guide on how to quick-start and use codecov.
2.6.3 Online Documentation
You may want to host the Sphinx documentation online, e.g. at Read the Docs or Github Pages. Typically, these services offer effortless integration with Github. Please refer to these services to learn how.
Note
We assume that you will host the documentation at https://{{ cookiecutter.repo_slug }}.readthedocs.io. Please modify the URLs in the project README and documentation if you would like to host it elsewhere.