This tutorial uses Docker technology to address a real scientific problem: building custom views of the sky. Starting from scratch (scratch here being a working Docker server), we will collect all the software components necessary to locate and retrieve remote archive data, then build and visualize maps of the sky. Both the software and the data are available off-the-shelf.
Our science task involves building a mosaic image for the galaxy Messier 051 and overlaying it with catalog data.
The first half of the tutorial constructs a Docker image that has all the tools we will need for our task: the Montage software suite and Anaconda Python. Anaconda comes complete with a number of tools, such as Astropy and the SQLite database; we will add a few more.
The second half of the tutorial uses that Docker image to solve our science problem. This includes subtleties like using visualization tools from outside the container and persisting the results after the container exits.
When we are done, we will have a tool that can be generalized for any image data and all on-line catalogs. This can either be wrapped as a command-line utility for ad hoc use or incorporated into large-scale (e.g. cloud) pipelines for massively parallel processing.
Many times, you can find a prebuilt Docker image that does what you want. The rest of the time you will have to build your own. This isn't as scary as it sounds. All docker images are based on a configuration file ("Dockerfile") and a build command. Once it is built, this new image will be kept (unless you actively delete it) and can be reused over and over or even used as the basis for another image.
Technically, you can use names other than "Dockerfile" (and have more than one of them in the same directory), but don't; it gets very tedious, and at least one thing (at Docker Hub) breaks if you do.
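For example (purely a sketch, not something this tutorial uses), once the image we build below exists, a later Dockerfile could use it as its base and simply layer extra tools on top:

# A hypothetical downstream Dockerfile that starts from our image
FROM montage/jupyter:latest

# ... and adds whatever extra packages that project needs
RUN pip install healpy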
Below is the Dockerfile for our project. If you haven't already, log into your assigned AWS machine, create a workspace, and copy the Dockerfile and a Jupyter notebook file we will need later:
mkdir Docker
mkdir Docker/work
cd Docker/work
wget http://montage.ipac.caltech.edu/DockerTutorial/Dockerfile
wget http://montage.ipac.caltech.edu/DockerTutorial/mViewer.ipynb
# All Docker images have to be built 'FROM' something

FROM debian:latest


# This is the way you set environment variables to be used in the running container

ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
ENV LD_LIBRARY_PATH /montage/Montage/lib:$LD_LIBRARY_PATH


# 'RUN' directives modify the image in the same way running these commands would
# change an OS instance. Only here we are modifying what will be installed in
# the saved Docker image. We start with a few basic utilities that don't come
# with baseline debian. The reason for all the continuation lines is that every
# RUN creates another layer and uses up more disk space.

RUN apt update --fix-missing && \
    apt install -y wget bzip2 git && \
    apt install -y curl grep sed && \
    apt install -y vim && \
    apt install -y build-essential


# Installing Anaconda gives us a bunch of tools (in particular Jupyter and Astropy)
# that will let us better interact with our data. We then use pip to augment Anaconda
# with astroquery and the Python Montage package.

RUN wget --quiet https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh -O ~/anaconda.sh && \
    /bin/bash ~/anaconda.sh -b -p /opt/conda && \
    rm ~/anaconda.sh && \
    ln -s /opt/conda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc && \
    echo "conda activate base" >> ~/.bashrc

RUN pip install astroquery && \
    pip install MontagePy && \
    conda install -y plotly && \
    conda install -y qgrid


# Download and install Montage: While we have a copy of Montage installed for Python
# use, we are going to install it directly in the Docker container as well. This is
# partly to illustrate actually building applications but also because in a large-scale
# processing pipeline (e.g. building terabytes of little mosaics to support HiPS) we
# don't really want to run that through Python and would prefer the command-line Montage.

# Remember, this Dockerfile is not a shell script; there is no shell running. So we
# don't really have any "state" (like a current working directory). The 'WORKDIR'
# directive provides the equivalent, allowing us to do things that require location
# context.

WORKDIR /montage

RUN wget --quiet http://montage.ipac.caltech.edu/download/Montage_07Aug19.tar.gz -O montage.tar.gz

RUN tar -zxf montage.tar.gz

WORKDIR /montage/Montage

RUN make

RUN cp bin/* /usr/local/bin/


# Finally, for this container we want the default application to just be a shell
# so we can be logged-in as soon as we start the container. Other containers often
# default to running a web server, etc.

WORKDIR /work

CMD /bin/bash
Once we have a Dockerfile we can create a docker image using it. This is done with the following command:
docker build --tag montage/jupyter:latest .
The '--tag' option allows you to choose a name for the image. The last argument (.) is the build context; any files referenced in the Dockerfile will be looked for there.
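If the build succeeds, the image is stored by the local Docker server. A quick way to confirm it is there is to list it by the repository name we just chose:

# List locally stored images whose repository matches 'montage/jupyter'
docker images montage/jupyter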
Docker images can be shared in multiple ways. The command above just stores the image in the local running Docker, but it can also be saved to a tarfile and/or uploaded to a registry (most commonly Docker Hub). People also commonly publish their Dockerfiles in repositories like GitHub.
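As a rough sketch of those sharing options (the tarfile name and the Docker Hub account below are placeholders, not something this tutorial requires):

# Save the image to a tarfile, which can be copied to and loaded on another Docker server
docker save -o montage_jupyter.tar montage/jupyter:latest
docker load -i montage_jupyter.tar

# Or push the image to a registry such as Docker Hub
docker login
docker tag montage/jupyter:latest <your-dockerhub-id>/montage-jupyter:latest
docker push <your-dockerhub-id>/montage-jupyter:latest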
We are building ours directly on the Amazon machine instance, so we can immediately use it (create an operational Docker container) by running:
docker run --name montage_jupyter --rm -it -p 8888:8888 \
    -v /home/workshop_usr/Docker/work:/work montage/jupyter:latest
Here is an overview of the ancillary arguments and why we need them:

--name montage_jupyter gives the running container a name we can refer to later (for instance when stopping it from another terminal).

--rm tells Docker to discard the container when it exits; the image itself is kept.

-it attaches an interactive terminal so we can type commands inside the container.

-p 8888:8888 maps port 8888 inside the container to port 8888 on the host, which we will need to reach Jupyter from outside the container (see the sketch after this list).

-v /home/workshop_usr/Docker/work:/work mounts our host workspace at /work inside the container, so anything we put there persists after the container exits.

The last argument is the name of our Docker image.
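We will not actually need Jupyter until Part 2, but as a rough sketch of why the port mapping matters (the exact invocation may differ, and the address placeholder is whatever your AWS machine's public address is):

# Inside the running container: start Jupyter listening on all interfaces,
# so connections forwarded from the host via '-p 8888:8888' are accepted.
jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root

# Then, in a browser outside the container, open
#     http://<address-of-your-AWS-machine>:8888
# supply the token Jupyter prints, and open the mViewer.ipynb notebook.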
Once we run this, the container is operational and we see a prompt from the Debian OS inside our Docker container. In Part 2 of this tutorial, we will use this container to do our science project. This is a good break point; we have our image built and can run it as many times as we wish on as many platforms as we have.
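One detail worth noting before we break: because of the --rm flag the container itself is discarded when we leave it, but the image remains, and anything under /work lives in the mounted host directory and so survives. A quick way to see this from the container prompt (a sketch; the files are the ones we downloaded earlier):

# /work is the mounted host directory, so the Dockerfile and mViewer.ipynb
# we downloaded earlier are already visible inside the container.
ls /work

# Leaving the shell stops the container (and --rm removes it) ...
exit

# ... but on the host the image and ~/Docker/work are untouched, so we can
# start a fresh container with the same 'docker run' command at any time.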
Go to Part 2 ...