Why develop in docker environments
I’m wanting to experiment with Julia. It’s been mostly
Python for me over the last four
or five years so the first hurdle is to get Julia installed and running smoothly.
Conda works nicely to manage environments for Python so I gave that a try by installing Julia
from CondaForge. I started following
this tutorial and quickly hit a
problem: the Distributions package wouldn’t install.
Perhaps a different Julia version would help; CondaForge offers v1.1.1, which is between
v1.0.5 and the latest
v1.4.1. Unfortunately, the Julia package manager doesn’t
manage its own, or Julia’s, version and so I’m left to install it from the
downloads page. I’m sure I could manage that but it
seems error prone to manually install Julia for each new environment.
As added complexity, Jupyter is installed via Conda, and comes with IPython and an array of other Python packages. I use Jupyter with Python for work, so I am reluctant to risk polluting my base environment and want stronger encapsulation. Perhaps developing within Docker containers?
Before embarking on this expedition, I think it is worth outlining what functionality I require for development. I would consider the following necessary:
1. Compilation (for the general case, but obviously not relevant for Julia);
2. Execution (to run a server, ML fitting/inference, whatever else it is meant to do);
3. Test suites (for good development practices);
4. IDE with code completion/linting (for expressive and efficient development); and,
5. Interactive/data-science workflows (for all important data-analysis).
These seem like worthy acceptance criteria for this expedition.
Docker from the terminal
An easy use case of containers is executing them with a single command such as
compiling/building code or running a stage in a data pipeline, so they run in an
open-run-close fashion. For example, I use a Jekyll container to build and serve this
website locally to check the markdown now
that I build and deploy it with GitLab.
It is the jekyll/jekyll container, which provides a
Ruby environment with Jekyll so that Jekyll commands can be executed without requiring
Ruby or gems to be installed locally. For example, to build and serve a website on port 4000 of the host:
docker run --rm --volume="$PWD:/srv/jekyll" -p 4000:4000 -it jekyll/jekyll jekyll serve
Running the Julia docker image like this is easy with the provided instructions, and it
provides a REPL,
docker run -it --rm julia, or executes a script,
docker run -it --rm -v "$PWD":/usr/myapp -w /usr/myapp julia julia script.jl arg.
This obviously achieves objectives 1, 2 and 3 at a pinch. Thankfully, at least two more requirements will motivate a second chapter to this adventure; it would have been a pretty dull post otherwise.
Docker as a server
A very similar use case is to run a persistent command that provides a server from a container.
The Julia image can be extended to run as a JupyterLab server; however, this requires
building on the existing Julia image to run
using Pkg; Pkg.add("...") and install
JupyterLab. This also requires a little more configuration
to set up the notebook server: a
jupyter_notebook_config.py file needs to be modified to set
- token='' (which should be fine because we’re only using these images for local development),
- ip="0.0.0.0" (so the ports can be bound to the host machine) and
- port=8888 (which is the default, but it seems good practice to set it explicitly, given that only that port will be exposed from the container).
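A minimal sketch of what those three settings might look like in jupyter_notebook_config.py. It assumes the classic notebook server's c.NotebookApp option names, which matches the file name used here; newer Jupyter Server releases read the equivalent c.ServerApp options from jupyter_server_config.py instead, so check which one your installed version loads.

```python
# jupyter_notebook_config.py (sketch; not meant to be run directly)
# Jupyter provides get_config() when it loads this file at start-up.
c = get_config()  # noqa: F821

# Disable token authentication. Only reasonable for local development,
# because anyone who can reach the port gets full access to the server.
c.NotebookApp.token = ""

# Listen on all interfaces so the published port is reachable from the host.
c.NotebookApp.ip = "0.0.0.0"

# 8888 is already the default; set explicitly to match the exposed port.
c.NotebookApp.port = 8888
```

The container would then be started with the matching port published, for example with -p 8888:8888 on docker run.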
The last thing to sort out is mounting a working directory and permissions.
This article provides a good explanation of container users and permissions; however, the
proposed solution is a little unsatisfactory: it requires selecting a uid at
container build time that matches the host
uid at container run time.
I think my options are:
1. Pass my uid to the container as an environment variable so that an entrypoint script can create that user and assign permissions before executing any other code.
2. Create images for each developer user account, with their uid set up in the container. This might be okay when I’m making these images for myself but wouldn’t scale to an engineering/data science team.
3. Run the container as root and chown a lot to correct ownership of files produced within the container. This solution may also have security implications, given it is not recommended to run containers and Jupyter as root.
After toying with this for some time, I yielded and went with option 3. I don’t like it but it is the easiest of these options and expedites my Julia programming. I hope to return to this point and explore option 1 in more detail.
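For reference, option 1 might look something like the sketch below. It is an untested outline under several assumptions: the variable names HOST_UID and HOST_GID and the file name entrypoint.py are hypothetical, Python is available in the image (it is once JupyterLab is installed), and the container starts as root. As a simplification it switches to the numeric uid rather than creating a named user, so HOME and the Julia package depot would still need to point somewhere that uid can write.

```python
#!/usr/bin/env python3
"""Hypothetical entrypoint.py: run the container's command as the host user."""
import os
import sys


def parse_ids(environ):
    """Return (uid, gid) from the assumed HOST_UID / HOST_GID variables.

    Falls back to root (0) when HOST_UID is unset, and to the uid when
    HOST_GID is unset.
    """
    uid = int(environ.get("HOST_UID", "0"))
    gid = int(environ.get("HOST_GID", uid))
    return uid, gid


def run_as(uid, gid, command):
    """Drop from root to uid:gid, then replace this process with command."""
    if uid != 0:
        os.setgroups([])  # discard root's supplementary groups
        os.setgid(gid)    # group first: changing it still needs root
        os.setuid(uid)    # after this call the process is no longer root
    os.execvp(command[0], command)


# Docker passes the image's CMD (or the arguments after the image name in
# docker run) as this script's arguments; do nothing if none were given.
if __name__ == "__main__" and len(sys.argv) > 1:
    host_uid, host_gid = parse_ids(os.environ)
    run_as(host_uid, host_gid, sys.argv[1:])
```

On the host, the ids could be passed with something like docker run -e HOST_UID="$(id -u)" -e HOST_GID="$(id -g)" ..., with the script registered as the image's ENTRYPOINT. Files written to the mounted working directory would then belong to the host user, with no chown afterwards.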
Serving Jupyter soundly satisfies requirement 5 and requirement 4 if you’re happy to develop in Jupyter… I am not, so the voyage continues into chapter three.
A local IDE with containerised execution
VSCode offers the Remote Development
extension to operate in this configuration. Docker containers can be launched, if
necessary, and connected to via the extension’s interface. Containers for a range of
languages are provided as well as options to use local
Dockerfiles or connect to
existing containers. Three options exist for the file system:
- Open a folder, which will get mounted to the container.
- Clone a git repository into the container.
- Connect to the container as is. The provided containers are configured with a volume for persistent storage, but only through the container.
The provided Python image was seamless, as expected, and gave me a nice environment to develop in: all VSCode functionality seemed to work more smoothly than in my local environment, specifically code completion, running tests and the terminal, all of which have shown issues for me locally.
Using my own image, the Julia image I used for the previous section, wasn’t as smooth.
It turns out that extensions must be installed in each of the containers. For this
container, that included the
Julia extension as well as a number of others that crop
up through development.
There is a
Jupyter extension that supports notebooks within VSCode, but I have not
been able to understand where it collects kernels from and hence haven’t got it
running the Julia kernel yet. This is not much of an issue because the container will continue
serving while VSCode is connected to it, so Jupyter can be opened in a browser.
This soundly satisfies requirement 4 and with a bit of configuration of VSCode can also satisfy requirement 3, which brings this saga to a close.
An aside: VSCode remote connection
The same VSCode extension, Remote Development,
allows you to connect to remote instances, defined in
.ssh/config, and develop on that machine
using the local VSCode environment. This provides the same experience as working in a docker container
but for a remote instance instead. Amazing!