Developing in Docker containers
Posted by Jeremy Minton

Why develop in docker environments

I want to experiment with Julia. It’s been mostly Python for me over the last four or five years, so the first hurdle is getting Julia installed and running smoothly. Conda works nicely for managing Python environments, so I tried installing Julia from conda-forge. I started following this tutorial and quickly hit a problem: the Distributions package wouldn’t install.

Perhaps a different Julia version would help: conda-forge offers v1.1.1, which sits between the LTS v1.0.5 and the latest v1.4.1. Unfortunately, the Julia package manager doesn’t manage its own version, or Julia’s, so I’m left to install Julia from the downloads page. I’m sure I could manage that, but manually installing Julia for each new environment seems error-prone.

As an added complexity, Jupyter is installed via Conda and comes with IPython and an array of other Python packages. I use Jupyter with Python for work, so I am reluctant to risk polluting my base environment and want stronger encapsulation. Perhaps developing within Docker containers?

Here’s one for Julia!

Before embarking on this expedition, I think it is worth outlining what functionality I require for development. I would consider the following necessary:

  1. Compilation (for the general case, but obviously not relevant for Julia);
  2. Execution (to run a server, ML fitting/inference, whatever else it is meant to do);
  3. Test suites (for good development practices);
  4. IDE with code completion/linting (for expressive and efficient development); and,
  5. Interactive/data-science workflows (for all-important data analysis).

These seem like worthy acceptance criteria for this expedition.

Docker from the terminal

An easy use case for containers is executing a single command, such as compiling or building code or running a stage in a data pipeline, in an open-run-close fashion. For example, I use a Jekyll container to build and serve this website locally to check the markdown, now that I build and deploy it with GitLab. This container provides a Ruby environment with Jekyll, so Jekyll commands can be executed without requiring Ruby or gems to be installed locally. For example, to build and serve a website on localhost:4000: docker run --rm --volume="$PWD:/srv/jekyll" -p 4000:4000 -it jekyll/jekyll jekyll serve.

Running the Julia Docker image like this is easy with the provided instructions: docker run -it --rm julia starts a REPL, and docker run -it --rm -v "$PWD":/usr/myapp -w /usr/myapp julia julia script.jl arg executes a script.
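
If you would rather trigger the script command above from Python than type it out each time, it could be wrapped roughly as follows. This is only a sketch: the helper names julia_docker_command and run_julia_script are made up for illustration, the mount point /usr/myapp is taken from the command above, and -it is left out on the assumption that no terminal is attached when the command is launched from a script.

```python
import os
import subprocess


def julia_docker_command(script, *args, workdir=None, image="julia"):
    """Build the `docker run` argument list for running a Julia script.

    Hypothetical helper: mirrors the command quoted above, minus `-it`.
    """
    workdir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/usr/myapp",  # mount the project into the container
        "-w", "/usr/myapp",             # and run from the mounted directory
        image, "julia", script, *args,
    ]


def run_julia_script(script, *args, workdir=None):
    """Run the script in a throwaway container; raises if it exits non-zero."""
    subprocess.run(julia_docker_command(script, *args, workdir=workdir), check=True)
```

Calling run_julia_script("script.jl", "arg") from the project directory should then behave like the one-liner, provided Docker is installed and on the PATH.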

This obviously achieves objectives 1, 2 and 3 at a pinch. Thankfully, at least two more requirements will motivate a second chapter to this adventure; it would have been a pretty dull post otherwise.

Docker as a server

A very similar use case is to run a persistent command that provides a server from a container.

The Julia image can be extended to run as a JupyterLab server; however, this requires building on the existing Julia image to run using Pkg; Pkg.add("...") and install IJulia, Conda and JupyterLab. It also requires a little more configuration to set up the notebook server: a jupyter_notebook_config.py file needs to be modified with token='' (which should be fine because we’re only using these images for local development), ip="0.0.0.0" (so the ports can be bound to the host machine) and port=8888 (which is the default, but it seems like good practice to set it explicitly, given that only that port will be exposed from the container).
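
Put together, the three settings described above would look something like this in jupyter_notebook_config.py. This is a sketch that assumes the classic notebook server, which reads these options under NotebookApp; newer Jupyter Server releases use ServerApp instead, so the prefix may need adjusting for the version actually installed.

```python
# jupyter_notebook_config.py
c = get_config()  # `get_config` is supplied by Jupyter when it loads this file

c.NotebookApp.token = ""      # no login token: acceptable only for local development
c.NotebookApp.ip = "0.0.0.0"  # listen on all interfaces so the host can reach the port
c.NotebookApp.port = 8888     # the default, set explicitly to match the exposed port
```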

The last thing to sort out is mounting a working directory and permissions. This post provides a good explanation about container users and permissions; however, the proposed solution is a little unsatisfactory: it requires choosing a uid at container build time that matches the host uid at container run time.

I think my options are:

  1. Pass my uid to the container as an environment variable so that an entrypoint script can create that user and assign permissions before executing any other code.
  2. Create images for each developer user account, with their uid set up in the container. This might be okay when I’m making these images for myself but wouldn’t scale to an engineering/data science team.
  3. Run the image as root and chown a lot to correct ownership of files produced within the image. This solution may also have security implications, given it is not recommended to run containers and Jupyter as root.

After toying with this for some time, I yielded and went with option 3. I don’t like it but it is the easiest of these options and expedites my Julia programming. I hope to return to this point and explore option 1 in more detail.
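
For what it is worth, option 1 might be sketched along these lines. This is not what the published image does: it is a minimal illustration that assumes the container starts as root, that a Python interpreter is available in the image, and that the host ids arrive in environment variables named HOST_UID and HOST_GID (names made up here). Rather than creating a named user, it simply switches to the numeric uid and gid, which is a simplification; some programs expect a matching passwd entry or home directory.

```python
import os


def parse_host_ids(env):
    """Read the host uid/gid from the environment (variable names are hypothetical).

    Falls back to root (0) when HOST_UID is unset, and to the uid when
    HOST_GID is unset.
    """
    uid = int(env.get("HOST_UID", "0"))
    gid = int(env.get("HOST_GID", str(uid)))
    return uid, gid


def run_as_host_user(command, workdir, env=os.environ):
    """Drop from root to the host uid/gid, then replace this process with `command`."""
    uid, gid = parse_host_ids(env)
    if uid != 0:
        os.chown(workdir, uid, gid)  # let the host user write to the working directory
        os.setgroups([])             # drop root's supplementary groups
        os.setgid(gid)               # group first: no longer permitted after setuid
        os.setuid(uid)
    os.execvp(command[0], command)   # does not return on success
```

The container would then be started with something like docker run -e HOST_UID="$(id -u)" -e HOST_GID="$(id -g)" followed by the usual arguments, with the image's entrypoint script calling run_as_host_user(sys.argv[1:], "/usr/myapp"). Files written to the mounted directory should then be owned by the host user, with no chown afterwards.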

The image can be found here, and the source code here.

Serving Jupyter soundly satisfies requirement 5 and requirement 4 if you’re happy to develop in Jupyter… I am not, so the voyage continues into chapter three.

A local IDE with containerised execution

VSCode offers the Remote Development extension to operate in this configuration. Docker containers can be launched, if necessary, and connected to via the extension’s interface. Containers for a range of languages are provided, as well as options to use local Dockerfiles or connect to existing containers. Three options exist for the file system:

  1. Open a folder, which will get mounted to the container.
  2. Clone a git repository into the container.
  3. Connect to the container as is. The provided containers are configured with a volume for persistent storage, but only through the container.

The provided Python image was seamless, as expected, and the result was a nice environment to develop in: all VSCode functionality seemed to work more smoothly than in my local environment, specifically code completion, running tests and the terminal, all of which have shown issues for me locally.

Using my own image, the Julia image I used for the previous section, wasn’t as smooth. It turns out that extensions must be installed in each of the containers. For this container, that included the Julia extension as well as a number of others that crop up through development.

There is a Jupyter extension that supports notebooks within VSCode, but I have not been able to work out where it collects kernels from and hence haven’t got it running the Julia kernel yet. This is not much of an issue, though, because the container will continue serving Jupyter while VSCode is connected to it, so Jupyter can still be opened in a browser.

This soundly satisfies requirement 4 and with a bit of configuration of VSCode can also satisfy requirement 3, which brings this saga to a close.

An aside: VSCode remote connection

The same VSCode extension, Remote Development, allows you to connect to remote instances, defined in .ssh/config, and develop on that machine using the local VSCode environment. This provides the same experience as working in a docker container but for a remote instance instead. Amazing!