
Setting up a private OPAM repository on GitLab step-by-step

Introduction

OCaml is my language of heart. I don't use it in my daily activities, it has a very small community and lacks a lot of things that any modern language has, but I simply love it. Functional languages have this capacity to challenge the way I think about algorithms, and I was interested in the Inria (the French National Institute for Research in Digital Science and Technology) back when I was a teenager. So one thing leading to another, I ended up learning a bit of OCaml on the side while I was in engineering school. And even though I never got around to working with it, I still spend some nights hacking together small bits and pieces just to challenge myself, and I'm having a grand old time.

As a matter of fact, I would love to give it a try in some real-world scenarios. For example, I would love to port some internal tooling of the company I work for, from Ruby to OCaml. This, however, might be a bit tricky, because working on corporate stuff requires me to do everything privately. I can't just release libraries on the public OPAM repository.

So here is what I need if I want to start working on private libraries:

  1. A private OPAM repository. My company uses GitLab, so let's roll with it;
  2. A way to publish packages onto this repository;
  3. A way to install packages from this repository.

This project occupied most of my free time for a couple of weekends, so I'm very happy and proud to present to you the result of my work! Let's dive in!

What is an OPAM repository, anyway?

Before going into more detail about what I have done, I believe it's a good idea to talk about the OCaml package pipeline and architecture, just so we're on the same page.

OPAM is OCaml's package manager. Think NPM, RubyGems, Pip, Cargo... Its function is to manage the different instances of the OCaml compiler on a computer (we call them switches), and to interact with package repositories in order to download, unpack and compile libraries from the community.

An OPAM repository is a big directory that lists all available packages and their metadata: versions, dependencies, build instructions, and more importantly for us, the URL of an archive with the source code in it.

The basic release pipeline of an OCaml package. Build and store a release tarball, and push a metadata file to an OPAM repository. Simple enough, right?

When you try to install a package on one of your switches, OPAM retrieves its metadata from the repository, solves for dependencies and then downloads and compiles the sources using the URL mentioned above.

The official repository

The OCaml team maintains an official OPAM repository on GitHub. This repository is public and any developer can submit pull requests to add their packages to the mix. An example here, with my package textrazor, an OCaml wrapper for the TextRazor API.

I have to admit that this way of handling a community-driven package base is quite unorthodox, and probably doesn't scale very well, but I believe it suits the size of the community. The problem is that it does not suit my needs for private packages, but we'll come to that.

Existing tooling

Of course, building these packages, archiving them in tarballs and creating pull requests with metadata files is pretty tedious. Which is why the community has come up with different tools to make the OCaml developer's life easier.

  • OPAM-publish is a plugin for OPAM that does most of the heavy-lifting, provided that your package and the repository you want to push to are on GitHub.
  • Dune-release is very similar, except that it builds on top of Dune, a very popular build system. It suffers from the same drawbacks, in that it is tightly coupled with GitHub and the official OPAM repository.

Both of these tools are great on paper, and I have tried to tweak them for my use-case, but no cigar. Maybe one day, when I'm actually proficient with OCaml, I will try to contribute to these projects and make them more generic. But, for the time being, we'll have to make do without the help of these tools. We're going homemade, baby!

Episode IV: A new (h)OP(e)AM repository

(I am unreasonably proud of this segue.)

Now that the goal is clear, let's begin! The first step is pretty easy: we'll create a new GitLab repository that will become our OPAM repository.

Go to GitLab.com (or your own GitLab instance), hit "+", "New project/repository" and select "Create blank project". (Maybe one day OCaml will be big enough to get its own templates, but let's not get our hopes up too much.) If you are as original as I am, you can name it opam-repository and set it to "Private". You can add a README if you want, more documentation never hurts.

And just like that, we're up and running! Well, not quite, because there is one thing that the repository needs to be recognized as an OPAM repository, and that's a repo file.

Clone the repo on your machine, and create a file called repo at the root. Inside, write the following.

opam-version: "2.0"
upstream: "<REPO_URL>/tree/<MAIN_BRANCH>"

Replace <REPO_URL> with the URL of the repository and <MAIN_BRANCH> with the name of your default branch (usually master or main). Add, commit, push, and voilà!

Package repository

Now, we will need something to push to this repository. Let's create a new, minimal package as an example.

I won't go into too much detail here about the usage of Dune. If you're not familiar with it, I would suggest giving their quickstart guide a read!

dune init proj hello
cd hello

Let's make a Hello module with a hello function, as well as an executable that runs that function.

(* lib/hello.mli *)
val hello : unit -> unit

(* lib/hello.ml *)
let hello () =
  print_endline "Hello world!"

(* bin/main.ml *)
let () =
  Hello.hello ()

Let's run it just to make sure it works.

dune exec hello

Hello world!

Great. We'll also take a look at the dune-project file to make sure that everything is in order.

(lang dune 3.11)

(name hello)

(generate_opam_files true)

(source
 (gitlab richard.degenne/hello))

(authors "Richard Degenne")

(maintainers "Richard Degenne")

(license "Proprietary")

(package
 (name hello)
 (synopsis "It says hello!")
 (description "It says hello to the world!")
 (depends ocaml dune)
 (tags
  (hello)))

; See the complete stanza docs at https://dune.readthedocs.io/en/stable/dune-files.html#dune-project

I have updated the fields as I saw fit. In particular, the source stanza is interesting because you can use gitlab if you happen to use GitLab.com (the SaaS version). Otherwise, you always have the option to use uri and write the URL of your remote. Run dune build to update the hello.opam file and we're good to go.

In order to push this to GitLab, we'll set up a new Git repository.

git init
echo _build/ > .gitignore
git add .
git commit -m 'Initial commit'

Great. And now, we have to set up the remote. Go back to GitLab, "+", "New project/repository", and make a new, blank repository for the package.

Repositories everywhere!

Set up the remote in your local repository and push.

git remote add origin git@gitlab.com:richard.degenne/hello.git
git push --set-upstream origin HEAD

Now for the tricky part...

Alright, we are up and running! We have our neat little package on GitLab, and all that is left is the CI.

Just as a reminder, the objective of our release pipeline is to

  1. Package the source code into a tarball and push it to a package registry;
  2. Build the metadata file for the OPAM repository;
  3. Commit the metadata file onto the OPAM repository and push.

Let's tackle these steps one at a time.

Just like before, I won't go into too much detail about the syntax of the GitLab CI file. Should you need a refresher, you can find the full reference here.

Building the release tarball

When working on CI scripts, I usually try to test the commands on my own machine before writing them down in the .gitlab-ci.yml file, just to try and minimize the amount of time-consuming back and forth. So let's try to build the archive locally first. And to do this, we are going to use the tar command.

Obligatory XKCD strip whenever tar is mentioned.

In its simplest form, the command we are going to use looks like this.

tar -cjvf /tmp/artefact.tbz .
  • c for "create".
  • j to compress with bzip2. (I think we could use other compression methods, but the packages I looked at in the official repository used bzip2, so we'll go with it.)
  • v for "verbose", just to see what is being put in the archive.
  • f /tmp/artefact.tbz to write the resulting archive to the /tmp folder.
  • . to say we want to compress the current folder.

We can have a look at what was generated.

tar -tjvf /tmp/artefact.tbz

Some remarks:

  • There are a bunch of files that we don't want in there, like the .git and the _build folders.
  • The files are stored at the root of the archive. I don't know how important this is, but in tarballs from the official repository, the files were stored within a folder whose name was <PACKAGE_NAME>.<VERSION_NUMBER>.

For the first point, we can create a .tarignore file with the following contents.

./.git
./_build

We can tell tar to ignore these patterns by adding --exclude-ignore='.tarignore' to the command.

For the second point, tar offers a --transform option that applies a sed transformation to all the file paths in the archive. Here, we want a substitution to replace the first . by hello.<VERSION>.

s/\./hello.0.1.0/

That should do the trick. We'll replace that hard-coded 0.1.0 with a CI variable in the actual script, but for now, we can give the new and improved command another try.

tar --transform "s/\./hello.0.1.0/" \
  --exclude-ignore='.tarignore' \
  -cjvf /tmp/artefact.tbz .

Let's have another look.

$ tar -tjvf /tmp/artefact.tbz 
drwxrwxr-x richard/richard   0 2023-10-19 22:24 hello.0.1.0/
-rw-rw-r-- richard/richard 429 2023-10-19 21:32 hello.0.1.0/dune-project
-rw-r--r-- richard/richard 660 2023-10-19 21:32 hello.0.1.0/hello.opam
drwxrwxr-x richard/richard   0 2023-10-19 21:21 hello.0.1.0/bin/
-rw-rw-r-- richard/richard  66 2023-10-19 21:21 hello.0.1.0/bin/dune
-rw-rw-r-- richard/richard  26 2023-10-19 21:24 hello.0.1.0/bin/main.ml
drwxrwxr-x richard/richard   0 2023-10-19 21:23 hello.0.1.0/lib/
-rw-rw-r-- richard/richard  24 2023-10-19 21:21 hello.0.1.0/lib/dune
-rw-rw-r-- richard/richard  25 2023-10-19 21:23 hello.0.1.0/lib/hello.mli
-rw-rw-r-- richard/richard  46 2023-10-19 21:23 hello.0.1.0/lib/hello.ml
drwxrwxr-x richard/richard   0 2023-10-19 21:21 hello.0.1.0/test/
-rw-rw-r-- richard/richard  21 2023-10-19 21:21 hello.0.1.0/test/dune
-rw-rw-r-- richard/richard   0 2023-10-19 21:21 hello.0.1.0/test/hello.ml
-rw-rw-r-- richard/richard   8 2023-10-19 21:36 hello.0.1.0/.gitignore
-rw-rw-r-- richard/richard   0 2023-10-19 21:54 hello.0.1.0/.gitlab-ci.yml
-rw-rw-r-- richard/richard  16 2023-10-19 22:24 hello.0.1.0/.tarignore

Looks great! We could probably add some more files to the .tarignore filters, but that's good enough for now! Let's add this to the CI script.

# .gitlab-ci.yml
---
release:
  only:
    - tags
  image: ocaml/opam
  script:
    - |-
      tar --transform "s/\./hello.${CI_COMMIT_TAG}/" \
        --exclude-ignore='.tarignore' \
        -cjvf /tmp/artefact.tbz .
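Since --transform takes a sed expression, its effect can be previewed with sed alone, without building an archive. The sketch below is only an illustration: preview_transform is a made-up helper name and the sample paths are invented. It also tries an anchored variant of the expression (^\.), which is an assumption about what would be more robust, not something the pipeline needs as long as every path starts with ./ like it does here.

```shell
#!/bin/sh
# Preview what a tar --transform expression would do to member names,
# using sed directly. `preview_transform` is a hypothetical helper.
preview_transform() {
  # $1: sed expression, $2: path as tar would store it
  printf '%s\n' "$2" | sed -e "$1"
}

version="0.1.0" # stand-in for ${CI_COMMIT_TAG}

# Expression used above: replaces the first dot found in the path.
preview_transform "s/\./hello.${version}/" "./lib/hello.ml"
# prints hello.0.1.0/lib/hello.ml

# Anchored variant: only a leading dot is replaced, so a path that does
# not start with "./" is left alone instead of being rewritten mid-name.
preview_transform "s/^\./hello.${version}/" "lib/hello.ml"
# prints lib/hello.ml
```

With the unanchored expression, a path without the leading ./ would have the dot of its file extension replaced instead, which is probably not what anyone wants.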

Uploading the tarball to GitLab

In order to store this kind of build artifact, GitLab offers a feature called the package registry. It supports an array of popular package managers, but not OPAM (who would have thunk!). So we're going to go with the "generic" provider, which works like a sort of file storage. In order to upload to this registry, the documentation tells us to send a PUT HTTP request to

/projects/:id/packages/generic/:package_name/:package_version/:file_name
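To make the placeholders in that path concrete, here is a small sketch that fills them in. generic_package_url is a hypothetical helper (it is not part of GitLab or of any CLI), and 12345 is a made-up project ID. Using "opam" as the package name is simply the convention followed in the rest of this post.

```shell
#!/bin/sh
# Build the upload URL for GitLab's generic package registry.
# `generic_package_url` is a hypothetical helper, not a GitLab tool.
generic_package_url() {
  # $1: API base URL, $2: project ID, $3: package name,
  # $4: package version, $5: file name
  printf '%s/projects/%s/packages/generic/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Placeholder values. In a CI job, the first, second and fourth arguments
# would come from CI_API_V4_URL, CI_PROJECT_ID and CI_COMMIT_TAG.
generic_package_url "https://gitlab.com/api/v4" "12345" "opam" "0.1.0" "hello-0.1.0.tbz"
# prints https://gitlab.com/api/v4/projects/12345/packages/generic/opam/0.1.0/hello-0.1.0.tbz
```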

As far as authentication goes, for the actual CI we'll use the access token provided to the CI job. However, in order to test the upload ourselves, we'll create a personal access token. In GitLab, click on your profile picture and go to "Access tokens", "Add new token". Give a name and an expiration date to your token, and make sure that the api scope is checked.

Click "Create" and save the token somewhere safe.

Alright, let's give this upload a shot!

export TOKEN=<PERSONAL ACCESS TOKEN>
curl -H "PRIVATE-TOKEN: ${TOKEN}" --upload-file /tmp/artefact.tbz \
  "https://gitlab.com/api/v4/projects/<PROJECT ID>/packages/generic/opam/0.1.0/hello.0.1.0.tbz"

(You can find the project ID on the home page of the repository on GitLab.)

You can check in GitLab that the file was uploaded correctly by going to "Deploy", "Package registry".

Looks good! We can add this to the CI script.

# .gitlab-ci.yml
---
release:
  only:
    - tags
  image: ocaml/opam
  script:
    - |-
      tar --transform "s/\./hello.${CI_COMMIT_TAG}/" \
        --exclude-ignore='.tarignore' \
        -cjvf /tmp/artefact.tbz .
    - |-
      curl --fail --header "JOB-TOKEN: ${CI_JOB_TOKEN}" \
        --upload-file /tmp/artefact.tbz \
        "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/opam/${CI_COMMIT_TAG}/hello-${CI_COMMIT_TAG}.tbz"

Pushing the metadata file to OPAM

Now that the tarball is dealt with, we need to make the metadata file, add it to the OPAM repository and push a commit. That means that we'll need to clone the OPAM Git repository during the pipeline. I use the before_script step to do these kinds of setup commands.

# .gitlab-ci.yml
---
release:
  # ...
  before_script:
    - git clone https://gitlab.com/richard.degenne/opam-repository.git /tmp/opam-repository
  script:
    # ...

We'll have another problem with authentication, though. The job does not have the required privileges to pull from or push to that repository. So we need to create another access token, this time a project access token for the OPAM repository.

On GitLab, go to your OPAM repository, "Settings", "Access tokens" and create a new access token. Make sure to give it the "Maintainer" role and the scopes "read_repository" and "write_repository".

Spoiler alert, this is not the last access token we'll need on this journey.

Save the token somewhere safe. Now, we could simply paste the token in the CI script, but leaking credentials like this is not safe. Instead, we'll create a CI variable to store that token. Go back to the "hello" repository, and go to "Settings", "CI/CD", "Variables". Create a new variable and call it OPAM_REPOSITORY_ACCESS_TOKEN. Put the token in the value and save.

Let's update our CI script to use this new variable. And we can add the instructions to copy the hello.opam file in the repository in the right place, commit and push.

# .gitlab-ci.yml
---
release:
  # ...
  before_script:
    - git clone "https://ci-token:${OPAM_REPOSITORY_ACCESS_TOKEN}@gitlab.com/richard.degenne/opam-repository.git" /tmp/opam-repository
  script:
    # (tar and curl...)
    - package_dir="/tmp/opam-repository/packages/hello/hello.${CI_COMMIT_TAG}"
    - mkdir -p ${package_dir}
    - cp hello.opam ${package_dir}/opam
    - git -C ${package_dir} add .
    - git -C ${package_dir} commit -m "Added hello ${CI_COMMIT_TAG}"
    - git -C ${package_dir} push
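For reference, the layout that these commands build inside the OPAM repository can be reproduced in a scratch directory. This is only a local sketch: nothing is cloned or pushed, the opam file is a two-line stand-in for the generated hello.opam, and the version is hard-coded where the CI job would use CI_COMMIT_TAG.

```shell
#!/bin/sh
# Recreate, in a scratch directory, the layout the CI job produces.
# Everything here is local: no clone, no push.
repo_root="$(mktemp -d)"
version="0.1.0" # stand-in for ${CI_COMMIT_TAG}
package_dir="${repo_root}/packages/hello/hello.${version}"

# The `repo` file from earlier marks the directory as an OPAM repository.
printf 'opam-version: "2.0"\n' > "${repo_root}/repo"

# One directory per package, one sub-directory per version, one `opam` file.
mkdir -p "${package_dir}"
printf 'opam-version: "2.0"\nsynopsis: "It says hello!"\n' > "${package_dir}/opam"

# List the files relative to the repository root.
layout="$(cd "${repo_root}" && find . -type f | sort)"
printf '%s\n' "${layout}"
```

The listing should show two files: packages/hello/hello.0.1.0/opam and repo.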

Adding the release tarball URL

We are almost there! The only thing left to do is to add the tarball URL inside of the metadata file. Basically, we need to append the following section at the end of the opam file after it has been copied to the repository folder.

url {
  src: "<TARBALL URL>"
  checksum: "sha256=<TARBALL FINGERPRINT>"
}

For the tarball URL, it's pretty easy: it is the same URL as the one we used to upload. What's less easy is that the package registry is private, so we need to provide credentials. You guessed it, it's round three of access tokens!

Go to the "hello" project, "Settings", "Access Tokens", and create a new token with the "Reporter" role and the "read_api" scope. I wish there was a scope that only granted access to the package registry, because this token is going to be hard-coded into the metadata files in the OPAM repository.

Save the token somewhere safe. Like before, we don't want the token to leak in the .gitlab-ci.yml file, so we'll use a CI variable. Go to the "hello" project, "Settings", "CI/CD", "Variables", and create a new variable called REGISTRY_ACCESS_TOKEN with the token as its value.

We can go back to the .gitlab-ci.yml file and add a step to append stuff to the OPAM metadata file. We'll need sed to bake the credentials into the URL, by substituting the // with //opam-registry:<TOKEN>@.

# .gitlab-ci.yml
---
release:
  script:
    # ...
    - cp hello.opam ${package_dir}/opam
    - |-
      cat <<-EOF >> ${package_dir}/opam

         url {
           src: "$(sed -e "s^//^//opam-registry:${REGISTRY_ACCESS_TOKEN}@^" <<< ${url})"
         }
      EOF
    - git -C ${package_dir} add .
    - git -C ${package_dir} commit -m "Added hello ${CI_COMMIT_TAG}"
    - git -C ${package_dir} push
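The sed substitution can be tried on its own to see what ends up in the src field. In this sketch, with_credentials is a hypothetical helper, the token is a dummy value and 12345 is a made-up project ID. It pipes the URL through printf instead of using the <<< here-string from the CI script, so that it also runs in a plain POSIX shell.

```shell
#!/bin/sh
# Insert basic-auth credentials into a URL, as the CI step does.
# `with_credentials` is a hypothetical helper; the token is a dummy value.
with_credentials() {
  # $1: URL, $2: user name, $3: token
  # "^" is the sed delimiter here, so the slashes need no escaping.
  # Only the first "//" is replaced, i.e. the one right after the scheme.
  printf '%s\n' "$1" | sed -e "s^//^//$2:$3@^"
}

url="https://gitlab.com/api/v4/projects/12345/packages/generic/opam/0.1.0/hello-0.1.0.tbz"
with_credentials "${url}" "opam-registry" "dummy-token"
# prints https://opam-registry:dummy-token@gitlab.com/api/v4/projects/12345/packages/generic/opam/0.1.0/hello-0.1.0.tbz
```

One caveat with this approach: a token that contains ^, & or a backslash would confuse sed, so it is worth checking the token's characters if the output looks off.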

Adding the checksum to the metadata file

And finally, the checksum. The good part is that GitLab can provide us with the checksum when we do the upload, all we have to do is add a parameter to the request to retrieve a JSON description of the file (more info in the docs here).

# .gitlab-ci.yml
---
release:
  # ...
  script:
    # ...
    - url="${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/opam/${CI_COMMIT_TAG}/hello-${CI_COMMIT_TAG}.tbz"
    - |-
      curl --fail --header "JOB-TOKEN: ${CI_JOB_TOKEN}" \
        --upload-file /tmp/artefact.tbz \
        --output /tmp/package_file.json \
        "${url}?select=package_file"
    # ...

And in order to extract the fingerprint from the JSON file, we are going to use jq. Unfortunately, jq does not come pre-installed in the Docker image, so we'll also add a before_script step to install it.

# .gitlab-ci.yml
---
release:
  # ...
  before_script:
    # ...
    - sudo apt-get install -y jq
  script:
    # ...
    - url="${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/opam/${CI_COMMIT_TAG}/hello-${CI_COMMIT_TAG}.tbz"
    - |-
      curl --fail --header "JOB-TOKEN: ${CI_JOB_TOKEN}" \
        --upload-file /tmp/artefact.tbz \
        --output /tmp/package_file.json \
        "${url}?select=package_file"
    - sha=$(jq -r '.file_sha256' /tmp/package_file.json)
    # ...
    - |-
      cat <<- EOF >> ${package_dir}/opam

        url {
          src: "$(sed -e "s^//^//opam-registry:${REGISTRY_ACCESS_TOKEN}@^" <<< ${url})"
          checksum: "sha256=${sha}"
        }
      EOF
    # ...
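To double-check a checksum by hand, the digest of a local file can be compared with the expected value, which is roughly what OPAM does with the checksum field after downloading the tarball. This is a minimal sketch: verify_sha256 is a hypothetical helper, it assumes sha256sum from GNU coreutils is available, and it runs on a scratch file rather than on a real release tarball.

```shell
#!/bin/sh
# Compare a file's SHA-256 digest with an expected value.
# `verify_sha256` is a hypothetical helper; it relies on coreutils' sha256sum.
verify_sha256() {
  # $1: file path, $2: expected hex digest
  actual="$(sha256sum "$1" | cut -d ' ' -f 1)"
  [ "${actual}" = "$2" ]
}

# Demo on a scratch file standing in for /tmp/artefact.tbz.
sample="$(mktemp)"
printf 'not a real tarball\n' > "${sample}"
# In the pipeline, the expected value would be the ${sha} read with jq.
expected="$(sha256sum "${sample}" | cut -d ' ' -f 1)"

if verify_sha256 "${sample}" "${expected}"; then
  echo "checksum matches"
else
  echo "checksum mismatch"
fi
```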

Are we done yet? 😭 Yes. I know this was a journey and a half, but I believe we are done. Just for the sake of completeness, here is the full .gitlab-ci.yml file below, with added comments.

# .gitlab-ci.yml
---
release:
  only:
    - tags
  image: ocaml/opam
  before_script:
    - git clone "https://ci-token:${OPAM_REPOSITORY_ACCESS_TOKEN}@gitlab.com/richard.degenne/opam-repository.git" /tmp/opam-repository
    - sudo apt-get install -y jq
  script:
    # Build the release archive
    - |-
      tar --transform "s/\./hello.${CI_COMMIT_TAG}/" \
        --exclude-ignore='.tarignore' \
        -cjvf /tmp/artefact.tbz .

    # Upload the archive to GitLab
    - url="${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/opam/${CI_COMMIT_TAG}/hello-${CI_COMMIT_TAG}.tbz"
    - |-
      curl --fail --header "JOB-TOKEN: ${CI_JOB_TOKEN}" \
        --upload-file /tmp/artefact.tbz \
        --output /tmp/package_file.json \
        "${url}?select=package_file"

    # Generate the OPAM metadata file
    - sha=$(jq -r '.file_sha256' /tmp/package_file.json)
    - package_dir="/tmp/opam-repository/packages/hello/hello.${CI_COMMIT_TAG}"
    - mkdir -p ${package_dir}
    - cp hello.opam ${package_dir}/opam
    - |-
      cat <<- EOF >> ${package_dir}/opam

        url {
          src: "$(sed -e "s^//^//opam-registry:${REGISTRY_ACCESS_TOKEN}@^" <<< ${url})"
          checksum: "sha256=${sha}"
        }
      EOF

    # Push a commit to the OPAM repository
    - git -C ${package_dir} add .
    - git -C ${package_dir} commit -m "Added hello ${CI_COMMIT_TAG}"
    - git -C ${package_dir} push

Now, let's commit this.

git add .
git commit -m 'Added CI'

The first release!

In order to make a new release, we'll add a version stanza to our dune-project file.

; dune-project

(package
  (name hello)
  (synopsis "It says hello!")
  (description "It says hello to the world!")
  (version 0.1.0)
  ; ...
)

We can now commit, make a tag and push the tag.

Don't forget to regenerate the OPAM file with dune build!

dune build
git add .
git commit -m 'Released 0.1.0'
git push

git tag 0.1.0
git push origin 0.1.0
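Since it is easy to forget the dune build step, a small check before tagging can compare the version in the opam file with the tag about to be pushed. This is a sketch under assumptions: opam_file_version is a hypothetical helper, and it expects the generated file to carry a version: "..." line on its own, which is what I would expect once the version is set in dune-project. The demo runs on a scratch file standing in for hello.opam.

```shell
#!/bin/sh
# Read the version field from an opam file so it can be compared with the
# tag about to be pushed. `opam_file_version` is a hypothetical helper.
opam_file_version() {
  # $1: path to an opam file; prints the value of its `version:` field
  sed -n 's/^version: *"\(.*\)"$/\1/p' "$1"
}

# Demo on a scratch file standing in for the generated hello.opam.
opam_file="$(mktemp)"
cat > "${opam_file}" <<'EOF'
opam-version: "2.0"
version: "0.1.0"
synopsis: "It says hello!"
EOF

tag="0.1.0"
if [ "$(opam_file_version "${opam_file}")" = "${tag}" ]; then
  echo "hello.opam matches tag ${tag}"
else
  echo "hello.opam is out of date, run dune build first"
fi
```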

Go to your project on GitLab and check "Build", "Pipelines". Hopefully, you'll see something like this.

You can also go to "Deploy", "Package registry" and check that the release tarball is there.

If you see multiple versions of the file (like in the screenshot above), it's the ones that we uploaded during our tests. You can delete the older ones and only keep the one you've just created through CI. You can also download it to check that it is indeed the correct artifact.

And finally, you can check in the OPAM repository that a new commit has appeared with your OPAM file in it. In particular, check the url section that was built during CI: the credentials, the shasum, it should all be here!

From now on, you can push new releases of the hello library as often as needed!

Installing packages

Of course, the ultimate goal of releasing packages is to install them somewhere else. In order to illustrate that, let's try to install our package using OPAM. As you would expect, simply running opam install hello won't cut it. By default, OPAM only knows about the official repository.

opam install hello

[ERROR] No package named hello found.

In order to add a new repository, run the following.

opam repository --all --set-default add gitlab git+ssh://git@gitlab.com/richard.degenne/opam-repository.git
  • --all adds the repository to all of your existing switches. Feel free to remove it if you don't want that.
  • --set-default adds the repository to the default repositories added to new switches. Again, feel free to remove it. You can always add or remove the repository to any switch after creating it.
  • I use SSH to work with Git locally, so I don't need to add extra credentials. If you use Git with HTTPS, you will need to provide an access token in the credentials. Otherwise, OPAM will not be able to pull from the repository since it is private.
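For the HTTPS case, the repository address needs the credentials baked in, much like the tarball URL earlier. The sketch below only builds the address and prints the command instead of running it. opam_repo_address is a hypothetical helper, the user name and token are dummy values, and the assumption is that a token with read access to the repository is enough for OPAM to pull from it.

```shell
#!/bin/sh
# Build a git+https address with embedded credentials for a private
# repository. `opam_repo_address` is a hypothetical helper.
opam_repo_address() {
  # $1: user name, $2: access token, $3: host and path of the Git repository
  printf 'git+https://%s:%s@%s\n' "$1" "$2" "$3"
}

# Dummy credentials, to be replaced with a real user name and token.
address="$(opam_repo_address "opam" "dummy-token" "gitlab.com/richard.degenne/opam-repository.git")"

# Print the command instead of running it, so the sketch has no side effects.
echo "opam repository add gitlab ${address}"
```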

Now that OPAM knows about our repository, we can try again.

opam install hello

The following actions will be performed:
  ∗ install hello 0.1.0

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><><><>
⬇ retrieved hello.0.1.0  (https://opam-registry:<ACCESS TOKEN>@gitlab.com/api/v4/projects/<PROJECT ID>/packages/generic/opam/0.1.0/hello-0.1.0.tbz)
∗ installed hello.0.1.0

You can see in the logs that OPAM downloaded our release tarball and installed it! And because our package builds an executable, you can even invoke our hello command straight from your terminal!

hello

Hello world!

Final thoughts

And there you have it! A private OPAM repository hosted on GitLab, private packages hosted on GitLab, released through CI on a private package registry! Absolutely beautiful. Now that I have all of this set up, I feel much more enthusiastic at the idea of porting small bits and pieces of company code to OCaml, since they would be both safe and usable in a real-world situation. And I might just do that, in fact.

I still think, however, that there are things that could be made better with this setup, and I'll leave it as an exercise to the reader for the time being. 😉

  • Having one package registry per package is kinda tedious. We could also have the CI push the tarball to the OPAM repository's package registry. That way, the repository holds all of the metadata and the tarballs. Probably cleaner.
  • On GitLab.com, access tokens expire after one year, which means that all the hard-coded URLs in the metadata files will become unusable.
    • You could make a new token and push a big commit where you replace all the hard-coded tokens.
    • Maybe there is a better way of telling OPAM how to authenticate against the registry when downloading packages, but I have not found it.
  • Copying and pasting the .gitlab-ci.yml file across repositories is pretty tedious. GitLab offers an "include" feature that lets a CI file import stuff from other CI files. We could DRY up our script and let all the packages simply include it, passing a couple of variables like the name of the package or the registry access token. I might do that in the future, be on the lookout for more posts on this!
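As a starting point for the token rotation idea, here is a rough sketch of the "big commit" approach: it rewrites the token in every opam file under packages/. rotate_token is a hypothetical helper, the tokens are dummy values and the tarball URL in the demo is made up. It assumes the token always appears between a colon and an @ sign, as in the URLs built by the CI script, and that it contains no | character.

```shell
#!/bin/sh
# Replace an expired token with a new one in every opam file under a
# packages/ directory. `rotate_token` is a hypothetical helper.
rotate_token() {
  # $1: repository root, $2: old token, $3: new token
  find "$1/packages" -type f -name opam | while IFS= read -r file; do
    # Write to a temporary file first: portable across sed flavours.
    sed -e "s|:$2@|:$3@|g" "${file}" > "${file}.tmp" && mv "${file}.tmp" "${file}"
  done
}

# Demo on a scratch repository with a single package and dummy tokens.
repo_root="$(mktemp -d)"
mkdir -p "${repo_root}/packages/hello/hello.0.1.0"
printf 'url {\n  src: "https://opam-registry:old-token@gitlab.com/x.tbz"\n}\n' \
  > "${repo_root}/packages/hello/hello.0.1.0/opam"

rotate_token "${repo_root}" "old-token" "new-token"
cat "${repo_root}/packages/hello/hello.0.1.0/opam"
```

The checksums stay valid because the tarballs themselves do not change, only the credentials in the URL.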

Anyway, this was a lot of fun to build and this post was a lot of fun to write! I hope it will be useful for someone out there; let me know in the comments if you managed to replicate it or if you're facing other issues!