Lerna and Turborepo with Gitlab CI/CD

In this post we're going to cover a simple approach to a monorepo setup with a fully automated CI/CD workflow using Lerna, Turborepo, and Gitlab CI/CD.

Here is the project link to follow along and use for your own monorepo projects.

Tools

This workflow includes the following tools.

  • Lerna: for versioning and publishing packages.
  • Turborepo: for task running.
  • Gitlab CI/CD: for our CI/CD server.
  • Husky: for Git hooks.
  • Commitizen: for its interactive commit CLI and changelog adapter. We will tie this into one of our Git hooks through Husky - they work nicely together.

Typescript and Jest Configurations

One helpful way to manage Typescript and Jest configurations in a monorepo is to define core configurations in the root of the repository, and extend from the core configurations inside of packages to suit individual package needs.

The tsconfig.json file in the root of the repository has some basic configurations.

tsconfig.json
{
  "compilerOptions": {
    "module": "commonjs",
    "moduleResolution": "node",
    "declaration": true,
    "noImplicitAny": true,
    "removeComments": true,
    "target": "es6",
    "sourceMap": true,
    "strict": true
  },
  "exclude": ["node_modules"]
}

The packages also have their own tsconfig.json files that use the "extends" option to inherit from the root; any new settings override the root configuration values.

packages/package-one/tsconfig.json
{
  "extends": "../../tsconfig.json",
  "compilerOptions": {
    "outDir": "./lib"
  },
  "include": ["./src"]
}

Similarly, for Jest we have a root configuration that packages can extend and override.

jest.config.ts
import type { Config } from "jest";

const config: Config = {
  verbose: true,
  preset: "ts-jest",
  collectCoverage: true,
};

export default config;

packages/package-one/jest.config.ts
import type { Config } from "jest";
import { default as rootConfig } from "../../jest.config";

const config: Config = {
  ...rootConfig,
};

export default config;

Git, Versioning and Publishing Strategy

Publishing Strategy

Let's break down the strategy shown in the diagram above, where P1 and P2 stand for package-one and package-two. Since we're adhering to a trunk-based branching strategy, we have one main branch, and working branches are kept as short-lived as possible.

The main branch is where all stable versions of packages will be versioned and published.

Any commit to a branch named anything other than main (a working branch) publishes canary versions of the packages that changed in that commit. The Git SHA is appended to the version to avoid collisions. This gives us "nightly builds" on working branches so that work in progress can be quickly tested by anyone.

# example from lerna docs

lerna publish --canary
# 1.0.0 => 1.0.1-alpha.0+${SHA} of packages changed since the previous commit
# a subsequent canary publish will yield 1.0.1-alpha.1+${SHA}, etc

package.json
# this script we use in our root package.json

"publish:canary": "lerna publish --canary --no-git-tag-version --no-push --yes"

Any git commit on a working branch is met with a Husky prepare-commit-msg hook that triggers Commitizen's interactive CLI, which prompts you with questions to construct a commit message based on Angular's commit message conventions. The hook is skipped when running in CI; Gitlab CI/CD provides a predefined variable CI that we can use to detect this.

.husky/prepare-commit-msg
#!/usr/bin/env sh
. "$(dirname -- "$0")/_/husky.sh"

[ -n "$CI" ] && exit 0

exec < /dev/tty && npx cz --hook || true

Keeping commits very specific and short is helpful and important for a number of reasons. The kind of change you select in the Commitizen interactive CLI determines which part of a package's stable version gets bumped when merging to the main branch - we go more in depth on this in the next sub-section. Commit messages are also added to CHANGELOG.md files, which are ideal to keep clean for tracking and reporting changes to consumers of your packages.

When merging back to the main branch, the pipeline will run lerna version with some additional arguments which will create and push tags for changed packages as well as update, commit, and push version bumps in their package.json files and CHANGELOG.md files. In order for the pipeline to push changes back to the repository, there are some CI/CD specific things we will need to do which are covered in the Gitlab CI/CD Strategy section.

package.json
"version:stable": "lerna version --yes"

We also have some settings in lerna.json for the version command that only allow versioning on the main branch and enforce adherence to conventional commits.

lerna.json
"command": {
  "version": {
    "allowBranch": "main",
    "conventionalCommits": true
  }
},

Since the first versioning pipeline that runs on a merge to main produces a commit and a push, another pipeline is triggered, which uses lerna publish with additional arguments. The from-git argument publishes the packages tagged in the current commit by our versioning pipeline.

package.json
"publish:stable": "lerna publish from-git --yes"

Commit-driven Versioning (Commitizen)

We went over the core strategy, but let's linger on how Commitizen plays an important role for versioning.

Being a library maintainer comes with a lot of responsibility. Your users are depending on your packages to be reliable and there is a set of rules and requirements that we should follow to make sure our users are getting what they expect. What we're referring to is Semantic Versioning.

If we are making changes to a package, we need to think carefully about what kind of change we're making and what would be the impacts to existing users of the package.

When making commits, Commitizen helps streamline this thought process by leading us through a series of interactive CLI questions. It takes the answers and constructs a standardized commit message that Lerna interprets to decide which version numbers to bump when the commits are merged into the main branch.
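
For illustration, here is a hypothetical sketch of the commit subjects Commitizen might generate (the scopes and descriptions are made up) and the stable bump Lerna would derive from each once merged to main, per the rules described below:

fix(package-one): handle missing config file      -> PATCH bump (e.g. 1.2.3 to 1.2.4)
feat(package-two): add JSON output option         -> MINOR bump (e.g. 1.2.3 to 1.3.0)
ci(package-one): cache yarn dependencies in CI    -> no version bump

A commit whose body contains a BREAKING CHANGE note results in a MAJOR bump (e.g. 1.2.3 to 2.0.0).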

If you've been a library maintainer before, inside your company or maybe in an open-source project, you may have witnessed irresponsible versioning. As the author of this post I have personally seen many instances of internal company libraries bumping only PATCH or MINOR versions despite changes being breaking, which should always result in MAJOR version bumps. This behavior can be disastrous to other libraries or applications consuming the packages - especially when a consumer uses a caret (^) or tilde (~) range in their package.json, which is a common practice.
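
As a quick refresher, with illustrative version numbers, a consumer's package.json might declare ranges like these:

{
  "dependencies": {
    "package-one": "^1.2.3",
    "package-two": "~1.2.3"
  }
}

The caret range accepts any 1.x.y at or above 1.2.3 and the tilde range accepts any 1.2.x at or above 1.2.3, so a breaking change that is mistakenly published as a MINOR or PATCH bump gets pulled into consumers automatically on their next install.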

Before becoming a contributor to a library, or when onboarding new developers to your library, consider making an understanding of Semantic Versioning a strict prerequisite. Commitizen can automate a lot of the manual work around this, but it's only as useful as the contributors' knowledge of semver.

When committing a new change, Commitizen will prompt you with the following questions which we'll go over below:

  1. Select the type of change that you're committing

Keep the changes in each commit focused on one of these change types at a time. For instance, instead of sprinkling a new feature, some bug fixes, and documentation changes into one commit, try breaking these up into three smaller, more focused commits. Depending on which type of change you select, the end result may be no version bump at all, a PATCH bump (X.X.PATCH), or a MINOR bump (X.MINOR.X). For example, a ci change will not bump a package version, fix will bump PATCH, and feat will bump MINOR.

Change      Description
feat        A new feature
fix         A bug fix
docs        Documentation only changes
style       Changes that do not affect the meaning of the code (formatting)
refactor    A code change that neither fixes a bug nor adds a feature
perf        A code change that improves performance
test        Adding missing tests or correcting existing tests
build       Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
ci          Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
chore       Other changes that don't modify src or test files
revert      Reverts a previous commit
  2. What is the scope of this change (e.g. component or file name)

The answer to this question is not strict: it could be a specific file, a directory, or some abstract component in your codebase. The answer will show up in the commit logs and CHANGELOG.md, which is useful for communicating the scope of the change's impact.

  3. Write a short, imperative tense description of the change (max 86 chars)

The answer to this is really important for the CHANGELOG.md. Here you want to be succinct in describing the change.

  4. Provide a longer description of the change

This should include the motivation for the change and contrast it with the previous behavior. This becomes especially important later in a project's life for understanding the original contributors' motivation, and it helps prevent reliance on tribal knowledge.

  5. Are there any breaking changes?

This one is critical. Whether your change is a bug fix, new feature, refactor, or any other type - if your change results in existing users needing to make code changes in order to install the new update, this would be indicative of a BREAKING CHANGE.

An example of this in Typescript world would be adding an additional required argument to an existing function.
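
As a hypothetical sketch (the function and types are made up), a change like the following forces every existing caller to update:

// before (v1.x): existing callers invoke getGreeting("Ada")
// export function getGreeting(name: string): string {
//   return `Hello, ${name}!`;
// }

// after: the new required options argument means calls like getGreeting("Ada")
// no longer compile - a breaking change that warrants a MAJOR bump
export interface GreetingOptions {
  formal: boolean;
}

export function getGreeting(name: string, options: GreetingOptions): string {
  return options.formal ? `Good day, ${name}.` : `Hello, ${name}!`;
}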

If you choose Yes, that this is a breaking change, the end result of merging to the main branch will be a MAJOR version update to the relevant package.

  6. Does this change affect any open issues?

Here is where you can tag any open issues.

Turborepo

Turborepo is an intelligent build system optimized for Javascript and Typescript codebases.

While Lerna is particularly good at versioning and publishing, Turborepo is good at running tasks such as linting, building, testing and other computationally expensive processes.

Turborepo is in large part built in Go which brings some performance benefits versus "tools for Javascript built with Javascript".

One of its core principles is to avoid rerunning computations that have already been run before. It handles this by caching the results and logs of tasks so that, depending on your configuration, it only recomputes things that have changed since prior task runs.

Turbo Configuration

Turborepo can be configured by having a turbo.json file in the root directory of the monorepo. Let's take a look at the minimal setup we have:

turbo.json
{
  "$schema": "https://turborepo.org/schema.json",
  "pipeline": {
    "build": {
      "outputs": ["build/**"],
      "dependsOn": ["^build"]
    },
    "test": {
      "inputs": ["src/**/*.ts"],
      "outputs": [],
      "dependsOn": ["build"]
    }
  }
}

One thing to note is that for the purposes of this post, the specifics of this configuration are not especially important since it's meant to be minimal, and as your project grows you will most likely make changes. For specifics on configuration, check out the Turborepo Configuration Options docs.

What is important in the context of this post is the distinction between the Turborepo pipeline and Gitlab CI/CD pipelines, and how we're using the two together.

The Turbo pipeline is where you can define which tasks in the monorepo depend on others, which inputs (file changes) should cause a task to rerun on subsequent executions, and which outputs (artifacts) of tasks should be cached. This works both during local development, and during CI/CD.

It's possible to achieve similar functionality in a Gitlab pipeline for CI/CD by declaring stages and job dependencies, and caching and uploading artifacts to save re-executions. But this requires additional overhead and can be more costly than just letting Turborepo do the heavy lifting.

Instead, we can rely primarily on the Turbo pipeline for task ordering and execution rather than splitting these up into separate Gitlab stages or jobs - reducing complexity. You can jump ahead to the Gitlab CI/CD Strategy section to see that we have 1 stage and 1 job per pipeline run performing ALL of our CI/CD tasks.

As your monorepo grows much larger, it may be beneficial to introduce more advanced Gitlab CI/CD functionality. If you want to supercharge your Gitlab CI/CD with Turborepo setup, you can consider one of Turborepo's most interesting features, Remote Caching. Note that this requires a Vercel account and is only free for Hobby users within usage limits.

In a nutshell, we can leave task running and the dynamics around it to be managed by Turborepo, and use additional Gitlab stages/jobs for doing things like running scripts for publishing packages or (while not relevant in this post) automated application deployments.

Gitlab CI/CD Strategy

While we covered some CI/CD in the Git, Versioning and Publishing Strategy section, this section will focus more on tokens, Gitlab repository settings, and the .gitlab-ci.yml file. For this section you'll need access to your Gitlab repository settings - if it's your company's Gitlab, you may not have maintainer access.

Tokens

For this strategy we're going to need two secret tokens:

  • $NPM_TOKEN: to publish packages to the npm registry (or your company's internal Nexus repository). In your npmjs account you can generate a new token under Access Tokens -> Generate New Token, give it the Automation type, and make sure to copy the secret value. Then in Gitlab, under Settings -> CI/CD -> Variables, add a new variable with a key of NPM_TOKEN and a value of your copied secret. Make sure to check the Masked option so that your token is masked in job logs. Don't restrict this token to protected branches only, otherwise our canary publish on working branches will not work.

  • $GL_TOKEN: our versioning pipeline on the main branch pushes tags and a commit with package version bumps and changelog updates. In order for our CI server to be able to do this, we need to give it permission. For free personal Gitlab accounts you can create a personal access token; otherwise, if project access tokens are available, create one with the api, read_repository, and write_repository permissions. Then, as we did with the $NPM_TOKEN, add this as another CI/CD variable with a key of GL_TOKEN. If it's a project access token you'll need to assign it a role that has access to push to the main branch. Also, in case you wanted to have Lerna create Gitlab releases, your token would need registry read and write access - but that is out of the scope of this strategy.

YAML File

Let's pick apart our project's .gitlab-ci.yml file.

Firstly, we have 1 stage in our pipeline that will always run just 1 job. It may be helpful to minimize the number of stages and jobs in your pipelines to keep complexity low - this is a personal or team preference.

.gitlab-ci.yml
stages:
  - build

Next we have a .prepare configuration that we can extend in all of our jobs, since we need to do this preparation beforehand. This is a cleaner way to reuse logic than having the same preparation script in every job.

.gitlab-ci.yml
.prepare:
  before_script:
    - git config user.email $GITLAB_USER_EMAIL
    - git config user.name $GITLAB_USER_NAME
    - git remote set-url origin
      "https://gitlab-ci-token:$GL_TOKEN@$CI_SERVER_HOST/$CI_PROJECT_PATH.git"
    - echo "//registry.npmjs.org/:_authToken=$NPM_TOKEN" > .npmrc
    - git checkout $CI_COMMIT_BRANCH

We're specifying these to be run as before_script which will run before our jobs' primary scripts. Since Lerna will be performing Git operations during versioning and publishing, we set git config user.email $GITLAB_USER_EMAIL and git config user.name $GITLAB_USER_NAME. The $GITLAB_USER_EMAIL and $GITLAB_USER_NAME variables are predefined variables that are the email and name of the user who started the job.

Next we set the origin remote URL so that, instead of the default CI/CD job token, Git authenticates with the $GL_TOKEN variable that we generated as either a personal or project access token with elevated repository permissions. The $CI_SERVER_HOST and $CI_PROJECT_PATH variables are also predefined variables that point to the URL of the repository itself. See CI/CD job token. With this set, our release job will be able to push commits and tags back to the repository.

Next we write an .npmrc file declaring that we want to publish to the public npm registry, registry.npmjs.org, passing our $NPM_TOKEN variable as the authentication token so that we can publish packages. This could also be your company's internal Nexus repository, in which case the registry would be different.
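
If you were publishing to an internal registry instead, the same preparation step could, as a hypothetical sketch (the Nexus host and path are made up), write both the registry and its auth token:

    - echo "registry=https://nexus.example.com/repository/npm-internal/" > .npmrc
    - echo "//nexus.example.com/repository/npm-internal/:_authToken=$NPM_TOKEN" >> .npmrc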

Now, since Gitlab CI/CD jobs by default check out the repository at the commit that triggered the job in a detached HEAD state, we need to check out the branch itself, so we run git checkout $CI_COMMIT_BRANCH; otherwise Lerna will throw an error.

Our first job in the YAML file, build-publish-canary, will only run on working branches and will publish our per-commit nightly builds. It looks like this:

.gitlab-ci.yml
# CANARY PUBLISHING ON WORKING BRANCHES
build-publish-canary:
  stage: build
  extends:
    - .prepare
  script:
    - yarn install --frozen-lockfile
    - yarn build
    - yarn test
    - yarn publish:canary
  rules:
    - if:
        $CI_COMMIT_BRANCH != "main" && $CI_PIPELINE_SOURCE !=
        'merge_request_event' && $CI_COMMIT_TITLE != "Publish"

First we make this job part of our single build stage. Then we extend the .prepare configuration, which runs its before_script ahead of the script defined in this job.

The first part of our script is yarn install --frozen-lockfile, which installs our dependencies using the exact state of the yarn.lock file checked in at the commit that triggered this job. This is generally a good practice so that the lock file doesn't change while in CI. Then we run the build script defined in our root package.json file. This is where turbo comes in to run tasks. The build script looks like this:

package.json
"build": "turbo run build",

Turbo will run all of the build scripts defined in all of our monorepo's packages.
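
For reference, turbo run build invokes the build script defined in each package's own package.json. A minimal hypothetical sketch of such a package manifest (the real files in the project may differ) could look like this:

packages/package-one/package.json
{
  "name": "package-one",
  "version": "1.0.0",
  "main": "lib/index.js",
  "types": "lib/index.d.ts",
  "scripts": {
    "build": "tsc",
    "test": "jest"
  }
}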

We then run the test script also defined in our root package.json in which turbo is also running our test tasks.

package.json
"test": "turbo run test",

Finally we run our publish:canary script, which publishes a unique canary version of each changed package to npm based on our commit.

An important part of this job is the set of rules we have defined:

  • $CI_COMMIT_BRANCH != main: ensures we're only running this job on branches that are not the main trunk branch.
  • $CI_PIPELINE_SOURCE != 'merge_request_event': ensures we're not running this job on merge requests.
  • $CI_COMMIT_TITLE != Publish: ensures that we do not run this job after Lerna has produced a commit titled Publish, which is what lerna version does, as we see in the next job.

Our second job, build-version-stable, runs on merges to the main branch or on commits pushed directly to main. It looks like this:

.gitlab-ci.yml
# STABLE VERSIONING ON MAIN BRANCH
build-version-stable:
  stage: build
  extends:
    - .prepare
  script:
    - yarn install --frozen-lockfile
    - yarn build
    - yarn test
    - yarn version:stable
  rules:
    - if: $CI_COMMIT_BRANCH == "main" && $CI_COMMIT_TITLE != "Publish"

We'll skip what's already been explained in the previous job and go straight to our version:stable script, which runs after testing. In the Git, Versioning and Publishing Strategy section we covered that version:stable will create and push tags and update package.json versions and changelogs in our packages, pushing a commit back to the main branch. That will then trigger the final job we have defined.

.gitlab-ci.yml
# STABLE PUBLISHING ON MAIN BRANCH
build-publish-stable:
  stage: build
  extends:
    - .prepare
  script:
    - yarn install --frozen-lockfile
    - yarn build
    - yarn test
    - yarn publish:stable
  rules:
    - if: $CI_COMMIT_BRANCH == "main" && $CI_COMMIT_TITLE == "Publish"

This job will finally publish the new versions of our packages! 🥳

Tips

  • Don't squash commits when merging feature branches to the main branch; otherwise the changelog will not reflect the individual changes that were made. It can be helpful to enforce this in Gitlab under Settings -> Merge requests -> Squash commits when merging by selecting Do not allow.
  • When updating multiple packages at once on a feature branch, if you want canary publishes for all updated packages, you will need to push the committed changes related to each package separately, as canary publishing refers to the most recent commit on a feature branch to determine which packages to publish.
  • Avoid long running working branches. Commit and merge changes back to main as often as possible.
  • Work with only two kinds of branches: main and working branches. If bugs are found, simply use a working branch and merge fixes to main.
  • Communicate with your team on the value of Semantic Versioning, how to use Commitizen, and to be descriptive and intentional about commits.
  • If you're working with a team and certain members are experts or lead maintainers of specific packages, Gitlab's Code Owners feature can be helpful for designating maintainers who can approve merge requests for certain packages. This is really useful for splitting up responsibility. Code Owners requires Gitlab Premium.
