North American spatial analogues for projected Canadian climates
The methodology below explains how we obtained the numbers presented in the interactive story "See how summers in your city could feel like by the end of the century," published on June 26, 2025.
While we performed calculations for all seasons and many Canadian locations, as described here, we found our methodology worked better for summers in most locations. This is why the interactive story focuses on summers for a selection of Canadian cities.
This analysis compares future climate conditions in Canadian cities with current conditions in North American cities. The similarity between them is calculated using the Mahalanobis distance, considering temperature and precipitation patterns.
The concept is inspired by ClimateData.ca and CCExplorer.
Environment and Climate Change Canada experts and Arthur Charpentier, professor in the mathematics department at the Université du Québec à Montréal, provided valuable feedback on the methodology and calculations.
Questions or comments? Reach out to Senior Data Producer Nael Shiab.
Spatial analogues
Scatter plot
We can also visualize the data in a scatter plot. When displaying cities, the size of the circles is proportional to each city's population.
Temperature and precipitation data
The CMIP6 seasonal projections come from the Power Analytics and Visualization for Climate Science (PAVICS) platform. Six NetCDF files (2 variables x 3 scenarios) with a total size of 516 MB were provided by Environment Canada. To extract the data, we used the Python package Xarray.
The seasonal normals (1981-2010) come from ClimateNA. They consist of eight TIF files (2 variables x 4 seasons) with a total size of 26.5 MB. To extract the data, we used the journalism library, specifically the getGeoTiffDetails and getGeoTiffValues functions.
Instead of using the projections' absolute values directly, we added the projected delta values to the normals. Because the normals have a much finer spatial resolution (~1x1 km) than the projections (~10x10 km), this approach provides more accurate local estimates.
We used the 2071-2100 delta values for three scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5) based on the 1981-2010 baseline. We worked with the first decile, the median, and last decile values.
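As a minimal illustration of this step, with made-up values:

  // Hypothetical values: a 1981-2010 summer normal from the fine-scale grid
  // and a coarser CMIP6 median delta for 2071-2100.
  const summerNormalTemp = 20.1; // °C, 1981-2010 normal (~1x1 km)
  const medianDeltaTemp = 3.4; // °C, projected change vs. 1981-2010 (~10x10 km)
  const projectedTemp = summerNormalTemp + medianDeltaTemp; // 23.5 °C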
We extracted the temperature and precipitation projections for a selection of Canadian cities (more details below).
The temperature and precipitation normals were extracted for a selection of Canadian, American, and Mexican cities (more details below) and for a grid of points covering the continent.
The orange points on the map have temperature and precipitation data, while the grey dots do not. We need this padding of points without values to avoid visual bugs when drawing contours on the map.
The data was preprocessed mainly with the JavaScript libraries simple-data-analysis and Turf.
After processing, two files were generated:
- One with the normals for all North American cities and projections for Canadian cities (3 MB / 506 KB gzipped)
- One with the normals for the points grid (2.6 MB / 298 KB gzipped)
These are the two files containing all the data needed for the interactive experience. When the page loads, users download them, so we need to keep them as small as possible.
Cities selection and boundaries
The original list of Canadian cities comes from the 2021 Census Population Centres. A population centre has a population of at least 1,000 and a population density of 400 persons or more per square kilometre.
The list of US and Mexican cities comes from Simple Maps.
To make our selection, we divided each country into 50 square kilometre cells. We then selected the most populated city within each cell. This ensures good spatial coverage with cities that users are likely to recognize.
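As an illustration of this selection step, here is a sketch with Turf (not our exact code). It assumes a cities array where each city has lon, lat, and population properties, reads the cells as 50 km per side, and works on one bounding box, whereas the real pipeline does this per country.

  import {
    bbox,
    booleanPointInPolygon,
    featureCollection,
    point,
    squareGrid,
  } from "@turf/turf";

  // Hypothetical input: [{ name, lon, lat, population }, ...]
  function selectCitiesByGrid(cities, cellSideKm = 50) {
    // Square grid covering the bounding box of all the cities.
    const cityPoints = featureCollection(cities.map((c) => point([c.lon, c.lat])));
    const grid = squareGrid(bbox(cityPoints), cellSideKm, { units: "kilometers" });

    const selected = [];
    for (const cell of grid.features) {
      // Cities falling inside this cell.
      const inCell = cities.filter((c) =>
        booleanPointInPolygon(point([c.lon, c.lat]), cell)
      );
      // Keep the most populated one, if any.
      if (inCell.length > 0) {
        selected.push(
          inCell.reduce((a, b) => (a.population >= b.population ? a : b)),
        );
      }
    }
    return selected;
  }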
For the city dropdown menu, we show only cities with a population of 50,000 or more, including Iqaluit, Yellowknife, and Whitehorse.
The boundaries of Canadian provinces, U.S. states, and Mexican states come from Cartography Vectors. The file has been processed and edited with MapShaper.
The black dots show the cities available to the users in the dropdown menu. The red dots are all the other cities used in the project.
Analogues calculations
Thank you to Arthur Charpentier, professor in the mathematics department at the Université du Québec à Montréal, who helped with the statistical approach.
There are three steps to our calculations:
- Filtering the data
- Calculating the Mahalanobis distance
- Removing outliers with DBSCAN
These operations are triggered each time a user picks a different city, season, or projection scenario and are run by the user's browser. Therefore, the computations need to be lean and fast.
1) Filtering the data
In our context, we consider a data point or a city to be a spatial analogue if:
- Its 1981-2010 average temperature is within the projected temperature range (first and last decile) of the city selected by the user.
- Its 1981-2010 average total precipitation amount is within the projected precipitation range (first and last decile) of the city selected by the user.
However, in some cases, the range of temperatures or precipitation is very narrow. To ensure we still capture cities with similar climate patterns, we add two conditions:
- If the projected temperature range is less than 2°C, we instead use the median projected temperature +/- 1°C of the city selected by the user.
- If the projected precipitation range is less than 50 mm, we instead use the median projected precipitation amount +/- 25 mm of the city selected by the user.
Overall, these steps are straightforward and computationally inexpensive.
In our code, we add an excluded key to each data point. If the point falls outside the ranges, it is excluded.
// Types have been removed for clarity.
export default function exclude(data, cityData) {
  const tempMinRange = 2;
  const precipMinRange = 50;

  const tempRange = cityData.temp.p90 - cityData.temp.p10;
  const precipRange = cityData.precip.p90 - cityData.precip.p10;

  data.forEach((d) => {
    if (d.temp === null || d.precip === null) {
      d.excluded = true;
    } else {
      d.excluded = false;
    }

    if (d.excluded === false) {
      if (tempRange > tempMinRange) {
        if (d.temp < cityData.temp.p10 || d.temp > cityData.temp.p90) {
          d.excluded = true;
        }
      } else {
        if (Math.abs(d.temp - cityData.temp.p50) > tempMinRange / 2) {
          d.excluded = true;
        }
      }
    }

    if (d.excluded === false) {
      if (precipRange > precipMinRange) {
        if (d.precip < cityData.precip.p10 || d.precip > cityData.precip.p90) {
          d.excluded = true;
        }
      } else {
        if (Math.abs(d.precip - cityData.precip.p50) > precipMinRange / 2) {
          d.excluded = true;
        }
      }
    }
  });
}
For example, summers in Montreal with a medium-emissions scenario could average between 21.8°C (first decile) and 24.5°C (last decile) with 269 mm (first decile) to 323 mm (last decile) of precipitation.
The projected temperature range is 2.7°C, and the projected precipitation range is 54 mm.
In our data, 322 points (from our grid) and 348 cities have normals within these ranges.
2) Mahalanobis distance
Now that we have our spatial analogues, we want to rank them. We need a ranking for two reasons:
- To find the most similar cities.
- To draw the contours on our map.
The Mahalanobis distance is a useful method for this. It's unitless and scale-invariant, which is handy since we have temperature in Celsius and precipitation in millimeters.
To use it, we first need to calculate the inverted covariance matrix. Our gridded points provide better coverage of the continent, so we use them for these calculations.
We use the getCovarianceMatrix function from the journalism library to compute our matrix from the temp and precip variables.
const gridMatrix = getCovarianceMatrix(
  pointsAnalogues.map((d) => [d["temp"], d["precip"]]),
  { invert: true },
);
Our matrix looks like this:

[
  [1.492761540636337, -0.00043617887010252053],
  [-0.00043617887010252053, 0.004905019128766616],
];
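For reference, here is a rough sketch of what computing and inverting a 2x2 covariance matrix involves. This is not the journalism library's code, and details such as sample versus population covariance may differ.

  // Sketch: population covariance matrix of [temp, precip] pairs, then its inverse.
  function inverseCovariance2D(pairs) {
    const n = pairs.length;
    const meanT = pairs.reduce((sum, [t]) => sum + t, 0) / n;
    const meanP = pairs.reduce((sum, [, p]) => sum + p, 0) / n;

    let varT = 0;
    let varP = 0;
    let cov = 0;
    for (const [t, p] of pairs) {
      varT += (t - meanT) ** 2;
      varP += (p - meanP) ** 2;
      cov += (t - meanT) * (p - meanP);
    }
    varT /= n;
    varP /= n;
    cov /= n;

    // Inverse of [[varT, cov], [cov, varP]].
    const det = varT * varP - cov * cov;
    return [
      [varP / det, -cov / det],
      [-cov / det, varT / det],
    ];
  }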
We are now ready to compute the Mahalanobis distance for gridded points and cities. We use the function addMahalanobisDistance from the journalism library.
The function also computes a similarity index from 0 to 1 based on the distance. This index is easier for our audience to work with and understand, and it is also useful for our map legend.
To compute the distance, we use the median projected values for temperature and precipitation for the city selected by the user, along with the normals for the grid points and North American cities.
const justAnalogues = [...pointsAnalogues, ...citiesAnalogues];

addMahalanobisDistance(
  { temp: cityData.temp.p50, precip: cityData.precip.p50 },
  justAnalogues,
  { similarity: true, matrix: gridMatrix },
);
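Conceptually, for our two variables, the distance computed by addMahalanobisDistance boils down to the following (a minimal sketch, not the library's actual code):

  // Mahalanobis distance between two [temp, precip] points, given the
  // inverse covariance matrix m: sqrt((x - y)^T * m * (x - y)).
  function mahalanobis([x1, x2], [y1, y2], m) {
    const d1 = x1 - y1;
    const d2 = x2 - y2;
    return Math.sqrt(
      d1 * (m[0][0] * d1 + m[0][1] * d2) + d2 * (m[1][0] * d1 + m[1][1] * d2),
    );
  }

  // For example, with Montreal's median summer projections (22.8°C / 291 mm)
  // and a hypothetical city's normals (23.5°C / 300 mm):
  // mahalanobis([22.8, 291], [23.5, 300], gridMatrix);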
For example, here are the distances and similarities between Montreal (medium-emissions summer projections) and a few cities (1981-2010 normals).
As we can see, Chicago is the closest (0.88) and most similar (0.71), which is expected since both its temperature and precipitation are closer to Montreal's median projections (22.8°C / 291 mm).
3) Removing outliers with DBSCAN
Since we use only two variables (temperature and precipitation), some analogues might appear in unusual locations.
For example, if you pick Summer in Montreal with a medium-emissions scenario and switch Remove outliers to False in the options below the map, you'll notice some analogue spots in Mexico.

While it's true these areas have similar temperature and precipitation patterns, they are just small spots located far from the rest of the group.
To identify these outliers, we use a modified version of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. It groups together points that are closely packed, marking as outliers those that lie alone in low-density regions.
We adapted the TypeScript implementation addClusters from the journalism library. Traditionally, the algorithm uses a minimum number of points within a radius to define a cluster. However, since we have coastal points and lack data for points on water, we instead use a minimum percentage of neighbours with data, with a hard minimum of 3 (more about it below).
We use a radius of 250 km. A point is part of a cluster if:
- At least 20% (with a hard minimum of 3 points) of the points with data within 250 km are analogues (which makes it a core point)
- It's within 250 km of a core point (which makes it a border point)
If not part of a cluster, the point is considered an outlier and, in our process, is removed.
There is one exception: if the scenario is Normals, we consider the city as a core point by default.
The distance is calculated with the distance function from the journalism library, which implements the Haversine formula.
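For reference, a Haversine great-circle distance in kilometres looks roughly like this (a sketch; the journalism library's distance function may differ in its details):

  // Great-circle distance in kilometres between two points (Haversine formula).
  function haversine(lon1, lat1, lon2, lat2) {
    const toRad = (deg) => (deg * Math.PI) / 180;
    const R = 6371; // Mean Earth radius in km
    const dLat = toRad(lat2 - lat1);
    const dLon = toRad(lon2 - lon1);
    const a =
      Math.sin(dLat / 2) ** 2 +
      Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
    return 2 * R * Math.asin(Math.sqrt(a));
  }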
Here's how we call the different functions.
const clusterRadius = 250;
const minPercNeighbours = 0.2;

// If the scenario is 'normals', we add the city as a point.
if (scenario === "normals") {
  points.push({
    lat: cityData.lat,
    lon: cityData.lon,
    name: "cityData",
    similarity: 1,
  });
}

// To speed up the process, we filter the gridded points
// for a given season based on the bounding box of the analogues.
const normalsMapSeasonFiltered = filterGriddedPoints(points, normalsMapSeason);

// addClusters adds two keys to each point: clusterId and clusterType.
addClusters(
  points,
  normalsMapSeasonFiltered,
  clusterRadius,
  minPercNeighbours,
  (a, b) => distance(a.lon, a.lat, b.lon, b.lat),
);

// Outliers don't have a clusterId.
const pointsNoOutliers = points.filter((d) => d.clusterId !== null);
And here is the addClusters function.
// Types have been removed for clarity.
export default function addClusters(
  data,
  normalsMapSeason,
  minDistance,
  minPercNeighbours,
  distance,
  options = {},
) {
  if (options.reset) {
    data.forEach((d) => {
      d.clusterId = undefined;
      d.clusterType = undefined;
    });
  }

  let clusterId = 0;

  for (const point of data) {
    // Skip points that are already assigned to a cluster.
    if (point.clusterId !== undefined) continue;

    // Find the neighbours of the current point.
    const neighbours = getNeighbours(point);

    // Get the minimum number of neighbours a point needs to be a core point.
    // This depends on the minimum percentage of neighbours with data, so it's
    // point specific. Exception for the city point, which is added only when
    // the scenario is 'normals'.
    const minNeighbours = point.name === "cityData" ? 0 : getMinNeighbours(point);

    // If the point has fewer than the minimum number of neighbours,
    // it's noise or a border point.
    if (neighbours.length < minNeighbours) {
      // We mark the point as noise initially. We double-check later
      // if it's a border point.
      point.clusterId = null;
      point.clusterType = "noise";
    } else {
      // We know that the point is a core point, so we assign it to a new cluster.
      clusterId++;
      const clusterLabel = `cluster${clusterId}`;
      point.clusterId = clusterLabel;
      point.clusterType = "core";

      // We look for more points to add to the cluster.
      for (let i = 0; i < neighbours.length; i++) {
        const neighbour = neighbours[i];
        if (neighbour.clusterId === undefined) {
          // If the neighbour is not assigned to a cluster, we add it to the cluster.
          neighbour.clusterId = clusterLabel;
          // Find the neighbours.
          const newNeighbours = getNeighbours(neighbour);
          // Get the minimum number of neighbours it needs to be a core point.
          // Exception for the city point, which is added only when the scenario is 'normals'.
          const minNeighbours = neighbour.name === "cityData"
            ? 0
            : getMinNeighbours(neighbour);
          // If the neighbour is a core point, we add its neighbours to
          // the list of points to be checked.
          if (newNeighbours.length >= minNeighbours) {
            neighbour.clusterType = "core";
            neighbours.push(
              ...newNeighbours.filter((n) => !neighbours.includes(n)),
            );
          } else {
            // If not core, it's a border point.
            neighbour.clusterType = "border";
          }
        } else if (neighbour.clusterId === null) {
          // The neighbour has been classified as noise previously,
          // but now we know it's a border point, so we add it to the cluster.
          neighbour.clusterId = clusterLabel;
          neighbour.clusterType = "border";
        }
      }
    }
  }

  // We double-check if the remaining noise points are reachable
  // from a core point.
  for (const point of data) {
    if (point.clusterId === null) {
      const neighbours = getNeighbours(point);
      const clusterNeighbours = neighbours.find(
        (n) => n.clusterType === "core",
      );
      // If that's the case, it's not noise but a border point.
      if (clusterNeighbours) {
        point.clusterId = clusterNeighbours.clusterId;
        point.clusterType = "border";
      }
    }
  }

  // Find the neighbours of a point.
  // Here, we use the analogues.
  function getNeighbours(point) {
    return data.filter((p) => distance(point, p) <= minDistance);
  }

  // Get the minimum number of neighbours a point needs to be a
  // core point. Here, we use all gridded points.
  function getMinNeighbours(point) {
    const minNeighbours = Math.round(
      normalsMapSeason.filter(
        (d) =>
          typeof d.temp === "number" &&
          typeof d.precip === "number" &&
          distance(point, d) <= minDistance,
      ).length * minPercNeighbours,
    );
    // We want a minimum of 3 neighbours.
    return Math.max(minNeighbours, 3);
  }
}
If you switch Remove outliers to false and Show clusters to true, you can see the clusters on the map.
The squares are the core points, the triangles are the border points, and the crosses are the outliers.
The bold grey lines show the combined outlines of the clusters.

If you switch Remove outliers back to true, you can see the result: the outliers have been removed, and the map is zoomed in on the remaining analogues.

This step to exclude outliers is quite intuitive and fast, which is critical for our interactive application.
DBSCAN can be slow with many points, but since we typically have just a few hundred analogues, it runs in a few milliseconds in our context.
To ensure that our city analogues fall within the clusters (which are computed on gridded points), we run a simple distance test: if a city is within 50 km of a remaining gridded point, we keep it.
const cities = justAnalogues.filter(
  (d) => typeof d.name === "string" && d.similarity > 0,
);

const citiesNoOutliers = cities.filter((d) => {
  for (const p of pointsNoOutliers) {
    if (distance(d.lon, d.lat, p.lon, p.lat) <= 50) {
      return true;
    }
  }
  return false;
});
Percentage of points instead of number of neighbours
To demonstrate why we use a percentage of points instead of a number of points with our DBSCAN algorithm, let's take Montreal summers with a medium-emissions scenario as an example.
If we use New York as a point (the star on the map), we can see the gridded points that fall within the radius around it. However, New York is by the water, and we don't have temperature and precipitation data for points in the ocean (black points). Only a fraction of the surrounding points have data.
So it doesn't make sense to use a minimum number of points to define a core point. We need to use a percentage instead to take into account the context of each point.
With a minimum percentage of 20% and a hard minimum of 3, the threshold adapts to each point: for example, if only 12 of the surrounding points have data, a point needs max(round(12 × 0.2), 3) = 3 analogue neighbours to be a core point.
Map and labels
The map is made with Observable Plot. For the contours, we use both cities and gridded points.
To pick the most relevant cities to be labeled, we first divide the area covered by the analogues into four cells.
We preselect cities that have a similarity index greater than the median.
Then, in each cell, we pick the city that has:
- The highest similarity, if any
- The largest population, if any
If this step results in two cities or fewer, we repeat the process with gridded points to make sure we label the map.
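Here is a rough sketch of our reading of this selection step; the property and function names are ours, and the actual implementation differs.

  // Sketch: pick label candidates among the analogue cities.
  // bounds is the bounding box of the analogues: { minLon, maxLon, minLat, maxLat }.
  function pickLabelCities(cities, bounds) {
    // Preselect cities with a similarity greater than the median.
    const sorted = cities.map((c) => c.similarity).sort((a, b) => a - b);
    const medianSimilarity = sorted[Math.floor(sorted.length / 2)];
    const candidates = cities.filter((c) => c.similarity > medianSimilarity);

    // Split the area into four cells (2 x 2).
    const midLon = (bounds.minLon + bounds.maxLon) / 2;
    const midLat = (bounds.minLat + bounds.maxLat) / 2;

    const labels = [];
    for (const west of [true, false]) {
      for (const south of [true, false]) {
        const inCell = candidates.filter(
          (c) =>
            (west ? c.lon <= midLon : c.lon > midLon) &&
            (south ? c.lat <= midLat : c.lat > midLat),
        );
        if (inCell.length === 0) continue;
        // In each cell, keep the most similar city and the most populated one.
        const mostSimilar = [...inCell].sort((a, b) => b.similarity - a.similarity)[0];
        const mostPopulated = [...inCell].sort((a, b) => b.population - a.population)[0];
        for (const city of [mostSimilar, mostPopulated]) {
          if (!labels.includes(city)) labels.push(city);
        }
      }
    }
    return labels;
  }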
At this step, the map looks like this for Montreal in Summer with a Medium-emissions scenario.

We remove overlapping labels by first rendering the map and then looping through the labels (which are now SVG elements). If two labels' bounding boxes (plus some padding) overlap, the label of the city with the smaller population is removed.
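Here is a rough sketch of such an overlap check, assuming each label is an SVG text element carrying a hypothetical data-population attribute (the selector and names are ours, not the actual implementation):

  // Remove overlapping labels, keeping the city with the largest population.
  function removeOverlappingLabels(svg, padding = 4) {
    const labels = [...svg.querySelectorAll("text.city-label")];
    const overlaps = (a, b) =>
      !(
        a.right + padding < b.left ||
        a.left - padding > b.right ||
        a.bottom + padding < b.top ||
        a.top - padding > b.bottom
      );
    for (const label of labels) {
      if (!label.isConnected) continue; // Already removed.
      const box = label.getBoundingClientRect();
      for (const other of labels) {
        if (other === label || !other.isConnected) continue;
        if (overlaps(box, other.getBoundingClientRect())) {
          // Drop the label of the city with the smaller population.
          const keepLabel =
            Number(label.dataset.population) >= Number(other.dataset.population);
          (keepLabel ? other : label).remove();
          if (!keepLabel) break; // The current label is gone; move on.
        }
      }
    }
  }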
Here's the result.

This technique works great with all screen sizes. On smaller screens, more overlapping will occur, so more labels will be removed, but there will always be a good mix of relevant and populous cities.
Here's the result on an iPhone SE (375 x 667).

These steps ensure that we have a good mix of relevant cities that users have a chance to recognize.
Tests
For each season (winter, spring, summer, and fall) and each scenario (normals, low-, medium-, and high-emissions), we checked whether our methodology finds at least one cluster of analogues for every city in our selection.
And here's the table with all the results. A ✅ means that at least one cluster has been found, while ❌ means no cluster has been found.
Contact
Questions? Comments? Reach out to Senior Data Producer Nael Shiab.