Converting oceanographic data from MYO Arakawa-C ORCA025 native grid to WGS84

These days, there is a plethora of global oceanographic data available. To name but a few, ESRI’s Ecological Marine Units offers fifty-year averages of temperature, salinity, dissolved oxygen, nitrate, phosphate, and silicate at a horizontal resolution of 0.25˚ x 0.25˚ (~27 km x 27 km at the equator) across 102 different depth zones. Bio-ORACLE offers average measurements of some 18 different environmental variables at the surface and the bottom of the ocean for three different time periods (present: 2000 - 2014; future 1: 2040 - 2050; future 2: 2090 - 2100). NASA’s MODIS-AQUA provides satellite remote-sensed measurements of, amongst other variables, chlorophyll, calcite, and particulate organic carbon over periods as short as a day and at a resolution of just 4 km. The trade-off is that the daily data is much patchier across the globe.

For my work on the spatial-temporal variability of capelin habitat in Atlantic Canada, I have decided to go with two “off the shelf” products from Mercator Ocean - GLORYS and BIOMER. Together these datasets give me temperature, salinity, oxygen, chlorophyll, sea surface height, and mixed layer depth.

Both GLORYS and BIOMER offer daily and monthly average measurements at a horizontal resolution of 0.25˚ x 0.25˚ across 75 different depth levels. Because they are models (GLORYS is a reanalysis, and BIOMER a non-assimilative hindcast), they provide global coverage. Both are available to download for free from the E.U.’s Copernicus Marine Environment Monitoring Service (CMEMS) in netCDF format.

For people like me, however, the BIOMER netCDFs aren’t quite “off the shelf” usable…

The GLORYS netCDFs come in good old WGS 84 (EPSG 4326). This datum is easy to project into whatever projected coordinate system we need to do our analysis or create our maps - like an Albers equal-area or a Lambert conformal conic. The BIOMER netCDFs, on the other hand, come in the rather funky MYO Arakawa-C ORCA025 native grid. On a basic level, the ORCA025 grid is a tri-polar grid used in the PISCES biogeochemical model (which BIOMER utilises). It comprises a series of embedded ellipses, which creates a non-uniform grid. What this means is that it is not as straightforward to project into our chosen coordinate system - but it is possible.

Below are the steps that I took to convert my BIOMER netCDFs into WGS 84. Note I used a Windows-based computer.

The BIOMER product comes in a MYO Arakawa-C ORCA025 native grid. This particular layer shows average Chlorophyll concentrations at the surface in January 1998.


Download the data from Copernicus

You can browse all the data available in Copernicus’ catalogue, but to download any data you will need to set up an account. The process is very easy, and is completely free of charge. Once you have an account, the data is also free. Note that it took a day for my account to be set up, so make sure you don’t leave this part of the process to the last minute!

Downloading the data is very easy and is described by Copernicus in this helpful guide. The TLDR version is…go to the catalogue, find the product you want, and hit the “add to cart” button. Once you have the products you want in your cart (you can see it on the top right of the catalogue page), click on the cart then select ‘data download’. You’ll then be taken to another page where you can select the geographical area, the time range, the depth range, and the variables in the product you actually want.

If you only want to download a little data, this process works well. If, like me, you have a lot of data, this is not the process for you. First, you need to download the variables you want one by one; second, there is a data download limit of 2048 MB. This means that if you want to download 20 years’ worth of global data at all depths, you will be going through the steps over and over and over, and you are going to find the process excruciating.

Thankfully David Bazin (CMEMS) has created a rather nice method for us to download lots of data.

Downloading a lot of data from Copernicus

There are a few things that you will need to have installed to be able to bulk-download:

  • A .gz extractor - I used Peazip

  • Python 2.7.12

  • A source code editor for the Python script (I use the IDLE GUI that comes with Python)

  • The latest version of the motu-client-python. Download it from GitHub then extract the archive (this is why you need a .gz extractor). Then extract the resulting .tar file. Make sure you keep a note of where you extract the files to!

David’s method involves running a simple Python script. For us non-Python/non-technical coding types, thankfully David explains his script well and it is easy to edit it to get the exact data you want. Read his instructions and access the script here.

Once you have edited the script for things like the product name and depth range, all you have to do is save the script then run it. I ran my script straight from IDLE (just hit F5). Depending on how much data you are downloading, and the speed of your connection etc., this can take a while. When I downloaded the BIOMER dataset, I selected the whole world, all depths, and for the time range from 1998 to 2016. It took two days!
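If a long time range has to be broken into smaller requests to stay under the download limit, the date arithmetic can be sketched in the shell. This is not David's script and it does not call the motu client; the function name month_windows is my own, and it only prints one start/end date pair per calendar month, which you could then use as the time range of each request.

```shell
#!/usr/bin/env bash
# Print one "start end" date pair per calendar month between two years (inclusive).
# month_windows is a hypothetical helper name, not part of any CMEMS tool.
month_windows() {
  local first_year=$1 last_year=$2 y m last
  for ((y = first_year; y <= last_year; y++)); do
    for ((m = 1; m <= 12; m++)); do
      case $m in
        4|6|9|11) last=30 ;;
        2)
          # Gregorian leap-year rule
          if (( (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 )); then
            last=29
          else
            last=28
          fi
          ;;
        *) last=31 ;;
      esac
      printf '%04d-%02d-01 %04d-%02d-%02d\n' "$y" "$m" "$y" "$m" "$last"
    done
  done
}

# The time range used in this post
month_windows 1998 2016
```

Whether monthly windows are small enough depends on the area, depths, and variables you select, so treat the window size as something to adjust.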

A couple of things to note:

  • The Python version: In David’s post, he recommends Python 2.7.14 to 2.7.15. For me, the Motu-Client would not run with these versions of Python. I was successful with Python 2.7.12.

  • If you need to install Python, the script assumes Python is stored directly on your home drive (\Python27\). If you want to install it somewhere else, you will need to modify the script to take this into account.

  • I found that a few of the datasets did not download for one reason or another. Thankfully David thought about this when he wrote his script. All failed downloads are recorded in a logfile, which you will find in the same folder your data was downloaded to. Just open it (it’s a simple text file), see what didn’t work, and then edit the Python script to download these missing datasets.

Convert the BIOMER NetCDFs

So now you have a bunch of BIOMER NetCDFs. To convert the NetCDFs to WGS 84, I used the "ModestR" methods starting on slide 13, which use the Climate Data Operators (CDO) tools on Linux. Windows users can run these commands with Cygwin, which is a "Unix-like environment and command-line interface". The methods from ModestR (with some minor changes from me) are as follows:

1. Download and install Cygwin. Accept the default settings for installation until you reach the 'select packages' step.

2. At the 'select packages' step, change 'View' to 'Full' and then search for and install* the following packages:

  • netcdf

  • udunits

  • libproj

  • gcc-core

  • gcc-g++

*To install, click on 'Skip' until the status changes to install. Note that there may be several version options to click through. Pick the newest version (the newest version number is indicated in the 'Current' column on the left).

3. Continue the installation until it is complete.

4. Download the latest version of CDO. Unzip the file, and extract to the \cygwin\bin folder that has been automatically created (note there is only one file - cdo.exe).

5. You need to create a ‘grid file’ so CDO can reproject/interpolate the ORCA grid to WGS84. Create a "grid.txt" file that gives CDO the following information:

gridtype = lonlat
xsize = Number of columns you want the raster to have
ysize = Number of rows you want the raster to have
xfirst = Longitude of the left side of the raster
xinc = Horizontal (x) size of the cells you want the raster to have
yfirst = Latitude of the bottom side of the raster
yinc = Vertical (y) size of the cells you want the raster to have

The example grid file ModestR offers in their instructions is for a NetCDF with longitude boundaries -180 to 180, and latitude boundaries -77 to 89:

gridtype = lonlat
xsize = 1440
ysize = 664
xfirst = -180
xinc = 0.25
yfirst = -77
yinc = 0.25

Note that xfirst + xsize * xinc should equal the longitude of the right side of the raster, and that yfirst + ysize * yinc should equal the latitude of the top side of the raster. In the example above, -180 + 1440 * 0.25 = 180 and -77 + 664 * 0.25 = 89.

Make sure you save your grid file to \cygwin\home\username (the username is the Windows account username you installed Cygwin under).
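If you would rather create the grid file from the Cygwin prompt than in a text editor, something like the following should work. It is only a sketch: it writes the ModestR example values to grid.txt in the current folder (overwriting any existing grid.txt), then reads them back to check the right-hand and top edges. The helper name grid_value is my own.

```shell
#!/usr/bin/env bash
# Write the example grid description to grid.txt (values from the ModestR example).
cat > grid.txt <<'EOF'
gridtype = lonlat
xsize = 1440
ysize = 664
xfirst = -180
xinc = 0.25
yfirst = -77
yinc = 0.25
EOF

# Read the value stored for a given key in the grid file (hypothetical helper).
grid_value() {
  awk -F'=' -v key="$1" '
    { k = $1; gsub(/[ \t]/, "", k) }
    k == key { v = $2; gsub(/[ \t]/, "", v); print v }' grid.txt
}

# Far edges: xfirst + xsize * xinc and yfirst + ysize * yinc
right_edge=$(awk -v f="$(grid_value xfirst)" -v n="$(grid_value xsize)" \
                 -v d="$(grid_value xinc)" 'BEGIN { print f + n * d }')
top_edge=$(awk -v f="$(grid_value yfirst)" -v n="$(grid_value ysize)" \
               -v d="$(grid_value yinc)" 'BEGIN { print f + n * d }')

echo "right edge: $right_edge, top edge: $top_edge"
```

For the example values this prints a right edge of 180 and a top edge of 89, which match the longitude and latitude boundaries given above. Change the numbers in the heredoc to suit your own raster.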

6. Copy the netCDF files that need processing to \cygwin\home\username.

7. In \cygwin\home\username create a new folder called output.

8. Launch the Cygwin program. In the command prompt enter the following:

for i in $(ls); do cdo remapnn,grid.txt ${i} output/${i}; done 

This code does the following:

  • for i in $(ls); - this says run through all the files in the default folder (\cygwin\home\username)

  •  do cdo remapnn - run the CDO program and use the remapnn function (that lets you reproject using nearest neighbour interpolation - this helps to preserve the values)

  • grid.txt - tells CDO where to find the instructions for reprojecting (this is the grid file you created)

  • ${i} output/${i} - read each input file (${i}) and write the reprojected version to the folder output under the same file name

  • done - once the loop is complete, stop running
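One thing to be aware of is that $(ls) also picks up grid.txt and the output folder, so CDO will be handed those too and should simply report an error for them before moving on. A slightly more defensive version of the same loop is sketched below. It assumes your netCDF files end in .nc (adjust the pattern if yours do not), and the function name remap_all is my own.

```shell
#!/usr/bin/env bash
# Remap every .nc file in the current folder using nearest-neighbour interpolation.
# Arguments: the grid description file and the output folder.
# Assumption: the netCDF files end in .nc.
remap_all() {
  local grid=$1 out_dir=$2 f
  mkdir -p "$out_dir"                 # create the output folder if it is missing
  for f in *.nc; do
    [ -e "$f" ] || continue           # no .nc files: the pattern stays literal, so skip it
    # Quoting keeps file names with spaces intact; failures are reported, not fatal
    cdo remapnn,"$grid" "$f" "$out_dir/$f" || echo "failed: $f" >&2
  done
}

# Usage, from \cygwin\home\username:
#   remap_all grid.txt output
```

The CDO call itself is the same remapnn command as above; only the file matching, quoting, and error message differ.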

The output files can be found in your \cygwin\home\username\output folder.

The surface Chlorophyll dataset from BIOMER in WGS 84, after conversion using the CDO operators


'Trimming' the netCDFs by longitude/latitude, and then depth.

Here I have data that covers the whole world, but in reality I want a smaller region for the first part of my project. To trim by longitude/latitude and by depth, no further installations are needed. As above for the projection, I have created loops to run through all the files in the default folder (\cygwin\home\username) and then put the output into the folder output. Make sure you move your ‘raw’ netCDFs out of the default folder and replace them with your newly projected netCDFs. You will also want to make sure your output folder is empty.

Note I ran the trim by longitude/latitude and depth in two separate loops.

Trim by longitude/latitude

To trim to a longitude/latitude box from longitude -70 to -43, and latitude 62 to 38 enter the following in the command prompt:

for i in $(ls); do cdo sellonlatbox,-70.0,-43.0,62.0,38.0 ${i} output/${i}; done 

Trim by depth

You can get a list of the different depth layers available in an individual netCDF by entering the following in the command prompt (all the BIOMER netCDFs have the same depth layers, so this process only needs to be done once):

cdo showlevel file_name_you_want_to_check
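If you want every level within a depth range rather than typing each value by hand, the list can be filtered and joined with commas, ready to paste after sellevel. This is a sketch that assumes showlevel prints the levels as whitespace-separated numbers; the function name levels_between is my own.

```shell
#!/usr/bin/env bash
# Turn a whitespace-separated list of levels into a comma-separated list
# restricted to the range [min, max]. Reads the levels from standard input.
levels_between() {
  local min=$1 max=$2
  tr -s ' \t\n' '\n' | awk -v lo="$min" -v hi="$max" '
    $1 != "" && $1 + 0 >= lo + 0 && $1 + 0 <= hi + 0 {
      out = out (out == "" ? "" : ",") $1   # keep the value exactly as printed
    }
    END { print out }'
}

# Example with four levels typed in by hand; in practice you could pipe in
#   cdo -s showlevel file_name_you_want_to_check
echo "1045.85425 1151.99121 1265.86145 1387.37695" | levels_between 1100 1300
```

With the four levels in the example, only 1151.99121 and 1265.86145 fall between 1100 and 1300, so those are the two values printed.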

To keep only certain depths (e.g. 1045.85425, 1151.99121, 1265.86145, and 1387.37695) enter the following in the command prompt:

for i in $(ls); do cdo sellevel,1045.85425,1151.99121,1265.86145,1387.37695 ${i} output/${i}; done

The BIOMER netCDF has now been trimmed to a smaller longitude/latitude
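I ran the two trims as separate loops, but CDO also lets you chain operators in a single call, which saves moving files between runs. The sketch below is an alternative I have not used for this project: it assumes the files end in .nc, and the function name trim_all is my own.

```shell
#!/usr/bin/env bash
# Trim every .nc file to a lon/lat box and a set of depth levels in one CDO call.
# CDO evaluates chained operators from right to left, so sellonlatbox runs
# first and sellevel is applied to its result.
# Arguments: box (lon1,lon2,lat1,lat2), comma-separated levels, output folder.
trim_all() {
  local box=$1 levels=$2 out_dir=$3 f
  mkdir -p "$out_dir"
  for f in *.nc; do
    [ -e "$f" ] || continue           # skip the literal pattern when nothing matches
    cdo -sellevel,"$levels" -sellonlatbox,"$box" "$f" "$out_dir/$f" \
      || echo "failed: $f" >&2
  done
}

# Usage with the box and depths from this post:
#   trim_all -70.0,-43.0,62.0,38.0 1045.85425,1151.99121,1265.86145,1387.37695 output
```

The result should match running the two loops one after the other, since the same two operators are applied to each file.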

CDO is great! What else can you do?

Well… lots! You can find a complete list of the CDO operators, including some simple examples, at the MPIMET.MPG website.