MODULE read_csv_mod

Overview

The read_csv_mod module provides lightweight utility routines for reading simple ASCII, CSV-like tabular data products. It is intended for use in DART observation converters and preprocessing workflows that need column-based access to non-NetCDF data without additional external dependencies.

The module supports files with exactly one header row followed by data rows, where fields are separated by a single-character delimiter (typically comma or semicolon).

Public types

type(csv_file_type)

A handle that stores cached information about an open CSV file. All components are private and must be accessed via module procedures.

type csv_file_type
   private

   character(len=256) :: filename
   integer            :: nrows
   integer            :: ncols
   integer            :: iunit
   character          :: delim
   character(len=512) :: fields(MAX_NUM_FIELDS)
   logical            :: is_open
end type csv_file_type

Public interfaces and routines

csv_open(fname, cf, forced_delim, context)

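Opens a delimited text file, reads and caches the header, determines the number of data rows, and initializes the CSV handle. Two optional arguments adjust this behavior: forced_delim overrides delimiter detection, and skip_lines skips leading lines that precede the header.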

csv_close(cf)

Closes the file unit (if open) and resets the handle.

csv_get_nrows(cf)

Returns the number of data rows in the file (excluding the header).

csv_get_field(cf, varname, varvals, context)

Generic interface to read a column by header name. Dispatches to type-specific implementations for character, integer, or real output arrays.

Note

  • The output array must be sized exactly to csv_get_nrows(cf).

  • Header name matching is case-insensitive.

  • The file is rewound and the header skipped internally for each call.
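
As a sketch of how the generic interface is typically driven, the fragment below reads three columns of different types from one open handle. The file name, the column names (station_id, qc_flag, temperature), and the character length of 64 are assumptions chosen for illustration; only the routine names and argument order come from the descriptions above.

```fortran
use read_csv_mod, only : csv_file_type, csv_open, csv_close, csv_get_nrows, csv_get_field
use types_mod,    only : r8

type(csv_file_type) :: cf
integer :: nrows
character(len=64), allocatable :: station_id(:)    ! length 64 is an assumption
integer,           allocatable :: qc_flag(:)
real(r8),          allocatable :: temperature(:)

call csv_open('stations.csv', cf)                  ! hypothetical file name

! Every output array must have exactly csv_get_nrows(cf) elements.
nrows = csv_get_nrows(cf)
allocate(station_id(nrows), qc_flag(nrows), temperature(nrows))

! The type of the output array selects the specific implementation.
! Header names are matched without regard to case, so 'QC_Flag'
! would find a column whose header is written 'qc_flag'.
call csv_get_field(cf, 'station_id',  station_id)
call csv_get_field(cf, 'QC_Flag',     qc_flag)
call csv_get_field(cf, 'temperature', temperature)

call csv_close(cf)
```

Each of the three calls rewinds and rescans the file, so the cost grows with the number of columns requested.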

csv_get_field_index(cf, varname)

Returns the 1-based column index of varname or -1 if the field is not found.
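
To make the return convention concrete, here is a minimal standalone sketch of a case-insensitive header lookup that returns a 1-based index or -1. It is not the module's implementation: the module name header_lookup_sketch, the function names, and the choice to ignore leading and trailing blanks are all assumptions made for illustration.

```fortran
module header_lookup_sketch
implicit none
private
public :: find_field_index

contains

! Return a copy of str with ASCII letters a-z folded to upper case.
pure function to_upper(str) result(up)
character(len=*), intent(in) :: str
character(len=len(str))      :: up
integer :: i, code

up = str
do i = 1, len(str)
   code = iachar(str(i:i))
   if (code >= iachar('a') .and. code <= iachar('z')) then
      up(i:i) = achar(code - 32)
   end if
end do
end function to_upper

! Return the 1-based position of varname in fields, or -1 if absent.
! Surrounding blanks are ignored and the comparison is case-insensitive.
pure function find_field_index(fields, varname) result(idx)
character(len=*), intent(in) :: fields(:)
character(len=*), intent(in) :: varname
integer :: idx
integer :: i

idx = -1
do i = 1, size(fields)
   if (to_upper(trim(adjustl(fields(i)))) == to_upper(trim(adjustl(varname)))) then
      idx = i
      return
   end if
end do
end function find_field_index

end module header_lookup_sketch
```

A caller would test the result against -1 before using it as a column index, which mirrors how csv_get_field_index is described above.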

csv_field_exists(cf, varname)

Returns .true. if varname exists in the cached header.

csv_print_header(cf)

Prints the cached header fields and their indices to standard output.

get_csv_words_from_string(inline, delim, wordcount, words)

Splits a single line into delimiter-separated fields using CSV-like parsing rules.

Typical usage

Open the file with a csv_file_type handle, allocate the output arrays from the number of data rows, and then read the data one column at a time:

use read_csv_mod, only : csv_file_type, csv_open, csv_close, csv_get_nrows, csv_get_field
use types_mod,    only : r8

type(csv_file_type) :: cf
real(r8), allocatable :: lat(:)

call csv_open('input.csv', cf)

allocate(lat(csv_get_nrows(cf)))
call csv_get_field(cf, 'lat', lat)

call csv_close(cf)

What this module does

  • Reads delimited ASCII tables with:

    • one header row containing field names

    • one data record per subsequent line

    • a single-character delimiter

  • Opens the file once and caches metadata in a csv_file_type handle:

    • filename and Fortran unit number

    • detected or user-specified delimiter

    • header field names and number of columns

    • number of data rows (excluding the header)

  • Provides column-based access by header name using the generic interface csv_get_field(), returning:

    • character strings (raw field contents)

    • integers (via string_to_integer)

    • reals (via string_to_real)

  • Allows repeated access to different columns without reopening the file. Internally, the file is rewound and the header skipped for each read.

  • Handles numeric conversion failures non-fatally:

    • values that cannot be converted to integer are returned as MISSING_I

    • values that cannot be converted to real are returned as MISSING_R8

  • Skips leading lines that precede the header (typically links and data source information) when the optional skip_lines argument is passed to csv_open().

  • Treats backslash (\) as an escape character, preventing interpretation of the following character during parsing.
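
Because conversion failures are not fatal, the caller has to screen for missing values after a numeric read. The fragment below is one way to do that, assuming MISSING_R8 is imported from types_mod alongside r8; the file name and the temperature column name are hypothetical.

```fortran
use read_csv_mod, only : csv_file_type, csv_open, csv_close, csv_get_nrows, csv_get_field
use types_mod,    only : r8, MISSING_R8

type(csv_file_type)   :: cf
integer               :: nrows
real(r8), allocatable :: temperature(:)
logical,  allocatable :: valid(:)

call csv_open('stations.csv', cf)                    ! hypothetical file name

nrows = csv_get_nrows(cf)
allocate(temperature(nrows), valid(nrows))

call csv_get_field(cf, 'temperature', temperature)   ! hypothetical column name
call csv_close(cf)

! Entries that could not be converted come back as MISSING_R8,
! so an exact comparison against the sentinel flags them.
valid = (temperature /= MISSING_R8)

print *, count(valid), ' of ', nrows, ' values converted'
```

A field that legitimately contains the sentinel value cannot be told apart from a failed conversion, so this check is only as reliable as the assumption that real data never equal MISSING_R8.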

What this module does not do

  • It does not implement the full CSV specification (e.g., RFC 4180). The parser is intentionally simple and designed for common, well-behaved tabular files.

  • It does not support:

    • comment lines or metadata blocks within the data (leading lines before the header can only be skipped, via skip_lines)

    • embedded newlines inside quoted fields

  • It does not infer or mix data types within a single read. Each call to csv_get_field() reads a single column into a single output type (character, integer, or real). If a column contains mixed content (e.g., numeric and non-numeric values), numeric conversion will produce missing values for the non-conforming entries rather than raising an error.

  • It does not read multiple columns in a single pass. Each call to csv_get_field() rewinds and scans the file again, which may be inefficient when many columns are required.

Delimiter behavior: forced vs detected

By default, csv_open() detects the delimiter from the header line using a simple heuristic that distinguishes between comma (,) and semicolon (;) characters. If semicolons occur more frequently than commas in the header, the delimiter is assumed to be a semicolon; otherwise a comma is used.
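
The heuristic described above can be sketched as a small standalone function. This is an illustration of the stated rule, not the module's code: the names delim_sketch, count_char, and guess_delim are hypothetical, and the sketch counts every occurrence without regard to quoting or backslash escapes, which the module may treat differently.

```fortran
module delim_sketch
implicit none
private
public :: guess_delim

contains

! Count occurrences of the single character ch in line.
pure function count_char(line, ch) result(n)
character(len=*), intent(in) :: line
character,        intent(in) :: ch
integer :: n
integer :: i

n = 0
do i = 1, len_trim(line)
   if (line(i:i) == ch) n = n + 1
end do
end function count_char

! Choose ';' only when it is strictly more frequent than ','.
! Ties, including a header with neither character, fall back to ','.
pure function guess_delim(header) result(delim)
character(len=*), intent(in) :: header
character :: delim

if (count_char(header, ';') > count_char(header, ',')) then
   delim = ';'
else
   delim = ','
end if
end function guess_delim

end module delim_sketch
```

Under this rule a header with equal counts, or a single-column header containing neither character, resolves to a comma.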

If the optional forced_delim argument is provided to csv_open(), delimiter detection is skipped and the specified delimiter is used instead.
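
A forced delimiter is useful when a file uses a separator other than comma or semicolon. The fragment below is a sketch that assumes forced_delim accepts a single character passed by keyword; the file name and the pipe delimiter are hypothetical.

```fortran
use read_csv_mod, only : csv_file_type, csv_open, csv_close, csv_print_header

type(csv_file_type) :: cf

! Passing forced_delim skips delimiter detection entirely.
call csv_open('profiles.txt', cf, forced_delim='|')   ! hypothetical file and delimiter

! Printing the cached header is a quick check that the columns
! were split as expected with the chosen delimiter.
call csv_print_header(cf)

call csv_close(cf)
```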