MODULE read_csv_mod
Overview
The read_csv_mod module provides lightweight utility routines for reading
simple ASCII, CSV-like tabular data products. It is intended for use in DART
observation converters and preprocessing workflows that need column-based
access to non-NetCDF data without additional external dependencies.
The module supports files with exactly one header row followed by data rows, where fields are separated by a single-character delimiter (typically comma or semicolon).
Public types
type(csv_file_type)
A handle that stores cached information about an open CSV file. All components are private and must be accessed via module procedures.
type csv_file_type
   private
   character(len=256) :: filename
   integer            :: nrows
   integer            :: ncols
   integer            :: iunit
   character          :: delim
   character(len=512) :: fields(MAX_NUM_FIELDS)
   logical            :: is_open
end type csv_file_type
Public interfaces and routines
csv_open(fname, cf, forced_delim, context)
Opens a delimited text file, reads and caches the header, determines the number of data rows, and initializes the CSV handle.
csv_close(cf)
Closes the file unit (if open) and resets the handle.
csv_get_nrows(cf)
Returns the number of data rows in the file (excluding the header).
csv_get_field(cf, varname, varvals, context)
Generic interface to read a column by header name. Dispatches to type-specific implementations for character, integer, or real output arrays.
Note
The output array must be sized exactly to csv_get_nrows(cf).
Header name matching is case-insensitive.
The file is rewound and the header skipped internally for each call.
csv_get_field_index(cf, varname)
Returns the 1-based column index of varname or -1 if the field is not found.
csv_field_exists(cf, varname)
Returns .true. if varname exists in the cached header.
csv_print_header(cf)
Prints the cached header fields and their indices to standard output.
get_csv_words_from_string(inline, delim, wordcount, words)
Splits a single line into delimiter-separated fields using CSV-like parsing rules.
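The lookup behavior described for csv_get_field_index() (case-insensitive match, 1-based index, -1 when absent) can be illustrated with a minimal standalone sketch. The names lookup_sketch, to_lower, and find_field are hypothetical and are not part of read_csv_mod; the module's own implementation may differ.

```fortran
module lookup_sketch
   implicit none
   private
   public :: find_field

contains

   ! Lower-case the ASCII letters in s; other characters are unchanged.
   pure function to_lower(s) result(t)
      character(len=*), intent(in) :: s
      character(len=len(s)) :: t
      integer :: i, c

      t = s
      do i = 1, len(s)
         c = iachar(s(i:i))
         if (c >= iachar('A') .and. c <= iachar('Z')) t(i:i) = achar(c + 32)
      end do
   end function to_lower

   ! Return the 1-based position of name in fields, or -1 if not found.
   ! (Hypothetical helper, shown only to illustrate the matching rule.)
   function find_field(fields, name) result(idx)
      character(len=*), intent(in) :: fields(:)
      character(len=*), intent(in) :: name
      integer :: idx
      integer :: i

      idx = -1
      do i = 1, size(fields)
         if (to_lower(trim(adjustl(fields(i)))) == &
             to_lower(trim(adjustl(name)))) then
            idx = i
            return
         end if
      end do
   end function find_field

end module lookup_sketch
```

With a header of Station, LAT, lon, a lookup of 'lat' in this sketch gives 2 and a lookup of 'elev' gives -1.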
Typical usage
A CSV file is opened using a CSV-type handle. Output arrays are then allocated using the number of data rows in the file. Data are read one column at a time as follows:
use read_csv_mod, only : csv_file_type, csv_open, csv_close, csv_get_nrows, csv_get_field
use types_mod, only : r8

type(csv_file_type) :: cf
real(r8), allocatable :: lat(:)

! open the file; the header is cached and the data rows are counted
call csv_open('input.csv', cf)

! size the output array to the number of data rows, then read one column
allocate(lat(csv_get_nrows(cf)))
call csv_get_field(cf, 'lat', lat)

call csv_close(cf)
What this module does
Reads delimited ASCII tables with:
one header row containing field names
one data record per subsequent line
a single-character delimiter
Opens the file once and caches metadata in a csv_file_type handle:
filename and Fortran unit number
detected or user-specified delimiter
header field names and number of columns
number of data rows (excluding the header)
Provides column-based access by header name using the generic interface csv_get_field(), returning:
character strings (raw field contents)
integers (via string_to_integer)
reals (via string_to_real)
Allows repeated access to different columns without reopening the file. Internally, the file is rewound and the header skipped for each read.
Handles numeric conversion failures non-fatally:
values that cannot be converted to integer are returned as MISSING_I
values that cannot be converted to real are returned as MISSING_R8
If a file has extra lines before the actual header (typically links and data source information), these can be skipped by passing the optional argument skip_lines to csv_open.
Treats backslash (\) as an escape character, preventing interpretation of the following character during parsing.
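The backslash rule can be illustrated with a minimal standalone sketch. The split_line routine below is hypothetical and is not the module's get_csv_words_from_string(); it ignores quoting entirely and only shows how an escaped delimiter stays inside a field. The backslash is written as achar(92) so the code does not depend on how a compiler treats backslashes in string literals.

```fortran
module split_sketch
   implicit none
   private
   public :: split_line

   character, parameter :: BACKSLASH = achar(92)

contains

   ! Split line on delim. A backslash makes the next character literal,
   ! so an escaped delimiter does not start a new field.
   ! (Hypothetical helper; quoting rules are deliberately not handled.)
   subroutine split_line(line, delim, nwords, words)
      character(len=*), intent(in)  :: line
      character,        intent(in)  :: delim
      integer,          intent(out) :: nwords
      character(len=*), intent(out) :: words(:)

      integer   :: i, n, wlen
      character :: c

      words(:) = ' '
      nwords = 1
      wlen   = 0
      n = len_trim(line)
      i = 1
      do while (i <= n)
         c = line(i:i)
         if (c == BACKSLASH .and. i < n) then
            ! take the following character literally
            i = i + 1
            call append(line(i:i))
         else if (c == delim) then
            ! unescaped delimiter: start the next field
            nwords = nwords + 1
            wlen   = 0
         else
            call append(c)
         end if
         i = i + 1
      end do

   contains

      ! Add one character to the current field, silently dropping
      ! anything that does not fit in words.
      subroutine append(ch)
         character, intent(in) :: ch
         if (nwords <= size(words) .and. wlen < len(words)) then
            wlen = wlen + 1
            words(nwords)(wlen:wlen) = ch
         end if
      end subroutine append

   end subroutine split_line

end module split_sketch
```

In this sketch, splitting the line KSFO,San Francisco\, CA,37.62 on a comma gives three fields, with the second field containing the comma.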
What this module does not do
It does not implement the full CSV specification (e.g., RFC 4180). The parser is intentionally simple and designed for common, well-behaved tabular files.
It does not support:
comment lines or metadata blocks within the data (lines that precede the header can be skipped with skip_lines)
embedded newlines inside quoted fields
It does not infer or mix data types within a single read. Each call to csv_get_field() reads a single column into a single output type (character, integer, or real). If a column contains mixed content (e.g., numeric and non-numeric values), numeric conversion will produce missing values for the non-conforming entries rather than raising an error.
It does not read multiple columns in a single pass. Each call to csv_get_field() rewinds and scans the file again, which may be inefficient when many columns are required.
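The non-fatal conversion behavior can be sketched with a list-directed internal read and iostat. This is only an illustration under stated assumptions: to_real and to_integer are hypothetical stand-ins for string_to_real and string_to_integer, and missing_r and missing_i are local placeholder values standing in for MISSING_R8 and MISSING_I; real code should use the constants and routines provided by DART.

```fortran
module convert_sketch
   implicit none
   private
   public :: r8, missing_r, missing_i, to_real, to_integer

   ! Local definitions for the sketch only (placeholders, see lead-in).
   integer,  parameter :: r8 = selected_real_kind(12)
   real(r8), parameter :: missing_r = -888888.0_r8
   integer,  parameter :: missing_i = -888888

contains

   ! Convert str to real; return missing_r instead of stopping on failure.
   function to_real(str) result(val)
      character(len=*), intent(in) :: str
      real(r8) :: val
      integer  :: ios

      ! Preset the result: input such as '/' ends a list-directed read
      ! without assigning a value and without setting an error status.
      val = missing_r
      if (len_trim(str) == 0) return
      read(str, *, iostat=ios) val
      if (ios /= 0) val = missing_r
   end function to_real

   ! Convert str to integer; return missing_i instead of stopping on failure.
   function to_integer(str) result(val)
      character(len=*), intent(in) :: str
      integer :: val
      integer :: ios

      val = missing_i
      if (len_trim(str) == 0) return
      read(str, *, iostat=ios) val
      if (ios /= 0) val = missing_i
   end function to_integer

end module convert_sketch
```

A caller that reads a mixed-content column would then screen the result, for example by counting or masking the entries equal to the missing value before using the data.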
Delimiter behavior: forced vs detected
By default, csv_open() detects the delimiter from the header line using a
simple heuristic that distinguishes between comma (,) and semicolon (;)
characters. If semicolons occur more frequently than commas in the header,
the delimiter is assumed to be a semicolon; otherwise a comma is used.
If the optional forced_delim argument is provided to csv_open(),
delimiter detection is skipped and the specified delimiter is used instead.
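The detection heuristic described above amounts to counting the two candidate characters in the header line. The sketch below is a hypothetical standalone illustration (guess_delim is not a routine in read_csv_mod), written only from the description in this section.

```fortran
module delim_sketch
   implicit none
   private
   public :: guess_delim

contains

   ! Return ';' if semicolons outnumber commas in the header line,
   ! otherwise return ','. (Hypothetical helper for illustration.)
   function guess_delim(header) result(delim)
      character(len=*), intent(in) :: header
      character :: delim
      integer   :: i, ncomma, nsemi

      ncomma = 0
      nsemi  = 0
      do i = 1, len_trim(header)
         if (header(i:i) == ',') ncomma = ncomma + 1
         if (header(i:i) == ';') nsemi  = nsemi  + 1
      end do

      if (nsemi > ncomma) then
         delim = ';'
      else
         delim = ','
      end if
   end function guess_delim

end module delim_sketch
```

Under this rule a tie, or a header containing neither character, falls back to a comma. Because only comma and semicolon are considered, a file that uses some other single-character delimiter would presumably need forced_delim, for example call csv_open('input.csv', cf, forced_delim=';') when the delimiter is known in advance.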