MODULE read_csv_mod

Overview 

The read_csv_mod module provides lightweight utility routines for reading simple ASCII, CSV-like tabular data products. It is intended for use in DART observation converters and preprocessing workflows that need column-based access to non-NetCDF data without additional external dependencies.

The module supports files with exactly one header row followed by data rows, where fields are separated by a single-character delimiter (typically comma or semicolon).

Public types 

type(csv_file_type)

A handle that stores cached information about an open CSV file. All components are private and must be accessed via module procedures.

type csv_file_type
   private

   character(len=256) :: filename
   integer            :: nrows
   integer            :: ncols
   integer            :: iunit
   character          :: delim
   character(len=512) :: fields(MAX_NUM_FIELDS)
   logical            :: is_open
end type csv_file_type

Public interfaces and routines 

csv_open(fname, cf, forced_delim, context)

Opens a delimited text file, reads and caches the header, determines the number of data rows, and initializes the CSV handle.

csv_close(cf)

Closes the file unit (if open) and resets the handle.

csv_get_nrows(cf)

Returns the number of data rows in the file (excluding the header).

csv_get_field(cf, varname, varvals, context)

Generic interface to read a column by header name. Dispatches to type-specific implementations for character, integer, or real output arrays.

Note

The output array must be sized exactly to csv_get_nrows(cf).
Header name matching is case-insensitive.
The file is rewound and the header skipped internally for each call.

csv_get_field_index(cf, varname)

Returns the 1-based column index of varname or -1 if the field is not found.

csv_field_exists(cf, varname)

Returns .true. if varname exists in the cached header.

csv_print_header(cf)

Prints the cached header fields and their indices to standard output.

get_csv_words_from_string(inline, delim, wordcount, words)

Splits a single line into delimiter-separated fields using CSV-like parsing rules.

Typical usage 

A CSV file is opened using a CSV-type handle. Output arrays are then allocated using the number of data rows in the file. Data are read one column at a time as follows:

use read_csv_mod, only : csv_file_type, csv_open, csv_close, csv_get_nrows, csv_get_field
use types_mod,    only : r8

type(csv_file_type) :: cf
real(r8), allocatable :: lat(:)

call csv_open('input.csv', cf)

allocate(lat(csv_get_nrows(cf)))
call csv_get_field(cf, 'lat', lat)

call csv_close(cf)

What this module does 

Reads delimited ASCII tables with:
- one header row containing field names
- one data record per subsequent line
- a single-character delimiter
Opens the file once and caches metadata in a csv_file_type handle:
- filename and Fortran unit number
- detected or user-specified delimiter
- header field names and number of columns
- number of data rows (excluding the header)
Provides column-based access by header name using the generic interface csv_get_field(), returning:
- character strings (raw field contents)
- integers (via string_to_integer)
- reals (via string_to_real)
Allows repeated access to different columns without reopening the file. Internally, the file is rewound and the header skipped for each read.
Handles numeric conversion failures non-fatally:
- values that cannot be converted to integer are returned as MISSING_I
- values that cannot be converted to real are returned as MISSING_R8
If a file has extra lines on top (typically including links and data source information) before the actual header, these can be skipped when opening the file with csv_open using the optional argument skip_lines
Treats backslash (\) as an escape character, preventing interpretation of the following character during parsing.

What this module does not do 

It does not implement the full CSV specification (e.g., RFC 4180). The parser is intentionally simple and designed for common, well-behaved tabular files.
It does not support:
- comment lines or metadata blocks
- embedded newlines inside quoted fields
It does not infer or mix data types within a single read. Each call to csv_get_field() reads a single column into a single output type (character, integer, or real). If a column contains mixed content (e.g., numeric and non-numeric values), numeric conversion will produce missing values for the non-conforming entries rather than raising an error.
It does not read multiple columns in a single pass. Each call to csv_get_field() rewinds and scans the file again, which may be inefficient when many columns are required.

Delimiter behavior: forced vs detected 

By default, csv_open() detects the delimiter from the header line using a simple heuristic that distinguishes between comma (,) and semicolon (;) characters. If semicolons occur more frequently than commas in the header, the delimiter is assumed to be a semicolon; otherwise a comma is used.

If the optional forced_delim argument is provided to csv_open(), delimiter detection is skipped and the specified delimiter is used instead.

MODULE read_csv_mod

Overview

Public types

Public interfaces and routines

Typical usage

What this module does

What this module does not do

Delimiter behavior: forced vs detected