.. index :: csv MODULE read_csv_mod =================== .. contents:: :depth: 2 :local: Overview -------- The ``read_csv_mod`` module provides lightweight utility routines for reading simple ASCII, CSV-like tabular data products. It is intended for use in DART observation converters and preprocessing workflows that need column-based access to non-NetCDF data without additional external dependencies. The module supports files with exactly one header row followed by data rows, where fields are separated by a single-character delimiter (typically comma or semicolon). Public types ------------ .. rubric:: ``type(csv_file_type)`` A handle that stores cached information about an open CSV file. All components are private and must be accessed via module procedures. .. container:: routine :: type csv_file_type private character(len=256) :: filename integer :: nrows integer :: ncols integer :: iunit character :: delim character(len=512) :: fields(MAX_NUM_FIELDS) logical :: is_open end type csv_file_type Public interfaces and routines ------------------------------ .. rubric:: ``csv_open(fname, cf, forced_delim, context)`` Opens a delimited text file, reads and caches the header, determines the number of data rows, and initializes the CSV handle. .. rubric:: ``csv_close(cf)`` Closes the file unit (if open) and resets the handle. .. rubric:: ``csv_get_nrows(cf)`` Returns the number of data rows in the file (excluding the header). .. rubric:: ``csv_get_field(cf, varname, varvals, context)`` Generic interface to read a column by header name. Dispatches to type-specific implementations for character, integer, or real output arrays. .. note:: - The output array must be sized exactly to ``csv_get_nrows(cf)``. - Header name matching is case-insensitive. - The file is rewound and the header skipped internally for each call. .. rubric:: ``csv_get_field_index(cf, varname)`` Returns the 1-based column index of ``varname`` or -1 if the field is not found. .. rubric:: ``csv_field_exists(cf, varname)`` Returns ``.true.`` if ``varname`` exists in the cached header. .. rubric:: ``csv_print_header(cf)`` Prints the cached header fields and their indices to standard output. .. rubric:: ``get_csv_words_from_string(inline, delim, wordcount, words)`` Splits a single line into delimiter-separated fields using CSV-like parsing rules. Typical usage ------------- A CSV file is opened using a CSV-type handle. Output arrays are then allocated using the number of data rows in the file. Data are read one column at a time as follows: .. code-block:: fortran use read_csv_mod, only : csv_file_type, csv_open, csv_close, csv_get_nrows, csv_get_field use types_mod, only : r8 type(csv_file_type) :: cf real(r8), allocatable :: lat(:) call csv_open('input.csv', cf) allocate(lat(csv_get_nrows(cf))) call csv_get_field(cf, 'lat', lat) call csv_close(cf) What this module does --------------------- - Reads delimited ASCII tables with: - one header row containing field names - one data record per subsequent line - a single-character delimiter - Opens the file once and caches metadata in a ``csv_file_type`` handle: - filename and Fortran unit number - detected or user-specified delimiter - header field names and number of columns - number of data rows (excluding the header) - Provides column-based access by header name using the generic interface ``csv_get_field()``, returning: - character strings (raw field contents) - integers (via ``string_to_integer``) - reals (via ``string_to_real``) - Allows repeated access to different columns without reopening the file. Internally, the file is rewound and the header skipped for each read. - Handles numeric conversion failures non-fatally: - values that cannot be converted to integer are returned as ``MISSING_I`` - values that cannot be converted to real are returned as ``MISSING_R8`` - Treats backslash (``\``) as an escape character, preventing interpretation of the following character during parsing. What this module does not do ---------------------------- - It does not implement the full CSV specification (e.g., RFC 4180). The parser is intentionally simple and designed for common, well-behaved tabular files. - It does not support: - multiple header lines - comment lines or metadata blocks - embedded newlines inside quoted fields - It does not infer or mix data types within a single read. Each call to ``csv_get_field()`` reads a *single column* into a single output type (character, integer, or real). If a column contains mixed content (e.g., numeric and non-numeric values), numeric conversion will produce missing values for the non-conforming entries rather than raising an error. - It does not read multiple columns in a single pass. Each call to ``csv_get_field()`` rewinds and scans the file again, which may be inefficient when many columns are required. Delimiter behavior: forced vs detected -------------------------------------- By default, ``csv_open()`` detects the delimiter from the header line using a simple heuristic that distinguishes between comma (`,`) and semicolon (`;`) characters. If semicolons occur more frequently than commas in the header, the delimiter is assumed to be a semicolon; otherwise a comma is used. If the optional ``forced_delim`` argument is provided to ``csv_open()``, delimiter detection is skipped and the specified delimiter is used instead.