Reference#

yarm#

yarm: Yet Another Report Maker.

This page contains documentation for almost every function in yarm.

Developers are most likely to find it useful, but users may also find critical details here.

No matter what, you’ll want to read the introductory documentation first.

Main#

Command-line interface.

See Usage.

Settings#

Settings class.

class yarm.settings.Settings#

Define global settings.

When you need a setting in a function, make an instance of this class. These settings should not be changed elsewhere. Treat them as constants.

Validate#

Validate configuration file.

class yarm.validate.Slug#

Class to use slugify to make spelling consistent.

validate_scalar(chunk)#

Use slugify to make spelling consistent.

Note

We use _ rather than - for the separator.

The underscore seems more Pythonic.

Also, Ansible config files seem to favor underscores. See: https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html.
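As a rough illustration of the separator choice, the sketch below normalizes a string using only the standard library. It is a stand-in rather than yarm's code: slugify_underscore is a hypothetical helper, and the real class relies on the slugify package.

```python
import re
import unicodedata


def slugify_underscore(text: str) -> str:
    """Hypothetical helper: lowercase ASCII slug joined by underscores."""
    # Drop accents and any other non-ASCII characters.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


print(slugify_underscore("Include Index"))
print(slugify_underscore("export-tables"))
```

Both spellings end up in the same underscore form, which is the consistency the note above is after.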

class yarm.validate.StrNotEmpty#

A string that must not be empty.

static validate_scalar(chunk)#

Invalidate if string is empty.

Parameters:

chunk (YAMLChunk) – YAML to be validated

Returns:

Validated string

Return type:

str

yarm.validate.check_is_file(list_of_paths, key)#

For each item in a list, check that the value is a file.

Parameters:
  • list_of_paths (List[str | Dict]) – List of strings or dictionaries

  • key (str | None) – If the items are dictionaries, the key that holds the path (e.g. path)
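The two accepted item shapes can be sketched as follows. This is an assumption-laden illustration: find_missing_files is a hypothetical variant that returns the offending paths, whereas how the real function reacts to a missing file is not described here.

```python
import os
import tempfile
from typing import Dict, List, Optional, Union


def find_missing_files(
    list_of_paths: List[Union[str, Dict[str, str]]], key: Optional[str] = None
) -> List[str]:
    """Hypothetical variant: return the paths that are not existing files."""
    missing = []
    for item in list_of_paths:
        # Items are either plain path strings or dictionaries holding the path.
        path = item[key] if isinstance(item, dict) else item
        if not os.path.isfile(path):
            missing.append(path)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "source.csv")
    with open(real, "w") as handle:
        handle.write("id\n1\n")
    absent = os.path.join(tmp, "absent.csv")
    # Plain strings need no key; dictionaries are read through key.
    print(find_missing_files([real, absent]) == [absent])
    print(find_missing_files([{"path": real}], key="path") == [])
```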

yarm.validate.check_key(key, config_yaml)#

Check whether a key exists in configuration YAML.

Parameters:
  • key (str) – Name of key

  • config_yaml (YAML) – Configuration to check

Returns:

name of key if present, None if not

Return type:

str | None

yarm.validate.get_default_config()#

Return default configuration.

Return type:

Nob

yarm.validate.msg_validating_key(key, suffix=None, verbose=1)#

Show a message that a key is being validated.

Parameters:
  • key (str) – Key being validated

  • suffix (str | None) – String to add after message

  • verbose (int) – Minimum verbosity level required to show this message

yarm.validate.revalidate_yaml(yaml, schema, config_path, msg_key=None, msg_suffix=None)#

Revalidate configuration YAML from config_path according to schema.

Parameters:
  • yaml (YAML) – YAML to revalidate

  • schema (Map | MapPattern | Seq) – Schema to revalidate this YAML against

  • config_path (str) – File in which this configuration YAML was found

  • msg_key (str | None) – Key to report as being validated in the message

  • msg_suffix (str | None) – Message suffix

yarm.validate.validate_config(config_path)#

Validate config file before running report.

Parameters:

config_path (str) – Path to configuration file

Returns:

Validated configuration

Return type:

YAML

yarm.validate.validate_config_edited(config_yaml)#

Check whether config has been edited.

Parameters:

config_yaml (YAML) – Report configuration

Returns:

True if configuration has been edited, aborts otherwise.

Return type:

bool

yarm.validate.validate_config_schema(config_path)#

Return YAML for config file if it validates against top-level schema.

Parameters:

config_path (str) – Path to config file

Returns:

Configuration validated against top-level schema.

Return type:

YAML

yarm.validate.validate_key_import(config_yaml, config_path)#

Validate config key: import.

import:
  - path: MODULE_A.py
  - path: MODULE_B.py

This key allows the user to import their own custom Python code. Any imported function can be applied to the results of a query using the postprocess key.

Warning

If more than one module in this list defines the same function, the later module in the list will silently override the previous definition.

This may be desired behavior, but only if you expect it.

Parameters:
  • config_yaml (YAML) – Configuration to validate

  • config_path (str) – Configuration file
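The override described in the warning can be reproduced with a plain dictionary. The sketch below is only an assumption about the general mechanism: it builds two stand-in modules in memory instead of loading real files, and the names module_a, module_b, and clean are hypothetical.

```python
import types

# Two stand-in modules that both define clean(); real modules would be
# loaded from the paths listed under import:.
module_a = types.ModuleType("module_a")
module_a.clean = lambda value: value.strip()
module_b = types.ModuleType("module_b")
module_b.clean = lambda value: value.upper()

functions = {}
for module in (module_a, module_b):
    for name, obj in vars(module).items():
        if callable(obj) and not name.startswith("_"):
            # A later module replaces an earlier entry without any warning.
            functions[name] = obj

# Only module_b's version survives: the text is uppercased, not stripped.
print(repr(functions["clean"](" abc ")))
```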

yarm.validate.validate_key_input(config_yaml, config_path)#

Validate config key: input.

input:
  strip: false
  slugify_columns: false
  lowercase_columns: false
  uppercase_rows: false
  include_index: false

Parameters:
  • config_yaml (YAML) – Configuration to validate

  • config_path (str) – Configuration file

yarm.validate.validate_key_output(config_yaml, config_path)#

Validate config key: output.

output:
  dir: output
  basename: BASENAME
  export_tables: csv
  export_queries: csv
  styles:
    column_width: 15

Parameters:
  • config_yaml (YAML) – Configuration to validate

  • config_path (str) – Configuration file

yarm.validate.validate_key_output_dir(config_yaml)#

Prepare output directory.

Parameters:

config_yaml (YAML) – Report configuration

yarm.validate.validate_key_queries(config_yaml, config_path)#

Validate config key: queries.

queries:
  - name: QUERY A
    sql: SELECT * FROM table_from_spreadsheet AS s;

  - name: QUERY B
    # For the SQL, you can use a multiline string for readability.
    sql: >
      SELECT
      *
      FROM
      table_from_spreadsheet AS s
      JOIN
      table_from_csv AS c
      ON
      s.id = c.id
      ;
    replace:
      COLUMN_A:
        MATCH A1: REPLACE A1
        # You may want to quote strings with spaces and punctuation.
        "MATCH A2": "REPLACE A2"
      COLUMN_B:
        MATCH B1: REPLACE B1
    postprocess: postprocess_function

Parameters:
  • config_yaml (YAML) – Configuration to validate

  • config_path (str) – Configuration file

Important

A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().

yarm.validate.validate_key_tables_config(config_yaml, config_path)#

Validate config key: tables_config.

tables_config:
  TABLE_NAME_A:
    - path: SOURCE_A1.csv
      include_index: false
    - path: SOURCE_A2.csv
  TABLE_NAME_B:
    - path: SOURCE_B.xlsx
      sheet: B.1
      pivot:
        index: ID_COLUMN
        columns: KEY_COLUMN
        values: VALUE_COLUMN
      datetime:
        # You can supply a custom format string.
        COLUMN_1: "%Y-%m"
        # To use default datetime format, omit format string.
        COLUMN_2:
        # Spaces or punctuation in the column name? Add quotes.
        "COLUMN 3":

Parameters:
  • config_yaml (YAML) – Configuration to validate

  • config_path (str) – Configuration file
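To make the pivot stanza more concrete, here is a small standard-library sketch of a long-to-wide reshape. It assumes that index, columns, and values carry their usual pivot meaning; pivot_rows is a hypothetical helper that works on row dictionaries, not the DataFrame code yarm actually runs.

```python
from collections import defaultdict
from typing import Dict, List


def pivot_rows(rows: List[dict], index: str, columns: str, values: str) -> List[dict]:
    """Hypothetical helper: reshape long rows into one wide row per index value."""
    wide: Dict[object, dict] = defaultdict(dict)
    for row in rows:
        # Each distinct value of the `columns` field becomes a new column.
        wide[row[index]][row[columns]] = row[values]
    return [{index: key, **cells} for key, cells in wide.items()]


long_rows = [
    {"ID_COLUMN": 1, "KEY_COLUMN": "color", "VALUE_COLUMN": "red"},
    {"ID_COLUMN": 1, "KEY_COLUMN": "size", "VALUE_COLUMN": "L"},
    {"ID_COLUMN": 2, "KEY_COLUMN": "color", "VALUE_COLUMN": "blue"},
]
wide_rows = pivot_rows(long_rows, "ID_COLUMN", "KEY_COLUMN", "VALUE_COLUMN")
print(wide_rows)
```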

yarm.validate.validate_minimum_required_keys(config_yaml)#

Check whether config has minimum required keys.

Parameters:

config_yaml (YAML) – Configuration to validate

Returns:

True if config has minimum required keys, aborts otherwise.

Return type:

bool

Important

To modify which keys are required to run a report, update this function.

Tables#

Create tables from validated configuration.

yarm.tables.back_up_column(df, col)#

Save a backup copy of a column.

Parameters:
  • df (DataFrame) – data we are manipulating

  • col (str) – column to back up

Returns:

Data with copy of column col that has _raw appended to the column name. The original column can now safely be manipulated by further code.

yarm.tables.concat_dfs(conn, table_name, orig_df, new_df, input_file)#

Merge two dataframes with pd.concat into a single table.

Parameters:
  • conn – Temporary database in memory

  • table_name (str) – Table we are creating or appending to

  • orig_df (DataFrame) – Existing dataframe

  • new_df (DataFrame) – New dataframe we want to merge

  • input_file (str) – Actual file with source data

Returns:

Merged dataframe

Return type:

DataFrame

yarm.tables.create_table_df(conn, config, table_df, table_name, table, source, exists_mode)#

Create or append to a table from a configured source.

Note

Each table is defined by a list of one or more sources, all of which are merged into a single table.

This function is called separately for each source.

(See yarm.validate.validate_key_tables_config.)

Parameters:
  • conn (Connection) – Temporary database in memory

  • config (Nob) – Report configuration

  • table_df (None | DataFrame) – None if this table is new, otherwise the existing table

  • table_name (str) – Table we are creating or appending to

  • table (NobView) – Table configuration

  • source – Source configuration

  • exists_mode (str) – replace for a new table, otherwise append

Returns:

DataFrame of our new or updated table

Return type:

DataFrame

yarm.tables.create_tables(conn, config)#

Read data from tables_config: into database and create tables.

Parameters:
  • conn (Connection) – Temporary database in memory

  • config (Nob) – Report configuration

yarm.tables.df_input_options(df, config)#

Process input data using the options in input: key.

These options are applied to every input file.

Important

If you modify these options, you must also modify yarm.validate.validate_key_input().

Note

For per-source options, see df_tables_config_options().

Parameters:
  • df (DataFrame) – Data we will manipulate

  • config (Nob) – Report configuration

Returns:

Data with options applied

Return type:

DataFrame
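A sketch of how such file-wide options might be applied is shown below. The semantics are assumptions inferred from the option names only (strip trims whitespace, lowercase_columns lowercases column names, uppercase_rows uppercases text cells), and apply_input_options is a hypothetical helper operating on row dictionaries instead of a DataFrame.

```python
from typing import Dict, List


def apply_input_options(rows: List[dict], options: Dict[str, bool]) -> List[dict]:
    """Hypothetical helper: apply a few input: options to a list of row dicts."""
    result = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if options.get("strip") and isinstance(value, str):
                value = value.strip()
            if options.get("uppercase_rows") and isinstance(value, str):
                value = value.upper()
            if options.get("lowercase_columns"):
                column = column.lower()
            new_row[column] = value
        result.append(new_row)
    return result


rows = [{"Name": "  ada ", "Age": 36}]
cleaned = apply_input_options(rows, {"strip": True, "lowercase_columns": True})
print(cleaned)
```

Options that are absent or false leave the data untouched, which mirrors the all-false example under the input: key.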

yarm.tables.df_tables_config_options(df, source_config, table_name, input_file)#

Process options for a particular source in a table.

Important

If you modify these source options, you must also modify yarm.validate.validate_key_tables_config().

That function also ensures that all necessary keys are present (e.g., that if a pivot stanza is present, it also has index, columns, and values).

Note

For input: options applied to all input files, see df_input_options().

Parameters:
  • df (DataFrame) – Table we will modify

  • source_config (NobView) – Configuration for this source

  • table_name (str) – Name of this table

  • input_file – Path to this source data

Returns:

Updated table, with options applied from this source

Return type:

DataFrame

yarm.tables.get_include_index_all(config)#

Set default input:include_index for all tables.

Parameters:

config (Nob) – Report configuration

Returns:

Default value for whether to include the index in each table

Return type:

bool

Note

This value can be overridden by each particular table.

yarm.tables.get_include_index_table(table, table_name, include_index_all)#

Set include_index for a particular table.

Parameters:
  • table (NobView) – Configuration for this table

  • table_name (str) – Name for this table

  • include_index_all (bool) – Default include_index value

Returns:

Whether to include the index in this table

Return type:

bool
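The relationship between the global default and a per-table override can be sketched with plain dictionaries. This is a simplified assumption: resolve_include_index is hypothetical, and the configuration here is flattened compared with a real tables_config: entry.

```python
from typing import Optional


def resolve_include_index(table_value: Optional[bool], default_all: bool) -> bool:
    """Hypothetical helper: a per-table value wins over the global default."""
    # None stands for "this table did not set include_index".
    return default_all if table_value is None else table_value


config = {
    "input": {"include_index": False},
    "tables_config": {
        "TABLE_NAME_A": {"include_index": True},
        "TABLE_NAME_B": {},
    },
}
default_all = config["input"].get("include_index", False)
resolved = {
    name: resolve_include_index(table.get("include_index"), default_all)
    for name, table in config["tables_config"].items()
}
print(resolved)
```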

yarm.tables.input_source(input_format, conn, config, source_config, table_name, table_df, input_file, input_sheet)#

Input a source into a table DataFrame.

Parameters:
  • input_format (str) – Format for this source (e.g. CSV)

  • conn – Temporary database in memory

  • config (Nob) – Report configuration

  • source_config (NobView) – Configuration for this source

  • table_name (str) – Table we are creating or appending to

  • table_df (DataFrame | None) – None if this table is new, otherwise the existing table

  • input_file (str) – Actual file with source data

  • input_sheet (int | str | None) – Name of sheet if source is spreadsheet, otherwise None

Returns:

New or updated table

Return type:

DataFrame

Important

If a table has multiple sources, each subsequent source is merged with an outer join.
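An outer join in this sense keeps the union of columns from every source and leaves gaps where a source lacks a column, which is also what pd.concat does by default (join="outer"). The standard-library sketch below mimics that behavior on row dictionaries; concat_outer is a hypothetical helper, not part of yarm.

```python
from typing import List


def concat_outer(orig_rows: List[dict], new_rows: List[dict]) -> List[dict]:
    """Hypothetical helper: stack rows, keeping the union of all columns."""
    all_rows = orig_rows + new_rows
    columns: List[str] = []
    for row in all_rows:
        for column in row:
            if column not in columns:
                columns.append(column)
    # Columns missing from a source are filled with None rather than dropped.
    return [{column: row.get(column) for column in columns} for row in all_rows]


source_a = [{"id": 1, "name": "ada"}]
source_b = [{"id": 2, "email": "b@example.com"}]
merged = concat_outer(source_a, source_b)
print(merged)
```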

Queries#

Run queries on tables.

yarm.queries.df_query_postprocess(df, config, query_config)#

Apply the postprocess function for a particular query.

Parameters:
  • df (DataFrame) – Results of query

  • config (Nob) – Report configuration

  • query_config (NobView) – Configuration for this query

Returns:

Query data after applying postprocess function

Return type:

DataFrame

Important

A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().

yarm.queries.df_query_replace(df, query_config)#

Process replace: keys for a particular query.

Parameters:
  • df (DataFrame) – Query results

  • query_config (NobView) – Configuration for this query

Returns:

Query data with replacements applied

Return type:

DataFrame
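The nested shape of a replace: stanza (column, then match, then replacement) can be illustrated with a short sketch. It assumes exact, whole-cell matching, which may differ from the real behavior; replace_values is a hypothetical helper working on row dictionaries.

```python
from typing import Dict, List


def replace_values(rows: List[dict], replace: Dict[str, Dict[str, str]]) -> List[dict]:
    """Hypothetical helper: swap exact cell matches, column by column."""
    return [
        {
            # Look up the cell in that column's mapping; keep it if unmatched.
            column: replace.get(column, {}).get(value, value)
            for column, value in row.items()
        }
        for row in rows
    ]


replace = {"COLUMN_A": {"MATCH A1": "REPLACE A1"}}
rows = [
    {"COLUMN_A": "MATCH A1", "COLUMN_B": "MATCH A1"},
    {"COLUMN_A": "other", "COLUMN_B": "x"},
]
replaced = replace_values(rows, replace)
print(replaced)
```

Note that the matching value in COLUMN_B is left alone, because the mapping is scoped to COLUMN_A.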

yarm.queries.query_options(df, config, query_config)#

Process options for a particular query.

Parameters:
  • df (DataFrame) – Data with initial query results

  • config (Nob) – Report configuration

  • query_config (NobView) – Configuration for this query

Returns:

Query data with all options applied for this query.

Return type:

DataFrame

yarm.queries.run_queries(conn, config)#

Run all the queries.

Parameters:
  • config (Nob) – Report configuration

  • conn (Connection) – Temporary database in memory

yarm.queries.run_query(config, query, conn, sql, name)#

Run a query, apply the options, and return a DataFrame.

Parameters:
  • config (Nob) – Report configuration

  • query (NobView) – Configuration for this query

  • conn (Connection) – Temporary database in memory

  • sql (str) – SQL statement for this query

  • name (str) – Name for this query

Returns:

Initial query results

yarm.queries.save_query_to_database(df, conn, name)#

Save the processed query to the database.

Parameters:
  • df (DataFrame) – Query data after all processing

  • conn (Connection) – Temporary database in memory

  • name (str) – Name for this query
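Saving a query result under its own name means later queries can select from it like any other table. The sketch below shows that idea with the standard sqlite3 module and plain SQL. It is an assumption about the general pattern only: yarm works with DataFrames, and the table and column names here are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_from_csv (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO table_from_csv VALUES (?, ?)", [(1, 10.0), (2, 5.5), (1, 2.5)]
)

# Store the result of one query under its own name...
conn.execute(
    "CREATE TABLE totals AS "
    "SELECT id, SUM(amount) AS total FROM table_from_csv GROUP BY id"
)
# ...so that a later query can select from it like any other table.
rows = conn.execute("SELECT id, total FROM totals ORDER BY id").fetchall()
print(rows)
```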

Export#

Export data.

yarm.export.export_database(conn, config)#

Export database to sqlite3 database file.

Parameters:
  • conn (Connection) – Temporary database in memory

  • config (Nob) – Report configuration
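One way to persist an in-memory SQLite database is the standard library's Connection.backup method. The sketch below demonstrates it under stated assumptions: whether yarm uses this method is not confirmed here, and the file name BASENAME.db is a placeholder.

```python
import os
import sqlite3
import tempfile

memory_conn = sqlite3.connect(":memory:")
memory_conn.execute("CREATE TABLE report (id INTEGER)")
memory_conn.execute("INSERT INTO report VALUES (1)")
memory_conn.commit()

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "BASENAME.db")
    file_conn = sqlite3.connect(db_path)
    # Connection.backup copies the whole source database into the target.
    memory_conn.backup(file_conn)
    file_conn.close()

    # Reopen the file to confirm that the table and its row were written.
    check = sqlite3.connect(db_path)
    count = check.execute("SELECT COUNT(*) FROM report").fetchone()[0]
    check.close()
    print(count)
```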

yarm.export.export_database_tables(config, conn, ext, msg_table_exported_csv, msg_table_exported_sheet, export_basename, indent=1, verbose=2)#

Export all database tables as file(s).

Note

In this context, a database “table” may be either a table defined in tables_config: or a query defined in queries:. Both are saved as type table in the database.

Parameters:
  • config (Nob) – Report configuration

  • conn (Connection) – Temporary database in memory (see note)

  • ext (str) – Extension for output file

  • msg_table_exported_csv (str) – Message after exporting table to CSV

  • msg_table_exported_sheet (str) – Message after exporting table to sheet

  • export_basename (str) – Basename for single output file

  • indent (int) – Number of indents before message

  • verbose (int) – Minimum verbosity required to show the message
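Because tables and saved queries share the type table, both can be found with a single lookup in sqlite_master. The following sketch lists them and renders each as CSV text. It is illustrative only: the table names are invented, and it writes to in-memory buffers rather than to files in the output directory.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, name TEXT)")
conn.execute("INSERT INTO table_a VALUES (1, 'ada')")
conn.execute("CREATE TABLE query_a AS SELECT name FROM table_a")

# Tables and saved queries are both listed with type 'table'.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

exports = {}
for name in names:
    cursor = conn.execute(f'SELECT * FROM "{name}"')
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # The header row comes from the cursor's column descriptions.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    exports[name] = buffer.getvalue()

print(names)
print(exports["table_a"])
```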

yarm.export.export_df_csv(config, df, name, msg_export, indent=1, verbose=1)#

Export a single dataframe to CSV.

Parameters:
  • config (Nob) – Report configuration

  • df (DataFrame) – Data to export

  • name (str) – Name of dataframe, used as basename for output CSV

  • msg_export (str) – Confirmation message

  • indent (int) – Number of indents before message

  • verbose (int) – Minimum verbosity required to show the message

yarm.export.export_df_list_xlsx(config, df_list, export_basename, msg_export, indent=1, verbose=1)#

Export a list of dataframes to XLSX.

Parameters:
  • config (Nob) – Report configuration

  • df_list (List[Tuple[str, DataFrame]]) – List of tuples (see note in export_queries())

  • export_basename (str) – Basename for output spreadsheet

  • msg_export (str) – Confirmation message

  • indent (int) – Number of indents before message

  • verbose (int) – Minimum verbosity required to show the message

yarm.export.export_queries(config, df_list)#

Export all queries.

Parameters:
  • config (Nob) – Report configuration

  • df_list – List of tuples (see note)

Important

Each item in df_list should be a tuple of the form: (name, df)

Note

The default output format is XLSX, but this can be overridden with export_queries: csv under output:.

yarm.export.export_tables(config, conn, indent=1, verbose=2)#

Export the tables created from configuration.

Parameters:
  • config (Nob) – Report configuration

  • conn (Connection) – Temporary database in memory (see note)

  • indent (int) – Number of indents before message

  • verbose (int) – Minimum verbosity required to show the message

Important

This function expects the connected database to contain only the tables in the config, not the queries yet. Queries are exported later, in export_queries().

yarm.export.get_full_output_basename(config)#

Return full basename for output files, with path to output dir.

Parameters:

config (Nob) – Report configuration

Returns:

Basename for output files

Return type:

str

yarm.export.get_output_dir_path(config, filename)#

Get full path to filename in output dir.

Parameters:
  • config (Nob) – Report configuration

  • filename (str) – Output filename

Returns:

Full path to output filename

Return type:

str
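As a minimal sketch, building such a path amounts to joining the configured directory and the filename. The helper output_path is hypothetical, the configuration is a plain dictionary standing in for Nob, and how the real function resolves relative directories is not covered here.

```python
from pathlib import Path

config = {"output": {"dir": "output", "basename": "BASENAME"}}


def output_path(config: dict, filename: str) -> str:
    """Hypothetical helper: join the configured output dir and a filename."""
    return str(Path(config["output"]["dir"]) / filename)


print(output_path(config, "BASENAME.xlsx"))
```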

Helpers#

Helper functions.

yarm.helpers.abort(msg, error=None, file_path=None, data=None, ps=None, indent=0, suggest_verbose=1)#

Abort with error message and status 1.

Parameters:
  • msg (str) – Message

  • error (str | None) – Error message (see note in msg_options())

  • data (str | None) – Display this data after message, shown on same line

  • file_path (str | None) – File associated with this message, shown on separate line

  • ps (str | None) – Final postscript to add at end of message

  • indent (int) – Number of indents before message

  • suggest_verbose (int) – Verbosity level to pass to msg_suggest_verbose()

yarm.helpers.key_show_message(key_msg, config, verbose=1)#

For each key, if that key is in config, show the matching message.

Important

key_msg must be a list of tuples of the form: (key, message).

key_msg: list = [
    (s.KEY_INPUT__STRIP, s.MSG_STRIP_WHITESPACE),
    (s.KEY_INPUT__SLUGIFY_COLUMNS, s.MSG_SLUGIFY_COLUMNS),
    (s.KEY_INPUT__LOWERCASE_COLUMNS, s.MSG_LOWERCASE_COLUMNS),
    (s.KEY_INPUT__UPPERCASE_ROWS, s.MSG_UPPERCASE_ROWS),
]
key_show_message(key_msg, config, verbose=1)

Parameters:
  • key_msg (List[Tuple[str, str]]) – List of tuples of the form: (key, message)

  • config (Nob) – Report configuration

  • verbose (int) – Minimum verbosity level required to show this message

yarm.helpers.load_yaml_file(input_file, schema)#

Read YAML file into strictyaml, and validate against a schema.

Parameters:
  • input_file (str) – path to YAML file

  • schema (Any) – strictyaml schema

Returns:

Validated YAML

Return type:

YAML

yarm.helpers.msg(msg, verbose=0, indent=0)#

Show message.

Note

By default, this message is shown even if the user did not pass a -v flag.

Parameters:
  • msg (str) – Message to display

  • verbose (int) – Minimum verbosity required to show this message

  • indent (int) – Number of indents before message

yarm.helpers.msg_options(msg, prefix=None, prefix_color=None, error=None, file_path=None, data=None, ps=None, indent=0)#

Show a message with various options.

Important

This function is not normally used directly. Instead, use one of the wrappers documented on this page, such as abort(), warn(), or success().

Parameters:
  • msg (str) – Message

  • prefix (str | None) – e.g. Error

  • prefix_color (str | None) – e.g. red

  • error (str | None) – Error message (see note)

  • data (str | None) – Display this data after message, shown on same line

  • file_path (str | None) – File associated with this message, shown on separate line

  • ps (str | None) – Final postscript to add at end of message

  • indent (int) – Number of indents before message

Note

This function expects error to be type str, but the error returned by an except: clause may need to be converted with str().
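The conversion mentioned in the note looks like this in practice. The function warn_stub is a hypothetical stand-in for any message helper that expects error as a string; it is not yarm's implementation.

```python
from typing import Optional


def warn_stub(msg: str, error: Optional[str] = None) -> str:
    """Hypothetical stand-in for a message helper that expects error as str."""
    return msg if error is None else f"{msg}: {error}"


try:
    int("not a number")
except ValueError as error:
    # Convert the exception object to text before handing it on.
    text = warn_stub("Could not parse value", error=str(error))

print(text)
```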

yarm.helpers.msg_suggest_verbose(suggest_verbose)#

Show message suggesting rerunning with a higher level of verbosity.

Note

This message will only be shown if the verbosity level is set lower than suggest_verbose.

Parameters:

suggest_verbose (int) – Verbosity level that message will suggest

yarm.helpers.msg_with_data(msg, data, verbose=1, indent=0)#

Show message with accompanying data.

Note

By default, the message will only be shown if the user used at least one -v flag.

Parameters:
  • msg (str) – Message to display

  • data (str) – Display this data after message, shown on same line

  • verbose (int) – Minimum verbosity required to show this message

  • indent (int) – Number of indents before message

yarm.helpers.overwrite_file(path, indent=1)#

Overwrite a file if it exists.

Note

Technically, this function only removes the file. The new file must be written separately.

Note

If a prompt question is shown, it is not indented.

Parameters:
  • path (str) – File to overwrite

  • indent (int) – Number of indents before message

Returns:

True if file existed and was removed, False otherwise

Return type:

bool
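The remove-then-report behavior can be sketched in a few lines. This version deliberately leaves out any prompt or message, and remove_if_exists is a hypothetical name.

```python
import os
import tempfile


def remove_if_exists(path: str) -> bool:
    """Hypothetical helper: delete path if it is a file and report what happened."""
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "report.csv")
    with open(target, "w") as handle:
        handle.write("old contents\n")
    first = remove_if_exists(target)   # The file existed and is now gone.
    second = remove_if_exists(target)  # Nothing left to remove.
    print(first, second)
```

The caller is still responsible for writing the new file afterwards.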

yarm.helpers.show_df(df, data, verbose=3)#

Display a dataframe.

Parameters:
  • df (DataFrame) – Data to display

  • data (str) – Display this data after message, shown on same line

  • verbose (int) – Minimum verbosity level required to show this message

yarm.helpers.success(msg, file_path=None, data=None, ps=None)#

Show success message.

Parameters:
  • msg (str) – Message

  • data (str | None) – Display this data after message, shown on same line

  • file_path (str | None) – File associated with this message, shown on separate line

  • ps (str | None) – Final postscript to add at end of message

Return type:

None

yarm.helpers.verbose_ge(verbose)#

Return True if verbosity >= verbose.

Parameters:

verbose (int) – verbosity level

Returns:

True if the user passed at least verbose -v flags, otherwise False

Return type:

bool
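Verbosity gating of this kind can be sketched as a simple comparison. Everything below is an assumption for illustration: the module-level VERBOSITY constant, the helper names, and the two-space indent are not taken from yarm.

```python
VERBOSITY = 1  # Assumed: the number of -v flags the user passed.


def verbosity_at_least(verbose: int) -> bool:
    """Hypothetical check: is the current verbosity high enough?"""
    return VERBOSITY >= verbose


def show(text: str, verbose: int = 0, indent: int = 0) -> str:
    """Return the text that would be shown, or "" if it is suppressed."""
    if not verbosity_at_least(verbose):
        return ""
    return "  " * indent + text


print(repr(show("always shown")))
print(repr(show("needs -vv", verbose=2)))
print(repr(show("indented detail", verbose=1, indent=2)))
```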

yarm.helpers.warn(msg, error=None, file_path=None, data=None, ps=None, indent=0)#

Show warning, but proceed.

Parameters:
  • msg (str) – Message

  • error (str | None) – Error message (see note in msg_options())

  • data (str | None) – Display this data after message, shown on same line

  • file_path (str | None) – File associated with this message, shown on separate line

  • ps (str | None) – Final postscript to add at end of message

  • indent (int) – Number of indents before message

Return type:

None

Testing#