Reference#

yarm#

yarm: Yet Another Report Maker.

This page contains documentation for almost every function in yarm.

Developers are most likely to find it useful, but users may find some critical details here as well.

No matter what, you’ll want to read the introductory documentation first.

Main#

Command-line interface.

See Usage.

Settings#

Settings class.

class yarm.settings.Settings#

Define global settings.

When you need a setting in a function, make an instance of this class. These settings should not be changed elsewhere. Treat them as constants.

Validate#

Validate configuration file.

class yarm.validate.Slug#

Class to use slugify to make spelling consistent.

validate_scalar(chunk)#: Use slugify to make spelling consistent.

Note

We use _ rather than - for the separator.

The underscore seems more Pythonic.

Also, ansible config files seem to favor underscores. See: https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html.
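
As a rough illustration of the underscore convention described in this note, the sketch below normalizes spellings using only the standard library. The helper name slugify_underscore is hypothetical; yarm itself relies on slugify, whose exact behavior may differ.

```python
import re


def slugify_underscore(text: str) -> str:
    """Lowercase text and join its alphanumeric runs with underscores.

    Hypothetical stand-in for illustration; not yarm's implementation.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "_".join(words)


print(slugify_underscore("Include Index"))
print(slugify_underscore("export-tables"))
```

With this convention, variants such as "Include Index" and "include-index" collapse to the same spelling, which is the point of running keys through a slug step.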

class yarm.validate.StrNotEmpty#

A string that must not be empty.

static validate_scalar(chunk)#

Invalidate if string is empty.

Parameters: chunk (YAMLChunk) – YAML to be validated
Returns: Validated string
Return type: str

yarm.validate.check_is_file(list_of_paths, key)#

For each item in a list, check that the value is a file.

Parameters:

list_of_paths (List[str | Dict]) – List of strings or dictionaries
key (str | None) – If dictionaries, this is the key for the path (e.g. path)

yarm.validate.check_key(key, config_yaml)#

Check whether a key exists in configuration YAML.

Parameters:

key (str) – Name of key
config_yaml (YAML) – Configuration to check

Returns:

name of key if present, None if not

Return type:

str | None
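
The return contract described above (the key name if present, None if not) can be sketched as follows. This is only an illustration: a plain dict stands in for the YAML object, and check_key_sketch is a hypothetical name.

```python
from typing import Any, Mapping, Optional


def check_key_sketch(key: str, config: Mapping[str, Any]) -> Optional[str]:
    """Return the key name if it is present in config, otherwise None."""
    return key if key in config else None


# A plain dict standing in for validated configuration YAML.
config = {"output": {"dir": "output"}, "queries": []}
found = check_key_sketch("output", config)
missing = check_key_sketch("import", config)
print(found, missing)
```

Returning the name rather than a bare boolean lets a caller test the result for truthiness and reuse it in a message in one step.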

yarm.validate.get_default_config()#

Return default configuration.

Return type: Nob

yarm.validate.msg_validating_key(key, suffix=None, verbose=1)#

Show a message that a key is being validated.

Parameters:

key (str) – Key being validated
suffix (str | None) – String to add after message
verbose (int) – Minimum verbosity level required to show this message

yarm.validate.revalidate_yaml(yaml, schema, config_path, msg_key=None, msg_suffix=None)#

Revalidate configuration YAML from config_path according to schema.

Parameters:

yaml (YAML) – YAML to revalidate
schema (Map | MapPattern | Seq) – Schema to revalidate this YAML against
config_path (str) – File in which this configuration YAML was found
msg_key (str | None) – Message that this key is validating
msg_suffix (str | None) – Message suffix

yarm.validate.validate_config(config_path)#

Validate config file before running report.

Parameters: config_path (str) – Path to configuration file
Returns: Validated configuration
Return type: YAML

See also

validate_config_schema()
validate_config_edited()
validate_minimum_required_keys()

yarm.validate.validate_config_edited(config_yaml)#

Check whether config has been edited.

Parameters: config_yaml (YAML) – Report configuration
Returns: True if the configuration has been edited; aborts otherwise.
Return type: bool

yarm.validate.validate_config_schema(config_path)#

Return YAML for config file if it validates against top-level schema.

Parameters: config_path (str) – Path to config file
Returns: Configuration validated against top-level schema.
Return type: YAML

See also

validate_key_tables_config()
validate_key_import()
validate_key_output()
validate_key_input()
validate_key_queries()

yarm.validate.validate_key_import(config_yaml, config_path)#

Validate config key: import.

import:
  - path: MODULE_A.py
  - path: MODULE_B.py

This key allows the user to import their own custom Python code. Any imported function can be applied to the results of a query using the postprocess key.

Warning

If more than one module in this list defines the same function, the later module in the list will silently override the previous definition.

This may be desired behavior, but only if you expect it.
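
The override behavior in this warning can be pictured with a small sketch. It does not show how yarm actually imports modules; it only models the name-shadowing effect, with each hypothetical "module" represented as a dict of function names.

```python
from typing import Callable, Dict, List

FunctionTable = Dict[str, Callable[[int], int]]

# Two hypothetical "modules", modeled as dicts of name -> function.
module_a: FunctionTable = {
    "clean": lambda x: x + 1,
    "only_in_a": lambda x: x * 2,
}
module_b: FunctionTable = {
    "clean": lambda x: x + 100,
}


def collect_functions(modules: List[FunctionTable]) -> FunctionTable:
    """Merge modules in list order; later definitions replace earlier ones."""
    registry: FunctionTable = {}
    for module in modules:
        registry.update(module)
    return registry


registry = collect_functions([module_a, module_b])
result = registry["clean"](1)  # uses module_b's definition, no warning given
print(result)
```

Functions defined in only one module are unaffected; only names that collide are replaced, and nothing reports the collision.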

Parameters:

config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file

See also

validate_key_queries()
yarm.queries.df_query_postprocess()

yarm.validate.validate_key_input(config_yaml, config_path)#

Validate config key: input.

input:
  strip: false
  slugify_columns: false
  lowercase_columns: false
  uppercase_rows: false
  include_index: false

Parameters:

config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file

See also

yarm.tables.df_input_options()

yarm.validate.validate_key_output(config_yaml, config_path)#

Validate config key: output.

output:
  dir: output
  basename: BASENAME
  export_tables: csv
  export_queries: csv
  styles:
    column_width: 15

Parameters:

config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file

yarm.validate.validate_key_output_dir(config_yaml)#

Prepare output directory.

Parameters: config_yaml (YAML) – Report configuration

yarm.validate.validate_key_queries(config_yaml, config_path)#

Validate config key: queries.

queries:
  - name: QUERY A
    sql: SELECT * FROM table_from_spreadsheet AS s;

  - name: QUERY B
    # For the SQL, you can use a multiline string for readability.
    sql: >
      SELECT
      *
      FROM
      table_from_spreadsheet AS s
      JOIN
      table_from_csv AS c
      ON
      s.id = c.id
      ;
    replace:
      COLUMN_A:
        MATCH A1: REPLACE A1
        # You may want to quote strings with spaces and punctuation.
        "MATCH A2": "REPLACE A2"
      COLUMN_B:
        MATCH B1: REPLACE B1
    postprocess: postprocess_function

Parameters:

config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file

Important

A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().

See also

validate_key_import()
yarm.queries.df_query_postprocess()

yarm.validate.validate_key_tables_config(config_yaml, config_path)#

Validate config key: tables_config.

tables_config:
  TABLE_NAME_A:
    - path: SOURCE_A1.csv
      include_index: false
    - path: SOURCE_A2.csv
  TABLE_NAME_B:
    - path: SOURCE_B.xlsx
      sheet: B.1
      pivot:
        index: ID_COLUMN
        columns: KEY_COLUMN
        values: VALUE_COLUMN
      datetime:
        # You can supply a custom format string.
        COLUMN_1: "%Y-%m"
        # To use default datetime format, omit format string.
        COLUMN_2:
        # Spaces or punctuation in the column name? Add quotes.
        "COLUMN 3":

Parameters:

config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file

yarm.validate.validate_minimum_required_keys(config_yaml)#

Check whether config has minimum required keys.

Parameters: config_yaml (YAML) – Configuration to validate
Returns: True if config has minimum required keys; aborts otherwise.
Return type: bool

Important

To modify which keys are required to run a report, update this function.

Tables#

Create tables from validated configuration.

yarm.tables.back_up_column(df, col)#

Save a backup copy of a column.

Parameters:

df (DataFrame) – data we are manipulating
col (str) – column to back up

Returns:

Data with copy of column col that has _raw appended to the column name. The original column can now safely be manipulated by further code.

yarm.tables.concat_dfs(conn, table_name, orig_df, new_df, input_file)#

Merge two dataframes with pd.concat into a single table.

Parameters:

conn – Temporary database in memory
table_name (str) – Table we are creating or appending to
orig_df (DataFrame) – Existing dataframe
new_df (DataFrame) – New dataframe we want to merge
input_file (str) – Actual file with source data

Returns:

Merged dataframe

Return type:

DataFrame

yarm.tables.create_table_df(conn, config, table_df, table_name, table, source, exists_mode)#

Create or append to a table from a configured source.

Note

Each table is defined by a list of one or more sources, all of which are merged into a single table.

This function is called separately for each source.

(See yarm.validate.validate_key_tables_config.)

Parameters:

conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
table_df (None | DataFrame) – None if this table is new, otherwise the existing table
table_name (str) – Table we are creating or appending to
table (NobView) – Table configuration
source – Source configuration
exists_mode (str) – replace for a new table, otherwise append

Returns:

DataFrame of our new or updated table

Return type:

DataFrame

See also

input_source()
create_tables()
df_input_options()
df_tables_config_options()
yarm.validate.validate_key_tables_config()

yarm.tables.create_tables(conn, config)#

Read data from tables_config: into database and create tables.

Parameters:

conn (Connection) – Temporary database in memory
config (Nob) – Report configuration

See also

create_table_df()
yarm.export.export_tables()

yarm.tables.df_input_options(df, config)#

Process input data using the options in input: key.

These options are applied to every input file.

Important

If you modify these options, you must also modify yarm.validate.validate_key_input().

Note

For per-source options, see df_tables_config_options().

Parameters:

df (DataFrame) – Data we will manipulate
config (Nob) – Report configuration

Returns:

Data with options applied

Return type:

DataFrame

See also

yarm.validate.validate_key_input()
create_tables()
df_tables_config_options()

yarm.tables.df_tables_config_options(df, source_config, table_name, input_file)#

Process options for a particular source in a table.

Important

If you modify these source options, you must also modify yarm.validate.validate_key_tables_config().

That function also ensures that all necessary keys are present (e.g., that if a pivot stanza is present, it also has index, columns, and values).
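
The pivot requirement mentioned above (a pivot stanza must carry index, columns, and values) could be checked along these lines. This is a minimal sketch with plain dicts; missing_pivot_keys is a hypothetical helper, not the validation code yarm uses.

```python
from typing import Any, List, Mapping

REQUIRED_PIVOT_KEYS = ("index", "columns", "values")


def missing_pivot_keys(source: Mapping[str, Any]) -> List[str]:
    """Return the required pivot keys absent from a source's pivot stanza.

    A source without a pivot stanza has nothing to check.
    """
    if "pivot" not in source:
        return []
    pivot = source["pivot"] or {}
    return [key for key in REQUIRED_PIVOT_KEYS if key not in pivot]


complete = {
    "path": "SOURCE_B.xlsx",
    "pivot": {"index": "ID_COLUMN", "columns": "KEY_COLUMN", "values": "VALUE_COLUMN"},
}
partial = {"path": "SOURCE_B.xlsx", "pivot": {"index": "ID_COLUMN"}}

print(missing_pivot_keys(complete))
print(missing_pivot_keys(partial))
```

Checking this during validation means the code that applies the options can assume all three keys exist.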

Note

For input: options applied to all input files, see df_input_options().

Parameters:

df (DataFrame) – Table we will modify
source_config (NobView) – Configuration for this source
table_name (str) – Name of this table
input_file – Path to this source data

Returns:

Updated table, with options applied from this source

Return type:

DataFrame

See also

create_tables()
yarm.validate.validate_key_tables_config()
df_input_options()

yarm.tables.get_include_index_all(config)#

Set default input:include_index for all tables.

Parameters: config (Nob) – Report configuration
Returns: Default value for whether to include the index in each table
Return type: bool

Note

This value can be overridden by each particular table.

See also

get_include_index_table()

yarm.tables.get_include_index_table(table, table_name, include_index_all)#

Set include_index for a particular table.

Parameters:

table (NobView) – Configuration for this table
table_name (str) – Name for this table
include_index_all (bool) – Default include_index value

Returns:

Whether to include the index in this table

Return type:

bool

See also

get_include_index_all()

yarm.tables.input_source(input_format, conn, config, source_config, table_name, table_df, input_file, input_sheet)#

Input a source into a table DataFrame.

Parameters:

input_format (str) – Format for this source (e.g. CSV)
conn – Temporary database in memory
config (Nob) – Report configuration
source_config (NobView) – Configuration for this source
table_name (str) – Table we are creating or appending to
table_df (DataFrame | None) – None if this table is new, otherwise the existing table
input_file (str) – Actual file with source data
input_sheet (int | str | None) – Name of sheet if source is spreadsheet, otherwise None

Returns:

New or updated table

Return type:

DataFrame

Important

If a table has multiple sources, each subsequent source is merged with an outer join.
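
To make the merge behavior concrete, here is a plain-Python sketch. It assumes that "outer join" refers to the join="outer" behavior of pd.concat (rows are stacked and the union of columns is kept), which matches the description of concat_dfs() but is not confirmed here. Lists of dicts stand in for DataFrames, and concat_outer is a hypothetical name.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def concat_outer(orig_rows: List[Row], new_rows: List[Row]) -> List[Row]:
    """Stack two row lists, keeping the union of their columns.

    Cells missing from a source are filled with None.
    """
    all_rows = orig_rows + new_rows
    columns: List[str] = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in all_rows]


source_a1 = [{"id": 1, "name": "Ada"}]
source_a2 = [{"id": 2, "email": "bob@example.com"}]
table = concat_outer(source_a1, source_a2)
print(table)
```

Under this reading, a column that appears in only one source still ends up in the table, with empty cells for rows from the other sources.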

See also

create_table_df()

Queries#

Run queries on tables.

yarm.queries.df_query_postprocess(df, config, query_config)#

Process postprocess function for a particular query.

Parameters:

df (DataFrame) – Results of query
config (Nob) – Report configuration
query_config (NobView) – Configuration for this query

Returns:

Query data after applying postprocess function

Return type:

DataFrame

Important

A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().

yarm.queries.df_query_replace(df, query_config)#

Process replace: keys for a particular query.

Parameters:

df (DataFrame) – Query results
query_config (NobView) – Configuration for this query

Returns:

Query data with replacements applied

Return type:

DataFrame
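
The shape of the replace: stanza (column, then match, then replacement) can be illustrated with a short sketch. It assumes exact, whole-value matching, which may not be how yarm matches; lists of dicts stand in for the DataFrame, and apply_replacements is a hypothetical name.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_replacements(rows: List[Row], replace: Dict[str, Dict[Any, Any]]) -> List[Row]:
    """Return new rows with per-column replacements applied.

    Assumes exact, whole-value matching (an assumption for this sketch).
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        for column, mapping in replace.items():
            if column in new_row and new_row[column] in mapping:
                new_row[column] = mapping[new_row[column]]
        result.append(new_row)
    return result


replace = {"COLUMN_A": {"MATCH A1": "REPLACE A1", "MATCH A2": "REPLACE A2"}}
rows = [
    {"COLUMN_A": "MATCH A1", "COLUMN_B": "x"},
    {"COLUMN_A": "other", "COLUMN_B": "y"},
]
replaced = apply_replacements(rows, replace)
print(replaced)
```

Values with no matching entry pass through unchanged, as do columns that are not named in the stanza.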

yarm.queries.query_options(df, config, query_config)#

Process options for a particular query.

Parameters:

df (DataFrame) – Data with initial query results
config (Nob) – Report configuration
query_config (NobView) – configuration for this query

Returns:

Query data with all options applied for this query.

Return type:

DataFrame

See also

df_query_replace()
df_query_postprocess()
yarm.validate.validate_key_queries()

yarm.queries.run_queries(conn, config)#

Run all the queries.

Parameters:

config (Nob) – Report configuration
conn (Connection) – Temporary database in memory

See also

run_query()
save_query_to_database()
yarm.export.export_queries()

yarm.queries.run_query(config, query, conn, sql, name)#

Run a query, apply the options, and return a DataFrame.

Parameters:

config (Nob) – Report configuration
query (NobView) – Configuration for this query
conn (Connection) – Temporary database in memory
sql (str) – SQL statement for this query
name (str) – Name for this query

Returns:

Initial query results

See also

query_options()

yarm.queries.save_query_to_database(df, conn, name)#

Save the processed query to the database.

Parameters:

df (DataFrame) – Query data after all processing
conn (Connection) – Temporary database in memory
name (str) – Name for this query

Export#

Export data.

yarm.export.export_database(conn, config)#

Export database to sqlite3 database file.

Parameters:

conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
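
One way to copy an in-memory database to a file is the standard library's Connection.backup(). The sketch below shows that technique under stated assumptions: whether yarm uses it is not confirmed here, and the table name and output path are hypothetical.

```python
import os
import sqlite3
import tempfile

# Build a small in-memory database to stand in for the report database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER, name TEXT)")
conn.execute("INSERT INTO example VALUES (1, 'Ada')")
conn.commit()

# Copy the in-memory database into a file on disk.
export_path = os.path.join(tempfile.mkdtemp(), "report.db")  # hypothetical path
dest = sqlite3.connect(export_path)
conn.backup(dest)
dest.close()

# Reopen the file to confirm the table was copied.
check = sqlite3.connect(export_path)
exported_rows = check.execute("SELECT id, name FROM example").fetchall()
check.close()
conn.close()
print(exported_rows)
```

The resulting file can be opened with any SQLite client, which makes it handy for inspecting intermediate tables after a report run.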

yarm.export.export_database_tables(config, conn, ext, msg_table_exported_csv, msg_table_exported_sheet, export_basename, indent=1, verbose=2)#

Export all database tables as file(s).

Note

In this context, a database “table” may be either a table defined in tables_config or a query defined in queries:. Both are saved as type table in the database.

Parameters:

config (Nob) – Report configuration
conn (Connection) – Temporary database in memory (see note)
ext (str) – Extension for output file
msg_table_exported_csv (str) – Message after exporting table to CSV
msg_table_exported_sheet (str) – Message after exporting table to sheet
export_basename (str) – Basename for single output file
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
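
The note above, that tables and saved queries are both stored as type table, can be seen with plain sqlite3. This is a sketch only; the table and query names are hypothetical, and how yarm writes its tables is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table as it might be created from tables_config (hypothetical name).
conn.execute("CREATE TABLE table_from_csv (id INTEGER, name TEXT)")
conn.execute("INSERT INTO table_from_csv VALUES (1, 'Ada')")
# A query result saved back into the database as an ordinary table.
conn.execute('CREATE TABLE "QUERY A" AS SELECT * FROM table_from_csv')

# sqlite_master lists both with type "table".
listing = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
print(sorted(listing))
```

Because the database cannot tell the two apart, an export that walks every table will pick up whatever has been saved so far, tables and queries alike.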

yarm.export.export_df_csv(config, df, name, msg_export, indent=1, verbose=1)#

Export a single dataframe to CSV.

Parameters:

config (Nob) – Report configuration
df (DataFrame) – Data to export
name (str) – Name of dataframe, used as basename for output CSV
msg_export (str) – Confirmation message
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message

yarm.export.export_df_list_xlsx(config, df_list, export_basename, msg_export, indent=1, verbose=1)#

Export a list of dataframes to XLSX.

Parameters:

config (Nob) – Report configuration
df_list (List[Tuple[str, DataFrame]]) – List of tuples (see note in export_queries())
export_basename (str) – Basename for output spreadsheet
msg_export (str) – Confirmation message
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message

yarm.export.export_queries(config, df_list)#

Export all queries.

Parameters:

config (Nob) – Report configuration
df_list – List of tuples (see note)

Important

Each item in df_list should be a tuple of the form: (name, df)

Note

The default output format is XLSX, but this can be overridden with export_queries: csv under output:.

See also

export_df_csv()
export_df_list_xlsx()

yarm.export.export_tables(config, conn, indent=1, verbose=2)#

Export the tables created from configuration.

Parameters:

config (Nob) – Report configuration
conn (Connection) – Temporary database in memory (see note)
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message

Important

This function expects the connected database to contain only the tables in the config, not the queries yet. Queries are exported later, in export_queries().

yarm.export.get_full_output_basename(config)#

Return full basename for output files, with path to output dir.

Parameters: config (Nob) – Report configuration
Returns: Basename for output files
Return type: str

yarm.export.get_output_dir_path(config, filename)#

Get full path to filename in output dir.

Parameters:

config (Nob) – Report configuration
filename (str) – output filename

Returns:

Full path to output filename

Return type:

str
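
A minimal sketch of joining the configured output directory with a filename follows. It assumes the directory lives under output: dir, as in the output: example earlier on this page; a plain dict stands in for the Nob configuration, output_path_sketch is a hypothetical name, and the real function may also resolve the path to an absolute one.

```python
from pathlib import Path
from typing import Any, Mapping


def output_path_sketch(config: Mapping[str, Any], filename: str) -> str:
    """Join the configured output directory and a filename."""
    return str(Path(config["output"]["dir"]) / filename)


# A plain dict standing in for the report configuration.
config = {"output": {"dir": "output", "basename": "BASENAME"}}
full_path = output_path_sketch(config, "BASENAME.xlsx")
print(full_path)
```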

Helpers#

Helper functions.

yarm.helpers.abort(msg, error=None, file_path=None, data=None, ps=None, indent=0, suggest_verbose=1)#

Abort with error message and status 1.

Parameters:

msg (str) – Message
error (str | None) – Error message (see note in msg_options())
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message
indent (int) – Number of indents before message
suggest_verbose (int) – Verbosity level to pass to msg_suggest_verbose()

yarm.helpers.key_show_message(key_msg, config, verbose=1)#

For each key, if that key is in config, show the matching message.

Important

key_msg must be a list of tuples of the form: (key, message).

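from yarm.helpers import key_show_message
from yarm.settings import Settings

s = Settings()
# Each tuple pairs a config key with the message to show if that key is set.
key_msg: list = [
    (s.KEY_INPUT__STRIP, s.MSG_STRIP_WHITESPACE),
    (s.KEY_INPUT__SLUGIFY_COLUMNS, s.MSG_SLUGIFY_COLUMNS),
    (s.KEY_INPUT__LOWERCASE_COLUMNS, s.MSG_LOWERCASE_COLUMNS),
    (s.KEY_INPUT__UPPERCASE_ROWS, s.MSG_UPPERCASE_ROWS),
]
# config is the validated report configuration (Nob).
key_show_message(key_msg, config, verbose=1)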

Parameters:

key_msg (List[Tuple[str, str]]) – List of tuples of the form: (key, message)
config (Nob) – Report configuration
verbose (int) – Minimum verbosity level required to show this message

yarm.helpers.load_yaml_file(input_file, schema)#

Read YAML file into strictyaml, and validate against a schema.

Parameters:

input_file (str) – path to YAML file
schema (Any) – strictyaml schema

Returns:

Validated YAML

Return type:

YAML

yarm.helpers.msg(msg, verbose=0, indent=0)#

Show message.

Note

By default, this message will still show even if user did not use a -v flag.

Parameters:

msg (str) – Message to display
verbose (int) – Minimum verbosity required to show this message
indent (int) – Number of indents before message

yarm.helpers.msg_options(msg, prefix=None, prefix_color=None, error=None, file_path=None, data=None, ps=None, indent=0)#

Show a message with various options.

Important

This function is not normally used directly. Instead, use:

msg()
msg_with_data()
abort()
warn()
success()

Parameters:

msg (str) – Message
prefix (str | None) – e.g. Error
prefix_color (str | None) – e.g. red
error (str | None) – Error message (see note)
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message
indent (int) – Number of indents before message

Note

This function expects error to be type str, but the error returned by an except: clause may need to be converted with str().
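
The conversion described in this note looks roughly like the following. Here warn_sketch is a hypothetical stand-in for a message helper that expects error as a string; it is not yarm's warn().

```python
from typing import List, Optional

shown: List[str] = []


def warn_sketch(msg: str, error: Optional[str] = None) -> None:
    """Hypothetical stand-in for a message helper that expects error as str."""
    line = msg if error is None else f"{msg} ({error})"
    shown.append(line)
    print(line)


try:
    int("not a number")
except ValueError as exc:
    # exc is an exception object, so convert it before passing it along.
    warn_sketch("Could not parse value", error=str(exc))
```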

yarm.helpers.msg_suggest_verbose(suggest_verbose)#

Show message suggesting rerunning with a higher level of verbosity.

Note

This message will only be shown if the verbosity level is set lower than suggest_verbose.

Parameters: suggest_verbose (int) – Verbosity level that message will suggest

yarm.helpers.msg_with_data(msg, data, verbose=1, indent=0)#

Show message with accompanying data.

Note

By default, the message will only be shown if the user used at least one -v flag.

Parameters:

msg (str) – Message to display
data (str) – Display this data after message, shown on same line
verbose (int) – Minimum verbosity required to show this message
indent (int) – Number of indents before message

yarm.helpers.overwrite_file(path, indent=1)#

Overwrite a file if it exists.

Note

Technically, this function only removes the file. The new file must be written separately.

Note

If a prompt question is shown, it is not indented.

Parameters:

path (str) – File to overwrite
indent (int) – Number of indents before message

Returns:

True if file existed and was removed, False otherwise

Return type:

bool
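
The remove-then-write pattern described in the first note can be sketched with the standard library. This leaves out any confirmation prompt, and remove_if_exists is a hypothetical name rather than yarm's function.

```python
import os
import tempfile


def remove_if_exists(path: str) -> bool:
    """Remove path if it is an existing file; report whether it was removed."""
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False


target = os.path.join(tempfile.mkdtemp(), "BASENAME.csv")
with open(target, "w") as handle:
    handle.write("old contents\n")

first = remove_if_exists(target)   # file existed and was removed
second = remove_if_exists(target)  # nothing left to remove

# The new file is written as a separate step.
with open(target, "w") as handle:
    handle.write("new contents\n")
print(first, second)
```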

yarm.helpers.show_df(df, data, verbose=3)#

Display a dataframe.

Parameters:

df (DataFrame) – Data to display
data (str) – Display this data after message, shown on same line
verbose (int) – Minimum verbosity level required to show this message

yarm.helpers.success(msg, file_path=None, data=None, ps=None)#

Show success message.

Parameters:

msg (str) – Message
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message

Return type:

None

yarm.helpers.verbose_ge(verbose)#

Return True if verbosity >= verbose.

Parameters: verbose (int) – Verbosity level to compare against
Returns: True if the user passed at least verbose -v flags, otherwise False
Return type: bool
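
The comparison is simple enough to sketch directly. The Verbosity class below is a hypothetical holder for the number of -v flags; how yarm actually stores the verbosity level is not shown here.

```python
class Verbosity:
    """Hypothetical holder for the number of -v flags the user passed."""

    def __init__(self, level: int = 0) -> None:
        self.level = level

    def ge(self, verbose: int) -> bool:
        """Return True if the current level is at least verbose."""
        return self.level >= verbose


verbosity = Verbosity(level=2)  # as if the user ran with -vv
print(verbosity.ge(1), verbosity.ge(3))
```

This is why a message with verbose=0 always shows, while one with verbose=3 needs at least three -v flags.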

yarm.helpers.warn(msg, error=None, file_path=None, data=None, ps=None, indent=0)#

Show warning, but proceed.

Parameters:

msg (str) – Message
error (str | None) – Error message (see note in msg_options())
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message
indent (int) – Number of indents before message

Return type:

None
