Reference#
yarm#
yarm: Yet Another Report Maker.
This page contains documentation for almost every function in yarm.
Developers are most likely to find it useful, but users may also find a critical detail here.
No matter what, you’ll want to read the introductory documentation first.
Main#
Command-line interface.
See Usage.
Settings#
Settings class.
- class yarm.settings.Settings#
Define global settings.
When you need a setting in a function, make an instance of this class. These settings should not be changed elsewhere. Treat them as constants.
Validate#
Validate configuration file.
- class yarm.validate.Slug#
Class to use slugify to make spelling consistent.
- validate_scalar(chunk)#
Use slugify to make spelling consistent.
Note
We use _ rather than - for the separator.
The underscore seems more Pythonic.
Also, Ansible config files seem to favor underscores. See: https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html.
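As a rough illustration of the underscore convention in the note above, the following sketch normalizes a string using only the standard library. It is not yarm's implementation, which relies on slugify; the helper name slug_underscore is hypothetical and exists only for this example.

```python
import re
import unicodedata


def slug_underscore(text: str) -> str:
    """Hypothetical helper: lowercase ASCII slug with "_" as the separator."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


print(slug_underscore("Export Tables"))
print(slug_underscore("include-index"))
```

With this approach, spelling variants such as "Export Tables" and "export-tables" collapse to the same underscore-separated form.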
- class yarm.validate.StrNotEmpty#
A string that must not be empty.
- static validate_scalar(chunk)#
Invalidate if string is empty.
- Parameters:
chunk (YAMLChunk) – YAML to be validated
- Returns:
Validated string
- Return type:
str
- yarm.validate.check_is_file(list_of_paths, key)#
For each item in a list, check that the value is a file.
- Parameters:
list_of_paths (List[str | Dict]) – List of strings or dictionaries
key (str | None) – If the items are dictionaries, this is the key for the path (e.g. path)
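The check described above can be sketched with os.path.isfile. This is only an illustration under assumptions: the helper name find_missing_files is hypothetical, and it returns the offending paths, whereas this page does not say how yarm itself reports a path that is not a file.

```python
import os
import tempfile
from typing import Dict, List, Optional, Union


def find_missing_files(
    list_of_paths: List[Union[str, Dict[str, str]]], key: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: return every listed path that is not a file."""
    missing = []
    for item in list_of_paths:
        # Dictionaries carry the path under `key`; plain strings are the path.
        path = item[key] if isinstance(item, dict) else item
        if not os.path.isfile(path):
            missing.append(path)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "source.csv")
    with open(real, "w") as handle:
        handle.write("id,name\n")
    ghost = os.path.join(tmp, "ghost.csv")
    # The same check works for a list of strings and a list of dictionaries.
    missing_strings = find_missing_files([real, ghost], None)
    missing_dicts = find_missing_files([{"path": real}, {"path": ghost}], "path")
```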
- yarm.validate.check_key(key, config_yaml)#
Check whether a key exists in configuration YAML.
- Parameters:
key (str) – Name of key
config_yaml (YAML) – Configuration to check
- Returns:
name of key if present, None if not
- Return type:
str | None
- yarm.validate.get_default_config()#
Return default configuration.
- Return type:
Nob
- yarm.validate.msg_validating_key(key, suffix=None, verbose=1)#
Show a message that a key is being validated.
- Parameters:
key (str) – Key being validated
suffix (str | None) – String to add after message
verbose (int) – Minimum verbosity level required to show this message
- yarm.validate.revalidate_yaml(yaml, schema, config_path, msg_key=None, msg_suffix=None)#
Revalidate configuration YAML from config_path according to schema.
- Parameters:
yaml (YAML) – YAML to revalidate
schema (Map | MapPattern | Seq) – Schema to revalidate this YAML against
config_path (str) – File in which this configuration YAML was found
msg_key (str | None) – Message that this key is validating
msg_suffix (str | None) – Message suffix
- yarm.validate.validate_config(config_path)#
Validate config file before running report.
- Parameters:
config_path (str) – Path to configuration file
- Returns:
Validated configuration
- Return type:
YAML
- yarm.validate.validate_config_edited(config_yaml)#
Check whether config has been edited.
- Parameters:
config_yaml (YAML) – Report configuration
- Returns:
True if configuration has been edited, aborts otherwise.
- Return type:
bool
- yarm.validate.validate_config_schema(config_path)#
Return YAML for config file if it validates against top-level schema.
- Parameters:
config_path (str) – Path to config file
- Returns:
Configuration validated against top-level schema.
- Return type:
YAML
- yarm.validate.validate_key_import(config_yaml, config_path)#
Validate config key: import.
    import:
      - path: MODULE_A.py
      - path: MODULE_B.py
This key allows the user to import their own custom Python code. Any imported function can be applied to the results of a query using the postprocess key.
Warning
If more than one module in this list defines the same function, the later module in the list will silently override the previous definition.
This may be desired behavior, but only if you expect it.
- Parameters:
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
See also
yarm.queries.df_query_postprocess()
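The override behavior in the warning above can be reproduced with two in-memory modules that define the same function name. This is a sketch, not yarm's import code: the registry dictionary and the tidy function are hypothetical, and the modules are built with types.ModuleType so the example needs no files on disk.

```python
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Build an in-memory module so the example needs no files on disk."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


module_a = make_module("MODULE_A", "def tidy(df):\n    return 'from A'\n")
module_b = make_module("MODULE_B", "def tidy(df):\n    return 'from B'\n")

# Hypothetical registry: collect public callables in list order.
registry = {}
for module in (module_a, module_b):
    for name, obj in vars(module).items():
        if callable(obj) and not name.startswith("_"):
            registry[name] = obj  # a later module replaces an earlier entry

result = registry["tidy"](None)
print(result)
```

No error or warning is raised when MODULE_B replaces the definition from MODULE_A, which is why the order of the import: list matters.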
- yarm.validate.validate_key_input(config_yaml, config_path)#
Validate config key: input.
    input:
      strip: false
      slugify_columns: false
      lowercase_columns: false
      uppercase_rows: false
      include_index: false
- Parameters:
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
- yarm.validate.validate_key_output(config_yaml, config_path)#
Validate config key: output.
    output:
      dir: output
      basename: BASENAME
      export_tables: csv
      export_queries: csv
      styles:
        column_width: 15
- Parameters:
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
- yarm.validate.validate_key_output_dir(config_yaml)#
Prepare output directory.
- Parameters:
config_yaml (YAML) – Report configuration
- yarm.validate.validate_key_queries(config_yaml, config_path)#
Validate config key: queries.
    queries:
      - name: QUERY A
        sql: SELECT * FROM table_from_spreadsheet AS s;
      - name: QUERY B
        # For the SQL, you can use a multiline string for readability.
        sql: >
          SELECT *
          FROM table_from_spreadsheet AS s
          JOIN table_from_csv AS c
          ON s.id = c.id
          ;
        replace:
          COLUMN_A:
            MATCH A1: REPLACE A1
            # You may want to quote strings with spaces and punctuation.
            "MATCH A2": "REPLACE A2"
          COLUMN_B:
            MATCH B1: REPLACE B1
        postprocess: postprocess_function
- Parameters:
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
Important
A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().
See also
yarm.queries.df_query_postprocess()
- yarm.validate.validate_key_tables_config(config_yaml, config_path)#
Validate config key: tables_config.
    tables_config:
      TABLE_NAME_A:
        - path: SOURCE_A1.csv
          include_index: false
        - path: SOURCE_A2.csv
      TABLE_NAME_B:
        - path: SOURCE_B.xlsx
          sheet: B.1
          pivot:
            index: ID_COLUMN
            columns: KEY_COLUMN
            values: VALUE_COLUMN
          datetime:
            # You can supply a custom format string.
            COLUMN_1: "%Y-%m"
            # To use default datetime format, omit format string.
            COLUMN_2:
            # Spaces or punctuation in the column name? Add quotes.
            "COLUMN 3":
- Parameters:
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
- yarm.validate.validate_minimum_required_keys(config_yaml)#
Check whether config has minimum required keys.
- Parameters:
config_yaml (YAML) – Configuration to validate
- Returns:
True if config has minimum required keys, abort otherwise.
- Return type:
bool
Important
To modify which keys are required to run a report, update this function.
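A minimal sketch of a required-keys check follows. The key names in REQUIRED_KEYS are assumed for illustration only, since this page does not list which keys yarm requires, and the helper returns the missing keys instead of aborting.

```python
from typing import Iterable, List, Mapping

# Assumed for illustration; this page does not list the keys yarm requires.
REQUIRED_KEYS = ("tables_config", "queries")


def missing_required_keys(
    config: Mapping[str, object], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Hypothetical helper: list the required keys absent from `config`."""
    return [key for key in required if key not in config]


config = {"tables_config": {"TABLE_NAME_A": [{"path": "SOURCE_A1.csv"}]}}
missing = missing_required_keys(config)
if missing:
    print("Missing required keys: " + ", ".join(missing))
```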
Tables#
Create tables from validated configuration.
- yarm.tables.back_up_column(df, col)#
Save a backup copy of a column.
- Parameters:
df (DataFrame) – data we are manipulating
col (str) – column to back up
- Returns:
Data with a copy of column col that has _raw appended to the column name. The original column can now safely be manipulated by further code.
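The backup pattern can be sketched in pandas as follows. The function name back_up_column_sketch is a hypothetical stand-in; whether yarm copies the dataframe or modifies it in place is not stated here, so the copy is an assumption.

```python
import pandas as pd


def back_up_column_sketch(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical stand-in: copy `col` into `<col>_raw` and return the frame."""
    backed_up = df.copy()  # assumption: leave the caller's dataframe untouched
    backed_up[col + "_raw"] = backed_up[col]
    return backed_up


df = pd.DataFrame({"date": ["2023-01", "2023-02"]})
df = back_up_column_sketch(df, "date")
# The original column can now be converted; the raw text is preserved.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m")
print(df)
```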
- yarm.tables.concat_dfs(conn, table_name, orig_df, new_df, input_file)#
Merge two dataframes with pd.concat into a single table.
- Parameters:
conn – Temporary database in memory
table_name (str) – Table we are creating or appending to
orig_df (DataFrame) – Existing dataframe
new_df (DataFrame) – New dataframe we want to merge
input_file (str) – Actual file with source data
- Returns:
Merged dataframe
- Return type:
DataFrame
- yarm.tables.create_table_df(conn, config, table_df, table_name, table, source, exists_mode)#
Create or append to a table from a configured source.
Note
Each table is defined by a list of one or more sources, all of which are merged into a single table.
This function is called separately for each source.
- Parameters:
conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
table_df (None | DataFrame) – None if this table is new, otherwise the existing table
table_name (str) – Table we are creating or appending to
table (NobView) – Table configuration
source – Source configuration
exists_mode (str) – replace for a new table, otherwise append
- Returns:
DataFrame of our new or updated table
- Return type:
DataFrame
- yarm.tables.create_tables(conn, config)#
Read data from tables_config: into database and create tables.
- Parameters:
conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
- yarm.tables.df_input_options(df, config)#
Process input data using the options in the input: key.
These options are applied to every input file.
Important
If you modify these options, you must also modify yarm.validate.validate_key_input().
Note
For per-source options, see df_tables_config_options().
- Parameters:
df (DataFrame) – Data we will manipulate
config (Nob) – Report configuration
- Returns:
Data with options applied
- Return type:
DataFrame
- yarm.tables.df_tables_config_options(df, source_config, table_name, input_file)#
Process options for a particular source in a table.
Important
If you modify these source options, you must also modify yarm.validate.validate_key_tables_config(). That function also ensures that all necessary keys are present (e.g., that if a pivot stanza is present, it also has index, columns, and values).
Note
For input: options applied to all input files, see df_input_options().
- Parameters:
df (DataFrame) – Table we will modify
source_config (NobView) – Configuration for this source
table_name (str) – Name of this table
input_file – Path to this source data
- Returns:
Updated table, with options applied from this source
- Return type:
DataFrame
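The index, columns, and values keys of a pivot stanza match the parameter names of pandas DataFrame.pivot. The sketch below shows what such a pivot does to long-format data; it assumes, without confirmation from this page, that the stanza maps directly onto that call, and it reuses the placeholder column names from the tables_config example.

```python
import pandas as pd

# Long format: one row per (ID, key) pair.
long_df = pd.DataFrame(
    {
        "ID_COLUMN": [1, 1, 2, 2],
        "KEY_COLUMN": ["height", "width", "height", "width"],
        "VALUE_COLUMN": [10, 20, 30, 40],
    }
)

# Wide format: one row per ID, one column per distinct key.
wide_df = long_df.pivot(
    index="ID_COLUMN", columns="KEY_COLUMN", values="VALUE_COLUMN"
)
print(wide_df)
```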
- yarm.tables.get_include_index_all(config)#
Set default input:include_index for all tables.
- Parameters:
config (Nob) – Report configuration
- Returns:
Default value for whether to include the index in each table
- Return type:
bool
Note
This value can be overridden by each particular table.
- yarm.tables.get_include_index_table(table, table_name, include_index_all)#
Set include_index for a particular table.
- Parameters:
table (NobView) – Configuration for this table
table_name (str) – Name for this table
include_index_all (bool) – Default include_index value
- Returns:
Whether to include the index in this table
- Return type:
bool
- yarm.tables.input_source(input_format, conn, config, source_config, table_name, table_df, input_file, input_sheet)#
Input a source into a table DataFrame.
- Parameters:
input_format (str) – Format for this source (e.g. CSV)
conn – Temporary database in memory
config (Nob) – Report configuration
source_config (NobView) – Configuration for this source
table_name (str) – Table we are creating or appending to
table_df (DataFrame | None) – None if this table is new, otherwise the existing table
input_file (str) – Actual file with source data
input_sheet (int | str | None) – Name of sheet if source is spreadsheet, otherwise None
- Returns:
New or updated table
- Return type:
DataFrame
Important
If a table has multiple sources, each subsequent source is merged with an outer join.
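To show what an outer join means when sources have different columns, here is a small pandas sketch using pd.concat, the call named in concat_dfs(). The sample data and the ignore_index=True argument are assumptions made for the example.

```python
import pandas as pd

first = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
second = pd.DataFrame({"id": [3], "score": [9.5]})

# join="outer" keeps the union of columns; cells a source lacks become NaN.
merged = pd.concat([first, second], join="outer", ignore_index=True)
print(merged)
```

No column is dropped: rows from the first source have no score, and the row from the second source has no name.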
Queries#
Run queries on tables.
- yarm.queries.df_query_postprocess(df, config, query_config)#
Process postprocess function for a particular query.
- Parameters:
df (DataFrame) – Results of query
config (Nob) – Report configuration
query_config (NobView) – Configuration for this query
- Returns:
Query data after applying postprocess function
- Return type:
DataFrame
Important
A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().
- yarm.queries.df_query_replace(df, query_config)#
Process replace: keys for a particular query.
- Parameters:
df (DataFrame) – Query results
query_config (NobView) – Configuration for this query
- Returns:
Query data with replacements applied
- Return type:
DataFrame
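A replace: stanza maps each column to its match and replacement pairs, which has the same shape as the nested dictionary accepted by pandas DataFrame.replace. The sketch below assumes exact-value matching; this page does not say whether yarm matches exactly or by pattern.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "COLUMN_A": ["MATCH A1", "other"],
        "COLUMN_B": ["MATCH B1", "MATCH A1"],
    }
)

# Same shape as the replace: stanza: {column: {match: replacement}}.
replace_config = {
    "COLUMN_A": {"MATCH A1": "REPLACE A1"},
    "COLUMN_B": {"MATCH B1": "REPLACE B1"},
}

# Each inner mapping is applied only to its own column.
replaced = df.replace(replace_config)
print(replaced)
```

Note that "MATCH A1" in COLUMN_B is left alone, because that replacement is configured only for COLUMN_A.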
- yarm.queries.query_options(df, config, query_config)#
Process options for a particular query.
- Parameters:
df (DataFrame) – Data with initial query results
config (Nob) – Report configuration
query_config (NobView) – configuration for this query
- Returns:
Query data with all options applied for this query.
- Return type:
DataFrame
- yarm.queries.run_queries(conn, config)#
Run all the queries.
- Parameters:
config (Nob) – Report configuration
conn (Connection) – Temporary database in memory
- yarm.queries.run_query(config, query, conn, sql, name)#
Run a query, apply the options, and return a DataFrame.
- Parameters:
config (Nob) – Report configuration
query (NobView) – Configuration for this query
conn (Connection) – Temporary database in memory
sql (str) – SQL statement for this query
name (str) – Name for this query
- Returns:
Initial query results
- yarm.queries.save_query_to_database(df, conn, name)#
Save the processed query to the database.
- Parameters:
df (DataFrame) – Query data after all processing
conn (Connection) – Temporary database in memory
name (str) – Name for this query
Export#
Export data.
- yarm.export.export_database(conn, config)#
Export database to sqlite3 database file.
- Parameters:
conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
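One way to write an in-memory SQLite database to a file is the standard library's Connection.backup method. The sketch below assumes conn is a sqlite3.Connection; the table contents and the output file name are made up for the example, and yarm may export the database differently.

```python
import os
import sqlite3
import tempfile

# Stand-in for the temporary in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER, name TEXT)")
conn.execute("INSERT INTO example VALUES (1, 'a')")
conn.commit()

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "BASENAME.db")  # hypothetical file name
    file_conn = sqlite3.connect(db_path)
    # Connection.backup copies the whole in-memory database into the file.
    conn.backup(file_conn)
    file_conn.close()

    # Reopen the file to confirm the data survived the export.
    check = sqlite3.connect(db_path)
    rows = check.execute("SELECT id, name FROM example").fetchall()
    check.close()
conn.close()
```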
- yarm.export.export_database_tables(config, conn, ext, msg_table_exported_csv, msg_table_exported_sheet, export_basename, indent=1, verbose=2)#
Export all database tables as file(s).
Note
In this context, a database “table” may be either a table defined in tables_config or a query defined in queries:. Both are saved as type table in the database.
- Parameters:
config (Nob) – Report configuration
conn (Connection) – Temporary database in memory (see note)
ext (str) – Extension for output file
msg_table_exported_csv (str) – Message after exporting table to CSV
msg_table_exported_sheet (str) – Message after exporting table to sheet
export_basename (str) – Basename for single output file
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
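Because tables and saved queries are both stored as type table, a single lookup in SQLite's sqlite_master catalog lists them together. The sketch assumes the database is SQLite, and the table names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table from tables_config and one saved query; both are plain tables.
conn.execute("CREATE TABLE table_from_csv (id INTEGER)")
conn.execute('CREATE TABLE "QUERY A" (id INTEGER)')

cursor = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)
names = [row[0] for row in cursor]
print(names)
conn.close()
```

The catalog gives no way to tell a configured table from a saved query, which is why the order of export matters.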
- yarm.export.export_df_csv(config, df, name, msg_export, indent=1, verbose=1)#
Export a single dataframe to CSV.
- Parameters:
config (Nob) – Report configuration
df (DataFrame) – Data to export
name (str) – Name of dataframe, used as basename for output CSV
msg_export (str) – Confirmation message
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
- yarm.export.export_df_list_xlsx(config, df_list, export_basename, msg_export, indent=1, verbose=1)#
Export a list of dataframes to XLSX.
- Parameters:
config (Nob) – Report configuration
df_list (List[Tuple[str, DataFrame]]) – List of tuples (see note in export_queries())
export_basename (str) – Basename for output spreadsheet
msg_export (str) – Confirmation message
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
- yarm.export.export_queries(config, df_list)#
Export all queries.
- Parameters:
config (Nob) – Report configuration
df_list – List of tuples (see note)
Important
Each item in df_list should be a tuple of the form: (name, df)
Note
The default output format is XLSX, but this can be overridden with export_queries: csv under output:.
- yarm.export.export_tables(config, conn, indent=1, verbose=2)#
Export the tables created from configuration.
- Parameters:
config (Nob) – Report configuration
conn (Connection) – Temporary database in memory (see note)
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
Important
This function expects the connected database to contain only the tables in the config, not the queries yet. Queries are exported later, in export_queries().
- yarm.export.get_full_output_basename(config)#
Return full basename for output files, with path to output dir.
- Parameters:
config (Nob) – Report configuration
- Returns:
Basename for output files
- Return type:
str
- yarm.export.get_output_dir_path(config, filename)#
Get full path to filename in output dir.
- Parameters:
config (Nob) – Report configuration
filename (str) – output filename
- Returns:
Full path to output filename
- Return type:
str
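A path helper of this kind usually reduces to os.path.join. In the sketch below a plain dictionary stands in for the Nob configuration, and the output and dir keys follow the output: example; the function name is hypothetical.

```python
import os

# A plain dict stands in for the Nob report configuration.
config = {"output": {"dir": "output", "basename": "BASENAME"}}


def output_path_sketch(config: dict, filename: str) -> str:
    """Hypothetical helper: join the configured output dir and a filename."""
    return os.path.join(config["output"]["dir"], filename)


full_path = output_path_sketch(config, "BASENAME.xlsx")
print(full_path)
```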
Helpers#
Helper functions.
- yarm.helpers.abort(msg, error=None, file_path=None, data=None, ps=None, indent=0, suggest_verbose=1)#
Abort with error message and status 1.
- Parameters:
msg (str) – Message
error (str | None) – Error message (see note in msg_options())
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message
indent (int) – Number of indents before message
suggest_verbose (int) – Verbosity level to pass to msg_suggest_verbose()
- yarm.helpers.key_show_message(key_msg, config, verbose=1)#
For each key, if that key is in config, show the matching message.
Important
key_msg must be a list of tuples of the form: (key, message).
    key_msg: list = [
        (s.KEY_INPUT__STRIP, s.MSG_STRIP_WHITESPACE),
        (s.KEY_INPUT__SLUGIFY_COLUMNS, s.MSG_SLUGIFY_COLUMNS),
        (s.KEY_INPUT__LOWERCASE_COLUMNS, s.MSG_LOWERCASE_COLUMNS),
        (s.KEY_INPUT__UPPERCASE_ROWS, s.MSG_UPPERCASE_ROWS),
    ]
    key_show_message(key_msg, config, verbose=1)
- Parameters:
key_msg (List[Tuple[str, str]]) – List of tuples of the form: (key, message)
config (Nob) – Report configuration
verbose (int) – Minimum verbosity level required to show this message
- yarm.helpers.load_yaml_file(input_file, schema)#
Read YAML file into strictyaml, and validate against a schema.
- Parameters:
input_file (str) – Path to YAML file
schema (Any) – strictyaml schema
- Returns:
Validated YAML
- Return type:
YAML
- yarm.helpers.msg(msg, verbose=0, indent=0)#
Show message.
Note
By default, this message will still show even if the user did not use a -v flag.
- Parameters:
msg (str) – Message to display
verbose (int) – Minimum verbosity required to show this message
indent (int) – Number of indents before message
- yarm.helpers.msg_options(msg, prefix=None, prefix_color=None, error=None, file_path=None, data=None, ps=None, indent=0)#
Show a message with various options.
Important
This function is not normally used directly. Instead, use:
- Parameters:
msg (str) – Message
prefix (str | None) – e.g. Error
prefix_color (str | None) – e.g. red
error (str | None) – Error message (see note)
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message
indent (int) – Number of indents before message
Note
This function expects error to be type str, but the error returned by an except: clause may need to be converted with str().
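The conversion mentioned in the note looks like this in practice. The describe_error function is a hypothetical stand-in for any function that expects error as a str; it is not part of yarm.

```python
def describe_error(msg: str, error: str) -> str:
    """Hypothetical stand-in for a function that expects `error` as a str."""
    return f"{msg}: {error}"


try:
    int("not a number")
except ValueError as caught:
    # `caught` is an exception object, not a string; convert it first.
    text = describe_error("Could not parse value", str(caught))

print(text)
```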
- yarm.helpers.msg_suggest_verbose(suggest_verbose)#
Show message suggesting rerunning with a higher level of verbosity.
Note
This message will only be shown if the verbosity level is set lower than suggest_verbose.
- Parameters:
suggest_verbose (int) – Verbosity level that message will suggest
- yarm.helpers.msg_with_data(msg, data, verbose=1, indent=0)#
Show message with accompanying data.
Note
By default, the message will only be shown if the user used at least one -v flag.
- Parameters:
msg (str) – Message to display
data (str) – Display this data after message, shown on same line
verbose (int) – Minimum verbosity required to show this message
indent (int) – Number of indents before message
- yarm.helpers.overwrite_file(path, indent=1)#
Overwrite a file if it exists.
Note
Technically, this function only removes the file. The new file must be written separately.
Note
If a prompt question is shown, it is not indented.
- Parameters:
path (str) – File to overwrite
indent (int) – Number of indents before message
- Returns:
True if file existed and was removed, False otherwise
- Return type:
bool
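The remove-then-rewrite behavior described in the notes can be sketched as follows. The helper name remove_if_exists is hypothetical, and the sketch leaves out any confirmation prompt that yarm may show before removing the file.

```python
import os
import tempfile


def remove_if_exists(path: str) -> bool:
    """Hypothetical helper: delete `path` if it is a file; report whether it was."""
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "BASENAME.csv")
    with open(target, "w") as handle:
        handle.write("old contents\n")

    first_call = remove_if_exists(target)   # file existed and was removed
    second_call = remove_if_exists(target)  # nothing left to remove

    # Writing the replacement file is a separate step.
    with open(target, "w") as handle:
        handle.write("new contents\n")
    with open(target) as handle:
        final_text = handle.read()
```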
- yarm.helpers.show_df(df, data, verbose=3)#
Display a dataframe.
- Parameters:
df (DataFrame) – Data to display
data (str) – Display this data after message, shown on same line
verbose (int) – Minimum verbosity level required to show this message
- yarm.helpers.success(msg, file_path=None, data=None, ps=None)#
Show success message.
- Parameters:
msg (str) – Message
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message
- Return type:
None
- yarm.helpers.verbose_ge(verbose)#
Return True if verbosity >= verbose.
- Parameters:
verbose (int) – verbosity level
- Returns:
True if the user passed at least verbose -v flags, otherwise False
- Return type:
bool
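The verbosity gate shared by the message helpers can be sketched as below. Everything here is illustrative: the module-level VERBOSITY value, the _sketch function names, and the two-space indent width are assumptions, not yarm's actual internals.

```python
from typing import Optional

# Hypothetical stand-in for the verbosity chosen on the command line:
# 0 means no -v flag, 1 means -v, 2 means -vv, and so on.
VERBOSITY = 1


def verbose_ge_sketch(verbose: int) -> bool:
    """Return True if the current verbosity is at least `verbose`."""
    return VERBOSITY >= verbose


def msg_sketch(text: str, verbose: int = 0, indent: int = 0) -> Optional[str]:
    """Print `text` only when the verbosity threshold is met."""
    if not verbose_ge_sketch(verbose):
        return None
    line = "  " * indent + text  # two spaces per indent is an assumption
    print(line)
    return line


shown = msg_sketch("Creating tables...", verbose=1, indent=1)
hidden = msg_sketch("Detailed table dump", verbose=3)
```

With a threshold of 0 a message is always shown, which matches the default described for msg().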
- yarm.helpers.warn(msg, error=None, file_path=None, data=None, ps=None, indent=0)#
Show warning, but proceed.
- Parameters:
msg (str) – Message
error (str | None) – Error message (see note in msg_options())
data (str | None) – Display this data after message, shown on same line
file_path (str | None) – File associated with this message, shown on separate line
ps (str | None) – Final postscript to add at end of message
indent (int) – Number of indents before message
- Return type:
None