Reference
yarm
yarm: Yet Another Report Maker.
This page contains documentation for almost every function in yarm.
Developers are most likely to find it useful, but users may also find some necessary details here.
No matter what, you’ll want to read the introductory documentation first, such as:
Main
Command-line interface.
See Usage.
Settings
Settings class.
- class yarm.settings.Settings
Define global settings.
When you need a setting in a function, make an instance of this class. These settings should not be changed elsewhere. Treat them as constants.
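One way to enforce the "treat them as constants" rule in Python is a frozen dataclass. The sketch below is a hypothetical illustration, not yarm's actual Settings class. `ExampleSettings` and its string values are invented placeholders. Only the attribute names echo the key_show_message() example further down this page.

```python
from dataclasses import FrozenInstanceError, dataclass


# Hypothetical sketch: yarm's real Settings class may be implemented differently.
# frozen=True makes any attempt to assign to an attribute raise FrozenInstanceError.
@dataclass(frozen=True)
class ExampleSettings:
    # Placeholder values, for illustration only.
    KEY_INPUT__STRIP: str = "input.strip"
    MSG_STRIP_WHITESPACE: str = "Stripping whitespace"


def use_setting() -> str:
    # Make an instance wherever a setting is needed.
    s = ExampleSettings()
    return s.KEY_INPUT__STRIP


def try_to_change() -> bool:
    """Return True if a setting could be changed (it cannot)."""
    s = ExampleSettings()
    try:
        s.KEY_INPUT__STRIP = "changed"  # attempt to modify a "constant"
    except FrozenInstanceError:
        return False
    return True
```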
Validate
Validate configuration file.
- class yarm.validate.Slug
Class to use slugify to make spelling consistent.
- validate_scalar(chunk)
Use slugify to make spelling consistent.
Note
We use _ rather than - for the separator.
The underscore seems more Pythonic.
Also, Ansible config files seem to favor underscores. See: https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html.
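With the python-slugify package, underscore-separated slugs come from `slugify(text, separator="_")`. This page does not say which slugify library yarm uses, so treat that as an assumption. The stdlib-only sketch below approximates the same behavior. `slugify_underscore` is a hypothetical helper and does not handle every case python-slugify does.

```python
import re
import unicodedata


def slugify_underscore(text: str) -> str:
    """Approximate slugify with "_" as the separator (stdlib-only sketch)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse every run of non-alphanumeric characters into "_".
    text = re.sub(r"[^a-z0-9]+", "_", text.lower())
    # Remove separators left at either end.
    return text.strip("_")
```

For example, `slugify_underscore("  Total (USD)  ")` gives `total_usd`. Spelling variants of the same key therefore normalize to one form.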
- class yarm.validate.StrNotEmpty
A string that must not be empty.
- static validate_scalar(chunk)
Invalidate if string is empty.
- Parameters
chunk (YAMLChunk) – YAML to be validated
- Returns
Validated string
- Return type
str
- yarm.validate.check_is_file(list_of_paths, key)
For each item in a list, check that the value is a file.
- Parameters
list_of_paths (List[Union[str, Dict]]) – List of strings or dictionaries
key (Optional[str]) – If dictionaries, this is the key for the path (e.g. path)
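The item handling described above can be sketched as follows. Each item is either a path string or a dictionary that holds the path under `key`. `missing_files` is a hypothetical helper. It collects missing paths rather than aborting, which may differ from what check_is_file() actually does.

```python
import os
from typing import Dict, List, Optional, Union


def missing_files(list_of_paths: List[Union[str, Dict]], key: Optional[str] = None) -> List[str]:
    """Return the paths in list_of_paths that are not existing files."""
    missing = []
    for item in list_of_paths:
        # Dictionaries hold the path under `key` (e.g. "path"); strings are the path itself.
        path = item[key] if isinstance(item, dict) else item
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```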
- yarm.validate.check_key(key, config_yaml)
Check whether a key exists in configuration YAML.
- Parameters
key (str) – Name of key
config_yaml (YAML) – Configuration to check
- Returns
name of key if present, None if not
- Return type
Optional[str]
- yarm.validate.get_default_config()
Return default configuration.
- Return type
Nob
- yarm.validate.msg_validating_key(key, suffix=None, verbose=1)
Show a message that a key is being validated.
- Parameters
key (str) – Key being validated
suffix (Optional[str]) – String to add after message
verbose (int) – Minimum verbosity level required to show this message
- yarm.validate.revalidate_yaml(yaml, schema, config_path, msg_key=None, msg_suffix=None)
Revalidate configuration YAML from config_path according to schema.
- Parameters
yaml (YAML) – YAML to revalidate
schema (Union[Map, MapPattern, Seq]) – Schema to revalidate this YAML against
config_path (str) – File in which this configuration YAML was found
msg_key (Optional[str]) – Message that this key is validating
msg_suffix (Optional[str]) – Message suffix
- yarm.validate.validate_config(config_path)
Validate config file before running report.
- Parameters
config_path (str) – Path to configuration file
- Returns
Validated configuration
- Return type
YAML
- yarm.validate.validate_config_edited(config_yaml)
Check whether config has been edited.
- Parameters
config_yaml (YAML) – Report configuration
- Returns
True if configuration has been edited, aborts otherwise.
- Return type
bool
- yarm.validate.validate_config_schema(config_path)
Return YAML for config file if it validates against top-level schema.
- Parameters
config_path (str) – Path to config file
- Returns
Configuration validated against top-level schema.
- Return type
YAML
- yarm.validate.validate_key_import(config_yaml, config_path)
Validate config key: import.
import:
  - path: MODULE_A.py
  - path: MODULE_B.py
This key allows the user to import their own custom Python code. Any imported function can be applied to the results of a query using the postprocess key.
Warning
If more than one module in this list defines the same function, the later module in the list will silently override the previous definition.
This may be desired behavior, but only if you expect it.
- Parameters
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
See also
yarm.queries.df_query_postprocess()
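The override warning can be reproduced with the standard importlib recipe for importing a file by path. The sketch below assumes, but does not confirm, that yarm gathers imported functions into a single name-to-function mapping. `load_module` and `collect_functions` are hypothetical helpers. Because entries are assigned in list order, a function defined in a later module replaces an earlier one with the same name, and no error is raised.

```python
import importlib.util
from types import ModuleType
from typing import Callable, Dict, List


def load_module(path: str, name: str) -> ModuleType:
    """Import a Python file by path (standard importlib recipe)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def collect_functions(paths: List[str]) -> Dict[str, Callable]:
    """Gather public callables from each module, in list order."""
    functions: Dict[str, Callable] = {}
    for i, path in enumerate(paths):
        module = load_module(path, f"user_module_{i}")
        for attr, value in vars(module).items():
            if callable(value) and not attr.startswith("_"):
                # A later module silently replaces an earlier function of the same name.
                functions[attr] = value
    return functions
```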
- yarm.validate.validate_key_input(config_yaml, config_path)
Validate config key: input.
input:
  strip: false
  slugify_columns: false
  lowercase_columns: false
  uppercase_rows: false
  include_index: false
- Parameters
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
See also
- yarm.validate.validate_key_output(config_yaml, config_path)
Validate config key: output.
output:
  dir: output
  basename: BASENAME
  export_tables: csv
  export_queries: csv
  styles:
    column_width: 15
- Parameters
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
- yarm.validate.validate_key_output_dir(config_yaml)
Prepare output directory.
- Parameters
config_yaml (YAML) – Report configuration
- yarm.validate.validate_key_queries(config_yaml, config_path)
Validate config key: queries.
queries:
  - name: QUERY A
    sql: SELECT * FROM table_from_spreadsheet AS s;
  - name: QUERY B
    # For the SQL, you can use a multiline string for readability.
    sql: >
      SELECT *
      FROM table_from_spreadsheet AS s
      JOIN table_from_csv AS c
      ON s.id = c.id
      ;
    replace:
      COLUMN_A:
        MATCH A1: REPLACE A1
        # You may want to quote strings with spaces and punctuation.
        "MATCH A2": "REPLACE A2"
      COLUMN_B:
        MATCH B1: REPLACE B1
    postprocess: postprocess_function
- Parameters
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
Important
A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().
See also
yarm.queries.df_query_postprocess()
- yarm.validate.validate_key_tables_config(config_yaml, config_path)
Validate config key: tables_config.
tables_config:
  TABLE_NAME_A:
    - path: SOURCE_A1.csv
      include_index: false
    - path: SOURCE_A2.csv
  TABLE_NAME_B:
    - path: SOURCE_B.xlsx
      sheet: B.1
      pivot:
        index: ID_COLUMN
        columns: KEY_COLUMN
        values: VALUE_COLUMN
      datetime:
        # You can supply a custom format string.
        COLUMN_1: "%Y-%m"
        # To use default datetime format, omit format string.
        COLUMN_2:
        # Spaces or punctuation in the column name? Add quotes.
        "COLUMN 3":
- Parameters
config_yaml (YAML) – Configuration to validate
config_path (str) – Configuration file
- yarm.validate.validate_minimum_required_keys(config_yaml)
Check whether config has minimum required keys.
- Parameters
config_yaml (YAML) – Configuration to validate
- Returns
True if config has minimum required keys, abort otherwise.
- Return type
bool
Important
To modify which keys are required to run a report, update this function.
Tables
Create tables from validated configuration.
- yarm.tables.back_up_column(df, col)
Save a backup copy of a column.
- Parameters
df (DataFrame) – data we are manipulating
col (str) – column to back up
- Returns
Data with a copy of column col that has _raw appended to the column name. The original column can now safely be manipulated by further code.
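In pandas, the backup described above amounts to copying one column under a new name. The sketch below assumes the copy is added to the same DataFrame. `back_up_column_sketch` is a hypothetical stand-in, not yarm's code.

```python
import pandas as pd


def back_up_column_sketch(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Copy column `col` to `<col>_raw` so the original can be modified safely."""
    df[f"{col}_raw"] = df[col].copy()
    return df
```

After the call, changing `df["date"]` leaves `df["date_raw"]` untouched.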
- yarm.tables.concat_dfs(conn, table_name, orig_df, new_df, input_file)
Merge two dataframes with pd.concat into a single table.
- Parameters
conn – Temporary database in memory
table_name (str) – Table we are creating or appending to
orig_df (DataFrame) – Existing dataframe
new_df (DataFrame) – New dataframe we want to merge
input_file (str) – Actual file with source data
- Returns
Merged dataframe
- Return type
DataFrame
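The sketch below shows what a `pd.concat` merge of two sources with partly different columns produces. It uses `join="outer"`, which is also the pandas default. The exact keyword arguments yarm passes are not documented here. In particular, `ignore_index=True` is an assumption.

```python
import pandas as pd

# Two sources for the same table, with partly different columns.
source_a = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bo"]})
source_b = pd.DataFrame({"id": [3], "email": ["cy@example.com"]})

# pd.concat stacks the rows; join="outer" keeps the union of all columns and
# fills cells that a source lacks with NaN.
merged = pd.concat([source_a, source_b], join="outer", ignore_index=True)
```

The result has three rows and the columns `id`, `name`, and `email`. The two rows from `source_a` have no email.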
- yarm.tables.create_table_df(conn, config, table_df, table_name, table, source, exists_mode)
Create or append to a table from a configured source.
Note
Each table is defined by a list of one or more sources, all of which are merged into a single table.
This function is called separately for each source.
- Parameters
conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
table_df (Union[None, DataFrame]) – None if this table is new, otherwise the existing table
table_name (str) – Table we are creating or appending to
table (NobView) – Table configuration
source – Source configuration
exists_mode (str) – replace for a new table, otherwise append
- Returns
DataFrame of our new or updated table
- Return type
DataFrame
- yarm.tables.create_tables(conn, config)
Read data from tables_config: into database and create tables.
- Parameters
conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
- yarm.tables.df_input_options(df, config)
Process input data using the options in the input: key.
These options are applied to every input file.
Important
If you modify these options, you must also modify yarm.validate.validate_key_input().
Note
For per-source options, see df_tables_config_options().
- Parameters
df (DataFrame) – Data we will manipulate
config (Nob) – Report configuration
- Returns
Data with options applied
- Return type
DataFrame
- yarm.tables.df_tables_config_options(df, source_config, table_name, input_file)
Process options for a particular source in a table.
Important
If you modify these source options, you must also modify yarm.validate.validate_key_tables_config().
That function also ensures that all necessary keys are present (e.g., that if a pivot stanza is present, it also has index, columns, and values).
Note
For input: options applied to all input files, see df_input_options().
- Parameters
df (DataFrame) – Table we will modify
source_config (NobView) – Configuration for this source
table_name (str) – Name of this table
input_file – Path to this source data
- Returns
Updated table, with options applied from this source
- Return type
DataFrame
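The pivot and datetime source options from the tables_config: example correspond closely to standard pandas calls. The sketch assumes, but does not confirm, that yarm uses `DataFrame.pivot` and `pd.to_datetime`. The column names are the placeholders from that example, and the data values are invented.

```python
import pandas as pd

# Long-format data, as a source with a pivot stanza might provide it.
long_df = pd.DataFrame(
    {
        "ID_COLUMN": [1, 1, 2, 2],
        "KEY_COLUMN": ["height", "weight", "height", "weight"],
        "VALUE_COLUMN": [170, 65, 180, 80],
    }
)

# pivot: index / columns / values map onto DataFrame.pivot's keyword arguments.
wide_df = long_df.pivot(index="ID_COLUMN", columns="KEY_COLUMN", values="VALUE_COLUMN")

# datetime: a custom format string such as "%Y-%m" can be passed to pd.to_datetime.
dates = pd.to_datetime(pd.Series(["2023-01", "2023-02"]), format="%Y-%m")
```

After the pivot, each ID_COLUMN value becomes one row and each KEY_COLUMN value becomes a column.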
- yarm.tables.get_include_index_all(config)
Set the default input: include_index for all tables.
- Parameters
config (Nob) – Report configuration
- Returns
Default value for whether to include the index in each table
- Return type
bool
Note
This value can be overridden by each particular table.
See also
- yarm.tables.get_include_index_table(table, table_name, include_index_all)
Set include_index for a particular table.
- Parameters
table (NobView) – Configuration for this table
table_name (str) – Name for this table
include_index_all (bool) – Default include_index value
- Returns
Whether to include the index in this table
- Return type
bool
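The default-and-override logic of these two functions can be sketched with plain dictionaries in place of Nob objects. `resolve_include_index` is a hypothetical helper. It assumes the default is False when input: include_index is absent, which matches the example input: configuration but is not confirmed by this page.

```python
from typing import Any, Dict


def resolve_include_index(input_config: Dict[str, Any], table_config: Dict[str, Any]) -> bool:
    """A table-level include_index overrides the input: default."""
    # Global default from input: include_index (assumed False when absent).
    default = bool(input_config.get("include_index", False))
    # A value set for this particular table wins over the default.
    return bool(table_config.get("include_index", default))
```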
See also
- yarm.tables.input_source(input_format, conn, config, source_config, table_name, table_df, input_file, input_sheet)
Input a source into a table DataFrame.
- Parameters
input_format (str) – Format for this source (e.g. CSV)
conn – Temporary database in memory
config (Nob) – Report configuration
source_config (NobView) – Configuration for this source
table_name (str) – Table we are creating or appending to
table_df (Optional[DataFrame]) – None if this table is new, otherwise the existing table
input_file (str) – Actual file with source data
input_sheet (Optional[Union[int, str]]) – Name of sheet if source is spreadsheet, otherwise None
- Returns
New or updated table
- Return type
DataFrame
Important
If a table has multiple sources, each subsequent source is merged with an outer join.
See also
Queries
Run queries on tables.
- yarm.queries.df_query_postprocess(df, config, query_config)
Apply the postprocess function to the results of a particular query.
- Parameters
df (DataFrame) – Results of query
config (Nob) – Report configuration
query_config (NobView) – Configuration for this query
- Returns
Query data after applying postprocess function
- Return type
DataFrame
Important
A postprocess function is defined by the user in a separate Python file, which must be imported with the import: key. See yarm.validate.validate_key_import().
- yarm.queries.df_query_replace(df, query_config)
Process replace: keys for a particular query.
- Parameters
df (DataFrame) – Query results
query_config (NobView) – Configuration for this query
- Returns
Query data with replacements applied
- Return type
DataFrame
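The replace: stanza in the queries: example maps each column to a dictionary of match/replacement pairs. pandas `DataFrame.replace` accepts that same nested shape and applies each mapping only within its own column. This is a sketch only. Whether yarm matches whole cell values, as shown here, or substrings is not stated on this page.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "COLUMN_A": ["MATCH A1", "MATCH A2", "other"],
        "COLUMN_B": ["MATCH B1", "keep", "keep"],
    }
)

# Same shape as the replace: key, i.e. {column: {match: replacement}}.
replace_config = {
    "COLUMN_A": {"MATCH A1": "REPLACE A1", "MATCH A2": "REPLACE A2"},
    "COLUMN_B": {"MATCH B1": "REPLACE B1"},
}

# Whole-value matches are replaced; other cells are left unchanged.
replaced = df.replace(replace_config)
```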
- yarm.queries.query_options(df, config, query_config)
Process options for a particular query.
- Parameters
df (DataFrame) – Data with initial query results
config (Nob) – Report configuration
query_config (NobView) – Configuration for this query
- Returns
Query data with all options applied for this query.
- Return type
DataFrame
- yarm.queries.run_queries(conn, config)
Run all the queries.
- Parameters
config (Nob) – Report configuration
conn (Connection) – Temporary database in memory
- yarm.queries.run_query(config, query, conn, sql, name)
Run a query, apply the options, and return a DataFrame.
- Parameters
config (Nob) – Report configuration
query (NobView) – Configuration for this query
conn (Connection) – Temporary database in memory
sql (str) – SQL statement for this query
name (str) – Name for this query
- Returns
Initial query results
See also
- yarm.queries.save_query_to_database(df, conn, name)
Save the processed query to the database.
- Parameters
df (DataFrame) – Query data after all processing
conn (Connection) – Temporary database in memory
name (str) – Name for this query
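The round trip through the in-memory database (create a table, run a query, save the result) can be sketched with sqlite3 and pandas. Two points are assumptions, not facts from this page. First, the exists_mode values replace and append passed to create_table_df() are assumed to correspond to the `if_exists` argument of `DataFrame.to_sql`. Second, the table and query names below are placeholders.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # temporary database in memory

# A table built from two sources: "replace" for the first, "append" for the next.
pd.DataFrame({"id": [1, 2]}).to_sql("table_from_csv", conn, if_exists="replace", index=False)
pd.DataFrame({"id": [3]}).to_sql("table_from_csv", conn, if_exists="append", index=False)

# Run the SQL for a query and get the initial results as a DataFrame.
result = pd.read_sql_query("SELECT id FROM table_from_csv WHERE id > 1 ORDER BY id;", conn)

# After processing, save the query under its name so later queries can use it.
result.to_sql("query_a", conn, if_exists="replace", index=False)
```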
Export
Export data.
- yarm.export.export_database(conn, config)
Export database to sqlite3 database file.
- Parameters
conn (Connection) – Temporary database in memory
config (Nob) – Report configuration
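One standard-library way to write an in-memory sqlite3 database to a file is `sqlite3.Connection.backup()`. Whether yarm uses this method is not stated here, and `export_database_sketch` is a hypothetical name.

```python
import sqlite3


def export_database_sketch(conn: sqlite3.Connection, db_path: str) -> None:
    """Copy an in-memory database to an sqlite3 file with Connection.backup()."""
    dest = sqlite3.connect(db_path)
    try:
        # backup() copies every table (and other schema objects) into dest.
        conn.backup(dest)
    finally:
        dest.close()
```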
- yarm.export.export_database_tables(config, conn, ext, msg_table_exported_csv, msg_table_exported_sheet, export_basename, indent=1, verbose=2)
Export all database tables as file(s).
Note
In this context, a database “table” may be either a table defined in tables_config or a query defined in queries:. Both are saved as type table in the database.
- Parameters
config (Nob) – Report configuration
conn (Connection) – Temporary database in memory (see note)
ext (str) – Extension for output file
msg_table_exported_csv (str) – Message after exporting table to CSV
msg_table_exported_sheet (str) – Message after exporting table to sheet
export_basename (str) – Basename for single output file
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
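The note above says that tables and saved queries are both stored as type table. In SQLite, this type is recorded in the `sqlite_master` catalog, so one query lists both kinds. `list_tables` is a hypothetical helper, not part of yarm.

```python
import sqlite3
from typing import List


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all objects of type 'table' (config tables and saved queries)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```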
- yarm.export.export_df_csv(config, df, name, msg_export, indent=1, verbose=1)
Export a single dataframe to CSV.
- Parameters
config (Nob) – Report configuration
df (DataFrame) – Data to export
name (str) – Name of dataframe, used as basename for output CSV
msg_export (str) – Confirmation message
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
- yarm.export.export_df_list_xlsx(config, df_list, export_basename, msg_export, indent=1, verbose=1)
Export a list of dataframes to XLSX.
- Parameters
config (Nob) – Report configuration
df_list (List[Tuple[str, DataFrame]]) – List of tuples (see note in export_queries())
export_basename (str) – Basename for output spreadsheet
msg_export (str) – Confirmation message
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
- yarm.export.export_queries(config, df_list)
Export all queries.
- Parameters
config (Nob) – Report configuration
df_list – List of tuples (see note)
Important
Each item in df_list should be a tuple of the form: (name, df)
Note
The default output format is XLSX, but this can be overridden with export_queries: csv under output:.
See also
- yarm.export.export_tables(config, conn, indent=1, verbose=2)
Export the tables created from configuration.
- Parameters
config (Nob) – Report configuration
conn (Connection) – Temporary database in memory (see note)
indent (int) – Number of indents before message
verbose (int) – Minimum verbosity required to show the message
Important
This function expects the connected database to contain only the tables in the config, not the queries yet. Queries are exported later, in
export_queries().
- yarm.export.get_full_output_basename(config)
Return full basename for output files, with path to output dir.
- Parameters
config (Nob) – Report configuration
- Returns
Basename for output files
- Return type
str
- yarm.export.get_output_dir_path(config, filename)
Get full path to filename in output dir.
- Parameters
config (Nob) – Report configuration
filename (str) – output filename
- Returns
Full path to output filename
- Return type
str
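The two path helpers can be sketched with `os.path.join`, using the dir and basename values from the output: example. The sketch takes a plain dictionary in place of the Nob configuration. It assumes the output directory is used as given, not resolved relative to the config file. Both function names are hypothetical.

```python
import os
from typing import Any, Dict


def get_output_dir_path_sketch(output_config: Dict[str, Any], filename: str) -> str:
    """Join the configured output dir (output: dir:) with a filename."""
    return os.path.join(output_config["dir"], filename)


def get_full_output_basename_sketch(output_config: Dict[str, Any]) -> str:
    """Basename for output files (output: basename:), prefixed with the output dir."""
    return get_output_dir_path_sketch(output_config, output_config["basename"])
```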
Helpers
Helper functions.
- yarm.helpers.abort(msg, error=None, file_path=None, data=None, ps=None, indent=0, suggest_verbose=1)
Abort with error message and status 1.
- Parameters
msg (str) – Message
error (Optional[str]) – Error message (see note in msg_options())
data (Optional[str]) – Display this data after message, shown on same line
file_path (Optional[str]) – File associated with this message, shown on separate line
ps (Optional[str]) – Final postscript to add at end of message
indent (int) – Number of indents before message
suggest_verbose (int) – Verbosity level to pass to msg_suggest_verbose()
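A minimal sketch of the abort pattern, including the `str()` conversion that the msg_options() note asks for when passing a caught exception as error. `abort_sketch` and `read_config` are hypothetical. The message layout only loosely follows the parameter descriptions above and is not yarm's exact output.

```python
import sys
from typing import NoReturn, Optional


def abort_sketch(msg: str, error: Optional[str] = None, file_path: Optional[str] = None) -> NoReturn:
    """Print an error message to stderr and exit with status 1."""
    lines = [f"Error: {msg}"]
    if file_path:
        lines.append(f"  {file_path}")  # file shown on a separate line
    if error:
        lines.append(f"  {error}")
    print("\n".join(lines), file=sys.stderr)
    sys.exit(1)


def read_config(path: str) -> str:
    try:
        with open(path) as fh:
            return fh.read()
    except OSError as err:
        # error is expected to be a str, so convert the exception with str().
        abort_sketch("Could not read config", error=str(err), file_path=path)
```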
- yarm.helpers.key_show_message(key_msg, config, verbose=1)
For each key, if that key is in config, show the matching message.
Important
key_msg must be a list of tuples of the form: (key, message).
key_msg: list = [
    (s.KEY_INPUT__STRIP, s.MSG_STRIP_WHITESPACE),
    (s.KEY_INPUT__SLUGIFY_COLUMNS, s.MSG_SLUGIFY_COLUMNS),
    (s.KEY_INPUT__LOWERCASE_COLUMNS, s.MSG_LOWERCASE_COLUMNS),
    (s.KEY_INPUT__UPPERCASE_ROWS, s.MSG_UPPERCASE_ROWS),
]
key_show_message(key_msg, config, verbose=1)
- Parameters
key_msg (List[Tuple[str, str]]) – List of tuples of the form: (key, message)
config (Nob) – Report configuration
verbose (int) – Minimum verbosity level required to show this message
- yarm.helpers.load_yaml_file(input_file, schema)
Read YAML file into strictyaml, and validate against a schema.
- Parameters
input_file (str) – path to YAML file
schema (Any) – strictyaml schema
- Returns
Validated YAML
- Return type
YAML
- yarm.helpers.msg(msg, verbose=0, indent=0)
Show message.
Note
By default, this message is shown even if the user did not use a -v flag.
- Parameters
msg (str) – Message to display
verbose (int) – Minimum verbosity required to show this message
indent (int) – Number of indents before message
- yarm.helpers.msg_options(msg, prefix=None, prefix_color=None, error=None, file_path=None, data=None, ps=None, indent=0)
Show a message with various options.
Important
This function is not normally used directly. Instead, use:
- Parameters
msg (str) – Message
prefix (Optional[str]) – e.g. Error
prefix_color (Optional[str]) – e.g. red
error (Optional[str]) – Error message (see note)
data (Optional[str]) – Display this data after message, shown on same line
file_path (Optional[str]) – File associated with this message, shown on separate line
ps (Optional[str]) – Final postscript to add at end of message
indent (int) – Number of indents before message
Note
This function expects error to be of type str, but the error returned by an except: clause may need to be converted with str().
- yarm.helpers.msg_suggest_verbose(suggest_verbose)
Show message suggesting rerunning with a higher level of verbosity.
Note
This message will only be shown if the verbosity level is set lower than suggest_verbose.
- Parameters
suggest_verbose (int) – Verbosity level that message will suggest
- yarm.helpers.msg_with_data(msg, data, verbose=1, indent=0)
Show message with accompanying data.
Note
By default, the message will only be shown if the user used at least one -v flag.
- Parameters
msg (str) – Message to display
data (str) – Display this data after message, shown on same line
verbose (int) – Minimum verbosity required to show this message
indent (int) – Number of indents before message
- yarm.helpers.overwrite_file(path, indent=1)
Overwrite a file if it exists.
Note
Technically, this function only removes the file. The new file must be written separately.
Note
If a prompt question is shown, it is not indented.
- Parameters
path (str) – File to overwrite
indent (int) – Number of indents before message
- Returns
True if file existed and was removed, False otherwise
- Return type
bool
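The core of this behavior is removing the file and reporting whether it existed. The sketch below omits the prompt question the notes mention, so it is not equivalent to overwrite_file(). `remove_if_exists` is a hypothetical name.

```python
import os


def remove_if_exists(path: str) -> bool:
    """Remove `path` if it is an existing file; the new file is written separately."""
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False
```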
- yarm.helpers.show_df(df, data, verbose=3)
Display a dataframe.
- Parameters
df (DataFrame) – Data to display
data (str) – Display this data after message, shown on same line
verbose (int) – Minimum verbosity level required to show this message
- yarm.helpers.success(msg, file_path=None, data=None, ps=None)
Show success message.
- Parameters
msg (str) – Message
data (Optional[str]) – Display this data after message, shown on same line
file_path (Optional[str]) – File associated with this message, shown on separate line
ps (Optional[str]) – Final postscript to add at end of message
- Return type
None
- yarm.helpers.verbose_ge(verbose)
Return True if verbosity >= verbose.
- Parameters
verbose (int) – verbosity level
- Returns
True if user used verbose or more -v flags, otherwise False
- Return type
bool
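Counting repeated -v flags and comparing the count against a required level can be sketched with argparse's `action="count"`. The doc does not say how yarm's CLI parses its flags, so argparse is an assumption. Both helper names are hypothetical, and the sketch passes the verbosity explicitly instead of reading global state. Under this scheme, msg() (verbose=0) always shows, while msg_with_data() (verbose=1) needs at least one -v.

```python
import argparse
from typing import List


def parse_verbosity(argv: List[str]) -> int:
    """Count -v flags: no flag -> 0, -v -> 1, -vv -> 2, and so on."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser.parse_args(argv).verbose


def verbose_ge_sketch(verbosity: int, verbose: int) -> bool:
    """True if the user's verbosity is at least the required level."""
    return verbosity >= verbose
```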
- yarm.helpers.warn(msg, error=None, file_path=None, data=None, ps=None, indent=0)
Show warning, but proceed.
- Parameters
msg (str) – Message
error (Optional[str]) – Error message (see note in msg_options())
data (Optional[str]) – Display this data after message, shown on same line
file_path (Optional[str]) – File associated with this message, shown on separate line
ps (Optional[str]) – Final postscript to add at end of message
indent (int) – Number of indents before message
- Return type
None