To enrich any text format
M5 version 2.0, document subversion 1, 2024
by Steve Hoover, Redwood EDA, LLC
(steve.hoover@redwoodeda.com)
This document is licensed under the CC0 1.0 Universal license.
The M5 text processing language and tool enhances the GNU M4 macro preprocessor, adding features typical of programming languages.
- 1. Background Information
- 2. Getting Started with the M5 Tool
- 3. An Overview of M5 Concepts
- 4. Sugar-Free M5 Details
- 5. Syntactic Sugar
- 6. Coding Practices
- 7. Macro Library
- 8. Reference Card
- Index
1. Background Information
1.1. Overview
M5 is a macro preprocessor on steroids. It is built on the simple principle of text substitution but provides features and syntax on par with other simple programming languages. It is an easy and capable tack-on enhancement to any text format as well as a reasonable general-purpose programming language specializing in text processing. Its broad applicability makes M5 a valuable tool in every programmer/engineer/scientist/AI’s toolbelt.
This chapter provides background and general information about M5, guidance about this specification, and instructions for using M5.
1.2. About this Specification
This document covers the M5 language as well as its standard Macro Library. This document’s major version reflects the language version, and the minor version reflects the library version. There is also a document subversion distinguishing versions of this document with no corresponding language or library changes.
1.3. M5’s Origin Story
I created M5 as a preprocessor for the TL-Verilog hardware language and later decoupled it as a stand-alone tool. The original intent was to use an out-of-the-box macro preprocessor to provide stop-gap solutions to missing TL-Verilog language features for "code construction" as TL-Verilog took shape. While other hardware languages build on existing programming languages to provide code construction, I wanted a simpler approach that would be less intimidating to hardware folks. M4 was the obvious choice as the most broadly adopted macro preprocessor.
M4 proved to be capable, but extremely difficult to work with. After a few years fighting with an approach that was intended to allow me to focus my attention elsewhere, I decided I needed to either find a different approach or clean up the one I had. I felt my struggles had led to some worthwhile insights and that there was a place in the world for a better text processing language/tool, so I carved out some time to polish my mountain of hacks.
Though M5 would benefit from a fresh non-M4/Perl-based implementation, I had to draw the line somewhere. At this point, that legacy is mostly behind the scenes, and while it’s not everything I’d like it to be, it’s close, and it’s way better than any other text preprocessor I’m aware of.
So I hope you enjoy the language I never wanted to write. I’m actually rather proud of it and find new uses for it every day.
1.4. M5 Versus M4
M5 uses M4 to implement a text-preprocessing language with some subtle philosophical differences. M5 aims to preserve most of the conceptual simplicity of macro preprocessing while adding features that improve readability, manageability, and debuggability for more complex use cases.
This document is intended to stand on its own, independent of the M4 documentation. The M4 documentation can, in fact, be confusing due to M5’s philosophical differences with M4.
Beyond M4, M5 contributes:
- features that feel like a typical, simple programming language
- literal string variables
- functions with named arguments
- variable/macro scope
- an intentionally minimal amount of syntactic sugar
- document generation assistance
- debug aids such as stack traces
- safer parsing and string manipulation
- a richer core library of utilities
- a future plan for modular libraries
1.5. Limitations of M5
M4 has certain limitations that M5 is unable to address. M5 uses M4 as is without modifications to the M4 implementation (though these limitations may motivate changes to M4 in the future).
1.5.1. Security
M4 has full access to its host environment (similar to most programming and scripting languages, but unlike many macro preprocessors). Malware can easily do harm. Third-party M5 code should be carefully vetted before use, or M5 should be run within a contained environment. M5 provides a simple mechanism for library inclusion by URL (or it will). This enables easy execution of public third-party code, so use it with extreme caution.
1.5.2. Modularity
M4 does not provide any library, namespace, or version management facilities. Though M5 does not currently address these needs, plans have been sketched in code comments.
1.5.3. String processing
While macro processing is all about string processing, safely manipulating arbitrary strings is not possible in M4 or it is beyond awkward at best. M4 provides m4_regexp, m4_patsubst, and m4_substr. These return unquoted strings that will necessarily be elaborated, potentially altering the string. While M5 is able to jump through hoops to provide m5_regex and m5_substr (for strings of limited length) that return quoted (literal) text, m4_patsubst cannot be fixed (though m5_for_each_regex is similar). The result of m4_patsubst can be quoted only by quoting the input string, which can complicate the match expression, or by ensuring that all text is matched, which can be awkward, and quoting substitutions.
In addition to these issues, care must be taken to ensure that resulting text does not contain mismatching quotes or parentheses or combine with surrounding text to result in the same. Such resulting mismatches are difficult to debug. M5 provides a notion of "unquoted strings" that can be safely manipulated using m5_regex and m5_substr.
Additionally the regex configuration used by M4 is quite dated. For example, it does not support lookahead, lazy matches, and character codes.
1.5.4. Introspection
Introspection is essentially impossible. The only way to see what is defined is to dump definitions to a file and parse this file.
1.5.5. Recursion
Recursion has a fixed (command-line) depth limit, and this limit is not applied reliably.
1.5.6. Unicode
M4 is an old tool and was built for ASCII text. UTF-8 is now the most common text format. It is a superset of ASCII that encodes additional characters as two or more bytes using byte codes (0x80-0xFF) that do not conflict with those defined by ASCII (0x00-0x7F). All such bytes (0x80-0xFF) are treated as characters by M4 with no special meaning, so these characters pass through, unaffected, in macro processing like most others. There are two implications to be aware of. First, m5_length provides a length in bytes, not characters. Second, m5_substr and regular expressions manipulate bytes, not characters. This can result in text being split mid-character, resulting in invalid character encodings.
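The byte-versus-character distinction is easy to check outside of M5. The following is a minimal sketch in Python (used here only as a neutral modeling language; it is not M5 code). The sample string and the helper is_valid_utf8 are illustrative assumptions. It shows what a byte-based length and a byte-based substring do to UTF-8 text:

```python
# Model of byte-oriented string handling applied to UTF-8 text (not M5 code).
text = "na\u00efve"                 # "naive" with a diaeresis; U+00EF is 2 bytes in UTF-8
data = text.encode("utf-8")

char_count = len(text)              # number of characters: 5
byte_count = len(data)              # number of bytes: 6 (what a byte-based length reports)

# A byte-based substring of length 3 cuts the 2-byte character (0xC3 0xAF) in half.
prefix = data[:3]


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(char_count, byte_count)       # 5 6
print(is_valid_utf8(prefix))        # False: the split landed mid-character
print(is_valid_utf8(data[:4]))      # True: the split landed on a character boundary
```

When substring positions are computed by hand, it is therefore safest to split only at ASCII characters, which always occupy exactly one byte.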
1.5.7. Debugging features
M4’s facilities for associating output with input only map output lines to line numbers of top-level calls. M4 does not maintain a call stack with line numbers.
M4 and M5 have no debugger to step through code. Printing (see m5_DEBUG) is the debugging mechanism of choice.
1.5.8. Performance
M5 is intended for text processing, not for compute-intensive algorithms. Use a programming language for that.
1.5.9. Graphics
M5 is for text processing only.
1.5.10. Status
Major next steps include:
- Implementing a better library system.
- Some syntactic sugar (quotes, code blocks) should not be recognized in source context.
See the issues file in the M5 repository for more details.
2. Getting Started with the M5 Tool
2.1. Configuring M5
M5 adds a minimal amount of syntax, and it is important that this syntax is unlikely to conflict with the target language syntax. The syntax that could conflict is listed in Ensure No Impact. Currently, there is no easy mechanism to configure this syntax.
2.2. Running M5
The Linux command:
m5 in-file > out-file
runs M5 in its default configuration.
(Currently, there’s a dependency on M4 and Perl and no installation script.)
2.3. Ensure No Impact
When enabling the use of M5 on a file, first, be sure M5 processing does nothing to the file. M5 should output the input text, unaltered, as long as your file contains no:
- quotes, e.g. [', '] (This requirement is being removed with no quote processing in source context.)
- m5_ or m4_
- M5 comments, e.g. ///, /**, **/
- code blocks, e.g. [ or { followed by a newline or ] or } beginning a line after optional whitespace (This requirement is being removed with no processing in source context.)
2.4. Tool Flow
Since M5 is simply substituting text, you can do bizarre things, which can be difficult to debug. Understanding the tool flow can help you look one step under the hood to debug issues or understand how syntax is interpreted.
M5 basically processes files in two steps:
- Interpret syntactic sugar.
- Run M4.
(There is a third step as well that is very minor to undo some of the sugaring.)
Object files are generated (run m5 -h for options) that expose the interpretation of M5 sugar. Note that quotes and commas are substituted with control characters in these files, so you will need an appropriate tool to view them. Your shell may recognize them as binary files and prompt you about viewing them, which is fine to do.
3. An Overview of M5 Concepts
3.1. Macro Preprocessing in General
Macro preprocessors extend a target programming language, text format, or general text file with the ability to define and call (aka instantiate, invoke, expand, evaluate, or elaborate) parameterized macros that provide text substitutions. Macros are generally used to provide convenient shorthand for commonly-used constructs. A macro preprocessor processes a text file sequentially with a default behavior of passing the input text through as output text. When a macro name is encountered, it and its argument list are substituted for new text according to its definition.
M5 provides convenient syntax for macro preprocessing as well as programmatic text processing, sharing the same macros for each. This provides advanced text manipulation to supercharge any programming language or text file.
3.2. Macros Overview
A macro that simply outputs a static text string can be defined within the source file like this:
m5_macro(hello, Hello World!)
The above text will substitute with an empty string but will define a macro that can be called like this:
m5_hello()
Resulting in:
Hello World!
Macros can also be parameterized. Here we define a macro that outputs a string with a single parameter referenced as $1:
m5_macro(hello, Hello $1!)
And call it like this:
m5_hello(World)
Resulting in:
Hello World!
For more details on macro syntax, see Declaring Macros, Calling Macros, and Macro Arguments.
3.3. Quotes Overview
Quotes ([' and ']) may be used around text to prevent substitutions. For example, to provide a macro whose result includes a comma, quotes are needed:
m5_macro(hello, ['Hello, $1!'])
Without these quotes, the comma in Hello, $1! would be interpreted as a macro argument separator.
Furthermore, a second level of quotes may be needed to prevent the interpretation of the comma after substitution:
m5_macro(hello, ['['Hello, $1!']'])
m5_hello(World)
The call substitutes with ['Hello, World!'] (actually ['Hello, World!']['']), which elaborates to the literal text:
Hello, World!
For more details on quote use, see Quotes.
3.4. Variables Overview
Variables hold string values. They can be thought of as macros without arguments. They are defined as:
m5_var(Hello, ['Hello, World!'])
m5_var(Age, 23)
And used as:
m5_Hello I am m5_Age years old.
Resulting in:
Hello, World! I am 23 years old.
Variables are always returned as literal strings, so a second level of quoting is not required for the definition of Hello.
Variables are scoped, and by convention, scoped definitions are named in camel case (strictly speaking, Pascal case).
For more details on variable use, see Variables and Variable Sugar.
3.5. Macro Stacks
All macros and variables are actually stacks of definitions that can be pushed and popped. (These stacks are frequently one entry deep.) The top definition is active, providing the replacement text when the macro/variable is instantiated. The others are only accessible by popping the stack. Pushing and popping are not generally done explicitly, but rather through scoped declarations. See Scoped Code Blocks.
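The stack behavior described above can be pictured with a small model. This is a minimal sketch in Python, not M5 code and not M5's implementation; the class DefinitionStacks and its method names are hypothetical, chosen only for illustration:

```python
class DefinitionStacks:
    """Toy model: each name maps to a stack of definitions; the top one is active."""

    def __init__(self):
        self._stacks = {}

    def push(self, name, value):
        # A declaration pushes a new definition, hiding any earlier one.
        self._stacks.setdefault(name, []).append(value)

    def pop(self, name):
        # Leaving a scope pops the definition, re-exposing the previous one.
        stack = self._stacks[name]
        value = stack.pop()
        if not stack:
            del self._stacks[name]
        return value

    def get(self, name):
        # Only the top definition is visible when the name is instantiated.
        return self._stacks[name][-1]


defs = DefinitionStacks()
defs.push("Age", "23")      # outer declaration
defs.push("Age", "24")      # declaration of the same name in an inner scope
inner = defs.get("Age")     # "24": the top definition is active
defs.pop("Age")             # the inner scope ends
outer = defs.get("Age")     # "23": the outer definition is active again
print(inner, outer)
```

In M5 itself the push and pop are implied by scoped declarations rather than written out as they are in this model.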
3.6. Code Syntax Overview
The above syntax is convenient in "source context", embedded into another language. It is clear where substitutions occur because all macro calls and variables are referenced with an m5_ prefix. This syntax, however, quickly becomes clunky for any substantial text manipulation, requiring excessive m5_-prefixing. Additionally, it is difficult to format code readably because carriage returns and other whitespace are generally taken literally. This results in single-line syntax with many levels of nesting that quickly become difficult to follow.
To enable code structure that looks more like a programming language, "code context" can be established within which code syntax is supported.
Take for example this one-line definition in source context of an assert macro:
m5_macro(assert, ['m5_if(['$1'], ['m5_error(['Failed assertion: $1.'])'])'])
This can be written equivalently (though with a slight performance impact) as:
m5_macro(assert, {
   if(['$1'], [
      error(['Failed assertion: $1.'])
   ])
})
m5_macro( enters "argument list context", where parentheses and brackets have special meaning. { at the end of its line enters code context, where, most notably, text does not implicitly pass through to the output and m5_ is implied at the beginning of each code statement (beginning its line). On the final line, } and ) exit these contexts.
For more details, see Code Blocks and Contexts.
3.7. Functions and Scope Overview
M5 also provides a syntax for function declarations with named parameters. The assert macro can be defined as a function as:
fn(assert, Expr, {
   if(m5_Expr, [
      error(Failed assertion: m5_Expr.)
   ])
})
Like any respectable programming language, Expr, above, is local to the function.
Functions and other macros may produce result text (see Function Output Example and Code Blocks). They may also produce side effects including variable declarations (see Aftermath) and STDERR output (see m5_error).
For more details on functions, see Functions. For more details on scope, see Scoped Code Blocks.
3.8. Function Output Example
We can add output text to this function indicating assertion failures in the resulting text:
fn(assert, Expr, {
   ~if(m5_Expr, [
      error(Failed assertion: m5_Expr.)
      ~(Failed assertion: m5_Expr.)
   ])
})
Statements producing output are prefixed with a tilde (~).
3.9. Libraries and Namespaces Overview
M5 has a simple and effective import mechanism where a macro library file is simply imported by its URI (URL or local file). Libraries can be imported into their own namespace (though this mechanism is not yet implemented).
3.10. Processing Steps
Several of the above constructs, including code blocks and statements, are termed "syntactic sugar" and are processed in a first pass before macro substitution (yes, as a pre-preprocessing step).
M5 processing involves the following (ordered) steps:
- Substitute quotes for single control characters.
- Process syntactic sugar (in a single pass):
  - Strip M5 comments.
  - Process other syntactic sugar, including block and label syntax.
  - Process pragmas; check indentation and quote/parenthesis matching.
- Write the resulting file.
- Run M4 on this file (substituting macros).
4. Sugar-Free M5 Details
4.1. Defining "Sugar-Free"
M5 can be used "sugar-free". It’s just a bit clunky for humans. Syntactic Sugar is recognized in the source file. Text that is constructed on the fly and evaluated (e.g. by m5_eval) is evaluated sugar-free.
4.2. Quotes
Unwanted processing, such as macro substitution, can be avoided using quotes. By default, these are [' and '] (and a configuration mechanism is not yet available to change this). Like syntactic sugar, they are recognized only when they appear in a source file and cannot be constructed from their component characters. Quotes, however, are an essential part of M5, not a syntactic convenience.
Quoted text begins with ['. The quoted text is parsed only for [' and '] and ends at the corresponding ']. The quoted text passes through to the resulting text, including internal matching quotes, without any substitutions. The outer quotes themselves are discarded. The end quote acts as a word boundary for subsequent text processing. Within quotes, intervening characters that would otherwise have special treatment, such as commas, parentheses, and m5_-prefixed words (after sugar processing), have no special treatment.
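The rule that outer quotes are discarded while internal matching quotes pass through can be modeled as follows. This is a minimal sketch in Python, not M5 code; the function name elaborate_quotes is hypothetical, and the model ignores macro substitution and word boundaries entirely:

```python
def elaborate_quotes(text: str) -> str:
    """Remove one level of [' ... '] quoting; nested quotes pass through intact."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "['":
            if depth > 0:
                out.append(pair)    # an inner open quote is ordinary quoted text
            depth += 1
            i += 2
        elif pair == "']" and depth > 0:
            depth -= 1
            if depth > 0:
                out.append(pair)    # an inner close quote is ordinary quoted text
            i += 2
        else:
            out.append(text[i])
            i += 1
    if depth != 0:
        raise ValueError("unbalanced quotes")
    return "".join(out)


once = elaborate_quotes("['['Hello, World!']']")
twice = elaborate_quotes(once)
print(once)     # ['Hello, World!']
print(twice)    # Hello, World!
```

Each elaboration removes exactly one level of quotes, which is why text that is elaborated twice needs two levels of quoting to stay literal.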
Quotes can be used to delimit words. For example, the empty quotes below:
Index['']m5_Index
enable m5_Index to substitute, as would:
['Index']m5_Index
(Index\m5_Index is a shorthand for this. See Backslash Word Boundary.)
Quotes can also be used to avoid the interpretation of m5_foo as syntactic sugar. (See Macro Call Sugar.)
Special syntax is provided for multi-line literal quoted text. (See Code Blocks.) Outside of those constructs, quoted text should not contain newlines since newlines are used to format code. Instead, the m5_nl variable (or macro) provides a literal newline character, for example:
m5_DEBUG(['Line:']m5_nl[' ']m5_Line)
4.3. Variables
A variable holds a literal text string. Variables are defined using m5_var, are reassigned using m5_set, and are accessed using m5_get. For example:
m5_var(Foo, 5)
m5_set(Foo, m5_calc(m5_Foo + 1))
m5_get(Foo)
Syntactic sugar provides variable access using, e.g., m5_Foo rather than m5_get(Foo). (See Variable Sugar.)
4.4. Declaring Macros
Here we declare an echo macro.
m5_macro(echo, ['['$1']'])
where
m5_echo(['Hello, World!'])
substitutes with ['Hello, World!'], and this elaborates as Hello, World!.
The most direct way to declare a macro is with m5_macro. For example:
m5_macro(foo, ['['Args:$1,$2']'])
This defines the macro body as ['Args:$1,$2'].
A macro call returns the body of the macro definition with numbered parameters substituted with the corresponding arguments. Dollar parameter substitutions are made throughout the entire body string regardless of the use of quotes and adjacent text. The result is then evaluated, so these macros can perform computations, assign variables, provide argument lists, etc. In this case, the body is quoted, so its resulting text is literal. For example:
m5_foo(A,B) ==> Yields: "Args:A,B"
A few special dollar parameters are supported in addition to numbered parameters. The following notations are substituted:
- $1, $2, etc.: These substitute with corresponding arguments.
- $#: The number of arguments.
- $@: This substitutes with a comma-delimited list of the arguments, each quoted so as to be taken literally. So, m5_macro(foo, ['m5_bar($@)']) is one way to define m5_foo(…) to have the same behavior as m5_bar(…).
- $*: This is rarely useful. It is similar to $@, but arguments are unquoted.
- $0: The name of the macro itself. It can be convenient for making recursive calls (though see m5_recurse). $0__ can also be used as a name prefix to localize a macro name to this macro, though this use model is discouraged. (See Variable Masking.) For Functions, $0 is the internal name holding the function body. It should not be used for recursion but can be used as a unique prefix.
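The dollar parameters listed above amount to a plain textual substitution over the macro body. The following is a minimal sketch in Python, not M5 code and not how M5 or M4 implement it; the function substitute_params is hypothetical, and returning an empty string for a missing numbered argument is an assumption of this model:

```python
import re


def substitute_params(body: str, name: str, args: list) -> str:
    """Substitute $1.., $#, $@, $*, and $0 throughout a macro body string."""

    def replace(match):
        key = match.group(1)
        if key == "#":
            return str(len(args))                          # number of arguments
        if key == "@":
            return ",".join("['" + a + "']" for a in args)  # each argument quoted
        if key == "*":
            return ",".join(args)                          # arguments unquoted
        index = int(key)
        if index == 0:
            return name                                    # the macro's own name
        # Assumption of this sketch: a missing argument substitutes as empty text.
        return args[index - 1] if index <= len(args) else ""

    # Substitution ignores quotes and adjacent text, as described above.
    return re.sub(r"\$([#@*]|\d+)", replace, body)


print(substitute_params("['Args:$1,$2']", "foo", ["A", "B"]))   # ['Args:A,B']
print(substitute_params("m5_bar($@)", "foo", ["A", "B"]))       # m5_bar(['A'],['B'])
print(substitute_params("$0 got $# args", "foo", ["A", "B"]))   # foo got 2 args
```

The first result is still quoted, so a subsequent elaboration would yield the literal text Args:A,B, matching the m5_foo(A,B) example.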
Caution: Macros may be declared by other macros in which case the inner macro body appears within the outer macro body. Numbered parameters appearing in the inner body would be substituted as parameters of the outer body. It is generally not recommended to use numbered parameters for arguments of nested macros, though it is possible. For more on the topic, see Block Labels.
4.5. Calling Macros
The following illustrates a call of the macro named foo:
m5_foo(hello, 5)
Note: When this syntax appears in a source file, it is recognized as syntactic sugar and is processed to provide additional checking. Here, we specifically describe the processing of this syntax when constructed from other processing, noting that syntactic sugar results in similar behavior. (See Macro Call Sugar.)
A well-formed M5 macro name is comprised of one or more word characters (a-z, A-Z, 0-9, and _).
When elaboration encounters (in unquoted text and without a preceding word character or immediately following another macro call) m5_, followed immediately by the well-formed name of a defined macro, followed immediately by ( (e.g. "m5_foo("), an argument list (see Macro Arguments) is processed, then the macro is "called" (or "expanded"). $ substitutions are performed on the macro body (see Declaring Macros), the resulting text replaces the macro name and argument list followed by an implicit [''] to create a word boundary, and elaboration is resumed from the start of this substituted text.
Macro names should not be encountered without an argument list. Though this would result in calling the macro with zero arguments, it is discouraged due to the syntactic confusion with variables. Macros can be called with zero arguments using m5_call(macro_name) instead. (See m5_call.)
Note: Though discouraged, it is possible to define macros with names containing non-word characters. Such macros can only be called indirectly (e.g. m5_call(b@d, args)). (See m5_call.)
Note: In addition to m5_ macros, the M4 macros from which M5 is constructed are available, prefixed by m4_, though their direct use is discouraged and this document does not describe their use. Elaboration of the string m4_ should be avoided.
4.6. Macro Arguments
Macro calls pass arguments within ( and ) that are comma-separated. For each argument, preceding whitespace is not part of the argument, while postceding whitespace is. Specifically, the argument list begins after the unquoted (. Subsequent text is elaborated sequentially (invoking macros and interpreting quotes). The text value of the first argument begins at the first elaborated non-whitespace character following the (. Unquoted ( are counted as an argument is processed. An argument is terminated by the first unquoted and non-parenthetical , or ) in the resulting elaborated text. A subsequent argument, similarly, begins with the first non-whitespace character following the , separator. Whitespace includes spaces, newlines, and tabs. An unquoted ) ends the list.
Some examples to illustrate preceding and postceding whitespace and nested macros:
If m5_foo(A,B) echoes its arguments to produce literal text {A;B}, then:
m5_foo( A, B) ==> Yields: "{A;B}"
m5_foo( [''] A,B) ==> Yields: "{ A;B}"
m5_foo( A , B ) ==> Yields: "{A ;B }"
m5_foo(m5_foo(A, B), C) ==> Yields: "{{A;B};C}"
m5_foo(m5_foo([')'],B),C) ==> Yields: "{{);B};C}" (with a warning about unbalanced parentheses)
Arguments can be empty text, such as () (one empty argument) and (,) (two empty arguments). Note that the use of quotes is preferred for clarity. For example, ([''])  and ([''], ['']) are identical to the previous cases.
The above syntax does not permit macro calls with zero arguments, but m5_call(macro_name) can be used for this purpose. (See m5_call.)
Be aware that when argument lists get long, it is useful to break them up on multiple lines. The newlines should precede, not postcede the arguments, so they are not included in the arguments. E.g.:
m5_foo(long-arg1,
       long-arg2)
Notably, the closing parenthesis should not be on the next line by itself. This would include the newline and spaces in the second argument.
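The whitespace, quote, and parenthesis rules of this section can be modeled with a small splitter. This is a minimal sketch in Python, not M5 code; the function split_args is hypothetical, and it deliberately omits nested macro calls (real elaboration would expand them as the list is scanned):

```python
def split_args(text: str) -> list:
    """Split the text that follows an opening '(' into argument strings.

    Models: leading whitespace is skipped, trailing whitespace is kept,
    one level of [' '] quotes is removed, and unquoted parentheses are counted.
    """
    args, current = [], []
    paren_depth = quote_depth = 0
    started = False              # has the current argument begun?
    i = 0
    while i < len(text):
        pair, ch = text[i:i + 2], text[i]
        if pair == "['":
            if quote_depth > 0:
                current.append(pair)
            quote_depth += 1
            started = True       # a quote begins the argument, even if empty
            i += 2
            continue
        if pair == "']" and quote_depth > 0:
            quote_depth -= 1
            if quote_depth > 0:
                current.append(pair)
            i += 2
            continue
        if quote_depth > 0:
            current.append(ch)   # quoted text has no special treatment
        elif not started and ch in " \t\n":
            pass                 # preceding whitespace is not part of the argument
        elif ch == "(":
            paren_depth += 1
            current.append(ch)
            started = True
        elif ch == ")" and paren_depth > 0:
            paren_depth -= 1
            current.append(ch)
        elif ch == ")":
            args.append("".join(current))
            return args          # an unquoted, unnested ) ends the list
        elif ch == "," and paren_depth == 0:
            args.append("".join(current))
            current, started = [], False
        else:
            current.append(ch)
            started = True
        i += 1
    raise ValueError("missing closing parenthesis")


print(split_args(" A, B)"))        # ['A', 'B']
print(split_args(" [''] A,B)"))    # [' A', 'B']
print(split_args(" A , B )"))      # ['A ', 'B ']
print(split_args(",)"))            # ['', '']
```

The third call shows why a closing parenthesis on its own line is discouraged: everything between the argument text and the ) becomes part of the last argument.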
5. Syntactic Sugar
Syntactic sugar is syntax that is processed directly in the source file prior to macro processing. (See Processing Steps.)
5.1. Comments
5.1.1. M5 Comments (/// and /** … **/)
M5 comments are one form of syntactic sugar. They look like:
/// This line comment will disappear.
/** This block comment will also disappear. **/
Block comments begin with /** and end with **/. Line comments begin with /// and end with a newline. Both are stripped prior to any other processing.
As such:
- M5-commented parentheses and quotes are not visible to parenthesis and quote matching checks, etc.
- M5 comments may follow the [ or { beginning a code block or after a comma and prior to an argument that begins on the next line without affecting the code block or argument.
Whitespace preceding a line comment is also stripped. Newlines from block comments are preserved.
Note: Text immediately following **/ may, after stripping the comment, begin the line. Comments are stripped before indentation checking. It is thus generally recommended that multi-line block comments end with a newline.
In case /// or /** are needed in the resulting file, quotes can be used, e.g.: ['//']['/'], to disrupt the syntax.
5.1.2. Target-Language Comments (E.g. //)
Comments in the target language are not recognized as comments by M5. To disable M5 code, it is important to use M5 comments, not target-language comments. (Thus it can be especially problematic when one’s editor mode highlights target-language comments in a manner that suggests the code has no impact.)
5.1.3. Statement Comments (E.g. /)
These are specific to Code Blocks, introduced later.
5.2. Macro Call Sugar
m5_foo( is syntactic sugar for m5_call(foo, (see m5_call). This transformation (as long as it is evaluated) has no impact other than to verify that the macro exists. m5_foo( should not appear in literal text that is never to be evaluated as it would get undesirably sugared. (See Quotes and Backslash Word Boundary for syntax to avoid undesired sugaring.)
Note: M5 may avoid applying this sugar for common macros from the M5 core library that are assumed to be defined.
This m5_foo( syntax also enters "argument list context" (see Contexts).
5.3. Variable Sugar
m5_Foo (without a postceding "(") is syntactic sugar for m5_get(Foo). (See m5_get.) m5_Foo should not appear in literal text that is never to be evaluated as it would get undesirably sugared. (For syntax to avoid undesired sugaring, see Quotes and Backslash Word Boundary.)
5.4. Backslash Word Boundary (m5_\ and \m5_)
As more convenient alternatives to quotes:
- m5_\foo results in m5_foo without sugaring. This should be used in quoted, non-evaluated context when the literal string m5_foo is desired.
- \m5_foo is shorthand for ['']m5_foo to provide a word boundary, enabling M5 processing of m5_foo when preceded by a word.
5.5. Multi-line Constructs: Blocks and Bodies
5.5.1. What are Bodies and Blocks?
A "body" is a parameter or macro value that is to be evaluated in the context of a caller. Macros like m5_if and m5_loop have "immediate" body parameters. These bodies are to be evaluated by calls to these macros themselves. The final argument to a function or macro declaration is an "indirect" body argument. This body is to be evaluated, not by the declaration macro itself, but by the caller of the macro it declares.
Note: Declaring macros that evaluate body arguments requires special consideration. See Functions with Body Arguments.
Code Blocks are convenient syntactic sugar constructs for multi-line body arguments formatted like code.
Text Blocks are syntactic sugar for specifying multi-line blocks of arbitrary text, indented with the code.
5.5.2. Macro Bodies
A body argument can be provided as a quoted string of text:
m5_if(m5_A > m5_B, ['['Yes, ']m5_A[' > ']m5_B'])
Note that the quoting of ['Yes, '] prevents misinterpretation of the , as an argument separator as the body is evaluated.
This syntax is fine for simple text substitutions, but it is essentially restricted to a single line which is unreadable for larger bodies that might define local variables, perform calculations, evaluate code conditionally, iterate in loops, call other functions, recurse, etc.
5.5.3. Code Blocks
M5 supports special multi-line syntactic sugar convenient for body arguments, called "code blocks". These look more like blocks of code in a traditional programming language. Aside from comments and whitespace, they contain only macro calls and variable elaborations ("statements"). The resulting text of the code block is constructed from the results of these macro calls.
The code below is equivalent to the example above, expressed using a code body (and assuming it is itself called from within a code body).
/Might result in "Yes, 4 > 2".
~if(m5_A > m5_B, [
   ~(['Yes, '])
   ~A
   ~([' > '])
   ~B
])
The block begins with [, followed immediately by a newline. It ends with a line that begins with ], indented consistently with the beginning line. The above code block is "unscoped". A "scoped" code block uses, instead, { and }. Scopes are detailed in Scoped Code Blocks.
The first non-blank line of the block determines the indentation of the block. Indentation uses spaces; tabs are discouraged, but must be used consistently if they are used. All non-blank lines at this level of indentation (after stripping M5 comments) begin a "statement". Lines with deeper indentation would continue a statement. A continuation line either begins a macro argument or is part of its own (nested) code block argument.
Essentially, the body, when evaluated, results in the text produced by its statements, which are macros or variables, listed without their m5_ prefix, or inline text.
Specifically, statements can be:
- Macro calls, such as ~if(m5_A > m5_B, …).
- Variable elaborations, such as ~A.
- Output statements, such as ~(['Yes, ']).
- Comments, such as /A comment.
Statements that produce output (as all statements in the above example’s code block do) must be preceded by ~ (and others may be). This simply helps to identify the source of code block output. The ~(…) syntax produces the given text. An m5_ prefix is implicit on statements. In the rare (and discouraged) event that a macro without this prefix is to be called, such as use of an m4_ macro, using ~out(m4_…) will do the trick.
The earlier example behaves the same as:
m5_out(m5_if(m5_A > m5_B, m5__block(['
   m5_out(['Yes, '])
   m5_out(m5_get(A))
   m5_out([' > '])
   m5_out(m5_get(B))
'])))
The (internal) m5__block macro evaluates its argument and results in any text captured by m5_out.
5.5.4. Scoped Code Blocks
Scoped Code Blocks are delimited by { / } quotes.
Within a code block, variable declarations (e.g. made by m5_var) are scoped. Their definitions are pushed by the declaration, and popped at the end of their scope. (See Macro Stacks regarding pushing and popping.)
It is recommended that all indirect body arguments (see Multi-line Constructs: Blocks and Bodies), such as those of m5_fn, be scoped. Immediate body arguments (see Multi-line Constructs: Blocks and Bodies), such as those of m5_if, are most often unscoped, but scope may be used to isolate the side effects of the block to explicit m5_out_eval calls. Scoped and unscoped blocks are illustrated in the following example:
fn(check, Cond, {
   if(m5_Cond, [
      warning(Check failed.)
   ])
})
Declarations from outer scopes are visible in inner scopes. Similarly, declarations from calling scopes are visible in callee scopes, though functions should generally be written without any assumptions about the calling scope. Exceptions should be clearly documented/commented.
Note: It is fine to redeclare a variable in the same scope. The redeclaration will override the first, and both definitions will be popped after evaluating the code block. Notably, a variable may be conditionally declared without any negative consequence on stack maintenance.
By convention, scoped variables and macros use Pascal case, e.g. MyVar. (See Macro Naming Conventions.)
5.5.5. Text Blocks
"Text blocks" provide a syntax for multi-line quoted text that is indented with its surroundings.
They are delimited similarly to code blocks, but use standard ([' / ']) quotes. The opening quote must be followed by a newline and the closing quote must begin a new line that is indented consistently with the line beginning the block. Their indentation is defined by the first non-blank line in the block. All lines must contain at least this indentation (except the last). This fixed level of indentation and the beginning and ending newline are removed. For example:
macro(copyright, ['['
   Copyright (c) 20xx
   All rights reserved.
']'])
This is equivalent to:
macro(copyright, ['['Copyright (c) 20xx']m5_nl['All rights reserved.']'])
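The indentation rule for text blocks can be sketched in Python. This is a model of the rule as described above, under the assumption that indentation consists of spaces only; the function name `strip_text_block` is hypothetical and this is not how M5 itself is implemented.

```python
def strip_text_block(block: str) -> str:
    """Remove the leading/trailing newline and the block's fixed indentation."""
    lines = block.split("\n")
    # The opening quote is followed by a newline and the closing quote begins
    # the last line, so the first and last elements carry no block text.
    body = lines[1:-1]
    # Indentation is defined by the first non-blank line.
    first = next((line for line in body if line.strip()), "")
    indent = len(first) - len(first.lstrip(" "))
    result = []
    for line in body:
        if line.strip() and not line.startswith(" " * indent):
            raise ValueError("line is indented less than the block")
        result.append(line[indent:])
    return "\n".join(result)


block = "\n   Copyright (c) 20xx\n   All rights reserved.\n"
text = strip_text_block(block)
```

Here `text` is the two lines joined by a single newline, which corresponds to the `m5_nl` in the equivalent one-line form.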
The text of the block is in source context, thus syntactic sugar is interpreted under the assumption that the text is to be evaluated. Text blocks that contain literal (quoted) text that is not evaluated should avoid entering argument list context with m5_, using quotes or $ (if within a macro body), and it should be understood that vanishing comments would be removed.
5.5.6. Evaluate Blocks
It can be convenient to form non-body arguments by evaluating code. Syntactic sugar is provided for this in the form of a * preceding the block open quote.
For example, here a scoped evaluate code block is used to form an error message by searching for negative arguments:
error(*{
   ~(['Arguments include negative values: '])
   var(Comma, [''])
   ~for(Value, ['$@'], [
      ~if(m5_Value < 0, [
         ~Comma
         set(Comma, [', '])
         ~Value
      ])
   ])
   ~(['.'])
})
5.5.7. Block Labels: Escaping Blocks and Labeled Numbered Parameters
Proper use of quotes can get a bit tedious, especially when it is necessary to escape out of several levels of nested quotes. It can improve maintainability, code clarity, and performance to make judicious use of block labels. Note, however, that the need for block labels is rare and is mostly replaced by mechanisms provided by Functions.
Blocks can be labeled using syntax such as:
macro(my_macro, ..., <sf>{ })
Labels can be used in two ways:
- First, to escape out of a block, typically to generate text of the block.
- Second, to specify the block associated with a numbered parameter.
Both use cases are illustrated in the following example that attempts to declare a macro for parsing text. This macro declares a helper macro ParseError for reporting parse errors that can be used many times by my_parser.
/Parse a block of text.
macro(my_parser, {
   var(Text, ['$1'])   /// Text to parse
   var(What, ['$2'])   /// A description identifying what is being parsed
   /Report a parse error, e.g. m5_ParseError(['unrecognized character'])
   macro(ParseError, {
      error(['Parsing of ']m5_What[' failed with: "$1"'])
   })
   ...
})
This code contains, potentially, two mistakes in the error message. First, m5_What will be substituted at the time of the call to ParseError. As long as my_parser does not modify the value of What, this is fine, but it might be preferred to expand m5_What in the definition itself to avoid this potential Variable Masking issue in case What is reused. Secondly, $1 will be substituted upon calling my_parser, not upon calling ParseError, and it will be substituted with a null string.
The corrected example would use:
macro(ParseError, <err>{
   error(['Parsing of ']<err>m5_What[' failed with: "$<err>1"'])   /// 2 Fixes!
})
This code corrects both issues:
- ']<err>m5_What[': This syntax acts in this case as ']']m5_nquote(1,m5_get(What))['[', escaping enough levels of quoting to evaluate m5_What in the text of the err block and having the effect of using the definition of m5_What at the time of the macro definition. (The added level of quotes corresponds to the { / } block quotes which are sugar for [' / '].)
- $<err>1: This syntax associates $1 with the err block and is in this example equivalent to ']']m5_nquote_dollar(1,1)['['.
5.6. Contexts
The various features of M5 apply in different contexts. This section summarizes the syntaxes that transition among contexts and the syntactic features available in each context. The context in which various features are supported is also summarized in Reference Card. Contexts can be nested, with the innermost context determining which features are available.
The following file illustrates different contexts:
Copyright (c) Joe Cool                /// source context
m5_do([                               /// enter argument list context then code context
   var(Ver, 1.0)                      /// code context
   var(Banner, ['                     /']['// code context, enters source context
      Zap™ (v']m5_Ver[')              /']['// text (escaping to code) context
      Author: Joe Cool                /']['// text context
   '])                                /// exits source context
])                                    /// exit code context then argument list context
File version: m5_Ver                  /// source context
5.6.1. Source Context
Source context generally passes text through to the output. It is the default context and is also the context of text blocks.
Features supported in source context are supported in all contexts. For text that is intended to be literal, caution must be taken to avoid inadvertent use of these syntaxes. (See Ensure No Impact.)
The following are recognized in source context:
- Vanishing comments
- Macro calls
- Variable instantiation
- Pragmas
5.6.2. Text Context
Text context is the default context entered by (block or non-block) [' quotes.
In addition to the features of source context, the following are recognized in text context:
- quotes are parsed and matched (see Quote and Parenthesis Matching)
5.6.3. M5 Context
Argument list context is entered from source and text contexts by, for example, m5_foo(. This context is exited by the corresponding ).
In addition to text context features, the following are recognized in argument list context:
- code and text blocks
- parentheses are matched (see Quote and Parenthesis Matching)
5.6.4. Code Context
Code context is for Code Blocks, supporting syntactic sugar for formatting macro code more like programs.
Code context is entered by [ / { that end a line (after stripping vanishing comments) and is exited by the corresponding ] / } beginning a line at matching indentation (also after stripping vanishing comments).
In addition to argument list context features, the following are recognized in code context:
- implicit m5_ beginning lines
- ~ allowing output (including, e.g. ~(hi), ~MyVar, ~nl())
- / comments
5.7. Syntax Checks and Pragmas
5.7.1. Indentation Checks
M5 checks that indentation is consistent for code and text blocks.
5.7.2. Quote and Parenthesis Matching
Parenthesis and quote matching is performed on the code after stripping comments.
Quotes (including [ / ] and { / } quotes for code blocks) must be balanced.
Within each level of quotes, parentheses must be balanced. Parentheses in source and text context are excluded from this check, thus requiring parentheses for macros and parentheses that appear unquoted within macro arguments to be balanced.
Within a line, '] / [' quotes may be used (including nesting) to escape from and return to the same quoted context. This applies to contexts of all quote types, including code blocks, even though they are delimited using different quote syntax. The context that is escaped from and returned to is the same context, so parenthesis matching happens across the escaping. Thus, the parentheses on this code statement line are matching:
~hello(']<top>m5_Name[')
Here are some other examples:
m5_var(Expr, ['m5_calc(6 * (1 +']m5_Val['))'])   /// OK - both match
/// Similar, across two lines:
m5_var(Expr, ['m5_calc(6 * (1 +'])               /// Bad
m5_append_var(Expr, m5_Val['))'])                /// Bad
m5_var(Open, ['('])                              /// OK - paren in text context
See also m5_open_quote, m5_close_quote, and m5_pragma_[enable/disable]_paren_checks in Pragmas.
5.7.3. Pragmas
In certain cases quote and parenthesis checking gets in the way. It is possible to disable checking and control debug behavior using pragmas. Pragma processing happens after M5 comments are stripped. The following strings are recognized as pragmas:
- where_am_i: Prints the current quote context to STDERR.
- [enable/disable]_[paren/quote]_checks: For disabling parenthesis/quote checking.
- sugar: For disabling syntactic sugar (m5 and code/text blocks).
- [enable/disable]_debug: Improves the readability of the file resulting from sugar processing, and continues processing after normally-fatal errors.
- [enable/disable]_verbose_checks: Enables or disables verbose checking.
Since the pragmas would pass through to the target file, pragmas are generally expressed using the following macro calls which elaborate to nothing:
- m5_pragma_where_am_i()
- m5_pragma_[enable/disable]_{check}(), where {check} is paren_checks, quote_checks, sugar, debug, or verbose_checks.
5.8. Literal Commas
A comma (,) character appearing in source or text context is a "literal comma". It can never have special meaning as an argument separator even if used to construct a string that is evaluated as a macro call. A comma appearing in argument list or code context is a "non-literal comma". It is expected to be evaluated as a macro argument separator, though if never evaluated, it remains a , character and may pass through to the output.
Generally, comma characters will behave as expected, but caution must be taken in situations where macro calls are constructed, then evaluated. For these rare cases, let’s consider a few examples.
Here, the commas are argument separators:
m5_foo(A, B, C)
while those within quotes (in text context), here, are literal:
m5_macro(MyList, ['A, B, C'])
and m5_foo(m5_MyList) would receive a single parameter. Argument separator commas can instead be produced using m5_arg_comma:
m5_macro(MyList, A\m5_arg_comma B\m5_arg_comma C)
in which case m5_foo(m5_MyList) would receive three parameters.
In this example:
m5_macro(MyExpr, ['m5_foo(A, B, C)'])
all commas are argument separator commas. This defines m5_MyExpr() to invoke m5_foo with three parameters.
Below, however, m5_\foo is not recognized as a macro call (though the \ disappears), thus the commas separating A, B, and C are in text context and are literal (see Backslash Word Boundary):
m5_macro(MyExpr, ['m5_\foo(A, B, C)'])
and m5_MyExpr() would invoke m5_foo with a single parameter. This has the same effect as:
m5_macro(MyExpr, ['m5_foo(['A, B, C'])'])
(aside from the fact that the latter would be sugared and thus the existence of m5_foo would be confirmed).
6. Coding Practices
6.1. Coding Conventions
6.2. Status
The variable m5_status has a reserved usage. Some macros are defined to set m5_status. A non-empty value indicates that the macro did not perform its duties to the fullest. Several m5_if* macros set non-empty status if they do not evaluate a body.
Well-behaved macros set m5_status always or never (and never is the assumption if no side effect is listed in a macro’s documentation). Thus m5_status is more like a return value than a sticky flag. Sticky behavior can be achieved using m5_sticky_status. There is no support for try-catch-like error handling. In bodies of m5_macro it may be necessary to explicitly save and restore status to avoid unintended side-effects on m5_status from calls within the bodies. m5_fn does this automatically. If m5_status is checked, it is generally checked immediately after a call.
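The difference between a return-value-like status and a sticky status can be modeled in a few lines of Python. This is a sketch of the behavior described here and in the Macro Library entries for m5_sticky_status and m5_reset_sticky_status, not M5 code; the class `StatusModel` is hypothetical, and the non-empty marker `"else"` is an assumption chosen for illustration.

```python
class StatusModel:
    """Toy model of a per-call status and a sticky status."""

    def __init__(self):
        self.status = ""   # overwritten by every status-setting call
        self.sticky = ""   # first non-empty status captured so far

    def if_(self, condition, body):
        """Evaluate body if condition holds; status is empty iff the body ran."""
        if condition:
            output = body()
            self.status = ""
            return output
        self.status = "else"   # assumed non-empty marker
        return ""

    def sticky_status(self):
        # Capture only the first non-empty status.
        if self.sticky == "":
            self.sticky = self.status

    def reset_sticky_status(self):
        # Test and reset the sticky status.
        was_set = self.sticky != ""
        self.sticky = ""
        return was_set


model = StatusModel()
value, minimum, maximum = 12, 0, 10
model.if_(value >= minimum, lambda: "")
model.sticky_status()
model.if_(value <= maximum, lambda: "")   # fails: status becomes non-empty
model.sticky_status()
model.if_(True, lambda: "")               # a later call overwrites plain status
out_of_range = model.reset_sticky_status()
```

After these calls the plain status is empty again, because the last conditional evaluated its body, while `out_of_range` is `True` because the sticky status captured the earlier failure. This is why a plain status should be checked immediately after the call of interest.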
6.3. Functions
All but the simplest of macros are most often declared using m5_fn and similar macros. These support a richer set of mechanisms for defining and passing parameters. While m5_macro is most often used with a one-line body definition, m5_fn is most often used with multi-line bodies as Scoped Code Blocks.
Such m5_fn declarations using Scoped Code Blocks look and act like functions/procedures/subroutines/methods in a traditional programming language, and we often refer to them as "functions". Function calls pass arguments into parameters. Functions' code block bodies contain macro calls (statements) that define local variables, perform calculations, evaluate code conditionally, iterate in loops, call other functions, recurse, etc.
Unlike typical programming languages, functions, like all macros, evaluate to text that substitutes for the calls. There is no mechanism to explicitly print to the standard output stream (though there are macros for printing to the standard error stream). Only a top-level call from the source code will implicitly echo to standard output.
Functions are defined using m5_fn and m5_lazy_fn.
Declarations take the form:
m5_fn(<name>, [<param-list>,] ['<body>'])
A basic function declaration with a one-line body looks like:
m5_fn(mul, val1, val2, ['m5_calc(m5_val1 * m5_val2)'])
Or, equivalently, using a code block body:
fn(mul, val1, val2, {
   ~calc(m5_val1 * m5_val2)
})
This mul function is called (in source context) like:
m5_mul(3, 5) /// produces 15
6.3.1. Parameters
6.3.1.1. Parameters Types and Usage
- Numbered parameters: Numbered parameters, as in m5_macro (see Declaring Macros), can be referenced as $1, $2, etc. with the same replacement behavior. However, they are explicitly identified in the parameter list (see The Parameter List). Within the function body, similar to ['$3'], m5_fn_arg may also be used to access an argument. For example, m5_fn_arg(3) evaluates to the literal third argument value.
- Special parameters: As for m5_macro, special parameters are supported. Note that $@, $*, and $# reflect only numbered parameters. Also, $0 will not have the expected value, however $0__ can still be used as a name prefix to localize names to this function. (See Variable Masking.) Similar to $@, the m5_fn_args macro (or variable) also provides a quoted list of the numbered arguments. Similar to $#, the m5_fn_arg_cnt macro also provides the number of numbered arguments.
- Named parameters: These are available locally to the body as variables. They are not available to the Aftermath of the function.
6.3.1.2. The Parameter List
The parameter list (<param-list>) is a list of zero or more <param-spec>s, where <param-spec> is:
- A parameter specification of the form [?][[<number>]][[^]<name>][: <comment>] (in this order), e.g. ?[2]^Name: the name of something:
  - <name>: Name of a named parameter.
  - ?: Specifies that the parameter is optional. Calls are checked to ensure that arguments are provided for all non-optional parameters or are defined for inherited parameters. Non-optional parameters may not follow optional ones.
  - [<number>]: Number of a numbered parameter. The first must be [1] and would correspond to $1 and m5_fn_arg(1), and so on. <number> is verified to match the sequential ordering of numbered parameters. Numbered parameters may also be named, in which case they can be accessed either way.
  - ^: Specifies that the parameter is inherited. It must also be named. Its definition is inherited from the context of the function definition. If undefined, the empty [''] value is provided and an error is reported unless the parameter is optional, e.g. ?^<name>. There is no corresponding argument in a call of this function. It is conventional to list inherited parameters last (before the body) to maintain correspondence between the parameter list of the definition and the argument list of a call.
  - <comment>: A description of the parameter. In addition to commenting the code, this can be extracted in documentation.
- ...: Listed after the last numbered parameter to allow extra numbered arguments. Without this, extra arguments result in an error (except for the single empty argument of e.g. m5_foo(). See Function Call Arguments.)
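To make the parameter specification form concrete, here is a small Python sketch that parses a single <param-spec> string of the form [?][[<number>]][[^]<name>][: <comment>]. It is an illustration of the grammar as written above, not M5's parser; `ParamSpec`, `parse_param_spec`, and the assumption that names consist of word characters are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One optional piece per element of the form, in the documented order.
_SPEC = re.compile(
    r"^(?P<optional>\?)?"
    r"(?:\[(?P<number>\d+)\])?"
    r"(?:(?P<inherited>\^)?(?P<name>\w+))?"
    r"(?:: (?P<comment>.*))?$"
)


@dataclass
class ParamSpec:
    optional: bool
    number: Optional[int]
    inherited: bool
    name: Optional[str]
    comment: Optional[str]


def parse_param_spec(text: str) -> ParamSpec:
    match = _SPEC.match(text.strip())
    if match is None:
        raise ValueError(f"not a parameter specification: {text!r}")
    return ParamSpec(
        optional=match["optional"] is not None,
        number=int(match["number"]) if match["number"] else None,
        inherited=match["inherited"] is not None,
        name=match["name"],
        comment=match["comment"],
    )


spec = parse_param_spec("?[2]^Name: the name of something")
```

For the example string from the text, `spec` is optional, numbered 2, inherited, named `Name`, and carries the comment `the name of something`. Because `^` is only accepted directly before a name, the rule that an inherited parameter must be named falls out of the pattern.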
6.3.2. When To Use What Type of Parameter
For nested declarations, the use of numbered parameters ($1, $2, …) and special parameters ($@, $*, $#, and $0) can be extremely awkward. Nested declarations are declarations within the bodies of other declarations. Since nested bodies are part of outer bodies, numbered and special parameters within them would actually substitute based on the outer bodies. This can be prevented by generating the body with macros that produce the numbered parameter references, but this requires an unnatural and bug-prone use of quotes.
Therefore the use of functions with named parameters is preferred for inner macro declarations. Use of m5_fn_args and m5_fn_arg is also simpler than using special parameters. If parameters are named, these are helpful primarily to access ... arguments or to pass argument lists to other functions.
Additionally, and in summary:
- Numbered/special parameters: These can be convenient to ensure substitution throughout the body without interference from quotes. They can, however, be extremely awkward to use in nested definitions as they would substitute with the arguments of the outer function/macro. Being unnamed, readability is an issue, especially for large functions.
- Named parameters: These act more like typical function arguments vs. text substitution. Since they are named, they can improve readability. Unlike numbered parameters, they work perfectly well in functions defined within other functions/macros. (Similarly, m5_fn_args and m5_fn_arg are useful for nested declarations.) Macros will not evaluate within quoted strings, so typical use requires unquoting, e.g. ['Arg1: ']m5_arg1['.'] vs. ['Arg1: $1.'].
- Inherited parameters: These provide a more natural, readable, and explicit mechanism for customizing a function to the context in which it is defined. For example, a function may define another function that is customized to the parameters of the outer function.
6.3.3. Function Call Arguments
Function calls must have arguments for all non-optional, non-inherited (^) parameters. Arguments are positional, so misaligning arguments is a common source of errors. There is checking, however, that required arguments are provided and that no extra arguments are given.
m5_foo() is permitted for a function foo declared with no parameters, though it is passed one empty parameter. (m5_call(foo) might be preferred.)
6.3.4. Function Arguments Example
In argument list context, function foo is declared below to display its parameters.
/Context:
var(Inherit2, two)
/Define foo:
fn(foo, Param1, ?[1]Param2: an optional parameter, ?^Inherit1, [2]^Inherit2, ..., {
   ~nl(Param1: m5_Param1)
   ~nl(Param2: m5_Param2)
   ~nl(Inherit1: m5_Inherit1)
   ~nl(Inherit2: m5_Inherit2)
   ~nl(['numbered args: $@'])
})
And it can be called (again, in argument list context):
/Call foo:
foo(arg1, arg2, extra1, extra2)
And this expands to:
Param1: arg1
Param2: arg2
Inherit1:
Inherit2: two
numbered args: ['arg2'],['two'],['extra1'],['extra2']
6.3.5. Aftermath
It is possible for a function to make assignments (and, actually, do anything) in the calling scope. This can be done using m5_on_return or m5_return_status. This is important for:
- passing arguments by reference
- returning status
- evaluating body arguments
- tail recursion
Each of these is discussed in its own section, next.
6.3.6. Passing Arguments by Reference
Functions can pass variables by reference and make assignments to the referenced variables upon returning from the function. For example:
fn(update, FooRef, {
   var(Value, ['updated value'])
   on_return(set, m5_FooRef, m5_Value)
})
set(Foo, ['xxx'])
update(Foo)
~Foo
A similar function could be defined to declare a referenced variable by using var instead of set.
The use of m5_on_return avoids the potential masking issue that would result from:
update(Value)
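The masking issue can be made concrete with a Python model of scopes as a stack of dictionaries. This is a sketch for illustration only; the helper names are hypothetical and it does not reflect how M5 stores variables. An assignment made inside the function body finds the function's own local Value first, while an assignment deferred until after the local scope is popped reaches the caller's Value.

```python
def assign(scopes, name, value):
    """Assign to the innermost scope that defines name."""
    for scope in reversed(scopes):
        if name in scope:
            scope[name] = value
            return
    raise KeyError(name)


def update_in_body(scopes, ref):
    """Assign during the body: the local 'Value' masks a caller's 'Value'."""
    scopes.append({"Value": "updated value"})   # local declaration
    assign(scopes, ref, scopes[-1]["Value"])
    scopes.pop()


def update_on_return(scopes, ref):
    """Defer the assignment until the local scope has been popped."""
    scopes.append({"Value": "updated value"})
    value = scopes[-1]["Value"]                 # argument evaluated in the body
    aftermath = lambda: assign(scopes, ref, value)
    scopes.pop()
    aftermath()                                 # runs in the caller's scope


masked = [{"Value": "xxx"}]
update_in_body(masked, "Value")

fixed = [{"Value": "xxx"}]
update_on_return(fixed, "Value")
```

In this model `masked[0]["Value"]` is still `"xxx"`, since only the local variable was assigned, whereas `fixed[0]["Value"]` becomes `"updated value"`.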
6.3.7. Returning Status
A function’s m5_status should be returned via the function’s aftermath, using m5_return_status, e.g.
fn(my_fn, Val, {
   if(m5_Val > 10, [''])
   return_status(m5_status)
})
Functions automatically restore m5_status after body evaluation to its value prior to body evaluation, so the evaluation of the body has no impact on m5_status. Aftermath is evaluated after this.
It is fine to call m5_return_status multiple times. Only the last call will have a visible effect.
6.3.8. Functions with Body Arguments
The example below illustrates a function if_neg that takes an argument that is a body to evaluate. The body is defined in a calling function, e.g. my_fn on lines 15-16. Such a body is expected to evaluate in the context of the calling function, my_fn. Its assignment of Neg, on line 16, should be an assignment of its own local Neg, declared on line 12. Its side effects from m5_return_status on line 15 should be side effects of my_fn.
If the body were evaluated inside the function body, its side effects would be side effects of if_neg, not my_fn. The body should instead be evaluated as aftermath, using m5_on_return, as on line 6.
Note that m5_return_status is called after evaluating m5_Body. Both m5_on_return and m5_return_status add to the Aftermath of the function, and m5_status must be set after evaluating the body (which could affect m5_status).
Example of a body argument.
 1: // Evaluate a body if a value is negative.
 2: fn(if_neg, Value, Body, {
 3:    var(Neg, m5_calc(Value < 0))
 4:    ~if(Neg, [
 5:       /~eval(m5_Body)
 6:       on_return(Body)
 7:    ])
 8:    return_status(if(Neg, [''], else))
 9: })
10:
11: fn(my_fn, {
12:    var(Neg, [''])
13:    return_status(['pos'])
14:    ~if_neg(1, [
15:       return_status(['neg'])
16:       set(Neg, ['-'])
17:    ])
18:    ...
19: })
6.3.9. Tail Recursion
Recursive calls tend to grow the stack significantly, and this can result in an error (see m5_recursion_limit) as well as inefficiency. When recursion is the last act of the function ("tail recursion"), the recursion can be performed in aftermath to avoid growing the stack. For example:
fn(my_fn, First, ..., {
   ...
   ~unless(m5_Done, [
      ...
      on_return(my_fn\m5_comma_args())
   ])
   ...
})
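The idea of performing the recursive call after the function has returned can be illustrated with a Python trampoline. This is an analogy rather than a description of M5 internals; `run` and `count_down` are hypothetical names. The function returns a deferred call instead of calling itself, and the caller performs that call once the function's own frame is gone, so the stack does not grow.

```python
def run(fn, *args):
    """Call fn; while it returns a deferred call, perform that call here."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result


def count_down(n, total=0):
    if n == 0:
        return total
    # Deferred tail call, analogous to scheduling the recursion as aftermath.
    return lambda: count_down(n - 1, total + n)


total = run(count_down, 100000)
```

A direct recursive version of `count_down` at this depth would exceed Python's default recursion limit, while the deferred version completes with `total` equal to 5000050000.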
6.4. Coding Paradigms, Patterns, Tips, Tricks, and Gotchas
6.4.1. Variable Masking
Variable "masking" is an issue that can arise when a macro has side effects determined by its arguments. For example, an argument might specify the name of a variable to assign, or an argument might provide a body to evaluate that could declare or assign arbitrary variables. If the macro declares a local variable, and the side effect updates a variable by the same name, the local variable may inadvertently be the one that is updated by the side effect. This issue is addressed differently depending on how the macro is defined. Note that using function Aftermath is the preferred method, but all options are listed here for completeness:
- Functions: Set variables using Aftermath. Using functions for variable-setting macros is preferred.
- Macros declaring their body using a code block: Set variables using m5_out_eval.
- Macros declaring their body using a string: Push/pop local variables named using the $0__ prefix.
7. Macro Library
This section documents the macros defined by the M5 1.0 library. Some macros documented here are necessary to enable inclusion of this library and are, by necessity, built into the language. This distinction may not be documented.
7.1. Specification Conventions
Macros are listed by category in a logical order. An alphabetical Index of macros can be found at the end of this document (at least in the .pdf version).
Macros that return integer values, unless otherwise specified, return decimal value strings. Similarly, macro arguments that are integer values accept decimal value strings. Boolean inputs and outputs use 0 and 1. Behavior for other argument values is undefined if unspecified.
Resulting output text is, by default, literal (quoted). Macros named with a _eval suffix generally result in text that gets evaluated.
7.2. Assigning and Accessing Macros/Variables
7.2.1. Declaring/Setting Variables
Description: |
Declare a scoped variable. See Variables. |
Side Effect(s): |
the variable is defined |
Parameter(s): |
|
Example(s): |
var(Foo, 5) |
See also: |
Description: |
Set the value of a scoped variable. See Variables. |
Side Effect(s): |
the variable’s value is set |
Parameter(s): |
|
Example(s): |
set(Foo, 5) |
See also: |
Description: |
Declare a variable that must be explicitly popped. |
Side Effect(s): |
the variable is defined |
Parameter(s): |
|
Example(s): |
push_var(Foo, 5) ... pop(Foo) |
See also: |
Description: |
Pop a variable or traditional macro declared using |
Side Effect(s): |
the macro is popped |
Parameter(s): |
|
Example(s): |
push_var(Foo, 5) ... pop(Foo) |
See also: |
Description: |
Declare variables with empty values. |
Side Effect(s): |
the variables are declared |
Parameter(s): |
|
7.2.2. Declaring Macros
Description: |
Declare a function. For details, see Functions. |
Side Effect(s): |
the function is declared |
Parameter(s): |
|
Example(s): |
fn(add, Addend1, Addend2, { ~calc(m5_Addend1 + m5_Addend2) }) |
See also: |
Description: |
Declare a scoped macro. See Declaring Macros. A null macro must produce no output. |
Side Effect(s): |
the macro is declared |
Parameter(s): |
|
Example(s): |
m5_macro(ParseError, <p>[ error(['Failed to parse $<p>1.']) ]) |
See also: |
Description: |
Set the value of a scoped(?) macro. See Declaring Macros. Using this macro is rare. |
Side Effect(s): |
the macro value is set |
Parameter(s): |
|
See also: |
Description: |
Push a new value of a macro that must be explicitly popped. Using this macro is rare. |
Side Effect(s): |
the macro value is pushed |
Parameter(s): |
|
See also: |
7.2.3. Accessing Macro/Variable Values
Output: |
the value of a variable without |
Parameter(s): |
|
Example(s): |
var(OneDollar, ['$1.00']) get(OneDollar) |
Example Output: |
$1.00 |
See also: |
Description: |
Ensure that the |
Parameter(s): |
|
7.3. Code Constructs
7.3.1. Status
Description: |
This universal variable is set as a side-effect of some macros to indicate an exceptional
condition or non-evaluation of a body argument. It may be desirable to check this condition
after calling such macros. Macros, like |
See also: |
Description: |
Used by the |
See also: |
Description: |
Used to capture the first non-empty status of multiple macro calls. |
Side Effect(s): |
|
Example(s): |
if(m5_A >= m5_Min, ['']) sticky_status() if(m5_A <= m5_Max, ['']) sticky_status() if(m5_reset_sticky_status(), ['m5_error(m5_A is out of range.)']) |
See also: |
Description: |
Tests and resets |
Output: |
[ |
Side Effect(s): |
|
See also: |
7.3.2. Conditionals
Description: |
An if/else construct. The condition is an expression that evaluates using |
Output: |
the output of the evaluated body |
Side Effect(s): |
status is set, empty iff a block was evaluated; side-effects of the evaluated body |
Parameter(s): |
|
Example(s): |
~if(m5_eq(m5_Ten, 10) && m5_Val > 3, [ ~do_something(...) ], m5_Val > m5_Ten, [ ~do_something_else(...) ], [ ~default_case(...) ]) |
See also: |
Description: |
An if/else construct where each condition is a comparison of an independent pair of strings.
The first block is evaluated if the strings match (for
|
||
Output: |
the output of the evaluated body |
||
Side Effect(s): |
status is set, empty iff a body was evaluated; side-effects of the evaluated body |
||
Parameter(s): |
|
||
Example(s): |
~if_eq(m5_Zero, 0, [ ~zero_is_zero(...) ], m5_calc(m5_Zero < 0), 1, [ ~zero_is_negative(...) ], [ ~zero_is_positive(...) ]) |
||
See also: |
m5_if_null(Var, Body, ElseBody)
m5_if_var_def(Var, Body, ElseBody)
m5_if_var_ndef(Var, Body, ElseBody)
m5_if_defined_as(Var, Value, Body, ElseBody)
Description: |
Evaluate |
Output: |
the output of the evaluated body |
Side Effect(s): |
status is set, empty iff a body was evaluated; side-effects of the evaluated body |
Parameter(s): |
|
Example(s): |
if_null(Tag, [ error(No tag.) ]) |
See also: |
Description: |
Likely following a macro that sets |
Output: |
the output of the evaluated body |
Side Effect(s): |
status is set, empty iff a body was evaluated; side-effects of the evaluated body |
Parameter(s): |
|
Example(s): |
~if(m5_Cnt > 0, [ decrement(Cnt) ]) else([ ~(Done) ]) |
See also: |
Description: |
Evaluate |
Output: |
the output of the evaluated body |
Side Effect(s): |
status is set, empty iff a body was evaluated; side-effects of the evaluated body |
Parameter(s): |
|
Example(s): |
m5_set(Either, m5_if_var_def(First, m5_First)m5_else_if_def(Second, m5_Second)) |
See also: |
Description: |
Similar to |
Output: |
the output of the evaluated body |
Side Effect(s): |
status is set, empty iff a block was evaluated; side-effects of the evaluated body |
Parameter(s): |
|
Example(s): |
~case(Response, ok, [ ~ok_response(...) ], bad, [ ~bad_response(...) ], [ error(Unrecognized response: m5_Response) ]) |
See also: |
7.3.3. Loops
Description: |
A generalized loop construct. Implicit variable |
Output: |
output of the blocks |
Side Effect(s): |
side-effects of the blocks |
Parameter(s): |
|
Example(s): |
~loop((MyVar, 0), [ ~do_stuff(...) ], m5_LoopCnt < 10, [ ~do_more_stuff(...) ]) |
See also: |
Description: |
Evaluate a block a predetermined number of times. Implicit variable |
Output: |
output of the block |
Side Effect(s): |
side-effects of the block |
Parameter(s): |
|
Example(s): |
~repeat(10, [ ~do_stuff(...) ]) /// Iterates m5_LoopCnt 0..9. |
See also: |
Description: |
Evaluate a block for each item in a list. Implicit variable |
Output: |
output of the block |
Side Effect(s): |
side-effects of the block |
Parameter(s): |
|
Example(s): |
~for(fruit, ['apple, orange, '], [ ~do_stuff(...) ]) /// (also maintains m5_LoopCnt) |
See also: |
7.3.4. Recursion
Description: |
Call a macro recursively to a given maximum recursion depth. Functions have a built-in recursion limit, so this is only useful for macros. |
Output: |
the output of the recursive call |
Side Effect(s): |
the side effects of the recursive call |
Parameter(s): |
|
Example(s): |
m5_recurse(20, myself, args) |
See also: |
7.4. Working with Strings
7.4.1. Special Characters
Description: |
Produce a new-line. Programmatically-generated output should always use this macro (directly or indirectly) to produce new-lines, rather than using an actual new-line in the source file. Thus the input file formatting can reflect the code structure, not the output formatting. |
Output: |
a new-line |
Description: |
Produce an open or close quote. These should rarely (never?) be needed and should be used with extra caution since they can create undetected imbalanced quoting. The resulting quote is literal, but it will be interpreted as a quote if evaluated. |
Output: |
the literal quote |
See also: |
Output: |
A macro argument separator comma. |
See also: |
Description: |
Produce |
Output: |
the literal quote |
See also: |
Description: |
Produce the single unicode character used to represent |
Output: |
the printable quote |
See also: |
Description: |
A unique untypeable value indicating that no assignment has been made. This is not used by any standard macro, but is available for explicit use. |
Output: |
the value indicating "undefined" |
Example(s): |
m5_var(Foo, m5_UNDEFINED) m5_if_eq(Foo, m5_UNDEFINED, ['['Foo is undefined.']']) R: Foo is undefined. |
7.4.2. Slicing and Dicing Strings
m5_append_var(Name, String)
m5_prepend_var(Name, String)
m5_append_macro(Name, String)
m5_prepend_macro(Name, String)
Description: |
Append or prepend to a variable or macro. (A macro evaluates its context; a variable does not.) |
Parameter(s): |
|
Example(s): |
m5_var(Hi, ['Hello']) m5_append_var(Hi, [', ']m5_Name['!']) m5_Hi |
Example Output: |
Hello, Joe! |
Description: |
Extract a substring from Extracting substrings from strings with quotes is dangerous as it can lead to imbalanced quoting.
If the resulting string would contain any quotes, an error is reported suggesting the use of Extracting substrings from UTF-8 strings (supporting unicode characters) is also dangerous. M5 treats characters as bytes and UTF-8 characters can use multiple bytes, so substrings can split UTF-8 characters. Such split UTF-8 characters will result in bytes/M5-characters that have no special treatment in M5. They can be rejoined to reform valid UTF-8 strings. When evaluating substrings, care must be taken with
|
Output: |
the substring or its evaluation |
Parameter(s): |
|
Example(s): |
m5_substr(['Hello World!'], 3, 5) |
Example Output: |
lo Wo |
See also: |
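The byte-oriented behavior described for substrings can be reproduced in Python by slicing encoded bytes. This is only a model of the stated rule that M5 treats characters as bytes; the helper `byte_substr` is hypothetical and takes a start index and a length, as in the m5_substr example above.

```python
def byte_substr(text: str, start: int, length: int) -> bytes:
    """Slice the UTF-8 encoding of text by byte position and byte count."""
    return text.encode("utf-8")[start:start + length]


plain = byte_substr("Hello World!", 3, 5)

# The trademark sign occupies three bytes in UTF-8, so a four-byte prefix of
# "Zap™" ends in the middle of that character.
encoded = "Zap™".encode("utf-8")
left, right = encoded[:4], encoded[4:]
try:
    left.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False
rejoined = (left + right).decode("utf-8")
```

For ASCII text the result matches the example output (`plain` holds the bytes of `lo Wo`). For the non-ASCII string, the left piece alone is not valid UTF-8, but concatenating the pieces restores the original string, as the description states.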
Output: |
the arguments, delimited by the given delimiter string |
Parameter(s): |
|
Example(s): |
m5_join([', '], ['one'], ['two'], ['three']) |
Example Output: |
one, two, three |
Description: |
Transliterate a string, providing a set of character-for-character substitutions (where a character
is a unicode byte). |
Output: |
the transliterated string (or its evaluation for |
Side Effect(s): |
for |
Parameter(s): |
|
Example(s): |
m5_translit(['Testing: 1, 2, 3.'], ['123'], ['ABC']) |
Example Output: |
Testing: A, B, C. |
Description: |
Convert the case of ASCII characters; the example below converts lower-case to upper-case. |
Output: |
the converted string |
Parameter(s): |
|
Example(s): |
m5_uppercase(['Hello!']) |
Example Output: |
HELLO! |
Description: |
Replicate a string the given number of times. (A non-evaluating version of |
Output: |
the replicated string |
Parameter(s): |
|
Example(s): |
m5_replicate(3, ['.']) |
Example Output: |
... |
See also: |
Description: |
Strip trailing whitespace from the given variable. |
Side Effect(s): |
the variable is updated |
Parameter(s): |
|
7.4.3. Formatting Strings
Description: |
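Produce formatted output, much like the C printf function. From the M4 Manual: items not supported include positional arguments, the 'n', 'p', 'S', and 'C' specifiers, the 'z', 't', 'j', 'L' and 'll' modifiers, and any platform extensions available in the native printf. For more details on the functioning of printf, see the C Library Manual or the POSIX specification. |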
Output: |
the formatted string |
Parameter(s): |
|
Example(s): |
1: m5_var(Foo, Hello) m5_format_eval(['String "%s" uses %d chars.'], m5_Foo, m5_length(m5_Foo)) 2: m5_format_eval(['%*.*d'], ['-1'], ['-1'], ['1']) 3: m5_format_eval(['%.0f'], ['56789.9876']) 4: m5_length(m5_format(['%-*X'], ['5000'], ['1'])) 5: m5_format_eval(['%010F'], ['infinity']) 6: m5_format_eval(['%.1A'], ['1.999']) 7: m5_format_eval(['%g'], ['0xa.P+1']) |
Example Output: |
1: String "Hello" uses 5 chars. 2: 1 3: 56790 4: 5000 5: INF 6: 0X2.0P+0 7: 20 |
7.4.4. Inspecting Strings
Output: |
the length of a string in ASCII characters (unicode bytes) |
Parameter(s): |
|
Output: |
the position in a string, in ASCII characters (unicode bytes), of the first occurrence of a given substring, or -1 if not present, where the string starts with character zero |
Parameter(s): |
|
Output: |
the number of new-lines in the given string |
Parameter(s): |
|
Description: |
Evaluate |
Output: |
output from |
Side Effect(s): |
side-effects of |
Parameter(s): |
|
7.4.5. Safely Working with Strings
Description: |
For strings that may contain quotes, working with substrings can lead to imbalanced quotes and unpredictable behavior. |
Output: |
dequoted or requoted string |
Parameter(s): |
|
Output: |
the given string with quotes, surrogate quotes and printable quotes replaced by their original format (['']) |
Parameter(s): |
|
See also: |
Description: |
Assert that the given string contains no quotes. |
Parameter(s): |
|
7.4.6. Regular Expressions
Regular expressions in M5 use the same regular expression syntax as GNU Emacs. (See GNU Emacs Regular Expressions.) This syntax is similar to BRE, Basic Regular Expressions in POSIX and is regrettably rather limited. Extended Regular Expressions are not supported.
Description: |
Searches for
|
Output: |
If If The resulting text is literal for |
Side Effect(s): |
|
Parameter(s): |
|
Example(s): |
m5_regex_eval(['Hello there'], ['\w+'], ['First word: m5_translit(['\&']).']) |
Example Output: |
First word: Hello. |
See also: |
Description: |
Declare variables assigned to subexpressions of a regular expression. |
Side Effect(s): |
|
Parameter(s): |
|
Example(s): |
m5_var_regex(['mul A, B'], ['^\(\w+\)\s+\(\w+\),\s*\(\w+\)$'], (Operation, Src1, Src2)) m5_if_so(['m5_DEBUG(Matched: m5_Src1[','] m5_Src2)']) m5_else(['m5_error(['Match failed.'])']) |
See also: |
Description: |
For chaining |
Output: |
output of the matching body |
Side Effect(s): |
|
Parameter(s): |
|
Example(s): |
~if_regex(m5_Instruction, ['^mul\s+\(\w+\),\s*\(\w+\)$'], (Src1, Src2), [ ~calc(m5_Src1 * m5_Src2) ], ['^incr\s+\(\w+\)$'], (Src1), [ ~calc(m5_Src1 + 1) ]) |
See also: |
Description: |
Evaluate body for every pattern matching regex in the string. |
Side Effect(s): |
side-effects of evaluating the body |
Parameter(s): |
|
Example(s): |
m5_for_each_regex(H1dd3n D1git5, ['\([0-9]\)'], (Digit), ['Found m5_Digit. ']) |
Example Output: |
Found 1. Found 3. Found 1. Found 5. |
See also: |
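The iteration performed by the m5_for_each_regex example can be modeled in Python. This is a sketch only: `for_each_match` is a hypothetical helper, it handles a single captured group, and Python's `re` syntax writes groups as `( )` where the Emacs-style syntax used by M5 writes `\( \)`.

```python
import re


def for_each_match(text: str, pattern: str, body) -> str:
    """Call body on the first captured group of every match and join the results."""
    return "".join(body(match.group(1)) for match in re.finditer(pattern, text))


# Same string and digit pattern as the example, with the body as a Python function.
result = for_each_match("H1dd3n D1git5", r"([0-9])", lambda digit: f"Found {digit}. ")
```

Apart from the trailing space produced by the last iteration, `result` corresponds to the example output.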
7.5. Utilities
7.5.1. Fundamental Macros
Output: |
the M4 definition of a macro; note that the M4 definition is slightly different from the M5 definition |
Parameter(s): |
|
Description: |
Call a macro. Compared with calling a macro directly, this indirect mechanism has two primary uses. First, it provides the same syntax for calls with zero arguments as for calls with a non-zero number of arguments. Second, the macro name can be constructed. |
Output: |
the output of the called macro |
Side Effect(s): |
the side-effects of the called macro |
Parameter(s): |
|
Example(s): |
m5_call(error, ['Fail!']) |
See also: |
Output: |
a comma-separated list of quoted arguments, i.e. |
Parameter(s): |
|
Example(s): |
m5_quote(A, ['B']) |
Example Output: |
['A'],['B'] |
See also: |
Output: |
the arguments within the given number of quotes, the innermost applying individually to each argument, separated by commas. A |
Parameter(s): |
|
Example(s): |
1: m5_nquote(3, A, ['m5_nl']) 2: m5_nquote(3, m5_nquote(0, A, ['m5_\nl'])xx) |
Example Output: |
1: ['['['A'],['m5_\nl']']'] 2: ['['['A'],['m5_\nlxx']']'] |
See also: |
Description: |
Evaluate the argument. |
Output: |
the result of evaluating the argument |
Side Effect(s): |
the side-effects resulting from evaluation |
Parameter(s): |
|
Example(s): |
1: m5_eval(['m5_calc(1 + 1)']) 2: m5_eval(['m5'])_calc(1 + 1) |
Example Output: |
1: 2 2: m5_calc(1 + 1) |
Output: |
nothing at all; used to provide a comment (though comments are preferred) or to discard the result of an evaluation |
Parameter(s): |
|
7.5.2. Manipulating Macro Stacks
See Macro Stacks.
Output: |
|
Parameter(s): |
|
Example(s): |
*{ var(Foo, A) var(Foo, B) ~get_ago(Foo, 1) ~get_ago(Foo, 0) } |
Example Output: |
AB |
Output: |
the number of values on a variable’s stack |
Parameter(s): |
|
Example(s): |
m5_depth_of(Foo) m5_push_var(Foo, A) m5_depth_of(Foo) |
Example Output: |
0 1 |
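The stack behavior shown in the two examples above can be modeled with a small Python class. This is a sketch under the assumption, suggested by the examples, that each declaration or push adds a value on top of the variable's stack and that an "ago" of 0 refers to the most recent value; `VarStacks` and its method names are hypothetical.

```python
class VarStacks:
    """Model of per-variable value stacks."""

    def __init__(self):
        self._stacks = {}

    def push(self, name, value):
        """Add a new value on top of the variable's stack."""
        self._stacks.setdefault(name, []).append(value)

    def get_ago(self, name, ago):
        """Return the value pushed `ago` pushes before the most recent one."""
        stack = self._stacks.get(name, [])
        if not 0 <= ago < len(stack):
            raise IndexError(f"{name} has no value {ago} pushes ago")
        return stack[-1 - ago]

    def depth_of(self, name):
        """Return the number of values on the variable's stack."""
        return len(self._stacks.get(name, []))


stacks = VarStacks()
depth_before = stacks.depth_of("Foo")
stacks.push("Foo", "A")
depth_after = stacks.depth_of("Foo")
stacks.push("Foo", "B")
combined = stacks.get_ago("Foo", 1) + stacks.get_ago("Foo", 0)
```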
7.5.3. Argument Processing
Description: |
Removes the first argument. |
Output: |
a list of remaining arguments, or |
Side Effect(s): |
none |
Parameter(s): |
|
Example(s): |
m5_foo(m5_shift($@)) /// $@ has at least 2 arguments
m5_call(foo['']m5_comma_shift($@)) /// $@ has at least 1 argument |
Output: |
the number of arguments given (useful for variables that contain lists) |
Parameter(s): |
|
Example(s): |
m5_set(ExampleList, ['hi, there']) m5_nargs(m5_ExampleList) |
Example Output: |
2 |
Output: |
the nth of the given |
Parameter(s): |
|
Example(s): |
m5_argn(2, a, b, c) |
Example Output: |
b |
Description: |
Convert a quoted argument list to a list of arguments with a preceding comma. This is necessary to properly work with argument lists that may contain zero arguments. |
Parameter(s): |
|
Example(s): |
m5_call(foo['']m5_comma_args(['$@']), last) |
See also: |
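The reason for the preceding comma can be seen by building call text by hand. This Python sketch is a model only; `plain_args`, `comma_args`, and the two `build_call_*` helpers are hypothetical names, and no M5 evaluation is involved.

```python
def plain_args(args):
    """Join arguments with commas; an empty list yields an empty string."""
    return ", ".join(args)


def comma_args(args):
    """Emit each argument with a preceding comma; an empty list yields nothing."""
    return "".join(", " + arg for arg in args)


def build_call_plain(name, args, last):
    """Splice a plain list between fixed arguments (breaks for zero arguments)."""
    return f"call({name}, {plain_args(args)}, {last})"


def build_call_comma(name, args, last):
    """Splice a comma-prefixed list after the name (safe for zero arguments)."""
    return f"call({name}{comma_args(args)}, {last})"


with_two = build_call_comma("foo", ["a", "b"], "last")
with_none = build_call_comma("foo", [], "last")
broken = build_call_plain("foo", [], "last")  # contains a stray empty argument
```

With zero arguments, the plain form leaves an empty argument between two commas, while the comma-prefixed form collapses cleanly.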
Description: |
For rather pathological use illustrated in the example, … |
Output: |
the argument list ( |
Parameter(s): |
|
Example(s): |
m5_macro(append_to_paren_list, ['m5_echo_args$1, $2']) m5_append_to_paren_list((one, two), three) |
Example Output: |
(one,two,three) |
7.5.4. Arithmetic Macros
Description: |
Calculate an expression. Calculations are done with 32-bit signed integers. Overflow silently results in wraparound. A warning is issued if division by zero is attempted, or if the expression could not be parsed. Expressions can contain the following operators, listed in order of decreasing precedence.
All binary operators, except exponentiation, are left-associative. Exponentiation is right-associative. Immediate values in
Digits are For the relational operators, a true relation returns 1, and a false relation returns 0. |
Output: |
the calculated value of the expression in the given |
Parameter(s): |
|
Example(s): |
1: m5_calc(2**3 <= 4) 2: m5_calc(-0xf, 2, 8) |
Example Output: |
1: 0 2: -00001111 |
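The wraparound and the radix/width formatting described for m5_calc can be modeled in Python. This is a sketch, not the M5 implementation: `wrap32` and `to_radix` are hypothetical helpers, and the assumption that the width counts digits without the sign is inferred from the second example output above.

```python
def wrap32(value: int) -> int:
    """Wrap an integer into the 32-bit signed range (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def to_radix(value: int, radix: int = 10, width: int = 1) -> str:
    """Format a signed integer in a radix, zero-padded to at least `width` digits."""
    if not 2 <= radix <= 36:
        raise ValueError("this sketch supports radix 2 through 36 only")
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    sign = "-" if value < 0 else ""
    magnitude = abs(value)
    out = ""
    while magnitude:
        out = digits[magnitude % radix] + out
        magnitude //= radix
    # The sign is placed in front of the padded digits and is not counted in width.
    return sign + out.rjust(width, "0")


overflowed = wrap32(2147483647 + 1)  # one past the largest 32-bit signed value
relation = int(2**3 <= 4)            # relational operators yield 1 or 0
formatted = to_radix(-0xF, 2, 8)     # radix 2, at least 8 digits
```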
Description: |
Set a variable to the result of an arithmetic expression computed by |
Side Effect(s): |
the variable is set |
Parameter(s): |
|
Example(s): |
m5_equate(Foo, 1+2) m5_operate_on(Foo, * (3-1)) m5_Foo |
Example Output: |
6 |
See also: |
Description: |
Increment/decrement a variable holding an integer value by one or by the given amount. |
Side Effect(s): |
the variable is updated |
Parameter(s): |
|
Example(s): |
m5_increment(Cnt) |
See also: |
7.5.5. Boolean Macros
These have boolean (0 / 1) results. Note that some m5_calc expressions result in boolean values as well.
Output: |
[ |
Parameter(s): |
|
Output: |
[ |
Parameter(s): |
|
Example(s): |
m5_if(m5_neq(m5_Response, ok, bad), ['m5_error(Unknown response: m5_Response.)']) |
7.5.6. Within Functions or Code Blocks
Description: |
|
Output: |
|
Side Effect(s): |
none |
Example(s): |
m5_foo(1, m5_fn_args()) /// works for 1 or more fn_args
m5_foo(1['']m5_comma_fn_args()) /// works for 0 or more fn_args |
See also: |
Description: |
Access a function argument by position from |
Output: |
the argument value. |
Parameter(s): |
|
See also: |
Description: |
The number of arguments in |
Output: |
the number of arguments. |
See also: |
Description: |
These append to code block output that is expanded after the evaluation of the block. |
Output: |
no direct output, though, since these indirectly result in output as a side-effect, it is recommended to use |
Side Effect(s): |
indirectly, |
Parameter(s): |
|
See also: |
Description: |
Provide return status. (Shorthand for |
Side Effect(s): |
sets |
Parameter(s): |
|
See also: |
Parameter(s): |
|
7.6. Checking and Debugging
Description: |
Get or set the debug level. |
Output: |
with zero arguments, the current debug level |
Side Effect(s): |
sets |
Parameter(s): |
|
Example(s): |
debug_level(max) use(m5-1.0) |
7.6.1. Checking and Reporting to STDERR
These macros output text to the standard error output stream (STDERR) (with [' / '] quotes represented by single characters). (Note that STDOUT is the destination for the evaluated output.)
Description: |
Write to STDERR stream (with a trailing new-line for |
Parameter(s): |
|
Example(s): |
m5_errprint_nl(['Hello World.']) |
Description: |
Report an error/warning/debug message and stack trace (except for |
Parameter(s): |
|
Example(s): |
m5_error(['Parsing failed.']) |
m5_warning_if(condition, message)
m5_error_if(condition, message)
m5_fatal_error_if(condition, message)
m5_DEBUG_if(condition, message)
Description: |
Report an error/warning/debug message and stack trace (except for |
Parameter(s): |
|
Example(s): |
m5_error_if(m5_Cnt < 0, ['Negative count.']) |
Description: |
Assert that a condition is true, reporting an error if it is not, e.g. |
Parameter(s): |
|
Example(s): |
m5_assert(m5_Cnt < 0) |
m5_verify_min_args(Name, Min, Actual)
m5_verify_num_args(Name, Min, Actual)
m5_verify_min_max_args(Name, Min, Max, Actual)
Description: |
Verify that a traditional macro has a minimum number, a range, or an exact number of arguments. |
Parameter(s): |
|
Example(s): |
m5_verify_min_args(my_fn, 2, $#) |
7.6.2. Uncategorized Debug Macros
Description: |
If the function call stack exceeds this value, a fatal error is reported. |
Description: |
For reporting messages containing argument lists, abbreviate long arguments and/or a long argument list by replacing long input args and remaining arguments beyond a limit with ['…']. |
Output: |
a quoted string of quoted args with a comma preceding every arg. |
Parameter(s): |
|
Example(s): |
m5_abbreviate_args(5, 15, $@) |
8. Reference Card
M5 processes the following syntaxes:
Feature | Reference | Syntax | Contexts | Must Be Evaluated? |
---|---|---|---|---|
M5 comments | | | All | N/A |
Quotes | | | All | Yes |
Macro calls | | e.g. | All (except as code statement) | Yes |
Numbered/special parameters | | | Within outermost macro/function (not var) definition body | N/A |
Escapes | | | All | |
The contexts listed under the "Contexts" column of Core Syntax are described in Contexts. "Yes" in the "Must Be Evaluated" column indicates that the syntax should not pass through to the output without being evaluated. Doing so may result in an error or unexpected output text.
Additionally, text and code block syntax is recognized when special quotes are opened at the end of a line or closed at the beginning of a line. See Code Blocks. For example:
/Report error.
error(*<blk>{
   ~(['Something went wrong!'])
})
Block syntax includes:
Feature | Reference | Syntax | Contexts |
---|---|---|---|
Code block quotes | | | M5, code |
Text block quotes | | | M5, code |
Evaluate Blocks | | | M5, code |
Statement comment | | | Code |
Statement with no output | | | Code |
Code block statement with output | | | Code |
Code block output | | | Code |
All syntax in Block Syntax must be evaluated (strictly, must not be unevaluated).
Though not essential, block labels can be used to improve maintainability and performance in extreme cases.
Feature | Reference | Syntax | Contexts | Must Be Evaluated |
---|---|---|---|---|
Named blocks | | | M5, Code | Yes |
Quote escape | | | All (within any type of M5 quotes) | |
Labeled number/special parameter reference | | | All (within corresponding block) | N/A |
Many macros accept arguments with syntaxes of their own, defined in the macro definition. Functions, for example, are fundamental. See Functions.