Introduction
AZ65 is a powerful but simple assembler for the Zilog Z80, MOS 6502, and Sharp LR35902 (sm83 / gbz80) architectures. In this book you'll learn the ins and outs of the assembler and its advanced meta-programming capabilities.
Note that this book only covers using AZ65 and does not cover assembly in general.
For assembly language references see:
- z80.info for lots of Z80 resources.
- 6502.org for plenty of 6502 tutorials.
- gbdev.io for resources on sm83/gbz80 programming.
Command Line Interface
All the functionality of AZ65 is in the az65 binary.
Assembling Files
Pass the target CPU architecture and the name of an assembly file to assemble it:
az65 6502 code.asm
Architectures
- 6502 for MOS 6502
- z80 for Zilog Z80
- sm83 for the Sharp LR35902 (a.k.a. gbz80)
By default, az65 will write the assembled program to stdout. You can direct this to a file using the > operator:
az65 6502 code.asm > code.bin
You can alternatively use the -o option to pick an output file:
az65 z80 code.asm -o code.bin
Search Paths
AZ65 supports specifying search paths for locating files that are referenced in your code. Pass search paths by repeatedly using the -I option.
az65 sm83 code.asm -I macros -I data > code.bin
When a file is referenced by name, it is first searched for in the same directory as the file currently being assembled. If it is not found there, each search path is checked in the order given.
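The lookup order described above can be sketched in Python. This is only an illustration of the rule, not AZ65's actual implementation; the function name resolve_reference and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_reference(name: str, current_file: str,
                      search_paths: Sequence[str]) -> Optional[Path]:
    """Return the first existing file matching `name`, or None if none exists."""
    # 1. The directory of the file currently being assembled.
    candidates = [Path(current_file).parent / name]
    # 2. Each -I search path, in the order given on the command line.
    candidates.extend(Path(directory) / name for directory in search_paths)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

With -I macros -I data, a file that exists in both macros and data would resolve to the copy in macros, unless a copy also sits next to the file being assembled.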
Expressions
Number Formats
Binary
Binary numbers in AZ65 use standard 6502 binary number syntax. They are prefixed with a percent sign (%). For example:
%00001111
%1010
%0
Hexadecimal
Like binary numbers, hexadecimal numbers use standard syntax. They are prefixed with a dollar sign ($) and are case-insensitive. For example:
$1234
$DADcafe
$0
Decimal
Numbers without a % or $ prefix are assumed to be decimal (base 10) numbers.
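As a rough Python model of the three number formats, the prefix decides the base. The function name parse_number is hypothetical, and the sketch is more lenient than a real lexer would be (Python's int also accepts signs, whitespace, and underscores).

```python
def parse_number(text: str) -> int:
    """Convert an AZ65-style numeric literal to an integer."""
    if text.startswith("%"):
        return int(text[1:], 2)   # binary, e.g. %1010
    if text.startswith("$"):
        return int(text[1:], 16)  # hexadecimal, case-insensitive
    return int(text, 10)          # no prefix: decimal
```

For instance, parse_number("%1010") and parse_number("10") both describe the value ten, while parse_number("$10") is sixteen.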
Operators
All expressions in AZ65 operate on 32-bit signed integers with wrapping overflow/underflow semantics. All operators and their precedence match those of the C language, with a few notable modifications:
- There is no C binary comma (,) operator. It is mostly an anachronism that many C programmers aren't even aware exists.
- The unary < and > operators, common in 6502 assembly, are present. They are used to get the low and high byte of a 16-bit word. For example:
  < $1234 evaluates to $34.
  > $1234 evaluates to $12.
- There is a unary + operator. This is mainly used to disambiguate between expressions and memory locations in some assembly languages. For example, in z80 assembly the instruction ld a, ($42) is ambiguous. A programmer may intend for this to load the value $42 into a, but AZ65 will interpret this as loading a byte at address $0042 into a. To add clarity, you can use a unary + to indicate that you are passing a numeric expression rather than an address: ld a, +($42)
- Unsigned (logical) shift operators are provided. Use the <<< and >>> symbols to shift left and right respectively:
  $ffffffff >>> 1 evaluates to $7fffffff
  $ffffffff <<< 1 evaluates to $fffffffe
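The arithmetic behind these operators can be modeled in Python, whose integers are unbounded and therefore need explicit masking. This is a sketch under assumptions, not AZ65's implementation: the helper names are hypothetical, the high-byte helper assumes bits above the 16-bit word are simply discarded, and shift counts of 32 or more are not modeled.

```python
MASK32 = 0xFFFFFFFF


def wrap32(value: int) -> int:
    """Wrap an integer to a signed 32-bit two's complement value."""
    value &= MASK32
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def low_byte(value: int) -> int:
    """Model of unary <: the low byte of a 16-bit word."""
    return value & 0xFF


def high_byte(value: int) -> int:
    """Model of unary >: the high byte of a 16-bit word."""
    return (value >> 8) & 0xFF


def shift_right_logical(value: int, count: int) -> int:
    """Model of >>>: zeros are shifted in from the left."""
    return wrap32((value & MASK32) >> count)


def shift_left_logical(value: int, count: int) -> int:
    """Model of <<<: bits shifted past bit 31 are discarded."""
    return wrap32((value & MASK32) << count)
```

Because values are signed, $ffffffff is stored as -1. A logical right shift by one yields $7fffffff, whereas an arithmetic shift in C on typical platforms would leave it at -1.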
Strings
All strings in AZ65 are UTF-8 encoded. They are written enclosed in double quotes ("):
"Hello World"
"not a number: 1234"
"Howdy, cowboy 🤠"
Use C-style escape sequences to write special characters inside a string:
"line break: \n"
"tab: \t"
"double-quote: \""
Multi-line strings can be written by placing a backslash immediately before the line break:
"multi\
line\
string"
To encode a byte directly, place a backslash before a hexadecimal number:
"capital Q: \$51"
Multicharacter Literals
AZ65 also supports the multicharacter literal that is present in C. You can specify big-endian 32-bit values as a sequence of 1 to 4 ASCII characters enclosed in single quotes ('):
'a'
'yo'
'test'
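A small Python sketch of how such a literal could map to a number. It assumes that literals shorter than four characters occupy the low-order bytes, as common C compilers do; the text above does not spell this out, and the function name multichar_literal is hypothetical.

```python
def multichar_literal(chars: str) -> int:
    """Pack 1 to 4 ASCII characters into a big-endian integer."""
    if not 1 <= len(chars) <= 4:
        raise ValueError("a multicharacter literal holds 1 to 4 characters")
    # encode() raises UnicodeEncodeError for non-ASCII input.
    return int.from_bytes(chars.encode("ascii"), "big")
```

Under that assumption, 'test' corresponds to $74657374 and 'yo' to $796f.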
Labels
There are 2 types of labels in AZ65:
Global Labels
Global labels are labels as you'd normally expect them in an assembler. They are alphanumeric tokens that are used to name addresses and constants.
GlobalLabel:
jr GlobalLabel
Note that the use of colons (:) is optional.
Local Labels
Local labels are labels defined within the "scope" of a global label; that is, labels that are defined after a global label in your code. They look like global labels but begin with a dot (.):
GlobalLabel:
nop
.LocalLabel:
jr .LocalLabel
Local labels are really just syntactic sugar for writing longer fully-qualified labels. The example above is equivalent to this:
GlobalLabel:
nop
GlobalLabel.LocalLabel:
jr GlobalLabel.LocalLabel
This means that two global labels can have local labels with the same name and they will not conflict with each other.
It also means you can always refer to a local label by its full name. When written this way, they are referred to as "direct" labels.
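The expansion of local labels into fully-qualified names can be sketched in Python. The function name qualify_labels is hypothetical, and the sketch only handles global labels and dot-prefixed local labels; how a direct label definition affects the current scope is not modeled.

```python
from typing import Iterable, List


def qualify_labels(labels: Iterable[str]) -> List[str]:
    """Expand local labels (leading dot) to fully-qualified names."""
    current_global = None
    qualified = []
    for label in labels:
        if label.startswith("."):
            if current_global is None:
                raise ValueError(f"local label {label!r} has no enclosing global label")
            # A local label is prefixed with the most recent global label.
            qualified.append(current_global + label)
        else:
            # A global label opens a new scope for the local labels after it.
            current_global = label
            qualified.append(label)
    return qualified
```

Given the labels Init, .loop, Update, .loop, the two local labels become Init.loop and Update.loop, which is why they do not conflict.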
Simple Directives
In assembly languages, "directives" are special commands that are used to control the behavior of the assembler. AZ65 directives are special tokens that begin with an at-sign (@).
There are two kinds of directives in AZ65: "simple" and "macro-like" directives. We'll cover the simple directives first since they are just like directives found in most other assemblers.
@echo
We'll start with the @echo directive since it is very useful for debugging and demonstrating future directives.
The @echo directive takes a single string or expression argument and prints it to stderr.
Examples
@echo "Hello World"
Note that expressions are always printed in base 10.
@echo $1234 + $5678 ; Prints "26796"
Methods of constructing strings out of expressions and printing in other bases will be covered later.
@here
As AZ65 assembles your code, a virtual program counter keeps track of the 16-bit address of every instruction and label you write.
You can access the current address in expressions using the @here directive.
Examples
@echo @here ; Prints "0"
nop
nop
@echo @here ; Prints "2"
nop
nop
jmp @here - 2 ; Jumps 2 bytes back
@org
@org is the complement to @here. It sets the virtual program counter to a new 16-bit value.
Examples
@org $8000
@echo @here ; Prints "32768"
@defn
Use @defn to define constants. This works similarly to EQU found in most other assemblers.
@defn takes two arguments: a label, followed by an expression.
Examples
@defn SCALE_FACTOR, 2
@echo 11 * SCALE_FACTOR ; Prints "22"
Global1:
@defn .LOCAL_CONSTANT, 2
Global2:
@defn .LOCAL_CONSTANT, 4
@echo Global1.LOCAL_CONSTANT ; Prints "2"
@echo Global2.LOCAL_CONSTANT ; Prints "4"
@defn Global.LOCAL_CONSTANT, 9
Global:
@echo .LOCAL_CONSTANT ; Prints "9"
@defl
Use @defl to define labels that reference addresses.
Prefer using @defl instead of @defn for defining things such as addresses in RAM. The assembler will treat @defl values as addresses, for example when generating debugging information.
@defl takes two arguments: a label, followed by an expression.
Examples
@defl VRAM, $2000
@defl VRAM.palette, VRAM + $0100
@org $8000
ldx #$10
lda #$ff
sta VRAM.palette, x
@redefn
@defn will not let you change the value of a constant once it is defined. Constants are immutable.
There may be circumstances, though, where you want to change one anyway. You can use @redefn to achieve this.
Examples
@defn CONST, 42
@defn CONST, 32 ; Will result in an ERROR!
@echo "Success!"
@defn CONST, 42
@redefn CONST, 32
@echo "Success!"
@redefl
@defl will not let you change the value of a label once it is defined. Labels are immutable.
There may be circumstances, though, where you want to change one anyway. You can use @redefl to achieve this.
Examples
@defl ADDR, $4000
@defl ADDR, $2000 ; Will result in an ERROR!
@echo "Success!"
@defl ADDR, $4000
@redefl ADDR, $2000
@echo "Success!"
@undef
Use @undef to "undefine" a label or constant.
Examples
@defn TEST, 42
@echo TEST ; Prints "42"
@undef TEST
@echo TEST ; Will error due to expression not being solvable
@assert
The @assert directive is also invaluable for debugging and validating invariants in your code.
Examples
It is invoked with an expression argument and optional string message:
@assert 2 + 2 == 4
@assert 1, "cannot fail!"
Assertions support lazy evaluation. That means you can write assertions that reference labels that are defined later in your code:
@assert SubRoutine.length == 4
SubRoutine:
nop
nop
nop
rts
@defn .length, @here - SubRoutine
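One way to picture lazy evaluation is an assembler that records each assertion and only evaluates it once the symbols exist. The Python sketch below defers everything to the end of assembly, which is a simplifying assumption; AZ65 may resolve assertions earlier. The class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Symbols = Dict[str, int]


class Assembler:
    """Toy model: a symbol table plus a queue of unevaluated assertions."""

    def __init__(self) -> None:
        self.symbols: Symbols = {}
        self.pending: List[Tuple[str, Callable[[Symbols], int]]] = []

    def define(self, name: str, value: int) -> None:
        self.symbols[name] = value

    def assert_later(self, description: str,
                     expression: Callable[[Symbols], int]) -> None:
        # Store the expression unevaluated; its symbols may not exist yet.
        self.pending.append((description, expression))

    def finish(self) -> List[str]:
        """Evaluate all queued assertions and return the failed ones."""
        failures = []
        for description, expression in self.pending:
            try:
                passed = expression(self.symbols) != 0
            except KeyError:
                passed = False  # a referenced symbol was never defined
            if not passed:
                failures.append(description)
        return failures
```

In the example above, the assertion on SubRoutine.length is queued first and only succeeds because .length is defined before assembly finishes.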
@die
@die is a convenience directive used to terminate the assembler with an error message.
Examples
@die "Assembler will not continue..."
The error message will be printed on stderr.
@db
@db is used to define bytes. It, along with its siblings @dw and @ds, can have special meaning depending on where and when it is used.
Examples
Most commonly, you'll use @db to define strings in the CODE segment of your binary:
@db "hello"
However you can also define sequences of bytes:
@db $42, $43, $44, $45
Or mix and match:
@db "Hello World", $a, "This is a test", $42
Within the ADDR segment, @db simply increments the program counter by 1.
@segment "ADDR"
@org 0
location:
@db
location2:
@db
@assert location2 == 1
@dw
@dw is used to define 16-bit "words". It, along with its siblings @db and @ds, can have special meaning depending on where and when it is used.
Examples
Most commonly, you'll use @dw to define words in the CODE segment of your binary:
@dw $1234
Like with @db, you can define a sequence of words:
@dw $1234, $5678
Within the ADDR segment, @dw simply increments the program counter by 2.
@segment "ADDR"
@org 0
location:
@dw
location2:
@dw
@assert location2 == 2
@ds
@ds is used to define space, or really a contiguous sequence of bytes. It, along with its siblings @db and @dw, can have special meaning depending on where and when it is used.
Examples
Most commonly, you'll use @ds to reserve space in the CODE segment of your binary:
@ds 8 ; Will add 8 bytes of $00 to the output.
However you can also define your space to have an initial value:
@ds 16, $ff ; Will add 16 bytes of $ff to the output.
Examples
Within the ADDR segment, @ds simply increments the program counter by a fixed amount.
@segment "ADDR"
@org 0
location:
@ds 7
location2:
@ds 1
@assert location2 == 7
This works similarly in structs where it can be used to insert padding:
@struct MyStruct
@ds 2
field 1
@endstruct
@assert MyStruct.field == 2
@segment
When AZ65 is parsing your assembly it is running in one of two segment modes:
CODE
By default, AZ65 is running in the CODE segment mode. This means that AZ65 is generating normal code.
For example when assembling z80 code and the assembler encounters:
@org 0
LoadAWith42:
ld a, $42
It will emit the bytes 3E 42 and add an entry to the assembler's symbol table for LoadAWith42, set to 0 (the value of the program counter).
ADDR
However, if you switch to the ADDR segment mode, the assembler is much more restrictive in what it accepts.
In the following example we add the directive @segment with a value of "ADDR" to the top of our code:
@segment "ADDR"
@org 0
LoadAWith42:
ld a, $42
This will result in an ERROR indicating that instructions are not allowed in the ADDR segment.
In this mode, you are free to define labels and constants, but cannot generate any code. All directives work as normal besides the @db, @dw, and @ds directives. In ADDR mode they only increment the program counter.
The ADDR segment is meant specifically for defining memory addresses in RAM. This is useful in systems where your code is stored in ROM, usually game consoles like the Nintendo Entertainment System or Game Boy.
For example, the Game Boy memory map defines RAM as starting at $C000.
Rather than using @defl to manually define memory locations for variables, you can let the assembler assign addresses:
@segment "ADDR"
@org $C000
WRAM0:
.PlayerX: @db
.PlayerY: @db
.PlayerHealth: @db
With this, you can easily define, move, and reference variables in RAM bank 0.
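The address assignment that the ADDR segment performs can be modeled in a few lines of Python: each declaration takes the current program counter and then advances it by its size. The names assign_addresses, DB, and DW are hypothetical, and this is a sketch of the behavior described above rather than AZ65's code.

```python
from typing import Dict, List, Tuple

DB, DW = 1, 2  # sizes that @db and @dw add to the program counter


def assign_addresses(origin: int,
                     declarations: List[Tuple[str, int]]) -> Dict[str, int]:
    """Assign consecutive addresses to (label, size_in_bytes) pairs."""
    addresses = {}
    program_counter = origin  # the value set by @org
    for label, size in declarations:
        addresses[label] = program_counter
        program_counter += size
    return addresses
```

Applied to the example above with an origin of $C000, .PlayerX, .PlayerY, and .PlayerHealth land on $C000, $C001, and $C002. Inserting a new variable before .PlayerY shifts the later addresses without any manual renumbering.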
@if
The @if directive is used to essentially turn blocks of code ON or OFF.
@if 0
@die "Ignored!"
@endif
An @if directive takes a single expression argument. If that expression evaluates to 0, then all tokens are ignored by the assembler until it finds a matching @endif directive. Any value other than 0 is interpreted as being true and will result in the tokens being assembled as normal.
Include Guards
Since AZ65 assembles and links your code in a single pass, you usually don't need what are referred to as include guards. But for the sake of demonstration, these are @if directives used in assembly languages to prevent labels from being defined multiple times.
An example of an include-guarded assembly file would look like this:
; filename: "foo.inc"
@if ! @isdef FOO_INC
@defn FOO_INC, 1
@defl MY_CONSTANT, $42
@endif ; FOO_INC
In this example a constant value FOO_INC is used as a flag indicating whether the block of code within the @if and @endif directives has already been read by the assembler. The @isdef directive is used to check whether the FOO_INC flag has been defined; if not, it will be set to 1 on the next line.
This allows you to run @include "foo.inc" multiple times in your assembly with no worry about errors.
@struct
The @struct directive is used to simplify working with complex data types. "Structs" ultimately allow you to define offsets and lengths of data from a base address.
We'll start with a simple example:
@struct MyStruct
field1 1
field2 2
@endstruct
A @struct directive begins with a global label. In this example we create a struct named MyStruct. Then we provide a list of label-expression pairs ending with an @endstruct directive.
This has the same effect as writing:
@segment "ADDR"
@org 0
MyStruct:
@meta "@SIZEOF" "1"
.field1: @ds 1
@endmeta
@meta "@SIZEOF" "2"
.field2: @ds 2
@endmeta
@redefl MyStruct, 1 + 2
We can ignore the @meta and @endmeta directives for now and focus on the symbols added to the symbol table.
- MyStruct holds the sum of all expressions for each label defined in the struct. This is the "size" (or length) of the struct in bytes.
- MyStruct.field1 holds the offset of the label in bytes from the start of the struct. The offset of a label is the sum of all expressions that came before it. Since it is the first label in the struct, it has an offset of 0.
- MyStruct.field2, as with the above, holds the offset of the label from the start of the struct. Since .field1 has a size of 1, .field2 is set to 1.
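The symbols a struct produces can be computed with a running sum, as in this Python sketch. The function name layout_struct is hypothetical; the sketch covers only plain label-expression pairs with already-evaluated sizes and ignores @align and the @SIZEOF metadata.

```python
from typing import Dict, List, Tuple


def layout_struct(name: str, fields: List[Tuple[str, int]]) -> Dict[str, int]:
    """Return the symbol-table entries for a struct with (field, size) pairs."""
    symbols = {}
    offset = 0
    for field, size in fields:
        # Each field's offset is the sum of the sizes that came before it.
        symbols[f"{name}.{field}"] = offset
        offset += size
    # The struct's own label holds the total size in bytes.
    symbols[name] = offset
    return symbols
```

For MyStruct with field1 of size 1 and field2 of size 2, this gives offsets 0 and 1 and a total size of 3, matching the @redefl MyStruct, 1 + 2 line in the expansion.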
Structs provide a few other bits of syntactic sugar. Colons (:) following labels within the struct body are optional. Expressions can contain references to previously-defined labels in the struct. And you can use @db and @dw as synonyms for 1 and 2, respectively:
@struct MyStruct
field1: @dw
field2: .field1 + 1
field3: @db
@endstruct
Finally, @align can be used in structs to add padding between labels just like you would do in normal code:
@struct MyStruct
field1: @db
@align 8
field2: @db
@endstruct
@sizeof
@sizeof is a helper for accessing the expression provided for labels defined in a @struct. Normally, when you have a struct like this:
@struct MyStruct
field: 2
@endstruct
You have no way to access the result of the expression after field. This size "metadata" is stored in the symbol table, but you cannot access it directly. The label MyStruct.field, after all, is set to 0: the offset of the label from the start of the struct.
To access it easily, use @sizeof:
@defn FIELD_SIZE, @sizeof MyStruct.field
@sizeof MyStruct.field in this example will return 2.
This has nearly the same effect as the much more clunky:
@defn FIELD_SIZE, @parse @string { @getmeta MyStruct.field, "@SIZEOF" "\\" }
The difference is that @sizeof acts as a unary operator in expressions and will be lazily evaluated if necessary.
@incbin
@align
@meta
Macros
Macro-Like Directives
As explained before, macros in AZ65 act as generators of tokens. The following directives are considered "macro-like" since they ultimately perform token generation.
They can be thought of as macros that are built-in to the assembler, but unlike normal macros, they will be evaluated while defining new macros.
@getmeta
@include
@isdef
@string
@label
@count
@each
The @each directive will repeat a block of tokens for each input token.
Examples
; Will print the numbers on separate lines
@each Token, { 1 2 3 4 }
@echo Token
@endeach
@each can be combined with @count to create a REPEAT macro:
@macro REPEAT, 2, N, Body
@each _, { @count N }
Body
@endeach
@endmacro
; Prints "Hello" 3 times
REPEAT 3, {
@echo "Hello"
}