Introduction

AZ65 is a powerful but simple assembler for the Zilog Z80, MOS 6502, and Sharp LR35902 (sm83 / gbz80) architectures. In this book you'll learn the ins and outs of the assembler and its advanced meta-programming capabilities.

Note that this book only covers using AZ65 and does not cover assembly in general.

For assembly language references see:

  • z80.info for lots of Z80 resources.
  • 6502.org for plenty of 6502 tutorials.
  • gbdev.io for resources on sm83/gbz80 programming.

Command Line Interface

All the functionality of AZ65 is in the az65 binary.

Assembling Files

Pass the target CPU architecture and the name of an assembly file to assemble it:

az65 6502 code.asm

Architectures

  • 6502 for MOS 6502
  • z80 for Zilog Z80
  • sm83 for the Sharp LR35902 (a.k.a. gbz80)

By default, az65 will write the assembled program to stdout. You can direct this to a file using the > operator:

az65 6502 code.asm > code.bin

You can alternatively use the -o option to pick an output file:

az65 z80 code.asm -o code.bin

Search Paths

AZ65 supports specifying search paths for locating files that are referenced in your code. Pass search paths by repeatedly using the -I option.

az65 sm83 code.asm -I macros -I data > code.bin

When a file is referenced by name, AZ65 first looks for it in the same directory as the file currently being assembled. If it is not found there, each search path is checked in the order given.
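The lookup order described above can be modelled with a short Python sketch. This is only an illustration of the documented order, not AZ65's actual implementation; the function name resolve_reference and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_reference(
    name: str, current_file: Path, search_paths: Sequence[Path]
) -> Optional[Path]:
    """Return the first existing candidate for `name`, or None if none exists."""
    # The directory of the file currently being assembled is tried first.
    candidates = [current_file.parent / name]
    # Then every -I search path, in the order given on the command line.
    candidates.extend(Path(path) / name for path in search_paths)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

With -I macros -I data, a file that exists in both macros and data would resolve to the copy in macros, unless a copy also sits next to the file being assembled.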

Expressions

Number Formats

Binary

Binary numbers in AZ65 use standard 6502 binary number syntax. They are prefixed with a percent sign (%). For example:

  • %00001111
  • %1010
  • %0

Hexadecimal

Hexadecimal numbers also use the standard 6502 syntax. They are prefixed with a dollar-sign ($) and are case-insensitive. For example:

  • $1234
  • $DADcafe
  • $0

Decimal

Numbers without a % or $ prefix are assumed to be decimal (base 10) numbers.
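As a rough model of the three number formats, the Python sketch below converts a literal to an integer based on its prefix. It is an illustration only; parse_number is a hypothetical helper, and the strict digit validation is an assumption rather than documented AZ65 behavior.

```python
import string


def parse_number(literal: str) -> int:
    """Convert a %binary, $hexadecimal, or plain decimal literal to an int."""
    if literal.startswith("%"):
        digits, base, allowed = literal[1:], 2, "01"
    elif literal.startswith("$"):
        # string.hexdigits contains both cases, so "$DADcafe" is accepted.
        digits, base, allowed = literal[1:], 16, string.hexdigits
    else:
        digits, base, allowed = literal, 10, string.digits
    # Assumption: an empty or malformed digit run is rejected outright.
    if not digits or any(ch not in allowed for ch in digits):
        raise ValueError(f"invalid number literal: {literal!r}")
    return int(digits, base)
```

For instance, parse_number("%1010") and parse_number("10") both yield ten, while parse_number("$10") yields sixteen.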

Operators

All expressions in AZ65 operate on 32-bit signed integers with wrapping overflow and underflow semantics. All operators and their precedence match those of the C language, with a few notable modifications:

  1. There is no C binary comma (,) operator. It is mostly an anachronism that many C programmers aren't even aware exists.
  2. The unary < and > operators, common in 6502 assembly, are present. They are used to get the low and high byte of a 16-bit word. For example:
    • < $1234 evaluates to $34.
    • > $1234 evaluates to $12.
  3. There is a unary + operator. This is mainly used to disambiguate between expressions and memory locations in some assembly languages. For example, in z80 assembly the instruction ld a, ($42) is ambiguous. A programmer may intend for this to load the value $42 into a, but AZ65 will interpret this as loading a byte at address $0042 into a. To add clarity, you can use a unary + to indicate that you are passing a numeric expression rather than an address:
    • ld a, +($42)
  4. Unsigned (logical) shift operators are provided. Use the <<< and >>> symbols to shift left and right respectively:
    • $ffffffff >>> 1 evaluates to $7fffffff
    • $ffffffff <<< 1 evaluates to $fffffffe
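The arithmetic behind these operators can be modelled in Python, whose integers are unbounded and therefore need explicit masking. This is a sketch of the semantics described above, not AZ65's implementation; all function names are hypothetical, and shift counts are assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def wrap32(value: int) -> int:
    """Wrap an arbitrary integer into the signed 32-bit range."""
    value &= MASK32
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


def low_byte(word: int) -> int:
    """Model of the unary < operator: the low byte of a 16-bit word."""
    return word & 0xFF


def high_byte(word: int) -> int:
    """Model of the unary > operator: the high byte of a 16-bit word."""
    return (word >> 8) & 0xFF


def shift_right_logical(value: int, count: int) -> int:
    """Model of >>>: treat the operand as unsigned, so zeros shift in."""
    return wrap32((value & MASK32) >> count)


def shift_left_logical(value: int, count: int) -> int:
    """Model of <<<: bits shifted past bit 31 are discarded."""
    return wrap32((value & MASK32) << count)
```

Note that $ffffffff is -1 when read as a signed 32-bit value, so the logical right shift by one produces $7fffffff rather than keeping the sign bit set.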

Strings

All strings in AZ65 are UTF-8 encoded. They are written enclosed in double quotes ("):

  • "Hello World"
  • "not a number: 1234"
  • "Howdy, cowboy 🤠"

Use C-style escape sequences to write special characters inside a string:

  • "line break: \n"
  • "tab: \t"
  • "double-quote: \""

Multi-line strings can be written by placing a backslash immediately before the line break:

"multi\
line\
string"

To encode a byte directly, place a backslash before a hexadecimal number:

"capital Q: \$51"

Multicharacter Literals

AZ65 also supports the multicharacter literal that is present in C.

You can specify big-endian 32-bit values as a sequence of 1 to 4 ASCII characters enclosed in single quotes ('):

'a'
'yo'
'test'
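To make the big-endian packing concrete, here is a small Python sketch. It assumes that literals shorter than four characters occupy the low-order bytes of the value (so 'a' is $61), which matches common C compiler behavior but is not confirmed above for AZ65; multichar_value is a hypothetical name.

```python
def multichar_value(chars: str) -> int:
    """Pack 1 to 4 ASCII characters into an integer, first character highest."""
    data = chars.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    if not 1 <= len(data) <= 4:
        raise ValueError("a multicharacter literal holds 1 to 4 characters")
    # Assumption: short literals are not padded on the right.
    return int.from_bytes(data, "big")


print(hex(multichar_value("test")))  # 0x74657374
```

Under the same assumption, 'yo' packs to $796f.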

Labels

There are 2 types of labels in AZ65:

Global Labels

Global labels are labels as you'd normally expect them in an assembler. They are alphanumeric tokens that are used to name addresses and constants.

GlobalLabel:
    jr GlobalLabel

Note that the use of colons (:) is optional.

Local Labels

Local labels are labels defined within the "scope" of a global label, that is, labels that are defined after a global label in your code. They look like global labels but begin with a dot (.):

GlobalLabel:
    nop
.LocalLabel:
    jr .LocalLabel

Local labels are really just syntactic sugar for writing longer fully-qualified labels. The example above is equivalent to this:

GlobalLabel:
    nop
GlobalLabel.LocalLabel:
    jr GlobalLabel.LocalLabel

This means that two global labels can have local labels with the same name and they will not conflict with each other.

It also means you can always refer to a local label by its full name. When written this way, they are referred to as "direct" labels.
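The expansion of local labels into fully-qualified names can be sketched in Python as follows. This is an illustrative model rather than AZ65's code; qualify_labels is a hypothetical helper, and leaving direct labels untouched without changing the current scope is an assumption.

```python
from typing import List, Optional


def qualify_labels(labels: List[str]) -> List[str]:
    """Expand each local label using the most recently seen global label."""
    current_global: Optional[str] = None
    qualified = []
    for label in labels:
        if label.startswith("."):
            if current_global is None:
                raise ValueError(f"local label {label} has no global label")
            qualified.append(current_global + label)
        elif "." in label:
            # Direct label: already fully qualified (assumed to keep the scope).
            qualified.append(label)
        else:
            current_global = label
            qualified.append(label)
    return qualified


print(qualify_labels(["Init", ".loop", "Update", ".loop"]))
# ['Init', 'Init.loop', 'Update', 'Update.loop']
```

The two .loop labels end up with different full names, which is why they do not conflict.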

Simple Directives

In assembly languages, "directives" are special commands that are used to control the behavior of the assembler. AZ65 directives are special tokens that begin with an at-sign (@).

There are two kinds of directives in AZ65: "simple" and "macro-like" directives. We'll cover the simple directives first since they are just like directives found in most other assemblers.

@echo

We'll start with the @echo directive since it is very useful for debugging and demonstrating future directives.

The @echo directive takes a single string or expression argument and prints it to stderr.

Examples

@echo "Hello World"

Note that expressions are always printed in base 10.

@echo $1234 + $5678 ; Prints "26796" 

Methods of constructing strings out of expressions and printing in other bases will be covered later.

@here

As AZ65 assembles your code, a virtual program counter keeps track of the 16-bit address of every instruction and label you write.

You access the current address in expressions using the @here directive.

Examples

@echo @here ; Prints "0"
nop
nop
@echo @here ; Prints "2"
nop
nop
jmp @here - 2 ; Jumps 2 bytes back

@org

@org is the complement to @here. It sets the virtual program counter to a new 16-bit value.

Examples

@org $8000
@echo @here ; Prints "32768"

@defn

Use @defn to define constants. This works similarly to EQU found in most other assemblers.

@defn takes two arguments: a label, followed by an expression.

Examples

@defn SCALE_FACTOR, 2

@echo 11 * SCALE_FACTOR ; Prints "22"

Global1:
    @defn .LOCAL_CONSTANT, 2
    
Global2: 
    @defn .LOCAL_CONSTANT, 4

@echo Global1.LOCAL_CONSTANT ; Prints "2"

@echo Global2.LOCAL_CONSTANT ; Prints "4"

@defn Global.LOCAL_CONSTANT, 9
    
Global: 
    @echo .LOCAL_CONSTANT ; Prints "9"

@defl

Use @defl to define labels that reference addresses.

Prefer @defl over @defn for defining things such as addresses in RAM. The assembler treats @defl values as addresses, for example when generating debugging information.

@defl takes two arguments: a label, followed by an expression.

Examples

@defl VRAM, $2000
@defl VRAM.palette, VRAM + $0100

@org $8000

ldx #$10
lda #$ff
sta VRAM.palette, x

@redefn

@defn will not let you change the value of a constant once it is defined. Constants are immutable.

There may be circumstances where you want to change one anyway. Use @redefn to do so.

Examples

@defn CONST, 42
@defn CONST, 32 ; Will result in an ERROR!

@echo "Success!"

@defn CONST, 42
@redefn CONST, 32

@echo "Success!"

@redefl

@defl will not let you change the value of a label once it is defined. Labels are immutable.

There may be circumstances where you want to change one anyway. Use @redefl to do so.

Examples

@defl ADDR, $4000
@defl ADDR, $2000 ; Will result in an ERROR!

@echo "Success!"

@defl ADDR, $4000
@redefl ADDR, $2000

@echo "Success!"

@undef

Use @undef to "undefine" a label or constant.

Examples

@defn TEST, 42

@echo TEST ; Prints "42"

@undef TEST

@echo TEST ; Will error due to expression not being solvable

@assert

The @assert directive is also invaluable for debugging and validating invariants in your code.

Examples

It is invoked with an expression argument and optional string message:

@assert 2 + 2 == 4
@assert 1, "cannot fail!"

Assertions support lazy evaluation. That means you can write assertions that reference labels that are defined later in your code:

@assert SubRoutine.length == 4

SubRoutine:
    nop
    nop
    nop
    rts
    @defn .length, @here - SubRoutine
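One way to picture lazy evaluation is to queue each assertion and check it only once the symbol table is complete. The Python sketch below illustrates that idea under stated assumptions; the DeferredAsserts class and its methods are hypothetical and say nothing about how AZ65 is actually implemented.

```python
from typing import Callable, Dict, List, Tuple

Symbols = Dict[str, int]


class DeferredAsserts:
    """Collects assertions and evaluates them after all symbols are known."""

    def __init__(self) -> None:
        self._pending: List[Tuple[Callable[[Symbols], bool], str]] = []

    def add(self, check: Callable[[Symbols], bool], message: str) -> None:
        # Nothing is evaluated here, so the symbols need not exist yet.
        self._pending.append((check, message))

    def run(self, symbols: Symbols) -> List[str]:
        """Return the messages of all assertions that fail."""
        failures = []
        for check, message in self._pending:
            try:
                passed = check(symbols)
            except KeyError as missing:
                failures.append(f"{message}: undefined symbol {missing.args[0]}")
                continue
            if not passed:
                failures.append(message)
        return failures


asserts = DeferredAsserts()
asserts.add(lambda s: s["SubRoutine.length"] == 4, "SubRoutine.length == 4")

# Symbols are defined later, mirroring the order in the example above.
symbols = {"SubRoutine": 0}
symbols["SubRoutine.length"] = 4 - symbols["SubRoutine"]
print(asserts.run(symbols))  # []
```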

@die

@die is a convenience directive used to terminate the assembler with an error message.

Examples

@die "Assembler will not continue..."

The error message will be printed on stderr.

@db

@db is used to define bytes. It, along with its siblings @dw and @ds, can have special meaning depending on where and when it is used.

Examples

Most commonly, you'll use @db to define strings in the CODE segment of your binary:

@db "hello"

However you can also define sequences of bytes:

@db $42, $43, $44, $45

Or mix and match:

@db "Hello World", $a, "This is a test", $42

Within the ADDR segment, @db simply increments the program counter by 1.

    @segment "ADDR"
    @org 0

location:
    @db
location2:
    @db

    @assert location2 == 1

@dw

@dw is used to define 16-bit "words". It, along with its siblings @db and @ds, can have special meaning depending on where and when it is used.

Examples

Most commonly, you'll use @dw to define words in the CODE segment of your binary:

@dw $1234

Like with @db, you can define a sequence of words:

@dw $1234, $5678

Within the ADDR segment, @dw simply increments the program counter by 2.

    @segment "ADDR"
    @org 0

location:
    @dw
location2:
    @dw

    @assert location2 == 2

@ds

@ds is used to define space, or really a contiguous sequence of bytes. It, along with its siblings @db and @dw, can have special meaning depending on where and when it is used.

Examples

Most commonly, you'll use @ds to reserve a block of bytes in the CODE segment of your binary:

@ds 8 ; Will add 8 bytes of $00 to the output.

However you can also define your space to have an initial value:

@ds 16, $ff ; Will add 16 bytes of $ff to the output.

Examples

Within the ADDR segment, @ds simply increments the program counter by a fixed amount.

    @segment "ADDR"
    @org 0

location:
    @ds 7
location2:
    @ds 1

    @assert location2 == 7

This works similarly in structs where it can be used to insert padding:

@struct MyStruct
    @ds 2
    field 1
@endstruct

@assert MyStruct.field == 2

@segment

When AZ65 is parsing your assembly it is running in one of two segment modes:

CODE

By default, AZ65 is running in the CODE segment mode. This means that AZ65 is generating normal code.

For example when assembling z80 code and the assembler encounters:

@org 0

LoadAWith42:
    ld a, $42

It will emit the bytes 3E 42 and add an entry to the assembler's symbol table that maps LoadAWith42 to 0 (the value of the program counter).

ADDR

However, if you switch to the ADDR segment mode the assembler is much more restrictive in what it accepts.

In the following example we add the directive @segment with a value of "ADDR" to the top of our code:

@segment "ADDR"

@org 0

LoadAWith42:
    ld a, $42

This will result in an ERROR indicating that instructions are not allowed in the ADDR segment.

In this mode, you are free to define labels and constants, but cannot generate any code. All directives work as normal besides the @db, @dw, and @ds directives. In ADDR mode they only increment the program counter.

The ADDR segment is meant specifically for defining memory addresses in RAM. This is useful in systems where your code is stored in ROM, usually game consoles like the Nintendo Entertainment System or Game Boy.

For example, the Game Boy memory map defines RAM as starting at $C000. Rather than using @defl to define the memory location of each variable by hand, you can let the assembler assign addresses:

@segment "ADDR"

@org $C000

WRAM0:
    .PlayerX: @db
    .PlayerY: @db
    .PlayerHealth: @db

With this, you can easily define, move, and reference variables in RAM bank 0.
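The address assignment in the ADDR segment amounts to a running program counter. The Python sketch below models it for the WRAM0 example; it is an illustration only, and the function name and the (label, size) input format are hypothetical.

```python
from typing import Dict, List, Tuple


def layout_addr_segment(origin: int, entries: List[Tuple[str, int]]) -> Dict[str, int]:
    """Assign each label the current program counter, then advance by its size."""
    program_counter = origin
    addresses = {}
    for label, size in entries:
        addresses[label] = program_counter
        # In ADDR mode @db, @dw and @ds only advance the counter; no bytes are emitted.
        program_counter += size
    return addresses


wram = layout_addr_segment(
    0xC000,
    [
        ("WRAM0", 0),               # a bare label reserves nothing
        ("WRAM0.PlayerX", 1),       # @db
        ("WRAM0.PlayerY", 1),       # @db
        ("WRAM0.PlayerHealth", 1),  # @db
    ],
)
print({name: hex(address) for name, address in wram.items()})
```

Reordering or resizing an entry shifts every later address automatically, which is the convenience the ADDR segment provides over hand-written @defl lines.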

@if

The @if directive is used to essentially turn blocks of code ON or OFF.

@if 0
    @die "Ignored!"	
@endif

An @if directive takes a single expression argument. If that expression evaluates to 0 then all tokens are ignored by the assembler until it finds a matching @endif directive. Any value other than 0 is interpreted as being true and will result in the tokens being assembled as normal.

Include Guards

Since AZ65 assembles and links your code in a single pass, you usually don't need what are referred to as include guards. For the sake of demonstration, though, include guards are @if directives that prevent labels from being defined multiple times when a file is included more than once.

An example of an include-guarded assembly file would look like this:

; filename: "foo.inc"

@if ! @isdef FOO_INC
@defn FOO_INC, 1

@defl MY_CONSTANT, $42

@endif ; FOO_INC

In this example a constant value FOO_INC is used as a flag indicating whether the block of code between the @if and @endif directives has already been read by the assembler. The @isdef directive checks whether the FOO_INC flag has been defined. If it has not, the block is assembled and FOO_INC is set to 1 on the next line.

This allows you to run @include "foo.inc" multiple times in your assembly with no worry about errors.

@struct

The @struct directive is used to simplify working with complex data types. "Structs" ultimately allow you to define offsets and lengths of data from a base address.

We'll start with a simple example:

@struct MyStruct
    field1 1
    field2 2
@endstruct

A @struct directive begins with a global label. In this example we create a struct named MyStruct. Then we provide a list of label-expression pairs ending with an @endstruct directive.

This has the same effect as writing:

@segment "ADDR"
@org 0

MyStruct:

@meta "@SIZEOF" "1"
.field1: @ds 1
@endmeta

@meta "@SIZEOF" "2"
.field2: @ds 2
@endmeta

@redefl MyStruct, 1 + 2

We can ignore the @meta and @endmeta directives for now and focus on the symbols added to the symbol table.

  • MyStruct holds the sum of all expressions for each label defined in the struct. This is the "size" (or length) of the struct in bytes.
  • MyStruct.field1 holds the offset of the label in bytes from the start of the struct. The offset of a label is the sum of all expressions that came before it. Since it is the first label in the struct, it has an offset of 0.
  • MyStruct.field2, as with the above, holds the offset of the label from the start of the struct. Since .field1 has a size of 1, .field2 is set to 1.
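The way these symbols are derived can be sketched in a few lines of Python: each field receives the running offset, and the struct name receives the total. This is a model of the behavior described above, not AZ65's implementation; layout_struct is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def layout_struct(name: str, fields: List[Tuple[str, int]]) -> Dict[str, int]:
    """Compute field offsets and the total size for a struct definition."""
    symbols = {}
    offset = 0
    for field, size in fields:
        # A field's offset is the sum of the sizes of all fields before it.
        symbols[f"{name}.{field}"] = offset
        offset += size
    # The struct label itself ends up holding the total size in bytes.
    symbols[name] = offset
    return symbols


print(layout_struct("MyStruct", [("field1", 1), ("field2", 2)]))
# {'MyStruct.field1': 0, 'MyStruct.field2': 1, 'MyStruct': 3}
```

The first field always lands at offset 0, and the struct size is the sum of every field size.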

Structs provide a few other bits of syntactic sugar. Colons (:) following labels within the struct body are optional. Expressions can contain references to previously-defined labels in the struct. And you can use @db and @dw as synonyms for 1 and 2, respectively:

@struct MyStruct
    field1: @dw
    field2: .field1 + 1
    field3: @db
@endstruct

Finally, @align can be used in structs to add padding between labels just like you would do in normal code:

@struct MyStruct
    field1: @db

    @align 8
    field2: @db 
@endstruct

@sizeof

@sizeof is a helper for accessing the expression provided for labels defined in a @struct. Normally, when you have a struct like this:

@struct MyStruct
    field: 2
@endstruct

You have no way to access the result of the expression after field. This size "metadata" is stored in the symbol table, but you cannot access it directly. The label MyStruct.field, after all, is set to 0: the offset of the label from the start of the struct.

To access it easily, use @sizeof:

@defn FIELD_SIZE, @sizeof MyStruct.field

@sizeof MyStruct.field in this example will return 2.

This has nearly the same effect as the much more clunky:

@defn FIELD_SIZE, @parse @string { @getmeta MyStruct.field, "@SIZEOF" "\\" }

Unlike that expansion, however, @sizeof acts as a unary operator in expressions, so it will be lazily evaluated if necessary.

@incbin

@align

@meta

Macros

Macro-Like Directives

As explained before, macros in AZ65 act as generators of tokens. The following directives are considered "macro-like" since they ultimately perform token generation.

They can be thought of as macros that are built into the assembler, but unlike normal macros, they will be evaluated while defining new macros.

@getmeta

@include

@isdef

@string

@label

@count

@each

The @each directive will repeat a block of tokens for each input token.

Examples

; Will print the numbers on separate lines
@each Token, { 1 2 3 4 }
    @echo Token
@endeach

@each can be combined with @count to create a REPEAT macro:

@macro REPEAT, 2, N, Body
    @each _, { @count N }
        Body
    @endeach
@endmacro

; Prints "Hello" 3 times
REPEAT 3, {
    @echo "Hello"
}

@hex

@bin

@parse

@entropy

Advanced Macros

Bank Metadata

Bank Switching

Debug Symbols