Atari ST code setup / tricks

Introduction
Quick-start tooling
ST graphics quick-start : Line A
ST graphics quick-start : Direct access
ST graphics quick-start : VT52 emulation
Vertical Synchronization
Double buffering
Keyboard / clean exit
Line A / Direct access combo example
Useful tricks

Introduction

A small quick-start / reminder for low-level (Motorola 68000 assembly) programming on the Atari ST computers. See this for more resources (setting the palette, sound etc.); the Atari GEMDOS reference manual is useful for GEMDOS calls, and this write-up is also great, as are the various intros (512b, 256b, 128b, 64b, 32b).

The focus is on the Atari STF (8 MHz Motorola 68000, 1 MiB RAM, released in ~1985); the code may work on other models (such as the Atari Falcon) but this is untested.

The Atari ST is a special platform to me: I was fascinated by it in the early 2000s, which led me to buy an Atari 1040 STF (my first retro computer!). The reason for the fascination is a bit fuzzy, but it was probably inspired by the number of people I knew (on online boards, IRC etc.) who had tinkered with an ST in the past. It was also because of some impressive demos (Posh, Odd Stuff, Blood, Virtual Escape etc.) and games (Rick Dangerous, Dungeon Master, Populous, Xenon 2, Another World etc.), and I also appreciated the bare simplicity of the hardware.

I did little programming on the ST, mainly tinkering with Devpac and doing some GEM "hello world". It was probably too overwhelming for me at the time: I recall struggling with simple pixel plotting, and the code I was looking at was probably way too complex (or even tricky) for what I really wanted!

I tried to code on the ST again later but was probably put off by m68k assembly again; the ISA was a bit weird / overwhelming to me (operand order, longer mnemonics, size suffixes etc.). I tried again in 2024 and finally came to appreciate some parts of it, although my preference still goes to early RISC-type CPUs such as the ARM (Acorn Archimedes!).

Note that this article focuses on straightforward low res 320x200 graphics (pixel access and some Line A stuff) for code golfing on the ST; it may work with higher resolution modes but this is untested. I use the tos104uk TOS ROM image.

Also note that the Line A and direct access methods can be combined, which is sometimes preferable for speed or code size because Line A calls may trash useful registers (routine dependent, but usually d0, d1, d2 and a0, a1, a2!).

Here is the default 16-color TOS palette for reference:

Quick-start tooling

Emulator

On Ubuntu I use Hatari, which can be installed from the package manager. It requires a TOS ROM (see the ROM entry in the menu), which can be found on the internet. My Hatari config also has a GEMDOS drive path set up (Hard disks menu) where I put .prg or .tos programs that I can run from TOS, and I enabled faster boot in the System menu entry. Boosting the CPU clock to 32 MHz (CPU menu entry) is useful for prototyping because of the faster boot and overall speed.

Assembler

I use vasm as an assembler under Linux; another good assembler I knew from my Z80 years is WLA-DX, although I didn't test it for 68k.

To install vasm and assemble a file such as my_assembly_source.s under Ubuntu (may also work for other distributions):

curl http://sun.hasenbraten.de/vasm/release/vasm.tar.gz | tar -xz && cd vasm && make CPU=m68k SYNTAX=mot && sudo cp vasmm68k_mot /usr/local/bin && sudo cp vobjdump /usr/local/bin
vasmm68k_mot -Ftos my_assembly_source.s -o my_st_binary.tos

This produces a valid, standard (relocatable) .tos executable, but a shorter way is to craft the program header ourselves (we don't need data, BSS sections etc.) so that we have complete control over the executable content:

    dc.w $601a     ; ph_branch (branch to the code)
    dc.l end-start ; ph_tlen
    dc.l $0        ; ph_dlen
    dc.l $0        ; ph_blen
    dc.l $0        ; ph_slen
    dc.l $0        ; ph_res1
    dc.l $0        ; ph_prgflags
    dc.w $1        ; ph_absflag (absolute; simpler; not relocatable)

start
    loop:
        bra loop
end

This is a valid custom .tos program that loops indefinitely; it can be assembled with:

vasmm68k_mot -Fbin my_assembly_source.s -o my_st_raw_binary.tos

Documentation on the program header can be found here.
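As a cross-check of the layout, here is a small host-side Python sketch (not ST code) that writes the same 28-byte header in front of a raw code blob. The helper name `tos_header` is hypothetical; the sketch assumes the field order of the listing above and big-endian storage, which is the 68000 byte order. The first word $601A is itself a `bra.s` with a displacement of $1A (26 bytes): it jumps from just past that word (offset 2) over the remaining 26 header bytes to offset 28, where the code starts.

```python
import struct

def tos_header(text_len: int) -> bytes:
    """Build the 28-byte GEMDOS program header used in the listing above."""
    return struct.pack(
        ">HIIIIIIH",   # big-endian: 1 word, 6 longs, 1 word, no padding
        0x601A,        # ph_branch: bra.s over the rest of the header
        text_len,      # ph_tlen: size of the code (end-start)
        0,             # ph_dlen: no DATA section
        0,             # ph_blen: no BSS section
        0,             # ph_slen: no symbol table
        0,             # ph_res1
        0,             # ph_prgflags
        1,             # ph_absflag: non-zero means no relocation info
    )

# "loop: bra loop" as a short branch is $60FE (displacement -2, back onto itself)
code = bytes([0x60, 0xFE])
program = tos_header(len(code)) + code
```

Writing `program` to a file should match what `vasmm68k_mot -Fbin` produces from the listing, assuming the assembler picks the short form for `bra loop`.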

Debugger

Hatari can enter debug mode on a keypress (the hotkey must be configured in the menu) when it is launched from a terminal: hitting the hotkey pauses the emulator and goes straight into debug mode, where registers can be inspected with commands such as cpureg (type help for a list of debugger commands).

This online tool is quite handy as interactive Motorola 68000 documentation (it has instructions with examples and an integrated live editor / debugger).

Disassembler

vasm comes with vobjdump, but it dumps vasm's own VOBJ object files rather than disassembling raw binaries, so I use the web version of ImHex, which has a neat set of tools such as a hex editor and various disassemblers.

ST graphics quick-start : Line A

Line A is a quick way to access bundled low-level, relatively fast graphics routines on the ST. It is great for code golfing as it provides simple ways to do graphics; it can also provide various types of information and goes beyond simple pixel access (lines, filled rectangles, polygons, fill, blit etc.).

The most useful Line A resources for me were this documentation, which lists the variable offsets, and this page, which is nice for a quick overview of the API.

Line A was surprising at first as it uses normally illegal instructions, but that is also why it works: opcodes starting with $A are unimplemented on the 68000 and trigger the "Line 1010 emulator" exception, whose handler TOS routes to its routines. :)
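The dispatch can be modelled in a few lines of host-side Python. This is only a sketch: the 68000 part (opcodes with top nibble %1010 raise exception vector 10, whose pointer lives at $28) is fixed by the CPU, while taking the routine number from the low bits is an assumption standing in for what the TOS handler does; the documented calls are $A000 to $A00F.

```python
LINE_A_VECTOR = 10                # "Line 1010 emulator" exception vector number
VECTOR_ADDR = LINE_A_VECTOR * 4   # each vector is a 4-byte pointer, so $28

def is_line_a(opcode: int) -> bool:
    # any 16-bit opcode whose top nibble is %1010 ($Axxx) is unimplemented on the 68000
    return ((opcode >> 12) & 0xF) == 0xA

def line_a_function(opcode: int) -> int:
    # assumption: the routine number comes from the low bits of the opcode
    if not is_line_a(opcode):
        raise ValueError(f"${opcode:04x} is not a Line A opcode")
    return opcode & 0x0FFF

# the calls used in this article
NAMES = {0x0: "init", 0x1: "put pixel", 0x5: "filled rectangle"}
```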

Initialization

Line A must be initialized before calling any routine. The init call also returns various useful pointers in d0, a0, a1 and a2; the most useful at first is d0 or a0, which contains the base address of the Line A interface variables. Here is a Line A initialization sample that also loads the VPLANES, VWRAP, CONTRL, INTIN and PTSIN variables into a1, a2, a3 and a4 (VPLANES and VWRAP are two words packed into a1; the last two are used for Line A pixel access) and hides the cursor:

...

start
    dc.w $a000          ; Line A init call
    movem.l (a0),a1-a4  ; get some useful variables' content in a1-a4
    sf -6(a0)           ; disable cursor (optional; see VT52 emulation below for a shorter way)
    loop:
        bra loop
end

Note that this is a rather generic way to initialize Line A: the movem line can be removed if no Line A call needs these pointers (the put pixel call, for example, reads its color from INTIN and its coordinates from PTSIN, while the filled rectangle call only uses variables relative to a0).
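What the movem line leaves in a1-a4 can be sketched on the host in Python: movem.l (a0),a1-a4 copies four consecutive big-endian longs starting at the Line A base. The sketch assumes the usual variable offsets (VPLANES at 0, VWRAP at 2, CONTRL at 4, INTIN at 8, PTSIN at 12); the pointer values in the example buffer are made up.

```python
import struct

def movem_a1_a4(line_a_vars: bytes):
    """Mimic movem.l (a0),a1-a4: read four big-endian longs from offset 0."""
    return struct.unpack_from(">4I", line_a_vars, 0)

def split_words(reg: int):
    # a1 receives two words: VPLANES (offset 0) high, VWRAP (offset 2) low
    return reg >> 16, reg & 0xFFFF

# hypothetical low res variable block: 4 planes, 160 bytes per line, made-up pointers
fake_vars = struct.pack(">HHIII", 4, 160, 0x1000, 0x1100, 0x1200)
a1, a2, a3, a4 = movem_a1_a4(fake_vars)
```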

Filled rectangle (screen clearing)

Here is a complete example of a low res (320x200) screen clear (black; default palette) using the Line A filled rectangle routine:

    move.l #-1,24(a0)        ; bit-planes 0 and 1 (color)
    move.l #-1,28(a0)        ; bit-planes 2 and 3 (color)
    move.w #0,36(a0)         ; writing mode (0: replace, 1: transparent, 2: xor, 3: inverse of 1)
    move.l #$00000000,38(a0) ; x1: 0 y1: 0
    move.l #$013f00c7,42(a0) ; x2: 319 y2: 199
    move.l #$a886,46(a0)     ; ptr to fill pattern (tested on tos104uk)
    move.w #0,50(a0)         ; fill pattern mask
    move.w #0,52(a0)         ; multi-plane fill pattern flag
    move.w #0,54(a0)         ; clipping off
    movem.l a0,-(sp)
    dc.w $a005               ; Line A filled rectangle call
    movem.l (sp)+,a0

Note that the rectangle call requires a pointer to a fill pattern, which means additional code or data to embed in the program, so I use a trick to avoid this here: a pointer to a full fill pattern, found by trial and error by looking around in memory. The disadvantage of this trick is compatibility: the pattern may change with a different TOS version / RAM content!

This code is quite lengthy, but most of these parameters are actually unneeded for a full screen clear:

    clr.w 36(a0)             ; replace mode (required, otherwise it doesn't do anything)
    move.l #$00000000,38(a0) ; x1: 0 y1: 0
    move.l #$013f00c7,42(a0) ; x2: 319 y2: 199
    clr.w 54(a0)             ; clipping off (it clears only a tiny area if removed)
    movem.l a0,-(sp)
    dc.w $a005               ; Line A filled rectangle call
    movem.l (sp)+,a0

This clears with the default color and shows that the clipping and writing mode parameters are mandatory. Note that the fill pattern pointer is also required when the clear color is custom (I don't know why, but this is what my tests showed).

a0 is saved on the stack before the call to preserve it; these stack instructions can be avoided to gain some bytes through better register organization (avoiding the registers trashed by Line A).

The disadvantage of Line A screen clearing is that it is very slow, and it requires about the same number of bytes (if not more!) as direct access screen clearing. It may be useful in some very limited cases, but the speed overhead is still quite bad.
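The two move.l #-1 writes in the first rectangle example fill COLBIT0 to COLBIT3 (offsets 24 to 30), one word per bit plane, and the fill color is the index whose bits are set. A host-side Python sketch of that mapping, assuming any non-zero COLBIT word means "set" (the example uses $FFFF, other documentation commonly uses 1); the helper names are illustrative:

```python
def colbits(color: int):
    """COLBIT0..COLBIT3 words for a 4-plane color index (bit n -> plane n)."""
    return [0xFFFF if (color >> plane) & 1 else 0x0000 for plane in range(4)]

def as_move_l_values(words):
    # pair the words the way the move.l writes to 24(a0) and 28(a0) do (big-endian)
    return ((words[0] << 16) | words[1], (words[2] << 16) | words[3])

black = colbits(15)   # all planes set: color 15, black in the default palette
red = colbits(1)      # only plane 0 set: color 1
```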

Pixels access

Now that we have a full screen clear, we can try plotting a white centered pixel with Line A:

    move.w #0,(a3)          ; pixel value (color)
    move.l #$00a00064,(a4)  ; x: 160, y: 100; could also be two instructions: move.w #$00a0,(a4) and move.w #$0064,2(a4)
    dc.w $a001              ; put pixel Line A call

This is short: coordinates are used as is and fit in a single register, so they can be packed and manipulated with instructions such as swap.

Note that the Line A put pixel call trashed some registers in my tests; I didn't wrap the call here, but it might be needed in some use cases.
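The packed value written by move.l #$00a00064,(a4) puts x in the high word and y in the low word, which is the order PTSIN expects. A host-side Python model of the packing and of the 68000 swap instruction (the helper names are illustrative):

```python
def pack_xy(x: int, y: int) -> int:
    # x in the high word, y in the low word, as written into PTSIN by move.l
    return ((x & 0xFFFF) << 16) | (y & 0xFFFF)

def swap(reg: int) -> int:
    # model of the 68000 swap instruction: exchange the two 16-bit halves
    return ((reg << 16) | (reg >> 16)) & 0xFFFFFFFF
```

With y in the high word after a swap, word-sized instructions such as addq.w operate on y alone, and a second swap restores the layout for the Line A call.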

More Line A speed

Routine calls can be sped up by calling them directly (avoiding the exception) through the table of function pointers returned by the init call in a2, with XBIOS function 38 (Supexec) to bypass memory protection by running in supervisor mode; see here for details. It can probably also be done with the GEMDOS $20 call to switch to supervisor mode (see here or the direct access section below), but all of this may cost some precious bytes...

ST graphics quick-start : Direct access

The alternative to Line A is direct access to the hardware, the fastest method on a bare ST! It is also quite short, although there are still some disadvantages for code golfing (coordinates must be computed).

Initialization

Direct access on the ST (hardware registers and system variables) requires switching to supervisor mode, which can be done with a GEMDOS call:

...

start
    clr.l -(sp)        ; Super parameter: 0 = use the current user stack as supervisor stack
    move.w #$20,-(sp)  ; GEMDOS Super
    trap #1
    loop:
        bra loop
end

Once in supervisor mode, hardware registers and system variables can be accessed directly.

Screen clearing

    moveq #0,d1
    move.l ($44e),a2      ; base address of the logical screen
    move.w #32000/4-1,d0  ; 160 bytes/line * 200 lines / 4 bytes per move.l, minus 1 for dbra
cls:
    move.l d1,(a2)+
    dbra d0,cls

Quite small compared to a Line A rectangle call, with full control over the trashed registers... and fast.

There are ways to reduce this code by reorganizing it: there is no need to clear d1 if a register is already set to 0 (the value can also be grabbed from RAM!), and the loop start value could come from another register or memory (with a shift to bring it close to the screen buffer size).

Using an arbitrary register for d1 (or a2 itself) can also produce some nice glitches.

See the VT52 section below for an alternative that may be shorter and almost as fast.
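The loop count follows from the low res layout; here is the arithmetic as a quick Python check, including a small model of dbra, which runs its body once more than the starting counter value because it only falls through when the counter reaches -1.

```python
WIDTH, HEIGHT, PLANES = 320, 200, 4

bytes_per_line = WIDTH * PLANES // 8    # 160 bytes per line in low res
screen_bytes = bytes_per_line * HEIGHT  # 32000 bytes
longs = screen_bytes // 4               # 8000 move.l d1,(a2)+ writes
dbra_start = longs - 1                  # 7999: dbra adds one extra pass

def dbra_iterations(start: int) -> int:
    """Count how many times a dbra loop body runs for a non-negative start value."""
    count, d0 = 0, start
    while True:
        count += 1       # loop body
        d0 -= 1          # dbra decrements the counter...
        if d0 == -1:     # ...and falls through once it reaches -1
            return count
```

Starting at 8000 instead of 7999 would run 8001 times and write 4 bytes past the end of the screen.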

Pixels access

Pixel access on the ST may be "hard" compared to other platforms (especially modern ones) due to planar graphics (the advantages are space efficiency and speed for some effects; there are also tricks, as in 0-bitplanes demos!).

Here is an example of a generic direct access put pixel routine which emulates the behavior of the Line A call:

put_pixel                ; call with bsr; d0 = X, d1 = Y, d2 = color index (trashes d0-d3/a2)
    move.l ($44e),a2     ; get base screen address in a2
    move.w d0,d3
    and.w #$fff0,d0      ; align x to its 16-pixel group
    lsr.w #1,d0          ; log2(8/v_planes) where v_planes is the number of bitplanes (4 in low res 16 colors)
    muls #320/2,d1       ; number of bytes per video line
    add.w d0,d1          ; d1 = y * 160 + byte offset of the x group
    add.w d1,a2          ; add to base screen address; a2 now points to the bitplane slice
    and.w #$f,d3         ; compute x % 16 to get the position to plot at in a bitplane slice
    move.w #$8000,d1
    lsr.w d3,d1          ; now set the corresponding bitfield bit

    ; prepare unset mask
    move.w d1,d3
    not.w d3

    ; first bitplane
    btst #0,d2
    beq r1
        or.w d1,(a2)+    ; set
        bra s1
r1  and.w d3,(a2)+       ; unset
s1  lsr.w #1,d2

    ; second bitplane
    btst #0,d2
    beq r2
        or.w d1,(a2)+
        bra s2
r2  and.w d3,(a2)+
s2  lsr.w #1,d2

    ; third bitplane
    btst #0,d2
    beq r3
        or.w d1,(a2)+
        bra s3
r3  and.w d3,(a2)+
s3  lsr.w #1,d2

    ; fourth bitplane
    btst #0,d2
    beq r4
        or.w d1,(a2)+
        bra s4
r4  and.w d3,(a2)+
s4  lsr.w #1,d2
    rts

For code golfing we can drop the generic routine, set only the bit planes we want and avoid all these checks:

...
    move.w d1,d3
    swap d3
    move.w d1,d3
    not.l d3          ; unset mask in both words of d3 (the and.l below uses all 32 bits)

    or.w d1,(a2)+     ; set red
    and.l d3,(a2)+    ; unset two bit planes at the same time
    and.w d3,(a2)+    ; unset the last one; could also be omitted if only a single bit plane is used (same for the previous bit planes!)
...

The position calculation can be shortened and x/y can be packed into a single register; there are lots of ways to reduce this put pixel code depending on the use case.
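To check the planar addressing (or generate test patterns) away from the ST, the generic routine translates almost line for line into host-side Python. This is a model, not ST code: it works on a 32000-byte bytearray standing in for the low res screen, stores words big-endian as the 68000 does, and the names `put_pixel` / `get_pixel` are only illustrative.

```python
BYTES_PER_LINE = 160   # 320 pixels * 4 bitplanes / 8 bits, low res

def put_pixel(screen: bytearray, x: int, y: int, color: int) -> None:
    """Model of the direct access routine: set or clear one bit in each of 4 plane words."""
    offset = y * BYTES_PER_LINE + (x & 0xFFF0) // 2   # and.w #$fff0 / lsr.w #1 / muls #160
    mask = 0x8000 >> (x & 0xF)                        # bit of this pixel inside each word
    for plane in range(4):
        pos = offset + plane * 2
        word = int.from_bytes(screen[pos:pos + 2], "big")
        if (color >> plane) & 1:
            word |= mask                # or.w d1,(a2)+
        else:
            word &= ~mask & 0xFFFF      # and.w d3,(a2)+
        screen[pos:pos + 2] = word.to_bytes(2, "big")

def get_pixel(screen: bytearray, x: int, y: int) -> int:
    """Rebuild the color index from the same 4 plane words."""
    offset = y * BYTES_PER_LINE + (x & 0xFFF0) // 2
    mask = 0x8000 >> (x & 0xF)
    color = 0
    for plane in range(4):
        pos = offset + plane * 2
        if int.from_bytes(screen[pos:pos + 2], "big") & mask:
            color |= 1 << plane
    return color
```

For example, `put_pixel(screen, 160, 100, 1)` on a zeroed buffer sets only the top bit of the plane 0 word at offset 16080.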

ST graphics quick-start : VT52 emulation

TOS VT52 emulation can be used as a shortcut for some graphics tasks (or just text!) such as clearing the screen or disabling the cursor. Used as a combo it is smaller than direct access, and the GEMDOS/TOS VT52 extended commands are particularly short:

...
    pea vt52Commands(pc)  ; push commands address on stack
    move.w #9,-(sp)       ; push GEMDOS Cconws call (write NULL terminated string to the standard output)
    trap #1               ; GEMDOS call
...

vt52Commands
    dc.b 27,"f"           ; disable cursor
    dc.b 27,"c",1         ; set background color to red (optional)
    dc.b 27,"E"           ; clear screen and home cursor
    dc.b 0

This is likely shorter than a direct screen clear when the background color doesn't need to be set to a particular value, and the string can also be blended with data. It also feels as fast as the direct screen clearing method (much faster than Line A!).
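The command string is just bytes, so it is easy to generate or check on the host. A small Python sketch that assembles the same NUL-terminated sequence as vt52Commands (the helper names are hypothetical); it rejects a background color byte of 0 because that byte would end the string early for Cconws:

```python
ESC = 27

def vt52(*commands: bytes) -> bytes:
    """Join VT52 escape sequences into the NUL-terminated string Cconws writes."""
    out = bytearray()
    for cmd in commands:
        out += bytes([ESC]) + cmd
    out.append(0)             # Cconws stops at the first NUL byte
    return bytes(out)

HIDE_CURSOR = b"f"            # ESC f: disable cursor
CLEAR_SCREEN = b"E"           # ESC E: clear screen and home cursor

def set_background(color: int) -> bytes:
    # ESC c followed by the color byte; a raw 0 would terminate the string
    if not 1 <= color <= 255:
        raise ValueError("color byte must be non-zero to survive Cconws")
    return b"c" + bytes([color])

commands = vt52(HIDE_CURSOR, set_background(1), CLEAR_SCREEN)
```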

Vertical Synchronization

VSync might be required to avoid ugly tearing in real-time effects; fortunately there is a short way to do it with an XBIOS call (Vsync, $25) which halts processing until the next vertical blank:

    move.l a0,-(sp)    ; may trash a0 (which is useful for Line A) so preserve it; a shorter way is: exg a0,a5 (replace a5 by a free register)
    move.w #$25,-(sp)
    trap #14
    addq.l #2,sp       ; fix stack
    move.l (sp)+,a0    ; get back a0 (shorter way is: exg a0,a5)

The stack fix may be ruled out in some cases by careful code organization, and the same goes for preserving a0.

Here are some more resources on timing on the Atari ST.

This article (also this one) has in-depth details about raster effects on the ST(E) and shows a way to poll the hardware registers, which may be faster than going through the interrupt.

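A cheap timer can be made by reading the frame counter at $466, which is incremented on vertical blanks (supervisor mode is required since it sits in protected low memory).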

Double buffering

Double buffering can be done easily with XBIOS Setscreen, which allows changing the logical / physical screen addresses.
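The bookkeeping behind double buffering is small; here is a host-side Python model under a few assumptions: two screen buffers, a Setscreen(logical, physical, -1) style call once per frame to swap their roles while keeping the resolution, and buffers aligned to 256 bytes, since the STF video base registers only hold the high and middle address bytes. The reserved block address below is made up; reserving 32000 + 255 bytes leaves room to align one buffer inside it.

```python
SCREEN_BYTES = 32000

def align_screen(addr: int) -> int:
    # round up to the next 256-byte boundary (the STF video base low byte is always 0)
    return (addr + 255) & ~255

class DoubleBuffer:
    """Track which buffer is displayed (physical) and which one is drawn to (logical)."""

    def __init__(self, front: int, back: int):
        self.physical = front
        self.logical = back

    def flip(self):
        # the freshly drawn buffer becomes visible and the old visible one becomes
        # the drawing target; returns the (logical, physical) pair for Setscreen
        self.physical, self.logical = self.logical, self.physical
        return self.logical, self.physical

reserved = 0x70005                  # hypothetical start of a 32000 + 255 byte block
back = align_screen(reserved)
buffers = DoubleBuffer(front=0xF8000, back=back)
```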

Keyboard / clean exit

The $fffc02 address (keyboard ACIA data register) can be used to get the last byte received from the keyboard, and a clean exit can be done with a GEMDOS call:

; check for space key
    cmp.b #$39,$fffc02
    ... do something based on the condition ...

; clean program exit (GEMDOS Pterm0)
    clr.l -(a7)
    trap #1
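One detail worth keeping in mind: the keyboard sends another byte when a key is released, the same scancode with bit 7 set, so cmp.b #$39,$fffc02 only matches until space is released (or until another byte, such as a mouse packet, arrives). A tiny Python model of that check (the names are illustrative; reading $fffc02 on the real machine needs supervisor mode):

```python
SPACE_MAKE = 0x39

def break_code(make: int) -> int:
    # key release: same scancode with bit 7 set
    return make | 0x80

def space_held(last_byte: int) -> bool:
    # what cmp.b #$39,$fffc02 followed by beq tests
    return last_byte == SPACE_MAKE
```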

Line A / Direct access combo example

See my entry for Lovebyte 2025 for an example.

This 256b intro is a much more impressive demonstration of Line A polygon calls drawing a cube on the Atari Falcon; it uses double buffering, VSync and the screen clearing glitches I described above. It is probably way too slow on the ST though (and might also be incompatible due to some Falcon / newer TOS specifics)!

Useful tricks

Some tricks I found useful for code golfing on the Atari ST, probably valid for any Motorola 68000 powered machine:

packing two words into a register to multiply the number of usable registers (saving whole registers) and avoid touching the stack; computations such as additions can then be done on both values simultaneously, and the result can even be handed to a Line A call in a single instruction; individual words can be accessed with the swap instruction, and the exg instruction can also be interesting to save registers
self-modifying code is very useful in some cases
clearing previous graphics data instead of doing a slow full screen clear, e.g. when drawing some points, push their coordinates on the stack at the same time and draw them again with the background color after a VSync wait
using dbra or dbcc instructions for short loops (dbra decrements automatically, for example)
movem is sometimes useful to transfer multiple registers at once; it speeds things up and can also bring code size down, an immediate example being packing Line A call parameters...
when general purpose registers are lacking, the address registers can still be useful although they support fewer operations; the USP register can also be used in supervisor mode, although it is even more limited
although obvious, moveq to load small constants helps, same for addq or subq
short branch instructions (bra.s, beq.s, bne.s etc.), note that most assemblers do this automatically
addressing modes
choosing appropriate data sizes
shift timing on the 68000 grows with the shift count (unlike some mid 80s CPUs with a barrel shifter, such as the ARM or the 386), so the HAKMEM 149 algorithm, for example, isn't so valuable for speed / real time (lookup tables are perhaps better!)
if compatibility isn't a concern some values can be hardcoded, such as $F8000, which is the base screen address in my case (note: it may change with different ST models, graphics modes or even TOS versions, so it's not safe at all!)
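The first trick (two words in one register) can be modelled in Python to see both its benefit and its catch: a single add.l updates both halves, but a carry out of the low word leaks into the high word, so the low value must stay within 16 bits. The helpers are illustrative host-side code, not ST code.

```python
MASK32 = 0xFFFFFFFF

def pack(hi: int, lo: int) -> int:
    # two 16-bit values in one 32-bit register
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack(reg: int):
    return (reg >> 16) & 0xFFFF, reg & 0xFFFF

def add_l(a: int, b: int) -> int:
    # add.l on two packed registers: both halves are added in one instruction,
    # but a carry out of the low word spills into the high word
    return (a + b) & MASK32
```

For instance, stepping a packed (x, y) by (1, 2) is one add.l, while a low word of $FFFF plus 1 wraps to 0 and bumps the high word.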
