Atari ST code setup / tricks
Introduction
A small quick-start / reminder for low-level (Motorola 68000
assembly) programming on the Atari ST computers.
See this for more resources (setting palette, sound etc.). The
Atari GEMDOS reference manual can be useful for GEMDOS calls,
this write-up is also great, and so are the various intros (512b,
256b, 128b, 64b, 32b).
Focus is on the Atari STF (8 MHz Motorola 68000, 1 MiB RAM, computer
released in ~1985); it may work on other models but this is untested.
The Atari ST is a special platform to me as I was fascinated by it
in the early 2000s, which led to buying an Atari 1040 STF (first retro
computer !). The reason for the fascination was a bit fuzzy but it was
probably inspired by the number of people (on online boards,
IRC etc.) I knew who had tinkered with an ST in the past. It was also
because of some impressive demos (Posh, Odd Stuff, Blood, Virtual
Escape etc.) and games (Rick Dangerous, Dungeon Master, Populous,
Xenon 2, Another World etc.); I also appreciated the bare simplicity
of the hardware.
I did little programming on the ST, mainly tinkering with
Devpac and doing some GEM "hello world". It was probably too
overwhelming for me at the time as I recall struggling with simple
pixel plotting; now I know that the code I was looking at was way too
complex (or even tricky) for what I really wanted !
I tried to code on the ST again later but I was probably put
off by m68k assembly again; the ISA was a bit weird / overwhelming to
me (operand order, longer mnemonics, size suffixes etc.). I later
(2024) tried it again and finally appreciated some parts of it,
although my preference still goes to early RISC type CPUs such as ARM
(Acorn Archimedes !) even if they are not as good for code golfing.
Note that this article focuses on straightforward low res
(320x200) graphics (pixel access and some Line A stuff) for code
golfing on the ST; it may work with higher resolution modes but this
is untested. I use the tos104uk TOS ROM image.
Also note that Line A and direct access methods can be
combined, which is sometimes preferred for speed or code size
because multiple Line A calls may trash some useful registers...
(routine dependent but usually d0, d1, d2 and a0, a1, a2 !)
Here is the default 16 color TOS palette for reference :
Quick-start tooling
Emulator
On Ubuntu I use Hatari, which can be installed
from the package manager. It requires a TOS ROM (look at the
ROM entry in the menu) which can be found on the internet.
My Hatari config also has a GEMDOS drive path set up (Hard disks
menu) where I put .prg or .tos programs that I can run from TOS, and
I enabled faster boot in the System menu entry. Boosting the CPU
clock to 32 MHz (CPU menu entry) is useful for prototyping
because of the faster boot and overall speed.
Assembler
I use vasm as an assembler under Linux; another good 68k assembler
I knew from my Z80 years is WLA-DX, although I didn't test it for 68k.
To install vasm and compile a file such as
my_assembly_source.s under Ubuntu (may also work for others) :

    curl http://sun.hasenbraten.de/vasm/release/vasm.tar.gz | tar -xz && cd vasm &&
    make CPU=m68k SYNTAX=mot &&
    sudo cp vasmm68k_mot /usr/local/bin &&
    sudo cp vobjdump /usr/local/bin

    vasmm68k_mot -Ftos my_assembly_source.s -o my_st_binary.tos
This assembles a valid .tos (relocatable ?) executable but a
shorter way is to craft the program header ourselves (we don't need
data, bss sections etc.) so that we have complete control over the
executable content :
    dc.w $601a      ; ph_branch (branch to the code)
    dc.l end-start  ; ph_tlen
    dc.l $0         ; ph_dlen
    dc.l $0         ; ph_blen
    dc.l $0         ; ph_slen
    dc.l $0         ; ph_res1
    dc.l $0         ; ph_prgflags
    dc.w $1         ; ph_absflag (absolute; simpler; not relocatable)
start
loop:
    bra loop
end
This is a valid custom .tos program which loops indefinitely; it
can be assembled with :

    vasmm68k_mot -Fbin my_assembly_source.s -o my_st_raw_binary.tos
Documentation on the program header can be found here.
Debugger
Hatari can enter debug mode on a keypress (the hotkey must be
configured in the menu) when it is launched from a terminal:
hitting the hotkey pauses the emulator and goes straight into
debug mode, where registers can be inspected with commands such as
cpureg (type help for a list of debugger commands).
Disassembler
There is probably a disassembler with vasm (vobjdump ?) but I
use the web version of ImHex anyway, which has a neat set of
tools such as a hex editor and various disassemblers.
ST graphics quick-start : Line A
Line A is a quick way to get low-level, relatively fast bundled
graphics routines on the ST. It is great for code golfing as it
provides simple ways to do graphics; it can also provide various
types of information and goes beyond simple pixel access (lines,
filled rectangles, polygons, fill, blit etc.).
Useful resources on the Line A API for me were this
documentation, which lists the variable indexes; this is
also nice for a quick overview of the API.
Line A was surprising at first as it uses a normally illegal
instruction, but that is also why it works... opcodes of the form
$Axxx trigger an exception and the execution is routed to the
routine's code. :)
Initialization
Line A must be initialized before calling any routine.
The init call also provides various useful pointers in d0, a0, a1,
a2; the most useful at first is d0 or a0, which contains the
base address of the Line A interface variables. Here is a Line A
initialization sample that also loads the VPLANES / VWRAP, CONTRL,
INTIN and PTSIN variables into a1, a2, a3 and a4 (the last two are
used for Line A pixel access; note that this overwrites the a1 and a2
pointers returned by the init call) and hides the cursor :
    ...
start
    dc.w $a000          ; Line A init call
    movem.l (a0),a1-a4  ; get some useful variables content in a1-a4
    sf -6(a0)           ; disable cursor (optional; see VT52 emulation below for a shorter way)
loop:
    bra loop
end
Note that this is a rather generic way to initialize Line A;
the movem line can be removed when the Line A calls in use don't
need these variables (the rectangle call for example; Line A pixel
access on the other hand does use INTIN and PTSIN).
Filled rectangle (screen clearing)
Now here is a complete example of a low res (320x200) screen
clear (black color; default palette) using the Line A filled
rectangle routine :
    move.l #-1,24(a0)        ; bit-plane 0 and 1 (color)
    move.l #-1,28(a0)        ; bit-plane 2 and 3 (color)
    move.w #0,36(a0)         ; writing mode (0: replace, 1: transparent, 2: xor, 3: inverse of 1)
    move.l #$00000000,38(a0) ; x1: 0 y1: 0
    move.l #$013f00c7,42(a0) ; x2: 319 y2: 199
    move.l #$a886,46(a0)     ; ptr to fill pattern (tested on tos104uk)
    move.w #0,50(a0)         ; fill pattern mask
    move.w #0,52(a0)         ; multi-plane fill pattern flag
    move.w #0,54(a0)         ; clipping off
    movem.l a0,-(sp)
    dc.w $a005               ; Line A filled rectangle call
    movem.l (sp)+,a0
Note that the rectangle call requires a pointer to a fill
pattern, which means additional code or data to embed in the program,
so I use a trick to avoid this here : a pointer that was found by
trial and error by looking around for a full fill pattern. The
disadvantage of this trick is compatibility: the pattern may change
with a different TOS version / RAM content !
This code is quite lengthy... but most of these parameters are
actually unneeded for a full screen clear :
    clr.w 36(a0)             ; replace mode (required otherwise it doesn't do anything)
    move.l #$00000000,38(a0) ; x1: 0 y1: 0
    move.l #$013f00c7,42(a0) ; x2: 319 y2: 199
    clr.w 54(a0)             ; clipping off (it clears only a tiny area if removed)
    movem.l a0,-(sp)
    dc.w $a005               ; Line A filled rectangle call
    movem.l (sp)+,a0
This clears with the default color and shows that the clipping and
writing mode parameters are mandatory. Note that it also requires
the fill pattern pointer when the clear color is custom (I don't
know why but this is what my tests showed).
a0 is saved on the stack before the call to preserve the register;
there are ways to avoid these stack instructions and gain some
bytes through better register organization (avoiding the registers
trashed by Line A).
The disadvantage of Line A screen clearing is that it is very
slow... and requires about the same amount of bytes (if not more !)
as direct access screen clearing. It may be useful in some very
limited cases but the speed overhead is still quite bad.
Pixels access
Now that we have a full screen clear we can try plotting a
white centered pixel with Line A :
    move.w #0,(a3)         ; pixel value (color) in INTIN[0]; clr.w (a3) is shorter for color 0
    move.l #$00a00064,(a4) ; x: 160, y: 100; could also be two instructions : move.w #$00a0,(a4) and move.w #$0064,2(a4)
    dc.w $a001             ; put pixel Line A call
Now this is short... coordinates are used as is and fit in a
single register, so we could pack them and use instructions such
as swap etc.
Note that the Line A "put pixel" call trashed some registers
in my tests; I didn't wrap the call here but it might be needed in
some use cases.
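As a rough sketch of the packing idea, here is how both coordinates could live in one register and be handed to Line A with a single move. It assumes a3 = INTIN and a4 = PTSIN as loaded by the init sample above; d4 is an arbitrary register choice and I assume (untested) that the put pixel call leaves d4, a3 and a4 alone :

```asm
    move.l  #$00a00064,d4   ; packed coordinates: x = 160 (upper word), y = 100 (lower word)
    clr.w   (a3)            ; INTIN[0] = color index 0 (white in the default palette)
    move.l  d4,(a4)         ; PTSIN[0] = x and PTSIN[1] = y in one instruction
    dc.w    $a001           ; Line A put pixel call
    swap    d4              ; bring x into the lower word for word-sized math
    addq.w  #1,d4           ; x = x + 1
    swap    d4              ; back to the packed x:y layout, ready for the next move.l d4,(a4)
```

The y coordinate stays directly reachable with word-sized instructions while x only costs two swap instructions around each update.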
More Line A speed
Routine calls can be sped up through a direct call (avoiding the
exception) using the table of function pointers given by the init
call in a2 and XBIOS function 38 (Supexec, which runs a routine in
supervisor mode), see here for details. It can probably also be done
with the GEMDOS $20 call to go into supervisor mode (see
here or direct access below); all of this may require some
precious bytes though...
ST graphics quick-start : Direct access
The alternative to Line A is direct access to the hardware,
the fastest method on a bare ST ! It is also quite short although
there are still some disadvantages for code golfing (coordinates
must be computed).
Initialization
Direct access on the ST (hardware registers and system
variables) requires escalating to supervisor mode, which can be done
with a GEMDOS call :
    ...
start
    move.w #$20,-(sp)
    trap #1
loop
    bra loop
end
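The snippet above skips the stack argument of the Super call to save bytes, so it relies on whatever long word already sits on the stack. For reference, a sketch of the documented form looks like this (keeping the old supervisor stack pointer in d7 is an arbitrary choice of mine) :

```asm
    clr.l   -(sp)           ; argument 0: switch to supervisor mode and keep using the current stack
    move.w  #$20,-(sp)      ; GEMDOS function $20 (Super)
    trap    #1              ; GEMDOS call
    addq.l  #6,sp           ; fix stack (4 bytes of argument + 2 bytes of function number)
    move.l  d0,d7           ; d0 = old supervisor stack pointer, only needed to go back to user mode
```

This costs a few more bytes but does not depend on the stack content at program start.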
Then we can do..
Screen clearing
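    moveq #0,d1
    move.l ($44e),a2       ; screen base address (_v_bas_ad system variable)
    move.w #200*160/4-1,d0 ; 160 bytes per line * 200 lines = 32000 bytes = 8000 long words; -1 because dbra loops d0+1 times
cls:
    move.l d1,(a2)+
    dbra d0,cls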
Quite small compared to a Line A rectangle call, with full
control over the trashed registers... and fast.
There are ways to reduce this code by reorganizing it: there is
no need to clear d1 if a register is already set to 0 (the value can
also be grabbed from RAM !) and the loop start value could come
from another register or memory (with a shift to bring it close to
the screen buffer size).
Using a random register for d1 (or using a2) can also produce some
nice glitches.
See VT52 section below for an alternative that may be shorter
and almost as fast.
Pixels access
Pixel access on the ST may be "hard" to get compared to other
platforms (especially modern ones) due to planar
graphics (the advantages are space efficiency and speed for some
stuff; there are also tricks as in 0-bitplanes demos !).
Here is an example of a generic direct access put pixel
routine which emulates the behavior of the Line A call :
put_pixel                ; call this with bsr instruction with d0 and d1 being X/Y and d2 the color index
    move.l ($44e),a2     ; get base screen address in a2
    move.w d0,d3
    and #$fff0,d0        ; align x
    lsr.w #1,d0          ; log2(8/v_planes) where v_planes is number of bitplane (4 in low res 16 colors)
    muls #320/2,d1       ; number of bytes/video line
    add.w d0,d1          ; d1 = x byte offset + y byte offset
    add.w d1,a2          ; add base screen address; a2 now point to the address of the bitplanes slice
    and.w #$f,d3         ; compute x % 16 to get the position to plot at in a bitplane slice
    move.w #$8000,d1
    lsr.w d3,d1          ; now set the corresponding bitfield bit
    ; prepare unset mask
    move.w d1,d3
    not.w d3
    ; first bitplane
    btst #0,d2
    beq r1
    or.w d1,(a2)+        ; set
    bra s1
r1  and.w d3,(a2)+       ; unset
s1  lsr.w #1,d2
    ; second bitplane
    btst #0,d2
    beq r2
    or.w d1,(a2)+
    bra s2
r2  and.w d3,(a2)+
s2  lsr.w #1,d2
    ; third bitplane
    btst #0,d2
    beq r3
    or.w d1,(a2)+
    bra s3
r3  and.w d3,(a2)+
s3  lsr.w #1,d2
    ; fourth bitplane
    btst #0,d2
    beq r4
    or.w d1,(a2)+
    bra s4
r4  and.w d3,(a2)+
s4  lsr.w #1,d2
    rts
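A short usage sketch of the routine above, with the address arithmetic worked out in comments for a centered red pixel. It assumes supervisor mode is already active (the routine reads the $44e system variable) and that losing d0-d3 and a2 is acceptable, since the routine overwrites them :

```asm
    move.w  #160,d0         ; x
    move.w  #100,d1         ; y
    moveq   #1,d2           ; color index 1 (red in the default palette)
    bsr     put_pixel
    ; what the routine computes for these values:
    ;   x & $fff0 = 160, shifted right by 1 = 80   (byte offset of the 16 pixel block)
    ;   y * 160   = 16000                          (byte offset of the video line)
    ;   offset    = 16000 + 80 = 16080             (added to the screen base address)
    ;   x & $f    = 0, so mask = $8000 >> 0 = $8000 (leftmost pixel of the block)
```

With color index 1 only the first bitplane word gets the bit set, the three others get it cleared.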
For code golfing we can drop the generic routine to just set
the bit planes we want and avoid all these checks :
    ...
    move.w d1,d3
    not.w d3
    or.w d1,(a2)+   ; set red
    and.l d3,(a2)+  ; unset two bit planes at the same time (the upper word of d3 must hold the mask as well)
    and.w d3,(a2)+  ; unset the last one, could also omit this if only a single bit plane is used (same for previous bit planes !)
    ...
The position calculation can be shortened and x/y can be
packed into a single register; there are a lot of ways this put
pixel code can be reduced depending on the use case.
ST graphics quick-start : VT52 emulation
TOS VT52 emulation can be used as a shortcut to perform some
graphics stuff (or just text things !) such as clearing the screen or
disabling the cursor; it is smaller than direct access when used as
a combo, and the GEMDOS/TOS VT52 extended commands are particularly
short :
    ...
    pea vt52Commands(pc) ; push commands address on stack
    move.w #9,-(sp)      ; push GEMDOS Cconws call (write NULL terminated string to the standard output)
    trap #1              ; GEMDOS call
    ...
vt52Commands
    dc.b 27,"f"          ; disable cursor
    dc.b 27,"c",1        ; set background color to red (optional)
    dc.b 27,"E"          ; clear screen and home cursor
    dc.b 0
This is likely shorter than a direct screen clear when the
background color doesn't need to be set to a particular value, and it
can also be blended with data... It also feels as fast as the direct
screen clearing method (much faster than Line A !).
Vertical Synchronization
VSync might be required to avoid ugly tearing in real time stuff;
fortunately there is a short way to do it with an XBIOS call which
halts processing until the next vertical blank :
    move.l a0,-(sp)   ; may trash a0 (which is useful for Line A) so preserve it; a shorter way is : exg a0,a5 (replace a5 by a free register)
    move.w #$25,-(sp)
    trap #14
    addq.l #2,sp      ; fix stack
    move.l (sp)+,a0   ; get back a0 (shorter way is : exg a0,a5)
The stack fix may be dropped in some cases through careful code
organization, same for the a0 preservation.
Here are some more resources on timing on the Atari ST.
This article (also this one) has in depth details about raster
effects on the ST(e) and shows a way to poll the hardware registers
which may be faster than the interrupt route.
Double buffering
Double buffering can be done
easily with XBIOS Setscreen, which allows changing the logical /
physical screen address.
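A minimal sketch of a buffer flip with Setscreen (XBIOS function 5), not tuned for size. It assumes a5 and a6 each hold the address of a 32000 byte screen buffer (hypothetical register choice; on a plain ST the displayed buffer has to be aligned on 256 bytes) :

```asm
    move.w  #-1,-(sp)       ; resolution: -1 = leave unchanged
    move.l  a5,-(sp)        ; physical address: the buffer that gets displayed
    move.l  a6,-(sp)        ; logical address: the buffer that gets drawn into
    move.w  #5,-(sp)        ; XBIOS function 5 (Setscreen)
    trap    #14             ; XBIOS call
    lea     12(sp),sp       ; fix stack (2 + 4 + 4 + 2 bytes)
    exg     a5,a6           ; swap the roles of the two buffers for the next frame
```

The new physical address is only picked up at the next vertical blank, so this pairs naturally with the VSync call above.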
Line A / Direct access combo example
Here is a 256 byte Atari ST prototype of my own (I may release
the sources later !) which uses a combo of Line A (line call to draw
an edge) and direct access (screen clear, cube outline). It is
quite slow at 8 MHz so it was recorded at 32 MHz for the GIFs; the
first one has plenty of flickering because it is slow and there is no
VSync, the second has better timing and VSync :
This 256b intro is
a much more impressive demonstration of using Line A polygon calls
to draw a cube on the Atari Falcon; it
uses double buffering, VSync and the screen clearing glitches that I
described above. It is probably way too slow on the ST though !
(it might also be incompatible due to some Falcon / newer TOS
specifics)
Useful tricks
Some tricks that I found useful for code golfing on the Atari ST,
probably valid for any Motorola 68000 powered machine :
- packing two words into a register to multiply the number of available registers (saves whole registers) and avoid touching the stack; computations such as addition can then be done on the two values simultaneously and the pair could even be given directly to a Line A call in a single instruction; individual words can be accessed using the swap instruction; the exg instruction can also be interesting to save registers
- self-modifying code is very useful in some cases
- using dbra or dbcc instructions for short loops (dbra decrements the counter automatically for example)
- movem is sometimes useful to transfer multiple registers directly, it speeds things up and can also bring code size down; an immediate example would be packing Line A call parameters...
- when general purpose registers are lacking the address registers can still be useful although they support limited operations
- although obvious, moveq to load small constants helps, same for addq or subq
- short branch instructions (bra.s, beq.s, bne.s etc.), note that most assemblers do this automatically
- addressing modes
- choosing appropriate data sizes
- if compatibility isn't a concern some values can be hardcoded, such as $F8000 which is the base screen address in my case (note : may change with different ST models or even TOS version etc. so not safe at all !)
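A few of the tricks above put together as a small sketch; the register choices (d0, d4) and the count label are arbitrary and only there for illustration :

```asm
    move.l  #$00100020,d4   ; two packed words: 16 (upper) and 32 (lower)
    add.l   #$00010002,d4   ; adds 1 to the upper word and 2 to the lower word at once, d4 = $00110022
                            ; (only valid while the lower word does not carry into the upper one)
    moveq   #9,d0           ; 2 byte load of a small constant; the loop below runs 10 times
count:
    addq.w  #1,d4           ; word-sized operation: only touches the lower word
    dbra    d0,count        ; decrement d0 and loop until it reaches -1
    swap    d4              ; exchange the two words to work on the other value
```

Packing this way trades a swap here and there for a whole spare register, which usually pays off when the pair is later written out with a single move.l.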