
Atari ST code setup / tricks

Introduction


A small quick-start / reminder for low-level (Motorola 68000 assembly) programming on the Atari ST computers. See this for more resources (setting the palette, sound etc.); the Atari GEMDOS reference manual is useful for GEMDOS calls, this write-up is also great, and so are the various intros. (512b, 256b, 128b, 64b, 32b)

The focus is on the Atari STF (8 MHz Motorola 68000, 1 MiB RAM, computer released in ~1985); it may work on other models (Atari Falcon) but this is untested.

The Atari ST is a special platform to me: I was fascinated by it in the early 2000s, which led me to buy an Atari 1040 STF (my first retro computer !). The reason for the fascination is a bit fuzzy, but it was probably inspired by the number of people I knew (on online boards, IRC etc.) who had tinkered with an ST in the past. It was also because of some impressive demos (Posh, Odd Stuff, Blood, Virtual Escape etc.) and games (Rick Dangerous, Dungeon Master, Populous, Xenon 2, Another World etc.). I also appreciated the bare simplicity of the hardware.

I did little programming on the ST, mainly tinkering with Devpac and doing some GEM "hello world". It was probably too overwhelming for me at the time, as I recall struggling with simple pixel plotting; the code I was looking at was probably way too complex (or even tricky) for what I really wanted !

I tried to code on the ST again later but was probably put off by m68k assembly again; the ISA was a bit weird / overwhelming to me (operand order, longer mnemonics, size suffixes etc.). I tried once more in 2024 and finally came to appreciate some parts of it, although my preference still goes to early RISC CPUs such as ARM. (Acorn Archimedes !)

Note that this article focuses on straightforward low res 320x200 graphics (pixel access and some Line A stuff) for code golfing on the ST. It may work with higher resolution modes but this is untested. I use the tos104uk TOS ROM image.

Also note that the Line A and direct access methods can be combined; this is sometimes preferred for speed or code size because multiple Line A calls may trash some useful registers... (routine dependent but usually d0, d1, d2 and a0, a1, a2 !)

Here is the default 16 colors TOS palette for reference :

Quick-start tooling

Emulator

On Ubuntu I use Hatari, which can be installed from the package manager. It requires a TOS ROM (look at the ROM entry in the menu), which can be found on the internet. My Hatari config also has a GEMDOS drive path set up (Hard disks menu) where I put the .prg or .tos programs that I run from TOS, and I enabled faster boot in the System menu entry. Boosting the CPU clock to 32 MHz (CPU menu entry) is useful for prototyping because of the faster boot and overall speed.

Assembler

I use vasm as an assembler under Linux. Another good assembler I know from my Z80 years is WLA-DX, although I haven't tested it for 68k.

To install vasm and compile a file such as my_assembly_source.s under Ubuntu (may also work for others) :
This assembles to a valid, compatible .tos (relocatable ?) executable, but a shorter way is to craft the program header ourselves (we don't need data, bss sections etc.) so that we have complete control over the executable content :

    dc.w $601a     ; ph_branch (branch to the code)
    dc.l end-start ; ph_tlen
    dc.l $0        ; ph_dlen
    dc.l $0        ; ph_blen
    dc.l $0        ; ph_slen
    dc.l $0        ; ph_res1
    dc.l $0        ; ph_prgflags
    dc.w $1        ; ph_absflag (absolute; simpler; not relocatable)
start
    loop:
        bra loop
end

This is a valid custom .tos program that loops indefinitely. It can be assembled with : vasmm68k_mot -Fbin my_assembly_source.s -o my_st_raw_binary.tos

Documentation on the program header can be found here.

Debugger

Hatari can enter debug mode on a keypress (the hotkey must be configured in the menu) when it is launched from a terminal: hitting the hotkey pauses the emulator and goes straight into debug mode, where registers can be inspected with commands such as cpureg. (type help for a list of debugger commands)

This online tool is quite handy as an interactive Motorola 68000 documentation. (has instructions with examples and integrated live editor / debugger)

Disassembler

There is probably a disassembler with vasm (vobjdump ?) but I use the web version of ImHex anyway, which has a neat set of tools such as a hex editor and various disassemblers.

ST graphics quick-start : Line A


Line A is a quick way to reach bundled, low-level and relatively fast graphics routines on the ST. It is great for code golfing as it provides simple ways to do graphics; it can also provide various types of information and goes beyond simple pixel access. (lines, filled rectangles, polygons, fill, blit etc.)

Useful resources on the Line A API for me were this documentation, which lists the variable offsets, and this, which is nice for a quick overview of the API.

Line A was surprising at first as it uses a normally illegal instruction, but that is also why it works... an opcode of the form $Axxx triggers an exception and the execution is routed to the routine code. :)
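
For quick reference, here is a sketch of the Line A opcodes written as equates. The symbol names are my own (hypothetical, pick whatever you like) and the list is from the usual Line A documentation, so double check it against the docs linked above :

LA_INIT      equ $a000 ; init (returns pointers in d0, a0, a1, a2)
LA_PUTPIXEL  equ $a001 ; put pixel
LA_GETPIXEL  equ $a002 ; get pixel
LA_LINE      equ $a003 ; arbitrary line
LA_HLINE     equ $a004 ; horizontal line
LA_RECT      equ $a005 ; filled rectangle
LA_POLYGON   equ $a006 ; filled polygon (one scanline per call)
LA_BITBLT    equ $a007 ; bit block transfer
LA_TEXTBLT   equ $a008 ; text block transfer
LA_SHOWMOUSE equ $a009 ; show mouse
LA_HIDEMOUSE equ $a00a ; hide mouse
LA_TRANSFORM equ $a00b ; transform mouse
LA_UNDRAW    equ $a00c ; undraw sprite
LA_DRAW      equ $a00d ; draw sprite
LA_COPYRAST  equ $a00e ; copy raster
LA_SEEDFILL  equ $a00f ; seed fill

    dc.w LA_INIT ; same as dc.w $a000

Equates don't add any byte to the program so this is free for code golfing.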

Initialization

Line A must be initialized before calling any routine. The init call also provides various useful pointers in d0, a0, a1, a2; the most useful at first is d0 or a0, which contains the base address of the Line A interface variables. Here is a Line A initialization sample that also gets the content of the VPLANES, VWRAP, CONTRL, INTIN and PTSIN variables in a1, a2, a3 and a4 (the last two are used for Line A pixel access) and hides the cursor :

...
start
    dc.w $a000 ; Line A init call
    movem.l (a0),a1-a4 ; a1 = VPLANES:VWRAP, a2 = CONTRL, a3 = INTIN, a4 = PTSIN (this overwrites the a1 / a2 pointers returned by the init call)
    sf -6(a0) ; disable cursor (optional; see VT52 emulation below for a shorter way)
    loop:
        bra loop
end

Note that this is a rather generic way to initialize Line A; the movem line can be removed when the only Line A calls used don't need these variables, such as the rectangle call. (Line A pixel access does use INTIN and PTSIN, for example)
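
As a small sketch of what ends up in a1 after the movem above: VPLANES and VWRAP are two consecutive words, so they arrive packed into a single long. The d0 / d1 register choice is arbitrary and the expected values are what the low res mode should give :

    move.w a1,d0 ; d0.w = VWRAP, bytes per video line (160 expected in low res)
    move.l a1,d1
    swap d1      ; d1.w = VPLANES, number of bitplanes (4 expected in low res)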

Filled rectangle (screen clearing)

Now here is a complete example of a low res (320x200) screen clear (black color; default palette) using the Line A filled rectangle routine :

move.l #-1,24(a0) ; bit-plane 0 and 1 (color)
move.l #-1,28(a0) ; bit-plane 2 and 3 (color)
move.w #0,36(a0) ; writing mode (0: replace, 1: transparent, 2: xor, 3: inverse of 1)
move.l #$00000000,38(a0) ; x1: 0 y1: 0
move.l #$013f00c7,42(a0) ; x2: 319 y2: 199
move.l #$a886,46(a0) ; ptr to fill pattern (tested on tos104uk)
move.w #0,50(a0) ; fill pattern mask
move.w #0,52(a0) ; multi-plane fill pattern flag
move.w #0,54(a0) ; clipping off
movem.l a0,-(sp)
dc.w $a005 ; Line A filled rectangle call
movem.l (sp)+,a0

Note that the rectangle call requires a pointer to a fill pattern, which means additional code or data to embed in the program, so I use a trick to avoid this here : a pointer found by trial and error while looking around memory for a fully set fill pattern. The disadvantage of this trick is compatibility: the pattern location may change with a different TOS version / RAM content !
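
A more portable alternative to the hardcoded pointer is to embed a one line pattern in the program. This is only a sketch: the solid label is my own name, a1 is an arbitrary scratch register (it loses the VPLANES:VWRAP value of the init sample) and it assumes that a pattern mask of 0 means a single line pattern, as far as I understand the Line A docs :

    lea solid(pc),a1 ; PC relative so it works with the non relocatable header
    move.l a1,46(a0) ; ptr to fill pattern
    clr.w 50(a0)     ; fill pattern mask (0 : one line pattern)
    ...
solid:
    dc.w $ffff       ; a single pattern line with all bits set

This costs a couple more bytes than the hardcoded pointer (lea + move + the pattern word) but it doesn't depend on the TOS version.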

This code is quite lengthy... but most of these parameters are actually unneeded for a full screen clear :

clr.w 36(a0) ; replace mode (required otherwise it doesn't do anything)
move.l #$00000000,38(a0) ; x1: 0 y1: 0
move.l #$013f00c7,42(a0) ; x2: 319 y2: 199
clr.w 54(a0) ; clipping off (it clear only a tiny area if removed)
movem.l a0,-(sp)
dc.w $a005 ; Line A filled rectangle call
movem.l (sp)+,a0

This clears with the default color and shows that the clipping and writing mode parameters are mandatory. Note that the fill pattern pointer is also required when the clear color is custom. (I don't know why, but this is what my tests showed)

a0 is saved on the stack before the call to preserve the register; there are ways to avoid these stack instructions and gain some bytes with better register organization. (avoiding the use of registers trashed by Line A)

The disadvantage of Line A screen clearing is that it is very slow... and requires about the same number of bytes (if not more !) as direct access screen clearing. It may be useful in some very limited cases but the speed overhead is still quite bad.

Pixels access

Now that we have a full screen clear we can try plotting a white centered pixel with Line A :

move.w #0,(a3) ; pixel value (color)
move.l #$00a00064,(a4) ; x: 160, y: 100 could also be two instructions : move.w #$00a0,(a4) and move.w #$0064,2(a4)
dc.w $a001 ; put pixel Line A call

Now this is short... coordinates are used as is and both fit in a single register, so we could pack them and use instructions such as swap etc.

Note that the Line A "put pixel" call trashed some registers in my tests; I didn't wrap the call here but it might be needed in some use cases.
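
The packed coordinates idea can be sketched as a small diagonal line plot. It assumes a3 / a4 come from the init sample above and that d5 / d6 (arbitrary choices) survive the put pixel call, which should be checked on the target TOS; the diag label is my own name :

    moveq #0,d5         ; packed coordinates : x in the high word, y in the low word
    moveq #99,d6        ; 100 pixels
diag:
    clr.w (a3)          ; pixel value (color)
    move.l d5,(a4)      ; x and y written in a single instruction
    dc.w $a001          ; put pixel Line A call
    add.l #$00010001,d5 ; x + 1 and y + 1 at the same time
    dbra d6,diag

The addition works on both coordinates at once as long as the low word doesn't overflow into the high word.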

More Line A speed

Routine calls can be sped up through a direct call (avoiding the exception) using the table of function pointers given by the init call in a2, and XBIOS function 38 (Supexec) to bypass memory protection; see here for details. It can probably also be done with the GEMDOS $20 call to go into supervisor mode (see here or direct access below). All of this may require some precious bytes though...

ST graphics quick-start : Direct access


The alternative to Line A is direct access to the hardware, the fastest method on a bare ST ! It is also quite short, although there are still some disadvantages for code golfing. (coordinates must be computed)

Initialization

Direct access on the ST (hardware registers and system variables) requires escalating to supervisor mode, which can be done with a GEMDOS call :

...
start
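    move.w #$20,-(sp) ; GEMDOS Super; its long argument is not pushed here to save bytes, so it is whatever already sits on the stack
    trap #1
    loop:
        bra loop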
end
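
For reference, here is a sketch of the same call in its usual (non golfed) form. The GEMDOS Super call expects a long argument above the function number; 0 asks to switch to supervisor mode while staying on the current stack :

    clr.l -(sp)       ; argument : 0 = switch to supervisor mode, keep the current stack
    move.w #$20,-(sp) ; GEMDOS Super
    trap #1
    addq.l #6,sp      ; fix stack (4 bytes argument + 2 bytes function number)
                      ; d0 now contains the old supervisor stack pointer

This costs 4 more bytes than the short version above but it doesn't depend on the stack content at program start.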

Then we can do..

Screen clearing

moveq #0,d1
move.l ($44e),a2
move.w #200*160/4-1,d0 ; 32000 bytes / 4 = 8000 longs; -1 because dbra loops count + 1 times
cls:
    move.l d1,(a2)+
    dbra d0,cls

Quite small compared to a Line A rectangle call, with full control over the trashed registers... and fast.

There are ways to reduce this code by reorganizing it: there is no need to clear d1 if a register is already set to 0 (the value can also be grabbed from RAM !) and the loop start value could come from another register or memory. (with a shift to bring it close to the screen buffer size)

Using a random register instead of d1 (or using a2) can also produce some nice glitches.

See VT52 section below for an alternative that may be shorter and almost as fast.

Pixels access

Pixel access on the ST may be "hard" to get right compared to other platforms (especially modern ones) due to planar graphics. (the advantages are space efficiency and speed for some things; there are also tricks, as in 0-bitplane demos !)
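
To get a feel for the planar layout, here is a sketch that fills the whole low res screen with color index 1 (red in the default palette). Each group of 16 pixels takes 4 consecutive words, one per bitplane, so color 1 means "bitplane 0 all set, the others all clear". It assumes supervisor mode; the fill label and the register choices are mine :

    move.l ($44e),a2     ; get base screen address in a2
    move.l #$ffff0000,d1 ; bitplane 0 all set, bitplane 1 all clear
    moveq #0,d2          ; bitplanes 2 and 3 all clear
    move.w #200*20-1,d0  ; 20 groups of 16 pixels per line, 200 lines
fill:
    move.l d1,(a2)+      ; bitplanes 0 and 1 of the group
    move.l d2,(a2)+      ; bitplanes 2 and 3 of the group
    dbra d0,fill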

Here is an example of a generic direct access put pixel routine which emulates the behavior of a Line A call :

put_pixel ; call this with bsr instruction with d0 and d1 being X/Y and d2 the color index
    move.l ($44e),a2 ; get base screen address in a2
    move.w d0,d3
    and.w #$fff0,d0 ; align x on a group of 16 pixels
    lsr.w #1,d0 ; byte offset of the group; the shift is log2(8/v_planes) where v_planes is the number of bitplanes (4 in low res 16 colors)
    muls #320/2,d1 ; y * number of bytes per video line
    add.w d0,d1 ; d1 = x offset + y offset
    add.w d1,a2 ; add base screen address; a2 now point to the address of the bitplanes slice
    and.w #$f,d3 ; compute x % 16 to get the position to plot at in a bitplane slice
    move.w #$8000,d1
    lsr.w d3,d1 ; now set the corresponding bitfield bit

    ; prepare unset mask
    move.w d1,d3
    not.w d3

    ; first bitplane
    btst #0,d2
    beq r1
        or.w d1,(a2)+ ; set
        bra s1
r1  and.w d3,(a2)+ ; unset
s1  lsr.w #1,d2

    ; second bitplane
    btst #0,d2
    beq r2
        or.w d1,(a2)+
        bra s2
r2  and.w d3,(a2)+
s2  lsr.w #1,d2

    ; third bitplane
    btst #0,d2
    beq r3
        or.w d1,(a2)+
        bra s3
r3  and.w d3,(a2)+
s3  lsr.w #1,d2

    ; fourth bitplane
    btst #0,d2
    beq r4
        or.w d1,(a2)+
        bra s4
r4  and.w d3,(a2)+
s4  lsr.w #1,d2
    rts
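
A usage sketch of the routine above, assuming supervisor mode (the routine reads $44e) and the default palette :

    move.w #160,d0 ; x
    moveq #100,d1  ; y
    moveq #1,d2    ; color index (1 is red in the default palette)
    bsr put_pixel  ; trashes d0, d1, d2, d3 and a2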

For code golfing we can drop the generic routine, set just the bit planes we want and avoid all these checks :

...
    move.w d1,d3
    swap d3
    move.w d1,d3 ; set mask in both words of d3 (and.l below uses the upper word too)
    not.l d3 ; unset mask

    or.w d1,(a2)+ ; set red
    and.l d3,(a2)+ ; unset two bit planes at the same time
    and.w d3,(a2)+ ; unset the last one, could also omit this if a single bit plane is only used (same for previous bit planes !)
...

The position calculation can be shortened and x/y can be packed into a single register; there are many ways to shrink this put pixel code depending on the use case.

ST graphics quick-start : VT52 emulation


TOS VT52 emulation can be used as a shortcut for some graphics stuff (or just text things !) such as clearing the screen or disabling the cursor. It is smaller than direct access when several commands are combined; the GEMDOS/TOS VT52 extended commands are particularly short :

...
    pea vt52Commands(pc) ; push commands address on stack
    move.w #9,-(sp) ; push GEMDOS Cconws call (write NULL terminated string to the standard output)
    trap #1 ; GEMDOS call
...
vt52Commands
    dc.b 27,"f"   ; disable cursor
    dc.b 27,"c",1 ; set background color to red (optional)
    dc.b 27,"E"   ; clear screen and home cursor
    dc.b 0

This is likely shorter than a direct screen clear when the background color doesn't need to be set to a particular value, and the string can also be blended with data... It also feels as fast as the direct screen clearing method. (much faster than Line A !)
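
The same call can also position some text, here is a sketch with the stack fix included (the message label and the text are placeholders of mine). The VT52 "Y" command takes the row then the column, both sent with a +32 offset :

    pea message(pc)  ; push string address on stack
    move.w #9,-(sp)  ; GEMDOS Cconws
    trap #1
    addq.l #6,sp     ; fix stack (4 bytes address + 2 bytes function number)
    ...
message:
    dc.b 27,"E"             ; clear screen and home cursor
    dc.b 27,"Y",32+12,32+16 ; move cursor to row 12, column 16
    dc.b "HELLO",0
    even                    ; keep what follows word aligned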

Vertical Synchronization


VSync might be required to avoid ugly tearing in real-time stuff. Fortunately there is a short way to do it: an XBIOS call which halts processing until the next vertical blank :

move.l a0,-(sp) ; may trash a0 (which is useful for Line A) so preserve it; a shorter way is : exg a0,a5 (replace a5 by a free register)
move.w #$25,-(sp)
trap #14
addq.l #2,sp ; fix stack
move.l (sp)+,a0 ; get back a0 (shorter way is : exg a0,a5)

The stack fix may be dropped in some cases with careful code organization, same for the a0 preservation.
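
An alternative sketch that avoids the XBIOS call (and its trashed registers) is to poll the VBL counter that TOS keeps in its system variables. This assumes supervisor mode, that $466 is the _frclock counter (to be double checked against a system variables list) and that d0 is free; the wait label is my own name :

    move.l ($466),d0 ; current VBL count
wait:
    cmp.l ($466),d0
    beq.s wait       ; loop until the counter changes (a vertical blank occurred)

It is not shorter than the XBIOS call but it only touches d0.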

Here are some more resources on timing on the Atari ST.

This article (also this one) has in-depth details about raster effects on the ST(e) and shows a way to poll the hardware registers, which may be faster than the interrupt road.

Double buffering


Double buffering can be done easily with XBIOS Setscreen, which changes the logical / physical screen address.
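
A minimal sketch of a buffer flip with Setscreen, assuming the two screen buffer addresses are kept in a5 and a6 (a hypothetical choice of mine, any registers preserved by XBIOS would do) and that both buffers are aligned on 256 bytes as the ST video hardware expects :

    move.w #-1,-(sp) ; resolution : -1 to keep the current one
    move.l a6,-(sp)  ; physical screen address (the one displayed)
    move.l a5,-(sp)  ; logical screen address (the one drawn to)
    move.w #5,-(sp)  ; XBIOS Setscreen
    trap #14
    lea 12(sp),sp    ; fix stack (2 + 4 + 4 + 2 bytes)
    exg a5,a6        ; swap the buffers for the next frame

The physical address change is only picked up at the next vertical blank, so this goes well with the VSync call above.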

Line A / Direct access combo example


Here is a 256 bytes Atari ST prototype of my own (I may release the sources later !) which uses a combo of Line A (line call to draw an edge) and direct access (screen clear, cube outline). It is quite slow at 8 MHz so it was recorded at 32 MHz for the GIF. The first one has plenty of flickering because it is slow and there is no VSync; the second has better timing and VSync :


This 256b intro is a much more impressive demonstration of using Line A polygon calls to draw a cube on the Atari Falcon. It uses double buffering, VSync and the screen clearing glitches that I described above; it is probably way too slow on the ST though ! (it might also be incompatible due to some Falcon / newer TOS specifics)

Useful tricks


Some tricks that I found useful for code golfing on the Atari ST, probably valid for any Motorola 68000 powered machine :
  • packing two words into a register to multiply the number of usable values (saving whole registers) and avoid touching the stack; computations such as additions can then be done on the two values simultaneously and the result can even be given directly to a Line A call in a single instruction; individual words can be accessed using the swap instruction; the exg instruction can also be interesting to save registers
  • self-modifying code is very useful in some cases
  • using dbra or dbcc instructions for short loops (dbra decrements the counter automatically, for example)
  • movem is sometimes useful to transfer multiple registers directly, it speeds things up and can also bring code size down; an immediate example would be packing Line A call parameters...
  • when general purpose registers are lacking, the address registers can still be useful although they have limited operations
  • although obvious, moveq to load small constants helps, same for addq or subq
  • short branch instructions (bra.s, beq.s, bne.s etc.), note that most assemblers do this automatically
  • addressing modes
  • choosing appropriate data sizes
  • if compatibility isn't a concern some values can be hardcoded such as $F8000 which is the base screen address in my case (note : may change with different ST models or even TOS version etc. so not safe at all !)
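
A few of the tricks above as a quick size comparison sketch. The sizes are the raw encodings; an assembler such as vasm may already pick some of the short forms on its own, and the d4-d7 choice for movem is arbitrary :

    move.l #0,d0         ; 6 bytes
    moveq #0,d0          ; 2 bytes, same result

    add.w #1,d0          ; 4 bytes
    addq.w #1,d0         ; 2 bytes

    move.w #0,36(a0)     ; 6 bytes
    clr.w 36(a0)         ; 4 bytes

    movem.w d4-d7,38(a0) ; 6 bytes, writes x1, y1, x2, y2 of a Line A rectangle from d4, d5, d6, d7 at once

The movem line replaces four move.w instructions of 4 bytes each, which is where packing Line A parameters pays off.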

Licence Creative Commons