ARM2+ LZ4 decompression routine
A few days ago I ported to ARM2 an LZ4 decompression routine that was originally written for the ARM Cortex-M0. It took me only a few hours to get it working, since the original did not use many modern ARM instructions.
The point was to compress my Acorn Archimedes binaries (because ARM code density isn't great) and associated data such as bitmaps. The longer-term idea is to have some kind of loader that takes care of this dynamically; it must be fast, since it will decode data at runtime.
At first I was going for the old LZW compression algorithm, which is very simple to implement (although its compression isn't great), but I later stumbled upon LZ4, which has a great compression/speed ratio and is also relatively simple.
The decompression speed and simplicity were what appealed to me most: you don't want to make people wait on some logos when you make a demo that uses a lot of optimizations such as loop unrolling, code generation and various lookup data.
The code probably works on many old-school ARM CPUs (ARM2, ARM250, ARM3 and maybe newer). It is completely self-contained, as I made it rely on its own stack. It is also based on the most recent version of the routine: the original author could no longer edit the post after some time, so some people posted fixes in the comments section of the ARM online board, and those fixes are integrated below.
It does not parse any headers, so you have to remove the compression header if you compress your data with an LZ4 tool. You must also prepend a 16-bit little-endian value to the compressed data; this value is the length of the compressed data. Since it is 16-bit, the compressed data must fit into 64KB.
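The header removal and length prefix described above can be sketched on the host side in Python. This is only a sketch under assumptions: the function name `frame_to_payload` is made up for illustration, the header layout follows my reading of the LZ4 frame format (magic, FLG, BD, optional content size and dictionary ID, header checksum, then a 4-byte block size), only the first block is extracted, and no checksum is verified.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204  # stored little-endian in the file: 04 22 4D 18


def frame_to_payload(frame: bytes) -> bytes:
    """Return a 16-bit length followed by the first block of an LZ4 frame.

    Hypothetical helper: it skips the frame header instead of validating it.
    """
    (magic,) = struct.unpack_from("<I", frame, 0)
    if magic != LZ4_FRAME_MAGIC:
        raise ValueError("not an LZ4 frame")
    flg = frame[4]
    pos = 6                      # skip magic (4), FLG (1) and BD (1)
    if flg & 0x08:               # optional 8-byte content size
        pos += 8
    if flg & 0x01:               # optional 4-byte dictionary ID
        pos += 4
    pos += 1                     # header checksum byte (not verified here)
    (size,) = struct.unpack_from("<I", frame, pos)
    pos += 4
    if size & 0x80000000:        # high bit marks an uncompressed block
        raise ValueError("block is stored uncompressed")
    if size > 0xFFFF:
        raise ValueError("compressed data does not fit a 16-bit length")
    block = frame[pos:pos + size]
    if len(block) != size:
        raise ValueError("truncated frame")
    # 16-bit little-endian length first, then the raw compressed block
    return struct.pack("<H", size) + block
```

Something like `open("out.bin", "wb").write(frame_to_payload(open("in.lz4", "rb").read()))` would then produce a file in the layout the routine expects, provided the whole input was compressed into a single block.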
The original author uploaded a simple Perl tool that compresses a file (with the official compressor), removes the compression header and prepends the 16-bit length. I modified it a bit for my needs, since it produced too many files (like some C code) and I wanted the files to have an Arculator hostfs-style file-type/extension. The updated Perl script can be found below, after the decompression routine. The original filename of that tool was lz4cut: just save it somewhere, do a chmod +x lz4cut and compress your files with lz4cut your_filename. The program will produce two files: the compressed one with headers (ending with lz4) and the one compatible with the decompression routine below (starting with lz4). You must have the lz4 binary somewhere on your system (you can compile it from source).
Performance is good (my data loads in a few seconds), although the code could probably be optimized much further. As for size, it compresses my 629KB binary to 38KB, so it is very efficient! For some bitmap data it compresses 131KB into 25KB, which is also not bad. Here is a write-up about LZ4 on 8088/8086 CPUs which gives some hints about what this algorithm is capable of.
Usage
ldr r0,lz4DataAddr
ldr r1,dstDataAddr
bl unlz4
ARM2+ LZ4 decompression routine
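dcd 0 ; 4 words of private stack (r4-r6 and r14 are saved here)
dcd 0
dcd 0
dcd 0
.unlz4_stack ; top of the stack: STMFD grows downward into the 4 words above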
; LZ4 decompression routine ported to ARM2 by grz-
; credits -> community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/lz4-decompression-routine-for-cortex-m0-and-later
; make sure to remove the compressed data header if you use a compression tool
; make sure to prepend a 16-bit value to your compressed file which represents the compressed data length
; the compressed data must thus fit into 64KB
; commented-out lines are the original version
; r0 = address of the compressed data
; r1 = RAM address of the decompressed data
; r4,r5,r6 = preserved; r2,r3 = clobbered
; r13 = overwritten with the routine's private stack and not restored
.unlz4
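;ldrh r2,[r0] ; get length of compressed data
ldr r2,[r0] ; ARM2 has no LDRH: load a whole word instead
mov r2,r2, ROR #16 ; rotate the low halfword into the top 16 bits
mov r2,r2, LSR #16 ; shift it back down, discarding the other halfword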
adds r0,r0,#2 ; advance source pointer
;push {r4-r6,lr} ;save r4, r5, r6 and return-address
.unlz4_len
adr r13,unlz4_stack
stmfd r13!,{r4-r6,r14}
adds r5,r2,r0 ; point r5 to end of compressed data
.unlz4_getToken
ldrb r6,[r0] ; get token
adds r0,r0,#1 ; advance source pointer
;lsrs r4,r6,#4 ; get literal length, keep token in r6
movs r4,r6, LSR #4
beq unlz4_getOffset ; jump forward if there are no literals
bl unlz4_getLength ; get length of literals
movs r2,r0 ; point r2 to literals
bl unlz4_copyData ; copy literals (r2=src, r1=dst, r4=len)
movs r0,r2 ; update source pointer
.unlz4_getOffset
cmp r0,r5
bge unlz4_bye
ldrb r3,[r0,#0] ; get match offset's low byte
subs r2,r1,r3 ; subtract from destination; this will become the match position
ldrb r3,[r0,#1] ; get match offset's high byte
;lsls r3,r3,#8 ; shift to high byte
movs r3,r3, LSL #8
subs r2,r2,r3 ; subtract from match position
adds r0,r0,#2 ; advance source pointer
;lsls r4,r6,#28 ; get rid of token's high 28 bits
movs r4,r6, LSL #28
;lsrs r4,r4,#28 ; move the 4 low bits back where they were
movs r4,r4, LSR #28
bl unlz4_getLength ; get length of match data
adds r4,r4,#4 ; minimum match length is 4 bytes
bl unlz4_copyData ; copy match data (r2=src, r1=dst, r4=len)
cmp r0,r5 ; check if we've reached the end of the compressed data
blt unlz4_getToken ; if not, go get the next token
;pop {r4-r6,pc} ; restore r4, r5 and r6, then return
ldmfd r13!,{r4-r6,r15}
.unlz4_getLength
cmp r4,#15 ; if length is 15, then more length info follows
bne unlz4_gotLength ; jump forward if we have the complete length
.unlz4_getLengthLoop
ldrb r3,[r0] ; read another byte
adds r0,r0,#1 ; advance source pointer
adds r4,r4,r3 ; add byte to length
cmp r3,#255 ; a byte of 255 means more length bytes follow
beq unlz4_getLengthLoop ; if so, go round the loop again
.unlz4_gotLength
;bx lr
mov r15,r14 ; return
.unlz4_copyData
rsbs r4,r4,#0 ; index = -length
subs r2,r2,r4 ; point to end of source
subs r1,r1,r4 ; point to end of destination
.unlz4_copyDataLoop
ldrb r3,[r2,r4] ; read byte from source_end[index] (index is negative)
strb r3,[r1,r4] ; store byte in destination_end[index]
adds r4,r4,#1 ; increment index
bne unlz4_copyDataLoop ; keep going until index reaches 0
;bx lr ; return
mov r15,r14
.unlz4_bye
ldmfd r13!,{r4-r6,pc}
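To check a compressed file on the host before trying it on the Archimedes, the routine above can be modelled in a few lines of Python. This is a sketch based on my reading of the assembly, not a verified equivalent: the function name simply reuses the label, the registers named in the comments show which value each variable stands for, and there is no protection against malformed input.

```python
def unlz4(payload: bytes) -> bytes:
    """Decode a 16-bit length prefix followed by one raw LZ4 block."""
    end = 2 + int.from_bytes(payload[:2], "little")  # r5: end of compressed data
    src = 2                                          # r0: source pointer
    out = bytearray()                                # r1: destination

    def get_length(n: int) -> int:
        # Mirrors unlz4_getLength: a nibble of 15 is extended by following
        # bytes, and each byte equal to 255 means another byte follows.
        nonlocal src
        if n == 15:
            while True:
                extra = payload[src]
                src += 1
                n += extra
                if extra != 255:
                    break
        return n

    while src < end:
        token = payload[src]
        src += 1
        literals = token >> 4                 # high nibble: literal length
        if literals:
            literals = get_length(literals)
            out += payload[src:src + literals]
            src += literals
        if src >= end:                        # the last sequence has no match
            break
        offset = payload[src] | (payload[src + 1] << 8)
        src += 2
        length = get_length(token & 15) + 4   # minimum match length is 4
        pos = len(out) - offset
        for _ in range(length):               # byte by byte, like the ldrb/strb loop
            out.append(out[pos])
            pos += 1
    return bytes(out)
```

The match is copied one byte at a time on purpose: when the offset is smaller than the match length, the source overlaps the bytes being written, which is how LZ4 encodes repeated runs, and a single slice copy would read bytes that do not exist yet.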
Perl script (lz4cut)
#!/usr/bin/perl
use strict;

my $infile = $ARGV[0] or die("no input file.");
my ($file, $name) = removeExtension($infile);
my $lz4file = "$file.lz4";
my $binfile = "$file.bin";
my $cfile = "$file.c";
($name) = $name =~ /^([^\.]+)/;
my $array = "s" . uc(substr($name, 0, 1)) . substr($name, 1);

# system() returns the exit status of lz4 (-1 if it could not be started)
my $e = system("lz4", "-9", $infile, $lz4file);
if($e != 0){ die("failed compressing $infile to $lz4file."); }

my $bin;
if(open(LZ4FILE, "<", $lz4file))
{
    binmode(LZ4FILE);
    my $b;
    # magic (4), FLG (1), BD + header checksum (2), block size (4)
    if(read(LZ4FILE, $b, 11))
    {
        my ($i, $f, $j, $l) = unpack("H8CA2V", $b);
        if($i eq "04224d18")
        {
            if($f & 8)
            {
                # content-size flag set: 8 extra header bytes precede the
                # block size, so re-read it from its real position
                seek(LZ4FILE, 15, 0);
                read(LZ4FILE, $b, 4);
                $l = unpack("V", $b);
            }
            $l > 0xFFFF and die("compressed data does not fit into 64KB.");
            my $cl = read(LZ4FILE, $bin, $l);
            if($cl == $l)
            {
                print("Extracted $l bytes from $lz4file\n");
                print("prepending 16-bit length.\n");
                $bin = pack("v", $cl) . $bin;
            }
            else
            {
                die("$lz4file is garbled.");
            }
        }
    }
    close(LZ4FILE);
}

if(length($bin))
{
    if(open(BINFILE, ">", $binfile))
    {
        binmode(BINFILE);
        print(BINFILE $bin);
        close(BINFILE);
    }
    if(open(CFILE, ">", $cfile))
    {
        print(CFILE "static const uint8_t ${array}[] = {\n");
        my $data;
        # test the length, not the string: a lone "0" byte is false in Perl
        while(length($bin))
        {
            $data = substr($bin, 0, 16);
            $bin = substr($bin, length($data));
            $data =~ s/(.)/sprintf("0x%02x, ", ord($1))/seg;
            print(CFILE "\t$data\n");
        }
        print(CFILE "};\n");
        close(CFILE);
    }
}

# returns (path without the last extension, file name with extension)
sub removeExtension
{
    my ($pathname) = @_;
    my @path = split(/\//, $pathname);
    my $fname = pop(@path);
    my @fname = split(/\./, $fname);
    scalar(@fname) > 1 && pop(@fname);
    push(@path, join(".", @fname));
    return (join("/", @path), $fname);
}