Arm command. Assembly school: assembly language for ARM architecture central processors. ARM assembler instructions

24.06.2020

Hi all!
By occupation I am a Java programmer. Over the last few months, work has led me to get acquainted with Android NDK development and, accordingly, with writing native applications in C. There I ran into the problem of optimizing Linux libraries: many turned out to be completely unoptimized for ARM and loaded the processor heavily. I had practically never programmed in assembly language before, so getting started was difficult, but I decided to try anyway. This article is written, so to speak, by a beginner for beginners. I will try to describe the basics that I have already learned, and I hope this will interest someone. I will also be glad to receive constructive criticism from professionals.

Introduction

So, first, let's figure out what ARM is. Wikipedia gives this definition:

ARM architecture (Advanced RISC Machine, originally Acorn RISC Machine) is a family of licensed 32-bit and 64-bit microprocessor cores developed by ARM Limited. The company exclusively develops cores and tools for them (compilers, debugging tools, etc.), making money by licensing the architecture to third-party manufacturers.

In case anyone doesn't know, most mobile devices and tablets today are built on this processor architecture. The main advantage of this family is low power consumption, which is why it is often used in embedded systems. The architecture has evolved over time, and starting with ARMv7 three profiles are defined: 'A' (application), 'R' (real time) and 'M' (microcontroller). You can read the history of this technology and other interesting facts on Wikipedia or by searching the Internet. ARM supports different operating modes (ARM and Thumb; in addition, Thumb-2 has appeared, which is a mixture of ARM and Thumb). In this article we will look at ARM mode itself, in which the 32-bit instruction set is executed.

Each ARM processor is created from the following blocks:

37 registers (of which only 17 are visible during development)
Arithmetic logic unit (ALU) - performs arithmetic and logical tasks
Barrel shifter - a unit that shifts or rotates data by a given number of bits
CP15 - the system control coprocessor
Instruction decoder - converts instructions into a sequence of micro-operations

These are not all components of ARM, but delving into the jungle of processor construction is beyond the scope of this article.

Pipeline execution

ARM processors use a 3-stage pipeline (starting with ARM8, a 5-stage pipeline was implemented). Let's look at a simple pipeline using the ARM7TDMI processor as an example. The execution of each instruction consists of three steps:

1. Fetch stage (F)
The instruction is fetched from memory into the processor pipeline.
2. Decode stage (D)
The instruction is decoded and its type is recognized.
3. Execute stage (E)
The operands enter the ALU, the operation is performed, and the resulting value is written to the specified register.

But when developing, you must take into account that some instructions take several execution cycles, for example load (LDR) or store (STR). In this case the execute stage (E) is split into several cycles (E1, E2, E3...).
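To picture how the three stages (F, D, E) overlap, here is a minimal Python sketch. It is only an idealized model and assumes every instruction spends exactly one cycle in each stage, so multi-cycle instructions such as LDR are not represented. The function name pipeline_timeline is made up for this illustration.

```python
def pipeline_timeline(instructions):
    """Return, for each clock cycle, which instruction sits in stages F, D and E.

    Idealized model: one cycle per stage, no stalls, no multi-cycle execute.
    """
    stages = ("F", "D", "E")
    total_cycles = len(instructions) + len(stages) - 1
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction that entered the pipeline `depth` cycles ago
            row[stage] = instructions[index] if 0 <= index < len(instructions) else None
        timeline.append(row)
    return timeline


for cycle, row in enumerate(pipeline_timeline(["mov", "add", "mul"])):
    print(cycle, row)
```

From the third cycle on, all three stages are busy at once, which is the whole point of the pipeline.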

Conditional execution

One of the most important features of the ARM assembler is conditional execution. Every instruction can be executed conditionally, and suffixes are used for this. If a suffix is added to the instruction name, the condition flags are checked before the instruction is executed. If the flags do not satisfy the condition, the instruction is not executed. Suffixes:
MI - negative number
PL - positive or zero
AL - always execute instructions
There are many more conditional execution suffixes. Read the rest of the suffixes and examples in the official documentation: ARM documentation
Now it's time to consider...

Basic ARM assembler syntax

For those who have worked with assembler before, you can actually skip this point. For everyone else, I will describe the basics of working with this language. So, every assembly language program consists of instructions. The instruction is created in this way:
(label) (instruction|operands) (@comment)
The label is optional. The instruction is the mnemonic of a processor instruction; the basic instructions and their use are discussed below. Operands are constants, registers, or memory addresses. The comment is optional and does not affect program execution.

Register names

The following register names are allowed:
1. r0-r15
2. a1-a4 (argument, result, or scratch registers, r0 to r3)
3. v1-v8 (variable registers, r4 to r11)
4. sb and SB (static base register, r9)
5. sl and SL (r10)
6. fp and FP (r11)
7. ip and IP (r12)
8. sp and SP (r13)
9. lr and LR (r14)
10. pc and PC (program counter, r15).

Variables and constants

In ARM assembler, like any (practically) other programming language, variables and constants can be used. They are divided into the following types:

Numeric
Logical
String

Numeric variables are set like this:
a SETA 100 ; the numeric variable "a" is given the value 100.
String variables:
improb SETS "literal" ; the variable improb is given the value "literal". ATTENTION! The variable value cannot exceed 5120 characters.
Logical variables use the values TRUE and FALSE respectively. (SETA and SETS are directives of ARM's own armasm assembler; the GNU assembler used in the examples below does not understand them.)

Examples of ARM assembler instructions

In this table I have collected the basic instructions that will be required for further development (at the most basic stage:):

To reinforce the use of the basic instructions, let's write some simple examples, but first we need an ARM toolchain. I work on Linux, so I chose frank.harvard.edu/~coldwell/toolchain (the arm-unknown-linux-gnu toolchain). It installs as easily as any other program on Linux. In my case (Russian Fedora) I only needed to install the rpm packages from the site.
Now it's time to write the simplest example. The program will be absolutely useless, but the main thing is that it will work :) Here is the code I suggest:
start:                  @ Optional line indicating the beginning of the program
    mov r0, #3          @ Load register r0 with the value 3
    mov r1, #2          @ Do the same with register r1, only now with the value 2
    add r2, r1, r0      @ Add the values of r0 and r1, the answer is written to r2
    mul r3, r1, r0      @ Multiply the value of register r1 by the value of register r0, the answer is written to r3
stop: b stop            @ Program termination line
We assemble and link the program to obtain the .bin file:
/usr/arm/bin/arm-unknown-linux-gnu-as -o arm.o arm.s
/usr/arm/bin/arm-unknown-linux-gnu-ld -Ttext=0x0 -o arm.elf arm.o
/usr/arm/bin/arm-unknown-linux-gnu-objcopy -O binary arm.elf arm.bin
(the code is in the arm.s file, and the toolchain in my case is in the /usr/arm/bin/ directory)
If everything went well, you will have four files: arm.s (the source code), arm.o, arm.elf and arm.bin (the actual executable program). To check how the program works, you do not need your own ARM device. It is enough to install QEMU. For reference:

QEMU is a free, open source program for emulating the hardware of various platforms.
Includes emulation of Intel x86 processors and I/O devices. Can emulate 80386, 80486, Pentium, Pentium Pro, AMD64 and other x86-compatible processors; PowerPC, ARM, MIPS, SPARC, SPARC64, m68k - only partially.
Works on Syllable, FreeBSD, FreeDOS, Linux, Windows 9x, Windows 2000, Mac OS X, QNX, Android, etc.

So, to emulate ARM you will need qemu-system-arm. This package is available through yum, so if you have Fedora you don't have to bother and can just run the command:
yum install qemu-system-arm

Next, we need to launch the ARM emulator so that it executes our arm.bin program. To do this, we will create a file flash.bin, which will be flash memory for QEMU. It's very easy to do this:
dd if=/dev/zero of=flash.bin bs=4096 count=4096
dd if=arm.bin of=flash.bin bs=4096 conv=notrunc
Now we load QEMU with the resulting flash memory:
qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
The output will be something like this:

$ qemu-system-arm -M connex -pflash flash.bin -nographic -serial /dev/null
QEMU 0.15.1 monitor - type "help" for more information
(qemu)

Our arm.bin program should have changed the values of four registers, so to check that it works correctly let's look at these registers. This is done with a very simple command: info registers
In the output you will see all 16 ARM registers, and four of them will have changed values. Check :) The register values match those expected after the program has run:
(qemu) info registers
R00=00000003 R01=00000002 R02=00000005 R03=00000006
R04=00000000 R05=00000000 R06=00000000 R07=00000000
R08=00000000 R09=00000000 R10=00000000 R11=00000000
R12=00000000 R13=00000000 R14=00000000 R15=00000010
PSR=400001d3 -Z-- A svc32
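To see where these values come from, the program can be traced by hand with a few lines of Python. This is only a toy sketch that understands the three mnemonics used in the example; the tuple format and the function name run are invented for this illustration, and the program is assumed to start at address 0 with 4 bytes per instruction.

```python
def run(program):
    """Trace a list of (mnemonic, rd, a, b) tuples on 16 zeroed registers."""
    regs = [0] * 16
    for mnemonic, rd, a, b in program:
        if mnemonic == "mov":
            regs[rd] = a & 0xFFFFFFFF  # here `a` is an immediate value
        elif mnemonic == "add":
            regs[rd] = (regs[a] + regs[b]) & 0xFFFFFFFF
        elif mnemonic == "mul":
            regs[rd] = (regs[a] * regs[b]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
        regs[15] += 4  # each ARM instruction is 4 bytes long
    return regs


program = [
    ("mov", 0, 3, None),  # mov r0, #3
    ("mov", 1, 2, None),  # mov r1, #2
    ("add", 2, 1, 0),     # add r2, r1, r0
    ("mul", 3, 1, 0),     # mul r3, r1, r0
]
regs = run(program)
print([hex(r) for r in regs[:4]], hex(regs[15]))
```

The trace ends with r15 equal to 0x10, the address of the fifth instruction (stop: b stop). That is consistent with R15=00000010 in the QEMU dump, where the processor keeps spinning on that branch.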

P.S. In this article I tried to describe the basics of programming in ARM assembler. I hope you enjoyed it! This will be enough to further delve into the jungle of this language and write programs in it. If everything works out, I will write further about what I find out myself. If there are errors, please do not kick me, as I am new to assembly language.

CISC processors perform quite complex operations in one instruction, including arithmetic and logical operations on the contents of memory cells. CISC processor instructions can have different lengths.

In contrast, RISC has a relatively simple instruction set with a clear division by type of operation:

working with memory (reading from memory into registers or writing from registers into memory),
processing data in registers (arithmetic, logical, shifts left/right or bit rotation in a register),
conditional or unconditional jumps to other addresses.

As a rule (but not always, and only if the program code is already in the cache), one instruction is executed per processor clock cycle. The length of an ARM instruction is fixed at 4 bytes (one machine word). In fact, a modern ARM processor can switch to other operating modes, for example Thumb mode, in which the instruction length becomes 2 bytes. This makes the code more compact. However, we do not cover this mode in this article, since the Amber ARM v2a processor does not support it. For the same reason we will not consider Jazelle (optimized for executing Java code) or NEON instructions, which operate on multiple data items at once. After all, we are studying the pure ARM instruction set.

ARM processor registers.

The ARM processor has several sets of registers, of which only 16 are currently available to the programmer. There are several processor operating modes; depending on the operating mode, the corresponding bank of registers is selected. These operating modes:

application mode (USR, user mode),
supervisor mode, or operating system mode (SVC, supervisor mode),
interrupt processing mode (IRQ, interrupt mode) and
fast interrupt processing mode (FIRQ, fast interrupt mode).

That is, for example, when an interrupt occurs, the processor itself goes to the address of the interrupt handler program and automatically “switches” register banks.

Later versions of ARM processors have, in addition to the operating modes above, some additional modes:

Abort (used to handle memory access exceptions),
Undefined (used to implement a coprocessor in software) and
System (privileged operating system task mode).

The Amber ARM v2a processor does not have these additional three modes.

For Amber ARM v2a, the set of registers can be represented as follows:

Registers r0-r7 are shared by all modes.
Registers r8-r12 are shared by the USR, SVC and IRQ modes only; FIRQ mode has its own copies.
Register r13 is the stack pointer. Each mode has its own.
Register r14, the subroutine return register, is also separate in each mode.
Register r15 is the pointer to the instruction being executed. It is shared by all modes.

It can be seen that the FIRQ mode is the most isolated, it has the most of its own registers. This is done so that some very critical interrupt can be processed without saving registers on the stack, without wasting time.

Particular attention should be paid to register r15, also known as pc (Program Counter), the pointer to the instruction being executed. You can perform various arithmetic and logical operations on its contents, and program execution will then move to other addresses. However, for the ARM v2a processor implemented in the Amber system there are some subtleties in how the bits of this register are interpreted.

The point is that in this processor register r15 (pc) holds, in addition to the instruction pointer itself, the following information:

Bits 31:28 - flags for the result of an arithmetic or logical operation.
Bit 27 - IRQ interrupt mask; interrupts are disabled when the bit is set.
Bit 26 - FIRQ interrupt mask; fast interrupts are disabled when the bit is set.
Bits 25:2 - the instruction pointer itself. Instructions are word-aligned, so the two lowest address bits are always zero and the address takes up only 26 bits.
Bits 1:0 - current processor operating mode:
3 - Supervisor
2 - Interrupt
1 - Fast Interrupt
0 - User
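The bit layout above can be unpacked with a few shifts and masks. The Python sketch below is only an illustration; the function name decode_r15 is invented, and the order of the four flags inside bits 31:28 (N, Z, C, V from the top bit down) is an assumption based on the usual ARM ordering, since the text above does not spell it out.

```python
MODE_NAMES = {0: "User", 1: "Fast Interrupt", 2: "Interrupt", 3: "Supervisor"}


def decode_r15(value):
    """Split a 32-bit ARM v2a r15 value into flags, interrupt masks, pc and mode."""
    return {
        "N": (value >> 31) & 1,  # assumed flag order: N, Z, C, V
        "Z": (value >> 30) & 1,
        "C": (value >> 29) & 1,
        "V": (value >> 28) & 1,
        "irq_masked": bool((value >> 27) & 1),
        "firq_masked": bool((value >> 26) & 1),
        "pc": value & 0x03FFFFFC,  # bits 25:2, the low two address bits are zero
        "mode": MODE_NAMES[value & 0x3],
    }


# Hypothetical value: Z set, both interrupt masks set, pc = 0x10, Supervisor mode.
print(decode_r15(0x4C000013))
```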

In later ARM processors, all flags and service bits are located in separate registers, the Current Program Status Register (cpsr) and the Saved Program Status Register (spsr), which are accessed with separate special instructions. This was done to expand the address space available to programs.

One of the difficulties in mastering ARM assembler is the alternative names of some registers. As mentioned above, r15 is the same as pc. Likewise r13 is sp (Stack Pointer) and r14 is lr (Link Register), the register holding the return address from a procedure. In addition, r12 is also ip (Intra-Procedure-call scratch register), which C compilers use in a special way to access parameters on the stack. Such alternative naming is sometimes confusing when you look at someone else's code, because both kinds of register names appear there.

Features of code execution.

In many types of processors (for example, x86), only a jump to another program address can be made conditional. This is not the case with ARM. Every ARM instruction can be executed conditionally. This minimizes the number of jumps in a program and therefore uses the processor pipeline more efficiently.

After all, what is a pipeline? While one instruction is being fetched from the program code, the previous one is already being decoded, and the one before that is already being executed. This is the 3-stage pipeline of the Amber A23 processor, which we use in our project for the Mars Rover2 board. The Amber A25 modification of the processor has a 5-stage pipeline, which is even more efficient. But there is one big BUT. Jump instructions force the processor to flush the pipeline and refill it. A new instruction is fetched, but there is nothing to decode yet and, moreover, nothing to execute. Code efficiency drops with frequent jumps. Modern processors have all sorts of branch prediction mechanisms that optimize the filling of the pipeline, but our processor has none. In any case, ARM was wise to make it possible for every instruction to be executed conditionally.

On an ARM processor, in every type of instruction, the four bits of the execution condition are encoded in the highest four bits of the instruction code (bits 31:28).

There are a total of 4 condition flags in the processor:
. Negative - the result of the operation was negative,
. Zero - the result is zero,
. Carry - a carry occurred when performing an operation on unsigned numbers,
. oVerflow - an overflow occurred when performing an operation on signed numbers (the result does not fit into the register).

These 4 flags form many possible condition combinations:

Code	Suffix	Meaning	Flags
4'h0	eq	Equal	Z set
4'h1	ne	Not equal	Z clear
4'h2	cs/hs	Carry set / unsigned higher or same	C set
4'h3	cc/lo	Carry clear / unsigned lower	C clear
4'h4	mi	Minus / negative	N set
4'h5	pl	Plus / positive or zero	N clear
4'h6	vs	Overflow	V set
4'h7	vc	No overflow	V clear
4'h8	hi	Unsigned higher	C set and Z clear
4'h9	ls	Unsigned lower or same	C clear or Z set
4'ha	ge	Signed greater than or equal	N == V
4'hb	lt	Signed less than	N != V
4'hc	gt	Signed greater than	Z == 0 and N == V
4'hd	le	Signed less than or equal	Z == 1 or N != V
4'he	al	Always (unconditional)	-
4'hf	-	Invalid condition	-

This leads to another difficulty in learning ARM instructions: the many suffixes that can be added to an instruction mnemonic. For example, addition on condition that the Z flag is set is the instruction addeq, that is add + the suffix eq. A call to a subroutine if flag N=0 is blpl, that is bl + the suffix pl.

Moreover, the flags (Negative, Zero, Carry, oVerflow) are not always updated by arithmetic or logical operations, as happens, say, on an x86 processor, but only when the programmer wants it. For this there is one more suffix to the instruction mnemonic: "s" (encoded by bit 20 of the instruction code). Thus the add instruction does not change the flags, while adds does. There can also be a conditional addition that changes the flags, for example addgts. Clearly, the number of possible combinations of instruction names with suffixes for conditional execution and flag setting makes ARM assembly code very peculiar and hard to read. However, over time you get used to it and begin to understand this text.
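The condition table can be written down almost literally as code. The Python sketch below is only a model of the check the processor performs before executing an instruction; the names CONDITION_TABLE and condition_passed are invented for this illustration.

```python
# One entry per 4-bit condition code, in the order of the table (0x0 .. 0xE).
CONDITION_TABLE = [
    ("eq", lambda n, z, c, v: z),
    ("ne", lambda n, z, c, v: not z),
    ("cs", lambda n, z, c, v: c),
    ("cc", lambda n, z, c, v: not c),
    ("mi", lambda n, z, c, v: n),
    ("pl", lambda n, z, c, v: not n),
    ("vs", lambda n, z, c, v: v),
    ("vc", lambda n, z, c, v: not v),
    ("hi", lambda n, z, c, v: c and not z),
    ("ls", lambda n, z, c, v: (not c) or z),
    ("ge", lambda n, z, c, v: n == v),
    ("lt", lambda n, z, c, v: n != v),
    ("gt", lambda n, z, c, v: (not z) and n == v),
    ("le", lambda n, z, c, v: z or n != v),
    ("al", lambda n, z, c, v: True),
]


def condition_passed(code, n, z, c, v):
    """Return True if an instruction with condition `code` would execute."""
    if not 0 <= code <= 0xE:
        raise ValueError("invalid condition code")
    _, check = CONDITION_TABLE[code]
    return bool(check(bool(n), bool(z), bool(c), bool(v)))


# addeq executes only when Z is set; blpl executes only when N is clear.
print(condition_passed(0x0, n=0, z=1, c=0, v=0))
print(condition_passed(0x5, n=1, z=0, c=0, v=0))
```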
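To make the effect of the "s" suffix concrete, here is a minimal Python sketch of how the four flags could be derived for a 32-bit addition. It models only adds, the function name is invented, and the way the flags are computed follows the flag descriptions given earlier (carry for unsigned, overflow for signed arithmetic).

```python
MASK = 0xFFFFFFFF


def adds(a, b):
    """Model of `adds`: return the 32-bit sum and the flags (N, Z, C, V)."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    n = result >> 31                             # sign bit of the result
    z = int(result == 0)                         # result is zero
    c = int(total > MASK)                        # unsigned carry out of bit 31
    v = ((a ^ result) & (b ^ result)) >> 31      # operands share a sign the result lacks
    return result, (n, z, c, v)


print(adds(2, 3))
print(adds(0xFFFFFFFF, 1))   # unsigned wrap-around: Z and C are set
print(adds(0x7FFFFFFF, 1))   # signed overflow: N and V are set
```

A plain add would produce the same sum but leave the previous flag values untouched.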

Arithmetic and logical operations (Data Processing).

The ARM processor can perform various arithmetic and logical operations.

The four-bit operation code (Opcode) is contained in bits 24:21 of the instruction.

Any operation is performed on the contents of a register and the so-called shifter_operand. The result of the operation is placed in a register. The four-bit fields Rn and Rd are indexes of registers in the active bank of the processor.

Depending on the I bit (bit 25), shifter_operand is treated either as a numeric constant or as the index of a second operand register, possibly with a shift operation applied to the value of that second operand.

Simple examples of assembler commands would look like this:

add r0,r1,r2 @ place the sum of the values of registers r1 and r2 into register r0
sub r5,r4,#7 @ place the difference (r4-7) in register r5

The operations performed are coded as follows:

Code	Mnemonic	Meaning	Operation
4'h0	and	Logical AND	Rd := Rn AND shifter_operand
4'h1	eor	Logical exclusive OR	Rd := Rn XOR shifter_operand
4'h2	sub	Arithmetic subtraction	Rd := Rn - shifter_operand
4'h3	rsb	Arithmetic reverse subtraction	Rd := shifter_operand - Rn
4'h4	add	Arithmetic addition	Rd := Rn + shifter_operand
4'h5	adc	Arithmetic addition plus carry flag	Rd := Rn + shifter_operand + Carry Flag
4'h6	sbc	Arithmetic subtraction with carry	Rd := Rn - shifter_operand - NOT(Carry Flag)
4'h7	rsc	Arithmetic reverse subtraction with carry	Rd := shifter_operand - Rn - NOT(Carry Flag)
4'h8	tst	Logical AND without storing the result, only the flags are changed	Rn AND shifter_operand, S bit always set
4'h9	teq	Logical exclusive OR without storing the result, only the flags are changed	Rn EOR shifter_operand, S bit always set
4'ha	cmp	Comparison, or rather arithmetic subtraction without storing the result, only the flags are changed	Rn - shifter_operand, S bit always set
4'hb	cmn	Compare negative, or rather arithmetic addition without storing the result, only the flags are changed	Rn + shifter_operand, S bit always set
4'hc	orr	Logical OR	Rd := Rn OR shifter_operand
4'hd	mov	Copy value	Rd := shifter_operand (no first operand)
4'he	bic	Reset bits	Rd := Rn AND NOT(shifter_operand)
4'hf	mvn	Copy inverse value	Rd := NOT shifter_operand (no first operand)

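The sixteen operation codes can also be read as a small lookup table. The Python sketch below is a hedged model of that table, not processor code: the names OPERATIONS and execute are invented, values are treated as 32-bit unsigned numbers, and flag updates are left out.

```python
MASK = 0xFFFFFFFF

# opcode -> (mnemonic, operation on (Rn, shifter_operand, carry), result written to Rd?)
OPERATIONS = {
    0x0: ("and", lambda rn, op2, c: rn & op2, True),
    0x1: ("eor", lambda rn, op2, c: rn ^ op2, True),
    0x2: ("sub", lambda rn, op2, c: rn - op2, True),
    0x3: ("rsb", lambda rn, op2, c: op2 - rn, True),
    0x4: ("add", lambda rn, op2, c: rn + op2, True),
    0x5: ("adc", lambda rn, op2, c: rn + op2 + c, True),
    0x6: ("sbc", lambda rn, op2, c: rn - op2 - (1 - c), True),
    0x7: ("rsc", lambda rn, op2, c: op2 - rn - (1 - c), True),
    0x8: ("tst", lambda rn, op2, c: rn & op2, False),
    0x9: ("teq", lambda rn, op2, c: rn ^ op2, False),
    0xA: ("cmp", lambda rn, op2, c: rn - op2, False),
    0xB: ("cmn", lambda rn, op2, c: rn + op2, False),
    0xC: ("orr", lambda rn, op2, c: rn | op2, True),
    0xD: ("mov", lambda rn, op2, c: op2, True),
    0xE: ("bic", lambda rn, op2, c: rn & ~op2, True),
    0xF: ("mvn", lambda rn, op2, c: ~op2, True),
}


def execute(opcode, rn, operand2, carry=0):
    """Return (mnemonic, 32-bit result, whether the result is stored in Rd)."""
    mnemonic, operation, writes_rd = OPERATIONS[opcode]
    result = operation(rn & MASK, operand2 & MASK, carry & 1) & MASK
    return mnemonic, result, writes_rd


print(execute(0x2, 10, 7))   # sub with r4 = 10 and the constant 7
print(execute(0xA, 5, 5))    # cmp computes 5 - 5 but stores nothing
```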
Barrel shifter.

The ARM processor has a special "barrel shifter" circuit that allows one of the operands to be shifted or rotated by a given number of bits before any arithmetic or logical operation. This is a rather interesting feature of the processor, which allows you to create very efficient code.

For example:

@ multiplying by 9 is multiplying the number by 8
@ (a left shift by 3 bits) plus the number itself
add r0, r1, r1, lsl #3 @ r0 = r1+(r1<<3) = r1*9

@ multiplying by 15 is multiplying by 16 minus the number
rsb r0, r1, r1, lsl #4 @ r0 = (r1<<4)-r1 = r1*15

@ access to a table of 4-byte words, where
@ r1 is the base address of the table
@ r2 is the index of the element in the table
ldr r0, [r1, r2, lsl #2] @ r0 = table[r2]

In addition to the logical left shift lsl, there is also a logical right shift lsr and an arithmetic right shift asr (a sign-preserving shift: as the value is shifted, the most significant bit is replicated on the left).

There is also the bit rotation ror: bits are shifted out on the right, and those shifted out are pushed back in on the left.
There is also a one-bit shift through the C flag, rrx. The register value is shifted right by one bit, and the C flag is loaded into the most significant bit of the register on the left.

The shift can be carried out not by a fixed constant number, but by the value of the third operand register. For example:

add r0, r1, r1, lsr r3 @ this is r0 = r1 + (r1>>r3);
add r0, r0, r1, lsr r3 @ this is r0 = r0 + (r1>>r3);

So shifter_operand is what we describe in assembler commands, for example as "r1, lsr r3" or "r2, lsl #5".

The most interesting thing is that using shifts in operations costs nothing. These shifts (usually) do not require additional clock cycles, which is very good for system performance.
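The five shifter operations are easy to model on 32-bit values. The Python sketch below is only an illustration of what each one computes; the function names mirror the mnemonics, and shift amounts are assumed to lie in the range 0..31 (the processor's handling of larger amounts is not modelled).

```python
MASK = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: zeros come in on the right."""
    return (value << amount) & MASK


def lsr(value, amount):
    """Logical shift right: zeros come in on the left."""
    return (value & MASK) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit is replicated on the left."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative number
    return (value >> amount) & MASK


def ror(value, amount):
    """Rotate right: bits leaving on the right re-enter on the left."""
    amount %= 32
    value &= MASK
    return ((value >> amount) | (value << (32 - amount))) & MASK


def rrx(value, carry):
    """Rotate right by one bit through the C flag; returns (result, new carry)."""
    value &= MASK
    return ((carry & 1) << 31) | (value >> 1), value & 1


r1 = 7
print((r1 + lsl(r1, 3)) & MASK)   # add r0, r1, r1, lsl #3  ->  r1 * 9
print((lsl(r1, 4) - r1) & MASK)   # rsb r0, r1, r1, lsl #4  ->  r1 * 15
```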

Using numeric operands.

Arithmetic or logical operations can use not only the contents of a register, but also a numeric constant as the second operand.

Unfortunately, there is one important limitation here. Since all instructions have a fixed length of 4 bytes (32 bits), it is not possible to encode just "any" number in them. In the instruction code, 4 bits are already taken by the execution condition code (Cond), 4 bits by the operation code itself (Opcode), then 4 bits by the destination register Rd and another 4 bits by the first operand register Rn, plus the flags I 25 (which marks a numeric constant in the instruction code) and S 20 (set the flags after the operation). In total, only 12 bits are left for a possible constant, the so-called shifter_operand that we saw above. Since 12 bits can directly encode numbers only in a narrow range, the developers of the ARM processor decided to encode the constant as follows. The twelve bits of shifter_operand are divided into two parts: the four-bit rotation field encode_imm and the eight-bit numeric value imm_8.

On an ARM processor, a constant is defined as an eight-bit number inside a 32-bit number, rotated to the right by an even number of bits. That is:

imm_32 = imm_8 ROR (encode_imm *2)

This is rather tricky: it means that not every constant can be used in assembler instructions.

You can write

add r0, r2, #255 @ constant in decimal form
add r0, r3, #0xFF @ constant in hexadecimal

since 255 is in the 8 bit range. These commands will be compiled like this:

0: e28200ff add r0, r2, #255 ; 0xff
4: e28300ff add r0, r3, #255 ; 0xff

And you can even write

add r0, r4, #512
add r0, r5, #0x650000

The compiled code will look like this:

0: e2840c02 add r0, r4, #512 ; 0x200
4: e2850865 add r0, r5, #6619136 ; 0x650000

Here the number 512 itself does not, of course, fit into a byte. But if we write it in hexadecimal as 32'h00000200, we see that it is 2 rotated right by 24 bits (2 ror 24). The rotation field is half of 24, that is 12. So shifter_operand = ( 4'hc, 8'h02 ), and these are the twelve least significant bits of the instruction. The same goes for the number 0x650000: for it, shifter_operand = ( 4'h8, 8'h65 ).

It is clear that you cannot write

add r0, r1,#1234567

or you can't write

mov r0, #511

since here the number cannot be represented in the form of imm_8 and encode_imm, the rotation field. The assembler will report an error.
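Whether a constant is encodable can be checked mechanically by trying all sixteen rotations. The Python sketch below is one way to do that and to rebuild the instruction words shown above; the function names are invented, the search simply returns the smallest rotation that works, and the positions of Rn (bits 19:16) and Rd (bits 15:12) are taken from the standard ARM data-processing layout rather than from this article.

```python
MASK = 0xFFFFFFFF


def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK


def encode_immediate(value):
    """Return (encode_imm, imm_8) with imm_8 ROR (encode_imm * 2) == value, or None."""
    value &= MASK
    for rotation in range(16):
        imm8 = ror32(value, 32 - 2 * rotation)  # undo the rotation: rotate left
        if imm8 <= 0xFF:
            return rotation, imm8
    return None


def encode_data_processing_imm(cond, opcode, set_flags, rn, rd, value):
    """Build the 32-bit code of a data-processing instruction with a constant operand."""
    encoded = encode_immediate(value)
    if encoded is None:
        raise ValueError(f"{value:#x} cannot be encoded as an immediate")
    rotation, imm8 = encoded
    return ((cond << 28) | (1 << 25) | (opcode << 21) | (int(set_flags) << 20)
            | (rn << 16) | (rd << 12) | (rotation << 8) | imm8)


print(encode_immediate(512), encode_immediate(0x650000), encode_immediate(511))
print(hex(encode_data_processing_imm(0xE, 0x4, False, 4, 0, 512)))  # add r0, r4, #512
```

For 512 and 0x650000 the search gives the same ( 4'hc, 8'h02 ) and ( 4'h8, 8'h65 ) pairs as in the listing, while 511 and 1234567 yield None, which corresponds to the assembler error.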

What should you do when a constant cannot be encoded directly in shifter_operand? You have to resort to tricks.
For example, to get 511 you can first load the number 512 into a free register and then subtract one:

mov r0, #512
sub r0,r0,#1

The second way to load a specific number into a register is to read it from a specially reserved variable located in memory:

ldr r7,my_var
.....
my_var: .word 0x123456

The easiest way to write it is like this:

ldr r2,=511

In this case (note the "=" sign), if the constant can be represented as imm_8 and encode_imm, that is, if it fits into the 12 bits of shifter_operand, the assembler automatically turns ldr into a mov instruction. But if the number cannot be represented this way, the assembler itself reserves a memory cell in the program for this constant, gives this cell a name, and assembles the instruction as ldr.

This is what I wrote:

ldr r7,my_var
ldr r8,=511
ldr r8,=1024
ldr r9,=0x3456
........
my_var: .word 0x123456

After compilation I got this:

18: e59f7030 ldr r7, [pc, #48] ; 50 <my_var>
1c: e59f8030 ldr r8, [pc, #48] ; 54 <my_var+0x4>
20: e3a08b01 mov r8, #1024 ; 0x400
24: e59f902c ldr r9, [pc, #44] ; 58 <my_var+0x8>
.............
00000050 <my_var>:
50: 00123456 .word 0x00123456
54: 000001ff .word 0x000001ff
58: 00003456 .word 0x00003456

Note that the compiler uses memory addressing relative to the pc register (aka r15).
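The pc-relative addresses in this listing can be checked by hand. The Python sketch below takes the address of an ldr instruction and its 32-bit code and computes the address it reads from. It relies on one fact not stated above: when an ARM instruction reads pc, it sees its own address plus 8, because of the pipeline. The function name literal_address is invented for this illustration.

```python
def literal_address(instruction_address, word):
    """Address read by a pc-relative `ldr rd, [pc, #offset]` instruction."""
    offset = word & 0xFFF            # 12-bit Offset field
    if not (word >> 23) & 1:         # U bit clear: the offset is subtracted
        offset = -offset
    pc_value = instruction_address + 8   # pc reads two instructions ahead
    return pc_value + offset


for address, word in [(0x18, 0xE59F7030), (0x1C, 0xE59F8030), (0x24, 0xE59F902C)]:
    print(hex(address), "->", hex(literal_address(address, word)))
```

The three results are 0x50, 0x54 and 0x58, exactly the cells where the listing keeps 0x123456, 0x1ff and 0x3456.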

Reading a memory cell and writing a register to memory.

As I wrote above, the ARM processor can only perform arithmetic or logical operations on the contents of registers. Data for operations must be read from memory and the result of the operations must be written back into memory. There are special commands for this: ldr (probably from the combination “LoaD Register”) for reading and str (probably “STore Register”) for writing.

It would seem that there are only two instructions, but in fact they have many variations. Just look at how the ldr/str instructions are encoded on the Amber ARM processor: there are many auxiliary flag bits, L 20, W 21, B 22, U 23, P 24 and I 25, and they determine the specific behavior of the instruction:

Bit L 20 determines write or read. 1 - ldr, read, 0 - str, write.
Bit B 22 determines the read/write of a 32-bit word or 8-bit byte. 1 means byte operation. When a byte is read into a register, the most significant bits of the register are reset to zero.
Bit I 25 determines the use of the Offset field. If I 25 ==0, then Offset is interpreted as a numeric offset that must either be added to the base address from the register or subtracted. But adding or subtracting depends on bit U 23.

(Cond) - condition for performing the operation. Interpreted in the same way as for logical/arithmetic commands - reading or writing can be conditional.

Thus, in assembly text you can write something like this:

ldr r1, [r0] @ read into register r1 the word at the address held in register r0
ldrb r1, [r0] @ read into register r1 the byte at the address held in register r0
ldreq r2, [r1] @ conditional word read
ldrgtb r2, [r1] @ conditional byte read
ldr r3, [r4, #8] @ read the word at offset 8 from the address in register r4
ldr r4, [r5, #-16] @ read the word at offset -16 from the address in register r5

Having compiled this text, you can see the actual codes of these commands:

0: e5901000 ldr r1, [r0]
4: e5d01000 ldrb r1, [r0]
8: 05912000 ldreq r2, [r1]
c: c5d12000 ldrbgt r2, [r1]
10: e5943008 ldr r3, [r4, #8]
14: e5154010 ldr r4, [r5, #-16]
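These codes can be taken apart using the bit positions named above (L 20, W 21, B 22, U 23, P 24, I 25). The Python sketch below is a hedged illustration: the function names are invented, the register fields Rn (bits 19:16) and Rd (bits 15:12) follow the standard ARM layout, and describe handles only the simple case of a numeric offset without write-back.

```python
def decode_ldr_str(word):
    """Split an ldr/str instruction code into its fields."""
    return {
        "cond": (word >> 28) & 0xF,
        "I": (word >> 25) & 1,
        "P": (word >> 24) & 1,
        "U": (word >> 23) & 1,
        "B": (word >> 22) & 1,
        "W": (word >> 21) & 1,
        "L": (word >> 20) & 1,
        "Rn": (word >> 16) & 0xF,
        "Rd": (word >> 12) & 0xF,
        "offset": word & 0xFFF,
    }


def describe(word):
    """Render the immediate-offset, no write-back form as assembler text."""
    f = decode_ldr_str(word)
    if f["I"] or not f["P"] or f["W"]:
        raise ValueError("only the plain immediate-offset form is handled here")
    name = ("ldr" if f["L"] else "str") + ("b" if f["B"] else "")
    sign = "" if f["U"] else "-"
    return f"{name} r{f['Rd']}, [r{f['Rn']}, #{sign}{f['offset']}]"


print(describe(0xE5943008))
print(describe(0xE5154010))
```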

In the example above I'm only using ldr , but str is used in much the same way.

There are also pre-indexed and post-indexed memory access modes with write-back. In these modes the memory pointer is updated before or after the access. If you know the C programming language, you are familiar with pointer access constructs like ( *psource++;) or ( a=*++psource;). The ARM processor implements exactly this kind of memory access. When a read instruction is executed, two registers are updated at once: the destination register receives the value read from memory, and the value in the pointer register is moved forward or backward.

The syntax of these instructions is, in my opinion, somewhat illogical. It takes a long time to get used to.

ldr r3, [r0, #4]! @ psrc++; r3 = *psrc;
ldr r3, [r0], #4 @ r3 = *psrc; psrc++;

The first ldr instruction first increments the pointer, then reads. The second first reads, then increments the pointer. The value of the psrc pointer is in register r0.
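The difference between the two forms is only the order of the two steps. A minimal Python sketch, with memory modelled as a dictionary from address to word and registers as a dictionary from register number to value (both representations and the function names are invented for this illustration):

```python
MASK = 0xFFFFFFFF


def ldr_pre_indexed(regs, memory, rd, rn, offset):
    """Pre-indexed with write-back: move the pointer first, then read."""
    regs[rn] = (regs[rn] + offset) & MASK
    regs[rd] = memory[regs[rn]]


def ldr_post_indexed(regs, memory, rd, rn, offset):
    """Post-indexed: read at the current pointer, then move it."""
    regs[rd] = memory[regs[rn]]
    regs[rn] = (regs[rn] + offset) & MASK


memory = {0x100: 11, 0x104: 22}

regs = {0: 0x100, 3: 0}
ldr_pre_indexed(regs, memory, rd=3, rn=0, offset=4)
print(regs)   # r0 moved to 0x104 before the read, so r3 holds the second word

regs = {0: 0x100, 3: 0}
ldr_post_indexed(regs, memory, rd=3, rn=0, offset=4)
print(regs)   # r3 holds the first word, r0 moved to 0x104 afterwards
```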

All the examples above were for the case when bit I 25 in the instruction code is clear. But it can also be set! Then the Offset field holds not a numeric constant but a third register taking part in the operation. Moreover, the value of this third register can be pre-shifted!

Here are examples of possible code variations:

0: e7921003 ldr r1, [r2, r3] @ read address is the sum of the values of registers r2 and r3
4: e7b21003 ldr r1, [r2, r3]! @ the same, but after reading r2 will be increased by the value of r3
8: e6932004 ldr r2, [r3], r4 @ first a read at address r3, then r3 is increased by r4
c: e7943185 ldr r3, [r4, r5, lsl #3] @ read address r4+r5*8
10: e7b43285 ldr r3, [r4, r5, lsl #5]! @ read address r4+r5*32, after reading r4 is set to this address
14: e69431a5 ldr r3, [r4], r5, lsr #3 @ read address r4, after the instruction r4 is set to r4+r5/8

These are the variations of read/write commands in the ARM v2a processor.

In later models of ARM processors, this variety of instructions is even greater.
This is because the processor can, for example, read not only words (32-bit numbers) and bytes but also halfwords (16 bits, 2 bytes). The suffix "h", from the word halfword, is then added to the ldr/str instructions, giving ldrh or strh. There are also instructions for loading halfwords (ldrsh) or bytes (ldrsb) interpreted as signed numbers. In these cases, the most significant bit of the loaded halfword or byte is replicated into the upper bits of the whole word in the destination register. For example, loading the halfword 0xff25 with ldrsh gives 0xffffff25 in the destination register.
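Sign extension is simple to reproduce. The Python sketch below shows one common way to compute it; the helper name sign_extend is invented for this illustration.

```python
def sign_extend(value, bits):
    """Extend a `bits`-wide value to 32 bits by replicating its top bit."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF


print(hex(sign_extend(0xFF25, 16)))  # what ldrsh would leave in the register
print(hex(0xFF25 & 0xFFFF))          # ldrh: the upper bits are simply zero
print(hex(sign_extend(0x80, 8)))     # ldrsb with a negative byte
```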

Multiple reads and writes.

The ldr/str instructions are not the only ones for accessing memory. The ARM processor also has instructions for block transfers: you can load several consecutive words from memory into several registers at once. You can also write the values of several registers sequentially into memory.

Block transfer mnemonics start with the root ldm (LoaD Multiple) or stm (STore Multiple). But then, as usual in ARM, the story with suffixes begins.

In general, the command looks like this:

op{cond}{mode} Rd{!}, {Register list}

The suffix {cond} is clear: it is the condition for executing the instruction. The suffix {mode} is the transfer mode, more on that later. Rd is the register that holds the base address in memory for reading or writing. An exclamation mark after the register Rd indicates that it will be modified after the read/write operation. {Register list} is the list of registers that are loaded from memory or stored to memory.

The list of registers is specified in curly braces separated by commas or as a range. For example:

stm r0, {r3, r1, r5-r8}

The registers are not written to memory in the order in which they are listed. The list simply indicates which registers will be written to memory, and that's it; the lowest-numbered register always goes to the lowest address. The instruction code contains 16 bits reserved for the Register List, exactly the number of registers in a processor bank. Each bit in this field indicates whether the corresponding register takes part in the operation.
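The 16-bit Register List field is just a bit mask with one bit per register, which also explains why the order written in the source text cannot matter. A small Python sketch (the function names are invented, and the transfer order from the lowest register number upwards is the standard ARM behaviour rather than something this article states):

```python
def register_list_mask(registers):
    """Build the 16-bit Register List field from register numbers."""
    mask = 0
    for number in registers:
        mask |= 1 << number
    return mask


def transfer_order(mask):
    """Registers taking part in the transfer, lowest number (lowest address) first."""
    return [number for number in range(16) if (mask >> number) & 1]


# stm with the list r3, r1, r5-r8, written in the same jumbled order as in the text
mask = register_list_mask([3, 1, 5, 6, 7, 8])
print(hex(mask), transfer_order(mask))
```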

Now about the read/write mode. There is room for confusion here, because different mode names can be used for the same action.

To make a small lyrical digression, we need to talk about... the stack. A stack is a way of accessing data of LIFO type: Last In, First Out. The stack is widely used in programming when calling procedures, for saving the state of registers on entry to functions and restoring them on exit, and for passing parameters to called procedures.

There are, who would have thought, four types of memory stack.

The first type is Full Descending. This is when the stack pointer points to an occupied stack element and the stack grows towards decreasing addresses. When you need to put a word on the stack, first the stack pointer is decreased (Decrement Before), then the word is written to the address in the stack pointer. When you need to remove a word from the stack, the word is read using the current value of the stack pointer, then the pointer moves up (Increment After).

The second type is Full Ascending. The stack grows not down but up, towards larger addresses. The pointer also points to the occupied element. When you need to put a word on the stack, first the stack pointer is incremented, then the word is written to the address in the pointer (Increment Before). When you need to remove a word from the stack, you first read the word at the stack pointer, because it points to an occupied element, then the stack pointer is decreased (Decrement After).

The third type is Empty Descending. The stack grows downwards, as in the case of Full Descending, but the difference is that the stack pointer points to an unoccupied cell. Thus, when you need to put a word on the stack, an entry is made immediately, then the stack pointer is decreased (Decrement After). When removing from the stack, the pointer is first incremented, then read (Increment Before).

The fourth type is Empty Ascending. I hope everything is clear - the stack grows upward. The stack pointer points to an empty element. Putting on the stack means writing a word to the address of the stack pointer and incrementing the stack pointer (Increment After). Pop from stack - decrement the stack pointer and read the word (Decrement Before).
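The four stack types described above differ only in two choices: whether the pointer moves before or after the access, and in which direction. The following Python sketch models them with a dictionary standing in for memory. The Stack class and the start address 0x1000 are purely illustrative.

```python
class Stack:
    """Toy model of a word stack; 'mem' maps addresses to stored words."""

    def __init__(self, sp, full, descending):
        self.mem = {}
        self.sp = sp
        self.full = full                      # True: sp points at an occupied cell
        self.step = -4 if descending else 4   # direction of growth

    def push(self, word):
        if self.full:
            self.sp += self.step              # move first (Before) ...
            self.mem[self.sp] = word          # ... then write
        else:
            self.mem[self.sp] = word          # write first ...
            self.sp += self.step              # ... then move (After)

    def pop(self):
        if self.full:
            word = self.mem.pop(self.sp)      # read first ...
            self.sp -= self.step              # ... then move back (After)
        else:
            self.sp -= self.step              # move back first (Before) ...
            word = self.mem.pop(self.sp)      # ... then read
        return word


for name, full, descending in [("FD", True, True), ("FA", True, False),
                               ("ED", False, True), ("EA", False, False)]:
    s = Stack(0x1000, full, descending)
    for word in (1, 2, 3):
        s.push(word)
    top = s.sp
    popped = [s.pop() for _ in range(3)]
    print(name, hex(top), popped, hex(s.sp))
```

In every variant the words come back in reverse order and the pointer returns to its starting value; only the addresses that were touched differ.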

Thus, when performing operations on the stack, the pointer must be increased or decreased (Increment/Decrement) before or after (Before/After) reading or writing memory, depending on the type of stack. Intel processors, for example, have special commands for working with the stack, such as PUSH (put a word on the stack) and POP (pop a word from the stack). The ARM processor has no special instructions for this; the ldm and stm instructions are used instead.

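If you implement the stack using ARM processor instructions, you get the following picture:

Stack type	Write to stack	Read from stack
Full Descending	stmfd (stmdb)	ldmfd (ldmia)
Full Ascending	stmfa (stmib)	ldmfa (ldmda)
Empty Descending	stmed (stmda)	ldmed (ldmib)
Empty Ascending	stmea (stmia)	ldmea (ldmdb)

Why did the same instruction need to be given different names? I don't understand at all... Here, of course, it should be noted that the standard stack for ARM is still Full Descending.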

The stack pointer in an ARM processor is the sp register, also known as r13. This is the general convention. Of course, stm writes and ldm reads can use other base registers as well. However, remember how sp differs from other registers: it can be different in different processor operating modes (USR, SVC, IRQ, FIQ), because these modes have their own register banks.

And one more note. You can, of course, write a line like push {r0-r3} in ARM assembly code. In reality, though, it will be the same instruction as stmfd sp!, {r0-r3}.

Finally, I will give an example of assembly code and its compiled disassembled text. We have:

@ these three instructions are the same and do the same thing
stmfd sp!, {r0-r3}
stmdb sp!, {r0-r3}
push {r0-r3}

@ these three instructions are also the same
pop {r0-r3}
ldmia sp!, {r0-r3}
ldmfd r13!, {r0-r3}

stmfd r4, {r0-r3, r5, r8}
stmea r4!, {r0-r3, r7, r9, lr, pc}
ldm r5, {r0, pc}

After compilation we get:

0: e92d000f push {r0, r1, r2, r3}
4: e92d000f push {r0, r1, r2, r3}
8: e92d000f push {r0, r1, r2, r3}
c: e8bd000f pop {r0, r1, r2, r3}
10: e8bd000f pop {r0, r1, r2, r3}
14: e8bd000f pop {r0, r1, r2, r3}
18: e904012f stmdb r4, {r0, r1, r2, r3, r5, r8}
1c: e8a4c28f stmia r4!, {r0, r1, r2, r3, r7, r9, lr, pc}
20: e8958001 ldm r5, {r0, pc}
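The machine codes in this listing can be taken apart by hand. The Python sketch below does so for the block transfer encoding as defined by the ARM architecture: bit 24 selects Before/After, bit 23 Increment/Decrement, bit 21 the "!" write-back, bit 20 load/store, bits 16-19 the base register and bits 0-15 the register list. The function name decode_block_transfer is invented for this example, and the condition field and the S bit are ignored.

```python
def decode_block_transfer(word):
    """Return (mnemonic, base register, write-back flag, register numbers)."""
    if (word >> 25) & 0b111 != 0b100:
        raise ValueError("not an ldm/stm encoding")
    before = (word >> 24) & 1      # P bit: 1 = Before, 0 = After
    increment = (word >> 23) & 1   # U bit: 1 = Increment, 0 = Decrement
    writeback = (word >> 21) & 1   # W bit: the "!" after the base register
    load = (word >> 20) & 1        # L bit: 1 = ldm, 0 = stm
    base = (word >> 16) & 0xF
    regs = [n for n in range(16) if word & (1 << n)]
    mnemonic = (("ldm" if load else "stm")
                + ("i" if increment else "d")
                + ("b" if before else "a"))
    return mnemonic, base, bool(writeback), regs


for code in (0xE92D000F, 0xE8BD000F, 0xE904012F, 0xE8A4C28F, 0xE8958001):
    print(hex(code), decode_block_transfer(code))
```

For 0xe92d000f the result is stmdb with base register 13 and write-back, which is exactly what the disassembler prints as push; 0xe8958001 decodes as ldmia without write-back, which the disassembler shortens to ldm.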

Transitions in programs.

Programming is impossible without jumps. Any program contains loops, calls of procedures and functions, and conditionally executed sections of code.

The Amber ARM v2a processor has only two commands for this: b (from the word Branch) and bl (Branch with Link, a jump that preserves the return address).

The command syntax is very simple:

b{cond} label
bl{cond} label

It is clear that any transitions can be conditional, that is, the program may contain strange words like these, formed from the roots “b” and “bl” and the condition suffixes (Cond):

beq, bne, bcs, bhs, bcc, blo, bmi, bpl, bvs, bvc, bhi, bls, bge, blt, bgt, ble, bal, b

bleq, blne, blcs, blhs, blcc, bllo, blmi, blpl, blvs, blvc, blhi, blls, blge, bllt, blgt, blle, blal, bl

The variety is amazing, isn't it?

The jump command contains a 24-bit field Offset, interpreted as a signed number. The jump address is calculated as the sum of the current value of pc and Offset shifted 2 bits to the left:

New pc = pc + Offset*4

Thus, the range of transitions is 32MB forward or backward.
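The address calculation can be checked on a real encoding. The Python sketch below takes the bne instruction 1afffff9 located at address 0x4bc in the _outbyte listing of this article and computes where it jumps. One detail is an addition to the text: because of the pipeline, pc reads as the address of the current instruction plus 8, and the sketch assumes exactly that. The name branch_target is made up for the example.

```python
def branch_target(address, instruction):
    """Compute the destination of a b/bl instruction located at 'address'."""
    offset = instruction & 0x00FFFFFF      # low 24 bits of the encoding
    if offset & 0x00800000:                # sign bit of the 24-bit field
        offset -= 1 << 24
    pc = address + 8                       # assumed: pc is two instructions ahead
    return (pc + offset * 4) & 0xFFFFFFFF


print(hex(branch_target(0x4BC, 0x1AFFFFF9)))  # 0x4a8

# The largest forward and backward offsets give the 32 MB range
print((0x7FFFFF * 4) // (1024 * 1024), (0x800000 * 4) // (1024 * 1024))  # 31 32
```

The field 0xfffff9 is -7 as a signed number, so the jump goes 28 bytes back from pc, to 0x4a8, which matches the label the disassembler shows.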

Let's look at bl, the jump that preserves the return address. This command is used to call subroutines. An interesting feature is that the return address is not stored on the stack, as on Intel processors, but in the ordinary register r14. So returning from a procedure needs no special ret command, as on those same Intel processors: you simply copy the value of r14 back to pc. Now it's clear why register r14 has the alternative name lr (Link Register).

Let's look at the outbyte procedure from the hello-world project for the Amber SoC.

000004a0 <_outbyte>:
4a0: e59f1454 ldr r1, [pc, #1108] ; 8fc <UART data register address>
4a4: e59f3454 ldr r3, [pc, #1108] ; 900 <UART status register address>
4a8: e5932000 ldr r2, [r3] ; read the current status
4ac: e2022020 and r2, r2, #32
4b0: e3520000 cmp r2, #0 ; check that the UART is not busy
4b4: 05c10000 strbeq r0, [r1] ; write a character to the UART only if it is not busy
4b8: 01b0f00e movseq pc, lr ; conditional return from procedure if UART was not busy
4bc: 1afffff9 bne 4a8 <_outbyte+0x8> ; loop to check UART status

I think from the comments in this fragment it is clear how this procedure works.

Another important note about jumps. Register r15 (pc) can be used as the destination register in ordinary arithmetic or logical operations. So a command like add pc,pc,#8 is a perfectly valid way of jumping to another address.

One more note regarding jumps. Newer ARM processors also have the additional branch instructions bx, blx and bxj. These are commands for jumping to code fragments that use a different instruction set. bx/blx let you switch to the 16-bit Thumb code of ARM processors. bxj is a jump into the Jazelle instruction set (Java bytecode support in ARM processors). Our Amber ARM v2a does not have these commands.

So, we have created a new project, completed the basic settings, and created and added to the project a file in which we want to write a simple program in assembler.

What's next? Now you can actually write a program using the Thumb-2 command set supported by the Cortex-M3 core. The list and description of supported commands can be found in the document Cortex-M3 Generic User Guide (chapter The Cortex-M3 Instruction Set), available on the Books tab of the project manager in Keil uVision 5. The Thumb-2 commands will be covered in detail in one of the following parts of this article; for now let's talk about programs for the STM32 in general.

Like any other assembler program, a program for the STM32 consists of commands and pseudo-commands, which are translated directly into machine code, and of various directives, which are not translated into machine code but serve auxiliary purposes (program layout, assigning symbolic names to constants, and so on).

For example, a special directive, AREA, lets you split a program into separate sections. It has the following syntax: AREA Section_Name {,type} {,attr} ..., where:

Section_Name - section name.
type - section type. For a section containing data, the DATA type must be specified, and for a section containing commands, the CODE type must be specified.
attr - additional attributes. For example, the readonly or readwrite attributes indicate in which memory the section should be located, the align=0..31 attribute specifies how the section should be aligned in memory, and the noinit attribute is used to allocate memory areas that do not need to be initialized or that are initialized to zero (when this attribute is used, the section type need not be specified, since it can only be applied to data sections).

The EQU directive is probably well known to everyone, since it is found in any assembler and serves to assign symbolic names to constants, memory cells and so on. It has the following syntax: Name EQU number. It tells the compiler that every occurrence of the symbol Name must be replaced with number. For instance, if number is the address of a memory cell, that cell can from then on be accessed not by its address but by the equivalent symbolic name (Name).

The GET filename directive inserts the text of the file named filename into the program. It is analogous to the include directive in the AVR assembler. It can be used, for example, to move the directives that assign symbolic names to various registers into a separate file. That is, we put all the name assignments into a separate file and then, so that these symbolic names can be used in the program, simply include that file with the GET directive.

Of course, besides those listed above there are many other directives. The full list can be found in the Directives Reference chapter of the Assembler User Guide, which is available in Keil uVision 5 at the following path: Books tab of the project manager -> Tools User's Guide -> Complete User's Guide Selection -> Assembler User Guide.

Most commands, pseudo-commands, and directives in a program have the following syntax:

(label) SYMBOL (expr) (,expr) (,expr) (; comment)

(label) - label. It is needed so that the address of the command following the label can be determined. The label is optional and is used only when the address of the command is needed (for example, to jump to this command). There must be no spaces before the label (that is, it must start in the very first position of the line), and the label name can only begin with a letter.

SYMBOL is a command, pseudo-command or directive. Unlike a label, a command must be indented from the beginning of the line, even if there is no label before it.

(expr) (,expr) (,expr) - operands (registers, constants...)

; - delimiter. All text on the line after this delimiter is treated as a comment.

Well, now, as promised, the simplest program:

        AREA START, CODE, READONLY
        dcd 0x20000400
        dcd Program_start
        ENTRY
Program_start
        b Program_start
        END

In this program we have only one section, which is called START. This section is located in flash memory (since the readonly attribute is used for it).

The first 4 bytes of this section contain the address of the top of the stack (in our case 0x20000400), and the second 4 bytes contain the address of the entry point (the beginning of the executable code). Next comes the code itself. In our simplest example, the executable code consists of one single command to unconditionally jump to the Program_start label, that is, to execute the same command again.
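The layout of these first eight bytes can be reproduced in Python. The sketch rests on two assumptions that the text does not state: that flash starts at 0x08000000 (the usual STM32 flash base), and that bit 0 of the entry address is set in the stored word, as the Cortex-M core expects for Thumb code. The function names are invented for this illustration.

```python
import struct

FLASH_BASE = 0x08000000  # assumption: typical STM32 flash start address


def build_vectors(stack_top, entry_address):
    """Pack the initial stack pointer and the reset vector, little-endian."""
    # Cortex-M executes Thumb code only, so bit 0 of the vector is set
    return struct.pack("<II", stack_top, entry_address | 1)


def parse_vectors(image):
    """Read the two words back and strip the Thumb bit from the entry point."""
    stack_top, reset = struct.unpack_from("<II", image, 0)
    return stack_top, reset & ~1


# The code starts right after the two 4-byte words, i.e. at offset 8
image = build_vectors(0x20000400, FLASH_BASE + 8)
print(image.hex())                              # 0004002009000008
print([hex(v) for v in parse_vectors(image)])   # ['0x20000400', '0x8000008']
```

Looking at the first bytes of a built image in this way is a quick check that the section really was placed first in flash.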

Since there is only one section in the flash, in the scatter file for our program we will need to specify its name (that is, START) as First_Section_Name.

In this case, we have mixed data and commands. The address of the top of the stack and the address of the entry point (data) are written with dcd directives directly in the code section. Of course you can write it this way, but it is not very neat, especially if we describe the entire table of interrupts and exceptions (which turns out to be quite long) and not just the reset vector. It is much neater not to clutter the code with unnecessary data, but to place the interrupt vector table in a separate section or, even better, in a separate file. Similarly, stack initialization can be placed in a separate section or even file. As an example, we will place everything in separate sections:

        AREA STACK, NOINIT, READWRITE
        SPACE 0x400             ; reserve 0x400 bytes
Stack_top                       ; and put the label after them
        AREA RESET, DATA, READONLY
        dcd Stack_top           ; address of the label Stack_top
        dcd Program_start       ; address of the label Program_start
        AREA PROGRAM, CODE, READONLY
        ENTRY                   ; entry point (start of executable code)
Program_start                   ; program start label
        b Program_start
        END

This is the same program (which still doesn't do anything useful), but now it looks much clearer. In the scatter file for this program, you need to specify the name RESET as First_Section_Name so that this section is located first in flash memory.

1. The real-time clock counter must be enabled (1); the clock source select bit is cleared (2) if clocking is not provided by the main clock generator.

2. One or both of the interrupt event select bits (3) must be set, which selects the events that will trigger an interrupt request (5).

3. Interrupt event masks (4, 7) must be specified.

2.5 About programming ARM7 in assembler

The ARM7 instruction set (Section 1.4) includes only 45 instructions, which are quite complex because of the variety of addressing modes, condition fields and modifiers. An assembler program is cumbersome and difficult to read. Therefore, assembler is rarely used in programming for the ARM7 architecture.

At the same time, the high-level language C hides many architectural features from the programmer. The programmer practically does not touch such procedures as choosing the kernel mode, allocating memory for the stack, and handling interrupts. To learn these procedures, it is useful to write at least one simple program in assembly language.

In addition, even when using C, you still have to resort to assembly language.

1. To check on the C compiler: whether it has removed important commands during optimization, considering them unnecessary; whether it generates extremely inefficient code for a relatively simple operation because of insufficient optimization; and whether it actually uses the hardware resources designed to increase the efficiency of a particular algorithm.

2. While searching for errors or causes of exceptions (section 2.4.1).

3. To obtain code that is absolutely optimal in terms of performance or memory consumption (sections 2.2.20, 3.1.5).

Let's look at the basic techniques for writing a program in assembler. The goal is to demonstrate all the code executed by the microcontroller, as is, without the mediation of the C compiler.

The procedure for creating a project based on assembler is almost the same as for C programs (sections 2.3.1–2.3.3). There are only two exceptions:

a) the source text file is assigned the extension *.S;

b) here it is assumed that the STARTUP.S file is not included in the project.

2.5.1 Basic rules for writing programs in assembler

The text of an assembler program is usually formatted in four columns. We can say that each line consists of four fields, namely: labels, operations, operands, comments. Fields are separated from each other by tab characters or spaces.

The main fields are operations and operands. Valid operations and their syntax are given in table (1.4.2)

A label is a symbolic designation of the address of a command. Wherever the label appears, the address of the command it precedes is substituted. Labels are used most often in control transfer commands. A label is optional, and each label must be unique. Unlike in many other assemblers, labels in the RealView assembler do not end with a colon (":").

Comments are optionally placed at the end of the line and separated by a semicolon (“;”).

Let's give a simple example.

2.5.2 Pseudo-commands

The RealView assembler supports so-called pseudo-instructions. A pseudo-instruction is a mnemonic notation that does not actually correspond to the processor's instruction set, but is replaced by one or (rarely) several instructions. Pseudo-commands are a kind of macros and serve to simplify the syntax. The list of supported pseudo-commands is given in table (2.5.1).

2.5.3 Assembly directives

Unlike commands, directives do not create executable code that is loaded into the microcontroller's memory. Directives are only instructions to the assembler; they control the formation of executable code.

Let's look at frequently used RealView 4 assembler directives.

Name EQU Constant

Assigns the symbolic designation Name to the Constant, so that the name becomes a synonym for the constant. The main purpose is to introduce names for control registers.

AREA Name, Parameters

Defines a memory area with the given Name. Using parameters, you specify the purpose of the memory area, for example, DATA (data) or CODE (code). The addresses of the defined area depend on the selected destination. The CODE area is located starting at address 0x00000000, the DATA area - at address 0x40000000. The program must have a CODE area named RESET. Constants placed in program memory should be declared in a section with a pair of parameters CODE, READONLY.

ENTRY

Indicates the entry point into the program, that is, its "beginning". One such directive must always be present in the program. It is typically placed immediately after the AREA RESET, CODE directive.

Table 2.5.1 – Pseudo-instructions supported by the RealView 4 assembler

Mnemonic notation and syntax	Operation	Actual implementation
ADR{cond}	Load address to the register	Adding or subtracting a constant from PC with ADD or SUB commands
ADRL{cond}	Load address to the register (extended address range)	Double ADD or SUB involving PC
ASR{cond}{S}	Arithmetic shift right	MOV with a shift operand
LDR{cond}	Load constant to the register	Placing a constant in program memory and LDR with indexed addressing relative to PC (PC + immediate offset)
LSL{cond}{S}	Logical shift left	MOV with a shift operand
LSR{cond}{S}	Logical shift right	MOV with a shift operand
POP{cond}	Restore registers from stack	LDMIA R13!, {...} command
PUSH{cond}	Save registers on stack	STMDB R13!, {...} command
ROR{cond}{S}	Cyclic shift right	MOV with a shift operand
RRX{cond}{S}	Cyclic shift right through carry by 1 bit	MOV with a shift operand

Name SPACE Size

Reserves memory of the given Size for storing data. Name becomes a synonym for the address of the reserved space. Because the address space is unified, this directive can be used for both read-only memory and RAM. The main purpose is to create global variables in RAM (in the DATA area).

Label DCB/DCW/DCD Constant

Writes ("flashes") data (numeric constants) into program memory. Label becomes a synonym for the address at which the data is stored. The different directives (DCB, DCW and DCD) are for data of different sizes: byte, 16-bit word and 32-bit word respectively.

END

Marks the end of the file. All text after this directive is ignored by the assembler.

2.5.4 Macros

A macro is a predefined program fragment that performs some common operation. Unlike subroutines called using control transfer commands, macros do not reduce performance, but they do not reduce program memory consumption either, because every time a macro is called the assembler embeds its entire text into the program.

To declare a macro, use the following construction:

	MACRO
	Name $Parameter1, $Parameter2, ...
	(macro body)
	MEND

Parameters allow you to modify the macro text each time you access it. Inside (in the body) of the macro, parameters are also used with a preceding “$” sign. Instead of parameters in the body of the macro, the parameters specified during the call are substituted.

The macro is called like this:

Name Parameter1, Parameter2, ...

It is possible to organize condition checking and branching.

IF "$Parameter" == "Value"

Please note that this construct does not make the microcontroller check the condition in software. The condition is checked by the assembler while it generates the executable code.
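What the assembler does with a macro is plain text substitution performed before any code exists. The Python sketch below imitates that step. The macro body, the parameter names Reg and Value, and both helper functions are hypothetical and are not taken from the RealView documentation.

```python
def expand_macro(body, params, args):
    """Return the macro body with every $param replaced by its argument."""
    text = body
    # Longer names go first so that $Reg cannot clobber part of $Register
    for name, value in sorted(zip(params, args), key=lambda p: -len(p[0])):
        text = text.replace("$" + name, value)
    return text


def assemble_if(value, expected, then_text, else_text=""):
    """Model of IF "$Parameter" == "Value": decided at assembly time."""
    return then_text if value == expected else else_text


BODY = "        MOV $Reg, #$Value\n        ADD $Reg, $Reg, #1\n"
PARAMS = ["Reg", "Value"]

program = (expand_macro(BODY, PARAMS, ["R0", "5"])
           + expand_macro(BODY, PARAMS, ["R1", "7"])
           + assemble_if("R1", "R1", "        MOV R2, R1\n"))
print(program, end="")
```

Two calls produce two full copies of the body, which is why a macro saves no program memory, and the IF branch that is not selected leaves no trace in the output at all.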

If you are using the Raspbian distribution as the operating system of your Raspberry Pi, you will need two utilities, namely as (an assembler that converts assembly language source code into binary code) and ld (a linker that creates the resulting executable file). Both utilities are in the binutils software package, so they may already be present on your system. Of course, you'll also need a good text editor; I always recommend using Vim for program development, but it has a high entry barrier, so Nano or any GUI text editor will work just fine.

Ready to get started? Copy the following code and save it in the myfirst.s file:

.global _start
_start:
    mov r7, #4
    mov r0, #1
    ldr r1, =string
    mov r2, #stringlen
    swi 0
    mov r7, #1
    swi 0
.data
string: .ascii "Ciao!\n"
stringlen = . - string

This program just prints the string "Ciao!" to the screen, and if you've read articles on using assembly language with x86 CPUs, some of the instructions used may be familiar to you. Still, there are many differences between the instructions of the x86 and ARM architectures, and the same is true of the source code syntax, so we will analyze it in detail.

But before that, it should be mentioned that to assemble the given code and link the resulting object file into an executable file, you need to use the following command:

as -o myfirst.o myfirst.s && ld -o myfirst myfirst.o

Now you can run the created program using the command ./myfirst . You may have noticed that the executable file is a very modest size of about 900 bytes - if you were using the C programming language and the puts() function, the binary file would be about five times larger in size!

Creating your own operating system for Raspberry Pi

If you've read the previous articles in this series on x86 assembly language programming, you probably remember the first time you fired up your own operating system, which displayed a message on the screen without the help of Linux or any other operating system. After that we improved it by adding a simple command line interface and a mechanism for loading and launching programs from disk, leaving a foundation for the future. It was very interesting but not very difficult work, mainly thanks to outside help from the BIOS firmware, which provided a simplified interface for accessing the screen, keyboard and floppy disk drive.

With the Raspberry Pi, you will no longer have useful BIOS features at your disposal, so you will have to develop device drivers yourself, which in itself is difficult and uninteresting work compared to drawing on the screen and implementing a mechanism for running your own programs. At the same time, there are several guides on the network that describe in detail the initial stages of the Raspberry Pi boot process, the mechanism for accessing GPIO pins, and so on.

One of the best such documents is Baking Pi (www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/index.html) from the University of Cambridge. It is essentially a set of tutorials that describe assembly language techniques for turning on LEDs, accessing pixels on the screen, receiving keyboard input, and so on. As you read, you will learn a lot about the Raspberry Pi hardware, but the manuals were written for the original models of these single-board computers, so there is no guarantee that they will be relevant for models such as the A+, B+ and Pi 2.

If you prefer the C programming language, you should refer to the document from the Valvers resource at http://tinyurl.com/qa2s9bg, which describes how to set up a cross-compiler and build a simple operating system kernel. The Wiki section of the useful OSDev resource, at http://wiki.osdev.org/Raspberry_Pi_Bare_Bones, has information on how to create and run a basic OS kernel on a Raspberry Pi.

As mentioned above, the biggest problem in this case is the need to develop drivers for the various Raspberry Pi hardware devices: the USB controller, the SD card slot and so on. The code for these devices alone can take tens of thousands of lines. If you still want to develop your own full-featured operating system for the Raspberry Pi, you should visit the forums at www.osdev.org and ask whether anyone has already developed drivers for these devices and, if possible, adapt them for your operating system kernel, thereby saving a large amount of your time.

How does it all work

The first two lines of code are not CPU instructions, but assembler and linker directives. Each program must have a clearly defined entry point called _start, and in our case it was at the very beginning of the code. Thus, we inform the linker that code execution should begin with the first instruction and no additional actions are required.

With the next instruction we put the number 4 in register r7. (If you have never worked with assembly language before, you should know that a register is a storage location inside the central processing unit itself. Most modern CPUs have only a small number of registers compared with the millions or billions of RAM cells, but registers are indispensable because they work much faster.) ARM chips give developers a large number of general-purpose registers: up to 16 registers, named r0 to r15, are available, and they are not bound by historical restrictions as in the x86 architecture, where some registers can only be used for certain purposes at certain times.

So, although the mov instruction is very similar to the x86 instruction of the same name, note the hash symbol next to the number 4, which indicates that what follows is an integer value and not a memory address. In this case we want to use the Linux kernel's write system call to print our string; to use system calls, you must fill the registers with the necessary values before asking the kernel to do its work. The system call number goes in register r7, and 4 is the number of the write system call.

With the following mov instruction we place in register r0 the descriptor of the file to which the string "Ciao!" is to be written. Since the standard output stream is used in this case, its standard descriptor, that is, 1, is placed in the register. Next, we need to place the address of the string we want to output into register r1 using the ldr instruction (a "load into register" instruction; note the equal sign, which indicates that what follows is a label whose address is to be loaded). At the end of the code, in the data section, we declare this string as a sequence of ASCII characters. To use the write system call successfully, we also have to tell the operating system kernel how long the output string is, so we put the value of stringlen in register r2. (The value of stringlen is calculated by subtracting the start address of the string from its end address.)

At this point, we have filled all the registers with the necessary data and are ready to transfer control to the Linux kernel. To do this, we use the swi instruction, whose name stands for “software interrupt,” which jumps into the OS kernel space (in almost the same way as the int instruction in articles on the x86 architecture). The OS kernel examines the contents of the r7 register, finds the integer value 4 in it, and concludes: “So, the calling program wants to print a string.” After that, it examines the contents of other registers, prints a string, and returns control to our program.

Thus, we see the line “Ciao!” on the screen, after which we can only correctly terminate the program execution. We solve this problem by placing the exit system call number in register r7 and then calling software interrupt instruction number zero. And that’s all - the OS kernel finishes executing our program and we move back to the command shell.
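For readers who think more easily in a high-level language, the same two system calls can be expressed in Python through the os module. This is only an analogy, not what the assembler produces: os.write ends up in the same kernel write call, and the comments show which register carries each argument in the assembly version. The function name ciao is invented for the example.

```python
import os

MESSAGE = b"Ciao!\n"   # the bytes declared with .ascii in the data section


def ciao(fd=1):
    """write(fd, buffer, length): r7 = 4 selects the call,
    r0 = fd, r1 = address of the string, r2 = its length."""
    return os.write(fd, MESSAGE)


written = ciao()       # prints Ciao! on standard output
print(written)         # number of bytes the kernel reports as written: 6
# The exit system call (r7 = 1) corresponds to the interpreter
# terminating normally when the script ends.
```

The return value of the call is what the kernel leaves in r0 after the software interrupt.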

Vim is an excellent text editor for writing code in assembly language; a syntax highlighting file for ARM assembly is available at http://tinyurl.com/psdvjen.

Advice: When working with assembly language, you should not skimp on comments. We did not use a large number of comments in this article in order to ensure that the code took up as little space as possible on the pages of the magazine (and also because we described in detail the purpose of each of the instructions). But when developing complex programs whose code seems obvious at first glance, you should always think about what it will look like after you partially forget the ARM assembly language syntax and return to development after a few months. You can forget about all the tricks and shortcuts used in the code, after which the code will look like complete gobbledygook. Based on all of the above, you should add as many comments to your code as possible, even if some of them seem too obvious at the moment!

Reverse engineering

Converting a binary file to assembly language code can also be useful in some cases. The result of this operation is usually not very high-quality code without readable label names and comments, which, nevertheless, can be useful for studying the transformations that were performed by the assembler on your code. To disassemble the myfirst binary, simply run the following command:

objdump -d myfirst

This command will disassemble the executable code section of the binary file (but not the data section, since it contains ASCII text). If you look at the code obtained as a result of disassembly, you will probably notice that the instructions in it are practically the same as the instructions in the original code. Disassemblers are used primarily when you need to study the behavior of a program that is only available in binary form, such as a virus or a simple closed-source program whose behavior you want to emulate. At the same time, you should always remember the restrictions imposed by the author of the program under study! Disassembling a binary program file and simply copying the resulting code into your own project is, of course, a bad idea; however, you can certainly use the resulting code to study how the program works.

Subroutines, loops and conditional statements

Now that we know how to design, assemble and link simple programs, let's move on to look at something more complex. The following program uses subroutines to print strings (thanks to them, we can reuse code fragments and save ourselves from having to perform the same operations of filling registers with data). This program implements a main event loop that allows the output of a string until the user enters "q". Study the code and try to understand (or guess!) the purpose of the instructions, but don’t despair if you don’t understand something, because a little later we will also look at it in great detail. Note that the @ symbols in ARM assembly language highlight comments.

.global _start

_start:
    ldr r1, =string1
    mov r2, #string1len
    bl print_string

loop:
    mov r7, #3          @ read
    mov r0, #0          @ stdin
    ldr r1, =char
    mov r2, #2          @ two characters
    swi 0

    ldr r1, =char
    ldrb r2, [r1]
    cmp r2, #113        @ ASCII code for "q"
    beq done

    ldr r1, =string2
    mov r2, #string2len
    bl print_string
    b loop

done:
    mov r7, #1
    swi 0

print_string:
    mov r7, #4
    mov r0, #1
    swi 0
    bx lr

.data
string1: .ascii "Enter q to quit!\n"
string1len = . - string1
string2: .ascii "That wasn't q...\n"
string2len = . - string2
char: .word 0

Our program begins by placing a pointer to the beginning of the string and the string's length into the appropriate registers for the write system call, and then immediately jumps to the print_string subroutine located further down in the code. This jump is made with the bl instruction, whose name stands for "branch and link": before branching, it saves the address of the instruction that follows it in the link register (lr), which makes it possible to return there later with the bx lr instruction. The print_string routine simply fills the remaining registers for the write system call in the same way as our first program, jumps into OS kernel space, and then returns to the saved address using bx lr.
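To see the bl / bx lr pair in isolation, here is a minimal sketch that is not part of the original listing. The add_one label is a hypothetical name chosen for illustration. The program passes a value in r0, lets the subroutine increment it, and uses the result as the exit status so that it can be inspected from the shell.

```asm
@ Sketch only: add_one is an illustrative name, not from the article's listing.
.global _start

_start:
    mov r0, #1          @ value handed to the subroutine
    bl  add_one         @ lr = address of the next instruction, then branch
    mov r7, #1          @ execution resumes here: exit system call
    swi 0               @ the exit status is the value left in r0

add_one:
    add r0, r0, #1      @ the subroutine's work
    bx  lr              @ branch back to the address saved in lr
```

After assembling and linking it like the first program, running it and then typing echo $? in the shell shows the exit status. Keep in mind that a subroutine which itself executes bl overwrites lr, so it would have to save the old value somewhere first.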

Returning to the calling code, we find a label called loop - the name already hints that we will come back to it in a while. But first we use another system call, read (number 3), to read the character the user types on the keyboard. So we put the value 3 in register r7 and the value 0 (the file descriptor for standard input) in register r0, since we need to read user input and not data from a file.

Next, we place in register r1 the address at which the OS kernel should store the character it reads - in our case, this is the char memory area described at the end of the data section. (We actually need room for two characters, because the code for the Enter key will be stored as well; the machine word reserved here is four bytes on 32-bit ARM, which is more than enough. When working with assembly language, it is important to always keep the possibility of overflowing memory areas in mind, because there are no high-level mechanisms ready to come to your aid!)
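One way to keep the requested length and the buffer size from drifting apart is to let the assembler compute the size, just as the listing does for string1len. The following is a sketch under that assumption. The charlen symbol and the use of .space instead of .word are illustrative choices and not part of the original program.

```asm
@ Sketch only: charlen and the .space buffer are illustrative additions.
.global _start

_start:
    mov r7, #3          @ read
    mov r0, #0          @ stdin
    ldr r1, =char       @ where the kernel should store the input
    mov r2, #charlen    @ never request more bytes than the buffer holds
    swi 0

    mov r7, #1          @ exit
    mov r0, #0          @ status 0
    swi 0

.data
char:    .space 2       @ two bytes: the character and the newline from Enter
charlen = . - char      @ size computed by the assembler (2 here)
```

If the buffer is later enlarged, the length passed to read follows automatically, so the kernel is never asked to write past the end of the reserved area.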

Returning to the main code, we see that the value 2 is placed in register r2, corresponding to the two characters we want to store, and then we jump into kernel space to perform the read operation. The user enters a character and presses the Enter key. Now we need to check what the character is: we put the address of the memory area (char in the data section) in the r1 register, and then use the ldrb instruction to load the byte from the memory area pointed to by the value in that register.

The square brackets around r1 in the ldrb instruction (ldrb r2, [r1]) indicate that the data is to be taken from the memory area whose address is in the register, and not from the register itself. Thus, register r2 now contains a single character from the char memory area in the data section, and this is exactly the character the user entered. Our next task is to compare the contents of register r2 with the character "q", which has the code 113 in the ASCII table (refer to the character table at www.asciichart.com). We use the cmp instruction to perform the comparison, and then the beq instruction, which stands for "branch if equal," to jump to the done label if the value in register r2 is 113. If this is not the case, we print our second string and then jump back to the beginning of the loop using the b instruction.
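The difference between an address held in a register and the byte stored at that address can be checked with a small standalone sketch. The letter and is_q labels are hypothetical names used only for this illustration. The program exits with the loaded byte as its status if it equals 113, and with 0 otherwise.

```asm
@ Sketch only: letter and is_q are illustrative names.
.global _start

_start:
    ldr  r1, =letter    @ r1 holds the ADDRESS of letter
    ldrb r0, [r1]       @ r0 holds the BYTE stored at that address
    cmp  r0, #113       @ sets the condition flags; r0 is left unchanged
    beq  is_q           @ taken only when the two values are equal
    mov  r0, #0         @ not "q": use exit status 0
is_q:
    mov  r7, #1         @ exit; the status is the value in r0
    swi  0

.data
letter: .ascii "q"
```

Changing the character in the data section to anything other than "q" makes the branch fall through, which is an easy way to watch cmp and beq working together.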

Finally, after the done label, we tell the OS kernel that we want to terminate the program, just like in the first program. To run this program, simply assemble and link it following the instructions given for the first program.

So, we have covered a fairly large amount of information in a very condensed form, and the best way to consolidate it is to experiment with the code above on your own. There is no better way to get to know a programming language than to modify someone else's code and observe the effect. You can now develop simple ARM assembly language programs that read user input and print output using loops, comparisons, and subroutines. If you had not encountered assembly language before today, I hope this article has made the language a little clearer to you and helped dispel the popular stereotype that it is a mystical craft reserved for a few talented developers.

Of course, the information provided in this article about using assembly language on the ARM architecture is just the tip of the iceberg. Using this programming language always involves a huge number of nuances, and if you want us to write about them in one of the following articles, just let us know! In the meantime, we recommend visiting an excellent resource with a lot of material on techniques for writing programs for Linux systems running on computers with ARM processors, which is located at http://tinyurl.com/nsgzq89. Happy programming!

Previous articles from the "Assembly School" series: