In fact, one of the great ideas in computer science is the idea that programs could be stored just as data was stored. Before that, people envisioned the hardware running a fixed program, and data being stored in memory.
Most assembly languages give some minimal support to placing data in memory. A program is usually divided into a data segment and a text segment. The text segment contains the program. The data segment essentially contains global data.
This is how a typical MIPS program looks:
.data and .text are both directives to the assembler.
.data tells the assembler that the upcoming section is
considered data. .text tells the assembler that the upcoming
section is considered assembly language instructions. In general, you
place the data segment first and the text segment second, though it's
not strictly necessary to do so.
Notice that shortly after the .text, we have another
assembler directive. .global tells the assembler that the
label following it (in this case, "main") is accessible outside the
file. (Some assemblers, including SPIM and MARS, spell this directive
.globl.) This is useful when you want to link several files together:
you indicate to the assembler which labels can be accessed
outside the file, and which ones are private to the file.
We've spent a great deal of time talking about the text segment,
but not that much about the data segment, so let's look at it now.
The data segment consists of declarations. "Declaration"
isn't really an official MIPS term, but I use it because these lines
resemble declarations in a language like C.
A single declaration consists of a label, a type directive, and
possibly the data. The type directives are:

.ascii str
This stores str in memory, but without a null terminator.

.asciiz str
This stores str in memory, but with a null terminator. The
"z" refers to zero, which is the ASCII code for the null character.
This is how C-style strings are stored.

.byte b1, ..., bn
Store n bytes contiguously in memory (you get to pick
n). I'll assume the values b1,...,bn can be written
either in base 10 or in hex. I'll also assume commas are needed to
separate the values. Finally, I assume that the values can be written
on more than one line.

.half h1, ..., hn
Store n 16-bit halfwords contiguously in memory (you get to
pick n). The same assumptions apply to the values h1,...,hn:
base 10 or hex, separated by commas, possibly on more than one line.
Finally, I assume the halfwords are halfword-aligned in memory, i.e.,
the initial byte of each is stored at an address divisible by 2.

.word w1, ..., wn
Store n 32-bit words contiguously in memory (you get to pick
n). The same assumptions apply to the values w1,...,wn:
base 10 or hex, separated by commas, possibly on more than one line.
Finally, I assume the words are word-aligned in memory, i.e., the
initial byte of each is stored at an address divisible by 4.

.space numBytes
Reserves numBytes bytes of space in memory, without giving them values.
As you can see, the choice of types is quite limited: non-null
terminated strings, null-terminated strings, bytes, halfwords, words,
and bytes without values.
You can have one or more declarations in the .data segment.
The assembler tries to store the data in consecutive memory locations,
and tries to observe halfword and word alignment, if applicable.
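To make the layout concrete, here is a small sketch of a data segment that uses each type once. The label names are made up for illustration, and the offsets in the comments assume that the data segment starts at a word-aligned address and that the assembler pads automatically for .half and .word (SPIM and MARS behave this way, but your assembler may differ).

```mips
        .data
msg1:   .ascii  "abc"         # offsets 0-2:   3 bytes, no terminator
msg2:   .asciiz "de"          # offsets 3-5:   'd', 'e', then a zero byte
flag:   .byte   1             # offset  6:     1 byte
                              # offset  7:     padding (assumed) so the next
                              #                halfword starts on an even offset
small:  .half   300, -2       # offsets 8-11:  2 halfwords
val:    .word   10, -14, 30   # offsets 12-23: 3 words, already word-aligned
buf:    .space  16            # offsets 24-39: 16 reserved bytes, no values given
```

Notice that only one byte of padding is needed here. Had flag been declared after val instead, no padding would be needed at all.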
You can use the pseudoinstruction la to load the address.
la stands for "load address". This pseudoinstruction takes
a label as its only operand.
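la is not a real instruction. The assembler expands it into real
instructions, usually lui combined with ori (or possibly a single ori
or similar instruction when the address fits in 16 bits). The real
instructions for la should be identical (or nearly so) to those for
li (load immediate).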
Real load instructions (e.g. lw, lh, lb) copy data from
memory to registers. These load pseudoinstructions (and the
real lui instruction) copy immediate values into registers.
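As a rough sketch of what the assembler might do with these pseudoinstructions, the code below writes la and li once as pseudoinstructions and once by hand. The address 0x10010000 for arr is purely an assumption for illustration (it holds only if arr is the first item in the data segment and the segment starts there); the real address is whatever the assembler assigns. Many assemblers also route the upper half through the reserved register $at rather than the destination register. I use .globl and the exit system call (code 10) as in SPIM and MARS.

```mips
        .data
arr:    .space 100

        .text
        .globl main
main:   la   $t0, arr              # pseudoinstruction: $t0 = address of arr

        # Hand-written equivalent, ASSUMING arr is at 0x10010000:
        lui  $t1, 0x1001           # $t1 = 0x10010000 (upper 16 bits)
        ori  $t1, $t1, 0x0000      # OR in the lower 16 bits of the address

        li   $t2, 0x12345678       # pseudoinstruction: load a 32-bit constant
        lui  $t3, 0x1234           # hand-written equivalent: upper half
        ori  $t3, $t3, 0x5678      # then lower half; $t3 = 0x12345678

        li   $v0, 10               # exit (SPIM/MARS system call 10)
        syscall
```

None of these instructions touch memory. They only build a 32-bit value in a register, which is why la and li can share the same expansion.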
Here's how we'd use it:
Using the data segment allows you to initialize arrays with
data values. Unfortunately, the data is global, and this may not be how
a C compiler allocates space for arrays, which can be allocated locally
on the stack (at least, for arrays whose size is known at compile time).
.data # Tells assembler we're in the data segment
val: .word 10, -14, 30 # Three words placed in memory
.text # Tells assembler we're in the text segment
.global main # Tells assembler main is accessible outside file
main: addi $sp, $sp, -8
In assembly language code, there are instructions, data, and
assembler directives. Instructions and data should be
self-explanatory. Assembler directives provide information to the
assembler. Unfortunately, the directives vary from one ISA to the
next, and sometimes from one assembler to the next.
Even though a labeled declaration looks like a variable declaration,
it really isn't one. A label is merely an address in memory. In
particular, the assembler doesn't check whether you use the label in a
way that matches its type.
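Here is a small sketch of what "no type checking" means in practice. The label val is declared with .word, yet the assembler accepts byte and halfword loads from it without complaint. I'm assuming SPIM/MARS conventions here (.globl and system call 10 to exit).

```mips
        .data
val:    .word 10, -14, 30

        .text
        .globl main
main:   la   $t0, val          # $t0 = address of the first word
        lw   $t1, 0($t0)       # reads the whole first word: $t1 = 10
        lb   $t2, 0($t0)       # also accepted: reads one byte of that word
        lh   $t3, 4($t0)       # also accepted: reads half of the second word

        li   $v0, 10           # exit (SPIM/MARS system call 10)
        syscall
```

Which byte or halfword the last two loads pick up depends on the byte order of the machine, so it's up to you to use each label consistently.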
Data Types
What kind of types are permitted?
.data # Tells assembler we're in the data segment
val: .word 10, -14, 30
str: .ascii "Hello, world"
num: .byte 0x01, 0x03
arr: .space 100
We have four declarations above. Each starts with a label, which
consists of the identifier and a colon, then the "type", then possibly
the data.
Using la
Suppose you want to access an address in memory corresponding to
some label in the data segment. For example, you may have "declared"
an array in memory called arr.
.data
arr: .space 100
.text
.global main
main: la $t0, arr # Place address of label, arr, in $t0
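Once the address is in a register, the real load instructions can use it as a base. The sketch below sums the three words of val from the earlier example. It assumes SPIM/MARS conventions (.globl and system call 10 to exit).

```mips
        .data
val:    .word 10, -14, 30

        .text
        .globl main
main:   la   $t0, val          # $t0 = address of val
        lw   $t1, 0($t0)       # $t1 = 10   (first word)
        lw   $t2, 4($t0)       # $t2 = -14  (second word, 4 bytes later)
        lw   $t3, 8($t0)       # $t3 = 30   (third word)
        add  $t1, $t1, $t2     # $t1 = 10 + (-14) = -4
        add  $t1, $t1, $t3     # $t1 = -4 + 30 = 26

        li   $v0, 10           # exit (SPIM/MARS system call 10)
        syscall
```

The offsets 0, 4, and 8 are in bytes, since each word occupies four bytes.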
Using the Stack
Another way to declare an array is to use the stack. For example,
if you want to declare a 100-element int array, subtract 400 from the
stack pointer (100 elements times 4 bytes each), and now you have an
array. The main problem with doing this is that the array has to be
initialized using instructions.
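A minimal sketch of this approach follows. It reserves the 400 bytes, sets every element to zero with a loop, and then gives the space back. The label init and the choice of registers are mine, and I'm assuming SPIM/MARS conventions (.globl, the blt pseudoinstruction, and system call 10 to exit).

```mips
        .text
        .globl main
main:   addi $sp, $sp, -400    # reserve 100 ints (100 * 4 bytes)
        move $t0, $sp          # $t0 = address of element 0
        li   $t1, 0            # i = 0
        li   $t2, 100          # number of elements

init:   sw   $zero, 0($t0)     # array[i] = 0
        addi $t0, $t0, 4       # advance to the next element
        addi $t1, $t1, 1       # i = i + 1
        blt  $t1, $t2, init    # repeat while i < 100

        addi $sp, $sp, 400     # release the array before leaving
        li   $v0, 10           # exit (SPIM/MARS system call 10)
        syscall
```

Compare this with the data segment version, where a single .word or .space line does the job and no loop is needed.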
Summary
For writing simple assembly language programs, it's convenient
to use the data segment to declare data. You can use la
to access the address from the data segment. The assembler does
the work of computing the actual address for you, so you don't
have to keep track of it yourself.