Have you ever wondered what the difference between the -fpic and -fPIC compiler command line flags is? No? Then you can go back to doing normal things. But if you have, then read on.
For a start, the GCC documentation does provide a hint (emphasis added):
    -fpic
        [...] When this flag is set, the macros '__pic__' and '__PIC__' are defined to 1.
    -fPIC
        [...] When this flag is set, the macros '__pic__' and '__PIC__' are defined to 2.
So it has something to do with size limitations of the Global Offset Table (GOT) and, apart from changing the value of two preprocessor macros, it only matters on some architectures.
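The macro half of the story is easy to check from C. Here is a minimal sketch; the helper name pic_level is made up for this post and is not part of GCC or any library. One caveat: GCC also defines __PIC__ when building with -fpie or -fPIE, so the value alone does not tell PIC from PIE.

```c
/* pic_level.c: hypothetical helper, for illustration only. */

/* Returns the value of the compiler-predefined __PIC__ macro:
 * 1 when built with -fpic, 2 when built with -fPIC,
 * and 0 when the macro is not defined at all. */
int pic_level(void)
{
#ifdef __PIC__
    return __PIC__;
#else
    return 0;
#endif
}
```

Building this file once with -fpic and once with -fPIC should make the function return 1 and 2 respectively.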
For the sake of this short piece, let's not dive into the fine details of the GOT and the inner workings of dynamic linking. Briefly, the GOT is a table of pointers that gets filled by the run-time linker to manage cross-references between dynamic shared objects (i.e. libraries).
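As a rough mental model only, the GOT can be pictured as a plain array of pointers that somebody else fills in before the code runs. The sketch below is a toy: toy_got, toy_dynamic_link and the two storage variables are invented names, and a real run-time linker does far more than this.

```c
/* Toy model of a GOT; every name here is hypothetical. */
#include <stddef.h>

/* Pretend these two objects live in some other shared library. */
static int foo_storage = 40;
static int bar_storage = 2;

/* The "GOT": one pointer-sized slot per external object. */
static int *toy_got[2] = { NULL, NULL };

/* Stand-in for the run-time linker: fill each slot with the
 * final address of the object it refers to. */
void toy_dynamic_link(void)
{
    toy_got[0] = &foo_storage;
    toy_got[1] = &bar_storage;
}

/* Position-independent access: read the pointer from the table,
 * then read the object through that pointer. */
int foobar_via_got(void)
{
    return *toy_got[0] + *toy_got[1];
}
```

The point to retain is the double indirection: the code never embeds the address of an external object, only a way to find its slot in the table.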
Then why is it that small pic has a size limitation and large PIC does not?
It boils down to the architecture-specific machine code emitted by the compiler to read pointers from the GOT. Specifically, small pic requires fewer and more optimized instructions than large PIC, at the obvious cost of the size limitations.
Here is a simple example on the most popular of affected platforms.
First consider the following simple C module, with a function foobar referencing two external objects, foo and bar:
    /* foobar.c */
    extern int foo;
    extern int bar;

    int foobar(void)
    {
        return foo + bar;
    }
Then let's compile it:
    aarch64-linux-gnu-gcc -Og -fno-pic foobar.c -c -o foobar.o
    aarch64-linux-gnu-gcc -Og -fpic foobar.c -c -o foobar-pic.o
    aarch64-linux-gnu-gcc -Og -fPIC foobar.c -c -o foobar-PIC.o
And disassemble the results:
    % aarch64-linux-gnu-objdump -dr foobar.o

    foobar.o:     file format elf64-littleaarch64

    Disassembly of section .text:

    0000000000000000 <foobar>:
       0:   90000000    adrp    x0, 0 <foo>
                0: R_AARCH64_ADR_PREL_PG_HI21   foo
       4:   b9400001    ldr     w1, [x0]
                4: R_AARCH64_LDST32_ABS_LO12_NC foo
       8:   90000000    adrp    x0, 0 <bar>
                8: R_AARCH64_ADR_PREL_PG_HI21   bar
       c:   b9400000    ldr     w0, [x0]
                c: R_AARCH64_LDST32_ABS_LO12_NC bar
      10:   0b000020    add     w0, w1, w0
      14:   d65f03c0    ret
The compiler does not know where the two objects will be located, so it emits static relocations. In normal usage, the linker would resolve and strip the relocations from the final executable.
When generating plain dumb position-dependent code, each object is referenced with a pair of ADRP and LDR instructions, as per the small memory model:
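- ADRP computes the address of the 4 KiB page containing the object, relative to the program counter, into a register (x0 here).
- LDR loads the 32-bit value found at the object's offset within that page into a register (w0 and w1).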
(AArch64 also features tiny and large memory models, but that is another topic). After both values are loaded, they are added (ADD) and the function returns (RET).
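The split between the two instructions can be written out as plain arithmetic. The following sketch models what the page-relative (HI21) and low 12-bit (LO12) relocations compute for a 4 KiB page; the function names are invented for illustration, and instruction encoding details are deliberately left out.

```c
/* Arithmetic model of an ADRP + LDR pair; names are hypothetical. */
#include <stdint.h>

#define PAGE_MASK_4K UINT64_C(0xfff)

/* Address of the 4 KiB page containing addr. */
uint64_t page_of(uint64_t addr)
{
    return addr & ~PAGE_MASK_4K;
}

/* Signed distance, in pages, from the page of the ADRP instruction
 * to the page of the target: the value the HI21 relocation encodes. */
int64_t adrp_page_delta(uint64_t pc, uint64_t target)
{
    return ((int64_t)page_of(target) - (int64_t)page_of(pc)) / 4096;
}

/* What ADRP leaves in its destination register. */
uint64_t adrp_result(uint64_t pc, int64_t page_delta)
{
    return page_of(pc) + (uint64_t)page_delta * 4096;
}

/* Offset of the target within its page: the value the LO12
 * relocation places in the LDR immediate. */
uint64_t lo12(uint64_t target)
{
    return target & PAGE_MASK_4K;
}
```

For any pc and target within range, adrp_result(pc, adrp_page_delta(pc, target)) + lo12(target) gives back target, which is exactly the address the LDR ends up reading.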
With small pic, the example is simple enough that the result looks very similar:
    % aarch64-linux-gnu-objdump -dr foobar-pic.o

    foobar-pic.o:     file format elf64-littleaarch64

    Disassembly of section .text:

    0000000000000000 <foobar>:
       0:   90000000    adrp    x0, 0 <_GLOBAL_OFFSET_TABLE_>
                0: R_AARCH64_ADR_PREL_PG_HI21   _GLOBAL_OFFSET_TABLE_
       4:   f9400001    ldr     x1, [x0]
                4: R_AARCH64_LD64_GOTPAGE_LO15  foo
       8:   f9400000    ldr     x0, [x0]
                8: R_AARCH64_LD64_GOTPAGE_LO15  bar
       c:   b9400021    ldr     w1, [x1]
      10:   b9400000    ldr     w0, [x0]
      14:   0b000020    add     w0, w1, w0
      18:   d65f03c0    ret
The compiler still emits an ADRP instruction with a PC-relative page address relocation for the high 21 bits (R_AARCH64_ADR_PREL_PG_HI21). However, there is only one such instruction where there were previously two, and it refers to a third symbol, _GLOBAL_OFFSET_TABLE_, instead of foo and bar.
All accesses to the GOT are done through the same base register. That register contains the address of the 4 KiB page where the GOT starts.
This saves one instruction per access to the GOT, after the first one. But it limits the GOT size to the range of the 64-bit load, GOT page, low 15-bit relocation type (R_AARCH64_LD64_GOTPAGE_LO15). Such a relocation can generate a byte offset between 0 and 32760, in multiples of 8. Considering that the GOT might start at any offset between 0 and 4088 (4096 minus 8 bytes) within its page, the size of the GOT cannot safely exceed 32760 minus 4088 bytes, which equals 28 KiB (i.e. 3584 entries).
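For those who prefer to see the arithmetic spelled out, here is the same calculation as a small C sketch. The constant and function names are invented for this post; the numbers are the ones quoted above.

```c
/* Worked version of the small pic GOT size limit; names are hypothetical. */

enum {
    GOT_ENTRY_BYTES = 8,     /* one 64-bit pointer per entry */
    GOT_PAGE_BYTES  = 4096,  /* page granularity of ADRP */
    LO15_MAX_OFFSET = 32760  /* largest offset the LO15 relocation can express */
};

/* Worst case: the GOT starts in the last 8-byte slot of its page. */
unsigned worst_case_got_start(void)
{
    return GOT_PAGE_BYTES - GOT_ENTRY_BYTES;
}

/* Bytes of GOT that stay reachable even in that worst case. */
unsigned safe_got_bytes(void)
{
    return LO15_MAX_OFFSET - worst_case_got_start();
}

/* The same limit expressed as a number of GOT entries. */
unsigned safe_got_entries(void)
{
    return safe_got_bytes() / GOT_ENTRY_BYTES;
}
```

safe_got_bytes() comes out at 28672, that is 28 KiB, and safe_got_entries() at 3584.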
With large PIC, the generated machine code closely resembles the non-PIC code, with one ADRP and one LDR instruction for each object, followed by a second LDR to dereference the pointer read from the GOT:
    % aarch64-linux-gnu-objdump -dr foobar-PIC.o

    foobar-PIC.o:     file format elf64-littleaarch64

    Disassembly of section .text:

    0000000000000000 <foobar>:
       0:   90000000    adrp    x0, 0 <foo>
                0: R_AARCH64_ADR_GOT_PAGE       foo
       4:   f9400000    ldr     x0, [x0]
                4: R_AARCH64_LD64_GOT_LO12_NC   foo
       8:   b9400002    ldr     w2, [x0]
       c:   90000001    adrp    x1, 0 <bar>
                c: R_AARCH64_ADR_GOT_PAGE       bar
      10:   f9400021    ldr     x1, [x1]
                10: R_AARCH64_LD64_GOT_LO12_NC  bar
      14:   b9400020    ldr     w0, [x1]
      18:   0b000040    add     w0, w2, w0
      1c:   d65f03c0    ret
The relocation types, however, are different: they are GOT-specific. As the relocated page address can now vary from one object to the other, the GOT size is no longer constrained by the range of the relocation on the LDR instruction, and the GOT can be as large as the memory model allows.
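To make the contrast concrete, here is a small sketch comparing the two addressing schemes for a given GOT entry. It is a simplified model with invented function names: it ignores the range of ADRP itself and assumes 8-byte entries and 4 KiB pages.

```c
/* Reach of small pic versus large PIC GOT accesses; names are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

/* small pic: one shared base register holds the page where the GOT starts,
 * so each entry must sit within 32760 bytes of that page. */
bool small_pic_can_reach(uint64_t got_addr, uint64_t index)
{
    uint64_t entry = got_addr + 8 * index;
    uint64_t offset = entry - (got_addr & ~UINT64_C(0xfff));
    return offset <= 32760;
}

/* large PIC: each access gets its own ADRP for the page of the entry itself,
 * so the LDR only ever needs the offset within that page (below 4096). */
uint64_t large_pic_lo12_offset(uint64_t got_addr, uint64_t index)
{
    uint64_t entry = got_addr + 8 * index;
    return entry & UINT64_C(0xfff);
}
```

With a GOT starting at the worst-case address 0x10ff8, entry 3583 is still reachable with small pic while entry 4000 is not, whereas large_pic_lo12_offset stays below 4096 for any index.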
And that is the difference between pic and PIC.