About a dwarf

Dwarf started as a joke about elf. This is not to say that dwarf is a joke. Dwarf is one of the most used formats for storing debugging information, and its origin is linked to the Executable and Linkable Format (elf).

The official site for Dwarf contains a nice summary of dwarf. However, I wanted an even shorter description to refresh my memory if I ever need to.

Let's discuss this hypothetical program:

brush: c
int main(int argc, char **argv) {
   return argc;
}

Dwarf is organized as a tree hierarchy of Debug Information Entries (DIEs). Each DIE has a tag (DW_TAG_xxx) and zero or more attributes (DW_AT_xxx), which give the information relevant to the tag.

Flattening the tree

The tree is flattened depth first into a sequence of DIEs. A DIE that has children is followed immediately by its first child, and each list of siblings is terminated by a null entry. To let a consumer skip over a DIE's children quickly, the producer might add a DW_AT_sibling attribute that points to the next sibling.
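To make the flattening rule concrete, here is a minimal sketch in C that recovers the depth of each entry from a flattened stream. The struct flat_die and the function die_depths are hypothetical names for illustration only; in real DWARF the "has children" flag comes from the abbreviation table and the entries are byte-encoded rather than an array of structs.

```c
#include <stddef.h>

/* Hypothetical, already-decoded entry of a flattened DIE stream. */
struct flat_die {
    const char *tag;   /* NULL stands for the null entry that ends a sibling list */
    int has_children;  /* in real DWARF this flag lives in the abbreviation */
};

/* Store the tree depth of every entry in depth_out (null entries get -1).
   Returns the depth after the last entry, which is 0 for a well-formed stream. */
int die_depths(const struct flat_die *dies, size_t n, int *depth_out) {
    int depth = 0;
    for (size_t i = 0; i < n; i++) {
        if (dies[i].tag == NULL) {
            /* Null entry: the current sibling list is finished, go up one level. */
            depth_out[i] = -1;
            depth--;
        } else {
            depth_out[i] = depth;
            /* The entry after a DIE with children is its first child. */
            if (dies[i].has_children)
                depth++;
        }
    }
    return depth;
}
```

For the example program, a stream such as compile unit, subprogram, two formal parameters, null entry, base type, null entry would give the depths 0, 1, 2, 2, -1, 1, -1.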

Main Tags

  • Compilation unit: This is the typical C file. E.g. main.c. Typical information: directory, file name, highest and lowest PC, producer (compiler used).
  • Subprogram: This is a function. E.g. main, printf. Typical information: name, file, start line, return type, highest and lowest PC, external.
  • Formal parameter: describes the name and type of a function parameter. It also indicates where the parameter is stored, e.g. in a register or at a memory address.

Type information

Information about the types used in the program. It shows that dwarf had C origins.

  • DW_TAG_base_type: The basic data types provided by the compiler. E.g. int, short. Typical information: name, size, encoding. Optional: bit size, bit offset. Other DIEs refer to their type through the DW_AT_type attribute.
  • Pointer type: This is a pointer to another type (base or other)
  • Const type: Used to add the const qualifier of C.
  • DW_TAG_structure_type: structure
    • DW_TAG_member
  • DW_TAG_union_type : union
  • DW_TAG_enumeration_type: enumeration
  • DW_TAG_typedef: typedef
  • DW_TAG_array_type: array
    • DW_TAG_subrange_type: defines the range for an array
  • DW_TAG_inheritance: defines class inheritance

Variables are special:

  • Variable: carries type and name information. Just as important is where the data is stored, which is given by the location attribute (DW_AT_location). It links a set of PC values to a location. A location can be a register or some memory location obtained by some computation, e.g. a position in the stack frame.

Dwarf expressions

Several places need to compute an address or some other value. This is done with a stack-based machine that can evaluate very complex expressions, built from DW_OP_xxx operations. There are a lot of DW_OP_xxx operations, covering even control flow (branches and skips).
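A tiny evaluator gives a feel for how the stack machine works. This is only a sketch: it handles a handful of operations (the opcode values are the ones assigned by the DWARF standard), while eval_expr and EXPR_STACK_MAX are hypothetical names of mine. A real consumer also needs registers, memory reads, the frame base and many more opcodes.

```c
#include <stddef.h>
#include <stdint.h>

/* A few opcodes, with the values assigned by the DWARF standard. */
enum {
    DW_OP_const1u = 0x08,  /* push the following 1-byte unsigned constant */
    DW_OP_dup     = 0x12,  /* duplicate the top of the stack */
    DW_OP_minus   = 0x1c,  /* second - top */
    DW_OP_mul     = 0x1e,  /* second * top */
    DW_OP_plus    = 0x22,  /* second + top */
    DW_OP_lit0    = 0x30,  /* DW_OP_lit0 .. DW_OP_lit31 push 0 .. 31 */
    DW_OP_lit31   = 0x4f
};

#define EXPR_STACK_MAX 16

/* Evaluate a byte-encoded expression. Returns 0 and stores the top of the
   stack in *result on success, or -1 on malformed or unsupported input. */
int eval_expr(const uint8_t *code, size_t len, uint64_t *result) {
    uint64_t stack[EXPR_STACK_MAX];
    size_t sp = 0;  /* number of values currently on the stack */

    for (size_t i = 0; i < len; i++) {
        uint8_t op = code[i];
        if (op >= DW_OP_lit0 && op <= DW_OP_lit31) {
            if (sp == EXPR_STACK_MAX) return -1;
            stack[sp++] = (uint64_t)(op - DW_OP_lit0);
        } else if (op == DW_OP_const1u) {
            if (i + 1 >= len || sp == EXPR_STACK_MAX) return -1;
            stack[sp++] = code[++i];  /* operand byte follows the opcode */
        } else if (op == DW_OP_dup) {
            if (sp == 0 || sp == EXPR_STACK_MAX) return -1;
            stack[sp] = stack[sp - 1];
            sp++;
        } else if (op == DW_OP_plus || op == DW_OP_minus || op == DW_OP_mul) {
            if (sp < 2) return -1;
            uint64_t top = stack[--sp];
            uint64_t second = stack[--sp];
            stack[sp++] = op == DW_OP_plus  ? second + top
                        : op == DW_OP_minus ? second - top
                        : second * top;
        } else {
            return -1;  /* opcode not handled by this sketch */
        }
    }
    if (sp == 0) return -1;
    *result = stack[sp - 1];
    return 0;
}
```

For instance, the bytes 0x08 0x64 0x38 0x1c (const1u 100, lit8, minus) evaluate to 92.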

Location information

Location information provides a means to calculate the address of a variable in memory or in a register. Due to the nature of optimizations, the location a variable is stored at can change for different parts of the program.

If the variable has a single location for its whole lifetime, this information is placed inline in the DIE. If the location changes over the variable's lifetime, the information is moved to the .debug_loc section. This section holds lists of single locations, each delimited by a start and an end address.
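Looking up a variable in such a list boils down to finding the entry whose address range contains the current PC. The sketch below assumes the list has already been decoded into an array; struct loc_entry and find_location are hypothetical names, and the string is only a readable stand-in for the real location expression.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded location list entry. */
struct loc_entry {
    uint64_t start;     /* first PC covered by this entry */
    uint64_t end;       /* first PC past the covered range */
    const char *where;  /* readable stand-in for the location expression */
};

/* Return the location valid at pc, or NULL if no entry covers it
   (for example because the variable was optimized out there). */
const char *find_location(const struct loc_entry *list, size_t n, uint64_t pc) {
    for (size_t i = 0; i < n; i++) {
        if (pc >= list[i].start && pc < list[i].end)
            return list[i].where;
    }
    return NULL;
}
```

The end address is treated as exclusive, so two adjacent entries can share a boundary without overlapping.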

Data encoding

Data encoding is taken seriously in Dwarf because debugging information can easily exceed the size of the program even if carefully encoded. Suffice it to say that it is not meant to be read by the naked eye, and that variable-length integers and complex state machines are used to encode the information.
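One of the simpler encodings is LEB128, the variable-length integer format Dwarf uses for many numbers: each byte carries 7 bits of the value, and the high bit says whether more bytes follow. A minimal decoder for the unsigned variant could look like this (uleb128_decode is a name I made up, not a library function):

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value from buf.
   Returns the number of bytes consumed, or 0 if the input is truncated
   or has more bytes than this sketch accepts for a 64-bit value. */
size_t uleb128_decode(const uint8_t *buf, size_t len, uint64_t *value) {
    uint64_t result = 0;
    unsigned shift = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t byte = buf[i];
        if (shift >= 64)
            return 0;                       /* too many bytes */
        result |= (uint64_t)(byte & 0x7f) << shift;  /* low 7 bits are payload */
        if ((byte & 0x80) == 0) {           /* high bit clear: last byte */
            *value = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;                               /* ran out of input */
}
```

For example, the three bytes 0xe5 0x8e 0x26 decode to 624485.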

Line information

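Line information maps instruction addresses to lines of the original source code. Special compression techniques are used for this: instead of storing the table itself, Dwarf stores a compact program for a small state machine that regenerates it.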
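Much of the compression comes from "special opcodes", single bytes that advance both the address and the line number at once. The sketch below applies one such opcode using the formula from the Dwarf standard. The struct and function names are hypothetical, the header values come from the producer, and I ignore details such as VLIW operation indexes and the fact that a special opcode also emits a row of the table.

```c
#include <stdint.h>

/* Parameters taken from the header of a line number program. */
struct line_header {
    int8_t  line_base;        /* smallest line advance a special opcode can encode */
    uint8_t line_range;       /* number of distinct line advances */
    uint8_t opcode_base;      /* first special opcode */
    uint8_t min_inst_length;  /* size of the smallest instruction, in bytes */
};

/* The two state machine registers this sketch cares about. */
struct line_row {
    uint64_t address;
    int64_t  line;
};

/* Apply one special opcode (opcode >= opcode_base) to the state. */
void apply_special_opcode(const struct line_header *h, uint8_t opcode,
                          struct line_row *row) {
    uint8_t adjusted = (uint8_t)(opcode - h->opcode_base);
    row->address += (uint64_t)(adjusted / h->line_range) * h->min_inst_length;
    row->line    += h->line_base + (adjusted % h->line_range);
}
```

Assuming a header with line_base -5, line_range 14, opcode_base 13 and min_inst_length 1, the single byte 0x4b advances the address by 4 and the line by 1.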

Elf sections

Dwarf data is split into different sections. When stored in an elf file, the following sections are created:

  • .debug_info: Contains all the main DIE blocks.
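  • .debug_types: Contains type DIE blocks that the producer has placed in separate type units (introduced in Dwarf 4) so that identical types can be shared. Other type DIEs stay in .debug_info.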
  • .debug_line: Contains the mapping between line numbers and PC addresses
  • .debug_ranges: Contains address ranges used in DIE blocks (DW_AT_ranges). It describes DIEs whose code is not contiguous, such as a compilation unit or lexical block split into several pieces.
  • .debug_loc: Contains location information used to describe the location variables are stored at (DW_AT_location).
  • .debug_str: Contains strings referenced by the DIE blocks in the .debug_info section.
  • .debug_abbrev: Abbreviations used in the .debug_info section.
  • .debug_aranges: Maps address ranges to the compilation unit that contains them, so a debugger can quickly find the right unit for a given PC. It is an index into .debug_info and is unrelated to the .debug_loc section.
  • .debug_frame: Information about the call frame. It is used to unwind the stack.
  • .debug_macinfo: Information about macros.
  • .debug_pubnames: Information about global functions and variables.
  • .debug_pubtypes: Information about global types.

Cover picture by mustamirri.

https://www.ibm.com/developerworks/aix/library/au-dwarf-debug-format/

Swift overflow operators

No arithmetic overflow by default! What do I mean by this? Well, in the majority of languages I have worked with, when you add two integers and the result is above the maximum integer, the result overflows and wraps around, and you end up with a number of the opposite sign.

Swift takes a different stance on this. In Swift, the standard arithmetic operators are checked for overflow, and if overflow takes place, bad things happen. Well, the bad things are not really bad per se, since they help you find bugs. Let's see some examples.

brush: swift
var a : Int = Int.max
a = a + 1

When we run the code above in a playground, the following error is triggered:

Playground execution aborted: Execution was interrupted, reason: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0).

So, there is some sort of runtime checking that the value does not overflow. How is this done by the compiler? The code when compiled to x86 gives something like this:

0x108b11912 <+98>:  incq   %rax
0x108b11915 <+101>: seto   %dl
0x108b11918 <+104>: movq   %rax, -0x38(%rbp)
0x108b1191c <+108>: movb   %dl, -0x39(%rbp)
0x108b1191f <+111>: jo     0x108b11942
....
0x108b11942 <+146>: ud2

As we can see, the compiler increments the register (incq), then saves the overflow flag into a byte register (seto) and finally jumps if there was an overflow (jo). The destination is an undefined instruction (ud2) that triggers the exception. This is more expensive than a simple addition. There is no free lunch!

If we try similar code in the compiler, this time with Int.min - 1, we get a more interesting error:

error: arithmetic operation '-9223372036854775808 - 1' (on type 'Int') results in an overflow
var a : Int = Int.min - 1
              ~~~~~~~ ^ ~

As expected, the compiler performed constant propagation and detected the error. Apparently the playground has fewer checks than the compiler.

In most cases, the overhead of checking for overflow will be minimal and compensated by the added safety. However, there will be some cases where performance is needed, or where wrapping is the intended behaviour. Swift solves this with special operators: the overflow operators. These are the standard +, - and * preceded by an ampersand: &+, &- and &*.

The code above with overflow operators looks like this:

brush: swift
var a : Int = Int.max
a = a &+ 1

And the generated assembly is:

0x10214f929 <+105>: incq   %rax
0x10214f92c <+108>: seto   %dl
...
0x10214f944 <+132>: movb   %dl, -0x39(%rbp)

For some reason, there is still some overflow detection even though it is not directly used in the code. Perhaps there is some status structure used to record that some computation overflowed? To be examined later...
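As a point of comparison outside Swift, the difference between + and &+ can be mimicked in C. This is only an analogy and says nothing about how the Swift compiler is implemented: checked_add and wrapping_add are names I made up, and __builtin_add_overflow is a GCC and Clang builtin, so the sketch assumes one of those compilers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Rough analogue of Swift's `+`: stop the program if the sum does not fit. */
int64_t checked_add(int64_t a, int64_t b) {
    int64_t sum;
    if (__builtin_add_overflow(a, b, &sum))  /* true when the result overflowed */
        abort();
    return sum;
}

/* Rough analogue of Swift's `&+`: keep the wrapped result. */
int64_t wrapping_add(int64_t a, int64_t b) {
    int64_t sum;
    /* The builtin stores the result truncated to 64 bits; the flag is ignored. */
    (void)__builtin_add_overflow(a, b, &sum);
    return sum;
}
```

With these, wrapping_add(INT64_MAX, 1) gives INT64_MIN, while checked_add(INT64_MAX, 1) aborts.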
