C++ got struct from C. The initial state of affairs. Memory, padding and first attempts at copy.

This article is part of a series on the history of regular data type in C++.

Struct

struct in C (and C++) is a language construct that defines a type by grouping sequentially a finite number elements of other types, these are known as member variables. It is different from arrays which group sequentially a fine number of the same type. It’s also different from the less often used union which overlaps other types.

Memory and padding

For example the type below is composed out of two ints and two chars:

1
2
3
4
5
6
7
struct sample
{
  int w;
  char x;
  int y;
  char z;
};

There are choices that are platform dependent, but here is what’s happening on a typical one nowadays: The ints take 4 bytes each, their address is usually aligned to be a multiple of 4 and are interpreted as a two’s complement signed integer. The chars take a byte each, they don’t have any alignment requirements.

Because of the int members, the addresses for sample variables would be aligned to a multiple of 4. That’s where int w is located taking 4 bytes, followed by char x taking 1 bytes. Padding is added at in the middle to align the int y address as a multiple of 4. Padding is added at the end so that if you have an array of sample, the address of int w is also aligned as a multiple of 4. The size of sample is 16 bytes.

1
2
3
4
5
6
7
8
9
struct sample
{
  int w; // 4 bytes
  char x; // 1 byte
  // 3 bytes of padding here
  int y; // 4 bytes
  char z; // 1 byte
  // 3 bytes of padding at the end
}; // takes overall 16 bytes

Padding can be reduced by sorting members by size:

1
2
3
4
5
6
7
8
struct better_sample
{
  int w; // 4 bytes
  int y; // 4 bytes
  char x; // 1 byte
  char z; // 1 byte
  // 2 bytes of padding at the end
}; // takes overall 12 bytes

Virtual machine

What such a struct type really does is it provides an interpretation of memory.

This is not a small thing. Many languages create abstraction layers around data types. For example in Lisp derived languages (Javascript included), a data type is defined indirectly via a dictionary that is required to implement lambdas properly (to capture creation context).

Although theoretically we imagine that the C/C++ code we write runs on some imaginary machine, rather than an explicit virtual machine (as in Java), or an implicit machinery like in Lisp/Javascript/Python etc, in C/C++ this imaginary virtual machine is quite similar to the actual physical machine: it has data and program, the data memory is split into stack, heap and static initialized one, so little to no translation is required to make the code run efficiently.