Tale of OpenBSD secure memory allocator internals - malloc(3)
Hi there,
It’s been a long time since I have written anything, so, continuing with the journey of reading and exloring the OpenBSD source codes with my friend debugger. This time it is about OpenBSD malloc(3), secure memory allocator
I will try to keep it as n part series due to length of the blog post and this series will be mostly focussed on user-space code of malloc(3) and friends
First of all, I would like to thanks Otto Moerbeek, Bryan Steele and Fabien Romano for helping me to understand the malloc(3) internals and resolving my queries/doubts.
Lets start…
Consider the following code-snippet to start our memory allocator journey on OpenBSD 6.6-stable malloc(3)
|
|
Compile libc with debug symbols and turn off the optimization by compiling it with option “-O0 -g”.
cd $openbsd_src_directory
cd lib/libc
CFLAGS="-g -O0" CXXFLAGS="-g -O0" make obj
CFLAGS="-g -O0" CXXFLAGS="-g -O0" make
Used debugger for understanding the code internals instead of printf style debugging.
For printf style debugging, one can use dprintf(3) or write(2) calls to print anything but keep in mind that after installation of libc with printf it will print a lot of information for every executable, so, it is not a good idea to install it.
Suggestion: one can either compile libc with or without printf statements and use LD_PRELOAD to load it and use debugger. With printf debugging one use the same LD_PRELOAD for the code snippet.
From the code snippet jumped to malloc(3) defination from libc present in the file /usr/src/lib/libc/stdlib/malloc.c:1278
1277 void *
1278 malloc(size_t size)
1279 {
1280 void *r;
1281 struct dir_info *d;
1282 int saved_errno = errno;
1283
1284 PROLOGUE(getpool(), "malloc")
1285 r = omalloc(d, size, 0, CALLER);
1286 EPILOGUE()
1287 return r;
1288 }
As explained by the developer Otto@ in the tweet on twitter that struct dir_info contains all the meta information malloc needs to keep track of what page regions have been allocated, which pages regions are in the free-cache and for pages of chunks which chunks are free and which are allocated. (UPDATE: tweet is no more available)
So, we can see that after some basic initialization, there are some macros, PROLOGUE and EPILOGUE
prologue: english meaning of prologue refers to introductory or an opening to a story which sets the context and in computer terms, it referes the same meaning which means it set-up the necessary things to start execution.
epilogue: in english, it means conclusion or ending part and in computer terms, it refers clearing up things which set-up by prologue.
defination of PROLGOUE macro:
1256 #define PROLOGUE(p, fn) \
1257 d = (p); \
1258 if (d == NULL) { \
1259 _malloc_init(0); \
1260 d = (p); \
1261 } \
1262 _MALLOC_LOCK(d->mutex); \
1263 d->func = fn; \
1264 if (d->active++) { \
1265 malloc_recurse(d); \
1266 return NULL; \
1267
} \
From the above malloc(3) defination, we can see that the first argument of PROLOGUE is function getpool() and second is the function string, that is, “malloc” and from the PROLOGUE macro expansion, we can see that it first calls getpool() and the defination of getpool() as mentioned
270 static inline struct dir_info *
271 getpool(void)
272 {
273 if (!mopts.malloc_mt)
274 return mopts.malloc_pool[1];
275 else /* first one reserved for special pool */
276 return mopts.malloc_pool[1 + TIB_GET()->tib_tid %
277 (mopts.malloc_mutexes - 1)];
278 }
from the above code defination, it first checks whether it has multi-threads or single-threads. And, in our case it is single threaded executable, so after if it returns mopts.malloc_pool[1] which is NULL.
then as per the PROLOGUE defination, it checks whether d is NULL or not. In our case it is NULL as at first it will always be NULL and after that it calls _malloc_init(0) but when d is not NULL then it assigns fn to d->func and after that increment the d->active within if cond then calls malloc_recurse(d) and returns NULL but for our situation it is not the flow.
Defination of _malloc_init(0)
1207 void
1208 _malloc_init(int from_rthreads)
1209 {
1210 u_int i, nmutexes;
1211 struct dir_info *d;
1212
1213 _MALLOC_LOCK(1);
1214 if (!from_rthreads && mopts.malloc_pool[1]) {
1215 _MALLOC_UNLOCK(1);
1216 return;
1217 }
1218 if (!mopts.malloc_canary)
1219 omalloc_init();
1220
1221 nmutexes = from_rthreads ? mopts.malloc_mutexes : 2;
1222 if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
1223 mprotect(&malloc_readonly, sizeof(malloc_readonly),
1224 PROT_READ | PROT_WRITE);
1225 for (i = 0; i < nmutexes; i++) {
1226 if (mopts.malloc_pool[i])
1227 continue;
1228 if (i == 0) {
1229 omalloc_poolinit(&d, MAP_CONCEAL);
1230 d->malloc_junk = 2;
1231 d->malloc_cache = 0;
1232 } else {
1233 omalloc_poolinit(&d, 0);
1234 d->malloc_junk = mopts.def_malloc_junk;
1235 d->malloc_cache = mopts.def_malloc_cache;
1236 }
1237 d->mutex = i;
1238 mopts.malloc_pool[i] = d;
1239 }
1240
1241 if (from_rthreads)
1242 mopts.malloc_mt = 1;
1243 else
1244 mopts.internal_funcs = 1;
1245
1246 /*
1247 * Options have been set and will never be reset.
1248 * Prevent further tampering with them.
1249 */
1250 if (((uintptr_t)&malloc_readonly & MALLOC_PAGEMASK) == 0)
1251 mprotect(&malloc_readonly, sizeof(malloc_readonly), PROT_READ);
1252 _MALLOC_UNLOCK(1);
1253 }
from the above code-snippet, we can see that first it does some MALLOC_LOCKing and checks for from_rthreads and mopts.malloc_pool[1] as a result it is not true due to mopts.malloc_pool[1] which is NULL. Then, after that it checks for mopts.malloc_canary and for the first time it will be 0 due to the structure initialized to zero (0) and then it calls function omalloc_init() and the defination as given below:
405 static void
406 omalloc_init(void)
407 {
408 char *p, *q, b[16];
409 int i, j, mib[2];
410 size_t sb;
411
412 /*
413 * Default options
414 */
415 mopts.malloc_mutexes = 8;
416 mopts.def_malloc_junk = 1;
417 mopts.def_malloc_cache = MALLOC_DEFAULT_CACHE;
418
419 for (i = 0; i < 3; i++) {
420 switch (i) {
421 case 0:
422 mib[0] = CTL_VM;
423 mib[1] = VM_MALLOC_CONF;
424 sb = sizeof(b);
425 j = sysctl(mib, 2, b, &sb, NULL, 0);
426 if (j != 0)
427 continue;
428 p = b;
429 break;
430 case 1:
431 if (issetugid() == 0)
432 p = getenv("MALLOC_OPTIONS");
433 else
434 continue;
435 break;
436 case 2:
437 p = malloc_options;
438 break;
439 default:
440 p = NULL;
441 }
442
443 for (; p != NULL && *p != '\0'; p++) {
444 switch (*p) {
445 case 'S':
446 for (q = "CFGJ"; *q != '\0'; q++)
447 omalloc_parseopt(*q);
448 mopts.def_malloc_cache = 0;
449 break;
450 case 's':
451 for (q = "cfgj"; *q != '\0'; q++)
452 omalloc_parseopt(*q);
453 mopts.def_malloc_cache = MALLOC_DEFAULT_CACHE;
454 break;
455 default:
456 omalloc_parseopt(*p);
457 break;
458 }
459 }
460 }
461
462 #ifdef MALLOC_STATS
463 if (mopts.malloc_stats && (atexit(malloc_exit) == -1)) {
464 dprintf(STDERR_FILENO, "malloc() warning: atexit(2) failed."
465 " Will not be able to dump stats on exit\n");
466 }
467 #endif /* MALLOC_STATS */
468
469 while ((mopts.malloc_canary = arc4random()) == 0)
470 ;
471 }
OpenBSD malloc(3) has lots of security features and they are configurable using three options:
- systcl(8)
- environment variable MALLOC_OPTIONS
- compile-time option malloc_options
Suppose, for better security we would like to use the canaries, so that can be enabled by setting the option using systcl(8), for example: systcl vm.malloc_conf=C
omalloc_init()
code workings:
- Initializes some variables of malloc_readonly structure to default values like mopts.malloc_mutexes is the default number of mutexes, mopts.def_malloc_junk is the default number of junk filling and mopts.def_malloc_cache is the default number of free page cache
- Then, it checks the option which we have used to configure malloc and we have used sysctl(8). So, for sysctl it goes to case 0: and get the value to char b[16] and then it assigns to pointer to character variable p then after looping for other two it goes for extracting the value that is there in p
- further, it checks for the malloc option S which enables all the security auditing features of OpenBSD malloc(3)
- If it is there then it calls omalloc_parseopt(*q); function then after that it sets mopts.def_malloc_cache to 0 or to MALLOC_DEFAULT_CACHE, depends on whether it is S or s
- If it is not there, then it simply calls the function omalloc_parseopt(*p); omalloc_parseopt(char opt) is responsible for extracting the character and in our case it is C for malloc canaries, after parsing it goes to case ‘C’ and sets mopts.chunk_canaries to 1, it does the same for other options and set or enables (initializes) their variables.
Defination of omalloc_parseopt(char opt) as mentioned:
322 static void
323 omalloc_parseopt(char opt)
324 {
325 switch (opt) {
326 case '+':
327 mopts.malloc_mutexes <<= 1;
328 if (mopts.malloc_mutexes > _MALLOC_MUTEXES)
329 mopts.malloc_mutexes = _MALLOC_MUTEXES;
330 break;
331 case '-':
332 mopts.malloc_mutexes >>= 1;
333 if (mopts.malloc_mutexes < 2)
334 mopts.malloc_mutexes = 2;
335 break;
336 case '>':
337 mopts.def_malloc_cache <<= 1;
338 if (mopts.def_malloc_cache > MALLOC_MAXCACHE)
339 mopts.def_malloc_cache = MALLOC_MAXCACHE;
340 break;
341 case '<':
342 mopts.def_malloc_cache >>= 1;
343 break;
344 case 'c':
345 mopts.chunk_canaries = 0;
346 break;
347 case 'C':
348 mopts.chunk_canaries = 1;
349 break;
350 #ifdef MALLOC_STATS
351 case 'd':
352 mopts.malloc_stats = 0;
353 break;
354 case 'D':
355 mopts.malloc_stats = 1;
356 break;
357 #endif /* MALLOC_STATS */
358 case 'f':
359 mopts.malloc_freecheck = 0;
360 mopts.malloc_freeunmap = 0;
361 break;
362 case 'F':
363 mopts.malloc_freecheck = 1;
364 mopts.malloc_freeunmap = 1;
365 break;
366 case 'g':
367 mopts.malloc_guard = 0;
368 break;
369 case 'G':
370 mopts.malloc_guard = MALLOC_PAGESIZE;
371 break;
372 case 'j':
373 if (mopts.def_malloc_junk > 0)
374 mopts.def_malloc_junk--;
375 break;
376 case 'J':
377 if (mopts.def_malloc_junk < 2)
378 mopts.def_malloc_junk++;
379 break;
380 case 'r':
381 mopts.malloc_realloc = 0;
382 break;
383 case 'R':
384 mopts.malloc_realloc = 1;
385 break;
386 case 'u':
387 mopts.malloc_freeunmap = 0;
388 break;
389 case 'U':
390 mopts.malloc_freeunmap = 1;
391 break;
392 case 'x':
393 mopts.malloc_xmalloc = 0;
394 break;
395 case 'X':
396 mopts.malloc_xmalloc = 1;
397 break;
398 default:
399 dprintf(STDERR_FILENO, "malloc() warning: "
400 "unknown char in MALLOC_OPTIONS\n");
401 break;
402 }
403 }
As we can see that there are some MALLOC_STATS in the entire source, which means it dumps extra malloc stats information. So, coming back to omalloc_init()
, after completing omalloc_parseopt(char opt) function, there are some malloc stats condition then, a random cookie gets assign to mopts.malloc_canary
using arc4random()
Coming back to defination of _malloc_init(0)
, after the completion of omalloc_init(), there is a variable named nmutexes which gets assign bydefault to 2 for single-threaded programs, we have already seen the initalization of mopts.malloc_mutexes
default to 8
for multi-threaded programs in the function omalloc_init(). For single-threaded, the value of variable from_rthread
is 0 Then, it checks for MALLOC_PAGEMASK
bit for malloc_readonly address or simply we can say that it is checking the last 12 bits of the malloc_readonly structure address and all 12 bits must be 0 to apply the perm bit PROT_READ and PROT_WRITE. Could say from the debugger that it is also checking for even address
After that, there is a loop till the nmutexes which is 2 for our single-threaded case.
- For single-threaded programs, it maintains two malloc dir_info pools as indicates from the variable nmutexes. One for MAP_CONCEALED memory and one for regular
- For multi-threaded porgram more pools are created. This is to avoid contention, accesses to diffrent pools can run concurently.
reference:
if i == 0
responsible for MAP_CONCEALED memory through call omalloc_poolinit(&d, MAP_CONCEAL);
and else responsible for omalloc_poolinit(&d, 0);
regular memory pools.
Defination of omalloc_poolinit():
473 static void
474 omalloc_poolinit(struct dir_info **dp, int mmap_flag)
475 {
476 char *p;
477 size_t d_avail, regioninfo_size;
478 struct dir_info *d;
479 int i, j;
480
481 /*
482 * Allocate dir_info with a guard page on either side. Also
483 * randomise offset inside the page at which the dir_info
484 * lies (subject to alignment by 1 << MALLOC_MINSHIFT)
485 */
486 if ((p = MMAPNONE(DIR_INFO_RSZ + (MALLOC_PAGESIZE * 2), mmap_flag)) ==
487 MAP_FAILED)
488 wrterror(NULL, "malloc init mmap failed");
489 mprotect(p + MALLOC_PAGESIZE, DIR_INFO_RSZ, PROT_READ | PROT_WRITE);
490 d_avail = (DIR_INFO_RSZ - sizeof(*d)) >> MALLOC_MINSHIFT;
491 d = (struct dir_info *)(p + MALLOC_PAGESIZE +
492 (arc4random_uniform(d_avail) << MALLOC_MINSHIFT));
493
494 rbytes_init(d);
495 d->regions_free = d->regions_total = MALLOC_INITIAL_REGIONS;
496 regioninfo_size = d->regions_total * sizeof(struct region_info);
497 d->r = MMAP(regioninfo_size, mmap_flag);
498 if (d->r == MAP_FAILED) {
499 d->regions_total = 0;
500 wrterror(NULL, "malloc init mmap failed");
501 }
502 for (i = 0; i <= MALLOC_MAXSHIFT; i++) {
503 LIST_INIT(&d->chunk_info_list[i]);
504 for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
505 LIST_INIT(&d->chunk_dir[i][j]);
506 }
507 STATS_ADD(d->malloc_used, regioninfo_size + 3 * MALLOC_PAGESIZE);
508 d->mmap_flag = mmap_flag;
509 d->malloc_junk = mopts.def_malloc_junk;
510 d->malloc_cache = mopts.def_malloc_cache;
511 d->canary1 = mopts.malloc_canary ^ (u_int32_t)(uintptr_t)d;
512 d->canary2 = ~d->canary1;
513
514 *dp = d;
515 }
omalloc_poolinit(struct dir_info **dp, int mmap_flag) workings
:
- After some basic variable declarations, first it maps page into memory using mmap(2) in MMAPNONE(sz, f) where sz is the size and f is the flag
96 #define MMAPNONE(sz,f) mmap(NULL, (sz), PROT_NONE, \
97 MAP_ANON | MAP_PRIVATE | (f), -1, 0)
- So, as from the above macro defination, we can see that the first parameter of mmap(2), that is, addr is NULL and as per the mmap(2) man page
The mmap() function causes the contents of fd, starting at offset, to be
mapped in memory at the given addr. The mapping will extend at least len
bytes, subject to page alignment restrictions.
The addr argument describes the address where the system should place the
mapping. If the MAP_FIXED flag is specified, the allocation will happen
at the specified address, replacing any previously established mappings
in its range. Otherwise, the mapping will be placed at the available
spot at addr; failing that it will be placed "close by".
If addr is NULL the system can pick any address. Except for MAP_FIXED mappings, the
system will never replace existing mappings.
The len argument describes the minimum amount of bytes the mapping will
span. Since mmap() maps pages into memory, len may be rounded up to hit
a page boundary. If len is 0, the mapping will fail with EINVAL.
Following are the page mapping calculations based on debugging
sz = DIR_INFO_RSZ + (MALLOC_PAGESIZE * 2)
where DIR_INFO_RSZ = ((sizeof(struct dir_info) + MALLOC_PAGEMASK) & ~MALLOC_PAGEMASK)
MALLOC_PAGEMASK = 4095 bytes
MALLOC_PAGESIZE = 4096 bytes
So, `DIR_INFO_RSZ` becomes `(4284 + 4095) & ~4095 = 8192 bytes`
and sz becomes 8192 + (4096 * 2) = 16384 bytes
and, f = MMAP_CONCEAL; which means it prevents from leaking the information and also it excludes the mapping from core dumps
- So in above page mapping calculation,
DIR_INFO_RSZ
indicates that to store thedir_info
structure we need2 pages
and allocate4 pages
PROT_NONE using MMAPNONE with flag as “MAP_CONCEAL”, which we can see from theomalloc_poolinit():L486
- Then, making two middle pages R/W which can be seen from the macro value of DIR_INFO_RSZ, 2 pages
- then, calculate the d_avail to randomise offset inside the page at which the dir_info present (subject to alignment by 1 « MALLOC_MINSHIFT)
d_avail = (8192 - 4824) >> 4 = 3368 >> 4 = 210
the d_avail is 210, now calculate offset, dd = (struct dir_info *)(p + MALLOC_PAGESIZE + (arc4random_uniform(d_avail) << MALLOC_MINSHIFT));
- So, as mentioned by the developer Otto@ in the mailing list that the dir_info structure ends up on an aligned address somewhere in the middle pages on an offset between 0 and (210«4) which further resolves in between 0 to 3360, counting from the start of the two middle pages. So, it becomes like d = [p + MALLOC_PAGESIZE + (0..3360)], where (0..3360) is the offset value from 0 to 3360 max
- from my understanding, it looks like there is no guard page bydefault enabled on the both the sides, however, it is there for only one side but after confirming the understanding with the developer Otto@, he mentioned that
dir_info is special and having a guard page on both sides for regular allocation can be done, but would waste more pages and also mentioned to note that allocations are already spread throughout the address space, so it is very likely that an allocation is surrounded by unmapped pages
. So, finally the conclusion is that there will be guard pages on each side as per the malloc design so no need to allocate extra as that will be waste of pages. - Then it initializes random bytes using the rbytes_init(d).
Defination of rbytes_init(struct dir_info *d); as mentioned:
303 static void
304 rbytes_init(struct dir_info *d)
305 {
306 arc4random_buf(d->rbytes, sizeof(d->rbytes));
307 /* add 1 to account for using d->rbytes[0] */
308 d->rbytesused = 1 + d->rbytes[0] % (sizeof(d->rbytes) / 2);
309 }
- rbytes_init() is responsible for filling the d->rbytes with the random bytes using the arc4random_buf() and keeps track/count of the rbytes used through d->rbytesused
- after this, coming back to omalloc_poolinit(), it is initializing the default values for some of the structure members, d->regions_free = dir->regions_total = MALLOC_INTIAL_REGIONS; where the value of macro MALLOC_INITIAL_REGIONS is 512. Also, calculates the total region_info size,
regioninfo_size = d->regions_total * sizeof(struct region_info);
105 struct region_info {
106 void *p; /* page; low bits used to mark chunks */
107 uintptr_t size; /* size for pages, or chunk_info pointer */
108 #ifdef MALLOC_STATS
109 void *f; /* where allocated from */
110 #endif
111 };
-
The structure struct region_info is used to keep track of mmap’ed regions by storing their address and size into a hash table as mentioned in the slides of Otto@
-
further, it maps pages of size regioninfo_size for regions with the flag MAP_CONCEAL and assigns it to d->r and then check if mapping failed
-
While understanding the internals of malloc(3), I did not understand the working of the code-snippet as mentioned:
...
...
...
502 for (i = 0; i <= MALLOC_MAXSHIFT; i++) {
503 LIST_INIT(&d->chunk_info_list[i]);
504 for (j = 0; j < MALLOC_CHUNK_LISTS; j++)
505 LIST_INIT(&d->chunk_dir[i][j]);
506 }
...
...
...
What is the purpose of creating these many linked lists?
- after discussing the query with the developer Otto@, I came to know that these many lists is used to allow more randomization, as per the words by the developer, Otto@ said More than one list of free chunk pages per chunk size is maintained to allow for more randomization.
- in short, the above nested for loops will create 12 chunk_info_list where i is 0 to 11 and for each and every ith index there is j, so as per that,
chunk_info_list[0]
chunk_dir[0][0]
chunk_dir[0][1]
chunk_dir[0][2]
chunk_dir[0][3]
...
...
...
chunk_info_list[11]
chunk_dir[11][0]
chunk_dir[11][1]
chunk_dir[11][2]
chunk_dir[11][3]
...
...
...
- also confirmed with the developer that we can also allow more randomization by increasing the MALLOC_CHUNK_LISTS but at the cost of overhead. Increasing the MALLOC_MAXSHIFT is not possible because it is the shift of the max chunk size that fits in a page. We will see later how these lists helps to achieve randomization
- then, adds and updated the d->malloc_used for dumping the malloc stats information
- further, updated the default initialization of variables given below:
...
...
...
508 d->mmap_flag = mmap_flag;
509 d->malloc_junk = mopts.def_malloc_junk;
510 d->malloc_cache = mopts.def_malloc_cache;
511 d->canary1 = mopts.malloc_canary ^ (u_int32_t)(uintptr_t)d;
512 d->canary2 = ~d->canary1;
513
514 *dp = d;
515 }
/* where it is already mentioned above code snippets that mopts.def_malloc_junk is 1 and
* mopts.def_malloc_cache is 64, already initialized default values in omalloc_init()
*/
-
then,
dir_info
uses two canaries. Its calculation is easy to understand. Whatever the address of d ( struct dir_info *d) just typecast it to (u_int32_t)(uintptr_t) and XOR it withmopts.malloc_canary
and assign the result to d->canary1 and then compute the not (~) ofd->canary1
then assign the result to d->canary2 or in simpler words, convert into binary then take each bit and flips them to their logical opposite. convert 1’s to 0’s and vice versa then convert it to hex and assign it to d->canary2 -
Finally assign the initialized dir_info structure d to *dp
Coming back to _malloc_init(int from_rthreads), so, as per the defination _malloc_init(), for i == 0, we have completed the function omalloc_poolinit(&d, MAP_CONCEAL);, now further, it assigns d->malloc_junk = 2; and d->malloc_cache = 0;
Then, it saves the index i to d->mutex and stored the initialized MAP_CONCEALED pool to mopts.malloc_pool[i] = d;
The value of nmutexes is 2, so, the loop will execute again and will do the whole same operations for index i == 1 (or i != 0). So, again it invokes omalloc_poolinit(&d, 0) but without MAP_CONCEAL flag and this time it is 0. It performs same operation with second argument as 0.
After complete execution of function omalloc_poolinit(&d, 0), it sets junk value to default junk value and same for malloc_cache, that is, d->malloc_junk = mopts.def_malloc_junk; and d->malloc_cache = mopts.def_malloc_cache;. Now, again performes the same operation like for saving the new index 1 to d->mutex and stores the pool to mopts.malloc_pool[i] = d if i == 1
Now, mopts.malloc_pool has two pools,
- mopts.malloc_pool[0]
- mopts.malloc_pool[1]
Next, it checks for multi-threading from variable from_rthreads, if the value is non-zero then it sets mopts.malloc_mt = 1 which shows that program is multi-threaded but for our case it is zero (0), so if cond fails and else cond executes and sets mopts.internal_funcs = 1, from the structure malloc_readonly, it shows that setting internal_funcs means to use better function like recallocarray/freezero but there is no such things occurs for our case, maybe we have to use it for some other scenarios like in case of, calloc, etc. recallocarray(3) - mailing list
Finally, it sets the perms to readonly, to prevent further tampering with them, again it checks last 12 bits and if they are 0 then it sets perms to PROT_READ, and malloc_init(0) completed. Now, coming back to PROGLOGUE macro code as mentioned below:
1256 #define PROLOGUE(p, fn) \
1257 d = (p); \
1258 if (d == NULL) { \
1259 _malloc_init(0); \
1260 d = (p); \
1261 } \
1262 _MALLOC_LOCK(d->mutex); \
1263 d->func = fn; \
1264 if (d->active++) { \
1265 malloc_recurse(d); \
1266 return NULL; \
1267 } \
So, we have completed the malloc_init(0) function. Remember, p refers to the function getpool(), which gets called in lineno. 1260 and due to single-threaded program, it returns mopts.malloc_pool[1], which has the regular pool, not MAP_CONCEALED one, that, further returns to d then malloc_lock and assigns fn to d->fn, here fn means func string “malloc”. It checks and incremented the d->active and doesn’t go inside the if code-flow due to 0 value of d->active
malloc(size) code from libc:
1277 void *
1278 malloc(size_t size)
1279 {
1280 void *r;
1281 struct dir_info *d;
1282 int saved_errno = errno;
1283
1284 PROLOGUE(getpool(), "malloc")
1285 r = omalloc(d, size, 0, CALLER);
1286 EPILOGUE()
1287 return r;
1288 }
From line no. 1285, after PROLOGUE it calls omalloc(d, size, CALLER). Defination of omalloc(d, size, CALLER) as mentioned:
1126 static void *
1127 omalloc(struct dir_info *pool, size_t sz, int zero_fill, void *f)
1128 {
1129 void *p;
1130 size_t psz;
1131
1132 if (sz > MALLOC_MAXCHUNK) {
1133 if (sz >= SIZE_MAX - mopts.malloc_guard - MALLOC_PAGESIZE) {
1134 errno = ENOMEM;
1135 return NULL;
1136 }
1137 sz += mopts.malloc_guard;
1138 psz = PAGEROUND(sz);
1139 p = map(pool, NULL, psz, zero_fill);
1140 if (p == MAP_FAILED) {
1141 errno = ENOMEM;
1142 return NULL;
1143 }
1144 if (insert(pool, p, sz, f)) {
1145 unmap(pool, p, psz, 0, 0);
1146 errno = ENOMEM;
1147 return NULL;
1148 }
1149 if (mopts.malloc_guard) {
1150 if (mprotect((char *)p + psz - mopts.malloc_guard,
1151 mopts.malloc_guard, PROT_NONE))
1152 wrterror(pool, "mprotect");
1153 STATS_ADD(pool->malloc_guarded, mopts.malloc_guard);
1154 }
1155
1156 if (MALLOC_MOVE_COND(sz)) {
1157 /* fill whole allocation */
1158 if (pool->malloc_junk == 2)
1159 memset(p, SOME_JUNK, psz - mopts.malloc_guard);
1160 /* shift towards the end */
1161 p = MALLOC_MOVE(p, sz);
1162 /* fill zeros if needed and overwritten above */
1163 if (zero_fill && pool->malloc_junk == 2)
1164 memset(p, 0, sz - mopts.malloc_guard);
1165 } else {
1166 if (pool->malloc_junk == 2) {
1167 if (zero_fill)
1168 memset((char *)p + sz - mopts.malloc_guard,
1169 SOME_JUNK, psz - sz);
1170 else
1171 memset(p, SOME_JUNK,
1172 psz - mopts.malloc_guard);
1173 } else if (mopts.chunk_canaries)
1174 fill_canary(p, sz - mopts.malloc_guard,
1175 psz - mopts.malloc_guard);
1176 }
1177
1178 } else {
1179 /* takes care of SOME_JUNK */
1180 p = malloc_bytes(pool, sz, f);
1181 if (zero_fill && p != NULL && sz > 0)
1182 memset(p, 0, sz);
1183 }
1184
1185 return p;
1186 }
omalloc(struct dir_info *pool, size_t sz, int zero_fill, void *f); function internals:
- There are two code flows
- if ( sz > MALLOC_MAXCHUNK). Our size is 8 as mentioned in the sample code, so this condition will not be true but still we are going to divein a little. First, it checks if the size is greater than the MALLOC_MAXCHUNK, which is 2048
- If size is less, then it uses 2nd code flow which is else condition, that is, it calls malloc_bytes(pool, sz, f); which we will see further, for now we are going to explore the case when the size is greater:
- After checking the size with MALLOC_MAXCHUNK, it further checks if the requested size is greater than SIZE_MAX (excluding mopts.malloc_guard and MALLOC_PAGESIZE) and if yes, then it sets errno to ENOMEM and returns NULL
- then if size is less, it adds the malloc_guard value to requested size and it roundsup the page to multiple of pagesize, for example: if requested size is <= 4096 then it rounds it to 4096 but if requested size is 4097 then it roundsup to 8192
- further, it maps the page to size psz with hint address is equals to NULL then check if it fails
- next, it calls insert(pool, p, sz, f); function to keep track of mmap’ed regions by storing their address and size into a hash table, we will dive more in-depth later, if it fails to insert then it unmaps the page and return ENOMEM
- it checks for malloc_guard page option, if it sets then it adds a guarded page as per the size mentioned by malloc_guard variable with PROT_NONE perms bit and then it adds the information for stats
- then, it checks for MALLOC_MOVE_COND, if the condition is true then it shifts the allocations towards the end, as mentioned in the source code comment,
70 /*
71 * We move allocations between half a page and a whole page towards the end,
72 * subject to alignment constraints. This is the extra headroom we allow.
73 * Set to zero to be the most strict.
74 */
75 #define MALLOC_LEEWAY 0
76 #define MALLOC_MOVE_COND(sz) ((sz) - mopts.malloc_guard < \
77 MALLOC_PAGESIZE - MALLOC_LEEWAY)
78 #define MALLOC_MOVE(p, sz) (((char *)(p)) + \
79 ((MALLOC_PAGESIZE - MALLOC_LEEWAY - \
80 ((sz) - mopts.malloc_guard)) & \
81 ~(MALLOC_MINSIZE - 1)))`
- after checking for condition, first, it checks for junk value, that is,pool->malloc_junk == 2 if it equals then it fills the mmap’ed page p with SOME_JUNK which has byte 0xdb, then as previously mentioned it shifts the allocation towards end with macro MALLOC_MOVE(p, sz), later it checks, if zero-filling requires, if yes then it fills the same JUNKed filled page with zeroes
- if the MALLOC_MOVE_COND condition is not true then it checks for junk value, that is, pool->malloc_junk == 2, if it equals to 2 then it checks for zero filling and if it requires then it fills from the address as mentioned in the code snippet and if zero filling not requires then it simply fills with SOME_JUNK as mentioned
...
...
...
1166 if (pool->malloc_junk == 2) {
1167 if (zero_fill)
1168 memset((char *)p + sz - mopts.malloc_guard,
1169 SOME_JUNK, psz - sz);
1170 else
1171 memset(p, SOME_JUNK,
1172 psz - mopts.malloc_guard);
-
if pool->malloc_junk == 2 is not true then it checks for canaries, that is, else if (mopts.chunk_canaries) then it calls the fill_canary() to fill the canaries
-
coming back to omalloc(), sz > MALLOC_MAXCHUNK has already covered now if requested size is less than the MALLOC_MAXCHUNK then it calls the function malloc_bytes(pool, sz, f); and the defination as given below:
948 /*
949 * Allocate a chunk
950 */
951 static void *
952 malloc_bytes(struct dir_info *d, size_t size, void *f)
953 {
954 u_int i, r;
955 int j, listnum;
956 size_t k;
957 u_short *lp;
958 struct chunk_info *bp;
959 void *p;
960
961 if (mopts.malloc_canary != (d->canary1 ^ (u_int32_t)(uintptr_t)d) ||
962 d->canary1 != ~d->canary2)
963 wrterror(d, "internal struct corrupt");
964
965 j = find_chunksize(size);
966
967 r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
968 listnum = r % MALLOC_CHUNK_LISTS;
969 /* If it's empty, make a page more of that size chunks */
970 if ((bp = LIST_FIRST(&d->chunk_dir[j][listnum])) == NULL) {
971 bp = omalloc_make_chunks(d, j, listnum);
972 if (bp == NULL)
973 return NULL;
974 }
975
976 if (bp->canary != (u_short)d->canary1)
977 wrterror(d, "chunk info corrupted");
978
979 i = (r / MALLOC_CHUNK_LISTS) & (bp->total - 1);
980
981 /* start somewhere in a short */
982 lp = &bp->bits[i / MALLOC_BITS];
983 if (*lp) {
984 j = i % MALLOC_BITS;
985 k = ffs(*lp >> j);
986 if (k != 0) {
987 k += j - 1;
988 goto found;
989 }
990 }
991 /* no bit halfway, go to next full short */
992 i /= MALLOC_BITS;
993 for (;;) {
994 if (++i >= bp->total / MALLOC_BITS)
995 i = 0;
996 lp = &bp->bits[i];
997 if (*lp) {
998 k = ffs(*lp) - 1;
999 break;
1000 }
1001 }
1002 found:
1003 #ifdef MALLOC_STATS
1004 if (i == 0 && k == 0) {
1005 struct region_info *r = find(d, bp->page);
1006 r->f = f;
1007 }
1008 #endif
1009
1010 *lp ^= 1 << k;
1011
1012 /* If there are no more free, remove from free-list */
1013 if (--bp->free == 0)
1014 LIST_REMOVE(bp, entries);
1015
1016 /* Adjust to the real offset of that chunk */
1017 k += (lp - bp->bits) * MALLOC_BITS;
1018
1019 if (mopts.chunk_canaries && size > 0)
1020 bp->bits[bp->offset + k] = size;
1021
1022 k <<= bp->shift;
1023
1024 p = (char *)bp->page + k;
1025 if (bp->size > 0) {
1026 if (d->malloc_junk == 2)
1027 memset(p, SOME_JUNK, bp->size);
1028 else if (mopts.chunk_canaries)
1029 fill_canary(p, size, bp->size);
1030 }
1031 return p;
1032 }
malloc_bytes(pool, sz, f);
is used to allocate a chunk and after some basic variable declarations, it checks for internal structure corruption with canaries of d and mopts.malloc_canary- Then, it finds the chunk size with the function find_chunksize(size);, defination as given below:
919 static int
920 find_chunksize(size_t size)
921 {
922 int r;
923
924 /* malloc(0) is special */
925 if (size == 0)
926 return 0;
927
928 if (size < MALLOC_MINSIZE)
929 size = MALLOC_MINSIZE;
930 size--;
931
932 r = MALLOC_MINSHIFT;
933 while (size >> r)
934 r++;
935 return r;
936 }
- first it will check if given size is 0 then it returns 0, then it checks if size is less than MALLOC_MINSIZE then it sets size to MALLOC_MINSIZE, else, it decrements the size and sets
r = MALLOC_MINSHIFT;
and the value of MALLOC_MINSHIFT is 4. Then it right shifts the size by r and it keeps incrementing the r till right shifting result becomes zero, then returns r. In our case, the size is 8, which is less than MALLOC_MINSIZE and then it sets the size to MALLOC_MINSIZE, which is 16, then it decrements and it becomes 15 and 15 » 4, as we know r is MALLOC_MINSHIFT, which is 4. So, the output of15 »4
is 0 then it returns r, which is 4 - coming back to
malloc_bytes()
afterfind_chunksize(size)
, it calls((u_int)getrbyte(d) << 8) | getrbyte(d);
, so, defination ofgetrbyte(d)
as given below:
311 static inline u_char
312 getrbyte(struct dir_info *d)
313 {
314 u_char x;
315
316 if (d->rbytesused >= sizeof(d->rbytes))
317 rbytes_init(d);
318 x = d->rbytes[d->rbytesused++];
319 return x;
320 }
- in the above code, first it checks whether the rbytesused is more than the sizeof(rbytes), if yes, then it initializes the random bytes using rbytes_init(d) and then assigns the value of d->rbytes[d->rbytesused] to x and also increments the d->rbytesused count and returns the same but if the condition fails then it will directly assigns the rbytesused count to x and increment the same and return the same x
- So as per the name indicates, my feeling is that the getrbyte() means get-random-byte(), basically the whole line
r = ((u_int)getrbyte(d) << 8) | getrbyte(d);
does the following magic
get-random-byte then left shits by 8 then OR'ed with again some random byte
let's suppose,
1) a = get-random-bytes
2) left shift a by 8
3) b = get-random-bytes
4) r = a | b
for example, 0xd4 -> 0xd400 -> 0xd4cb, so r = 0xd4cb
- Then it finds the listnum,
listnum = r % MALLOC_CHUNK_LISTS;
, where MALLOC_CHUNK_LISTS is 4 - as per our understanding about the for loops on randomization, we have seen that there is nested for loops which creates lists, helpful to allow randomization. So, here we can see that it randomly chooses the created lists for allocation of chunk_info
- then, it chooses some random list and makes it first list, now if the list is empty, that is, head is NULL then it calls function
omalloc_make_chunks(d, j, listnum); to allocate page of chunks. Defination of omalloc_make_chunks() as given below:
885 /*
886 * Allocate a page of chunks
887 */
888 static struct chunk_info *
889 omalloc_make_chunks(struct dir_info *d, int bits, int listnum)
890 {
891 struct chunk_info *bp;
892 void *pp;
893
894 /* Allocate a new bucket */
895 pp = map(d, NULL, MALLOC_PAGESIZE, 0);
896 if (pp == MAP_FAILED)
897 return NULL;
898
899 /* memory protect the page allocated in the malloc(0) case */
900 if (bits == 0 && mprotect(pp, MALLOC_PAGESIZE, PROT_NONE) == -1)
901 goto err;
902
903 bp = alloc_chunk_info(d, bits);
904 if (bp == NULL)
905 goto err;
906 bp->page = pp;
907
908 if (insert(d, (void *)((uintptr_t)pp | (bits + 1)), (uintptr_t)bp,
909 NULL))
910 goto err;
911 LIST_INSERT_HEAD(&d->chunk_dir[bits][listnum], bp, entries);
912 return bp;
913
914 err:
915 unmap(d, pp, MALLOC_PAGESIZE, 0, d->malloc_junk);
916 return NULL;
917 }
- The above code is used to allocate a page of chunks, as mentioned in the comments. First, it maps/allocates page up to len MALLOC_PAGESIZE with hint addr as NULL using map(d, NULL, MALLOC_PAGESIZE, 0);, then it checks for its failure and returns NULL if fails. Defination of map() as given below
751 static void *
752 map(struct dir_info *d, void *hint, size_t sz, int zero_fill)
753 {
754 size_t psz = sz >> MALLOC_PAGESHIFT;
755 struct region_info *r, *big = NULL;
756 u_int i;
757 void *p;
758
759 if (mopts.malloc_canary != (d->canary1 ^ (u_int32_t)(uintptr_t)d) ||
760 d->canary1 != ~d->canary2)
761 wrterror(d, "internal struct corrupt");
762 if (sz != PAGEROUND(sz))
763 wrterror(d, "map round");
764
765 if (hint == NULL && psz > d->free_regions_size) {
766 _MALLOC_LEAVE(d);
767 p = MMAP(sz, d->mmap_flag);
768 _MALLOC_ENTER(d);
769 if (p != MAP_FAILED)
770 STATS_ADD(d->malloc_used, sz);
771 /* zero fill not needed */
772 return p;
773 }
774 for (i = 0; i < d->malloc_cache; i++) {
775 r = &d->free_regions[(i + d->rotor) & (d->malloc_cache - 1)];
776 if (r->p != NULL) {
777 if (hint != NULL && r->p != hint)
778 continue;
779 if (r->size == psz) {
780 p = r->p;
781 r->p = NULL;
782 d->free_regions_size -= psz;
783 if (mopts.malloc_freeunmap)
784 mprotect(p, sz, PROT_READ | PROT_WRITE);
785 if (zero_fill)
786 memset(p, 0, sz);
787 else if (d->malloc_junk == 2 &&
788 mopts.malloc_freeunmap)
789 memset(p, SOME_FREEJUNK, sz);
790 d->rotor += i + 1;
791 return p;
792 } else if (r->size > psz)
793 big = r;
794 }
795 }
796 if (big != NULL) {
797 r = big;
798 p = r->p;
799 r->p = (char *)r->p + (psz << MALLOC_PAGESHIFT);
800 if (mopts.malloc_freeunmap)
801 mprotect(p, sz, PROT_READ | PROT_WRITE);
802 r->size -= psz;
803 d->free_regions_size -= psz;
804 if (zero_fill)
805 memset(p, 0, sz);
806 else if (d->malloc_junk == 2 && mopts.malloc_freeunmap)
807 memset(p, SOME_FREEJUNK, sz);
808 return p;
809 }
810 if (hint != NULL)
811 return MAP_FAILED;
812 if (d->free_regions_size > d->malloc_cache)
813 wrterror(d, "malloc cache");
814 _MALLOC_LEAVE(d);
815 p = MMAP(sz, d->mmap_flag);
816 _MALLOC_ENTER(d);
817 if (p != MAP_FAILED)
818 STATS_ADD(d->malloc_used, sz);
819 /* zero fill not needed */
820 return p;
821 }
parameters for the map() function given below:
calls map(d, NULL, MALLOC_PAGESIZE, 0);
first parameter, d belongs to struct dir_info *d
second parameter, NULL belongs to *hint
third parameter, MALLOC_PAGESIZE belongs to sz
fourth parameter, 0 indicates zero_filling requirements
-
First, map() assigns the right shifted value of sz by
MALLOC_PAGESHIFT
bytes, which is4096 » 12 => 0x1
. So, psz = 0x1 then, it defines some pointer objects for the structure region_info and some other pointer variables, later, it checks for internal struct corruption through canaries. Then, it verifies the roundness of size sz. if it passes, then it checks whether the provided hint is NULL and psz is greater than free_pages_cache. So, for our case, the d->free_regions_size is 0x0 and psz is 0x1 and the condition satisfies and it comes under if code, where if code first does MALLOC_LEAVE, which means it checks if it is multi-threaded program then decrement the d->active count and unlock malloc and returns but if it is single-threaded like for our case, then it simply returns without doing anything -
Then, it maps pages with provided flag mmap_flag and size sz. mmap_flag is 0x0, as if we remember correctly then second time getpool() returns d with mmap_flag is 0x0. Then, it does MALLOC_ENTER which is the opposite of MALLOC_LEAVE which means check if the program is multi-threaded then lock malloc and increment the d->active count but if the program is single-threaded then it does nothing and simply returns. Then , it checks whether p is failed. If not then it adds stats information, that is, increments the d->malloc_used to sz, that is, d->malloc_used += sz then it returns p
-
For our case, map() completed but other code paths may be explained later in the upcoming malloc series, as the blog has already become lengthy
-
Then, coming back to omalloc_make_chunks(), if the bits is 0 means for malloc(0) case, it memory protects the allocated page with PROT_NONE but for our case, bits is 4 and we have used malloc(8)
-
Then, it allocates the chunk info by calling the function alloc_chunk_info(d, bits); and code for the same given below:
847 static struct chunk_info *
848 alloc_chunk_info(struct dir_info *d, int bits)
849 {
850 struct chunk_info *p;
851
852 if (LIST_EMPTY(&d->chunk_info_list[bits])) {
853 size_t size, count, i;
854 char *q;
855
856 if (bits == 0)
857 count = MALLOC_PAGESIZE / MALLOC_MINSIZE;
858 else
859 count = MALLOC_PAGESIZE >> bits;
860
861 size = howmany(count, MALLOC_BITS);
862 size = sizeof(struct chunk_info) + (size - 1) * sizeof(u_short);
863 if (mopts.chunk_canaries)
864 size += count * sizeof(u_short);
865 size = _ALIGN(size);
866
867 q = MMAP(MALLOC_PAGESIZE, d->mmap_flag);
868 if (q == MAP_FAILED)
869 return NULL;
870 STATS_ADD(d->malloc_used, MALLOC_PAGESIZE);
871 count = MALLOC_PAGESIZE / size;
872
873 for (i = 0; i < count; i++, q += size) {
874 p = (struct chunk_info *)q;
875 LIST_INSERT_HEAD(&d->chunk_info_list[bits], p, entries);
876 }
877 }
878 p = LIST_FIRST(&d->chunk_info_list[bits]);
879 LIST_REMOVE(p, entries);
880 if (p->shift == 0)
881 init_chunk_info(d, p, bits);
882 return p;
883 }
- First it checks if the list is empty where bits is 4 for our case, and if it is empty then it checks if bits is 0, if yes, then it calculates the count by dividing the MALLOC_PAGESIZE with MALLOC_MINSIZE but in our case bits is equals to 4, so, it calculates count for all cases when bit != 0 by MALLOC_PAGESIZE » bits, which comes as 256 for our case
- Then, howmany(x, y) is
((x) + ((y) - 1) / y)
, so as per that, x = count and y is MALLOC_BITS, which is 256 and 16 respectively for our case and the result is 16 - Then, it calculates size from adding the sizeof(struct chunk_info) + (size - 1) * sizeof(u_short), for our case, it came as 40 + (16 - 1) * 2 = 70 and then it checks for mopt.chunk_canaries and if it is there then it calculates size += count * sizeof(u_short); remember we have compiled our sample code with malloc option C, so, chunk_canaries is there, for our case size became, size = 70 + 256 * 2 = 582 then it aligns the size to ALIGN(size), which make it from 582 to 584
- In simpler words, the above point explains the size calculations for a chunk. It includes the size = [size of structure chunk_info] + [ size of bitmap] and as we already know that the bits will be different for different chunk size, so, the size of bitmap varies as per the different bits. It also adds the size of canary, if enable
- Then, it maps page using mmap(2) upto MALLOC_PAGESIZE with mmap_flag value as 0 due to the mopts.malloc_pool[1], mopts.malloc_pool[0] has MAP_CONCEALed memory.Then, it adds information about malloc usage count from adding MALLOC_PAGESIZE to variable d->malloc_used. Then, it calculates count from count = MALLOC_PAGESIZE / size where it is 4096 / 584 = 7, for our case
- Then, it creates count number of lists for chunk_info. For our case, count is 7 so, it creates 7 lists with the list head updated with p. Then, it chooses first list or in other words we can say it chooses the last inserted head list then removes it from the lists. Then it checks for p->shift and here shift means how may shifts it will take to get the p->size, if it is zero (0) then it calls function
init_chunk_info(d, p, bits);
then after the initializaton it returns p.
Defination of the init_chunk_info()
as given below:
823 static void
824 init_chunk_info(struct dir_info *d, struct chunk_info *p, int bits)
825 {
826 int i;
827
828 if (bits == 0) {
829 p->shift = MALLOC_MINSHIFT;
830 p->total = p->free = MALLOC_PAGESIZE >> p->shift;
831 p->size = 0;
832 p->offset = 0xdead;
833 } else {
834 p->shift = bits;
835 p->total = p->free = MALLOC_PAGESIZE >> p->shift;
836 p->size = 1U << bits;
837 p->offset = howmany(p->total, MALLOC_BITS);
838 }
839 p->canary = (u_short)d->canary1;
840
841 /* set all valid bits in the bitmap */
842 i = p->total - 1;
843 memset(p->bits, 0xff, sizeof(p->bits[0]) * (i / MALLOC_BITS));
844 p->bits[i / MALLOC_BITS] = (2U << (i % MALLOC_BITS)) - 1;
845 }
init_chunk_info()
is used to initialize the chunk information. It checks for whether the value of bits is 0 or not and if 0 then it initializes some default value as mentioned in the code and it seems that bit == 0, is the case of malloc(0) but if it is not 0 then it initializes some other values as mentioned in the code. Then, it assigns dir_info canary1 to chunk canary with typecasts in u_short, that isp->canary = (u_short)d->canary1;
- Then, set all the valid bits in the bitmap. For our case, i = p->total - 1;, where p->total = 256 and i = 255 then it copies the 0xff to p->bits upto sizeof(p->bits[0]) * (i / MALLOC_BITS)) - 1 which we have observed 30 for our case.
- Then it assigns some value to p->bits, where a bitmap specifing which chunks are still free to use, for better explanation on p->bits, for our case, i = 255, MALLOC_BITS = 16, so, it becomes,
p->bits[255 / 16] = (2U << ( 255 % 16)) - 1
then, p->bits[15] = (2U << ( 15 )) - 1
then, p->bits[15] = 65536 - 1
then, p->bits[15] = 65535
- Now, the chunk initializaton is completed, we can see that after the function init_chunk_info(d, p, bits);, it returns the initialized chunk p
- after the completion of both the functions init_chunk_info() and alloc_chunk_info(), it returns the allocated and initialized chunk to the function caller function omalloc_make_chunks(), returns the new initialized and allocated chunk p to bp and after lineno. 903 it checks bp for NULL and if bp is not NULL then it assigns the new page that it creates on line no. 895, that is, bp->page = pp;. Then, it invokes insert(d, (void *)((uintptr_t)pp | (bits + 1)), (uintptr_t)bp, NULL) and as we have previously understood that insert() is use to keep track of regions by keeping their address and size into a hash table. Defination for insert(struct dir_info *d, void *p, size_t sz, void *f) as given below:
564 /*
565 * The hashtable uses the assumption that p is never NULL. This holds since
566 * non-MAP_FIXED mappings with hint 0 start at BRKSIZ.
567 */
568 static int
569 insert(struct dir_info *d, void *p, size_t sz, void *f)
570 {
571 size_t index;
572 size_t mask;
573 void *q;
574
575 if (d->regions_free * 4 < d->regions_total) {
576 if (omalloc_grow(d))
577 return 1;
578 }
579 mask = d->regions_total - 1;
580 index = hash(p) & mask;
581 q = d->r[index].p;
582 STATS_INC(d->inserts);
583 while (q != NULL) {
584 index = (index - 1) & mask;
585 q = d->r[index].p;
586 STATS_INC(d->insert_collisions);
587 }
588 d->r[index].p = p;
589 d->r[index].size = sz;
590 #ifdef MALLOC_STATS
591 d->r[index].f = f;
592 #endif
593 d->regions_free--;
594 return 0;
595 }
- we can see that initially it checks for whether the slots is too filled, which means more than 75% (>75%), if yes then it calls
omalloc_grow(d);
and returns 1, if no then means it is less 75% (<75%) then it decrements d->regions_total by 1 and assign it to mask variable. Then it calculates, index, through anding (&) a hash(p) function and mask. Defination of hash(p) function given below:
254 static inline size_t
255 hash(void *p)
256 {
257 size_t sum;
258 uintptr_t u;
259
260 u = (uintptr_t)p >> MALLOC_PAGESHIFT;
261 sum = u;
262 sum = (sum << 7) - sum + (u >> 16);
263 #ifdef __LP64__
264 sum = (sum << 7) - sum + (u >> 32);
265 sum = (sum << 7) - sum + (u >> 48);
266 #endif
267 return sum;
268 }
- hash(p) - after doing some shifting, it returns the sum. Coming back to insert(), d->r refers to the structure struct region_info, which has address and size to maintain mmap’ed regions into hash table. After that it assigns the already existing address to q, that is, q = d->r[index].p. If q is not NULL then it again calulcates the index and then finds the q and increment the collisions information for stats. if q is NULL, then it simply keeps the new addr p to struct region_info, same for size, then it decrements the free slots count
- then, coming back to omalloc_make_chunks(), it inserts list head using LIST_INSERT_HEAD(&d->chunk_dir[bits][listnum], bp, entries);, where bits is 4 for our case and listnum is random for each invocaton then returns the associated meta-info, bp with the allocated-initialized chunk information and page of chunks, bp->page
- as per my understanding, from the malloc_bytes the bp seems to be the allocated chunks in a page, and pp seems to be the bucket of chunks or can say a page of chunks as we can see from the omalloc_make_chunks():L895
- now, coming back to the malloc_bytes(), it checks for chunk canary corruption on line no. 976, if(bp->canary != (u_short)d->canary1), if the condition is true then calls wrterror() with the string “chunk info corrupted” and then calls abort()
- if the condition is false then it calculates i, i = (r / MALLOC_CHUNK_LISTS) & (bp->total - 1);, where r is the random byte, MALLOC_CHUNK_LISTS is 4 and bp->total is the total no. of chunks. Then, after calculating the i it calculates the index of the chunk which is free, as if we remember bp->bits is responsible for tracking the free chunks to use.
Found one of the interesting old paper which provides some details about the phkmalloc internals. According to the paper, it is mentioned that
struct pginfo {
struct pginfo *next; /* next on the free list */
void *page; /* Pointer to the page */
u_short size; /* size of this page's chunks */
u_short shift; /* How far to shift for this size chunks */
u_short free; /* How many free chunks */
u_short total; /* How many chunk */
u_int bits[1]; /* Which chunks are free */
};
- where the next field is a pointer to the next structure in the list, or 0 if the element is the last of the list.
- The page field points to the beginning of the page (it is always a multiple of malloc_pagesize).
- The size field is set to the size in bytes of the chunks contained in this page.
- The free field is the number of free chunks in the page.
- The total field is the sum of the number of free chunks in the page and of the number of allocated chunks in the page. The page is freed (ie. returned to the lower layer) whenever ( free == total ) becomes true.
- the bits field is a variable size array indicating which chunks in the
page are free chunks. Each chunk is associated with a unique bit in the field,
namely the bit obtained when applying the ( 1 « n ) mask to bits[i] where:
( i * MALLOC_BITS ) + n = ( chunk_address & malloc_pagemask ) / chunk_size
MALLOC_BITS is the number of bits a u_int has on the specific architecture where phkmalloc is using. It is defined as follows:
#define MALLOC_BITS ((int)(8*sizeof(u_int)))
This bit is set to one when the chunk is free, and to 0 when the chunk is in use. Of course, the effective size of the bits field will depend on the number of chunks in the page. the calculation given below computes the offset of the chunk in the page
...
...
...
981 /* start somewhere in a short */
982 lp = &bp->bits[i / MALLOC_BITS];
983 if (*lp) {
984 j = i % MALLOC_BITS;
985 k = ffs(*lp >> j);
986 if (k != 0) {
987 k += j - 1;
988 goto found;
989 }
990 }
991 /* no bit halfway, go to next full short */
992 i /= MALLOC_BITS;
993 for (;;) {
994 if (++i >= bp->total / MALLOC_BITS)
995 i = 0;
996 lp = &bp->bits[i];
997 if (*lp) {
998 k = ffs(*lp) - 1;
999 break;
1000 }
1001 }
1002 found:
1003 #ifdef MALLOC_STATS
1004 if (i == 0 && k == 0) {
1005 struct region_info *r = find(d, bp->page);
1006 r->f = f;
1007 }
1008 #endif
1009
1010 *lp ^= 1 << k;
1011
1012 /* If there are no more free, remove from free-list */
1013 if (--bp->free == 0)
1014 LIST_REMOVE(bp, entries);
1015
1016 /* Adjust to the real offset of that chunk */
1017 k += (lp - bp->bits) * MALLOC_BITS;
1018
1019 if (mopts.chunk_canaries && size > 0)
1020 bp->bits[bp->offset + k] = size;
1021
1022 k <<= bp->shift;
1023
1024 p = (char *)bp->page + k;
1025 if (bp->size > 0) {
1026 if (d->malloc_junk == 2)
1027 memset(p, SOME_JUNK, bp->size);
1028 else if (mopts.chunk_canaries)
1029 fill_canary(p, size, bp->size);
1030 }
1031 return p;
1032 }
- It assigns the address of bp->bits[13] to lp, for our case i = 0xd3 (or 211) and MALLOC_BITS is 16 then if *lp is not null then it assigns i % MALLOC_BITS to j, which is j = 3, then it goes to find the first set bit and assign it to k, k = ffs(*lp » j);, ffs() defination as given below
1 /* ffs -- Find the first bit set in the parameter
2
3 @deftypefn Supplemental int ffs (int @var{valu})
4
5 Find the first (least significant) bit set in @var{valu}. Bits are
6 numbered from right to left, starting with bit 1 (corresponding to the
7 value 1). If @var{valu} is zero, zero is returned.
8
9 @end deftypefn
10
11 */
12
13 int
14 ffs (register int valu)
15 {
16 register int bit;
17
18 if (valu == 0)
19 return 0;
20
21 for (bit = 1; !(valu & 1); bit++)
22 valu >>= 1;
23
24 return bit;
25 }
- For our case,
- *lp is 0xffff and j is 3, so the outcome of operation *lp » j is 8191 and ffs(8191) is 1 which returns to k and if k != 0 then k = k + j - 1 => k = 1 + 3 - 1 => k = 3, then goto found
- lp = *lp ^ 1 « k, then 0xffff ^ 1 « 0x3 => *lp = 0xfff7. Here, it sets the bits of the bitmap. Then, if bp->free is 0 then remove lists bp and decrement the bp->free value, if not 0 then also decrement the bp->free and k = k + (lp - bp->bits) * MALLOC_BITS; => k = 0xd3 (or 211). Then, check if mopts.chunk_canaries is enable and size > 0, if yes, then bp->bits[bp->offset + k] = size => bp->bits[16 + 211] = 0x8
- Then, k «= bp->shift which means k = k « bp->shift => k = 0xd3 (or 211) * 16 => k = 0xd30 (or 3376). Then, calculate offset in the page, p = (char *)bp->page + k;, for our case, bp->page is 0xcde2d918000 and k is 0xd30 => p = 0xcde2d918d30. Then it checks if the bp->size > 0, if true then check for more junking through, d->malloc_junk == 2, if more junking set, then it copies SOME_JUNK to p till the bp->size, but if no more_junking option set then it checks for mopt.chunk_canaries, and if canary is enable then, fill_canary functions calls with p, size and bp->size
- Here, in short, information about the offset calculations
- bp->page or pp is the page of chunks and bp is the associated meta-info
- k is the chunk number, before shiting, which means k is the index for the chunk in the page bp->page
- After shifting, k becomes the byte offset of the chunk inside the bp->page page
fill_canary()
function as given below
938 static void
939 fill_canary(char *ptr, size_t sz, size_t allocated)
940 {
941 size_t check_sz = allocated - sz;
942
943 if (check_sz > CHUNK_CHECK_LENGTH)
944 check_sz = CHUNK_CHECK_LENGTH;
945 memset(ptr + sz, SOME_JUNK, check_sz);
946 }
- For our case, check_sz is 16 - 8, which further resolves down to check_sz = 8, then check for maximum chunk length and if greater than CHUNK_CHECK_LENGTH then sets the CHUNK_CHECK_LENGTH to check_sz and then copy the SOME_JUNK to ptr + sz till size check_sz and further ptr + sz becomes
0xcde2d918d38
, where ptr=0xcde2d918d30 and sz=0x8 copies SOME_JUNK till check_sz which has 0x10 length. Now, if in the case of heap-overflow, if any buffer tries to write more than size sz, that is, 0x8 then it corrupts the SOME_JUNK value and that detects as heap-overflow attack and abort(). Rest about canary validation we will see on further series on malloc family calls - Then, finally it returns the page + offset address filled with canary
- Now, the malloc_bytes() is completed and as we can see further in omalloc() defination, after the malloc_bytes() function call, it checks for zero_filling and also checks if p != NULL and sz > 0 and if all conditions meets then it copies zero to entire address p till size sz and returns p
- coming back to malloc(size_t size), after completing the omalloc() function, it expands EPILOGUE() code. Following the code explains about the EPILOGUE() macro
1269 #define EPILOGUE() \
1270 d->active--; \
1271 _MALLOC_UNLOCK(d->mutex); \
1272 if (r == NULL && mopts.malloc_xmalloc) \
1273 wrterror(d, "out of memory"); \
1274 if (r != NULL) \
1275 errno = saved_errno; \
- above code shows what EPILOGUE() does
- It decrements the active count
- Unlock the locked stuff
- Then, it checks if r is NULL and also checks for mopts.malloc_xmalloc then it “prints out of memory” as error and abort() but if r is not NULL then it saves saved_errno to errno variable and completes the epilogue() process
1277 void *
1278 malloc(size_t size)
1279 {
1280 void *r;
1281 struct dir_info *d;
1282 int saved_errno = errno;
1283
1284 PROLOGUE(getpool(), "malloc")
1285 r = omalloc(d, size, 0, CALLER);
1286 EPILOGUE()
1287 return r;
1288 }
- after EPILOGUE(), it returns the r, which has the address value, 0xcde2d918d30, from the malloc_bytes():L1024 - (char *)bp->page + k
- coming back to our sample code we have buff1 = (char *)malloc(8); and after that it copies the user controlled input to buff1 through vulnerable string function, strcpy() and then calls free(buff1).
Sample code showing the canary overwrite and detection:
openbsd# LD_PRELOAD=/usr/src/lib/libc/obj/libc.so.95.1 gdb -q sample
(gdb) br main
Breakpoint 1 at 0x1363: file sample.c, line 7.
(gdb) r AAAABBBBCCCCDDDD
Starting program: /root/test/sample AAAABBBBCCCCDDDD
Breakpoint 1 at 0x4517aff7363: file sample.c, line 7.
Error while reading shared library symbols:
Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /usr/libexec/ld.so]
Breakpoint 1, main (argc=2, argv=0x7f7ffffee678) at sample.c:7
7 char *buff1 = NULL;
Current language: auto; currently minimal
(gdb) n
8 buff1 = (char *)malloc(8);
(gdb) n
9 strcpy(buff1, argv[1]);
(gdb) x/16x buff1
0x453ec9b6fe0: 0x00000000 0x00000000 0xdbdbdbdb 0xdbdbdbdb
0x453ec9b6ff0: 0x00000000 0x00000000 0x00000000 0x00000000
0x453ec9b7000: Cannot access memory at address 0x453ec9b7000
(gdb) n
10 free(buff1);
(gdb) x/16x buff1
0x453ec9b6fe0: 0x41414141 0x42424242 0x43434343 0x44444444
0x453ec9b6ff0: 0x00000000 0x00000000 0x00000000 0x00000000
0x453ec9b7000: Cannot access memory at address 0x453ec9b7000
(gdb) c
Continuing.
sample(3165) in free(): chunk canary corrupted 0x453ec9b6fe0 0x8@0x8
Program received signal SIGABRT, Aborted.
thrkill () at -:3
3 -: No such file or directory.
in -
Current language: auto; currently asm
(gdb)
- From the above debugging, we can see that just after the malloc(3) allocation there are canaries with 0xdb bytes, which is filled by the fill_canary() which we have already seen and understood.
- Then, after the memory allocation that is explained above, it copies the user controlled input to that memory area and after giving string AAAABBBBCCCCDDDD, we can see that the canaries are corrupted as it has overwritten the 0xdb bytes with the 0x43 and 0x44 which is C and D then next it calls free(buff1), which is responsible to validate canaries and calls abort() if corrupted with the information like chunk address and size.
- So, validation of canaries in user-related chunk is validated in free() which we will cover in the further series of malloc(3) friends library calls
References:
- malloc(3) man page
- OpenBSD community - mailing list
- A new malloc(3) for OpenBSD slides by Otto Moerbeek
- BSD Heap Smashing, paper by bbp@krukh.net on phkmalloc [from security point of view]
- GitHub code history for old commit messages
- information on all the commits in mailing list
- phkmalloc, paper by Poul-Henning Kamp
- malloc(3) code series by Adam Wolk and @DuClare
- discussions on twitter with developers
- Google :)
Finally!! It was a bit long journey. If something is missing or not correct, please feel free to update.
Again, a huge thanks to all OpenBSD community and especially Otto@ for helping me to understand the malloc internals and patiently listening my queries :).
Happy Kernel Hacking