Hi there,

Have you ever wondered what happens internally when we execute a command or an executable from the command line?

We will be diving into the ocean of kernel internals and the working of the process lifecycle (creation) in the OpenBSD operating system, from user space to kernel space.

We will execute the ls command from the command line and then trace it through the debugger to see the magic.

Following are the basic stages of a process's lifecycle:

  • creation
  • execution
  • exit

For this blog, we will limit the focus to the creation of the process.

Debugger used: GDB to debug the kernel code.

Whenever we execute a command from the CLI, for example ls, the parent process is ksh, because ksh is the default shell on OpenBSD and it is the shell that invokes ls (or any other command).

So, essentially, every process is created by the sys_fork() function, which is responsible for creating new (child) processes and which in turn calls the fork1(9) function:

Based on the fork1(9) man page — OpenBSD kernel developer's manual:

#include <sys/types.h>
#include <sys/proc.h>

int
fork1(struct proc *p1, int flags, void (*func)(void *), void *arg, register_t *retval, struct proc **rnewprocp);

fork1(9) creates a new process out of p1, which should be the current thread. This function is used primarily to implement the fork(2) and vfork(2) system calls, as well as the kthread_create(9) function.

Life cycle of a process (in brief):

“ls” → fork(2) → sys_fork() → fork1() → sys_execve() → sys_exit() → exit1()
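
To connect this to user space: below is a minimal, hypothetical sketch of what a shell such as ksh conceptually does when you type ls — fork a child, have the child exec the new program, and wait for it. This illustrates the fork(2)/execve(2)/wait(2) pattern only; it is not ksh's actual code.

#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    int status;
    pid_t pid = fork();     /* creation: ends up in sys_fork()/fork1() */

    if (pid == -1)
        err(1, "fork");
    if (pid == 0) {
        /* child: execution phase, replace the image with /bin/ls */
        execl("/bin/ls", "ls", (char *)NULL);
        _exit(127);         /* reached only if execl() failed */
    }
    /* parent (the shell) waits for the child to exit */
    if (waitpid(pid, &status, 0) == -1)
        err(1, "waitpid");
    printf("child %d exited with status %d\n", (int)pid, WEXITSTATUS(status));
    return 0;
}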

Working of fork1(): after “ls” is run from user space, execution goes to fork() in libc and from there to sys_fork():

int
sys_fork(struct proc *p, void *v, register_t *retval)
{
    int flags;

    flags = FORK_FORK;
    if (p->p_p->ps_ptmask & PTRACE_FORK)
        flags |= FORK_PTRACE;
    return fork1(p, flags, fork_return, NULL, retval, NULL);
}
FORK_FORK: a macro indicating that the call was made by fork(2). It is used only for statistics.

#define FORK_FORK 0x00000001

So the value of the “flags” variable is set to 1, because the call is made by fork(2). sys_fork() then checks whether PTRACE_FORK tracing is enabled for the process; if it is, FORK_PTRACE is ORed into the flags, and finally fork1() is called. A small illustration of how these flag bits combine is given below.
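
The flags word is just a bitmask, so individual FORK_* bits are combined with | and tested with &. A minimal user-space sketch of that pattern (only the FORK_FORK value above comes from the source; the FORK_PTRACE value here is a placeholder for illustration):

#include <stdio.h>

#define FORK_FORK   0x00000001  /* value from the kernel headers */
#define FORK_PTRACE 0x00000400  /* placeholder value, for illustration only */

int
main(void)
{
    int flags = FORK_FORK;          /* the call originates from fork(2) */
    int traced = 1;                 /* pretend PTRACE_FORK is set */

    if (traced)
        flags |= FORK_PTRACE;       /* OR in an additional flag bit */

    if (flags & FORK_PTRACE)        /* test a single bit */
        printf("flags = %#x (ptrace requested)\n", flags);
    return 0;
}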

fork1() initial code

int
fork1(struct proc *curp, int flags, void (*func)(void *), void *arg,
    register_t *retval, struct proc **rnewprocp)
{
    struct process *curpr = curp->p_p;
    struct process *pr;
    struct proc *p;
    uid_t uid = curp->p_ucred->cr_ruid;
    struct vmspace *vm;
    int count;
    vaddr_t uaddr;
    int error;
    struct  ptrace_state *newptstat = NULL;

    KASSERT((flags & ~(FORK_FORK | FORK_VFORK | FORK_PPWAIT | FORK_PTRACE
        | FORK_IDLE | FORK_SHAREVM | FORK_SHAREFILES | FORK_NOZOMBIE
        | FORK_SYSTEM | FORK_SIGHAND)) == 0);
    KASSERT((flags & FORK_SIGHAND) == 0 || (flags & FORK_SHAREVM));
    KASSERT(func != NULL);

    if ((error = fork_check_maxthread(uid)))
        return error;

    if ((nprocesses >= maxprocess - 5 && uid != 0) ||
        nprocesses >= maxprocess) {
        static struct timeval lasttfm;

        if (ratecheck(&lasttfm, &fork_tfmrate))
            tablefull("process");
        nthreads--;
        return EAGAIN;
    }
    nprocesses++;
...
...
...
  • From the above code snippet, curp->p_p->ps_comm is “ksh”, the parent process that will fork “ls”.
  • First some process structures are declared; then uid is set to curp->p_ucred->cr_ruid.
  • Then comes the structure for the process address space information (struct vmspace).
  • Then some local variables and the ptrace_state structure, followed by KASSERT conditions that sanity-check the flag values.
  • fork_check_maxthread(uid): it is used to check or track the number of threads created by the given uid.
  • It checks that the number of threads created by this uid does not exceed the maximum number of threads allowed (maxthread), or maxthread - 5 for non-root users, because the last 5 entries are reserved for root.
  • If the limit is exceeded, it prints the “table full” message via tablefull(), rate-limited to once every 10 seconds, and returns EAGAIN; otherwise it increments the number of threads (a user-space sketch of this failure mode follows).
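
From user space, hitting either limit simply makes fork(2) fail with EAGAIN. A minimal sketch of how that failure shows up to a program:

#include <sys/types.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    pid_t pid = fork();

    if (pid == -1 && errno == EAGAIN) {
        /* thread/process table full, or the per-uid limit was reached */
        fprintf(stderr, "fork: resource temporarily unavailable\n");
        return 1;
    }
    if (pid == 0)
        _exit(0);       /* child exits immediately */
    return 0;
}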

Definition of fork_check_maxthread(uid)

int
fork_check_maxthread(uid_t uid)
{
    /*
     * Although process entries are dynamically created, we still keep
     * a global limit on the maximum number we will create. We reserve
     * the last 5 processes to root. The variable nprocesses is the
     * current number of processes, maxprocess is the limit.  Similar
     * rules for threads (struct proc): we reserve the last 5 to root;
     * the variable nthreads is the current number of procs, maxthread is
     * the limit.
     */
    if ((nthreads >= maxthread - 5 && uid != 0) || nthreads >= maxthread) {
        static struct timeval lasttfm;

        if (ratecheck(&lasttfm, &fork_tfmrate))
            tablefull("proc");
        return EAGAIN;
    }
    nthreads++;

    return 0;
}
  • After fork_check_maxthread(), similar logic is implemented in fork1() for tracking processes (nprocesses against maxprocess), as visible in the fork1(9) definition.

fork1(9) definition continued…

...
...
    /*
     * Increment the count of processes running with this uid.
     * Don't allow a nonprivileged user to exceed their current limit.
     */
    count = chgproccnt(uid, 1);
    if (uid != 0 && count > lim_cur(RLIMIT_NPROC)) {
        (void)chgproccnt(uid, -1);
        nprocesses--;
        nthreads--;
        return EAGAIN;
    }

    uaddr = uvm_uarea_alloc();
    if (uaddr == 0) {
        (void)chgproccnt(uid, -1);
        nprocesses--;
        nthreads--;
        return (ENOMEM);
    }

    /*
     * From now on, we're committed to the fork and cannot fail.
     */
    p = thread_new(curp, uaddr);
    pr = process_new(p, curpr, flags);

    p->p_fd     = pr->ps_fd;
    p->p_vmspace    = pr->ps_vmspace;
    if (pr->ps_flags & PS_SYSTEM)
        atomic_setbits_int(&p->p_flag, P_SYSTEM);

    if (flags & FORK_PPWAIT) {
        atomic_setbits_int(&pr->ps_flags, PS_PPWAIT);
        atomic_setbits_int(&curpr->ps_flags, PS_ISPWAIT);
    }
...
...
  • fork1() then adjusts the per-uid count via chgproccnt(uid, 1); its definition is given below:
/*
 * Change the count associated with number of threads
 * a given user is using.
 */
int
chgproccnt(uid_t uid, int diff)
{
    struct uidinfo *uip;
    long count;

    uip = uid_find(uid);
    count = (uip->ui_proccnt += diff);
    uid_release(uip);
    if (count < 0)
        panic("chgproccnt: procs < 0");
    return count;
}
  • The struct uidinfo structure maintains, for every uid, resource-consumption counts, including the process count and socket buffer space usage.

  • The uid_find(uid) function looks up and returns the uidinfo structure for uid. If no uidinfo structure exists for uid, a new structure is allocated and initialized.

  • chgproccnt() adjusts ui_proccnt by diff and returns the new count. Coming back to the fork1(9) code after chgproccnt(): it checks whether the uid is non-privileged and whether the count is greater than the soft limit for RLIMIT_NPROC, which was 0x7fffffffffffffff in my debugging session.

  • If the uid is non-privileged and the count exceeds that limit, fork1() undoes the bookkeeping: it decreases the per-uid count via chgproccnt() with -1 as the diff parameter, decrements nprocesses and nthreads, and returns EAGAIN.

  • Next, the uvm_uarea_alloc() function allocates the thread's uarea, the memory where its kernel stack and PCB are stored.

  • If the “uaddr” variable is zero, no uarea could be allocated, so fork1() again rolls back the per-uid count and the process and thread counts, and returns ENOMEM.

  • There are two important functions:

    • thread_new(struct proc *parent, vaddr_t uaddr)
    • process_new(struct proc *p, struct process *parent, int flags)
/*
* Allocate and initialize a thread (proc) structure, given the parent thread.
*/
struct proc *
thread_new(struct proc *parent, vaddr_t uaddr)
{
   struct proc *p; 

   p = pool_get(&proc_pool, PR_WAITOK);
   p->p_stat = SIDL;           /* protect against others */
   p->p_flag = 0;
   p->p_limit = NULL;

   /*
    * Make a proc table entry for the new process.
    * Start by zeroing the section of proc that is zero-initialized,
    * then copy the section that is copied directly from the parent.
    */
   memset(&p->p_startzero, 0,
       (caddr_t)&p->p_endzero - (caddr_t)&p->p_startzero);
   memcpy(&p->p_startcopy, &parent->p_startcopy,
       (caddr_t)&p->p_endcopy - (caddr_t)&p->p_startcopy);
   crhold(p->p_ucred);
   p->p_addr = (struct user *)uaddr;

   /*
    * Initialize the timeouts.
    */
   timeout_set(&p->p_sleep_to, endtsleep, p);

   /*
    * set priority of child to be that of parent
    * XXX should move p_estcpu into the region of struct proc which gets
    * copied.
    */
   scheduler_fork_hook(parent, p);

#ifdef WITNESS
   p->p_sleeplocks = NULL;
#endif

#if NKCOV > 0
   p->p_kd = NULL;
#endif

   return p;
}
  • In thread_new(), the thread for our user-space process (“ls” in our case) is created: a proc structure is retrieved from the pool proc_pool through the pool_get() function.
  • Then the state of the thread is set to SIDL, which means the process/thread is being created by fork(2), and p->p_flag is set to 0.
  • Then it zeroes a section of struct proc. The code snippet below is from “sys/proc.h”:
/* The following fields are all zeroed upon creation in fork. */
#define p_startzero p_dupfd
    int p_dupfd;     /* Sideways return value from filedescopen. XXX */

    long    p_thrslpid; /* for thrsleep syscall */

    /* scheduling */
    u_int   p_estcpu;       /* [s] Time averaged val of p_cpticks */
    int p_cpticks;   /* Ticks of cpu time. */
    const volatile void *p_wchan;   /* [s] Sleep address. */
    struct  timeout p_sleep_to;/* timeout for tsleep() */
    const char *p_wmesg;        /* [s] Reason for sleep. */
    fixpt_t p_pctcpu;       /* [s] %cpu for this thread */
    u_int   p_slptime;      /* [s] Time since last blocked. */
    u_int   p_uticks;       /* Statclock hits in user mode. */
    u_int   p_sticks;       /* Statclock hits in system mode. */
    u_int   p_iticks;       /* Statclock hits processing intr. */
    struct  cpu_info * volatile p_cpu; /* [s] CPU we're running on. */

    struct  rusage p_ru;        /* Statistics */
    struct  tusage p_tu;        /* accumulated times. */
    struct  timespec p_rtime;   /* Real time. */

    int  p_siglist;     /* Signals arrived but not delivered. */

/* End area that is zeroed on creation. */
  • In the above code snippet, all of those fields are zeroed via memset() upon creation in fork. Then the section from parent->p_startcopy is copied to p->p_startcopy using memcpy(). Following are the fields that are copied:
#define p_startcopy p_sigmask
    sigset_t p_sigmask; /* Current signal mask. */

    u_char  p_priority; /* [s] Process priority. */
    u_char  p_usrpri;   /* [s] User-prio based on p_estcpu & ps_nice. */
    int p_pledge_syscall;   /* Cache of current syscall */

    struct  ucred *p_ucred;     /* cached credentials */
    struct  sigaltstack p_sigstk;   /* sp & on stack state variable */

    u_long  p_prof_addr;    /* tmp storage for profiling addr until AST */
    u_long  p_prof_ticks;   /* tmp storage for profiling ticks until AST */

/* End area that is copied on creation. */
#define p_endcopy   p_addr
  • Coming back to thread_new(), crhold(p->p_ucred) increments the reference count in the struct ucred structure, that is, p->p_ucred->cr_ref++.
  • Then uaddr is cast to (struct user *) and stored in p->p_addr, the kernel virtual address of the u-area.
  • Now it initializes the timeouts.
  • The snippet below shows what timeout_set(timeout, fn, argument) does:
void
timeout_set(struct timeout *new, void (*fn)(void *), void *arg)
{
        new->to_func = fn;
        new->to_arg = arg;
        new->to_flags = TIMEOUT_INITIALIZED;
}
  • The above function only initializes the timeout structure with fn and arg; fn(arg) is invoked later, once the timeout has been armed and expires (a usage sketch is given below).
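
For illustration, here is a minimal kernel-style sketch of how such a timeout might be armed with timeout_add(9). The callback name and the tick count are made up to show the pattern; this is not what fork1() itself does with p_sleep_to:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>     /* for hz */
#include <sys/timeout.h>

static struct timeout example_to;

/* hypothetical callback, invoked when the timeout fires */
static void
example_expire(void *arg)
{
    /* ... e.g. wake up the thread passed in arg ... */
}

static void
example_arm(void *arg)
{
    timeout_set(&example_to, example_expire, arg); /* only records fn + arg */
    timeout_add(&example_to, hz);                  /* arm it: fire in hz ticks (~1 second) */
}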
  • scheduler_fork_hook(parent, p): a macro that copies the parent's p_estcpu into the child's p_estcpu.
  • p_estcpu holds an estimate of the amount of CPU that the process has used recently
/* Inherit the parent’s scheduler history */
#define scheduler_fork_hook(parent, child) do {    \
 (child)->p_estcpu = (parent)->p_estcpu;           \
} while (0)
  • Then the newly created thread p is returned.
  • The other important function is process_new(), which creates the process structure in a similar fashion to thread_new().

Definition of process_new():

/*
 * Allocate and initialize a new process.
 */
struct process *
process_new(struct proc *p, struct process *parent, int flags)
{
    struct process *pr;

    pr = pool_get(&process_pool, PR_WAITOK);

    /*
     * Make a process structure for the new process.
     * Start by zeroing the section of proc that is zero-initialized,
     * then copy the section that is copied directly from the parent.
     */
    memset(&pr->ps_startzero, 0,
        (caddr_t)&pr->ps_endzero - (caddr_t)&pr->ps_startzero);
    memcpy(&pr->ps_startcopy, &parent->ps_startcopy,
        (caddr_t)&pr->ps_endcopy - (caddr_t)&pr->ps_startcopy);

    process_initialize(pr, p);
    pr->ps_pid = allocpid();
    lim_fork(parent, pr);

    /* post-copy fixups */
    pr->ps_pptr = parent;

    /* bump references to the text vnode (for sysctl) */
    pr->ps_textvp = parent->ps_textvp;
    if (pr->ps_textvp)
        vref(pr->ps_textvp);

    /* copy unveil if unveil is active */
    unveil_copy(parent, pr);

    pr->ps_flags = parent->ps_flags &
        (PS_SUGID | PS_SUGIDEXEC | PS_PLEDGE | PS_EXECPLEDGE | PS_WXNEEDED);
    if (parent->ps_session->s_ttyvp != NULL)
        pr->ps_flags |= parent->ps_flags & PS_CONTROLT;
...
...
}
  • The beginning of process_new() is similar to thread_new(): it takes a process structure from process_pool via pool_get(), zeroes one section using memset() and copies another section from the parent using memcpy(). Next comes the initialization of the process via process_initialize(), whose definition is given below:
/*
 * Initialize common bits of a process structure, given the initial thread.
 */
void
process_initialize(struct process *pr, struct proc *p)
{
    /* initialize the thread links */
    pr->ps_mainproc = p;
    TAILQ_INIT(&pr->ps_threads);
    TAILQ_INSERT_TAIL(&pr->ps_threads, p, p_thr_link);
    pr->ps_refcnt = 1;
    p->p_p = pr;

    /* give the process the same creds as the initial thread */
    pr->ps_ucred = p->p_ucred;
    crhold(pr->ps_ucred);
    KASSERT(p->p_ucred->cr_ref >= 2);   /* new thread and new process */

    LIST_INIT(&pr->ps_children);
    LIST_INIT(&pr->ps_ftlist);
    LIST_INIT(&pr->ps_kqlist);
    LIST_INIT(&pr->ps_sigiolst);

    mtx_init(&pr->ps_mtx, IPL_MPFLOOR);

    timeout_set(&pr->ps_realit_to, realitexpire, pr);
    timeout_set(&pr->ps_rucheck_to, rucheck, pr);
}

process_initialize() internals:

  • ps_mainproc: the original and main thread in the process. It is only special for the handling of p_xstat and for some signal and ptrace behaviours that need to be fixed.
  • Store the initial thread p in pr->ps_mainproc.
  • Initialize the thread queue referenced by the head pr->ps_threads, then insert p at the tail of that queue.
  • Set the reference count to 1, that is, pr->ps_refcnt = 1.
  • Point the initial thread back at its process, that is, p->p_p = pr.
  • Give the process the same credentials as the initial thread.
  • KASSERT checks that the credential reference count is at least 2 (one reference for the new thread, one for the new process).
  • Initialize the lists referenced by their heads: pr->ps_children, pr->ps_ftlist, pr->ps_kqlist and pr->ps_sigiolst.
  • Again, initialize timeouts (for details, see thread_new()).
  • After the process initialization, PID allocation takes place: pr->ps_pid = allocpid(), where allocpid() returns an unused PID.
  • allocpid() internally calls arc4random_uniform(), which in turn uses arc4random(), so a randomized number is returned and used as the PID.
  • Then, to ensure the PID is really unused, it verifies whether the new PID is already taken. It checks processes, process groups and zombie processes one by one using ispidtaken(pid_t pid), which internally calls the following functions:
    • prfind(pid_t pid): Locate a process by number
    • pgfind(pid_t pgid): Locate a process group by number
    • zombiefind(pid_t pid): Locate a zombie process by number
/*
 * Checks for current use of a pid, either as a pid or pgid.
 */
pid_t oldpids[128];
int
ispidtaken(pid_t pid)
{
    uint32_t i;

    for (i = 0; i < nitems(oldpids); i++)
        if (pid == oldpids[i])
            return (1);

    if (prfind(pid) != NULL)
        return (1);
    if (pgfind(pid) != NULL)
        return (1);
    if (zombiefind(pid) != NULL)
        return (1);
    return (0);
}

/* Find an unused pid */
pid_t
allocpid(void)
{
    static pid_t lastpid;
    pid_t pid;

    if (!randompid) {
        /* only used early on for system processes */
        pid = ++lastpid;
    } else {
        /* Find an unused pid satisfying lastpid < pid <= PID_MAX */
        do {
            pid = arc4random_uniform(PID_MAX - lastpid) + 1 +
                lastpid;
        } while (ispidtaken(pid));
    }

    return pid;
}
  • Store the pointer to the parent process in pr->ps_pptr.
  • Increment the reference count on the process limit structure, struct plimit (via lim_fork()).
  • Store the vnode of the parent's executable in pr->ps_textvp, that is, pr->ps_textvp = parent->ps_textvp;
if (pr->ps_textvp)
        vref(pr->ps_textvp); /* vref --> vnode reference */
  • The above snippet means: if a valid vnode is found, increment the v_usecount variable inside the executable's struct vnode. Now, the calculation for setting up the process flags:
pr->ps_flags = parent->ps_flags & (PS_SUGID | PS_SUGIDEXEC | PS_PLEDGE | PS_EXECPLEDGE | PS_WXNEEDED);
pr->ps_flags = parent->ps_flags & (0x10 | 0x20 | 0x100000 | 0x400000 | 0x200000)
if (vnode of the controlling terminal != NULL)
        pr->ps_flags |= parent->ps_flags & PS_CONTROLT;

process_new() continued…

    /*
     * Duplicate sub-structures as needed.
     * Increase reference counts on shared objects.
     */
    if (flags & FORK_SHAREFILES)
        pr->ps_fd = fdshare(parent);
    else
        pr->ps_fd = fdcopy(parent);
    if (flags & FORK_SIGHAND)
        pr->ps_sigacts = sigactsshare(parent);
    else
        pr->ps_sigacts = sigactsinit(parent);
    if (flags & FORK_SHAREVM)
        pr->ps_vmspace = uvmspace_share(parent);
    else
        pr->ps_vmspace = uvmspace_fork(parent);

    if (parent->ps_flags & PS_PROFIL)
        startprofclock(pr);
    if (flags & FORK_PTRACE)
        pr->ps_flags |= parent->ps_flags & PS_TRACED;
    if (flags & FORK_NOZOMBIE)
        pr->ps_flags |= PS_NOZOMBIE;
    if (flags & FORK_SYSTEM)
        pr->ps_flags |= PS_SYSTEM;

    /* mark as embryo to protect against others */
    pr->ps_flags |= PS_EMBRYO;

    /* Force visibility of all of the above changes */
    membar_producer();

    /* it's sufficiently inited to be globally visible */
    LIST_INSERT_HEAD(&allprocess, pr, ps_list);

    return pr;
}

if-else conditions explained:

- if child_able_to_share_file_descriptor_table_with_parent:
         pr->ps_fd = fdshare(parent)      /* share the table */
  else
         pr->ps_fd = fdcopy(parent)       /* copy the table */
- if child_able_to_share_the_parent's_signal_actions:
         pr->ps_sigacts = sigactsshare(parent) /* share */
  else
         pr->ps_sigacts = sigactsinit(parent)  /* copy */
- if child_able_to_share_the_parent's addr space:
         pr->ps_vmspace = uvmspace_share(parent)
  else
         pr->ps_vmspace = uvmspace_fork(parent)
- if process_able_to_start_profiling:
         startprofclock(pr);    /* start profiling on a process */
- if check_child_able_to_start_ptracing:
         pr->ps_flags |= parent->ps_flags & PS_TRACED
- if check_no_signal_or_zombie_at_exit:
         pr->ps_flags |= PS_NOZOMBIE /* No signal or zombie at exit */
- if check_system_process_no_signals_stats_or_swapping:
         pr->ps_flags |= PS_SYSTEM

Update pr->ps_flags by ORing in PS_EMBRYO, that is, pr->ps_flags |= PS_EMBRYO /* New process, not yet fledged */

membar_producer() → Force visibility of all of the above changes: all stores preceding the memory barrier will reach global visibility before any stores after the memory barrier reach global visibility.

In short, as per my understanding based on a discussion with the OpenBSD community, it forces the changes made so far to become globally visible before the new process is published.
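
As an illustration of the pattern (a hedged sketch with made-up names, not the fork1() code itself): a producer fully initializes an object, issues membar_producer(), and only then publishes the pointer, so a reader on another CPU that sees the pointer also sees the initialized fields.

#include <sys/param.h>
#include <sys/atomic.h>

struct widget {
    int w_ready;
    int w_value;
};

struct widget *published_widget;    /* read by other CPUs */

void
publish_widget(struct widget *w)
{
    /* initialize every field first ... */
    w->w_value = 42;
    w->w_ready = 1;

    /* ... make those stores globally visible before the pointer store ... */
    membar_producer();

    /* ... then publish; readers that see the pointer see a complete object */
    published_widget = w;
}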

Finally, insert the new element pr at the head of the list; here the head is allprocess. Then process_new() returns pr.

fork1() code continued…

...
...
...
    p->p_fd     = pr->ps_fd;
    p->p_vmspace    = pr->ps_vmspace;
    if (pr->ps_flags & PS_SYSTEM)
        atomic_setbits_int(&p->p_flag, P_SYSTEM);

    if (flags & FORK_PPWAIT) {
        atomic_setbits_int(&pr->ps_flags, PS_PPWAIT);
        atomic_setbits_int(&curpr->ps_flags, PS_ISPWAIT);
    }

#ifdef KTRACE
    /*
     * Copy traceflag and tracefile if enabled.
     * If not inherited, these were zeroed above.
     */
    if (curpr->ps_traceflag & KTRFAC_INHERIT)
        ktrsettrace(pr, curpr->ps_traceflag, curpr->ps_tracevp,
            curpr->ps_tracecred);
#endif

    /*
     * Finish creating the child thread.  cpu_fork() will copy
     * and update the pcb and make the child ready to run.  If
     * this is a normal user fork, the child will exit directly
     * to user mode via child_return() on its first time slice
     * and will not return here.  If this is a kernel thread,
     * the specified entry point will be executed.
     */
    cpu_fork(curp, p, NULL, NULL, func, arg ? arg : p);

    vm = pr->ps_vmspace;

    if (flags & FORK_FORK) {
        forkstat.cntfork++;
        forkstat.sizfork += vm->vm_dsize + vm->vm_ssize;
    } else if (flags & FORK_VFORK) {
        forkstat.cntvfork++;
        forkstat.sizvfork += vm->vm_dsize + vm->vm_ssize;
    } else {
        forkstat.cntkthread++;
    }

    if (pr->ps_flags & PS_TRACED && flags & FORK_FORK)
        newptstat = malloc(sizeof(*newptstat), M_SUBPROC, M_WAITOK);

    p->p_tid = alloctid();

    LIST_INSERT_HEAD(&allproc, p, p_list);
    LIST_INSERT_HEAD(TIDHASH(p->p_tid), p, p_hash);
    LIST_INSERT_HEAD(PIDHASH(pr->ps_pid), pr, ps_hash);
    LIST_INSERT_AFTER(curpr, pr, ps_pglist);
    LIST_INSERT_HEAD(&curpr->ps_children, pr, ps_sibling);

The substructure pointers p->p_fd and p->p_vmspace are copied directly from pr->ps_fd and pr->ps_vmspace:

/* substructures: */
struct filedesc *p_fd;      /* copy of p_p->ps_fd */
struct vmspace *p_vmspace;  /* copy of p_p->ps_vmspace */             

if (the process is a system process, PS_SYSTEM: no signals, stats or swapping) then atomically set the corresponding thread flag with atomic_setbits_int(&p->p_flag, P_SYSTEM);

if (the child suspends the parent process until the child terminates (by calling _exit(2) or abnormally) or calls execve(2), i.e. the vfork(2)/FORK_PPWAIT case) then atomically set the bits: atomic_setbits_int(&pr->ps_flags, PS_PPWAIT) on the child and atomic_setbits_int(&curpr->ps_flags, PS_ISPWAIT) on the parent.
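
This is exactly the semantics that vfork(2) relies on. A minimal user-space sketch of the pattern (illustrative only): the parent stays suspended until the child has either exec'd or exited.

#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <unistd.h>

int
main(void)
{
    pid_t pid = vfork();    /* parent is suspended: PS_PPWAIT / PS_ISPWAIT */

    if (pid == -1)
        err(1, "vfork");
    if (pid == 0) {
        /* after vfork the child may only execve or _exit */
        execl("/bin/ls", "ls", (char *)NULL);
        _exit(127);
    }
    /* the parent resumes here only once the child has exec'd or exited */
    waitpid(pid, NULL, 0);
    return 0;
}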

#ifdef KTRACE
/* Some KTRACE related things */
#endif
cpu_fork(curp, p, NULL, NULL, func, arg ? arg : p)

/*
 * Finish creating the child thread. cpu_fork() will copy
 * and update the pcb and make the child ready to run. The
 * child will exit directly to user mode via child_return()
 * on its first time slice and will not return here.
 */

That is, cpu_fork() copies and updates the PCB and makes the child ready to run.

The address space: vm = pr->ps_vmspace

if (the call was made by the fork() syscall); then
    increment the count of fork() system calls (forkstat.cntfork)
    and add the size of the data and stack segments (vm_dsize + vm_ssize) to forkstat.sizfork.
else if (the call was made by the vfork() syscall); then
    do the same, but for the vfork counters (cntvfork, sizvfork).
else
    increment the number of kernel threads created (cntkthread).

if (the process is being traced && it was created by the fork system call); then
{
    malloc(9) allocates uninitialized memory in the kernel address space for an object whose size is
    specified here as sizeof(*newptstat), where newptstat is a struct ptrace_state pointer.
}

Allocate the thread ID, that is, p->p_tid = alloctid();

/* Find an unused tid */
pid_t
alloctid(void)
{
    pid_t tid;

    do {
        /* (0 .. TID_MASK+1] */
        tid = 1 + (arc4random() & TID_MASK);
    } while (tfind(tid) != NULL);

    return (tid);
}

alloctid() calls arc4random() directly and uses the tfind() function to check whether the candidate thread ID is already in use.

  • Insert the new element p at the head of the allproc list.
  • Insert the new element p at the head of its thread (TID) hash bucket.
  • Insert the new element pr at the head of its process (PID) hash bucket.
  • Insert the new element pr after curpr on the process-group list (ps_pglist).
  • Insert the new element pr at the head of the parent's children list (ps_sibling). A small sketch of these list macros follows.
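
These inserts use the <sys/queue.h> (queue(3)) list macros. A minimal, self-contained sketch of how LIST_INIT, LIST_INSERT_HEAD and LIST_FOREACH behave, using a made-up struct rather than the kernel's proc structures:

#include <sys/queue.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
    int              value;
    LIST_ENTRY(node) link;          /* embedded linkage, like p_list/ps_list */
};

LIST_HEAD(nodelist, node);          /* declares the head type "struct nodelist" */

int
main(void)
{
    struct nodelist head;
    struct node *n;
    int i;

    LIST_INIT(&head);                       /* like LIST_INIT(&pr->ps_children) */

    for (i = 0; i < 3; i++) {
        if ((n = malloc(sizeof(*n))) == NULL)
            err(1, "malloc");
        n->value = i;
        LIST_INSERT_HEAD(&head, n, link);   /* like LIST_INSERT_HEAD(&allproc, p, p_list) */
    }

    LIST_FOREACH(n, &head, link)            /* walk the list */
        printf("%d\n", n->value);           /* prints 2 1 0: newest element first */
    return 0;
}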

fork1() continued…


    if (pr->ps_flags & PS_TRACED) {
        pr->ps_oppid = curpr->ps_pid;
        if (pr->ps_pptr != curpr->ps_pptr)
            proc_reparent(pr, curpr->ps_pptr);

        /*
         * Set ptrace status.
         */
        if (newptstat != NULL) {
            pr->ps_ptstat = newptstat;
            newptstat = NULL;
            curpr->ps_ptstat->pe_report_event = PTRACE_FORK;
            pr->ps_ptstat->pe_report_event = PTRACE_FORK;
            curpr->ps_ptstat->pe_other_pid = pr->ps_pid;
            pr->ps_ptstat->pe_other_pid = curpr->ps_pid;
        }
    }

    /*
     * For new processes, set accounting bits and mark as complete.
     */
    getnanotime(&pr->ps_start);
    pr->ps_acflag = AFORK;
    atomic_clearbits_int(&pr->ps_flags, PS_EMBRYO);

    if ((flags & FORK_IDLE) == 0)
        fork_thread_start(p, curp, flags);
    else
        p->p_cpu = arg;

    free(newptstat, M_SUBPROC, sizeof(*newptstat));

    /*
     * Notify any interested parties about the new process.
     */
    KNOTE(&curpr->ps_klist, NOTE_FORK | pr->ps_pid);

    /*
     * Update stats now that we know the fork was successful.
     */
    uvmexp.forks++;
    if (flags & FORK_PPWAIT)
        uvmexp.forks_ppwait++;
    if (flags & FORK_SHAREVM)
        uvmexp.forks_sharevm++;

    /*
     * Pass a pointer to the new process to the caller.
     */
    if (rnewprocp != NULL)
        *rnewprocp = p;

if (the process is being traced, PS_TRACED)
{
    save the parent process ID for the duration of ptracing, that is,
    pr->ps_oppid = curpr->ps_pid;
    if (the child's parent pointer != the current process's parent pointer)
    {
        proc_reparent(pr, curpr->ps_pptr);  /* make curpr's parent the new parent of the child pr */
    }
}

Now check whether newptstat contains an address; in our case, newptstat holds a kernel virtual address returned by malloc(9).

If that condition is true, that is, newptstat != NULL, then the ptrace status is set:

  • Point pr->ps_ptstat at the newly allocated ptrace_state structure, then set newptstat to NULL (ownership has been handed over).
  • Update the ptrace status of both the curpr process and the pr process:
curpr->ps_ptstat->pe_report_event = PTRACE_FORK;
pr->ps_ptstat->pe_report_event = PTRACE_FORK;
curpr->ps_ptstat->pe_other_pid = pr->ps_pid;
pr->ps_ptstat->pe_other_pid = curpr->ps_pid;

For the new process, set the accounting bits and mark it as complete:

  • Record the process start time via getnanotime().
  • Set the accounting flag to AFORK, which means forked but not yet execed.
  • Atomically clear the PS_EMBRYO bit, marking the process as fully created.
  • Then check whether FORK_IDLE was requested; if it was not, make the new thread runnable and add it to the run queue via fork_thread_start().
  • If FORK_IDLE was requested (an idle thread), just record the CPU it will run on: p->p_cpu = arg.
  • Free the memory (the kernel virtual address allocated by malloc() for newptstat) through free(); this is a no-op if newptstat is NULL.

Notify any interested parties about the new process via KNOTE

Now, update the stats counters now that we know the fork was successful:

        uvmexp.forks++; /* -->For forks */ 

        if (flags & FORK_PPWAIT)
                uvmexp.forks_ppwait++; /* --> counter for forks where parent waits */
        if (flags & FORK_SHAREVM)
                uvmexp.forks_sharevm++; /* --> counter for forks where vmspace is shared */

Now, pass a pointer to the new process to the caller:

if (rnewprocp != NULL)
        *rnewprocp = p;
    /*
     * Preserve synchronization semantics of vfork.  If waiting for
     * child to exec or exit, set PS_PPWAIT on child and PS_ISPWAIT
     * on ourselves, and sleep on our process for the latter flag
     * to go away.
     * XXX Need to stop other rthreads in the parent
     */
    if (flags & FORK_PPWAIT)
        while (curpr->ps_flags & PS_ISPWAIT)
            tsleep(curpr, PWAIT, "ppwait", 0);

    /*
     * If we're tracing the child, alert the parent too.
     */
    if ((flags & FORK_PTRACE) && (curpr->ps_flags & PS_TRACED))
        psignal(curp, SIGTRAP);

    /*
     * Return child pid to parent process
     */
    if (retval != NULL) {
        retval[0] = pr->ps_pid;
        retval[1] = 0;
    }
    return (0);
}

fork1() sets PS_PPWAIT on the child and PS_ISPWAIT on ourselves (the parent), and then sleeps on our process via tsleep() until the latter flag goes away. Next it checks: if the child was started with tracing enabled && the current process is being traced, the parent is alerted with the SIGTRAP signal. Finally, the child PID is returned to the parent process. In the debugger we can then see that after fork1(), execution jumps to the sys/arch/amd64/amd64/trap.c file for system call return handling and for setting up the frame.

Some machine-independent (MI) functions are defined in the sys/sys/syscall_mi.h file, such as mi_syscall(), mi_syscall_return() and mi_child_return().

After the system call handling in “trap.c”, control passes to the sys_execve() system call.

References:

  • OpenBSD Source Codes
  • OpenBSD kernel Internals — The Hitchhiker’s Guide
  • OpenBSD manual pages
  • BSD Virtual Memory
  • NetBSD manual pages
  • FreeBSD manual pages
  • Understanding The Linux Kernel
  • Linux Kernel Development - Robert Love
  • Google :)

Finally!! If something is missing or not correct, please feel free to update.

Happy Kernel Hacking