Difference between Process and Thread

Processes and threads are both units of execution managed by the operating system, and they are easy to mix up. The short version: a process is a running instance of a program with its own memory, while a thread is a lighter-weight unit of execution that lives inside a process and shares that memory with its siblings. This post looks at each in turn, then summarizes how the two differ.

Process

What Is a Process?

A process is an instance of a computer program that is being executed. It holds the program code together with its current activity. Depending on the operating system (OS), a single process may be made up of multiple threads of execution that run instructions concurrently.

Process States

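An operating system kernel that supports multitasking needs its processes to carry a state. The names of these states are not standardized across systems, but their meanings are broadly similar: a process is typically created (new), ready to run, running, blocked (waiting on an event such as I/O), or terminated.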

Orphan and Zombie Processes

Orphan Process

An orphan process is a process whose parent has finished or terminated while the process itself is still running.

In a Unix-like operating system, any orphaned process is immediately adopted by the special init system process (PID 1) or, on some systems, by a designated “subreaper” process. This operation, called re-parenting, happens automatically. Even though the process now technically has a new parent, it is still called an orphan because the process that originally created it no longer exists.
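Re-parenting can be observed directly. The following is a minimal sketch for Unix-like systems only, not a definitive recipe: an intermediate process creates a child and exits at once, and the orphaned child waits until `os.getppid()` reports a different parent. The function name `observe_reparenting` and the pipe used to report the result are illustrative choices.

```python
import os
import time


def observe_reparenting(timeout: float = 5.0) -> tuple[int, int]:
    """Return (creator_pid, adopter_pid) as seen by an orphaned process."""
    read_fd, write_fd = os.pipe()
    middle_pid = os.fork()
    if middle_pid == 0:
        # Intermediate process: create the future orphan, then exit at once.
        try:
            os.close(read_fd)
            creator = os.getpid()  # recorded before fork, inherited by the child
            if os.fork() == 0:
                # Orphan-to-be: poll until the kernel assigns a new parent.
                deadline = time.monotonic() + timeout
                while os.getppid() == creator and time.monotonic() < deadline:
                    time.sleep(0.01)
                os.write(write_fd, f"{creator} {os.getppid()}".encode())
        finally:
            os._exit(0)  # neither forked process may return to the caller

    # Original process: reap the intermediate, then read the orphan's report.
    os.close(write_fd)
    os.waitpid(middle_pid, 0)
    with os.fdopen(read_fd) as reader:
        creator, adopter = map(int, reader.read().split())
    return creator, adopter


if __name__ == "__main__":
    creator, adopter = observe_reparenting()
    print(f"created by PID {creator}, adopted by PID {adopter}")
```

The adopter is often PID 1, but inside containers or under a subreaper it can be a different process, so the sketch only relies on the parent PID changing.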

A process can be orphaned unintentionally, such as when the parent terminates or crashes. The process group mechanism in most Unix-like systems helps guard against accidental orphaning: in coordination with the user’s shell, it tries to terminate all child processes with the SIGHUP signal rather than letting them keep running as orphans.

Zombie Process

On Unix and Unix-like operating systems, a zombie process (or defunct process) is a process that has completed execution (via the exit system call) but still has an entry in the process table; it is a process in the “terminated” state. This occurs for child processes, where the entry is kept so that the parent can read the child’s exit status. Once that status is read via the wait system call, the zombie’s entry is removed from the process table and it is said to be “reaped”. A child process normally passes through the zombie state before its entry is removed from the table. Under normal operation, zombies are waited on by their parent and reaped almost immediately; processes that stay zombies for a long time usually indicate a bug in the parent and amount to a resource leak.

The term derives from the common idea of a zombie, an undead person: the child process has “died” but has not yet been “reaped”. Unlike a normal process, a zombie is unaffected by the kill command, because it has already terminated.
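The exit-then-reap sequence described above can be sketched in a few lines. This is a hedged, Unix-only illustration; `reap_child` is a hypothetical helper name. The child exits with a chosen code, the parent collects it with `os.waitpid`, and a second wait on the same PID fails because the process table entry is gone.

```python
import os


def reap_child(exit_code: int = 7) -> tuple[int, bool]:
    """Fork a child that exits immediately; return (its exit code, reaped?)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate right away

    # If the child exits before this call, it lingers as a zombie until the
    # parent reads its status here.
    _, status = os.waitpid(pid, 0)

    # Once reaped, the entry is gone, so waiting again raises ChildProcessError.
    try:
        os.waitpid(pid, 0)
        reaped = False
    except ChildProcessError:
        reaped = True

    return os.WEXITSTATUS(status), reaped


if __name__ == "__main__":
    print(reap_child(7))
```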

Inter-Process Communication (IPC)

Inter-process communication (IPC) is a set of programming interfaces that let a programmer coordinate activities among different processes that run concurrently in an operating system. This allows a program to handle many user requests at the same time. Since even a single user request may spawn multiple processes on the user’s behalf, those processes need to communicate, and the IPC interfaces make that possible. Each IPC method has its own advantages and limitations, so it is not unusual for a single program to use several of them.

IPC methods include pipes and named pipes, message queues, semaphores, shared memory, and sockets.
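As an illustration of the simplest of these methods, here is a minimal sketch of an anonymous pipe between a parent and its child on a Unix-like system. The helper name `receive_from_child` is an assumption made for this example, and it is only meant for short messages.

```python
import os


def receive_from_child(message: bytes) -> bytes:
    """Have a child process send `message` to its parent through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end and send the message.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)

    # Parent: close the unused write end so that read() can see end-of-file.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)  # reap the child so it does not stay a zombie
    return data


if __name__ == "__main__":
    print(receive_from_child(b"hello from the child"))
```

The two processes have separate memory, so the bytes have to travel through the kernel; that is the essential difference from thread communication discussed later in this post.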

Thread

What Is a Thread?

A thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler (typically part of an operating system). The implementation of threads and processes differs from one OS to another, but in most cases a thread is a component of a process. Multiple threads can exist within the same process and share resources such as memory. In particular, the threads of a process share its instructions (the code) and its context (the values its variables reference at any given moment).
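A small sketch makes the "component of a process" idea concrete. Assuming Python's standard `threading` module, each thread below runs the same function, records the process ID it sees together with its own thread identifier, and writes the pair into a list that all threads share. The name `run_threads` is hypothetical.

```python
import os
import threading


def run_threads(count: int = 4) -> list[tuple[int, int]]:
    """Return one (process id, thread id) pair per thread."""
    results: list[tuple[int, int]] = [(0, 0)] * count  # shared by all threads
    barrier = threading.Barrier(count)

    def worker(slot: int) -> None:
        # Every thread runs this same code and writes into the same list.
        results[slot] = (os.getpid(), threading.get_ident())
        barrier.wait()  # keep all threads alive together so ids stay distinct

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for pid, ident in run_threads():
        print(f"process {pid}, thread {ident}")
```

Every pair carries the same process ID but a different thread identifier, and the main thread can read the results without any explicit transfer because the list lives in memory that all of the threads share.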

Multithreading

Multithreading is a widespread programming and execution model that allows multiple threads to exist within the context of a single process. These threads share the process’s resources but execute independently, giving developers a useful abstraction of concurrent execution. Multithreading can also be applied to a single process to enable parallel execution on a multiprocessing system.

On a single processor, multithreading is generally implemented by time-division multiplexing (as in multitasking): the CPU switches between different threads. This context switching usually happens often enough that the user perceives the threads as running at the same time. On a multiprocessor or multi-core system, threads can run truly in parallel, with each core executing a separate thread simultaneously. Note that the hardware threads the OS uses to implement multiprocessing are distinct from the software threads described above; software threads are a pure software construct, and the CPU has no notion of their existence.

Drawbacks of Threads

  • Synchronization: Because threads share the same address space, the programmer must be careful to avoid race conditions and other non-intuitive behavior. To manipulate data correctly, threads often need to coordinate in time so that operations happen in the right order, and they may require mutually exclusive operations (often implemented with mutexes or semaphores) to prevent shared data from being modified and read simultaneously. Careless use of such primitives can lead to deadlocks.
  • A crashing thread takes down the process: An illegal operation performed by one thread crashes the entire process, so a single misbehaving thread can disrupt every other thread in the application.
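The synchronization point can be illustrated with a shared counter. This is a minimal sketch using `threading.Lock` from the Python standard library; the function name `count_with_lock` is illustrative. The lock makes each read-modify-write step mutually exclusive, so no increment is lost regardless of how the threads interleave.

```python
import threading


def count_with_lock(thread_count: int = 4, increments: int = 10_000) -> int:
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread at a time may update the counter
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())  # 4 threads x 10,000 increments each
```

Without the lock, `counter += 1` is a separate read and write, and two threads may overwrite each other's update; whether that actually happens on a given run depends on the interpreter and on timing, which is exactly what makes race conditions hard to debug.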

User-Level vs. Kernel-Level Threads

There are two distinct models of thread control: user-level threads and kernel-level threads. The thread library that implements user-level threads typically runs on top of the system in user mode, so these threads are invisible to the operating system. They have extremely low overhead and can achieve high performance for computation. However, a blocking system call such as read() blocks the entire process, and scheduling controlled by the thread runtime may let some threads monopolize the CPU and starve others. Access to multiple processors is also not guaranteed, since the OS is unaware that these threads exist.

Kernel-level threads, on the other hand, do guarantee access to multiple processors, but their computing performance is lower than that of user-level threads because of the added load on the system. Synchronizing and sharing resources among kernel-level threads is still cheaper than in a multi-process model, though more expensive than with user-level threads. Many thread libraries today are implemented as a hybrid model that draws advantages from both, and the key design consideration is how to minimize system overhead while still providing access to multiple processors.
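To get a feel for user-level scheduling, consider the following toy sketch. It is an analogy rather than a real thread library: Python generators stand in for user-level threads, and a hypothetical `run_round_robin` function plays the role of the thread runtime. Everything runs inside a single OS thread, so the kernel never sees the individual tasks.

```python
from collections import deque
from typing import Generator, Iterable


def task(name: str, steps: int, log: list[str]) -> Generator[None, None, None]:
    """A cooperative task that records each step and then yields control."""
    for step in range(steps):
        log.append(f"{name}{step}")
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks: Iterable[Generator[None, None, None]]) -> None:
    """Run tasks in turn until all have finished (a toy user-mode scheduler)."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished; do not reschedule it
        ready.append(current)


if __name__ == "__main__":
    log: list[str] = []
    run_round_robin([task("A", 2, log), task("B", 3, log)])
    print(log)  # ['A0', 'B0', 'A1', 'B1', 'B2']
```

Switching between tasks here is just a function call, which hints at why user-level threads are cheap. The sketch also shows the weaknesses mentioned above: a task that never yields would monopolize the scheduler, and a blocking call inside any task would stall all of them.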

Process vs. Thread

| Aspect | Process | Thread |
| --- | --- | --- |
| Independence | Typically independent | Exists as a subset of a process |
| State | Carries considerably more state information | Shares process state, memory, and other resources with sibling threads |
| Address space | Has its own separate address space | Shares the process’s address space |
| Communication | Interacts only through OS-provided IPC mechanisms | Communicates directly via shared memory |
| Context switching | Slower, since more state must be saved and restored | Faster between threads in the same process |
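The address-space difference is easy to see side by side. In this hedged, Unix-only sketch (the function names are illustrative), the same list is modified once from a forked child process and once from a thread. Only the thread's change is visible to the caller.

```python
import os
import threading


def mutate_in_child_process() -> list[str]:
    """Modify a list in a forked child; the parent's copy is unaffected."""
    data = ["original"]
    pid = os.fork()
    if pid == 0:
        try:
            data.append("from child")  # changes only the child's private copy
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return data


def mutate_in_thread() -> list[str]:
    """Modify a list in a thread; the change is visible to the caller."""
    data = ["original"]
    worker = threading.Thread(target=data.append, args=("from thread",))
    worker.start()
    worker.join()
    return data


if __name__ == "__main__":
    print(mutate_in_child_process())  # ['original']
    print(mutate_in_thread())         # ['original', 'from thread']
```

To get the child's change back into the parent, the two processes would need one of the IPC mechanisms, whereas the thread needs nothing beyond a reference to the same object.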
