Questions About Chapter 03. Processes

schuam · December 2020

Hi everyone,

I'm new here. My name is Andreas. I've actually been using Linux on my personal laptop for quite some years now, but I decided that it was time to dig a little deeper into the operating system and its inner workings, and this course seemed to be a good way to gain some knowledge.

Anyways, I read through chapter 03 "Processes" and three questions came up:

What exactly is the difference between a thread/task and a process?
The section "Programs, Processes, and Threads" starts out by defining what a program is, what a process is, and then talks about tasks/threads without really defining these two terms. In section "Creating Processes" one can read "(...) in Linux, there really is not much difference on a technical level between creating a full process or just a new thread, as each mechanism takes about the same time and uses roughly the same amount of resources." I still don't really understand the difference.
When a process creates a child process, does that always happen via an exec system call?
In section "Creating Processes in a Command Shell" the first sentence of the third bullet point reads as: "The command is loaded onto the child process's space via the exec system call." Doesn't that mean that the parent process always terminates? Or did I understand the section before ("Creating Processes") wrong? I thought when fork is used the parent process keeps running and when exec is used the parent process terminates.
When a program uses shared libraries, a linker has to link them into the program at the start of the program, correct? Does that mean that the procedure to start a program that uses shared libraries takes longer compared to a program that uses static libraries?

I'd appreciate any help in figuring that stuff out.

Take care everyone,
Andreas

coop · December 2020

A process has all sorts of resources (memory, etc.) dedicated to it. When it has multiple threads, they share those resources rather than each having their own. In other operating systems (e.g., Windows), there are a lot of steps to create a new process, but only a few to create a new thread. In Linux, there are really very few differences between making a new process (say with one thread) and adding a thread to an existing process, mostly because Linux has been extensively optimized to make a process or thread quickly and efficiently with fewer operations. Also, scheduling is done at the thread level, not the process level, which makes a big difference in run-time operations.
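
To make the resource sharing concrete, here is a minimal Python sketch (Python is used only for illustration, and the names counter, seen, bump and demo are made up for this example). It assumes a Unix-like system where os.fork() is available. A thread changes data that the rest of the process can see, while a forked child only changes its own copy.

```python
import os
import threading

counter = {"value": 0}  # lives in this process's memory
seen = {}               # filled in by whoever runs bump()


def bump():
    """Increment the counter and record the PID that did it."""
    counter["value"] += 1
    seen["pid"] = os.getpid()


def demo():
    counter["value"] = 0

    # A thread runs inside the same process: same PID, same memory.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    after_thread = counter["value"]
    same_pid = seen["pid"] == os.getpid()

    # A forked child is a separate process with its own copy of the memory.
    pid = os.fork()
    if pid == 0:
        bump()        # changes only the child's copy
        os._exit(0)   # leave the child without running the parent's code
    os.waitpid(pid, 0)
    after_fork = counter["value"]

    return after_thread, same_pid, after_fork


if __name__ == "__main__":
    print(demo())
```

The thread's increment shows up in the process's counter and is recorded under the same PID; the child's increment does not show up in the parent at all.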

When a new process is created, either through a fork or clone call (both using the same kernel code), the parent process may either continue or terminate. From a code view, if fork() is called, the parent continues. If exec..() is called, the parent terminates. There is not one path here.
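
As a rough sketch of how the two calls are typically combined when a shell runs a command: the shell forks, the child calls exec, and the parent waits. An exec call replaces the program of whichever process calls it, so when it is the child that calls exec, the parent keeps running. The function name run_in_child is hypothetical, and the exit status 127 for a failed exec is only borrowed from shell convention.

```python
import os
import sys


def run_in_child(argv):
    """Fork, exec argv in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process's program with the command.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed, e.g. command not found
    # Parent: still running the original program; wait for the child.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # child was ended by a signal


if __name__ == "__main__":
    code = run_in_child([sys.executable, "-c", "raise SystemExit(3)"])
    print("parent", os.getpid(), "is still running; child exited with", code)
```

If the parent called os.execvp() itself, without forking first, nothing after that call would run in the original program.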

As far as the shared libraries go, it is more efficient to share libraries, as the code needed will already be resident in memory when the process starts. That is why shared libraries exist. (If you are ancient, you remember when we used to talk about TSR programs, terminate and stay resident, which we don't anymore because we share libraries or, as is sometimes said, use "DLLs": dynamic link libraries, which is the term Windows uses.)

schuam · December 2020

Thank you very much for your fast response @coop!

@coop said:
A process has all sorts of resources (memory, etc.) dedicated to it. When it has multiple threads, they share those resources rather than each having their own. In other operating systems (e.g., Windows), there are a lot of steps to create a new process, but only a few to create a new thread. In Linux, there are really very few differences between making a new process (say with one thread) and adding a thread to an existing process, mostly because Linux has been extensively optimized to make a process or thread quickly and efficiently with fewer operations. Also, scheduling is done at the thread level, not the process level, which makes a big difference in run-time operations.

All right, all threads of a process share the resources of the process they belong to. Got it. If scheduling is done at the thread level, does that mean that every process has to have at least one thread?

@coop said:
When a new process is created, either through a fork or clone call (both using the same kernel code), the parent process may either continue or terminate. From a code view, if fork() is called, the parent continues. If exec..() is called, the parent terminates. There is not one path here.

That's what I thought: fork() -> the parent continues; exec() -> the parent terminates. But then the sentence in the chapter, "The command is loaded onto the child process's space via the exec system call.", is either not quite correct or misleading. Or is "exec system call" here a kind of synonym for either fork() or exec()?

@coop said:
As far as the shared libraries go, it is more efficient to share libraries, as the code needed will already be resident in memory when the process starts. That is why shared libraries exist. (If you are ancient, you remember when we used to talk about TSR programs, terminate and stay resident, which we don't anymore because we share libraries or, as is sometimes said, use "DLLs": dynamic link libraries, which is the term Windows uses.)

Well, I didn't consider that the code of static libraries has to be loaded into memory every time a program is started. That's a good point. But are all shared libraries actually always in memory?

coop · December 2020

A process always has to have at least one thread; otherwise, what would it do?

A process will look to see if any shared libraries it needs are loaded, and if not, it will load them. Note that almost every process/program uses some of the same shared libraries, such as libc, so there is virtually always a gain. Static libraries are usually only used in some startup programs and, these days, in third-party vendor applications, and their use there is subject to holy wars I won't get into.

To see which shared libraries a program uses, run the ldd utility on it, as in:

c8:/usr/src>ldd /usr/bin/emacs
    linux-vdso.so.1 (0x00007ffd15ddc000)
    libtiff.so.5 => /lib64/libtiff.so.5 (0x00007f8ca3690000)
    libjpeg.so.62 => /lib64/libjpeg.so.62 (0x00007f8ca3427000)
    libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f8ca31f2000)
    libgif.so.7 => /lib64/libgif.so.7 (0x00007f8ca2fe8000)
    libXpm.so.4 => /lib64/libXpm.so.4 (0x00007f8ca2dd5000)
       .....

which generates 144 lines of output.
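
ldd reports what a program would load. For a process that is already running, the files it has actually mapped are listed in /proc/<pid>/maps. The following is only a sketch, assuming Linux with /proc mounted; the function names are made up for this example, and matching on ".so" in the file name is a simple heuristic rather than a precise test.

```python
import os


def shared_objects_from_maps(lines):
    """Extract the distinct shared-object paths from maps-style lines."""
    paths = set()
    for line in lines:
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if ".so" in os.path.basename(path):
            paths.add(path)
    return sorted(paths)


def my_shared_objects(maps_path="/proc/self/maps"):
    """Shared objects mapped into this process ([] if /proc is missing)."""
    if not os.path.exists(maps_path):
        return []
    with open(maps_path) as maps:
        return shared_objects_from_maps(maps)


if __name__ == "__main__":
    for path in my_shared_objects():
        print(path)
```

Run from a Python interpreter, this prints the libraries the interpreter itself has mapped; pointing maps_path at another process's maps file (permissions allowing) does the same for that process.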

schuam · December 2020

@coop said:
a process always has to have at least one thread, otherwise what would it do?

Well, from "A process is an executing program and associated resources (...)" (see chapter 03, section "Programs, Processes, and Threads"), I figured that a process does all the work, but now it occurs to me that a process actually delegates the work to a thread (or multiple threads).

I think I understand now. Thanks @coop!

cjclm7 · December 2020

I also have a question regarding this chapter (03 Processes), so I will use this discussion instead of creating a new one.

In chapter 03 Processes, page 7, we can read:

"(...)In kernel (system) mode, the CPU has full access to all hardware on the system, including peripherals, memory, disks, etc.(...)"

Is that correct? Shouldn't it be the process instead of the CPU?

"(...)In kernel (system) mode, the process has full access to all hardware on the system, including peripherals, memory, disks, etc.(...)"

coop · December 2020

No, the process definitely does not have direct access; it is the CPU. It is perhaps not the best English, but it is correct. The process itself can do nothing without going into kernel mode, which means it is the kernel that does it.
