Shell Basics: Pipelines and Built-in Commands
2024-10-28
This blog post is a summary and reflection based on my practice implementing a shell in Rust following a tutorial.
Introduction
A shell is an interface between the user and the operating system. When you enter a command in the terminal, the shell parses it, works out which program or built-in command it refers to, and then asks the OS kernel to carry it out through system calls.
When implementing a shell, the core task is to process the user’s input and handle each part accordingly.
When a user inputs a command in the terminal, for example:
    rm -rf /path/to/directory
We need to parse it into:
- Command: rm
- Option: -rf
- Argument: /path/to/directory
Each of these parts needs to be handled separately to execute the user’s intent correctly.
Another example:
    cat access.log | grep "404"
This command involves a pipeline operation and should be parsed into two subcommands:
- Subcommand 1: cat access.log
  - Command: cat
  - Argument: access.log
- Subcommand 2: grep "404"
  - Command: grep
  - Argument: "404"
Here, the output of the first subcommand serves as the input to the second subcommand.
Implementation in Rust
In Rust, you can split the user’s input string on the pipe symbol | into multiple subcommands, then split each subcommand on whitespace into the command and its arguments. Here is a sample:
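A minimal sketch of this splitting step, using only the standard library. The function name `parse_pipeline` is hypothetical; the names `commands`, `subcommand`, `action`, and `args` follow the ones used in this post. It splits on whitespace only, so a quoted argument such as "404" keeps its quotes and arguments containing spaces are not supported.

```rust
/// Splits a raw input line into (action, args) pairs, one per pipeline stage.
/// `parse_pipeline` is a hypothetical helper name used for illustration.
fn parse_pipeline(input: &str) -> Vec<(String, Vec<String>)> {
    let mut parsed = Vec::new();

    // `commands` iterates over the subcommands separated by the pipe symbol.
    let commands = input.trim().split('|');

    for subcommand in commands {
        // Split the current subcommand on whitespace.
        let mut parts = subcommand.split_whitespace();

        // The first token is the command itself; skip empty subcommands.
        let action = match parts.next() {
            Some(action) => action.to_string(),
            None => continue,
        };

        // Every remaining token is treated as an argument.
        let args: Vec<String> = parts.map(String::from).collect();

        parsed.push((action, args));
    }

    parsed
}
```

Calling `parse_pipeline("rm -rf /path/to/directory")` yields a single stage with the action `rm` and the arguments `-rf` and `/path/to/directory`.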
In this structure:
- commands is an iterator of subcommands split by the pipe symbol.
- subcommand is the current subcommand string being processed.
- action is the command part of the subcommand.
- args are the arguments of the subcommand.
Pipelines
Concept of Pipelines
In a shell, you can use the pipe symbol | to pass the output of one command as the input to the next command. For example:
    ls | grep "Cargo"
The execution process of this command is:
- ls lists all files in the current directory.
- The pipe | passes the output of ls to the grep command.
- grep "Cargo" filters lines containing “Cargo” from the input and outputs the result.
How Pipelines Work
To implement pipeline functionality in a shell, you need to control the standard input (stdin) and standard output (stdout) of child processes. The specific steps are:
- Create the first child process and set its stdout to a pipe so that its output can be passed to the next process.
- For each subsequent child process:
  - Set its stdin to the stdout of the previous process.
  - If there is a next process, set its stdout to a pipe.
Implementation in Rust
Using Command and Stdio from Rust’s standard library (std::process), we can implement pipeline operations conveniently. Below is an example demonstrating how to connect two commands in Rust:
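A minimal sketch of connecting two commands, assuming a Unix-like system where both programs are on the PATH. The function name `pipe_two` and its parameters are hypothetical. Unlike an interactive shell, it captures the second command's output and returns it, so that the result can be inspected.

```rust
use std::io;
use std::process::{Command, Output, Stdio};

/// Runs `first | second` and returns the captured output of `second`.
/// `pipe_two` is a hypothetical helper name used for illustration.
fn pipe_two(
    first: &str,
    first_args: &[&str],
    second: &str,
    second_args: &[&str],
) -> io::Result<Output> {
    // Start the first command with its stdout connected to a pipe.
    let mut first_child = Command::new(first)
        .args(first_args)
        .stdout(Stdio::piped())
        .spawn()?;

    // Take ownership of the read end of that pipe.
    let first_stdout = first_child
        .stdout
        .take()
        .expect("stdout was configured as piped");

    // Start the second command, reading from the first command's stdout
    // and writing to a new pipe so that the result can be captured.
    let second_child = Command::new(second)
        .args(second_args)
        .stdin(Stdio::from(first_stdout))
        .stdout(Stdio::piped())
        .spawn()?;

    // Wait for the second command and collect its output.
    let output = second_child.wait_with_output()?;

    // Reap the first command so that it does not linger as a zombie.
    first_child.wait()?;

    Ok(output)
}
```

For the earlier example, `pipe_two("ls", &[], "grep", &["Cargo"])` runs the equivalent of ls | grep "Cargo"; the matching lines end up in the `stdout` field of the returned `Output`.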
Key Implementation Points
- Configure the input and output of child processes:
  - Use stdout(Stdio::piped()) to redirect the standard output of a child process to a pipe.
  - For child processes that need to read input from the previous command, use stdin(Stdio::from(previous_stdout)).
- Manage the lifecycle of child processes:
  - Use .spawn() to start a child process.
  - For child processes that need to capture output, use .wait_with_output() to wait for the process to finish and get the output.
- Error Handling:
  - Use expect or if let Err(e) = ... at each step to handle potential errors and ensure program robustness.
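Putting these points together, a pipeline of any length can be driven by a loop that remembers the previous stage's stdout. The sketch below assumes the input has already been split into (command, arguments) pairs; `run_pipeline` and `stages` are hypothetical names. It departs from an interactive shell in two ways: the final stage is piped as well so that its output can be returned, and errors are propagated with `?` rather than handled with expect.

```rust
use std::io::{self, Read};
use std::process::{Child, ChildStdout, Command, Stdio};

/// Runs the given stages as one pipeline and returns the last stage's stdout.
/// `run_pipeline` is a hypothetical helper name used for illustration.
fn run_pipeline(stages: &[(&str, Vec<&str>)]) -> io::Result<Vec<u8>> {
    let mut children: Vec<Child> = Vec::new();
    let mut previous_stdout: Option<ChildStdout> = None;

    for (action, args) in stages {
        // The first stage inherits the shell's stdin; every later stage
        // reads from the stdout of the stage before it.
        let stdin = match previous_stdout.take() {
            Some(out) => Stdio::from(out),
            None => Stdio::inherit(),
        };

        let mut child = Command::new(action)
            .args(args)
            .stdin(stdin)
            .stdout(Stdio::piped())
            .spawn()?;

        // Remember this stage's stdout for the next iteration.
        previous_stdout = child.stdout.take();
        children.push(child);
    }

    // Read everything the final stage writes.
    let mut output = Vec::new();
    if let Some(mut out) = previous_stdout {
        out.read_to_end(&mut output)?;
    }

    // Wait for every stage so that no child is left as a zombie.
    for mut child in children {
        child.wait()?;
    }

    Ok(output)
}
```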
Handling Built-in Commands
Why Built-in Commands Are Needed
Certain commands (like cd) need to modify the state of the shell process itself. For example, the cd command changes the current working directory. If cd is implemented as an external command (child process), it can only change the working directory of the child process and will not affect the parent process (the shell).
Therefore, these built-in commands need to be implemented directly within the shell to modify the shell process’s state.
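This is easy to check. The sketch below, assuming a Unix-like system with `sh` available, runs cd / in a child process and reports the shell's own working directory before and after. The function name `cwd_around_child_cd` is hypothetical.

```rust
use std::env;
use std::io;
use std::path::PathBuf;
use std::process::Command;

/// Runs `cd /` inside a child `sh` process and returns the parent's
/// working directory as seen before and after the child ran.
/// `cwd_around_child_cd` is a hypothetical name used for illustration.
fn cwd_around_child_cd() -> io::Result<(PathBuf, PathBuf)> {
    let before = env::current_dir()?;

    // The child changes its own working directory and then exits.
    Command::new("sh").args(&["-c", "cd /"]).status()?;

    let after = env::current_dir()?;
    Ok((before, after))
}
```

The two paths come back equal: the directory change happened only inside the child and disappeared when the child exited.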
Code Implementation Example
When processing user input commands, you can use a match statement to handle built-in commands specially. For example:
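A minimal sketch of such a match. The function name `run_command` is hypothetical; `previous_command` follows the variable described in this post and holds the most recently spawned child, if any. Falling back to / when cd is given no argument is an assumption made for this sketch.

```rust
use std::env;
use std::path::Path;
use std::process::{Child, Command};

/// Runs one parsed command. Built-ins run inside the shell process itself;
/// anything else is spawned as a child process.
/// `run_command` is a hypothetical helper name used for illustration.
fn run_command(action: &str, args: &[&str], previous_command: &mut Option<Child>) {
    match action {
        "cd" => {
            // Assumption: with no argument, fall back to the root directory.
            let new_dir = args.first().copied().unwrap_or("/");

            // Change the working directory of the shell process itself.
            if let Err(e) = env::set_current_dir(Path::new(new_dir)) {
                eprintln!("{}", e);
            }

            // A built-in produces no child process to pipe from.
            *previous_command = None;
        }
        _ => match Command::new(action).args(args).spawn() {
            Ok(child) => *previous_command = Some(child),
            Err(e) => {
                eprintln!("{}", e);
                *previous_command = None;
            }
        },
    }
}
```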
In this example:
- env::set_current_dir is used to change the current working directory.
- If the change fails, an error message is output.
- After executing the built-in command, previous_command is set to None, indicating that there’s no need to handle pipelines.