RSH
1. Overview
RSH is meant to be as close to bash syntax as possible. However, since I am
human, attended university, and am not a CS major, there is only so much I
could do. RSH consists of several main parts that work together to create what
someone might call a 'shell'. The parts are as follows: a terminal interface, a
lexer -- generated by `flex', a parser, a command interpreter, and finally
an exec() syscall interface. Each of these sections contributes to the flow of
command -> action.
There are also some other nice features that will be discussed, such as the
symbol table and the environment table. Piping and IO redirection are also
included in the shell, though they are more limited than what a real shell
would be capable of, mainly due to the parser implementation.
2. The Terminal Interface.
In order to have some of the most basic of shell behaviors, a terminal
interface is required. In general, a programmer would not reinvent the wheel
and would instead use a library like gnu-term-readline, which supports history,
sophisticated command line editing, basic tab completion, etc. However, as
wonderful as gnu-term-readline is, it would not fulfill the project
requirements, so instead I wrote my own simplified version of it. The results
are in the source file src/readterm.c. Essentially, one must read each byte of
the input stream and process said byte. If an escape character is detected,
then a special handler for the following escape sequence must be called. This
is implemented in the function _rsh_handle_escape_seq(), which deals with all
understood escape sequences. It has sub-handlers for particular escape
sequences such as the up arrow key or the DEL key: _rsh_do_history_completion(),
_rsh_do_move_cursor(), and _rsh_do_delete(). As of now the following
non-printing characters or sequences can be processed: DEL, the arrow keys,
and backspace (ASCII 0x7f).
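To make the byte-by-byte handling a bit more concrete, here is a minimal
sketch of classifying a key from the bytes read so far. It is not the code in
src/readterm.c; the enum and the function name classify_key() are made up for
illustration, and the sequences are the common ANSI/xterm ones (ESC [ A..D for
the arrows, ESC [ 3 ~ for DEL), which your terminal may or may not send.

```c
#include <stddef.h>

/* Keys this sketch recognizes; the names are hypothetical. */
enum key {
    KEY_NONE, KEY_UP, KEY_DOWN, KEY_RIGHT, KEY_LEFT, KEY_DELETE, KEY_BACKSPACE
};

/* Classify the bytes in seq[0..len). Returns KEY_NONE for anything that is
 * not one of the recognized sequences, including ordinary printable bytes. */
enum key classify_key(const unsigned char *seq, size_t len)
{
    if (len == 1 && seq[0] == 0x7f)
        return KEY_BACKSPACE;
    if (len < 3 || seq[0] != 0x1b || seq[1] != '[')
        return KEY_NONE;
    if (len == 3) {
        switch (seq[2]) {
        case 'A': return KEY_UP;
        case 'B': return KEY_DOWN;
        case 'C': return KEY_RIGHT;
        case 'D': return KEY_LEFT;
        default:  return KEY_NONE;
        }
    }
    if (len == 4 && seq[2] == '3' && seq[3] == '~')
        return KEY_DELETE;
    return KEY_NONE;
}
```

A real handler would dispatch on the result, roughly the way the sub-handlers
named above are dispatched.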
Terminal history is handled in this section of the code as well. All of the
incoming data is stored into buffers. These buffers are implemented via
struct rsh_buff {
    /* The underlying buffer itself. */
    char *buf;
    /* The offset to write the next character to. */
    int offset;
    /* The number of characters in the buffer. */
    int len;
    /* The size of the buffer. */
    int size;
    /* Does this buffer have stuff? */
    int used;
};
This data type is used to effectively keep track of what has been typed, which
allows us to redisplay the terminal data with a cursor location so editing can
be performed in arbitrary places in the data (this is what makes the left/right
arrow keys, DEL, and backspace possible). The history itself is a circular
buffer of just the text in a buffer. The buffer itself can be regenerated from
the history so that a history item can be edited once placed on the command
line via the up/down arrow keys. However, the history itself is not changed,
just the temporary buffer holding a copy of the history item.
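As a rough illustration of editing at an arbitrary cursor position, here is a
small sketch of insert and backspace on a buffer shaped like the struct above.
The type edit_buf and its functions are hypothetical stand-ins, not the
functions in src/readterm.c, and the growth policy (doubling) is an assumption.

```c
#include <stdlib.h>
#include <string.h>

struct edit_buf {
    char *buf;    /* NUL-terminated text */
    int offset;   /* cursor position: where the next character goes */
    int len;      /* number of characters stored */
    int size;     /* allocated bytes */
};

int edit_buf_init(struct edit_buf *b, int size)
{
    if (size < 1)
        return -1;
    b->buf = calloc((size_t)size, 1);
    if (!b->buf)
        return -1;
    b->offset = b->len = 0;
    b->size = size;
    return 0;
}

/* Insert c at the cursor, shifting the tail right and growing if needed. */
int edit_buf_insert(struct edit_buf *b, char c)
{
    if (b->len + 2 > b->size) {            /* room for c plus the NUL */
        int nsize = b->size * 2;
        char *nbuf = realloc(b->buf, (size_t)nsize);
        if (!nbuf)
            return -1;
        b->buf = nbuf;
        b->size = nsize;
    }
    memmove(b->buf + b->offset + 1, b->buf + b->offset,
            (size_t)(b->len - b->offset) + 1);   /* tail, including the NUL */
    b->buf[b->offset++] = c;
    b->len++;
    return 0;
}

/* Backspace: remove the character just before the cursor. */
int edit_buf_backspace(struct edit_buf *b)
{
    if (b->offset == 0)
        return -1;
    memmove(b->buf + b->offset - 1, b->buf + b->offset,
            (size_t)(b->len - b->offset) + 1);
    b->offset--;
    b->len--;
    return 0;
}
```

Moving the cursor with the arrow keys then only changes offset; redisplay
prints buf and repositions the cursor.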
History itself is implemented as a circular buffer of pointers to char:
char *history[_HIST_BUFFER_SIZE];
This allows extremely efficient usage of memory since there is only 1 pointer
worth of overhead: the pointer that delimits the end of the table. If the
history is full, then the end pointer is simply incremented and the first entry
put into the history will be replaced. In that regard the history is like a
FIFO queue. The strings themselves are stored on the heap via strdup(), which
allocates with malloc().
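A minimal sketch of such a circular history follows. It is an approximation
and not the rsh code: it uses an index plus a count instead of a single end
pointer, HIST_SIZE stands in for _HIST_BUFFER_SIZE, and the struct and function
names are invented. The struct must start out zeroed.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <stdlib.h>
#include <string.h>

#define HIST_SIZE 4   /* hypothetical; stands in for _HIST_BUFFER_SIZE */

struct history {
    char *entries[HIST_SIZE];
    int end;     /* slot the next entry will be written to */
    int count;   /* number of valid entries, at most HIST_SIZE */
};

/* Store a heap copy of line, overwriting the oldest entry when full. */
int history_add(struct history *h, const char *line)
{
    char *copy = strdup(line);
    if (!copy)
        return -1;
    free(h->entries[h->end]);   /* frees the oldest entry; free(NULL) is fine */
    h->entries[h->end] = copy;
    h->end = (h->end + 1) % HIST_SIZE;
    if (h->count < HIST_SIZE)
        h->count++;
    return 0;
}

/* n = 0 is the most recent entry, n = 1 the one before it, and so on. */
const char *history_get(const struct history *h, int n)
{
    if (n < 0 || n >= h->count)
        return NULL;
    return h->entries[(h->end - 1 - n + 2 * HIST_SIZE) % HIST_SIZE];
}
```

Pressing the up arrow repeatedly would correspond to calling history_get()
with n = 0, 1, 2, ... and copying the result into the edit buffer.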
The readline functionality must be able to interface with the lexer
generated by `flex'. In order to do this, the `flex' input macro
YY_INPUT(buf, result, max_size)
had to be overridden. This is done by defining a macro in the parser.lex
preamble code. The function that this macro uses is defined in src/readterm.c
as
rsh_readbuf(char *buffer, size_t max)
This function must translate between line reading (which is what we do on the
terminal) and buffer reading, which is what `flex' does. Flex will attempt to
read at most max characters. However, if for whatever reason the user has
typed in more than max characters, rsh_readbuf() must be able to handle this.
rsh_readbuf() must also be able to handle reading from a file; as such,
rsh_readbuf() really just determines what actual function to call based on
whether the shell is interactive or not. For more information regarding this
code, peruse the lex.yy.c (this is the generated lexer) and parser.lex files.
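The line-to-buffer translation can be sketched as below. This is only an
illustration of the idea: read_chunk() and set_pending_line() are hypothetical
names, not rsh_readbuf() itself, and the macro at the end shows the general
shape of a YY_INPUT override rather than the one in parser.lex. A return of 0
is treated as end of input.

```c
#include <stddef.h>
#include <string.h>

/* The line most recently read from the terminal, and how much of it the
 * lexer has consumed so far. Both names are hypothetical. */
static const char *pending_line = NULL;
static size_t pending_pos = 0;

/* Hand a complete line to the chunked reader. */
void set_pending_line(const char *line)
{
    pending_line = line;
    pending_pos = 0;
}

/* Copy at most max bytes of the pending line into buffer and return the
 * number copied. A line longer than max is simply served over several
 * calls; 0 means the line is exhausted. */
size_t read_chunk(char *buffer, size_t max)
{
    if (!pending_line)
        return 0;
    size_t left = strlen(pending_line + pending_pos);
    size_t n = left < max ? left : max;
    memcpy(buffer, pending_line + pending_pos, n);
    pending_pos += n;
    return n;
}

/* In a flex preamble, an override would look roughly like this. */
#define YY_INPUT(buf, result, max_size) \
    { (result) = read_chunk((buf), (size_t)(max_size)); }
```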
3. The Parser
RSH's parser really sucks. I tried to implement something via yacc but
realized that would not work since I didn't really understand how to write a
proper grammar. As such, instead of implementing my own shift-reduce parser or
the like, I wrote a really simple, bad, and at least mostly functional parser.
Since it is completely ad hoc code, it does not support all the cool things
that a real shell like ZSH supports. But it gets the job done. That code is all
in parser.c.
4. Command Interpreter
The command interpreter is a fairly simple block of code. It reads through
each passed command and figures out what to do: variable declarations, commands,
background or foreground, builtins, etc. Once it knows what to do, it asks the
exec block of code to actually do it. The command interpreter is in command.c.
5. Exec()...
The first thing I want to say is this code was not easy to debug. Nor was
the documentation all that good. The GNU libc documentation (and example shell)
was an invaluable resource. In any event, the basic idea is this: rsh needs to
make sure it is in its own process group, which has to do with signal handling.
When a signal is passed to a child process, rsh does not want to get that
signal as well (unless we are running as a script, but we will get to that in a
bit). In general, rsh really only wants to get a few signals from the children,
primarily SIGCHLD. This allows the wait() syscall and friends to get
notifications about child state changes. This is crucial for the shell.
RSH keeps track of all child processes via the following data structure:
struct rsh_process_group {
    /* List related stuff. */
    struct rsh_process **pgroup;
    int max_procs;
    /* The standard I/O streams that point to the controlling terminal. Or a
     * pipe if configured that way, but these should never not be 0, 1, & 2. */
    int stdin;
    int stdout;
    int stderr;
    /* The pid of the shell. Why not? */
    pid_t pid;
    /* The process group ID for the shell's process group. */
    gid_t pgid;
};
This struct has a list of processes. These processes are tracked via the
following data type:
struct rsh_process {
    /* Process ID and process group ID. */
    pid_t pid;
    gid_t pgid;
    /* Process' standard I/O streams. */
    int stdin;
    int stdout;
    int stderr;
    /* Zero if this process is in the foreground. */
    int background;
    /* Non-zero if the process is actually running. */
    int running;
    /* The first 128 chars (if there are that many) of the commands actual
     * file name (i.e: /usr/bin/ping). */
    char name[128];
    /* Pipe related stuff. */
    int pipe[2];
    int pipe_used;
    int pipe_lane; /* Which (stdin, stdout, stderr) is being piped. */
    /* This is cosmetic; if set, display a message if this process terminates. */
    int display_exit;
    /* Terminal settings. */
    struct termios term;
    /* These will probably not remain valid forever, but as long as they last
     * up to the fork() call, they will be good for the child process to use.
     */
    char *command;
    char **argv;
    int argc;
};
Each process that gets spawned gets its own struct rsh_process allocated. This
allows rsh to keep track of background processes. The builtin command `dproc'
displays the list of currently running/paused processes.
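For illustration only, here is a stripped-down sketch of a process list with
add, find, remove, and a dproc-style line formatter. It is not the code in
exec.c: struct proc_entry is a simplified stand-in for struct rsh_process, the
fixed-size table and all function names are assumptions, and the output format
just mimics the dproc lines shown in this README.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define MAX_PROCS 8   /* hypothetical fixed capacity */

struct proc_entry {   /* simplified stand-in for struct rsh_process */
    int used;         /* non-zero if this slot holds a process */
    pid_t pid;
    int running;      /* non-zero if running, zero if stopped */
    char name[128];
};

static struct proc_entry table[MAX_PROCS];

/* Record a newly spawned process; returns -1 if the table is full. */
int proc_add(pid_t pid, const char *name)
{
    for (int i = 0; i < MAX_PROCS; i++) {
        if (!table[i].used) {
            table[i].used = 1;
            table[i].pid = pid;
            table[i].running = 1;
            snprintf(table[i].name, sizeof table[i].name, "%s", name);
            return 0;
        }
    }
    return -1;
}

struct proc_entry *proc_find(pid_t pid)
{
    for (int i = 0; i < MAX_PROCS; i++)
        if (table[i].used && table[i].pid == pid)
            return &table[i];
    return NULL;
}

/* Forget a process once it has terminated and been reaped. */
int proc_remove(pid_t pid)
{
    struct proc_entry *p = proc_find(pid);
    if (!p)
        return -1;
    memset(p, 0, sizeof *p);
    return 0;
}

/* Format one line in the style of the dproc output. */
int proc_describe(const struct proc_entry *p, char *out, size_t outsz)
{
    return snprintf(out, outsz, "pid %ld (%s): %s", (long)p->pid,
                    p->running ? "running" : "stopped", p->name);
}
```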
By using the process list, jobs can be controlled fairly easily. Each
spawned process is put into its own process group. This isn't what should
happen: each group of processes making up a job should be given the same
process group, but as of now, that isn't implemented. In any event, the exec.c
code implements typical job control. For example, given a program called run
that just sits in a for (;;); loop:
[rsh]$ /home/alex/tmp/run
CTRL-z
Process (4786) stopped [sig=20].
[rsh]$ bg
[rsh]$ dproc
pid 4767 (running): rsh
pid 4786 (running): /home/alex/tmp/run
[rsh]$ killall -SIGSTOP run
+ stopped /home/alex/tmp/run
[rsh]$ dproc
pid 4767 (running): rsh
pid 4786 (stopped): /home/alex/tmp/run
[rsh]$ fg
Process (4786) terminated by signal.
[rsh]$ dproc
pid 4767 (running): rsh
As can be seen, the process run (pid=4786) was started normally,
got a SIGTSTP (terminal stop) from the CTRL-z, got backgrounded, then got
stopped via a killall -SIGSTOP, was foregrounded again, and finally killed with
a CTRL-c. The dproc command was used to display the contents of the rsh shell's
process list. However, to really see this in action, open up another shell
(even rsh would work for this) and run top. Now you will be able to see the
process use the CPU when it's running.
Outside of the shell, signals are returned to normal; that is to say, CTRL-c
does not print the shell history if the shell is not in the foreground. This
also allows us to kill a process that we don't like without having to open
another shell.
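Putting each spawned process into its own process group, as described in this
section, can be sketched like this. The function name spawn_in_own_group() is
hypothetical and this is not the code in exec.c; calling setpgid() in both
parent and child follows the approach in the GNU libc manual's example shell,
so that the group exists no matter which side runs first.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/* Fork a child, place it in a new process group whose ID equals its pid,
 * and exec argv there. Returns the child's pid, or -1 if fork() fails. */
pid_t spawn_in_own_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);            /* child: become leader of a new group */
        execvp(argv[0], argv);
        _exit(127);               /* only reached if exec failed */
    }
    /* Parent: make the same call. If the child has already exec'd this
     * fails harmlessly, because the child made the call itself first. */
    setpgid(pid, pid);
    return pid;
}
```

To give a whole job one process group, the second argument to setpgid() would
be the pid of the first process in the job rather than each process's own pid.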
6. The Symbol Table and Environment.
Like all shells, rsh supports the defining of scalar variables. However
there are some limitations that should be noted. Here is a basic example of a
scalar definition:
[rsh]$ PROMPT='[rsh]$ '
This sets the $PROMPT variable to '[rsh]$ '. $PROMPT is used by rsh to generate
the prompt. In any event, this adds a symbol to the rsh symbol table. This
symbol can be used later on like so:
[rsh]$ echo $PROMPT
[rsh]$
[rsh]$
Again this is just typical shell behavior. However, some more complex examples
will not work the way you expect them to. For instance:
[rsh]$ echo blah=10
[rsh]$
This occurs because variable definitions may occur anywhere in the command
line. In the ZSH shell however, you would see something like this:
[11:35AM alex@australia src]$ echo blah=10
blah=10
[11:38AM alex@australia src]$
The symbol table used by rsh also interfaces with the environment list that
libc maintains. If you try to use a variable that is in the environment, the
environment variable will be chosen over the symbol table definition. For
instance:
[rsh]$ PATH=$PATH:/my/new/path
[rsh]$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/home/alex/bin:/usr/local/sbin:/usr/sbin:/sbin
:/my/new/path
[rsh]$
This is a rather useful feature, especially since the environment variables
carry over into subprocesses:
[rsh]$ PATH=$PATH:/my/new/path
[rsh]$ env
...
PATH=/usr/local/bin:/usr/bin:/bin:/home/alex/bin:/usr/local/sbin:/usr/sbin:
/sbin:/my/new/path
[rsh]$
There are a few limitations to dealing with variables. Don't try to export a
symbol from the symbol table to the environment at the same time you try and
define it.
[rsh]$ export BLAH="hello world"
This will not work the way you expect it to. Instead:
[rsh]$ BLAH="hello world"
[rsh]$ export BLAH
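One plausible reading of the behavior in this section, sketched in C: lookups
check the environment first and fall back to the symbol table, assignments to
a name that is already in the environment update the environment, and export
moves a symbol into the environment. The table layout and the names var_set(),
var_get(), and var_export() are all assumptions for illustration; the real
symbol table code may be organized quite differently.

```c
#define _POSIX_C_SOURCE 200809L   /* for setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 32   /* hypothetical capacity */

struct symbol {
    int used;
    char name[64];
    char value[256];
};

static struct symbol symtab[MAX_SYMS];

static struct symbol *sym_find(const char *name)
{
    for (int i = 0; i < MAX_SYMS; i++)
        if (symtab[i].used && strcmp(symtab[i].name, name) == 0)
            return &symtab[i];
    return NULL;
}

/* NAME=value: update the environment if NAME lives there, else the table. */
int var_set(const char *name, const char *value)
{
    if (getenv(name))
        return setenv(name, value, 1);
    struct symbol *s = sym_find(name);
    for (int i = 0; !s && i < MAX_SYMS; i++)
        if (!symtab[i].used)
            s = &symtab[i];
    if (!s)
        return -1;                 /* table full */
    s->used = 1;
    snprintf(s->name, sizeof s->name, "%s", name);
    snprintf(s->value, sizeof s->value, "%s", value);
    return 0;
}

/* $NAME: the environment wins over the symbol table. */
const char *var_get(const char *name)
{
    const char *env = getenv(name);
    if (env)
        return env;
    struct symbol *s = sym_find(name);
    return s ? s->value : NULL;
}

/* export NAME: move an existing symbol into the environment. */
int var_export(const char *name)
{
    struct symbol *s = sym_find(name);
    if (!s)
        return getenv(name) ? 0 : -1;
    if (setenv(name, s->value, 1) != 0)
        return -1;
    s->used = 0;
    return 0;
}
```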
7. Pipes.
RSH supports piping output from one process to another process. This is
accomplished via the pipe() system call. pipe() generates a pair of file
descriptors: one for reading, the other for writing. The write descriptor is
duped onto the first process's stdout (or stderr) and the read descriptor is
duped onto the second process's stdin. When the first process writes, the
second may then read that data. This is mostly implemented in exec.c, though the
code in command.c also needs to make sure the pipe is actually created. The
biggest difficulty with pipes is making sure all ends get closed the
appropriate number of times. If the write end of the pipe never gets fully
closed (by rsh, the first process, and the second process) then the pipe never
reaches EOF and the receiving process hangs while waiting for the stream
to end. On my computer, here is an example demonstrating a 3 stage pipe:
[rsh]$ ps aux | grep alex | grep zsh
alex 2038 0.0 0.0 124136 2652 pts/0 Ss Oct10 0:00 zsh
alex 2073 0.0 0.0 124124 2452 pts/1 Ss Oct10 0:00 zsh
alex 2339 0.0 0.0 124128 3256 pts/7 Ss 16:43 0:00 zsh
alex 2819 0.0 0.0 124268 3384 pts/8 Ss+ 16:50 0:00 zsh
alex 2905 0.0 0.0 124256 3084 pts/2 Ss+ Oct10 0:00 zsh
alex 2940 0.0 0.0 124128 2924 pts/3 Ss Oct10 0:00 zsh
alex 4815 0.0 0.0 124136 3232 pts/9 Ss+ 17:05 0:00 zsh
alex 6582 0.0 0.0 121880 2872 pts/10 Ss 17:26 0:00 zsh
alex 8616 0.0 0.0 124132 3280 pts/4 Ss+ Oct10 0:00 zsh
alex 11339 0.0 0.0 124128 3276 pts/11 Ss 18:15 0:00 zsh
alex 11687 0.0 0.0 124280 3376 pts/12 Ss+ 18:18 0:00 zsh
alex 13374 0.0 0.0 121880 2872 pts/13 Ss 18:30 0:00 zsh
alex 18846 0.0 0.0 124152 3352 pts/6 Ss 19:33 0:00 -zsh
alex 19433 0.0 0.0 123900 3024 pts/14 Ss 19:37 0:00 -zsh
alex 20413 0.0 0.0 102732 760 pts/14 S+ 19:47 0:00 grep zsh
alex 25264 0.0 0.0 124280 3364 pts/5 Ss 07:32 0:00 zsh
[rsh]$
Wow, I have a lot of terminals open. In any event, output from `ps aux' is
piped to `grep alex', which finds all processes that I have run. Then the
result from the first grep is piped to a second grep to find all occurrences of
zsh. This command is simply for illustration purposes; it could be written
without the grep pipes, but that would not be as interesting.
This was probably the most complex thing to implement in the entire shell. Job
control was hard, but this was a nightmare.
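To show the "close every end" problem in isolation, here is a small sketch of
a two stage pipe. It is not the rsh implementation: run_two_stage_pipe() is a
hypothetical helper, and the second pipe that captures the output exists only
so the result can be inspected. Note how every process closes every descriptor
it does not need; leaving any copy of a write end open keeps the reader from
ever seeing EOF.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "left | right" and capture right's stdout in out (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t run_two_stage_pipe(char *const left[], char *const right[],
                           char *out, size_t outsz)
{
    int link[2];      /* left stdout -> right stdin */
    int capture[2];   /* right stdout -> this process */

    if (outsz == 0 || pipe(link) < 0)
        return -1;
    if (pipe(capture) < 0) {
        close(link[0]);
        close(link[1]);
        return -1;
    }

    pid_t lpid = fork();
    if (lpid == 0) {
        dup2(link[1], STDOUT_FILENO);
        close(link[0]); close(link[1]);
        close(capture[0]); close(capture[1]);
        execvp(left[0], left);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid == 0) {
        dup2(link[0], STDIN_FILENO);
        dup2(capture[1], STDOUT_FILENO);
        close(link[0]); close(link[1]);
        close(capture[0]); close(capture[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must drop its own copies of the write ends too. */
    close(link[0]);
    close(link[1]);
    close(capture[1]);

    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(capture[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(capture[0]);

    if (lpid > 0) waitpid(lpid, NULL, 0);
    if (rpid > 0) waitpid(rpid, NULL, 0);
    return (lpid < 0 || rpid < 0) ? -1 : (ssize_t)total;
}
```

A three stage pipe like the one above needs a second link pipe, and the
bookkeeping of which ends to close grows accordingly.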
8. Random other stuff.
A. Builtins.
There are several built in functions that you can use. These are:
cd
exit
exec
history
fg
bg
dproc
export
Most of these work as they would under a normal shell. The exceptions
are 'exec' and 'dproc'. Exec is not implemented yet at all and, when
implemented, will not be able to do shell file descriptor manipulation.
dproc is used to display processes that the shell is aware of and their
state. This is slightly different functionality than the default
behavior of `ps'.
Built in functions are mostly implemented in src/builtin.c. Each built
in function must match the prototype defined alongside the builtin struct:
typedef int (* builtin_func)(int argc, char **argv,
                             int in, int out, int err);
struct builtin {
    char *name;
    builtin_func func;
};
Each built in function is then also placed in the
struct builtin builtins[];
array. This facilitates looking up built in functions by checking a
command against the list of built in functions in the builtins[] array.
This could lead to overhead if a huge number of built in functions were
to be defined, but that doesn't seem like a big enough problem to
warrant making a hashtable of built ins.
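The linear lookup described above might look something like the sketch below.
The typedef and struct follow the ones shown in this section; the NULL
terminated array, the find_builtin() name, and the two stub builtins are
assumptions made so the example is self-contained.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

typedef int (*builtin_func)(int argc, char **argv, int in, int out, int err);

struct builtin {
    const char *name;
    builtin_func func;
};

/* Minimal cd: change to argv[1]; returns 0 on success, 1 on failure. */
int builtin_cd(int argc, char **argv, int in, int out, int err)
{
    (void)in; (void)out; (void)err;
    if (argc < 2)
        return 1;
    return chdir(argv[1]) == 0 ? 0 : 1;
}

/* Placeholder: a real version would print the history to `out`. */
int builtin_history(int argc, char **argv, int in, int out, int err)
{
    (void)argc; (void)argv; (void)in; (void)out; (void)err;
    return 0;
}

/* The table is terminated by a NULL name (an assumption of this sketch). */
static const struct builtin builtins[] = {
    { "cd",      builtin_cd },
    { "history", builtin_history },
    { NULL,      NULL }
};

/* Linear search: return the handler for name, or NULL if name is not a
 * builtin and should be exec'd as an external command. */
builtin_func find_builtin(const char *name)
{
    for (size_t i = 0; builtins[i].name != NULL; i++)
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].func;
    return NULL;
}
```

With only a handful of builtins, the strcmp() loop costs next to nothing
compared to the fork() and exec() that follow for external commands.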
B. PROMPT Variable.
Like any shell, the prompt is customizable. The prompt for RSH is not
nearly as customizable as the prompt for a real shell, but I am working
on making it as customizable as I can. Look at src/prompt.c for more
information on how to define a prompt. Simple overview: define it like you
would in bash, but look at src/prompt.c to see what escapes I have actually
implemented and what the escapes are. I tried to keep them close to
bash syntax, but I did make some changes.
C. Solaris Sucks.
Apparently the developers of the Solaris C API decided that it would
be a good idea to have stdin, stderr, and stdout be defined as macros.
These macros expand to some array reference which I presume contains
the value of the FILE data structure that you want to use when doing
fprintf(). This makes any code like this a pain in the ass:
struct rsh_process proc;
proc.stdin = 0;
proc.stdout = 1;
proc.stderr = 2;
This leads to compile errors. So yeah, I hate Solaris now.
D. Initialization Files.
If you wish, you can make an RC file for RSH. RSH looks for ~/.rshrc
and sources it. Don't just use your ~/.bashrc file or the like, since
bash syntax will make RSH cry. Primarily, things like if clauses, for
loops, and name globbing will just not work and you will be very
unhappy about the results. On the other hand, you can set up changes
to your path variable or maybe set JAVA_HOME or whatever else you want
to do. For the most part you won't need anything, since almost everything
of relevance will be handled by the login shell (presumably bash) and
RSH will just get the results in the environment.
E. CTRL-c
CTRL-c when in RSH will display the history. However, child processes
have their signal handlers reset so that CTRL-c works as expected for
an errant child.
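Resetting the handlers in the child can be sketched as follows. The
function name reset_child_signals() and the exact list of signals are
assumptions, not taken from the rsh source. The explicit reset matters
mostly for signals the shell ignores: exec() restores caught signals to
their defaults on its own, but ignored signals stay ignored.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Restore default dispositions for the job-control related signals.
 * Intended to be called in the child between fork() and exec(). */
void reset_child_signals(void)
{
    static const int sigs[] = {
        SIGINT, SIGQUIT, SIGTSTP, SIGTTIN, SIGTTOU, SIGCHLD
    };
    struct sigaction sa;

    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        sigaction(sigs[i], &sa, NULL);
}
```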
F. File redirection.
File redirection works. Well, it works so far as I have tested it.
echo HELLO WORLD > tmp.txt
will do as you expect. Likewise with stderr. Redirection does not work
for higher file descriptors; my lexer is not smart enough at the moment
to figure that out. It would be implementable, but that would warrant
a complete reimplementation of the parser, which would be really really
time consuming. Piping as of now has not been implemented though should
actually be pretty easy; just needs the code to be written.
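The mechanics behind a redirect like the one above can be sketched
with open() and dup2() in the child before exec(). This is only an
illustration; run_with_stdout_to() is a hypothetical helper, and the
truncate-on-open behavior and 0644 mode are assumptions about what `>'
should do rather than a description of the rsh source.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv with stdout redirected to path. Returns the command's exit
 * status, or -1 if it could not be run or was killed by a signal. */
int run_with_stdout_to(char *const argv[], const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0)
                _exit(126);
            close(fd);            /* stdout now refers to the file */
        }
        execvp(argv[0], argv);
        _exit(127);               /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Redirecting stderr works the same way with STDERR_FILENO as the
dup2() target.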
G. Signal handling by RSH
The only 2 signals explicitly handled by RSH are SIGCHLD and SIGINT.
SIGINT is for the assignment requirements. SIGCHLD is used to clean up
background processes that terminate. A common example of this is in
pipes. Only the last process in a pipe is foregrounded; the rest
execute in the background. As such, those background processes must be
cleaned up (mostly this is just a deallocation of the process's
struct rsh_process data structure). SIGCHLD solves this problem since
it is delivered whenever a child process changes state. When we get a
SIGCHLD signal, rsh goes through the process list and does a
non-blocking waitpid() to see if a given process has terminated,
stopped, etc. (we skip foreground processes here since those are handled
by foreground()). If we detect a change we act accordingly.
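The non-blocking check on a single process can be sketched like this.
The enum and poll_child() are hypothetical names, not the rsh code; the
waitpid() flags are the standard WNOHANG (do not block) and WUNTRACED
(also report stopped children).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum child_state {
    CHILD_RUNNING, CHILD_EXITED, CHILD_SIGNALED, CHILD_STOPPED, CHILD_ERROR
};

/* Poll one child without blocking. For CHILD_EXITED, *detail is the exit
 * status; for CHILD_SIGNALED and CHILD_STOPPED it is the signal number. */
enum child_state poll_child(pid_t pid, int *detail)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG | WUNTRACED);

    if (r == 0)
        return CHILD_RUNNING;     /* no state change to report */
    if (r < 0)
        return CHILD_ERROR;       /* e.g. not our child, or already reaped */
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return CHILD_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return CHILD_SIGNALED;
    }
    if (WIFSTOPPED(status)) {
        *detail = WSTOPSIG(status);
        return CHILD_STOPPED;
    }
    return CHILD_RUNNING;
}
```

A SIGCHLD handler (or the main loop, after the handler sets a flag)
would call something like this for each background entry in the
process list and free the entry on CHILD_EXITED or CHILD_SIGNALED.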
RSH does not display every signal it receives, since this would bog
down the terminal command line. However, if a process exits abnormally
then the cause of that exit is displayed. For instance if a process is
killed by a SIGINT, RSH alerts the user to that.
[rsh]$ ./run_forever
^C
Process (19049) terminated by signal (2)
[rsh]$
Likewise, if a job is paused, RSH will alert the user to this event.
[rsh]$ ./run_forever
^Z
Process (19397) stopped [sig=20].
[rsh]$
Also for background processes, a message will be displayed:
[rsh]$ sleep 5 &
[rsh]$
Process (19556) terminated (0)
[rsh]$
Here, the sleep program just sleeps for 5 seconds in the background.
When done, it terminates, RSH receives a SIGCHLD, searches through the
process list, finds out that `sleep' has terminated, and displays a
message accordingly. It recreates any buffer that is being edited on
the command line to make sure input stays coherent.