Segfault on processing a here-document from a command substitution in a here-document #823

Closed
stefanbidi opened this issue Feb 19, 2025 · 10 comments
Labels
bug Something is not working

Comments

@stefanbidi

A little background first: I ran into this while troubleshooting a problem with Binutils 2.44 on an x86_64-pc-linux-musl system running ksh93u+m version 1.0.10. The related Binutils bug report can be found here:

https://sourceware.org/bugzilla/show_bug.cgi?id=32580

Clearly the issue predates Binutils 2.44 by many years, as the person who posted that bug report is using Solaris 2.11. Essentially, ksh93 crashes when running a very simple script with a few functions:

TEST_LINE1="Test line 1"
TEST_LINE2="Test line 2"

function1 () {
cat << EOF
	${TEST_LINE1}
EOF
}

function2 () {
cat << EOF
	${TEST_LINE2}
EOF
}

function3 () {
cat << EOF
	$(function1)
EOF
	function2
}

function3

This script produces the right output with bash, dash and oksh. When run with ksh93, however, it segfaults:

$ /usr/local/bin/ksh test
Segmentation fault (core dumped)

Please let me know if I can provide any more information.

McDutchie added the "bug" (Something is not working) label Feb 19, 2025
@McDutchie

This does appear to be a very system-specific crash. I can only reproduce it on Alpine Linux (musl libc, arm64). I cannot reproduce it on Void Linux (also musl libc on arm64), Linux with glibc, or macOS.

On Alpine, the gdb backtrace (for the current 1.0 branch) is:

#0  memcpy () at src/string/aarch64/memcpy.S:145
#1  0x0000aaaaaabcfd10 in sfwrite (f=0xaaaaaac713c8 <_Stak_data>, buf=0xfffff7eb526b, n=208763)
    at /usr/local/src/ksh/src/lib/libast/sfio/sfwrite.c:130
#2  0x0000aaaaaaae7e34 in lex_advance (iop=0xfffff7fe2840, 
    buff=0xfffff7eb526b <error: Cannot access memory at address 0xfffff7eb526b>, size=208763, context=0xfffff7ff6040)
    at /usr/local/src/ksh/src/cmd/ksh93/sh/lex.c:163
#3  0x0000aaaaaaad0e70 in fcfill () at /usr/local/src/ksh/src/cmd/ksh93/sh/fcin.c:98
#4  0x0000aaaaaaaf16a0 in sh_machere (infile=0xfffff7fe2840, outfile=0xfffff7fe3b00, string=0xfffff7fe2451 "EOF")
    at /usr/local/src/ksh/src/cmd/ksh93/sh/macro.c:319
#5  0x0000aaaaaaadf578 in io_heredoc (iop=0xfffff7fe2400, name=0xfffff7fe2451 "EOF", traceon=0)
    at /usr/local/src/ksh/src/cmd/ksh93/sh/io.c:1632
#6  0x0000aaaaaaade2b0 in sh_redirect (iop=0xfffff7fe2400, flag=0) at /usr/local/src/ksh/src/cmd/ksh93/sh/io.c:1245
#7  0x0000aaaaaab2f088 in sh_ntfork (t=0xfffff7fe23c0, argv=0xfffff7fe2480, jobid=0xffffffffe200, topfd=0)
    at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:3239
#8  0x0000aaaaaab28568 in sh_exec (t=0xfffff7fe23c0, flags=4) at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:1400
#9  0x0000aaaaaab29b28 in sh_exec (t=0xfffff7fe2610, flags=4) at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:1837
#10 0x0000aaaaaab4128c in b_dot_cmd (n=0, argv=0xfffff7fef0e0, context=0xaaaaaac74aa0 <sh+1440>)
    at /usr/local/src/ksh/src/cmd/ksh93/bltins/misc.c:324
#11 0x0000aaaaaab2e418 in sh_funct (np=0xfffff7fe3910, argn=1, argv=0xfffff7fef0e0, envlist=0x0, execflg=5)
    at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:3061
#12 0x0000aaaaaab28214 in sh_exec (t=0xfffff7fef080, flags=5) at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:1315
#13 0x0000aaaaaaab5d5c in exfile (iop=0xfffff7fe5720, fno=11) at /usr/local/src/ksh/src/cmd/ksh93/sh/main.c:590
#14 0x0000aaaaaaab4f50 in sh_main (ac=2, av=0xfffffffffb68, userinit=0x0) at /usr/local/src/ksh/src/cmd/ksh93/sh/main.c:344
#15 0x0000aaaaaaab40e0 in main (argc=2, argv=0xfffffffffb68) at /usr/local/src/ksh/src/cmd/ksh93/sh/pmain.c:41

@stefanbidi
Author

I can confirm that this is only an issue on my musl-libc system when SHOPT_SPAWN=1. The test script, as well as Binutils 2.44, ran without a hitch on my custom x86_64-pc-linux-musl system when I set SHOPT_SPAWN=0. I had no problems on x86_64-pc-linux-gnu either way.

@McDutchie

Unfortunately, SHOPT_SPAWN is a red herring. When we add /opt/ast/bin: to the front of $PATH, so that ksh's built-in cat command is executed instead of the external command, then the crash occurs regardless. In that case, the backtrace is:

$ PATH=/opt/ast/bin:$PATH gdb --args arch/linux.arm64-64/bin/ksh -x issue823.sh
[...]
(gdb) run
Starting program: /usr/local/src/ksh/arch/linux.arm64-64/bin/ksh -x issue823.sh
+ TEST_LINE1='Test line 1'
+ TEST_LINE2='Test line 2'
+ function3
+ cat
+ 0<< \EOF
+ function1
+ cat
+ 0<< \EOF
	Test line 1
EOF
	
Program received signal SIGSEGV, Segmentation fault.
memcpy () at src/string/aarch64/memcpy.S:145
warning: 145	src/string/aarch64/memcpy.S: No such file or directory
(gdb) bt
#0  memcpy () at src/string/aarch64/memcpy.S:145
#1  0x0000aaaaaabd4d7c in sfwrite (f=0xaaaaaac813d8 <_Stak_data>, buf=0xfffff7eb127b, n=225211)
    at /usr/local/src/ksh/src/lib/libast/sfio/sfwrite.c:130
#2  0x0000aaaaaaae7f28 in lex_advance (iop=0xfffff7fe0600, 
    buff=0xfffff7eb127b <error: Cannot access memory at address 0xfffff7eb127b>, size=225211, context=0xaaaaaac8aae0)
    at /usr/local/src/ksh/src/cmd/ksh93/sh/lex.c:161
#3  0x0000aaaaaaad10e8 in fcfill () at /usr/local/src/ksh/src/cmd/ksh93/sh/fcin.c:98
#4  0x0000aaaaaaaf17a0 in sh_machere (infile=0xfffff7fe0600, outfile=0xfffff7fe3140, string=0xfffff7fe3c51 "EOF")
    at /usr/local/src/ksh/src/cmd/ksh93/sh/macro.c:319
#5  0x0000aaaaaaadf64c in io_heredoc (iop=0xfffff7fe3c00, name=0xfffff7fe3c51 "EOF", traceon=1)
    at /usr/local/src/ksh/src/cmd/ksh93/sh/io.c:1632
#6  0x0000aaaaaaade36c in sh_redirect (iop=0xfffff7fe3c00, flag=0) at /usr/local/src/ksh/src/cmd/ksh93/sh/io.c:1245
#7  0x0000aaaaaab27898 in sh_exec (t=0xfffff7fe3bc0, flags=4) at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:1176
#8  0x0000aaaaaab29c58 in sh_exec (t=0xfffff7fe3e20, flags=4) at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:1883
#9  0x0000aaaaaab445bc in b_dot_cmd (n=0, argv=0xfffff7fef0e0, context=0xaaaaaac84ac0 <sh+1456>)
    at /usr/local/src/ksh/src/cmd/ksh93/bltins/misc.c:324
#10 0x0000aaaaaab2e65c in sh_funct (np=0xfffff7fe3220, argn=1, argv=0xfffff7fef0e0, envlist=0x0, execflg=5)
    at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:3114
#11 0x0000aaaaaab2833c in sh_exec (t=0xfffff7fef080, flags=5) at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:1361
#12 0x0000aaaaaaab5f74 in exfile (iop=0xfffff7fe5ee0, fno=11) at /usr/local/src/ksh/src/cmd/ksh93/sh/main.c:591
#13 0x0000aaaaaaab5158 in sh_main (ac=3, av=0xfffffffffb38, userinit=0x0) at /usr/local/src/ksh/src/cmd/ksh93/sh/main.c:344
#14 0x0000aaaaaaab42e0 in main (argc=3, argv=0xfffffffffb38) at /usr/local/src/ksh/src/cmd/ksh93/sh/pmain.c:41

So, the bug must be in here-document processing somewhere.

@McDutchie

I tracked down where the buffer pointer becomes corrupted in lex_advance(). It is here:

buff = lp->lexd.first;

The execution of command substitutions involves parsing and lexical analysis at runtime. A command substitution combined with nested execution of here-documents causes lp->lexd.first to be changed, without being restored when the nested execution returns.

The following patch saves and restores it when running a command substitution from a here-document, which appears to fix the bug on the system on which I can reproduce it.

diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 52545a40..043094fd 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -381,8 +381,12 @@ void sh_machere(Sfio_t *infile, Sfio_t *outfile, char *string)
 				break;
 			    }
 			    case S_PAR:
+			    {
+				char	*savefirst = lp->lexd.first;
 				comsubst(mp,NULL,1);
+				lp->lexd.first = savefirst;
 				break;
+			    }
 			    case S_EOF:
 				if((c=fcfill()) > 0)
 					goto again;
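
For readers unfamiliar with the pattern, the save/restore bracket in this patch can be pictured with a toy model. This is only a sketch: `toy_lexer`, `toy_nested_run` and the other names below are hypothetical stand-ins, not ksh code, and the nested routine simply overwrites the shared pointer to mimic what is described above for `lp->lexd.first`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer state; only `first` mirrors a name from the thread. */
struct toy_lexer {
	const char *first;	/* start of text the caller still has to copy */
};

static const char outer_text[] = "outer here-document";
static const char inner_text[] = "inner here-document";

/* Stands in for a nested run that reuses the shared state and leaves it changed. */
static void toy_nested_run(struct toy_lexer *lp)
{
	lp->first = inner_text;
}

/* Calls the nested run without protection; returns 1 if `first` survived the call. */
static int toy_unguarded(struct toy_lexer *lp)
{
	const char *before = lp->first;
	toy_nested_run(lp);
	return lp->first == before;
}

/* Same call, bracketed by a save and a restore of the pointer. */
static void toy_guarded(struct toy_lexer *lp)
{
	const char *savefirst = lp->first;
	toy_nested_run(lp);
	lp->first = savefirst;
}

/* Runs both variants; returns 0 when they behave as described. */
int toy_save_restore_demo(void)
{
	struct toy_lexer lx = { outer_text };

	assert(toy_unguarded(&lx) == 0);	/* the nested run clobbered the pointer */

	lx.first = outer_text;
	toy_guarded(&lx);
	assert(lx.first == outer_text);		/* the pointer is back after the call */
	return 0;
}
```

Calling `toy_save_restore_demo()` from a `main` function runs the checks. Note that the bracket only puts the old address back; it says nothing about whether that address is still valid once the nested run has finished.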

McDutchie changed the title from "Seg Fault when running simple script" to "Segfault on processing a here-document from a command substitution in a here-document" Mar 23, 2025
@McDutchie

So, the issue occurs when a here-document is processed from a command substitution within a here-document. The ${TEST_LINE} expansion, with braces, is also required to trigger the crash. The following is a more minimal reproducer that also crashes on the same system:

TEST_LINE="Test line"
cat <<EOF
$(
cat <<EOF2
${TEST_LINE}
EOF2
)
EOF

@McDutchie

Both here-documents and command substitutions (and arithmetic expressions) involve lexical analysis at runtime. So it is probably safest to save and restore the entire lexer state struct, not just that one first pointer. Patch version 2:

diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 52545a40..8fb91b93 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -381,8 +381,17 @@ void sh_machere(Sfio_t *infile, Sfio_t *outfile, char *string)
 				break;
 			    }
 			    case S_PAR:
+			    {
+				/*
+				 * Execute a command substitution or arithmetic expression from a here-document. As both
+				 * these and here-documents involve lexical analysis at runtime, save and restore the
+				 * lexer state to avoid hard-to-trace invalid pointers down the line in case of nesting.
+				 */
+				Lex_t sav = *lp;
 				comsubst(mp,NULL,1);
+				*lp = sav;
 				break;
+			    }
 			    case S_EOF:
 				if((c=fcfill()) > 0)
 					goto again;
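
One property of the `Lex_t sav = *lp;` approach worth keeping in mind is that C struct assignment is a shallow copy: pointer members are copied as addresses, while whatever they point to is left alone. The sketch below illustrates that general language rule with a hypothetical two-field struct. `mini_lex` and its fields are invented for illustration and do not reflect the real `Lex_t` layout, and the example is not a diagnosis of anything in ksh.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical miniature of a lexer state; field names are illustrative only. */
struct mini_lex {
	char *first;	/* points into a buffer owned elsewhere */
	int   lineno;	/* plain value stored inside the struct */
};

/* Stands in for nested work that touches the struct and the buffer behind it. */
static void mini_nested(struct mini_lex *lp)
{
	lp->lineno += 10;
	memcpy(lp->first, "XYZ", 3);	/* overwrite the pointed-to text in place */
	lp->first += 3;			/* and move the pointer */
}

/* Snapshots the struct, runs the nested work, restores; returns 0 on success. */
int mini_snapshot_demo(void)
{
	char buffer[] = "abcdef";
	struct mini_lex lx = { buffer, 1 };

	struct mini_lex sav = lx;	/* copies the pointer and the int, not the buffer */
	mini_nested(&lx);
	lx = sav;

	assert(lx.lineno == 1);			/* value member restored */
	assert(lx.first == buffer);		/* pointer member restored */
	assert(strcmp(buffer, "XYZdef") == 0);	/* pointed-to data not restored */
	return 0;
}
```

Calling `mini_snapshot_demo()` from a `main` function runs the checks. A whole-struct snapshot therefore restores every address exactly as it was, whether or not the memory behind it is still in the same state.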

McDutchie added a commit that referenced this issue Mar 23, 2025
Minimal reproducer:

TEST_LINE="Test line"
cat <<EOF
$(
cat <<EOF2
${TEST_LINE}
EOF2
)
EOF

This script crashes on (at least) Alpine Linux with musl libc. I
have not been able to get it to crash on any other system, even
with ASan or valgrind, but still, this indicates a problem.

The relevant part of the gdb(1) stack trace on that system is:

    #0  memcpy () at src/string/aarch64/memcpy.S:145
    #1  0x0000aaaaaabd4d7c in sfwrite (f=0xaaaaaac813d8
        <_Stak_data>, buf=0xfffff7eb127b, n=225211) at
        /usr/local/src/ksh/src/lib/libast/sfio/sfwrite.c:130
    #2  0x0000aaaaaaae7f28 in lex_advance (iop=0xfffff7fe0600,
        buff=0xfffff7eb127b <error: Cannot access memory at address
        0xfffff7eb127b>, size=225211, context=0xaaaaaac8aae0) at
        /usr/local/src/ksh/src/cmd/ksh93/sh/lex.c:161
    #3  0x0000aaaaaaad10e8 in fcfill () at
        /usr/local/src/ksh/src/cmd/ksh93/sh/fcin.c:98
    #4  0x0000aaaaaaaf17a0 in sh_machere (infile=0xfffff7fe0600,
        outfile=0xfffff7fe3140, string=0xfffff7fe3c51 "EOF") at
        /usr/local/src/ksh/src/cmd/ksh93/sh/macro.c:319
    #5  0x0000aaaaaaadf64c in io_heredoc (iop=0xfffff7fe3c00,
        name=0xfffff7fe3c51 "EOF", traceon=1) at
        /usr/local/src/ksh/src/cmd/ksh93/sh/io.c:1632
    #6  0x0000aaaaaaade36c in sh_redirect (iop=0xfffff7fe3c00,
        flag=0) at /usr/local/src/ksh/src/cmd/ksh93/sh/io.c:1245
    #7  0x0000aaaaaab27898 in sh_exec (t=0xfffff7fe3bc0, flags=4)
        at /usr/local/src/ksh/src/cmd/ksh93/sh/xec.c:1176

So, as we can see in #2, the problem is an invalid pointer. It
becomes invalid in lex_advance() in lex.c on line 152, which is:

	 buff = lp->lexd.first;

So it looks like the lp->lexd.first pointer is somehow invalidated.

Analysis: The execution of both here-documents and command
substitutions involves parsing and lexical analysis at runtime. A
command substitution combined with nested execution of here-
documents causes lp->lexd.first to be changed, without being
restored when the nested execution returns.

So, this suggests that the fix would be to save that pointer before
calling comsubst() from sh_machere() to execute a command
substitution from within a here-document, and restore it after.

But, to avoid other potential or unforeseen problems, it is
probably best to save and restore the entire lexer state struct,
not just that one 'first' pointer.

src/cmd/ksh93/sh/macro.c: sh_machere():
- When executing a command substitution or arithmetic expression
  from a here-document, save and restore the lexer state.

Thanks to @stefanbidi for the bug report.
Resolves: #823
@McDutchie

Nope. That last fix causes a different crash. So, force-pushed back out, and bug reopened.

McDutchie reopened this Mar 23, 2025
@McDutchie

McDutchie commented Mar 23, 2025

So, not restoring lp->lexd.first causes the crash reported here, but restoring it causes a very similar crash elsewhere, in the modernish regression tests.

New strategy: just reset it. The existing lex.c code handles this.

diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 52545a40..9dc96538 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -382,6 +382,7 @@ void sh_machere(Sfio_t *infile, Sfio_t *outfile, char *string)
 			    }
 			    case S_PAR:
 				comsubst(mp,NULL,1);
+				lp->lexd.first = NULL;
 				break;
 			    case S_EOF:
 				if((c=fcfill()) > 0)
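
A toy model may help show why a NULL `first` is the safe state. The sketch assumes that the consumer, when `first` is set, starts copying there and adjusts the size by the distance to `buff`. Only the `buff = lp->lexd.first;` assignment is actually quoted in this thread, so the size adjustment and all names (`toy_advance`, `toy_span`, `pool`) are assumptions made for illustration. Under that assumption a stale pointer inflates the span, which would be consistent with the oversized `n` in the backtraces, whereas NULL leaves the span untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical span descriptor: where a copy would start and how long it is. */
struct toy_span {
	const char *start;
	ptrdiff_t   size;
};

/*
 * Toy consumer. Assumption: when `first` is set, the copy starts there instead
 * of at `buff`, and the size is adjusted by the distance between the two.
 * When `first` is NULL, the span is used as given.
 */
static struct toy_span toy_advance(const char *first, const char *buff, ptrdiff_t size)
{
	struct toy_span s = { buff, size };
	if (first) {
		s.size -= first - buff;
		s.start = first;
	}
	return s;
}

/* Compares a stale pointer with a NULL one; returns 0 when both checks pass. */
int toy_reset_demo(void)
{
	/* One array holds both regions so the pointer arithmetic stays well defined. */
	static const char pool[64] = {0};
	const char *old_region = pool;		/* buffer in use before the nested run */
	const char *new_region = pool + 32;	/* buffer in use after it */
	const char *stale_first = old_region + 8;

	/* Stale pointer: the span leaves the 10 valid bytes and grows to 34. */
	struct toy_span bad = toy_advance(stale_first, new_region, 10);
	assert(bad.start == stale_first);
	assert(bad.size == 34);

	/* Pointer reset to NULL: the span is exactly what the caller passed in. */
	struct toy_span ok = toy_advance(NULL, new_region, 10);
	assert(ok.start == new_region);
	assert(ok.size == 10);
	return 0;
}
```

Calling `toy_reset_demo()` from a `main` function runs the checks. In the real shell the two buffers are unrelated allocations, so the stale case would not stay inside one array as it does here.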

@McDutchie

@stefanbidi, I would appreciate it if you could test the third patch, above.

@stefanbidi
Author

I can confirm that the latest patch fixes the issue for me. The test case no longer segfaults and Binutils 2.44 now builds with correct linker scripts.

Thank you very much!

McDutchie added a commit that referenced this issue Mar 24, 2025
Analysis: The execution of both here-documents and command
substitutions involves parsing and lexical analysis at runtime. A
command substitution combined with nested execution of here-
documents causes lp->lexd.first to be changed, without being
restored when the nested execution returns.

So, this suggests that the fix would be to save that pointer before
calling comsubst() from sh_machere() to execute a command
substitution from within a here-document, and restore it after.
However, restoring it causes another, very similar crash when
running the regression tests from the modernish shell library.
So we can't rely on that pointer being valid either way.

src/cmd/ksh93/sh/macro.c: sh_machere():
- When executing a command substitution or arithmetic expression
  from a here-document, reset lp->lexd.first to NULL to avoid a
  potentially invalid address being used in lex_advance(). The
  lex.c code already handles the case of it being NULL.

Thanks to @stefanbidi for the bug report.
Resolves: #823