Sed使用笔记

发表于 2019/04/20 更新于 2025/07/07

作者 wsxq2

19 分钟阅读

Sed使用笔记

本文主要提取自man sed和Sed 使用参考手册。也可以使用info sed com获得和本文相近的内容。对于更详细更完整的 sed 文档，可参阅 info sed

Sed 是一款流编辑工具，用来对文本进行过滤与替换操作。Sed 一次仅读取一行内容来对某些指令进行处理后输出，所以 Sed 处理大数据文件是很方便快捷的。

Sed 的工作流程是：首先，通过文件或者管道读取文件内容。Sed 默认并不直接修改源文件，而是将读入的内容赋值到缓冲区中，这被称之为模式空间，所有指令操作都是在模式空间中进行。然后， Sed 根据相应的指令对模式空间中的内容进行处理并输出，默认输出至标准输出（即屏幕）。

注意：sed 脚本的执行顺序不是由命令的出现的先后顺序决定的，而是由命令涉及的行号（address ranges）决定的，因为 sed 只遍历一遍待处理文本

相关资源
sed 命令行参数
sed 常用格式
Zero-address commands
Zero- or One- address commands
Commands which accept address ranges
s命令 flag
遇到过的问题
实践记录
链接

sed 常用格式

sed [options] 'script' file1 file2 ...
sed [options] –f scriptfile file1 file2 ...
sed 'script' file | sed 'script' | sed 'script'
sed 'statement1; statement2; statement3' file1 file2 ...
sed -e 'statement1' -e 'statement2' -e 'statement3' file1 file2 ...

通常一个statement为：

address{
    command1
    command2
    command3
}

如果command只有一个，则{}可以省去

Zero-address commands

`: label`	Label for b and t commands.
`#comment`	The comment extends until the next newline (or the end of a -e script fragment).
`}`	The closing bracket of a { } block.

Zero- or One- address commands

`=`	Print the current line number.
`a text`	Append text, which has each embedded newline preceded by a backslash.
`i text`	Insert text, which has each embedded newline preceded by a backslash.
`q [exit-code]`	Immediately quit the sed script without processing any more input, except that if auto-print is not disabled the current pattern space will be printed. The exit code argument is a GNU extension.
`Q [exit-code]`	Immediately quit the sed script without processing any more input. This is a GNU extension.
`r filename`	Append text read from filename.
`R filename`	Append a line read from filename. Each invocation of the command reads a line from the file. This is a GNU extension.

Commands which accept address ranges

`{`	Begin a block of commands (end with a }).
`b label`	Branch to label; if label is omitted, branch to end of script.
`c text`	Replace the selected lines with text, which has each embedded newline preceded by a backslash.
`d`	Delete pattern space. Start next cycle.
`D`	If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input.
`h H`	Copy/append pattern space to hold space.
`g G`	Copy/append hold space to pattern space.
`l`	List out the current line in a ‘visually unambiguous’ form.
`l width`	List out the current line in a ‘visually unambiguous’ form, breaking it at width characters. This is a GNU extension.
`n N`	Read/append the next line of input into the pattern space.
`p`	Print the current pattern space.
`P`	Print up to the first embedded newline of the current pattern space.
`s/regexp/replacement/`	Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
`t label`	If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script.
`T label`	If no s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script. This is a GNU extension.
`w filename`	Write the current pattern space to filename.
`W filename`	Write the first line of the current pattern space to filename. This is a GNU extension.
`x`	Exchange the contents of the hold and pattern spaces.
`y/source/dest/`	Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.

`s`命令 flag

`n`	1 - 512 之间的数字，表示对模式空间中指定模式的第 n 次出现进行替换。如一行中有3个A，而只想替换第二个A。
`g`	对模式空间的所有匹配进行全局更改。没有 g 则只有第一次匹配被替换。如一行中有3个A，则仅替换第一个A。
`p`	打印模式空间的内容，即表示打印行。与-n选项一起使用可以只打印匹配的行。
`w file`	将模式空间的内容写到文件 file 中。即表示把行写入一个文件。

遇到过的问题

取反

find . -type f -name "*" | sed -n '/.abc/!p'

条件判断

sed无法实现通常意义上的条件判断（即if..else..），它只有在特殊情况下才显得能够实现条件判断。比如欲解决如下问题：

if (sed command does find a match for "::=BEGIN")
then i=1 
else i=0

可以使用如下命令，显得能够实现条件判断的样子：

i=$(sed ':a;N;$!ba;s/\n/ /g' foo | sed '{/::=BEGIN/{s/.*/1/; b next}; s/.*/0/; :next}')

上述问题和解法均来自 bash - Use sed command to apply a condition to check if a pattern has been matched - Ask Ubuntu

但是对于我在处理缩略语时遇到的一个问题，经过数次尝试，使用 sed 无法实现，即使这个问题只涉及到一个简单的条件判断。当然，也有可能是我个人能力有限，没找到解决办法。关于这个问题的更多信息，参见处理缩略语

添加空行

测试文本 a

---
tags: [sed]
last_modified_time: 2019-04-21 17:59:31 +0800
---

<!-- vim-markdown-toc GFM -->

* [sed 命令行参数](#sed-命令行参数)
* [sed 常用格式](#sed-常用格式)
* [Zero-address commands](#zero-address-commands)
* [Zero- or One- address commands](#zero--or-one--address-commands)
* [Commands which accept address ranges](#commands-which-accept-address-ranges)
* [`s`命令 flag](#s命令-flag)
* [常见问题](#常见问题)
  * [取反](#取反)

要求在一行前添加如下内容（注意有个空行）：

<p id="markdown-toc"></p>

`i\n`和`a\n`

使用如下命令即可：

sed -e '/^<!-- vim-markdown-toc GFM -->$/{i\
<p id="markdown-toc"></p>\n
; }' a

i命令在指定行的前面插入文本，a命令在指定行的后面添加文本。由于a和i极为类似，故在此不再赘述

`{x;p;x}`和`G`

sed -e '/^<!-- vim-markdown-toc GFM -->$/{i\
<p id="markdown-toc"></p>
; {x;p;x}}' a

{x;p;x}在指定行前插入空行，G在指定行后添加空行。由于它们用法相似，故同样不再赘述

关于在更多位置（如文末）插入空行的方法参见 sed之添加空行 - 冰灵儿 - 博客园

实践记录

整理`nmap -sP`的输出结果

执行nmap -sP 192.168.56.0/24得到的输出：

Starting Nmap 6.40 ( http://nmap.org ) at 2019-04-20 18:06 CST
Nmap scan report for host (192.168.56.100)
Host is up (0.00014s latency).
MAC Address: 0A:00:27:00:00:1C (Unknown)
Nmap scan report for master (192.168.56.11)
Host is up.
Nmap done: 256 IP addresses (2 hosts up) scanned in 3.27 seconds

从该输出中提取 IP 地址：

sed -n 's/.*[^0-9]\(\([[:digit:]]\{1,\}\.\)\{3\}[[:digit:]]\{1,\}\).*/\1/p' nmap-sP-results.txt

整理`nmap -sS -p22 -oG`的输出结果

执行nmap -sS -p22 -oG a 202.117.0.0/16命令得到的输出：

# Nmap 7.70 scan initiated Tue Dec  4 13:42:48 2018 as: nmap -sS -p22 -oG a 202.117.0.0/16
Host: 202.117.0.0 ()    Status: Up
Host: 202.117.0.0 ()    Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.1 (0h1.xjtu.edu.cn) Status: Up
Host: 202.117.0.1 (0h1.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.2 (0h2.xjtu.edu.cn) Status: Up
Host: 202.117.0.2 (0h2.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.3 (0h3.xjtu.edu.cn) Status: Up
Host: 202.117.0.3 (0h3.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.4 (0h4.xjtu.edu.cn) Status: Up
Host: 202.117.0.4 (0h4.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.5 (voice-gw.xjtu.edu.cn)    Status: Up
Host: 202.117.0.5 (voice-gw.xjtu.edu.cn)    Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.6 (0h6.xjtu.edu.cn) Status: Up
Host: 202.117.0.6 (0h6.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
......

从该输出中提取信息：

# get active ip
sed -ne '/.*/{s/^Host: \([0-9.]\+\) .*/\1/p}' a

# get ssh-active ip
sed -ne '/22\/open\/tcp/{s/^Host: \([0-9.]\+\) (.*/\1/p}' a

# get domain-name-haved ip and domain-name
sed -ne '/.*/{s/^Host: \([0-9.]\+ ([^)]\+)\).*/\1/p}' a

# get ssh-active and domain-name-haved ip and domain-name
sed -ne '/22\/open\/tcp/{s/^Host: \([0-9.]\+ ([^)]\+)\).*/\1/p}' a

处理缩略语

有一次在写汇编语言相关的博客（ 16位汇编程序设计）时遇到了大量缩略语，处理时发现纯手打太麻烦，于是想到从网上下载一个缩略语文本，命名为 abbreviations.txt，通过一个简单的 Vim 宏从博客中提取所有缩略语到文末，再在 abbreviations.txt 中进行查找并替换，以减轻工作量。但发现其中的查找及替换竟难倒了我，于是不服，重新学了一遍 sed，信心满满地打算用一个 sed 脚本配合 shell 脚本实现。却发现自己失败了，特此记录之。后来使用 grep 配合 shell 脚本轻松实现，此外还有很多其它的 shell 版本。它们都在 Bash使用笔记 - 处理缩略语中

问题详情

将文件 word.txt 中的每一行 word 作为搜索关键字在文件 abbreviations.txt 中进行查找，如果找到相应的行则输出找到的行（找到几行就输出几行），如果找不到相应的行，则输出*[$word]:

文件 word.txt：

ABC
ADD
AF
AH
AL
AMD
ASCII
ASSUME
AT
AX
...

文件 abbreviations.txt：

*[#!]: Shebang
*[/.]: Slashdot
*[100B-FX]: 100BASE-FX
*[100B-TX]: 100BASE-TX
*[100B-T]: 100BASE-T
*[100BVG]: 100BASE-VG
*[10B-FB]: 10BASE-FB
*[10B-FL]: 10BASE-FL
*[10B-FP]: 10BASE-FP
*[10B-F]: 10BASE-F
...
*[ABCL]: Actor-Based Concurrent Language
...
*[AF]: Auxiliary carry Flag（辅助进位标志）
...

想得到的输出：

*[ABC]:
*[ADD]:
*[AF]: Anisotropic Filtering
*[AF]: Auxiliary carry Flag（辅助进位标志）
*[AH]: Accumulator register High（寄存器 A 高位）
*[AH]: Active Hub
*[AL]: Access List
*[AL]: Accumulator register Low（寄存器 A 低位）
*[AL]: Active Link
*[AMD]: Advanced Micro Devices
*[ASCII]: American Standard Code for Information Interchange
*[ASSUME]:
*[AT]: Access Time

尝试过的方法

方法一：使用`b`命令

while read line
do
	sed -n -e "/\[$line]/{p; b next;}; 1a \
		\*\[$line]: 
		;:next;" abbreviations.txt
done < "${1:-/dev/stdin}"

最初企图模仿上述方法使用b命令实现条件判断。但是失败多次之后才知道，sed 的脚本不是顺序执行的。为了提高性能，它只遍历一次要修改的文本，因此，是以行号的递增为执行顺序的。比如对于以上脚本而言，1a *[$line]无论如何都会被执行，而非只要找到一个$line就可以跳过，因为上述 sed 脚本中的第二个命令1a *[$line]对应的行号是1，所以最先执行的就是它，然后在找到了$line后再执行里面p; b next;，但这里的b next已经毫无意义了

如果上述 sed 命令显得过于复杂，导致难于理解，我们可以看个简单的 sed 命令来理解上述问题：

sed -n -e '/ABC/{p; q;}; 1a\
ABC
;' abbreviations.txt

我期待的是一旦找到 ABC 则打印当前行的内容后马上退出脚本（使用了q命令），如果没有找到则在第一行后面添加一行ABC。但实际情况为：即使找到了 ABC，依然会在第一行后面添加ABC。这便是因为 sed 会将以;或参数-e分隔的命令根据相关命令涉及的位置决定执行顺序的缘故。

于是我想到了这样：

while read line
do
	sed -n -e "/\[$line]/{p; b next;}; \$a \
		\*\[$line]: 
		;:next;" abbreviations.txt
done < "${1:-/dev/stdin}"

发现并没有解决我的问题。甚至不知道为什么会失败。感觉b命令好像是假的一样😭

方法二：去掉`b`命令

这个方法是上一个方法的最终方案的简化（去掉了疑似假的的b命令）：

while read line
do
sed -n -e "/\[$line]/{p;}; \$a \
\*\[$line]: 
;" abbreviations.txt
done < "${1:-/dev/stdin}"

自然地，该方法也存在和上一个方法的最终解决方案一样的问题，即对于 word.txt 中的每一个 word，即使在 abbreviations.txt 中找到了相应的行也依然会输出*[$word]:

方法三：使用`q`命令

while read line
do
	sed -n -e "/\[$line]/{p; q;}; \$a \
		\*\[$line]: 
		;" abbreviations.txt
done < "${1:-/dev/stdin}"

该方法看起来不错，但依然存在问题，即只能在 abbreviations.txt 中找到一个和 word 匹配的行（因为使用了q命令）

链接

下面总结了本文中使用的所有链接：

sed

本文由作者按照 CC BY 4.0 进行授权

相关资源

sed 命令行参数

sed 常用格式

Zero-address commands

Zero- or One- address commands

Commands which accept address ranges

s命令 flag

遇到过的问题

取反

条件判断

添加空行

i\n和a\n

{x;p;x}和G

实践记录

整理nmap -sP的输出结果

整理nmap -sS -p22 -oG的输出结果