本文主要提取自man sed
和Sed 使用参考手册。也可以使用info sed com
获得和本文相近的内容。对于更详细更完整的 sed 文档,可参阅 info sed
Sed 是一款流编辑工具,用来对文本进行过滤与替换操作。Sed 一次仅读取一行内容来对某些指令进行处理后输出,所以 Sed 处理大数据文件是很方便快捷的。
Sed 的工作流程是:首先,通过文件或者管道读取文件内容。Sed 默认并不直接修改源文件,而是将读入的内容赋值到缓冲区中,这被称之为模式空间,所有指令操作都是在模式空间中进行。然后, Sed 根据相应的指令对模式空间中的内容进行处理并输出,默认输出至标准输出(即屏幕)。
注意:sed 脚本的执行顺序不是由命令的出现的先后顺序决定的,而是由命令涉及的行号(address ranges)决定的,因为 sed 只遍历一遍待处理文本
- 相关资源
- sed 命令行参数
- sed 常用格式
- Zero-address commands
- Zero- or One- address commands
- Commands which accept address ranges
s
命令 flag- 遇到过的问题
- 实践记录
- 链接
相关资源
- man sed
- info sed
- sed.sf.net - The sed $HOME
-
sed.sourceforge.net/grabbag/scripts/remccoms3.sed
sed 命令行参数
使用
sed --help
可得到如下帮助:
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]... -n, --quiet, --silent suppress automatic printing of pattern space -e script, --expression=script add the script to the commands to be executed -f script-file, --file=script-file add the contents of script-file to the commands to be executed --follow-symlinks follow symlinks when processing in place -i[SUFFIX], --in-place[=SUFFIX] edit files in place (makes backup if SUFFIX supplied) -c, --copy use copy instead of rename when shuffling files in -i mode -b, --binary does nothing; for compatibility with WIN32/CYGWIN/MSDOS/EMX ( open files in binary mode (CR+LFs are not treated specially)) -l N, --line-length=N specify the desired line-wrap length for the `l' command --posix disable all GNU extensions. -r, --regexp-extended use extended regular expressions in the script. -s, --separate consider files as separate rather than as a single continuous long stream. -u, --unbuffered load minimal amounts of data from the input files and flush the output buffers more often -z, --null-data separate lines by NUL characters --help display this help and exit --version output version information and exit If no -e, --expression, -f, or --file option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read. GNU sed home page: <http://www.gnu.org/software/sed/>. General help using GNU software: <http://www.gnu.org/gethelp/>. E-mail bug reports to: <bug-sed@gnu.org>. Be sure to include the word 'ed' somewhere in the 'ubject:' field.
sed 常用格式
1
2
3
4
5
sed [options] 'script' file1 file2 ...
sed [options] –f scriptfile file1 file2 ...
sed 'script' file | sed 'script' | sed 'script'
sed 'statement1; statement2; statement3' file1 file2 ...
sed -e 'statement1' -e 'statement2' -e 'statement3' file1 file2 ...
通常一个statement
为:
1
2
3
4
5
address{
command1
command2
command3
}
如果command
只有一个,则{}
可以省去
Zero-address commands
: label |
Label for b and t commands. |
#comment |
The comment extends until the next newline (or the end of a -e script fragment). |
} |
The closing bracket of a { } block. |
Zero- or One- address commands
= |
Print the current line number. |
a text |
Append text, which has each embedded newline preceded by a backslash. |
i text |
Insert text, which has each embedded newline preceded by a backslash. |
q [exit-code] |
Immediately quit the sed script without processing any more input, except that if auto-print is not disabled the current pattern space will be printed. The exit code argument is a GNU extension. |
Q [exit-code] |
Immediately quit the sed script without processing any more input. This is a GNU extension. |
r filename |
Append text read from filename. |
R filename |
Append a line read from filename. Each invocation of the command reads a line from the file. This is a GNU extension. |
Commands which accept address ranges
{ |
Begin a block of commands (end with a }). |
b label |
Branch to label; if label is omitted, branch to end of script. |
c text |
Replace the selected lines with text, which has each embedded newline preceded by a backslash. |
d |
Delete pattern space. Start next cycle. |
D |
If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input. |
h H |
Copy/append pattern space to hold space. |
g G |
Copy/append hold space to pattern space. |
l |
List out the current line in a ‘visually unambiguous’ form. |
l width |
List out the current line in a ‘visually unambiguous’ form, breaking it at width characters. This is a GNU extension. |
n N |
Read/append the next line of input into the pattern space. |
p |
Print the current pattern space. |
P |
Print up to the first embedded newline of the current pattern space. |
s/regexp/replacement/ |
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp. |
t label |
If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script. |
T label |
If no s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script. This is a GNU extension. |
w filename |
Write the current pattern space to filename. |
W filename |
Write the first line of the current pattern space to filename. This is a GNU extension. |
x |
Exchange the contents of the hold and pattern spaces. |
y/source/dest/ |
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest. |
s
命令 flag
n |
1 - 512 之间的数字,表示对模式空间中指定模式的第 n 次出现进行替换。如一行中有3个A,而只想替换第二个A。 |
g |
对模式空间的所有匹配进行全局更改。没有 g 则只有第一次匹配被替换。如一行中有3个A,则仅替换第一个A。 |
p |
打印模式空间的内容,即表示打印行。与-n选项一起使用可以只打印匹配的行。 |
w file |
将模式空间的内容写到文件 file 中。 即表示把行写入一个文件。 |
遇到过的问题
- How to match newlines in sed « \1
- linux - sed: how to replace line if found or append to end of file if not found? - Super User
- The sed (Stream Editor) FAQ - 4.11. How do I match only the first occurrence of a pattern?
- bash - Replace a long string with the sed command: Argument list too long error - Unix & Linux Stack Exchange
- bash - Insert newline (\n) using sed - Stack Overflow
- grep - Sed to print out the line number - Unix & Linux Stack Exchange
- sed - How to insert text after a certain string in a file? - Unix & Linux Stack Exchange
取反
1
find . -type f -name "*" | sed -n '/.abc/!p'
条件判断
sed
无法实现通常意义上的条件判断(即if..else..
),它只有在特殊情况下才显得能够实现条件判断。比如欲解决如下问题:
1
2
3
if (sed command does find a match for "::=BEGIN")
then i=1
else i=0
可以使用如下命令,显得能够实现条件判断的样子:
1
i=$(sed ':a;N;$!ba;s/\n/ /g' foo | sed '{/::=BEGIN/{s/.*/1/; b next}; s/.*/0/; :next}')
上述问题和解法均来自 bash - Use sed command to apply a condition to check if a pattern has been matched - Ask Ubuntu
但是对于我在处理缩略语时遇到的一个问题,经过数次尝试,使用 sed 无法实现,即使这个问题只涉及到一个简单的条件判断。当然,也有可能是我个人能力有限,没找到解决办法。关于这个问题的更多信息,参见 处理缩略语
添加空行
测试文本 a
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
---
tags: [sed]
last_modified_time: 2019-04-21 17:59:31 +0800
---
<!-- vim-markdown-toc GFM -->
* [sed 命令行参数](#sed-命令行参数)
* [sed 常用格式](#sed-常用格式)
* [Zero-address commands](#zero-address-commands)
* [Zero- or One- address commands](#zero--or-one--address-commands)
* [Commands which accept address ranges](#commands-which-accept-address-ranges)
* [`s`命令 flag](#s命令-flag)
* [常见问题](#常见问题)
* [取反](#取反)
要求在<!-- vim-markdown-toc GFM -->
一行前添加如下内容(注意有个空行):
1
<p id="markdown-toc"></p>
i\n
和a\n
使用如下命令即可:
sed -e '/^<!-- vim-markdown-toc GFM -->$/{i\ <p id="markdown-toc"></p>\n ; }' a
i
命令在指定行的前面插入文本,a
命令在指定行的后面添加文本。由于a
和i
极为类似,故在此不再赘述
{x;p;x}
和G
sed -e '/^<!-- vim-markdown-toc GFM -->$/{i\ <p id="markdown-toc"></p> ; {x;p;x}}' a
{x;p;x}
在指定行前插入空行,G
在指定行后添加空行。由于它们用法相似,故同样不再赘述
关于在更多位置(如文末)插入空行的方法参见 sed之添加空行 - 冰灵儿 - 博客园
实践记录
整理nmap -sP
的输出结果
执行nmap -sP 192.168.56.0/24
得到的输出:
Starting Nmap 6.40 ( http://nmap.org ) at 2019-04-20 18:06 CST Nmap scan report for host (192.168.56.100) Host is up (0.00014s latency). MAC Address: 0A:00:27:00:00:1C (Unknown) Nmap scan report for master (192.168.56.11) Host is up. Nmap done: 256 IP addresses (2 hosts up) scanned in 3.27 seconds
从该输出中提取 IP 地址:
1
sed -n 's/.*[^0-9]\(\([[:digit:]]\{1,\}\.\)\{3\}[[:digit:]]\{1,\}\).*/\1/p' nmap-sP-results.txt
整理nmap -sS -p22 -oG
的输出结果
执行nmap -sS -p22 -oG a 202.117.0.0/16
命令得到的输出:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Nmap 7.70 scan initiated Tue Dec 4 13:42:48 2018 as: nmap -sS -p22 -oG a 202.117.0.0/16
Host: 202.117.0.0 () Status: Up
Host: 202.117.0.0 () Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.1 (0h1.xjtu.edu.cn) Status: Up
Host: 202.117.0.1 (0h1.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.2 (0h2.xjtu.edu.cn) Status: Up
Host: 202.117.0.2 (0h2.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.3 (0h3.xjtu.edu.cn) Status: Up
Host: 202.117.0.3 (0h3.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.4 (0h4.xjtu.edu.cn) Status: Up
Host: 202.117.0.4 (0h4.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.5 (voice-gw.xjtu.edu.cn) Status: Up
Host: 202.117.0.5 (voice-gw.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
Host: 202.117.0.6 (0h6.xjtu.edu.cn) Status: Up
Host: 202.117.0.6 (0h6.xjtu.edu.cn) Ports: 22/filtered/tcp//ssh///
......
从该输出中提取信息:
1
2
3
4
5
6
7
8
9
10
11
# get active ip
sed -ne '/.*/{s/^Host: \([0-9.]\+\) .*/\1/p}' a
# get ssh-active ip
sed -ne '/22\/open\/tcp/{s/^Host: \([0-9.]\+\) (.*/\1/p}' a
# get domain-name-haved ip and domain-name
sed -ne '/.*/{s/^Host: \([0-9.]\+ ([^)]\+)\).*/\1/p}' a
# get ssh-active and domain-name-haved ip and domain-name
sed -ne '/22\/open\/tcp/{s/^Host: \([0-9.]\+ ([^)]\+)\).*/\1/p}' a
处理缩略语
有一次在写汇编语言相关的博客( 16位汇编程序设计 )时遇到了大量缩略语,处理时发现纯手打太麻烦,于是想到从网上下载一个缩略语文本,命名为 abbreviations.txt,通过一个简单的 Vim 宏从博客中提取所有缩略语到文末,再在 abbreviations.txt 中进行查找并替换,以减轻工作量。但发现其中的查找及替换竟难倒了我,于是不服,重新学了一遍 sed,信心满满地打算用一个 sed 脚本配合 shell 脚本实现。却发现自己失败了,特此记录之。后来使用 grep 配合 shell 脚本轻松实现,此外还有很多其它的 shell 版本。它们都在 Bash使用笔记 - 处理缩略语 中
问题详情
将文件 word.txt 中的每一行 word 作为搜索关键字在文件 abbreviations.txt 中进行查找,如果找到相应的行则输出找到的行(找到几行就输出几行),如果找不到相应的行,则输出*[$word]:
文件 word.txt:
ABC ADD AF AH AL AMD ASCII ASSUME AT AX ...
文件 abbreviations.txt:
*[#!]: Shebang *[/.]: Slashdot *[100B-FX]: 100BASE-FX *[100B-TX]: 100BASE-TX *[100B-T]: 100BASE-T *[100BVG]: 100BASE-VG *[10B-FB]: 10BASE-FB *[10B-FL]: 10BASE-FL *[10B-FP]: 10BASE-FP *[10B-F]: 10BASE-F ... *[ABCL]: Actor-Based Concurrent Language ... *[AF]: Auxiliary carry Flag(辅助进位标志) ...
想得到的输出:
*[ABC]: *[ADD]: *[AF]: Anisotropic Filtering *[AF]: Auxiliary carry Flag(辅助进位标志) *[AH]: Accumulator register High(寄存器 A 高位) *[AH]: Active Hub *[AL]: Access List *[AL]: Accumulator register Low(寄存器 A 低位) *[AL]: Active Link *[AMD]: Advanced Micro Devices *[ASCII]: American Standard Code for Information Interchange *[ASSUME]: *[AT]: Access Time
尝试过的方法
方法一:使用b
命令
while read line do sed -n -e "/\[$line]/{p; b next;}; 1a \ \*\[$line]: ;:next;" abbreviations.txt done < "${1:-/dev/stdin}"
最初企图模仿上述方法使用b
命令实现条件判断。但是失败多次之后才知道,sed 的脚本不是顺序执行的。为了提高性能,它只遍历一次要修改的文本,因此,是以行号的递增为执行顺序的。比如对于以上脚本而言,1a *[$line]
无论如何都会被执行,而非只要找到一个$line
就可以跳过,因为上述 sed 脚本中的第二个命令1a *[$line]
对应的行号是1
,所以最先执行的就是它,然后在找到了$line
后再执行里面p; b next;
,但这里的b next
已经毫无意义了
如果上述 sed 命令 显得过于复杂,导致难于理解,我们可以看个简单的 sed 命令来理解上述问题:
sed -n -e '/ABC/{p; q;}; 1a\ ABC ;' abbreviations.txt
我期待的是一旦找到 ABC 则打印当前行的内容后马上退出脚本(使用了q
命令),如果没有找到则在第一行后面添加一行ABC
。但实际情况为:即使找到了 ABC,依然会在第一行后面添加ABC
。这便是因为 sed 会将以;
或参数-e
分隔的命令根据相关命令涉及的位置决定执行顺序的缘故。
于是我想到了这样:
while read line do sed -n -e "/\[$line]/{p; b next;}; \$a \ \*\[$line]: ;:next;" abbreviations.txt done < "${1:-/dev/stdin}"
发现并没有解决我的问题。甚至不知道为什么会失败。感觉b
命令好像是假的一样
方法二:去掉b
命令
这个方法是上一个方法的最终方案的简化(去掉了疑似假的的b
命令):
while read line do sed -n -e "/\[$line]/{p;}; \$a \ \*\[$line]: ;" abbreviations.txt done < "${1:-/dev/stdin}"
自然地,该方法也存在和上一个方法的最终解决方案一样的问题,即对于 word.txt 中的每一个 word,即使在 abbreviations.txt 中找到了相应的行也依然会输出*[$word]:
方法三:使用q
命令
while read line do sed -n -e "/\[$line]/{p; q;}; \$a \ \*\[$line]: ;" abbreviations.txt done < "${1:-/dev/stdin}"
该方法看起来不错,但依然存在问题,即只能在 abbreviations.txt 中找到一个和 word 匹配的行(因为使用了q
命令)
链接
下面总结了本文中使用的所有链接:
- 16位汇编程序设计
- Bash使用笔记 - 处理缩略语
- Sed 使用参考手册
- bash - Use sed command to apply a condition to check if a pattern has been matched - Ask Ubuntu