LeetCode shell problems

1.Tenth Line

How would you print just the 10th line of a file?

For example, assume that file.txt has the following content:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10

Your script should output the tenth line, which is:
Line 10
-------------------

# Read from the file file.txt and output the tenth line to stdout.

#Solution One:
#head -n 10 file.txt | tail -n +10

#Solution Two:
#awk 'NR==10' file.txt

#Solution Three:
sed -n 10p file.txt
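
A quick check of the edge case where the file has fewer than ten lines may be useful here. The sketch below assumes a throwaway nine-line file created with mktemp and seq (the variable names are illustrative, not part of the original solutions). All three solutions should print nothing, whereas the tempting variant that ends in tail -n 1 returns the last available line instead.

```shell
# Build a 9-line sample file (hypothetical test data).
short=$(mktemp)
seq 1 9 | sed 's/^/Line /' > "$short"

# The three solutions from the notes: each yields nothing for a 9-line file.
out_head_tail=$(head -n 10 "$short" | tail -n +10)
out_awk=$(awk 'NR==10' "$short")
out_sed=$(sed -n 10p "$short")

# Pitfall: tail -n 1 prints the last line it receives, here line 9.
out_naive=$(head -n 10 "$short" | tail -n 1)

echo "head/tail: [$out_head_tail]"
echo "awk:       [$out_awk]"
echo "sed:       [$out_sed]"
echo "naive:     [$out_naive]"

rm -f "$short"
```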

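Concepts involved:

-> head writes the beginning of a file to standard output. By default, it prints the first 10 lines of each file.

Syntax: head [OPTION]... [FILE]...

Options:

-q never print file name headers

-v always print file name headers

-c<bytes> print the first <bytes> bytes

-n<lines> print the first <lines> lines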
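
A minimal sketch of the -n and -c options, assuming a throwaway twelve-line file built with mktemp and seq (file and variable names are illustrative only):

```shell
# Build a 12-line sample file (hypothetical test data).
sample=$(mktemp)
seq 1 12 | sed 's/^/Line /' > "$sample"

# -n 3: the first three lines.
first_three=$(head -n 3 "$sample")
echo "$first_three"

# -c 6: the first six bytes, here the text "Line 1" without its newline.
first_bytes=$(head -c 6 "$sample")
echo "$first_bytes"

# With no options, head prints the first 10 lines.
default_count=$(head "$sample" | wc -l | tr -d ' ')
echo "$default_count"

rm -f "$sample"
```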

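-> tail displays the end of the specified file; when no file is given, it reads standard input. It is commonly used to view log files.

Options:

-f keep reading and output appended data as the file grows

-q, --quiet, --silent never output headers giving file names

-v always output headers giving file names

-c<N> print the last N bytes

-n<N> print the last N lines

--pid=PID used with -f; terminate after the process with ID PID dies

-s, --sleep-interval=S used with -f; sleep S seconds between iterations

See also: "Linux commands I have used: tail - output the end of a file / dynamically monitor the end of a file"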
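
The difference between -n N and -n +N is what makes Solution One work, so a short sketch may help. It assumes a throwaway twelve-line file created with mktemp and seq (names are illustrative):

```shell
# Build a 12-line sample file (hypothetical test data).
sample=$(mktemp)
seq 1 12 | sed 's/^/Line /' > "$sample"

# -n 2: the last two lines.
last_two=$(tail -n 2 "$sample")
echo "$last_two"

# -n +10: everything from line 10 onward (note the plus sign).
from_ten=$(tail -n +10 "$sample")
echo "$from_ten"

# Combined with head -n 10, this isolates exactly line 10.
tenth=$(head -n 10 "$sample" | tail -n +10)
echo "$tenth"

rm -f "$sample"
```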

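-> awk is a powerful text-analysis tool. Where grep searches and sed edits, awk is particularly strong at analyzing data and generating reports. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then analyzes and processes those fields.

Syntax:

awk 'pattern { action }' filenames

pattern is what awk looks for in the data, and action is the series of commands executed when a match is found.

See also: "linux awk command explained in detail"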
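
As a small illustration of the pattern/action form and of default field splitting, here is a sketch that feeds inline sample data through awk (the data and variable names are made up for the example):

```shell
# Pattern NR > 1 skips the header row; the action prints the two fields swapped.
swapped=$(printf 'name age\nalice 21\nryan 30\n' | awk 'NR > 1 { print $2, $1 }')
echo "$swapped"

# Default splitting treats any run of spaces or tabs as a single separator,
# so this line has three fields and the third one is "c".
fields=$(printf 'a   b\tc\n' | awk '{ print NF, $3 }')
echo "$fields"
```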

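-> sed is a stream editor that processes one line at a time. It stores the current line in a temporary buffer called the "pattern space", applies the sed commands to the buffer's contents, and then writes the buffer to standard output. It then moves on to the next line, repeating until the end of the file.

Syntax:

sed [-hnV] [-e <script>] [-f <script file>] [text file]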
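
To see how the pattern space and the -n option interact, consider this sketch. It assumes a throwaway twelve-line file created with mktemp and seq (names are illustrative):

```shell
# Build a 12-line sample file (hypothetical test data).
sample=$(mktemp)
seq 1 12 | sed 's/^/Line /' > "$sample"

# -n suppresses the automatic print of the pattern space; 10p prints only line 10.
tenth=$(sed -n '10p' "$sample")
echo "$tenth"

# Without -n, every line is printed once and line 10 is printed a second time by p.
total=$(sed '10p' "$sample" | wc -l | tr -d ' ')
echo "$total"

# Two -e scripts: substitute on each line, then quit after line 2.
renamed=$(sed -e 's/^Line/Row/' -e '2q' "$sample")
echo "$renamed"

rm -f "$sample"
```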

2.Transpose File
Given a text file file.txt, transpose its content.

You may assume that each row has the same number of columns and each field is separated by the ' ' character.

For example, if file.txt has the following content:

name age
alice 21
ryan 30
Output the following:

name alice ryan
age 21 30

－－－－－－－－－

# Read from the file file.txt and print its transposed content to stdout.
# using awk for this purpose
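awk '
    {
        # Append field i of the current row to output line i.
        for (i = 1; i <= NF; i++)
        {
            if (line[i] == "")
            {
                line[i] = $i
            }
            else
            {
                line[i] = line[i] " " $i
            }
        }
        # Track the widest row: NF in END reflects only the last record,
        # so a trailing blank line would otherwise print nothing.
        if (NF > ncols) ncols = NF
    }
    END {
        for (i = 1; i <= ncols; i++)
        {
            print line[i]
        }
    }
    ' file.txt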

If the number of columns is exactly two, the following approach also works:

test2

name age
alice 21
ryan 30

solution:

$ cut -d " " -f1 test2 | xargs
name alice ryan
$ cut -d " " -f2 test2 | xargs
age 21 30
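
The cut/xargs idea is not limited to two columns. One possible generalization, sketched here under the assumption of a three-column sample file (the file contents and variable names are hypothetical), loops over the column indexes:

```shell
# Build a three-column sample file (hypothetical test data).
input=$(mktemp)
printf 'name age city\nalice 21 paris\nryan 30 oslo\n' > "$input"

# Count the columns from the first row.
ncols=$(head -n 1 "$input" | wc -w | tr -d ' ')

# Emit one output row per input column; xargs joins the values with spaces.
transposed=$(
    i=1
    while [ "$i" -le "$ncols" ]; do
        cut -d ' ' -f "$i" "$input" | xargs
        i=$((i + 1))
    done
)
echo "$transposed"

rm -f "$input"
```

This reads the file once per column, and xargs gives quotes and backslashes special meaning and may split very long input across several output lines, so the awk solution is likely the safer choice for anything beyond small, simple files.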

3.Valid Phone Numbers

Given a text file file.txt that contains a list of phone numbers (one per line), write a one-liner bash script to print all valid phone numbers.

You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x means a digit)

You may also assume each line in the text file must not contain leading or trailing white spaces.

For example, assume that file.txt has the following content:

987-123-4567
123 456 7890
(123) 456-7890

Your script should output the following valid phone numbers:

987-123-4567
(123) 456-7890

------------

solution1:

grep -e '^[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}$' -e '^([0-9]\{3\}) [0-9]\{3\}-[0-9]\{4\}$' file.txt

explanation:

grep uses basic regular expressions (BRE) by default, where \ changes the meaning of the character that follows it:
^ denotes the beginning of a line
$ denotes the end of a line
\{M\} matches the preceding expression exactly M times
\(...\) groups a pattern; unescaped ( and ) match literal parentheses

Back to this problem: it requires matching two patterns. For better readability, I used -e to separate them into two regexes: the first matches xxx-xxx-xxxx and the second matches (xxx) xxx-xxxx.
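
The same two formats can also be matched with extended regular expressions via grep -E, where {M} and grouping parentheses need no backslashes while literal parentheses do. A minimal sketch, assuming a temporary copy of the sample data (the variable names are illustrative):

```shell
# Recreate the sample input in a temporary file (hypothetical test data).
phones=$(mktemp)
printf '987-123-4567\n123 456 7890\n(123) 456-7890\n' > "$phones"

# One ERE with an alternation for the two allowed prefixes:
#   [0-9]{3}-        matches "xxx-"
#   \([0-9]{3}\)     followed by a space matches "(xxx) "
valid=$(grep -E '^([0-9]{3}-|\([0-9]{3}\) )[0-9]{3}-[0-9]{4}$' "$phones")
echo "$valid"

rm -f "$phones"
```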

solution2:

awk < file.txt '/^[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$/ || /^\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$/ {print}'

The format for 'awk':
awk < file 'pattern {action}'
or
awk 'pattern {action}' file

Note: 'print' action without any arguments means print out the whole line.
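
The two invocation forms, and the default action when {action} is omitted, can be compared side by side. This sketch assumes a small temporary file with made-up contents:

```shell
# Build a three-line sample file (hypothetical test data).
data=$(mktemp)
printf 'alpha 1\nbeta 2\ngamma 3\n' > "$data"

# Input redirected on standard input.
via_stdin=$(awk < "$data" '/^b/ {print}')

# Input given as a file operand.
via_operand=$(awk '/^b/ {print}' "$data")

# Omitting the action defaults to printing the whole matching line.
no_action=$(awk '/^b/' "$data")

echo "$via_stdin"
echo "$via_operand"
echo "$no_action"

rm -f "$data"
```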

4.Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

words.txt contains only lowercase characters and space ' ' characters.
Each word must consist of lowercase characters only.
Words are separated by one or more whitespace characters.

For example, assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

-----------------

solution1:

awk '{for(i=1;i<=NF;i++) a[$i]++} END {for(k in a) print k,a[k]}' words.txt | sort -k2 -nr

solution2:

sed -E 's/^[[:space:]]+//g; s/[[:space:]]+/ /g; s/[[:space:]]+$//g' words.txt | tr ' ' '\n' | sort | uniq -c | sort -nr | awk -F' ' '{print $2" "$1}'

use sed to strip leading and trailing spaces, and collapse inline runs of spaces into one space
use tr to translate each space into a newline (these two steps can also be done with tr -s ' ' '\n' < words.txt)
sort the words
use uniq -c to count the words
sort the counts: -n for numeric sort, -r for reverse
use awk to format the output
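
Putting the tr -s shortcut into a complete pipeline might look like the sketch below. It assumes a temporary copy of the sample words.txt, and it adds a secondary sort key so that words with equal counts come out in alphabetical order; that tie-breaking rule is an addition for determinism, not something the problem requires.

```shell
# Recreate the sample input in a temporary file (hypothetical test data).
words=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$words"

# tr -s turns every run of spaces into a single newline (one word per line);
# uniq -c prefixes each distinct word with its count;
# sort orders by count descending, then by word ascending;
# awk swaps the columns to "word count".
freq=$(tr -s ' ' '\n' < "$words" | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $2, $1}')
echo "$freq"

rm -f "$words"
```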

Reference: leetcode