Linux - Regular Expressions

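Regular expressions are patterns built from ordinary and special characters that describe the text you want to find, so a single expression can match many different strings. The commands below are everyday Linux tools for searching and processing text; grep, sed, and awk all accept regular expressions.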
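The sketch below (not from the original article) makes the idea concrete. It runs grep against a hypothetical file, fruits.txt, created in a temporary directory, and shows a few common special characters:

- `^` anchors a match to the start of a line, and `$` anchors it to the end.
- `.` matches any single character.
- `*` repeats the preceding item zero or more times, so `.*` matches any run of characters.
- `[...]` matches one character from a set.

```shell
#!/bin/sh
# Hypothetical sample file, one word per line
tmp=$(mktemp -d)
printf '%s\n' apple banana grape apricot cherry > "$tmp/fruits.txt"

# ^ : lines that start with "ap"
starts_ap=$(grep '^ap' "$tmp/fruits.txt")
# $ : lines that end with "e"
ends_e=$(grep 'e$' "$tmp/fruits.txt")
# . : "gr", any single character, then "pe"
any_char=$(grep 'gr.pe' "$tmp/fruits.txt")
# .* : lines that start with "a" and end with "t", with anything in between
a_to_t=$(grep '^a.*t$' "$tmp/fruits.txt")
# [...] : lines whose first character is "b" or "c"
b_or_c=$(grep '^[bc]' "$tmp/fruits.txt")

printf '%s\n' "$starts_ap" "$ends_e" "$any_char" "$a_to_t" "$b_or_c"
rm -rf "$tmp"
```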

  1. GREP (Global Regular Expression Print): It searches a file for a particular pattern of characters and displays all lines that contain that pattern.

     # Search for a word (root) in a file
     ubuntu@ip-172-31-1-88:~$ grep root /etc/passwd
     # Search for a word (root) case-insensitively in a file
     ubuntu@ip-172-31-1-88:~$ grep -i Root /etc/passwd
     # Search for a word (root) in multiple files
     ubuntu@ip-172-31-1-88:~$ grep root /etc/passwd /etc/group
     # Invert the match, i.e. output all lines except those that contain 'root'
     ubuntu@ip-172-31-1-88:~$ grep -v root /etc/passwd
     # Count the lines that match a string (root) in a file
     ubuntu@ip-172-31-1-88:~$ grep -c root /etc/passwd
     # Display the names of files that contain a string (root); ordinary users cannot read /etc/shadow, hence sudo
     ubuntu@ip-172-31-1-88:~$ sudo grep -l root /etc/passwd /etc/shadow
     # Display the names of files that do not contain a string (root)
     ubuntu@ip-172-31-1-88:~$ sudo grep -L root /etc/passwd /etc/shadow
     # Display the matching lines (root) prefixed with their line numbers
     ubuntu@ip-172-31-1-88:~$ grep -n root /etc/passwd
     # Display the lines that start with a string (root)
     ubuntu@ip-172-31-1-88:~$ grep '^root' /etc/passwd
     # Display the lines that end with a string (/bin/bash)
     ubuntu@ip-172-31-1-88:~$ grep '/bin/bash$' /etc/passwd
     # Search for a string (root) and write the output to a new file (find.txt) in an existing devops directory
     ubuntu@ip-172-31-1-88:~$ grep root /etc/passwd > devops/find.txt
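These options behave the same way on any text file. The real /etc/passwd differs from system to system, so the sketch below uses a hypothetical passwd-style file (users.txt, created in a temporary directory) to get predictable results. It applies -c, -i, -v, -n, and a `$` anchor to that file.

```shell
#!/bin/sh
# Hypothetical passwd-style sample (NOT the real /etc/passwd)
tmp=$(mktemp -d)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'Rooty:x:1001:1001::/home/Rooty:/bin/sh' \
  'steve:x:1002:1002::/home/steve:/bin/bash' > "$tmp/users.txt"

# -c: number of matching lines (case-sensitive, so "Rooty" is not counted)
count=$(grep -c root "$tmp/users.txt")
# -i: ignore case, so "Rooty" now matches too
count_i=$(grep -ci root "$tmp/users.txt")
# -v: lines that do NOT contain "root"; cut keeps only the user name
not_root=$(grep -v root "$tmp/users.txt" | cut -d: -f1)
# -n with a $ anchor: line numbers of the lines ending in /bin/bash
bash_lines=$(grep -n '/bin/bash$' "$tmp/users.txt" | cut -d: -f1)

echo "count=$count count_i=$count_i"
echo "$not_root"
echo "$bash_lines"
rm -rf "$tmp"
```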
    
  2. Find: It searches for files and directories that match the conditions you specify. You can search by name, permissions, user, group, file type, modification date, size, and more. This flexibility makes find one of the most frequently used commands on Linux systems.

     # Find files under /home whose names start with "new" (quote the pattern so the shell does not expand it)
     ubuntu@ip-172-31-87-84:~$ find /home -name 'new*'
     # Find files with the SUID bit set
     ubuntu@ip-172-31-87-84:~$ find /var -perm -4000
     # Find files with the SGID bit set
     ubuntu@ip-172-31-87-84:~$ find /var -perm -2000
     # Find files with the sticky bit set
     ubuntu@ip-172-31-87-84:~$ find /var -perm -1000
     # Search for files owned by a user (steve)
     ubuntu@ip-172-31-87-84:~$ find /var -user steve
     # Search for files owned by a group (steve)
     ubuntu@ip-172-31-87-84:~$ find /var -group steve
     # Search for files smaller than 10 MB in a folder (/tmp)
     ubuntu@ip-172-31-87-84:~$ find /tmp -size -10M
     # Search for files larger than 10 MB in a folder (/tmp)
     ubuntu@ip-172-31-87-84:~$ find /tmp -size +10M
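You can try these searches without touching /var. The sketch below builds a small hypothetical directory tree in a temporary directory and searches it. Two details are worth noting:

- An exact mode such as -perm 4755 matches only files whose permissions are exactly 4755. By contrast, -perm -4000 matches any file that has at least the SUID bit set.
- -size rounds file sizes up to whole units (MiB for the M suffix) before comparing them.

Creating the large file assumes GNU head, which accepts the M suffix for -c.

```shell
#!/bin/sh
# Hypothetical tree to search
tmp=$(mktemp -d)
touch "$tmp/newfile.txt" "$tmp/notes.txt"
chmod 4755 "$tmp/newfile.txt"          # SUID + rwxr-xr-x
mkdir "$tmp/shared"
chmod 1777 "$tmp/shared"               # sticky bit, like /tmp
head -c 11M /dev/zero > "$tmp/big.bin" # 11 MiB file (GNU head size suffix)

# Quote the glob so find, not the shell, interprets it
by_name=$(find "$tmp" -name 'new*' -exec basename {} \;)
# Regular files with the SUID bit set, whatever their other bits are
suid=$(find "$tmp" -type f -perm -4000 -exec basename {} \;)
# Directories with the sticky bit set
sticky=$(find "$tmp" -type d -perm -1000 -exec basename {} \;)
# Regular files larger than 10 MiB
big=$(find "$tmp" -type f -size +10M -exec basename {} \;)

echo "$by_name $suid $sticky $big"
rm -rf "$tmp"
```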
    
  3. WC (Word count): It counts the lines, words, and bytes in a file. Options such as -l and -w limit the output to a single count.

     # Count the number of lines in a file
     ubuntu@ip-172-31-87-84:~$ wc -l /etc/passwd 
     # Count the number of words in a file 
     ubuntu@ip-172-31-87-84:~$ wc -w /etc/passwd
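By default, wc prints the line, word, and byte counts followed by the file name. If you redirect the file into wc instead (wc -l < file), it prints only the number. This is handy when you want to store the count in a shell variable. The sketch below uses a hypothetical three-line file.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical three-line file
printf 'one two three\nfour five\nsix\n' > "$tmp/words.txt"

lines=$(wc -l < "$tmp/words.txt")  # newline count
words=$(wc -w < "$tmp/words.txt")  # whitespace-separated words
bytes=$(wc -c < "$tmp/words.txt")  # bytes, including newlines

echo "lines=$lines words=$words bytes=$bytes"
rm -rf "$tmp"
```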
    
  4. Head: It displays the first lines of a file (10 by default).

     # Display top 10 lines of the file
     ubuntu@ip-172-31-87-84:~$ head /etc/passwd
     # Display a specific number of lines (5) from the top of the file
     ubuntu@ip-172-31-87-84:~$ head -n 5 /etc/passwd
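A quick way to see these defaults is to run head on a file of numbered lines. The sketch below uses seq to create a hypothetical 20-line file. The last command relies on GNU head, where a negative count (-n -5) prints everything except the last 5 lines.

```shell
#!/bin/sh
tmp=$(mktemp -d)
seq 1 20 > "$tmp/numbers.txt"   # hypothetical file: lines 1..20

# No option: the first 10 lines
default_count=$(head "$tmp/numbers.txt" | wc -l)
# -n 5: the first 5 lines, joined with spaces for display
first5=$(head -n 5 "$tmp/numbers.txt" | paste -s -d ' ' -)
# GNU only: all lines except the last 5
all_but_last5=$(head -n -5 "$tmp/numbers.txt" | wc -l)

echo "$default_count | $first5 | $all_but_last5"
rm -rf "$tmp"
```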
    
  5. Tail: It displays the last lines of a file (10 by default).

     # Display the bottom 10 lines of the file
     ubuntu@ip-172-31-87-84:~$ tail /etc/passwd
     # Display a specific number of lines (8) from the bottom of the file
     ubuntu@ip-172-31-87-84:~$ tail -n 8 /etc/passwd
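tail also accepts a +N form (tail -n +N), which starts printing at line N. Piping head into tail extracts a range of lines from the middle of a file. The sketch below shows both on a hypothetical 20-line file created with seq.

```shell
#!/bin/sh
tmp=$(mktemp -d)
seq 1 20 > "$tmp/numbers.txt"   # hypothetical file: lines 1..20

# -n 8: the last 8 lines
last8=$(tail -n 8 "$tmp/numbers.txt" | paste -s -d ' ' -)
# -n +18: from line 18 to the end
from18=$(tail -n +18 "$tmp/numbers.txt" | paste -s -d ' ' -)
# Lines 6-8: take the first 8 lines, then the last 3 of those
range=$(head -n 8 "$tmp/numbers.txt" | tail -n 3 | paste -s -d ' ' -)

echo "$last8 | $from18 | $range"
rm -rf "$tmp"
```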
    
  6. Sed (Stream Editor): It parses and transforms text. It can search, find and replace, insert, and delete lines in a file, though its most common use is substitution (find and replace). By default, sed prints the result and leaves the file unchanged. With the -i option, it edits the file in place without opening it. This is much quicker than opening the file in the vi editor and changing it by hand.

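     # Create the file: type the text, then press Ctrl+D to save it
     ubuntu@ip-172-31-87-84:~$ cat > sample.txt
     Hello to the world of unix.
     # Replace "unix" with "linux" and print the result (sample.txt itself is unchanged)
     ubuntu@ip-172-31-87-84:~$ sed 's/unix/linux/' sample.txt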
     Hello to the world of linux.
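Without the g flag, s/// replaces only the first match on each line. sed also prints the result to standard output rather than changing the file. The sketch below works on a hypothetical sample file. It contrasts s/// with and without g, deletes matching lines with /pattern/d, and edits the file in place with -i.bak, which keeps a backup copy. GNU and BSD sed both accept this attached-suffix form.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# Hypothetical sample file
printf 'unix is unix.\nHello to the world of unix.\nremove me\n' > "$tmp/sample.txt"

# s without g: only the first "unix" on the line is replaced
first_only=$(sed 's/unix/linux/' "$tmp/sample.txt" | head -n 1)
# s with g: every "unix" on the line is replaced
every=$(sed 's/unix/linux/g' "$tmp/sample.txt" | head -n 1)
# /remove/d: delete lines that match "remove" (2 lines remain)
kept=$(sed '/remove/d' "$tmp/sample.txt" | wc -l)

# -i.bak: edit in place, saving the original as sample.txt.bak
sed -i.bak 's/unix/linux/g' "$tmp/sample.txt"
line2=$(sed -n '2p' "$tmp/sample.txt")
backup_line2=$(sed -n '2p' "$tmp/sample.txt.bak")

echo "$first_only | $every | $kept | $line2"
rm -rf "$tmp"
```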
    
  7. Awk: The name comes from the initials of its developers: Aho, Weinberger, and Kernighan. Awk is both a utility and a scripting language for simple and complex text-processing tasks. Its most common action is print.

    1. AWK Operations:
    a. Scans a file line by line
    b. Splits each input line into fields
    c. Compares input line/fields to pattern
    d. Performs actions on matched lines

    2. Useful For:
    a. Transform data files
    b. Produce formatted reports

    3. Programming Constructs:
    a. Format output lines
    b. Arithmetic and string operations
    c. Conditionals and loops
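An awk program is a list of pattern { action } pairs. Awk reads the input line by line and splits each line into fields. It then runs an action only on the lines where its pattern is true. An END block runs once, after the last line. The sketch below uses hypothetical inline data (an item name and a quantity on each line). It combines a comparison pattern, formatted output with printf, and arithmetic with a running total.

```shell
#!/bin/sh
# Hypothetical input: item name and quantity, one per line
data='pen 3
book 12
lamp 1
desk 7'

# Pattern: quantity greater than 2. Action: print a formatted row and add to a total.
# END: runs once after the last line has been read.
report=$(printf '%s\n' "$data" | awk '
  $2 > 2 { printf "%-5s %3d\n", $1, $2; total += $2 }
  END    { print "total", total }
')

echo "$report"
```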

    Example:

    Consider the following text file as the input file for all cases below:

     # Create the file: type the lines below, then press Ctrl+D to save it
     ubuntu@ip-172-31-87-84:~$ cat > employee.txt
     Joe manager account 45000
     Cavin clerk account 25000
     Brian manager sales 50000
     Noel manager account 47000
     Tarun peon sales 15000
     Danny clerk sales 23000
     Steve peon sales 13000
     Mark director purchase 80000
     # Default behaviour of awk: print every line of the file
     ubuntu@ip-172-31-87-84:~$ awk '{print}' employee.txt
     Joe manager account 45000
     Cavin clerk account 25000
     Brian manager sales 50000
     Noel manager account 47000
     Tarun peon sales 15000
     Danny clerk sales 23000
     Steve peon sales 13000
     Mark director purchase 80000
     # Print the lines that match the given pattern (/manager/)
     ubuntu@ip-172-31-87-84:~$ awk '/manager/ {print}' employee.txt 
     Joe manager account 45000
     Brian manager sales 50000
     Noel manager account 47000
     # Print columns $1 and $4
     ubuntu@ip-172-31-87-84:~$ awk '{print $1,$4}' employee.txt
     Joe 45000
     Cavin 25000
     Brian 50000
     Noel 47000
     Tarun 15000
     Danny 23000
     Steve 13000
     Mark 80000
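awk can also compare and sum fields, using conditionals, arithmetic, and associative arrays. The sketch below recreates the employee records in a temporary directory and does two things:

- It prints the sales employees who earn more than 20000.
- It totals salaries per department, using an array indexed by column 3.

The order of a for (key in array) loop is unspecified, so the totals are piped through sort.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# The same records as employee.txt: name, role, department, salary
printf '%s\n' \
  'Joe manager account 45000' \
  'Cavin clerk account 25000' \
  'Brian manager sales 50000' \
  'Noel manager account 47000' \
  'Tarun peon sales 15000' \
  'Danny clerk sales 23000' \
  'Steve peon sales 13000' \
  'Mark director purchase 80000' > "$tmp/employee.txt"

# Condition on two fields: department is sales AND salary above 20000
sales_high=$(awk '$3 == "sales" && $4 > 20000 { print $1 }' "$tmp/employee.txt")

# Sum salaries per department, then loop over the array in END
totals=$(awk '{ sum[$3] += $4 } END { for (d in sum) print d, sum[d] }' "$tmp/employee.txt" | sort)

echo "$sales_high"
echo "$totals"
rm -rf "$tmp"
```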
    

    Built-In Variables In Awk

    Awk’s built-in variables include the field variables $1, $2, $3, and so on ($0 is the entire line). These variables break a line of text into individual words or pieces called fields.

    • NR: This variable keeps a running count of the input records read so far. Records are usually lines, and awk runs the pattern/action statements once for each record in a file.

    • NF: This variable holds the number of fields in the current input record, so $NF is the last field.

Examples:

Use of the NR built-in variable (display line numbers)

    $ awk '{print NR,$0}' employee.txt 
    1 Joe manager account 45000
    2 Cavin clerk account 25000
    3 Brian manager sales 50000
    4 Noel manager account 47000
    5 Tarun peon sales 15000
    6 Danny clerk sales 23000
    7 Steve peon sales 13000
    8 Mark director purchase 80000

Use of the NF built-in variable (display the last field)

    $ awk '{print $1,$NF}' employee.txt
    Joe 45000
    Cavin 25000
    Brian 50000
    Noel 47000
    Tarun 15000
    Danny 23000
    Steve 13000
    Mark 80000
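NR and NF are ordinary variables, so they can also appear in patterns and expressions. The sketch below uses a recreated copy of the employee records. It prints only the third record and counts the records in an END block, much like wc -l. It also uses NF and $(NF-1) to inspect the fields of the first line.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# The same records as employee.txt
printf '%s\n' \
  'Joe manager account 45000' \
  'Cavin clerk account 25000' \
  'Brian manager sales 50000' \
  'Noel manager account 47000' \
  'Tarun peon sales 15000' \
  'Danny clerk sales 23000' \
  'Steve peon sales 13000' \
  'Mark director purchase 80000' > "$tmp/employee.txt"

# Pattern with no action: the default action prints the whole record
third=$(awk 'NR == 3' "$tmp/employee.txt")
# In END, NR holds the total number of records read
records=$(awk 'END { print NR }' "$tmp/employee.txt")
# NF is the field count; $(NF-1) is the second-to-last field
first_info=$(awk 'NR == 1 { print NF, $(NF-1) }' "$tmp/employee.txt")

echo "$third | $records | $first_info"
rm -rf "$tmp"
```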