Expert Extra Video: Parsing Output with AWK

Haley_Ruccio
Community Manager

This expert extra video is now locked.


Questions? Feedback? Share in the comments!

To view more Expert Extra videos, discover the benefits of the Red Hat Learning Subscription.

29 responses
Trevor
Starfighter

What a very nicely done demonstration to awaken (or in my case reawaken) a love for awk!  A little taste of a variable, regex in a pattern, and a function.  Sweet!!!

Now that that love has been reawakened, I'd like to drop by, at least once a week, to leave an example.  This could easily carry on for the next 5 years - and I'd love it!

Anyway, I'll start by dropping off an example that piggybacks one of the examples that was given in the video.  It simply demonstrates the logical NOT operator ( ! )

 awk '!/foo/ {count++} END {print "num_foos:", count}' foo.txt

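Not that it's necessary, but if we were to add one more line to the 
file "foo.txt", for example:    chan  bar8

Executing the above command will produce the following output (because of
the !, the count is of lines that do NOT contain "foo", even though the
label still reads num_foos):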

num_foos: 1
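
To make the negation concrete, here is a small self-contained sketch. The file contents are invented for illustration (the foo.txt from the video is not reproduced in this thread), and the label is changed to not_foo so that it says what is actually being counted.

```shell
# Hypothetical sample data; the real foo.txt from the video is not shown here.
tmp=$(mktemp -d)
printf '%s\n' 'foo bar1' 'foo bar2' 'chan bar8' 'lee bar9' > "$tmp/foo.txt"

# /foo/ selects lines containing "foo"; !/foo/ selects the lines that do not.
# count+0 forces a numeric 0 if nothing matched.
with_foo=$(awk '/foo/ {count++} END {print "foo:", count+0}' "$tmp/foo.txt")
without_foo=$(awk '!/foo/ {count++} END {print "not_foo:", count+0}' "$tmp/foo.txt")

echo "$with_foo"
echo "$without_foo"
rm -r "$tmp"
```

With this sample file, two lines contain "foo" and two do not, so both counts come out as 2.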


Thanks for the very succinct and powerful demonstration of the amazing filtering tool - awk

 

 

 

 

mafridi
Moderator

Great work!

Quick & Informative. Cheers!

Chetan_Tiwary_
Community Manager

Nice demonstration in the expert extra video !

@Trevor good initiative and a nice addition too! Inspired by you, I will add an even simpler awk usage:

[Screenshot: the awk command described below]

The toupper() function capitalizes the first letter. The substr() function in awk extracts a substring from a string; its syntax is substr(string, start, length). The expression substr($0, 2) extracts the rest of the letters of the line, and the tolower() function lowercases them.
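
Since the screenshot did not survive, here is a minimal sketch of the technique as described: capitalize the first letter of each line and lowercase the rest. The input words are made up, and the exact command in the screenshot may have differed.

```shell
# substr($0, 1, 1) is the first character; substr($0, 2) is everything after it.
# Placing the two results side by side concatenates them in awk.
cap_out=$(printf '%s\n' 'tREVOR' 'chetan' 'AWK' |
  awk '{ print toupper(substr($0, 1, 1)) tolower(substr($0, 2)) }')

echo "$cap_out"
```

Each input line comes back with only its first letter in uppercase.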

What a great tool awk is !!

 

Trevor
Starfighter

Chetan, another great contribution by you that demonstrates your value to the learning community.  I'd give you 2 thumbs up (kudos) on this one if I could!  A (seemingly) simple use of the awk tool, but indeed a powerful demonstration of its functionality and capability.

As always, thank you for the kind comments, and for continuing to raise the bar!!!

 

 

Trevor
Starfighter

Like any good tool that offers programmability,
awk makes use of variables. awk includes built-in
variables, and also includes the ability to define
(declare) variables.

In this post, I'll only mention the variables - no 
examples. In subsequent postings, I'll serve up an
example (or examples)  involving each variable. As I
mentioned previously, this will be a marathon in terms
of coverage, and not a sprint!

Below is a list of the built-in variables in awk,
along with a brief explanation of each one.
There are some others that are not in my list,
only because they are variables of awk's cousin -
GNU awk (aka gawk).

Variable        Description
$0              Whole line (the current record)
$1, $2 ... $NF  First field, second field, ... last field
NR              Number of records read so far
NF              Number of fields in the current record
OFS             Output field separator (default " ")
FS              Input field separator (default " ")
ORS             Output record separator (default "\n")
RS              Input record separator (default "\n")
FILENAME        Name of the current input file
ARGC            Number of command-line arguments
ARGV            Array of command-line arguments
FNR             Record number within the current file
OFMT            Output format for numbers (default "%.6g")
RSTART          Start position of the substring found by match()
RLENGTH         Length of the substring found by match()
SUBSEP          Multi-dimensional array subscript separator (default "\034")
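
As a quick taste ahead of the fuller examples, here is a small sketch that touches a few of these variables at once. The file name vars.txt and its contents are made up for illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'alpha beta gamma' 'delta epsilon' > "$tmp/vars.txt"

# FILENAME, NR, NF and $NF (the last field) for every record.
# The subshell changes directory so FILENAME is just "vars.txt".
vars_out=$(cd "$tmp" && awk '{ print FILENAME ": record " NR " has " NF " fields; last = " $NF }' vars.txt)

# OFS is placed between the comma-separated items of a print statement.
ofs_out=$(awk 'BEGIN { OFS = "-" } { print $1, $2 }' "$tmp/vars.txt")

echo "$vars_out"
echo "$ofs_out"
rm -r "$tmp"
```

Note that none of the variable names carries a $ in front of it; $NF means "the field whose number is NF".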

 

 

Chetan_Tiwary_
Community Manager

@Trevor  ok, now you are just showing off !! Excellent addition on the built-in variables !

Here is me trying to use the concepts you shared :

[Screenshot: built-in variables in awk]

Trevor
Starfighter

Chetan, how did you know I was showing off?  What gave it away?
I plead guilty, and I throw myself on the mercy of the court !!!

You trying to use the concepts I shared?  Chetan, your humility is showing again !

I owe you another thanks for elevating one of my posts.  As usual, your
examples are very rich, and provide an immense amount of clarity!!!

 

Trevor
Starfighter

Here I am with another episode of "awk - The Magnificent"

As we saw previously, the all important generic syntax for this tool is:

       awk  options  ' selection_criteria  { action } '  input-file

                                      OR

       awk  options  ' pattern  { action } '  input-file


Each awk statement consists of a pattern (or selection_criteria) with an associated
action.

I feel that the {action} piece is what gives awk its real muscle.  However, the
selection_criteria definitely makes a significant contribution to the overall power
provided by this tool.

Patterns (selection_criteria) in awk control the execution of rules - a rule is executed
when its pattern matches the current input record.  What is a record?  Simply a line
in the file that awk is running against.

Note: A rule contains a pattern and its associated action.

There are different ways/approaches to constructing patterns.  I would like to
expound on these.

A summary of the kinds of patterns supported by awk:

-   / regular expression /

-   expression

-   pattern1, pattern2

-   BEGIN

-   END

-   empty


/ regular expression /
-  this will match a record when the text of the input record fits the
     regular expression

expression
- this will match a record when the value of the expression is non-zero (a number) or
    non-null (a string)

pattern1, pattern2
- A pair of patterns, separated by a comma, specifying a range of records.  The
    range includes both the initial record that matches pattern1, and the final record
    that matches pattern2

BEGIN
END
-  The two keywords represent special patterns. 
-  These are NOT used to match input records.  Instead, they supply start-up or
     clean-up actions for the awk script.

empty
- The empty pattern matches every input record. 


Let's look at some simple examples of each:

/ regular expression /

Example 1:

           /trevor/
           - match every line (input record) that contains "trevor"

Example 2:

            /trevor|lee|chandler/
            - match every line (input record) that contains either "trevor", "lee", or
               "chandler"

Example 3:

            /^t/
            - match every line (input record) that begins with the letter 't'

Example 4:

            /t[aei]/
           - match every line (input record) that has the pattern 'ta', 'te', or 'ti' somewhere
              on the line

Example 5:

            /^t[ou]/
            - match every line (input record) that begins with the pattern 'to' or 'tu'

I'll stop here with the examples for now.  If you have some familiarity with the use of
regular expressions with other commands, you know that the patterns can get
pretty exotic!  Down the road, I do intend to get pretty exotic!

 

expression

Here, I'll simply refer you to Chetan's previous post, where he uses the pattern
NR == 2.   This is a very common kind of expression, and is known as a comparison
expression.  Makes sense, huh?  We're using a comparison operator (that everyone
is familiar with) to test whether the content of the variable NR is equal to 2.

 

pattern1, pattern2

Example:

                $1 == 2022, $1 == 2023
                -  match the first line (record) whose first field is 2022, and continue matching
                    all lines after that until a line whose first field is 2023 is reached.  So, if there
                    is a line with 2022 in its first field, and the line with 2023 is 15 lines below that,
                    the lines that are matched are:  1) the line with 2022 2) the 14
                    lines following the line with 2022, and finally 3) the line with
                    2023.  All this verbiage is why examples are so critical!!!
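
Here is a small runnable sketch of a range pattern. The file years.txt and its contents are made up; the point is only to show which records fall inside the range.

```shell
tmp=$(mktemp -d)
printf '%s\n' '2021 a' '2022 b' 'note c' '2023 d' '2024 e' > "$tmp/years.txt"

# The range starts at the record whose first field is 2022 and ends,
# inclusively, at the record whose first field is 2023.
# No action is given, so each matching record is printed as-is.
range_out=$(awk '$1 == 2022, $1 == 2023' "$tmp/years.txt")

echo "$range_out"
rm -r "$tmp"
```

The "note c" record is printed even though it matches neither pattern, because it sits between the start and the end of the range.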


I'm going to save the BEGIN/END patterns for a later post.  These two keywords
alone can demand a chapter!!!  Again, this is going to be marathon coverage!

 

empty

The empty pattern is essentially an awk statement without a pattern:

Example:

               awk  '{ print $6 }'  data
                -  this will match every line (record) in "data", printing the 6th field of each
                     line (record)

 

I won't cover actions in this post.  However, I'll leave you with this little tease:

* The purpose of the action is to tell awk what to do once a line matches a pattern

* There MUST BE curly braces { } for each action

* If there is no action specified in an awk statement, the curly braces { } can be
    omitted, and the equivalent action is ' {print $0}'
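
The last point can be checked with a short sketch: a rule with only a pattern behaves the same as one whose action is { print $0 }. The three input names are taken from examples used elsewhere in this thread.

```shell
# Pattern only: the action defaults to printing the whole record.
implicit=$(printf '%s\n' 'trevor' 'lee' 'chandler' | awk '/l/')

# Same pattern with the action spelled out.
explicit=$(printf '%s\n' 'trevor' 'lee' 'chandler' | awk '/l/ { print $0 }')

echo "$implicit"
echo "$explicit"
```

Both commands print the same two lines, the ones containing a lowercase "l".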

 

Okay, I'll stop here.  There's such a long way to go before this journey is complete.
I feel as though I've only taken 6 steps on this yellow-brick road!  In the next
episode, in all likelihood, I'll continue with more examples on patterns

 

 

Trevor
Starfighter

I'm running a little bit behind on my posts.  No excuses, but I've been a little busy.  That's a typical excuse.  I may as well say that the dog ate my computer

Okay, this episode will be brief.  I will continue with the "pattern" component of the awk tool.

Here's the content of my file named /tmp/sample:

trevor
lee
chandler
joseph
jerome
jessie
wayne
earl

Now for some examples:

$ awk '/[LiNuX]/  {print $0}'  /tmp/sample
jessie

You'll recall that a pattern of the form [abcde] instructs awk to match any line in the file that contains either the letter 'a', 'b', 'c', 'd', or 'e'.  So, in my actual command, the letters that are to be matched are 'L', 'i', 'N', 'u', 'X'.  So, why is there only one line appearing in the output?

There are definitely lines with the letter 'l'.  Why don't they appear in the output?  The pattern is specifying an uppercase 'L'.  The letter 'l' in the names "lee", "chandler", and "earl" is lowercase.  Okay, you already know the takeaway here - case is significant!

 

$ awk  ' /e/  {print  $0 }'   /tmp/sample

trevor
lee
chandler
joseph
jerome
jessie
wayne
earl

No surprises - hopefully - in the output that appears, based on this command.  The lowercase letter 'e' appears somewhere on each line in the file.

 

Now, how do I go about displaying only the lines that end with the 
letter 'e'?   Easy as jumping rope:

$  awk  ' /e$/  {print $0} '  /tmp/sample

lee
jerome
jessie
wayne

 

Okay, that's it for this episode.  I did say it would be brief, although
I failed to mention that no heavy lifting would be required.  All that 
was featured was:

1)  case is significant
2)  The '$' is the special character to anchor expressions to the end
        of a line.  If you've completed the RH124 course, you've seen  
        this in action before
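
For anyone who wants to replay both takeaways, here is a self-contained sketch. It recreates the eight names from this post in a temporary file (rather than /tmp/sample itself) and adds one extra command, with a lowercase bracket expression, for contrast.

```shell
tmp=$(mktemp -d)
printf '%s\n' trevor lee chandler joseph jerome jessie wayne earl > "$tmp/sample"

# Uppercase L, N, X never occur in the file; only 'i' and 'u' can match.
upper=$(awk '/[LiNuX]/' "$tmp/sample")

# The all-lowercase version matches any line with l, i, n, u, or x.
lower=$(awk '/[linux]/' "$tmp/sample")

# $ anchors the match to the end of the line.
ends_e=$(awk '/e$/' "$tmp/sample")

printf '%s\n---\n%s\n---\n%s\n' "$upper" "$lower" "$ends_e"
rm -r "$tmp"
```

Changing only the case of the letters inside the brackets takes the result from one line to five.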

 

 

 

 

Chetan_Tiwary_
Community Manager

@Trevor Ah! you are back with a BANG !

To sum up :

1. Case sensitivity matters in patterns.
2. Use /[...]/ to match any character within the brackets.
3. Use /e/ to match any occurrence of 'e' in a line.
4. Use /e$/ to match lines ending with 'e'.

 

Here is a simple count pattern example : 

[Screenshot: count pattern example]

Trevor
Starfighter

Chetan -

I know all I have to do is to make sure I get on base,
and you'll bring me home to score a run!!!

Thank you for another sensational follow-up post!!!

Trevor
Starfighter

Okay, we've looked at the all important "pattern" component of the awk tool.  Now, let's go in another direction, and have a look at how awk can make use of variables.

What we'll see over the next few postings is that awk can make use of 3 categories of variables:
1) built-in
2) user-defined
3) shell variables

Note:  I used the term "categories" in making reference to variables, and not the term "types", because awk doesn't have types of variables like some programming languages and other applications have.  With awk, a variable is either a string OR a number.

Built-in variables, as the name suggests, are variables that are built-in, predefined, in the awk tool.   They come ready to be used in a predefined way - a predefined utility if you will.

Built-in variables have values already defined in awk, but we can also alter those values.

Here's a list of the variables that are built into the awk tool, along with a brief explanation of what each built-in variable represents:

CONVFMT
This string controls the conversion of numbers to strings. It works by
being passed, in effect, as the first argument to the sprintf function.
Its default value is "%.6g".

FS
FS is the input field separator. The value is a single-character string or
a multi-character regular expression that matches the separations
between fields in an input record. If the value is the null string (""), then
each character in the record becomes a separate field (this is GNU awk
behavior; other awks may differ). The default
value is " ", a string consisting of a single space. As a special
exception, this value means that any sequence of spaces and tabs is a
single separator. It also causes spaces and tabs at the beginning and
end of a record to be ignored. You can set the value of FS on the command line using the `-F' option:

                     awk  -F,    'program'   input-files

OFMT
This string controls conversion of numbers to strings for printing with
the print statement. It works by being passed, in effect, as the first
argument to the sprintf function. Its default value is "%.6g".

OFS
This is the output field separator. It is output between the fields output
by a print statement. Its default value is " ", a string consisting of a
single space.

ORS
This is the output record separator. It is output at the end of every print
statement. Its default value is "\n". 

RS
This is awk's input record separator. Its default value is a string
containing a single newline character, which means that an input
record consists of a single line of text. It can also be the null string, in
which case records are separated by runs of blank lines, or a regexp,
in which case records are separated by matches of the regexp in the
input text.

SUBSEP
SUBSEP is the subscript separator. It has the default value of "\034",
and is used to separate the parts of the indices of a multi-dimensional
array. Thus, the expression foo["A", "B"] really accesses foo["A\034B"].

ARGC
ARGV
The command-line arguments available to awk programs are stored in an array called ARGV.  ARGC is the number of command-line
arguments present.  Unlike most awk arrays, ARGV is indexed from zero to ARGC - 1.

FILENAME
This is the name of the file that awk is currently reading. When no data
files are listed on the command line, awk reads from the standard
input, and FILENAME is set to "-".  FILENAME is changed each time a
new file is read

FNR
FNR is the current record number in the current file. FNR is incremented each time a new record is read. It is reinitialized to zero each time a new input file is started.

NF
NF is the number of fields in the current input record. NF is set each time a new record is read, when a new field is created, or when $0 changes.

NR
This is the number of input records awk has processed since the beginning of the program's execution. NR is set each time a new record is read.

RLENGTH
RLENGTH is the length of the substring matched by the match function. RLENGTH is set by invoking the match function. Its value is
the length of the matched string, or -1 if no match was found.

RSTART
RSTART is the start-index in characters of the substring matched by the match function. RSTART is set by invoking the match function. Its value is the position of the string where the matched substring starts,
or zero if no match was found.


Note: Regarding the NR and FNR variables, awk simply increments both of these variables each time it reads a record, instead of setting them to the absolute value of the number of records read. This means that your program can change these variables, and their new values will be incremented for each record.  Okay, don't worry, I've got examples coming for this - and all of the other variables.  Remember, this is a marathon, and not a sprint.  Are you tired of seeing that comment in my posts?
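
In the meantime, here is a small sketch of the difference between NR and FNR. The two files, a.txt and b.txt, are made up for this illustration.

```shell
tmp=$(mktemp -d)
printf '%s\n' a1 a2 > "$tmp/a.txt"
printf '%s\n' b1 b2 b3 > "$tmp/b.txt"

# NR keeps counting across files; FNR starts over with each new file.
# The subshell changes directory so FILENAME shows only the short name.
nr_out=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, $0 }' a.txt b.txt)

echo "$nr_out"
rm -r "$tmp"
```

On the first record of b.txt, NR is 3 while FNR is back to 1.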


I'll conclude this post with just one example, and I'll pick on one of the more simple built-in variables:  NR

I've got a file named "post2", whose content is shown below:

trevor lee chandler
joseph older chandler
jessie youner chandler
lonnie father chandler
laura mother chandler

As you can see, there are 5 lines in the file, with each containing 3
fields (firstname, mid-name, lastname) of information.

awk  '{ print NR }'  post2

1
2
3
4
5

As you can see, the output is simply a number that represents the
record (i.e. line) that awk was reading.  The first line in the file
is record 1, the second line in the file is record 2, etc.  Each time a
line is read by the awk tool, the NR variable is incremented by 1.
Complicated, huh?

Something very noteworthy that I'll mention right now is that there
is no dollar-sign ($) preceding/prepending the variable name NR
(i.e. $NR).  To simply reference the current record number, no
prepending of a '$' is needed.

Now, if you ran that same command, only this time prepending the
NR variable with a '$', notice what the output looks like:

awk  '{ print $NR }'  post2

trevor
older
chandler
(blank)
(blank)


On those last 2 lines, (blank) is not actually printed to the screen.
I simply put that there for the last 2 lines because those lines are
blank, and you wouldn't be able to see those blank lines in my sample
output.

I'm going to leave it to your investigation to discover why that
output appears.  If you can see why this is the output, you should
stand-up and take a bow, pat yourself on the back, and just feel
real good about your understanding of what's going on here.  Oh
yes, there's much, much more on this cross-country journey of our
look at awk, but with there being nothing overly intuitive about why
the output is what it is, you've earned a reason to celebrate your
effort.

I'll close this episode by saying something about my mentioning
of the sprintf function, when I  provided the explanation about
the CONVFMT variable.  I simply wanted to say that there's no
need to be concerned about this item at this point.  My coverage
for that is way down the road.  This is a cross-country journey,
going from East coast to West coast, and right now we're only in
North Carolina.   Let's pace ourselves!!!

 

 

 

 

 

 

 

Trevor
Starfighter

Okay, this episode on awk will be very abbreviated, with a focus on the sources
of input for awk.  

awk can take its input from 3 sources:
1) a file
2) a pipe
3) standard input

All of the examples that we've looked at up to this point have shown the input
coming from a file:    awk  options  ' pattern  { action } '  input-file

If an input-file named "student" contains the line "Trevor Lee Chandler", the command

              awk  '{ print $2 }'  student

will display Lee on the screen.  Nothing new there.  By now, we're well 
acquainted with awk enough to have expected that output on the screen.

Now, let's look at an example involving an old friend - the pipe mechanism.

              cat  student  |  awk  '{ print  $2 }'  

What's the output?  You guessed it - Lee        
The information that awk is processing is the same.  The sole difference
with this command is where the information is coming from: via a pipe
vs a filename.

The last source of input for awk involves input from stdin, which is indicated
by use of the dash/hyphen character.  The dash/hyphen character ( - ) tells awk to read its input from standard input; when nothing is piped or redirected into the command, that means the keyboard (i.e. the input to awk will be
input manually by the user, and is not read from a file or provided via a pipe).
Let's look at an example:

                     awk  '{ print $2 }'   -
                     Trevor Lee Chandler       # Content that is manually input

                     Lee     # This is the output that will appear on the screen
                     Control-C    # This is used to terminate the execution of the awk
                                            command

Note:  When using the dash/hyphen character for input to awk, this will cause
           awk to continue to expect content after each line is input.  So, in the
           example above, after "Trevor Lee Chandler" is input, the awk command
           processes that line, outputting "Lee".  You will then be provided a
           blank line, indicating that another line of input is expected, or you can
           terminate the execution of the awk command by inputting Control-C - 
           you know that old Ctrl-C key combination.  (Ctrl-D, which signals end
           of input, also ends the command.)  Going through an example
           on your own will certainly demonstrate what I'm attempting to convey here.


To recap, the awk command can receive its input from 3 sources:
1) filename
2) pipe
3) stdin (standard input)
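
The three sources can be compared side by side in a short sketch. To keep it non-interactive, the dash example gets its standard input from a redirect rather than from the keyboard; that substitution is mine, not part of the original example.

```shell
tmp=$(mktemp -d)
echo 'Trevor Lee Chandler' > "$tmp/student"

# 1) input named as a file argument
from_file=$(awk '{ print $2 }' "$tmp/student")

# 2) input arriving through a pipe
from_pipe=$(cat "$tmp/student" | awk '{ print $2 }')

# 3) input read from standard input, named explicitly with "-"
from_dash=$(awk '{ print $2 }' - < "$tmp/student")

echo "$from_file $from_pipe $from_dash"
rm -r "$tmp"
```

All three commands print Lee. A pipe also delivers its data on standard input, so cases 2 and 3 are closely related.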

As promised, this would be an abbreviated episode.  That's all folks!!!

 

 

Chetan_Tiwary_
Community Manager

@Trevor Nicely explained ! Trying to catch up with my ultra max pro tortoise speed :

[Screenshot: awk example]

Here is a simple awk script for demo :

 

[Screenshot: simple awk script demo]

Chetan_Tiwary_
Community Manager

A lazy tuesday addition : compared to a similar Python or Bash script , awk here is Concise with no additional libraries or commands required.

[Screenshot: awk example]

Trevor
Starfighter

Chetan -

A very rich example for a lazy Tuesday edition

Showing a little bit of the programmability of this
awesome tool.

Very nice!!!!

Trevor
Starfighter

Chetan's last lesson was for the PhD's among you.  I'm going 
to provide one for the undergraduates.

In an earlier session, we looked at the function/purpose of the
$ (dollar sign) in awk. You will recall that it is used only when
accessing the contents of a field in a line that was read as input
by awk -- the $ is NOT used to access the value of variables, as
is the case in the shell (e.g., Bash).

Okay, now on to today's feature: the FS variable. Quite often, when
a character is to be specified for this variable, it is done so on the
command line, using the -F option. Again, from a previous lesson,
we learned that the default value of the FS variable is the space
character (" ").

Let's go ahead and dig into an example to see this variable in action.
The file ("names") that I will be using for my examples contains the
following content:
First:Mid:Last
Lonnie:Sr:Chandler
Laura:Ethel:Chandler
Joseph:Jerome:Chandler
Trevor:Lee:Chandler
Jessie:Wayne:Chandler

Because I'm the author of the file, I know that I'm intending to
have the colon (:) serve as the character separating the
fields of information. With that being the case, we can readily
see that there are three fields of information:

Field1 -> First
Field2 -> Mid
Field3 -> Last


Example 1:

$ awk '{ print $1 }' names
First:Mid:Last
Lonnie:Sr:Chandler
Laura:Ethel:Chandler
Joseph:Jerome:Chandler
Trevor:Lee:Chandler
Jessie:Wayne:Chandler

Whoa! The command is specifying that ONLY field 1 is to
be printed. What's going on here? The command is performing
exactly as it should. Using the default value of a space character
as the field separator, all of the content on each of those 6 lines
in the output represents field 1.

If each field in this file is to be referenced, a colon must be used
as the field separator.

Example 2:
$ awk -F: '{ print $1 }' names
First
Lonnie
Laura
Joseph
Trevor
Jessie

Eureka! Just what the doctor ordered! My output is comprised
only of the content in field 1.


Example 3:
$ awk -F : '{ print $2 }' names
Mid
Sr
Ethel
Jerome
Lee
Wayne

Voila! Only the information in field 2 is output.

Note: Let me point out something that is very subtle in this example.
Notice the position of the colon - there is a space between
the -F and the colon. This is nothing major. The only intent
here is to let you see that the character to be specified as the
field separator does not have to immediately follow the 'F'.
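
For completeness, here is a sketch showing that the separator can be supplied in several equivalent ways. It uses a shortened, three-line version of the "names" file; the BEGIN and -v forms are additions of mine that were not covered above.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'First:Mid:Last' 'Joseph:Jerome:Chandler' 'Jessie:Wayne:Chandler' > "$tmp/names"

opt_joined=$(awk -F: '{ print $2 }' "$tmp/names")                # -F with no space
opt_spaced=$(awk -F : '{ print $2 }' "$tmp/names")               # -F followed by a space
begin_fs=$(awk 'BEGIN { FS = ":" } { print $2 }' "$tmp/names")   # set FS before any input is read
assign_fs=$(awk -v FS=: '{ print $2 }' "$tmp/names")             # assign FS with -v

echo "$opt_joined"
rm -r "$tmp"
```

All four commands produce the same second-field output.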

Okay, that will conclude this post. The coverage should be brief
because there was only one item that was featured:   Input Field Separator

shashi01
Moderator

@Trevor 

Thank you for sharing your knowledge and perspectives. I'm eagerly looking forward to your future posts.

Chetan_Tiwary_
Community Manager

@Trevor What a nice significant addition and a wonderful note about  Input Field Separator.

Here is a continuation of this:

[Screenshot: awk example using FPAT]

FPAT="[^,]+": This statement sets the field pattern (FPAT), a GNU awk (gawk) feature, to a regular expression that matches one or more characters that are not commas. This means that each field is a run of non-comma characters: FPAT describes what a field looks like, rather than what separates the fields.

With FPAT set, gawk splits each record according to that description instead of using FS.

Trevor
Starfighter

In this post, I'd like to have a look at a couple of 
Regex operators used in awk, known as anchors:

  -  ^
  -  $

The ^ will match some pattern that is at the beginning of a line.

The $ will match some pattern that is at the end of a line.

In the examples that will follow, I'll use another regex operator,
but this is not what is being featured in this post.  One reason
is because we've had a look at this in a previous post.  The 
regex operator I make reference to is  [...].  As was stated
previously, this is called a bracket expression.  Its purpose is
to match any one of the characters within the square brackets.
For example, [eiEI] will match either a lowercase e, lowercase i,
uppercase E, or an uppercase I.

Okay, let's get to the examples.

The files I will be using for my examples are comprised of the 
following content:

file1:
a begins this line
e begins this line
i begins this line
o begins this line
u begins this line
A begins this line
E begins this line
I begins this line
O begins this line
U begins this line

file2:
line that ends with a
line that ends with e
line that ends with i
line that ends with o
line that ends with u
line that ends with A
line that ends with E
line that ends with I
line that ends with O
line that ends with U

Example1:  In this example, the only lines that will be displayed
                     are the ones that begin with either a lowercase e,
                     a lowercase i, an uppercase E, or an uppercase I.

$ awk '/^[eiEI]/ { print }' file1
e begins this line
i begins this line
E begins this line
I begins this line

 

Example2:  In this example, the only lines that will be displayed
                     are the ones that end with either a lowercase e,
                     a lowercase i, an uppercase E, or an uppercase I.

$ awk '/[eiEI]$/ { print }' file2

line that ends with e
line that ends with i
line that ends with E
line that ends with I
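
Both examples can be replayed with a short self-contained sketch. It uses trimmed-down, three-line versions of file1 and file2, written to a temporary directory.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'e begins this line' 'o begins this line' 'I begins this line' > "$tmp/file1"
printf '%s\n' 'line that ends with e' 'line that ends with o' 'line that ends with I' > "$tmp/file2"

# ^ anchors the bracket expression to the start of the line.
starts=$(awk '/^[eiEI]/ { print }' "$tmp/file1")

# $ anchors the bracket expression to the end of the line.
ends=$(awk '/[eiEI]$/ { print }' "$tmp/file2")

printf '%s\n---\n%s\n' "$starts" "$ends"
rm -r "$tmp"
```

In each file the line with the letter o is the one left out.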


Nothing overly challenging, right?  And for that reason, this
will be another brief demonstration - a snack.

Let me close by serving up a command, to see where
your thinking is on the two regex operators featured in 
this posting:

awk  '/^[eiEI]$/ { print }'  some-filename

Question:  What lines do you think might be output when this
                  awk command is executed?  I'm not asking the question
                  based on either of the files that I've used in this posting.

 

Happy Thanksgiving!!!

Chetan_Tiwary_
Community Manager

Happy Thanksgiving @Trevor ! 

Brilliant as usual !

Here is another simple usage: combining patterns with logical operators:

|| (or)
&& (and)
! (not)

[Screenshots: examples combining patterns with logical operators]
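
The screenshots are not preserved here, so the following is only a sketch of the three operators, using a made-up file (people.txt) with name, age, and role columns. The original examples may have looked different.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'trevor 52 admin' 'lee 17 user' 'chandler 34 user' 'wayne 71 admin' > "$tmp/people.txt"

# && : both conditions must hold
and_out=$(awk '$2 >= 18 && $3 == "user" { print $1 }' "$tmp/people.txt")

# || : either condition may hold
or_out=$(awk '$2 < 18 || $2 > 65 { print $1 }' "$tmp/people.txt")

# ! : negates the condition that follows
not_out=$(awk '!($3 == "admin") { print $1 }' "$tmp/people.txt")

printf '%s\n---\n%s\n---\n%s\n' "$and_out" "$or_out" "$not_out"
rm -r "$tmp"
```

The parentheses in the last command make it explicit that the whole comparison is negated.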

Trevor
Starfighter

In this episode of "Parsing Output with AWK", I want to talk
about functions that are built into awk. Those built-in functions
fall into three categories:
- numeric
- string
- I/O

The built-in function that I want to demonstrate in this post
is in the string category: toupper(string). The "toupper" function
takes one argument - a string. The function returns a copy of the
string, with all lower-case characters converted to upper case
characters. Let's look at a short example to demonstrate the
utility of this function.

The file that will be used in my example, "schools", has the following
content:
Booker T Washington
Evan E Worthing
John H Yates
Phillis W Peters
James D Ryan
Carter G Woodson
Bennie C Elmore

$ awk '{ print $3 }' schools
Washington
Worthing
Yates
Peters
Ryan
Woodson
Elmore

No surprises in this output. The content in the 3rd column is
displayed.

Now, let's see what the output looks like when we deploy the
"toupper" built-in function:

$ awk '{ print toupper($3) }' schools
WASHINGTON
WORTHING
YATES
PETERS
RYAN
WOODSON
ELMORE

I'm sure you saw this coming - all of the lower-case characters
in the 3rd column were converted to uppercase. The "toupper"
function performed exactly as it was described above.
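
Here is a compact, runnable version of the same idea, using two of the lines from "schools". It also shows toupper's counterpart, tolower, which is an addition of mine to this post.

```shell
# toupper() applied to the third field only
upper_out=$(printf '%s\n' 'Booker T Washington' 'John H Yates' |
  awk '{ print toupper($3) }')

# tolower() on the first field next to toupper() on the third
mixed_out=$(printf '%s\n' 'Booker T Washington' 'John H Yates' |
  awk '{ print tolower($1), toupper($3) }')

printf '%s\n---\n%s\n' "$upper_out" "$mixed_out"
```

Neither function changes the input record itself; each returns a converted copy of its argument.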

Okay, that's all for this post. There will be many more to
follow, demonstrating some of the other built-in awk functions.

Chetan_Tiwary_
Community Manager

@Trevor Thanks for your insightful addition.  Here is a different addition from my side :

[Screenshot: awk split() example]

 

$ awk '{ split($0, arr, ","); print arr[1]; }' data.txt

This command reads each line from the file "data.txt", splits the line based on the comma delimiter (",") into an array named "arr", and then prints the first element (arr[1]) of the array.

Now the second example : 

 

awk '{
    sum=0;
    n = split($0, arr, ",");
    for (i=1; i<=n; i++) {
        sum += arr[i];
        printf "%s%s", arr[i], (i< n ? " + " :"");
    }
    printf " = %d\n", sum;
}' data.txt

 

 

sum=0: Initializes a variable sum to store the running total.

n = split($0, arr, ","): Splits the line on commas and stores the number of elements in n.

for (i=1; i<=n; i++) { ... }: This loop iterates through each element in the arr array:
- i: This is the loop counter, starting from 1 and iterating until it reaches the value of n.
- sum += arr[i];   : This statement adds the current element of the arr array (arr[i]) to the running total (sum).
- printf "%s%s", arr[i], (i< n ? " + " :"");   : This statement prints the current element (arr[i]), followed by " + " if it's not the last element:
    - %s: This format specifier tells printf to print a string.
    - (i< n ? " + " :"") : This is a ternary operator that checks if i is less than n. If it is, it yields " + "; otherwise, it yields an empty string.

printf " = %d\n", sum;   : This statement prints the final sum stored in the variable sum:
- %d: This format specifier tells printf to print an integer.
- \n: This prints a newline character.
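
To try both commands without the original data.txt (whose contents are not shown in this thread), here is a sketch with a made-up two-line file of comma-separated integers.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1,2,3' '10,20' > "$tmp/data.txt"

# First command: split each line on commas and print the first element.
first_out=$(awk '{ split($0, arr, ","); print arr[1] }' "$tmp/data.txt")

# Second command: print the elements joined by " + " and then their sum.
sum_out=$(awk '{
    sum = 0
    n = split($0, arr, ",")
    for (i = 1; i <= n; i++) {
        sum += arr[i]
        printf "%s%s", arr[i], (i < n ? " + " : "")
    }
    printf " = %d\n", sum
}' "$tmp/data.txt")

printf '%s\n---\n%s\n' "$first_out" "$sum_out"
rm -r "$tmp"
```

Resetting sum to 0 at the top of the action is what keeps the total of one line from leaking into the next.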

Chetan_Tiwary_
Community Manager

Ok ! Lazy Tuesday addition - simple mathematics operators in a tab separated data fields and to skim the inet line and print only the IP address :

[Screenshots: arithmetic on tab-separated fields; printing only the IP address from the inet line]
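
Since the screenshots are not preserved, the following is only a sketch of the two ideas, not a copy of the original commands. The item names, prices, and the interface text (including the address) are invented sample data.

```shell
# Arithmetic on tab-separated fields: quantity ($2) times unit price ($3).
totals=$(printf 'widget\t4\t2.50\ngadget\t3\t1.25\n' |
  awk 'BEGIN { FS = "\t" } { printf "%s %.2f\n", $1, $2 * $3 }')

# Pick the IPv4 "inet" line and print the address without the /prefix.
# The trailing space in /inet / keeps "inet6" lines from matching.
ip_out=$(printf '%s\n' \
  '2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500' \
  '    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0' \
  '    inet6 fe80::1/64 scope link' |
  awk '/inet / { split($2, parts, "/"); print parts[1] }')

echo "$totals"
echo "$ip_out"
```

On a real system the second awk command would typically be fed from a command such as ip addr show rather than from printf.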

That's all - next item on another lazy day :xD

Andres_Tarallo
Flight Engineer

AWK is a tool that is often overlooked. It can be a better option than a Perl "one-liner" or a Python script.

Chetan_Tiwary_
Community Manager

@Andres_Tarallo Yes, awk can do most of the data filtering that Perl can do. Also, it is simpler than Python for one-liners, IMHO.

 

Andres_Tarallo
Flight Engineer

I agree, but nowadays it seems a sort of "Forgotten Art/Craft". I started working in the early 90s. Back then AWK/SED/GREP were the usual tools for most automation tasks in shell scripts.

Chetan_Tiwary_
Community Manager

good to know that @Andres_Tarallo !

Pavan87
Cadet

Such a wonderful explanation with demos about AWK. Much appreciated. 

About the Author
I first touched a PC in 1984. I really got hooked when I saw that I could actually send messages to another computer by connecting to the telephone line via this device called a modem. This truly fascinated me. My next fascination was writing programs, using a language called "Basic". I'm sure one of the reasons I was so fascinated with writing programs on the PC was because the only programming I had done was using Fortran, via the keypunch machine, and submitting a stack of cards to the batch operator. My how the technology has advanced since then. I don't recall the year I began my journey with UNIX, but I do recall that it was on an AT&T 3B15 minicomputer. All I ever saw was a dumb (kinda like me about UNIX at the time) terminal. From there, I graduated to Sun Microsystems offerings of UNIX - SunOS and Solaris. Yeah, I was a certified SCSA and SCNA. I worked hard for those certifications. I was so proud. I carried the flag for Sun back then, like I carry the flag for Red Hat now. I've also used IBM's AIX brand of UNIX, and HP's HP-UX brand of UNIX. I don't recall the first Linux I got my hands on, nor am I sure of the year I took the plunge. I do have some CDs with Red Hat Linux 4 on them. Well, I've always been one to hitch my wagon to the leader of a particular technology. In the world of Linux, I think it's safe to say that Red Hat is the undisputed king of that world. When Red Hat introduced their Red Hat Academy, I made a mad dash to get on board. However, the cost was just a little bit prohibitive for my institution's budget. Still, I never took my eye off of using Red Hat Linux, or becoming a Red Hat Academy. When I discovered that the cost to become a Red Hat Academy had been removed, I went into high gear to establish the partnership. I've been proudly associated with the Red Hat Academy now for...I forget. One thing I haven't forgotten, is that I'm a self-appointed, self-anointed evangelist for this program. 
The learning opportunities, along with the support the Red Hat Academy provides to students and instructors is exceptional!