Tuesday, April 29, 2008

learn Perl

Want to learn the Perl language on Windows?
I went to this site and started reading. It is really useful, so I have pasted the full course here. The page is quite long, but please read it.

http://www.gossland.com/course/index.html
Introductory Perl Tutorial Course for Windows
This introductory Perl tutorial course for Windows will introduce you to the beginning concepts of Perl in a familiar Windows environment and show you how to set it up for CGI with Microsoft's Personal Web Server or Internet Information Server.
I wrote this course as an accompaniment to a classroom course in Perl, but now this website is visited by Perl students from around the world.
In addition to the basic Perl content, an appendix contains instructions on installing Perl and Microsoft's Personal Web Server on your Windows PC.
If you are following this course on your own, please follow it in sequence. The ideas are presented carefully so that each section builds on the sections before. Please email me at info234@gossland.com if you have comments or questions about the content.
Course Prerequisites
A student should be familiar with basic operation of a computer, how to open a DOS session, how to download and install a software package. Familiarity with at least one other programming language is not required, but it would be a definite asset. Since this course is directed at CGI programming, it is understood that the student should have a good working knowledge of HTML.
Note on typeface conventions: Throughout this site, explanatory text is written in this font. Sometimes it is indented to set off the sample text. This font is descriptive rather than rigorous, so it is not "ready to run" as a perl script. Don't expect to run text in this font without getting errors.
On the other hand,
    # This type of text represents real perl code that
    # is intended to be copied from the browser
    # and pasted into your perl scripts. Each block
    # should run as is.
Also, later in the course some of the perl scripts will pop up in their own windows. You can save these as text files, or copy and paste the window contents into text files and then save them. Be aware that these are actually HTML pages and not plain text, so be careful when you save them that they turn out as just plain text.
Course Outline:
1. Introduction to Perl
    · What's Perl?
    · Task Examples
    · Running It
    · Variables and Types
    · Flow Control
    · Some Simple Scripts
2. Pattern Matching
    · Introduction to Matching
    · The Binding Operator
    · Regular Expressions
    · Examples
    · Substitution
    · Advanced Ideas
3. Functions
    · Introduction to Functions
    · String Functions
    · Array Functions
    · Test Functions
    · User Functions
4. Files
    · Introduction to Input and Output
    · Standard Output, STDOUT
    · Writing to Files
    · Standard input, STDIN
    · Reading from Files
    · Reading Directories
    · Editing File Contents
    · Recursive Editing
5. Introduction to CGI
    · Introduction to CGI
    · First Page to Browser
    · Printing HTML
    · Serving HTML Files
    · Serving Edited Files
6. The CGI Module
    · Introduction to GET and POST
    · The CGI Module, CGI.pm
    · Handling Forms
    · Handling Form Errors
    · Handling Fatal Errors
7. CGI In Use
    · CGI In Use
    · Site Search
    · Presenting Data from a Database
    · Handling Cookies
    · Uploading Files to the Server
    · Requesting Info from other Servers
Appendix:
Perl Installation
Personal Web Server Installation
Perl via Personal Web Server
Perl Resources
Chapter 1: Introduction
What is Perl and why is it so great?
Perl is a general purpose computer language ideally suited to handling words and text. Perl is admired for staying simple when you need to do simple things, but having the power to do nearly every chore that a system administrator might require. To quote the Perl gurus, "It makes the easy things easy, and the difficult things possible."
It's really a great language because once you get to know it you can express yourself very quickly and easily. Perl has a wonderful way about it that allows you to do a lot by writing very little. It dispenses with a lot of programming "clutter" that accompanies many other programming languages. This ability allows you to write things in a very natural and concise manner.
Another of Perl's strengths is that there are usually many ways of accomplishing any particular bit of programming. This strength inspires Perl programmers to quote the motto "TIMTOWTDI" (pronounced "timtoady") - the acronym of "There Is More Than One Way To Do It." This variety leads to different personal styles and preferences, as well as a great deal of fun, comfort, and joyful discovery as you learn the ins and outs of the language.
Examples of Perl tasks
Perl is particularly well suited for gathering and manipulating text. This is why it is a favorite with system administrators as well as HTML authors. The following examples only begin to show how perl can be useful to a programmer.
Web page serving through CGI
Of course, perl is used extensively in serving up content when run in concert with a webserver. The webserver knows that certain URLs are meant to run perl scripts, and when it finds those URLs it runs the perl scripts for you automatically. The perl program creates output in the form of a webpage that in turn gets sent to the browser. The user sees a page created on the fly just for that request.
Talking to web sites and reporting on bits of info
Perl can go and visit a website for you, without you ever using a browser. It can pull in the whole webpage, and hunt through it for bits of information. This info can then be repackaged into whatever form you need it: a different web-page, a printed report, or a piece of email.
The converse of this is that it can submit forms for you and interact with webservers at the other end. An example of this is an e-commerce transaction server; perl can send specific information to a payment server and then await the result to see whether the payment went through properly or not.
Lost cat finder
Okay, we lost our cat last year. The kids were really upset, and we were all distraught. Then I realized I could get perl to help me find the cat. I was able to write a perl script that went out over the Internet and collected and printed the name, address and phone number of all the houses on the streets in my neighborhood. With that list, we were then able to call every house one after the other.
This is a concrete instance of the example above - talking to web sites and assembling a report.
Web page construction
Perl can be used to construct web pages according to your own defined rules. As a matter of fact, the navigation links of this website were built with just such a tool. All the menus down the left hand side on the entire site are created by a perl script, and they can be updated to a completely different layout in a matter of minutes.
Web page editing
Perl scripts can make global changes to web pages. With the right script, presto-chango! wherever a particular phrase is found, it gets replaced with improved wording. Very broad yet precise changes are possible.
Database Management
Perl comes with the capability to interact with online "backend" databases -- the massive storehouses of information running behind the scenes on large websites. It can search for database records containing particular pieces of information and it can add new records or update existing records. With the ability to interact with a user through a web browser, perl can act as the interface between the user and the database running on the server.
To start getting perl to do the things you want, find out how to run perl on your computer in the next section.
Running perl scripts
Before you can run Perl, you have to install it on your system. If you haven't done so already, go to the Perl installation page for more instructions. At this stage, you don't have to do the Personal Web Server or Perl via PWS installations and you should leave them for a later lesson. Installing Perl should be quite quick and simple, so go ahead and get it in place. Make sure you reboot before continuing.
For the first part of this course, we'll be running perl scripts in a DOS window. This means we're going to be typing words in. Yes indeed, a command line environment instead of a Windows GUI! Yuck or yay, depending on your point of view.
Once you've installed perl and rebooted into Windows, bring up a DOS window. It doesn't matter what directory you are in for now. Just type in:
    perl -v
and you should get back a screenful of version information from perl. If you get a "bad command or filename" message, Perl isn't properly installed. See if you can fix it.
Running your first Perl script: Hello World!
Now it's time to feel like you're getting somewhere. You're going to write your first perl script. Go back to your DOS prompt and this time, make sure you're working in a directory where you can edit a file or two.
Use notepad or some simple text editor (not MS Word!) to create a file with this line in it. You can cut and paste it directly from here:
    print "hello world\n";
Save the file as "hello.pl" and turn to your DOS session and type:
    perl hello.pl
The screen should show
    hello world
right underneath.
Congratulations, you've just run your first perl script!
Now you need to learn a bit more about the language so you can create something more interesting.
Variables and Types
This section introduces the idea of variables. Perl uses variables to represent things that can take different values, kind of like high school algebra where you let x represent an unknown. Variables are automatically of a certain type; here we'll just talk about the three main types: scalars, arrays, and hashes.
Scalars. A scalar is a single piece of information. Scalars can hold numbers or text. Scalars are given names (just like the unknown x is in algebra) but these names always must start with a $ sign (for scalar). The name may not contain spaces but may have many letters. The following are all scalar variables with a value assigned:
    $x = 10;
    $value = $x + 1;
    $number_of_items = 15;
    $word = "hello";
    $text = "This is a sentence but is still a scalar";
Arrays. Arrays hold multiple pieces of information that can all be referred to at once. Arrays can hold numbers or text and their names always start with a @ sign (for array). They can have as many values within them as you'd like. Here are some examples of arrays:
    @array = ( 1, 2 );
    @words = ( "first", "second", "third" );
    @values = ( $x, $y, 3, 5 );
Once you've got an array, you can get the values from it by referring to the element in the list with a subscript in square brackets. Subscripts start at 0, not 1. Notice that since you are extracting a single value out of it, you are referring to a scalar, and therefore you change the prefix to a $. Given the examples above:
    $array[0] has the value 1
    $array[1] has the value 2
    $words[0] has the value "first"
and so on.
You should also know that Perl can do assignments to individual variables simultaneously if they are put into a "list context", i.e., put into parentheses. So
    ( $x, $y, $z ) = ( 1, 2, 3 );
would assign the values 1, 2, 3 to $x, $y, $z respectively. This "list context" idea comes up frequently in Perl.
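Here is a minimal sketch that pulls the scalar, array, and list-assignment ideas together. The variable names are just for illustration, and the "my" keyword (not covered above) simply declares a variable so the script runs cleanly with strict and warnings turned on:

```perl
use strict;
use warnings;

# An array of three words; subscripts start at 0.
my @words = ( "first", "second", "third" );
my $first = $words[0];      # one element is a scalar, so the prefix is $
my $last  = $words[2];
my $count = scalar @words;  # an array in scalar context gives its length

# List assignment sets several scalars in one statement.
my ( $x, $y, $z ) = ( 1, 2, 3 );

# The same idea swaps two values without a temporary variable.
( $x, $y ) = ( $y, $x );

print "first=$first last=$last count=$count\n";
print "x=$x y=$y z=$z\n";
```

After the swap, $x holds 2 and $y holds 1, while $z is untouched.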
Hashes. Hashes are a really great part of perl, and they are extremely useful in practice. Hashes are just special arrays. They are special because instead of having numerical indexes into the elements, like [0], they have words as the indexes. They are also written in a slightly different way, with curly braces instead of square brackets. The curly braces suggest that they are fancy arrays.
To make a hash element, you just define it, using a key and value pair:
    $servings{pizza} = 30;
    $servings{coke} = 40;
    $servings{spumoni} = 12;
The keys above are "pizza", "coke", and "spumoni", and the values are 30, 40, and 12. You could use strings for values too:
    $occupation{Jeff} = "manager";
    $occupation{Martha} = "interior designer";
Here the keys are Jeff and Martha, and the values are manager and interior designer.
If you want to refer to the hash itself, you use a % sign, so these hashes would be %servings and %occupation.
You will see hashes used a great deal in other people's Perl scripts and you will find many uses for them yourself too.
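As a rough sketch of how a hash gets used once it is filled in, the script below walks through %servings and adds up the values. The "keys", "sort", and "exists" functions are standard Perl but have not been introduced yet, and the "salad" key is made up for the example:

```perl
use strict;
use warnings;

# Build the hash one key/value pair at a time, as in the text.
my %servings;
$servings{pizza}   = 30;
$servings{coke}    = 40;
$servings{spumoni} = 12;

# keys returns the keys in no particular order, so sort them for stable output.
my $total = 0;
for my $item ( sort keys %servings ) {
    print "$item: $servings{$item}\n";
    $total += $servings{$item};
}
print "total servings: $total\n";

# exists tests whether a key is present at all.
if ( exists $servings{salad} ) {
    print "salad is on the menu\n";
} else {
    print "no salad today\n";
}
```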
The default variable. Perl provides a default variable designated with an underscore symbol: $_. Just as this name suggests, it is a scalar variable with the name "underscore". This variable is used whenever a variable is required, but where you're too lazy to bother specifying one.
As a specific example, suppose this default variable, $_, has the value of "hello world". Since the print statement normally requires a variable to print, if you leave it out the default variable will be used. So if you write:
    print;
then you'll get
    hello world
Nifty, huh? This is part of perl culture. Once you know what's going on, you can use fewer words to get more done.
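To see the default variable at work, here is a small sketch using a for loop with no loop variable named, which is one common place $_ shows up. The flavor list is borrowed from a later script in this lesson; "length" is a standard function that also falls back to $_ when given no argument:

```perl
use strict;
use warnings;

my @flavors = ( "vanilla", "chocolate", "strawberry" );
my $chars = 0;

# With no loop variable named, for puts each element into $_ in turn.
for (@flavors) {
    print;              # no argument, so this prints $_
    print "\n";
    $chars += length;   # length also falls back to $_
}
print "letters printed: $chars\n";
```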
Flow Control
While perl scripts tend to run from the beginning to the end, they have to decide on the correct actions to take along the way. Under some conditions, the script might run through a particular path, but under other conditions it might run through a completely different path. The decision-making that controls which parts of the program run is called flow control.
If-Else
The if-else combination is one of the most important control statements in any programming language. The idea is that if a condition is true, do one thing, and if it's not true, do something else. An example will show how it works:
    $myname = "Mike";
    if ( $myname eq "Mike" ) {
        print "Hi Mike\n";
    } else {
        print "You're not Mike!\n";
    }
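When there are more than two paths, Perl lets you chain conditions with "elsif" (note the spelling: no second "e"). The sketch below is only an illustration; the name "Martha" and the $greeting variable are assumptions added for the example:

```perl
use strict;
use warnings;

my $myname = "Martha";
my $greeting;

# eq compares text; the conditions are tried from top to bottom.
if ( $myname eq "Mike" ) {
    $greeting = "Hi Mike";
} elsif ( $myname eq "Martha" ) {
    $greeting = "Hi Martha";
} else {
    $greeting = "You're not Mike or Martha!";
}
print "$greeting\n";
```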
While
A "while loop" is kind of like a repeated if statement. As long as a condition remains true, an action is repeated over and over. Once the condition is not true, the program moves on. Here's an example:
    $x = 0;
    while ( $x < ...
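Here is a minimal sketch of a while loop, followed by a for loop that steps through an array. The upper bound of 5 and the color names are assumptions chosen for illustration, not necessarily the values the course itself used:

```perl
use strict;
use warnings;

# while: repeat as long as the condition stays true.
my $x     = 0;
my $total = 0;
while ( $x < 5 ) {
    $total = $total + $x;
    $x     = $x + 1;
}
print "x ended at $x, total is $total\n";

# for: visit each element of a list in turn.
my @colors = ( "red", "green", "blue" );
my $seen   = 0;
for my $color (@colors) {
    print "color: $color\n";
    $seen = $seen + 1;
}
print "saw $seen colors\n";
```

The while loop stops as soon as $x reaches 5, so the body runs for the values 0 through 4.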
Some Simple Perl Scripts
To get an idea of how Perl works, we'll finish off the first lesson with some simple Perl scripts. We'll build on the items you've learned earlier: scalars and arrays, the if-else, while and for constructs, and the print statement.
Since we're writing real perl programs here, there are a few more things you should know about.
Statements
Statements are complete thoughts in perl, just like sentences. In English, sentences end with a period. In Perl, statements must end with a semi-colon. This is for good reason — in perl the period is used for something else.
Comments
If you've ever had the nerve to scribble notes in the margin of a novel, you'll know what comments are for.
In perl, comments are words within the program that perl itself ignores. They are written by you for your own benefit. They help explain the program so when you look at it later you know what the heck is going on. In perl, comments are set off with the "#" character. Anything following the # to the end of the line is a comment. For example:
    #This illustrates a comment.
    #Note there is a semi-colon on the next line
    print "The quick brown fox jumped over the lazy dog\n";
There is another form of comment which is very useful when you want to chop out large sections of code. Borrowing a technique from Perl's embedded documentation, I like to use it in this way:
    =comment until cut
    $variable = 1;
    print "The variable has the value of $variable\n";
    ...
    ...
    =cut
In your own script, the =comment and =cut lines must begin in the first column of the line, with no indentation.
Commenting out large sections like this can be extremely helpful.
The Newline Character
In the examples leading up to this section, I've used the print statement a lot. Each time, I've added the funny ending "\n" onto the print statement. This odd combo of \n is the newline character. Without it, the print statement would not go on to the next line.
Many people are surprised to learn you have to tell the program to go to a new line each time. But if it did this all by itself every time, then you could never print out complete lines a bit at a time, like this:
    $num_words = "eight";
    print "There are ";
    print $num_words;
    print " words altogether in this sentence.\n";
Instead the output would be:
    There are
    eight
     words altogether in this sentence.
Short Forms
In perl, some operations are so common they have a shortened form that saves time. These may be strange to the novice, so I'll be careful here. Where appropriate I'll make a comment near the statement to describe the short form.
Programming Errors
Programming errors? Already? Yes. Programming errors, also affectionately known as bugs, are a fact of programming life. You'll be encountering them soon enough.
If you have a mistake in your perl script that makes your meaning unclear, you'll get a message from the perl compiler when you try to run it. To check for these errors before running, you can run perl with the -c flag. And turn on the warnings flag, -w, while you're at it. This picks up hard-to-find errors too. As an example you'd type in:
    perl -wc hello.pl
to check on the health of the perl script, hello.pl. If there are any syntax errors, you'll hear about them alright!
Running the example scripts
You can copy any of the following script examples into a file in Notepad and save it as, say, perltest.pl. Then check it for errors by typing
    perl -wc perltest.pl
If it comes back saying "syntax ok", then go ahead and run it by typing
    perl perltest.pl
If it doesn't say "syntax ok", then go and fix the reported error, and try again.
Script 1: Adding the numbers 1 to 100, Version 1
    $top_number = 100;
    $x = 1;
    $total = 0;
    while ( $x <= $top_number ) {
        $total = $total + $x;    # short form: $total += $x;
        $x += 1;                 # do you follow this short form?
    }
    print "The total from 1 to $top_number is $total\n";
Script 2: Adding the numbers 1 to 100, Version 2
This script uses a form of the for loop to go through the integers 1 through 100:
    $total = 0;
    #the for loop gives $x the value of all the
    #numbers from 1 to 100;
    for $x ( 1 .. 100 ) {
        $total += $x;    # again, the short form
    }
    print "The total from 1 to 100 is $total\n";
Script 3: Printing a menu
This script uses an array to store flavors. It also uses a terrific form of the for loop to go through them.
    @flavors = ( "vanilla", "chocolate", "strawberry" );
    for $flavor ( @flavors ) {
        print "We have $flavor milkshakes\n";
    }
    print "They are 2.95 each\n";
    print "Please email your order for home delivery\n";
Script 4: Going one way or the other
This allows you to program in a word to make a decision. The "ne" in the if statement stands for "not equal" and is used to compare text. The "die" statement shows you a way to get out of a program when you're in trouble.
    #You can program answer to be heads or tails
    $answer = "heads";
    if ( $answer ne "heads" and $answer ne "tails" ) {
        die "Answer has a bad value: $answer!";
    }
    print "Answer is programmed to be $answer.\n";
    if ( $answer eq "heads" ) {
        print "HEADS! you WON!\n";
    } else {
        print "TAILS?! you lost. Try your coding again!\n";
    }
Script 5: Going one way or the other, interactively
This allows you to type in a word to make a decision. A shameless sneak peek at the next lesson on input and output. <STDIN> allows us to read a word from the keyboard, and "chomp" is a function to remove the newline that's attached to our answer after hitting the carriage return.
    print "Please type in either heads or tails: ";
    #The <STDIN> is the way to read keyboard input
    $answer = <STDIN>;
    chomp $answer;
    while ( $answer ne "heads" and $answer ne "tails" ) {
        print "I asked you to type heads or tails. Please do so: ";
        $answer = <STDIN>;
        chomp $answer;
    }
    print "Thanks. You chose $answer.\n";
    print "Hit enter key to continue: ";
    #This line is here to pause the script
    #until you hit the carriage return
    #but the input is never used for anything.
    $_ = <STDIN>;
    if ( $answer eq "heads" ) {
        print "HEADS! you WON!\n";
    } else {
        print "TAILS?! you lost. Try again!\n";
    }
If time permits, we'll play around with some other simple examples. If you've written one you like, email it to me at info234@gossland.com.
Chapter 2: Matching and Substitution in Perl
Introduction to Matching
The ease and power of Perl's pattern matching is one of its true strengths and a big reason why Perl is as popular as it is. Almost every script you write in Perl will have some kind of pattern matching operation because so often you want to seek something out, and then take an action when you find it.
Matching and substitution are very important because this is how you do editing "on the fly". This is how you create content customized to your Web visitor. You need to be able to open HTML templates and swap in information pertaining to your visitor. Matching and then substituting is just the way to do it.
Also in many other administrative tasks, such as searching through log files or web pages for particular words or sequences, pattern matching is the way to go. Take your web stats for example. All the web stats for your site's traffic are built from data recorded in the server log files. While not all stats programs are based on Perl, you can be sure that they are all looking through the log files for certain patterns of text, and this is the sort of thing that Perl excels at.
Pattern matching, in Perl at least, is the process of looking through sections of text for particular words, letters-within-words, character sequences, numbers, strings of numbers, html tags, what have you... and working with them.
In general, whatever you are seeking can be represented as a text pattern, whether it is a very explicit one, like looking for a specific word like "goodness", or a much more general one, like looking for a North American phone number: three digits, three digits, then four digits.
These more complicated search expressions fall into the category of "regular expressions". This is an extremely important part of Perl and we will devote the rest of this class to this topic.
The Binding Operator
When you do a pattern match, you need three things:
· the text you are searching through
· the pattern you are looking for
· a way of linking the pattern with the searched text
As a simple example, let's say you want to see whether a string variable has the value of "success". Here's how you could write the problem in Perl:
    $word = "success";
    if ( $word =~ m/success/ ) {
        print "Found success\n";
    } else {
        print "Did not find success\n";
    }
There are two things to note here.
First, the "=~" construct, called the binding operator, is what binds the string being searched with the pattern that specifies the search. The binding operator links these two together and causes the search to take place.
Next, the "m/success/" construct is the matching operator, m//, in action. The "m" stands for matching to make it easy to remember. The slash characters here are the "delimiters". They surround the specified pattern. In m/success/, the matching operator is looking for a match of the letter sequence: success.
Generally, the matching statement returns 1 if there was a match, and a false value (the empty string) if there wasn't. In this example, the pattern matches so the returned value is 1, which is a true value for the if statement, so the string "Found success\n" is printed.
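The sketch below tries the binding operator a few different ways. The negated form "!~" and the habit of dropping the leading "m" are standard Perl but have not been shown above; the test word and the @notes array are made up for the example:

```perl
use strict;
use warnings;

my $word = "unsuccessful";
my @notes;

# =~ is true when the pattern is found anywhere in the string.
if ( $word =~ m/success/ ) { push @notes, "found success"; }

# !~ is the negated form: true when the pattern is NOT found.
if ( $word !~ m/failure/ ) { push @notes, "no failure here"; }

# With / as the delimiter, the leading m may be left off.
if ( $word =~ /success/ ) { push @notes, "short form works too"; }

# Stored in a scalar, a match gives 1 on success and "" on failure.
my $hit  = ( $word =~ /success/ );
my $miss = ( $word =~ /failure/ );

for my $note (@notes) { print "$note\n"; }
print "hit is '$hit', miss is '$miss'\n";
```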
Regular Expressions
In the previous example, the expression /success/ is a very simple example of a more general concept of the "regular expression".
All pattern matching in Perl is based on this concept of regular expressions. Regular expressions are an important part of computer science, and entire books are devoted to the topic. Regular expressions form a standard way of expressing almost any text pattern unambiguously.
Luckily, for our purposes regular expressions can start out simple. Once you have these key concepts mastered, you'll be able to find out and learn more about them on your own through many online resources. One online resource to note is file:///C:/Perl/html/lib/Pod/perlop.html, which is your local copy of the perlop (perl operators) documentation. This section describes the matching and substitution operators in detail. Look in particular at m// and s///. Another online resource to look at is file:///C:/Perl/html/lib/Pod/perlre.html which discusses regular expressions in great detail.
The power of regular expressions starts to become clear when you discover they can represent words and phrases but also far more general patterns of text.
Note: In many of the following examples of pattern matching, only the pattern match is shown. If you put any of these patterns into effect you still have to use a variable and the binding operator. Usually. ;-)
Plain Character Expressions
Many letters and characters can represent themselves in a matching pattern, so often just the plain word by itself will act as a regular expression. E.g. /success/, /failure/, and /nearly all plain text/ are all pattern matches that are very straightforward in their meaning.
It's important to note that without any additional qualification, these search patterns can occur anywhere in the string being searched, so /success/ would match any of the strings: "success", "This sentence contains success", and "unsuccessful". Many times, plain vanilla search patterns like this are adequate for the job. Virtually any plain English word without any punctuation can be used as a regular expression to represent itself as a search pattern.
Special Characters
Some special characters or combinations of characters have a special meaning and do not represent themselves. This is what gives regular expressions their power. For example, the lowly period does not stand for a period in a match. Instead, it stands for any character.
The pattern /b.g/ would match "bag", "big", "bug", etc, as well as any other sequence: "b2g", "b&g", "b]g" and so on. It would match "b.g" itself, where . does represent a period. /b.g/ would also match longer expressions: "bigger", "bug swatter".
Matching simply means "found somewhere, anywhere, within the searched string". You can use special characters to specify the position where the search pattern must be located.
A ^ character stands for the beginning of the searched string, so:
/^success/ would match "success" but not "unsuccessful".
A $ character stands for the end of the searched string, so:
/success$/ would match "unsuccess" but not "successful".
Using both ^ and $ together nails the pattern down at both ends, so:
/^success$/ will only match the exact string "success".
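To check the anchoring rules above, this sketch runs four patterns over the same four test strings and counts how many strings each pattern matches. The counter names are made up for the example:

```perl
use strict;
use warnings;

my @strings = ( "success", "unsuccessful", "successful", "unsuccess" );
my ( $anywhere, $at_start, $at_end, $exact ) = ( 0, 0, 0, 0 );

for my $s (@strings) {
    if ( $s =~ /success/ )   { $anywhere++; }   # found anywhere in the string
    if ( $s =~ /^success/ )  { $at_start++; }   # must begin the string
    if ( $s =~ /success$/ )  { $at_end++; }     # must end the string
    if ( $s =~ /^success$/ ) { $exact++; }      # must be the whole string
}

print "anywhere: $anywhere, at start: $at_start, at end: $at_end, exact: $exact\n";
```

All four strings contain "success" somewhere, two begin with it, two end with it, and only one is exactly "success".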
Other special characters include:
\ - a form of a "quote" character
| - alternation, used for "or'ing"
() - grouping matched elements
[] - character class
The first character, "\", is used in combination with special characters to take away their special meaning. E.g.:
\. will match a period
\$ will match a dollar sign
\^ will match a caret
\\ will match a backslash
and so on.
The pipe symbol "|" is used to provide alternatives:
/good|bad/ will match either "good vibes" or "bad karma".
The parentheses group matched elements, so
/(good|bad) example/
is the same as searching simultaneously for
/good example/ or
/bad example/
Without the ()'s, this would be the same as searching simultaneously for
/good/ or
/bad example/
The square brackets indicate a class of characters, so
/^[abcdefg]/ would match any strings beginning with the letters a through g. This can also be written in shorthand as /^[a-g]/.
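A short sketch of alternation, grouping, character classes, and the quoted period follows. The test strings are made up, and the "? :" construct (which picks the first value when the test is true and the second otherwise) has not been introduced in the course yet:

```perl
use strict;
use warnings;

# (test) ? "yes" : "no" gives "yes" when the test is true, otherwise "no".
my $grouped   = ( "a bad example" =~ /(good|bad) example/ ) ? "yes" : "no";
my $vibes_grp = ( "good vibes"    =~ /(good|bad) example/ ) ? "yes" : "no";
my $vibes_alt = ( "good vibes"    =~ /good|bad example/ )   ? "yes" : "no";

# A character class matches any one of the listed characters.
my $in_class  = ( "echo"  =~ /^[a-g]/ ) ? "yes" : "no";
my $out_class = ( "hotel" =~ /^[a-g]/ ) ? "yes" : "no";

# A bare period matches any character; \. matches only a real period.
my $any_char = ( "b2g" =~ /b.g/ )  ? "yes" : "no";
my $real_dot = ( "b2g" =~ /b\.g/ ) ? "yes" : "no";

print "grouped: $grouped, vibes grouped: $vibes_grp, vibes ungrouped: $vibes_alt\n";
print "echo in [a-g]: $in_class, hotel in [a-g]: $out_class\n";
print "b.g matches b2g: $any_char, b\\.g matches b2g: $real_dot\n";
```

Note how "good vibes" fails the grouped pattern (no "example" follows) but passes the ungrouped one, where plain "good" is a complete alternative.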
Special Backslash Combination Characters:
The backslash character is not just used to "quote metacharacters" (in other words to remove their special meaning) as above. It is also used in conjunction with non-special characters to give them a special meaning. For instance
\t is a tab character
\n is a newline character
\d is any digit
\D is any non-digit
\s is a whitespace character
\S is any nonwhitespace character
You'll find yourself using these backslash combinations a lot in practice.
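Here is a small sketch of the backslash combinations in action. The sample line is invented for the example, and the "? :" construct picks "yes" when the match succeeds and "no" otherwise:

```perl
use strict;
use warnings;

# In a double-quoted string, \t puts a real tab character into the text.
my $line = "total:\t42 items";

my $has_tab     = ( $line =~ /\t/ )     ? "yes" : "no";   # a tab is present
my $has_digit   = ( $line =~ /\d/ )     ? "yes" : "no";   # at least one digit
my $num_space   = ( $line =~ /\d\d\s/ ) ? "yes" : "no";   # "42" then a space
my $starts_text = ( $line =~ /^\S/ )    ? "yes" : "no";   # first char is not whitespace

# Five non-digits and nothing else.
my $all_nondigit = ( "items" =~ /^\D\D\D\D\D$/ ) ? "yes" : "no";
my $has_space    = ( "items" =~ /\s/ )           ? "yes" : "no";

print "tab: $has_tab, digit: $has_digit, digits then space: $num_space\n";
print "starts with text: $starts_text\n";
print "items is all non-digits: $all_nondigit, has whitespace: $has_space\n";
```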
Repetition Characters
The expressions above show you how to match certain characters, but they don't allow you to control how many matches should be made at once. Matching repetition is controlled by a few other special characters:
+ means 1 or more matches
* means 0 or more matches
? means 0 or 1 matches
{n} exactly n matches
{m,n} m to n matches
The best way to learn Regular Expressions is by example, so let's go on to see how these amazing things can be put to work together in the next section.
Examples of Regular Expressions
To try out what we learned in Regular Expressions it would be good to see a few pattern matches in practice to see how they go together.
Regular Expression                  Meaning
/a.c/                               the letter a followed by any character, then c
/a+c/                               one or more a's followed by c
/a*c/                               zero or more a's followed by c, so even "c" matches
/a?c/                               zero or one a followed by c: "ac" or "c"
/a.+c/                              a followed by one or more characters, then c
/a.*c/                              a followed by zero or more characters, then c, so even "ac" matches
/a|bc/                              "a" or "bc"
/(a|b)c/                            "ac" or "bc"
/(a|b)+c/                           one or more a's or b's, followed by c: ac, bc, aac, abc, aaac, abbabababbac
/(a|A)\ssample\smatch/              "A" or "a" followed by one whitespace character, then "sample", then one whitespace character, then "match"
/\d\d\d-\d\d\d-\d\d\d\d/            any phone number like this: 250-123-1234
/\(\d\d\d\)\s\d\d\d\-\d\d\d\d/      any North American phone number like this: (250) 123-1234
/\(\d{3}\)\s\d{3}-\d{4}/            as above, but using the count specifier
A pattern can also match an html title tag, with any title. The closing tag needs a backslash quote in front of the slash, \/, to prevent the slash from being taken as the end of the pattern. Depending on where the .* sits in the pattern, its match either includes the > and < characters or includes only the title text.
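The sketch below tries several of the example patterns against sample strings and counts how many behave as described. The test strings are made up, and the title-tag pattern is just one plausible way to write such a match, not necessarily the form the course used:

```perl
use strict;
use warnings;

my $passed = 0;

if ( "abc"  =~ /a.c/ ) { $passed++; }    # any single character between a and c
if ( "aaac" =~ /a+c/ ) { $passed++; }    # one or more a's, then c
if ( "c"    =~ /a*c/ ) { $passed++; }    # zero a's is fine for *
if ( "c"    !~ /a+c/ ) { $passed++; }    # but + needs at least one a
if ( "abbabababbac" =~ /(a|b)+c/ ) { $passed++; }

# Phone numbers, with and without the count specifier.
if ( "250-123-1234"   =~ /\d\d\d-\d\d\d-\d\d\d\d/ ) { $passed++; }
if ( "(250) 123-1234" =~ /\(\d{3}\)\s\d{3}-\d{4}/ ) { $passed++; }
if ( "(250)123-1234"  !~ /\(\d{3}\)\s\d{3}-\d{4}/ ) { $passed++; }   # space is missing

# Hypothetical title-tag pattern; note the \/ in the closing tag.
if ( "<title>My Page</title>" =~ /<title>.*<\/title>/ ) { $passed++; }

print "$passed of 9 checks behaved as expected\n";
```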
Substituting New Text For Old
Perl can substitute text just as easily as it can match it, but instead of using the plain matching operator m//, you use the substitution operator, s///.
When a match is made, Perl knows which characters matched, and it sets up built-in variables to point at the starting position and the ending position of the match in the searched string. For example, if you had:
$text = "Here is some text"
and you did a match on the regular expression /some.*/, like this:
$text =~ m/some.*/
then Perl would know that the matched string was "some text", and it would know that the match started at the 9th character and ended at the 17th.
When you use the substitution operator, s///, Perl uses that positional information to know which characters to replace with the substitution text.
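You can look at this positional information yourself. The sketch below uses the built-in variables $&, @- and @+ (described in the perlvar documentation); they count offsets from 0, so the 9th character is at offset 8:

```perl
use strict;
use warnings;

my $text = "Here is some text";
my ( $matched, $start, $end );

if ( $text =~ m/some.*/ ) {
    $matched = $&;       # the text that matched
    $start   = $-[0];    # offset where the match starts, counting from 0
    $end     = $+[0];    # offset just past the last matched character
}

print "matched '$matched'\n";
print "from character ", $start + 1, " to character $end\n";
```

Adding 1 to the starting offset gives the 9th character, and the ending offset of 17 lines up with the 17th character being the last one matched.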
Simple substitution
The substitution operator has two specifications: on the left, the matching regular expression like the matching operator, and on the right, the substitution value.
Let's say you wanted to change the first occurrence of the word dog into cat in the string variable $story. This is simple:
$story =~ s/dog/cat/;
Your substituted string does not have to be the same length as the matched string. You could put in more letters or fewer:
s/a short phrase/a much longer phrase/
s/1999/MCMXCIX/
s/Twentyfirst century/21st century/
You can also look for a more abstract pattern and replace it. Let's say you wanted to edit any 3 digits and replace them with dummy values, of nnn. You could use a substitution operator like this:
s/\d\d\d/nnn/;
This would take any sequence of 3 digits and replace them with the letters "nnn".
Let's say you wanted to edit all phone numbers and replace them with dummy values. You could use a substitution operator like this:
s/\d{3}-\d{3}-\d{4}/123-123-1234/;
This would take any sequence of 3 digits, minus sign, 3 digits, minus sign, and 4 digits and replace that phone number with 123-123-1234.
For any of the matching expressions in the table on the previous page of examples, we could just as easily have specified some text to replace it with, by using the s/// operator instead of just the m// operator.
Deletion
You can use s/// for deleting things too. Just use an empty value for the substitution. Here's how you might delete an html comment consisting of everything between <!-- and -->:
s/<!--.*-->//;
Isn't that amazingly simple?!
Remembering Matched Values
Suppose you wanted to match on something and modify it, but re-use part of what you matched on. Let's say you wanted to replace an occurrence of boys with boyz or girls with girlz. You could do this in separate passes like this:
s/boys/boyz/;
s/girls/girlz/;
or alternatively, since TIMTOWTDI, you could match on either boy or girl, and remember what it was you matched on, like this:
s/(boy|girl)s/$1z/;
The $1 is called a positional parameter, and it is an internal variable maintained automatically by Perl to represent whatever was matched within the brackets of the search expression. Here, we are looking for either boy or girl followed by an s. We want to replace it by whatever we find, with a z substituted for the s. The $1 parameter will remember whichever word matches and will put it in the substitution.
You can remember more than one matching expression. Each pair of parentheses gets its own variable, numbered from left to right: $1, $2, $3 and so on. I have never had occasion to go past $4, so you are unlikely to run out.
As an example of remembering two matches consider this method of getting rid of potential visitors:
s/(dog|cat)s are (invited|welcome)/$1s are not $2/;
Note that $1 represents either dog or cat, whichever was found, and $2 represents either invited or welcome, whichever was found. Note also that Perl is smart enough to know that the string "$1s" means the $1 variable followed by an "s". It does not get confused into thinking you meant a variable with the name of $1s.
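Here is a small sketch of both remembered values in action. The sample strings are made up for illustration. The second substitution also shows the optional ${1} spelling, which is handy when the character after the variable could be mistaken for part of its name.

```perl
use strict;
use warnings;

my $sign = "cats are welcome";
$sign =~ s/(dog|cat)s are (invited|welcome)/$1s are not $2/;
print "$sign\n";    # $1 held "cat" and $2 held "welcome"

# Braces make it explicit where the variable name ends.
my $kids = "girls";
$kids =~ s/(boy|girl)s/${1}z/;
print "$kids\n";
```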
If you have absorbed this and want more then go on to the next page. Otherwise, email me with questions.
Advanced Matching Ideas
Now you have seen the basics of pattern matching and substitution. If you are not too overwhelmed by these regular expressions, you can already see why Perl is so well liked by its proponents. Consider doing these same jobs with a language that lacks regular expressions!
Regular expressions are so useful that most modern languages have them available nowadays. Python has them built in, Javascript has added regular expressions in newer versions, and Visual Basic has access to the RegExp object when you add Microsoft VBScript Regular Expressions to your project as a reference.
While we have made a decent start on matching and substitution, you are not quite ready to go out into the real world yet. I've been holding a few more advanced ideas back from you while you get your feet wet.
Greediness in Matching
The match repetition characters, * and +, are the greediest they can be when trying to make a match. That is, let's say you have a string like, "The word twice is in this sentence twice". Let's say you want to match all the characters from the beginning of the line to the end of the word twice. So you use the matching expression
m/^.*twice/
Which of the two instances of the word twice would it match? The answer is the second. It would match the sentence line all the way to the end. This is because, by default the matching operators are greedy. They'll take all they can get and extend the match as far as possible within the searched string.
If you want to curb this behaviour and make them match on the first occurrence instead, then you must use the greed-inhibiting character, "?". Putting a "?" right after a repetition character means "don't be greedy". So in this case we'd change our matching statement to
m/^.*?twice/
and it would just match on the part, "The word twice". In practice, managing the greediness of these matching characters is very useful indeed.
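To see the difference for yourself, a minimal sketch like the following captures what each version matched. The parentheses and the variable names $greedy and $lazy are additions for the demonstration, not part of the patterns above.

```perl
use strict;
use warnings;

my $sentence = "The word twice is in this sentence twice";

# A match in list context returns whatever the parentheses captured.
my ($greedy) = $sentence =~ m/^(.*twice)/;     # runs to the last "twice"
my ($lazy)   = $sentence =~ m/^(.*?twice)/;    # stops at the first "twice"

print "Greedy: $greedy\n";
print "Lazy:   $lazy\n";
```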
Case-Insensitive Matching
What if I'm looking for a particular character string, and I don't care whether they are in upper or lower case? Say I'm looking for "student" or "Student" or "STUDENT". Do I have to do this?
/(s|S)tudent/
or even worse:
/(s|S)(t|T)(u|U)(d|D)(e|E)(n|N)(t|T)/
By now, you should start to think this is too awkward for Perl. Of course, TIMTOWTDI! There's a special option that allows you to do case insensitive matches.
/student/i
is all it takes. "i" means "insensitive".
Matching and Substitution using Variables
In all the examples I've shown so far, the matching pattern has been hard-coded in, and could not be altered during a program. However, there's no reason why you can't assign a matching pattern to a variable, and then use the variable instead. For example, if you were looking for cats or dogs you could do this:
$text_string = "I've had cats, dogs and birds";
for $animal ( "cat", "dog", "bird" ) {
if ( $text_string =~ /$animal/ ) {
print "Found a $animal\n";
}
}
You can use variables for the pattern match quite freely, but keep in mind they really do act as search patterns, not as defined strings. For instance if your variable was
$match = "yes?"
then
$question =~ /$match/
would specify a search like /yes?/ and you'd be looking for "ye" followed by 0 or 1 occurrences of s, not the literal string "yes?" with a question mark.
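If you do want the variable's contents treated as plain text, one option is Perl's \Q...\E escape, which quotes any metacharacters inside it. The sketch below compares the two behaviours; the sample sentence and the names $as_pattern and $as_literal are made up for illustration.

```perl
use strict;
use warnings;

my $match = "yes?";
my $text  = "Oh ye of little faith";

# As a pattern, "yes?" means "ye" plus an optional "s", so "ye" matches.
my $as_pattern = ( $text =~ /$match/ ) ? "match" : "no match";

# Inside \Q...\E the ? is escaped, so only the literal text "yes?" matches.
my $as_literal = ( $text =~ /\Q$match\E/ ) ? "match" : "no match";

print "As a pattern:    $as_pattern\n";
print "As literal text: $as_literal\n";
```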
Global Matching and Substitution
All of the pattern matching and substitution operations we've done so far have a weakness you'd soon discover in practice. They only operate on the first match found.
To allow you to work your way through the entire text to be searched, Perl offers the "global" option for matching and substitution, specified by putting a "g" behind the final /. For example:
$story =~ m/Harry Potter/g;
would match all instances of "Harry Potter" in the story, and:
$story =~ s/Harry Potter/Larry Wall/g;
would turn the hero from Harry Potter into the hero from Perl in the whole of the Sorcerer's Stone. The /g option can be used with any matching operation.
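A short sketch of how the /g option reports its work: in list context a global match returns every match it found, and a global substitution returns the number of replacements it made. The sample story text is invented for the example.

```perl
use strict;
use warnings;

my $story = "Harry Potter met Ron. Then Harry Potter met Hermione.";

# List context: m//g hands back all the matches at once.
my @found   = ( $story =~ m/Harry Potter/g );
my $matches = scalar @found;

# s///g returns how many substitutions were made.
my $replaced = ( $story =~ s/Harry Potter/Larry Wall/g );

print "Found $matches, replaced $replaced\n";
print "$story\n";
```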
Matching across newlines
If you were to try these pattern substitutions on large blocks of text that include multiple lines, you'd be in for confusion and disappointment when a number of the matches didn't work. This is because of the behaviour of the period metacharacter "." By default, it matches any character except the newline character. Therefore the far-ranging matching operation ".*" will only match up to the first newline character encountered. But what if you need to match something that starts on one line and spans several lines before it stops, like a long HTML comment or table? The answer is to use the /s modifier, which mnemonically means to "treat string as a single line". This changes the behaviour of "." to match newline characters as well. For example:
$html = "<!-- This comment starts here
and ends on another line -->
Here is some content";
In order to match the beginning of this comment to the end, we add the /s modifier like this:
$html =~ s/<!--.*-->//s;
Without the /s, it wouldn't match at all.
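The following sketch runs the same deletion with and without the /s modifier so you can compare the results. The comment text is made up, and the pattern uses the non-greedy .*? so that it stops at the first closing marker.

```perl
use strict;
use warnings;

my $html = "<!-- a comment that\nspans two lines -->\nHere is some content";

# Copy the string, then try the substitution on each copy.
( my $without_s = $html ) =~ s/<!--.*?-->//;     # "." stops at the newline
( my $with_s    = $html ) =~ s/<!--.*?-->//s;    # "." may cross the newline

print "Without /s the comment is still there\n" if $without_s eq $html;
print "With /s we are left with:$with_s\n";
```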
Different Delimiters
Another of Perl's strengths is that the language will help out when things get ugly for you. You can use different delimiters in the matching expression when you have a need for it.
For example, file paths typically contain many /'s. (Even on DOS and Windows machines, Perl accepts / as the path separator, so you don't need \'s.)
If you are matching on something that contains a "/" such as a path, you could be looking at something ugly. Let's say you wanted to change a file path of a text file in a particular directory, c:/web/cgi-bin/, and move it to the d: drive. You'd have to quote the / characters with a backslash and use \/ for the path separators. So, you'd have to write:
s/c:(\/web\/cgi-bin\/.*?\.txt)/d:$1/
Instead of this mess, you can change the delimiter character to anything you'd like, as long as it's repeated at the beginning and then at the end. Here we use the # character - a.k.a the pound sign, number sign, hash, or the octothorp - like this:
s#c:(/web/cgi-bin/.*?\.txt)#d:$1#
There is another useful variation on this idea. You can use different pairs of delimiters around each section. This makes it easy to split up your matching and substitution expressions to keep things neater.
s{c:(/web/cgi-bin/.*?\.txt)}{d:$1}
You don't even need to use the same delimiter, and you can put the two parts on separate lines, so you could write:
s{c:(/web/cgi-bin/.*?\.txt)}
 (d:$1)
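As a quick check of the brace-delimited form, this sketch applies it to a made-up file name, report.txt. The dot before txt is written as \. so that it matches only a literal period.

```perl
use strict;
use warnings;

my $path = "c:/web/cgi-bin/report.txt";    # hypothetical file path

# Copy the path, then rewrite the drive letter on the copy.
( my $moved = $path ) =~ s{c:(/web/cgi-bin/.*?\.txt)}{d:$1};

print "$path -> $moved\n";
```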
Negative Matching
In some cases you are more interested in whether a pattern does not match a string rather than that it does. In this case you could write
if ( ! ( $string =~ m/search text/ ) ) ...
but as usual, Perl makes it easier for you and offers you more than one way to do it. In this case, there's the "negative" binding operator, !~, so you could write this:
if ( $string !~ m/search text/ ) ...
So, if you see this operator in your travels you'll know you've seen it before.
Default Matching
Matching is such a common operation in Perl that you can dispense with the "m" at the front of it. Everybody, including perl itself, knows what you mean if it's left out. So
m/hello/ is completely equivalent to /hello/
And in fact, if you leave out the entire variable and binding operator, then perl will assume you mean to bind to the default variable mentioned before.
So
/hello/
is completely equivalent to
$_ =~ m/hello/
Isn't that cool? This is very neat when you are searching through files, because as you read through a file, each line of the file is automatically put into the default variable for you. More on this in a later section on reading from files.
Conclusion
This concludes the section on regular expression pattern matching and substitution. We've spent a lot of time on it because it is a very important concept and operation in the world of Perl programming. I urge you to try to look over the sections on pattern matching and substitution in the online documentation. You might find it quite daunting at first, but once you get the hang of how the documentation is written, you will be able to get more ideas for more advanced matching and substitution operations.
In the next lesson, we'll put some of this matching and substitution knowledge to work as we start to manipulate files. See you next time!
____
Chapter 3. Functions
Introduction to Functions
This chapter introduces functions in Perl. I will introduce the more commonly used built-in functions and then show you how to add to these by creating your own user functions.
Functions are small blocks of code that can be run simply by using the function's name in the script. Quite often functions will have inputs, specified as a list passed to the function in parentheses. The function nearly always has a value which can be returned and assigned to a variable.
As an example of a function consider a made-up "add" function:
$sum = add(1, 2);
The function is being given the values 1 and 2 as inputs, and the variable $sum would get the value of 3. This function returns only one value, the sum.
In general however, functions can return more than one value. They can return a list of values, and in such cases the returned list would be assigned to an array:
@names = students( "Perl Course 130" );
This "students" function would return a list of students' names registered for a Perl course, and this list would then be assigned to the array, @names. The @names array would be created with just enough elements to contain the complete returned list.
This multi-valued return value is a point worth pondering. It means that some Perl functions will return scalars and others will return arrays. Even more interesting, the same function can sometimes return a scalar, and sometimes an array.
When scalars are used, functions are said to be in a "scalar context". When arrays or lists are used, functions are said to be in a "list context". Which of the two contexts is used has a dramatic impact on the meaning of the statement and on the values returned by the functions. You'll see how to determine which context will be used in the next sections.
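One built-in function that behaves differently in the two contexts is localtime. The sketch below calls it both ways; the variable names are only illustrative, and the actual values depend on when you run it.

```perl
use strict;
use warnings;

# List context: localtime returns a list of 9 separate date and time fields.
my @parts = localtime();
my $count = scalar(@parts);

# Scalar context: the same function returns one human-readable string.
my $stamp = localtime();

print "List context gave $count values\n";
print "Scalar context gave: $stamp\n";
```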
String Functions
String functions are functions that operate on strings or that return strings. We will just look at a couple of common ones here:
chop, chomp
chr, ord
lc, lcfirst, uc, ucfirst
length
sprintf
concatenation
chomp, chop
Strings often have trailing newlines after they have been read in as input. Chop and chomp are two functions used to clean up these strings by removing any trailing newline:
chop($text_string);
chomp($text_string);
Chop is harsh - it chops off the last character no matter what it is. This is good if you want to remove a newline and the last character is a newline. But what if it isn't? Then you've chopped off the wrong thing. Chomp, which is a matching chop, will only remove the last character if it really is the newline character.
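A small sketch of the difference, using made-up strings: chomp leaves a string alone when there is no trailing newline, while chop removes the last character regardless.

```perl
use strict;
use warnings;

my $with_newline = "hello\n";
chomp($with_newline);    # trailing newline removed

my $no_newline = "hello";
chomp($no_newline);      # nothing to remove, string unchanged

my $chopped = "hello";
chop($chopped);          # last character removed, whatever it is

print "[$with_newline] [$no_newline] [$chopped]\n";
```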
chr, ord
Chr returns the ascii character belonging to a numeric value, and ord returns the numeric value of the first character in a string:
chr(65) returns "A"; ord("Apple") returns 65;
lc, lcfirst, uc, ucfirst
These functions, representing lower case, lower case first character, upper case, and upper case first character, take a string and return the same string with obvious changes:
lc("HELLO") returns "hello";
lcfirst("HELLO") returns "hELLO";
uc("hello") returns "HELLO";
ucfirst("hello") returns "Hello";
length
The length function returns the length of the string, including any newline characters:
length("hello") returns 5;
sprintf
This useful function provides formatting capabilities for numbers and strings. Basically,
sprintf("%6.2f", $x );
would turn the numeric value of $x into a string at least 6 characters wide, with a precision of 2 decimal places, and return that string value. You can read more about sprintf in the documentation.
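A few sample formats may make this more concrete. The numbers are arbitrary, and the square brackets in the print statement are only there to make the padding visible.

```perl
use strict;
use warnings;

my $price   = sprintf( "%6.2f", 3.14159 );    # 6 wide, 2 decimals
my $percent = sprintf( "%.1f%%", 45.678 );    # %% prints a literal percent sign
my $padded  = sprintf( "%03d", 7 );           # integer padded with zeros

print "[$price] [$percent] [$padded]\n";
```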
concatenation, .
The dot operator, ., is actually an operator rather than a function, but I'll describe it here. A dot in Perl allows you to join two text scalars together. For example:
$x = "The quick ";
$y = "brown fox ";
$text = $x.$y."jumped over the lazy dog\n";
would put the whole sentence into the variable $text.
Array Functions
Array functions in Perl are very useful, and as the name suggests they operate on arrays. Some of these functions operate on lists, which are very similar to arrays. I'm going to combine both list and array functions in this section. Arrays can be filled with numbers or strings.
We'll look at the most common array functions:
push
pop
shift
Push
Push takes an array and pushes a new value onto the end of it. If you had an array like
@array = ( "one", "two", "three" );
then
push @array, "four";
would add "four" onto the end after "three".
Pop
Pop does the opposite of push. It removes the last value from the array and returns it.
$value = pop @array;
would return "four" and put @array back to ( "one", "two", "three" ).
Shift
Shift is like pop, except it returns the first element and then deletes it.
@array = ( "one", "two", "three" );
$value = shift @array;
would put $value equal to "one", and leave @array as ("two", "three");
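The three functions can be tried together in a sketch like this one. It also uses unshift, a built-in not covered above, which adds a value to the front of an array and so is the counterpart of shift.

```perl
use strict;
use warnings;

my @array = ( "one", "two", "three" );

push @array, "four";         # add to the end
my $last  = pop @array;      # take "four" back off the end
my $first = shift @array;    # take "one" off the front
unshift @array, "zero";      # add to the front

print "last=$last first=$first array=@array\n";
```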
List Functions
List functions operate on lists. Commonly used functions include:
grep
join
split
map
sort
We will have just a brief look at each function. For more complete explanations, look them up in the online HTML documentation.
Grep
Grep comes from its Unix brother of the same name. Grep is an acronym for "global regular expression print". In scalar context, the function looks through an array or list and returns the number of elements in which the regular expression is found. The statement
$found = grep /a/, ( "ant", "bug", "cat" );
would set $found to 2 because two of the elements contain an "a": ant and cat. Because the value is being assigned to a scalar, Perl evaluates grep in a "scalar context" and the single value 2 is returned.
You can also evaluate grep in a "list context". If you assign the return value of grep to an array instead of to a scalar, then the array will be filled with the matched elements.
@found = grep /a/, ( "ant", "bug", "cat", );
would leave the array @found with the values like this:
@found = ( "ant", "cat");
Grep is a very powerful tool for determining whether patterns exist in array elements, or extracting those elements out of a list or array that match a certain pattern.
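The sketch below runs the same grep in both contexts on a slightly longer, made-up list. Note that "banana" counts once even though it contains several a's, since grep counts matching elements.

```perl
use strict;
use warnings;

my @words = ( "ant", "bug", "cat", "banana" );

my $count = grep { /a/ } @words;    # scalar context: how many elements match
my @found = grep { /a/ } @words;    # list context: the matching elements

print "Elements containing an a: $count\n";
print "They are: @found\n";
```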
Join
Join is a very useful function. Given a "join string" and a list of values, it will return the string of values separated by the join string. It is very useful for putting a comma and space between list values. No extra ones are put at the beginning or end, just between the elements.
$text = join ", ", @array;
would set $text to "one, two, three, four" with the right @array.
Split
Split is the converse of join. Split separates a string using a delimiter pattern and returns the separated array. It is very useful for separating text lines that contain fields into their individual parts.
@array = split /,/, "one,two,three,four";
would result in @array having the values ( "one", "two", "three", "four");
($a, $b, $c, $d) = split /\s*,\s*/, "1 , 2, 3 , 4";
would result in $a = 1, $b = 2, $c = 3 and $d = 4. Note that /\s*,\s*/ is a regular expression matching any whitespace around a comma.
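Split and join are often used as a pair: split a messy line into fields, then join the fields back together tidily. Here is a sketch with an invented line of comma-separated data.

```perl
use strict;
use warnings;

my $line = "Smith , John,  42 ,Victoria";    # made-up record

# Split on a comma together with any whitespace around it.
my @fields = split /\s*,\s*/, $line;

# Put the fields back together with a uniform separator.
my $rejoined = join ", ", @fields;

print scalar(@fields), " fields: $rejoined\n";
```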
Map
Map is a bizarre but useful function. It takes a list as an argument and performs a user-defined operation on each element in turn. Then it returns the list consisting of the modified values. For example, to change an array to all upper case, you could write:
@upper = map { uc($_) } @array;
One by one, each element of the array is assigned to the default variable $_ which is then converted to upper case in the expression uc($_). By the time all the elements are done, map has made a copy of the original array but all in upper case. It returns that new list, which is then assigned to @upper.
For another example, to print an array with one element per line, you could tack a newline character onto the end of each value like this:
print map { "$_\n" } @array;
One by one, each element of the array is assigned to the default variable $_ and is quoted with a newline character: "$_\n". By the time all the elements are done, map has a copy of the original array but with a newline added to each element. It returns that new list. The print statement then prints the new list, and each original array value comes out on its own line.
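The block given to map can hold any expression, not just uc. This sketch builds three new lists from one array; the names @lengths and $lines are only illustrative.

```perl
use strict;
use warnings;

my @array = ( "one", "two", "three" );

my @upper   = map { uc($_) } @array;              # upper-case copy
my @lengths = map { length($_) } @array;          # length of each element
my $lines   = join "", map { "$_\n" } @array;     # one element per line

print "@upper\n";
print "@lengths\n";
print $lines;
```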
Sort
Sort will arrange any list into a sorted order. Many variations are possible so check the documentation. It has much more to offer than I'm showing here.
print sort ("gamma", "beta", "alpha");
would print alphabetagamma. If you wanted commas in between, use join too:
print join(", ", sort("gamma", "beta", "alpha")), "\n";
to get "alpha, beta, gamma". Be sure to look more into sort, because you can get the sort function to sort a list according to virtually any sort criterion you can imagine.
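As a taste of those other sort criteria, you can hand sort a comparison block that uses the special variables $a and $b. The sketch below contrasts the default string ordering with a numeric one and with ordering by length; the sample values are arbitrary.

```perl
use strict;
use warnings;

my @nums = ( 10, 9, 100, 1 );

my @as_strings = sort @nums;                   # default: compared as text
my @as_numbers = sort { $a <=> $b } @nums;     # <=> compares numerically

# Shortest first; equal lengths fall back to alphabetical order.
my @by_length = sort { length($a) <=> length($b) or $a cmp $b }
                     ( "gamma", "beta", "alpha" );

print "@as_strings\n@as_numbers\n@by_length\n";
```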
With these functions at your disposal, you are very close to getting into some real coding. Carry on to learn about the interesting test functions.
File Test Functions
Test functions are functions used to test common file conditions. They are often used in if (...) statements. And there's no other way to say it: they look weird.
Look at this:
if ( -e $file ) {
#do something...
}
means: "if exists $file" then do something... In other words, if the variable $file refers to a file that exists on the hard drive, then do something...
There are many test functions used to test for different things. These next ones are the most useful to a beginner:
-e (file exists)
-f (is a plain file - not a directory)
-d (is a directory)
-z (has zero size)
One last point is that if you don't specify a $file variable, Perl will assume you mean, what else, the default variable!
if ( -e ) {
#do something...
}
means if the file specified by the value of the default variable $_ exists, then do something.
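To try the four tests safely, the sketch below creates an empty scratch file, checks it, and deletes it again. The file name filetest_demo.tmp is hypothetical, and the open call uses a form with a separate mode argument and an error check, which is an assumption beyond what this course has covered so far.

```perl
use strict;
use warnings;

my $file = "filetest_demo.tmp";    # hypothetical scratch file name

# Create an empty file to test against.
open( my $fh, ">", $file ) or die "Cannot create $file: $!";
close $fh;

my $exists = ( -e $file ) ? "yes" : "no";
my $plain  = ( -f $file ) ? "yes" : "no";
my $empty  = ( -z $file ) ? "yes" : "no";
my $is_dir = ( -d "." )   ? "yes" : "no";    # "." is the current directory

unlink $file;                                 # clean up the scratch file
my $gone = ( -e $file ) ? "no" : "yes";

print "exists=$exists plain=$plain empty=$empty dir=$is_dir gone=$gone\n";
```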
One last section on functions remains and that is how to create your own...
User Functions
Using the built-in functions of Perl will take you a long way. But eventually you'll have a repetitive chore that you want to call from more than one place in your program. You separate the block of code you want to perform into a function and call it whenever you need it.
Suppose you wanted to make a geometric mean function, where you take two numbers, multiply them and return the square root. Here's how it would look:
$x = 10;
$y = 20;
print "The geometric mean of $x and $y is: ", geomean($x, $y), "\n";
sub geomean {
    ($a, $b) = @_;
    return sqrt( $a*$b );
}
A couple of interesting points. First, the function is declared with the use of the word "sub", meaning subroutine. Subroutines and functions are synonymous here.
The other interesting thing here is that the parameters $x and $y are passed to the function inside parentheses. But in the place where the function is defined, in the sub geomean line, there's no mention of input parameters.
Instead, on the line: ($a, $b) = @_; we see this strange @_ thing. What's that? Hm. It's the "default array": the parameter list, passed in as an array, with the default name @_. It's very similar to the default variable $_, but it's an array instead of a scalar!
The line ($a, $b) = @_; is a list assignment. It assigns what was passed in as $x and $y to $a and $b, respectively.
The rest of the function is easy. It just returns the square root of the product, like it should.
In practice you'll see many different ways of retrieving the values from the default array. Sometimes you'll see a list assignment as above, and sometimes a series of "shift" statements instead. We could have written:
sub geomean {
    $a = shift;
    $b = shift;
    return sqrt( $a*$b );
}
because if no array is provided to the shift function, it operates on the default array!
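A variation you will often meet declares the function's variables with my, which keeps them private to the function. The sketch below does that, using the names $p and $q rather than $a and $b because the sort function also uses $a and $b. The average function is a made-up extra that shows @_ holding any number of arguments.

```perl
use strict;
use warnings;

# Geometric mean of two numbers, with private variables.
sub geomean {
    my ( $p, $q ) = @_;
    return sqrt( $p * $q );
}

# Average of however many numbers are passed in.
sub average {
    my $sum = 0;
    $sum += $_ for @_;                       # add up every argument
    return @_ ? $sum / scalar(@_) : 0;       # avoid dividing by zero
}

my $gm  = geomean( 4, 9 );
my $avg = average( 1, 2, 3 );
print "geomean=$gm average=$avg\n";
```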
That ends the brief introduction to functions in Perl. With this chapter behind you, you've got ammunition to tackle some pretty advanced Perl jobs. Now let's see how to read and write results to files in the next chapter.
____
Chapter 4. Handling Files in Perl
Introduction to Input and Output
This section describes working with input and output: getting information into and out of your Perl program. Input and output are referred to by computer geeks as IO.
In general as you work with Perl, you will be reading information from at least one source of input data. This source might be typed in at the keyboard, or read from an existing file, or brought in as data through CGI. You might even read from more than one source in the same program.
You will also be working with sending data to at least one output destination. This output destination might be to the screen, to a file, to another program, or to a web browser.
Of course, you get to choose where to read your data from, and where to send it to. How you do that is the subject of this lesson in the course. Continue on to learn about the various forms of IO, beginning with sending data to the simplest output destination.
Standard Output
Standard output is a term used extensively in computer science to refer to the normal output destination of a program. When you are working with a command line interface, like DOS, the standard output is directed to your computer screen.
Standard output is written as STDOUT within Perl programs. Any output generated by a plain old print statement in your Perl program goes to STDOUT. When you write
print "The quick brown fox\n";
you are really asking that your output be sent to STDOUT.
STDOUT is an example of a thing called a filehandle. A filehandle is a special type of variable that is associated with an output destination. It is used to tell your program where you want output to go.
Writing to standard output is so common that Perl automatically prepares the STDOUT filehandle for you so you don't have to do anything extra to use it. You don't even have to specify it since a print statement sends output to STDOUT by default. If you wanted to specify it in your print statement, you'd include it right after the word "print" with no extra commas:
print STDOUT "The quick brown fox\n";
But, you don't need to since the default statement:
print "The quick brown fox\n"
is completely equivalent.
The neat thing about STDOUT is that it can be altered outside your program, with the "redirection" operator. So if you were running your Perl program and you wanted to keep the output for review instead of letting it flash by on the screen, you could redirect STDOUT with the ">" symbol and have the output sent to a file like this:
perl perltest.pl > output.txt
This would then put all the program's output into the file, output.txt. Note that this redirection is a feature of DOS, or Unix, and not of Perl in particular. But Perl supports writing to STDOUT, so that this redirection works. This means you can save your output to any file you choose, without altering the internal contents of your Perl script. That is a very handy and cool thing.
Writing Files
Writing to STDOUT is very common, and using the redirection operator allows you to save output to a different file. But what if you want the output to go straight into a particular file from within your program?
The answer is you can specify the file you want to use by opening a filehandle to it. Then you use the new filehandle in your print statement, instead of the default STDOUT filehandle. When you're finished, you close the filehandle. It goes like this:
open OUTPUT, ">output.txt";
print OUTPUT "The quick brown fox\n";
close OUTPUT;
In the above code, OUTPUT is the new filehandle. You can name a filehandle anything you like. Instead of OUTPUT, we could have used FILE, DEST, OUT, SAVE, or any other non-reserved name. By convention, a filehandle is written in upper case, but it doesn't have to be.
We specify that we are opening a file for writing with the ">" sign in the quote:
">output.txt"
The ">" symbol is a mnemonic to indicate something is going into the file; in other words we are writing into it.
Lastly, we specify the destination file by naming the file in the open statement. In this example it is "output.txt", but any valid path would work.
Note there is no comma after the filehandle in the print statement. It is a common error to include one.
When you name a file in an "open for writing" statement, the file may or may not exist before opening. If it does exist first, it is wiped clean as soon as it's opened. If it does not exist first, it is created with no contents. Either way, you start with a clean slate as you begin writing into the file. For this reason, opening files for writing is considered "dangerous" because you can wipe a file out inadvertently if you open it for writing by accident.
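Because opening for writing can go wrong (a bad path, a read-only disk), many scripts check whether open succeeded. The sketch below is one common way to write it, not the form used in this course: it keeps the mode ">" as a separate argument, stores the filehandle in an ordinary variable, and stops with the system's error message in $! if anything fails. The file name output_demo.txt is hypothetical.

```perl
use strict;
use warnings;

my $file = "output_demo.txt";    # hypothetical file name

# open returns false on failure, so "or die" stops the script with a reason.
open( my $out, ">", $file ) or die "Cannot open $file for writing: $!";

# Still no comma after the filehandle.
print $out "The quick brown fox\n";

close($out) or die "Cannot close $file: $!";
```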
As another example, consider this:
open HTML, ">c:/web/root/index.html";
print HTML "Content-Type: text/html\n\n";
print HTML "<html><body>";
print HTML "<p>Written by Perl!</p>";
print HTML "</body></html>";
close HTML;
You've just created a new web page, saved in the file: c:/web/root/index.html.
Re-opening STDOUT
It is quite permissible to re-open existing filehandles. When you do, the filehandle is first closed, and the new one is opened. This works with STDOUT too. You can re-open STDOUT, and then use the print statement to write to your new file by default.
open STDOUT, ">output.txt";
print "The quick brown fox\n";
close STDOUT;
As you see, STDOUT is now going to output.txt, and there's no need to mention it in the print statement.
Appending to Files
If you want to append data onto the end of an existing file instead of writing new content from scratch, then you open the file with ">>" instead of ">".
open LOG, ">>log.txt";
print LOG "Here is a new log entry\n";
close LOG;
Any information you print now will be added onto the end of the file.
Continue on to the next section to learn about how to read information in from files.
Reading Information from Standard Input
When you want to get information into your program, how can you do it? The counterpart to the standard output destination, is the standard input source of data. Standard input is abbreviated to STDIN.
Normally, STDIN reads input from keys typed at the keyboard, but STDIN input doesn't always have to come from the keyboard. You can redirect STDIN to take its data from a selected file, and this can be done outside your program. To redirect STDIN to read from another file, you can use the "input redirection" operator, "<". So you could write:
perl perltest.pl < input.txt > output.txt
Keep in mind, this redirection is performed by the operating system, not by Perl specifically. However, it works in Perl because Perl supports the functioning of STDIN instead of hardwiring input to the keyboard.
When you want to read from STDIN within your program, you can get a line of input by using the diamond operator, "<>". The diamond operator means "read a line of input". To read a line of input from STDIN and save the input in a variable, write:
$input = <STDIN>;
or, equivalently just:
$input = <>;
since it will read from STDIN by default when no file names are given on the command line. Don't put a space in between the brackets or it won't work.
Note that this will read a whole line of input including the newline (\n) character added on to the end when you hit the Enter key. Often you'll want to get rid of this \n character. To do that you can write:
chomp($input);
Recall that chomp is the safe form of chop.
The next section will show you how to read from specific files, instead of from standard input.
Reading Files
Often you will want to read input from a specific file instead of from the keyboard. To do this, you open the specific file for reading, read the lines of input and then close the file again.
To open a file for reading, you can use the "<" character, a mnemonic for "something coming from the file" in the open statement. However, because reading is a very common operation and is also safe, you don't need to specify any particular character in the open statement. Therefore, you can use either of these statements:
open INPUT, "<input.txt";
open INPUT, "input.txt";
Once the file is open, you read a line from it with the <INPUT> operation. If you put the read operation inside a while test, the input will go into the default variable, $_. For example the following script will print out the contents of input.txt to the screen.
open INPUT, "<input.txt";
while ( <INPUT> ) {
    print;
}
close INPUT;
Reading lines into an array
The line
while ( <INPUT> ) {
is equivalent to
while ( $_ = <INPUT> ) {
and when you assign to a scalar variable as above, the <INPUT> operation is put into a scalar context. In a scalar context, it reads in only a single line at a time. If you put <INPUT> into a list context, then all lines will be read in at once:
open INPUT, "<input.txt";
@lines = <INPUT>;
close INPUT;
#Now @lines holds all the lines, one line in each element.
print "Last line is:\n";
print $lines[-1];
Reading a whole file into one variable
Sometimes, you'd rather read the whole content of the file into a single variable, rather than into an array of lines. This is a particularly good move when you need to do a multi-line pattern match or substitution, because then you can match to the entire content at once.
The thing to understand is that the <> operator reads input until it reaches an "end of record character". By default, this end of record character is a newline, \n. So, by default, input stops after each newline.
The Perl variable which specifies the end of record character is $/. By default, $/ = "\n";
If you change the value of $/, you can change input behaviour. You can even undefine the end of record character to read in the whole file in one operation. Use the undef operator:
open INPUT, "<input.txt";
undef $/;
$content = <INPUT>;
close INPUT;
$/ = "\n"; #Restore for normal behaviour later in script
With the above script, the variable $content will hold the entire file.
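The three reading styles described in this section can be put side by side. This is a sketch under a few assumptions: the helper names `count_lines`, `read_array` and `slurp` are hypothetical, a scratch file from the core File::Temp module stands in for input.txt, and `local $/` is used in place of the undef-and-restore step so the record separator is put back automatically.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the sketch is self-contained.
my ( $out, $path ) = tempfile( UNLINK => 1 );
print $out "one\ntwo\nthree\n";
close $out;

# 1. Scalar context: one line per read.
sub count_lines {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't open $file: $!";
    my $count = 0;
    while ( my $line = <$in> ) {
        $count++;
    }
    close $in;
    return $count;
}

# 2. List context: all lines at once, one per array element.
sub read_array {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't open $file: $!";
    my @lines = <$in>;
    close $in;
    return @lines;
}

# 3. Slurp: $/ is undefined only inside this sub, and local
# restores it automatically when the sub returns.
sub slurp {
    my ($file) = @_;
    open my $in, '<', $file or die "Can't open $file: $!";
    local $/;
    my $content = <$in>;
    close $in;
    return $content;
}

print "lines: ", count_lines($path), "\n";
my @all = read_array($path);
print "last line: $all[-1]";
print "length of whole file: ", length( slurp($path) ), "\n";
```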
These various methods of reading files: a line at a time, into an array of lines and into a single variable all have their individual strengths and weaknesses in use. We'll soon use the different methods in the upcoming section on editing files. But first, let's learn how to list the files that are in a directory.
Reading Directories
Often you'll want to do a directory listing to find out what files are present in a directory. It's easy with the opendir, readdir, and closedir functions. For these you use a directoryhandle which is quite a bit like a filehandle.
You can use a scalar context to read one filename at a time:
opendir DIR, "."; # . is the current directory
while ( $filename = readdir(DIR) ) {
    print $filename , "\n";
}
closedir DIR;
Or you could use an array context and read the files into an array as in the next example. You don't have to specify the current directory. You could specify some other path:
opendir DIR, "c:/";
@files = readdir(DIR);
closedir DIR;
print map { "$_\n" } sort @files;
This last line sorts the @files array into ASCII-based order before printing the files, one per line.
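A directory listing from readdir also includes subdirectories and the "." and ".." entries. The following sketch filters those out; `list_files` is a hypothetical helper name, and the lexical directory handle is a stylistic choice rather than something the course uses.

```perl
use strict;
use warnings;

# Return the sorted names of plain files in a directory,
# leaving out subdirectories and the "." and ".." entries.
# list_files is a hypothetical helper name.
sub list_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open directory $dir: $!";
    my @files = sort grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @files;
}

print "$_\n" for list_files(".");
```

The file test needs the directory prefix because readdir returns bare names, not paths.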
We are now ready to begin the section on editing files.
Editing Files
We have seen many elements of Perl: matching and substitution, reading and writing files, and a number of functions. It is time to try editing files. We just have to put everything we've learned together.
Let's say you had this nonsensical data file. Copy and paste this text into Notepad and save it as test.txt in your scripts directory:
cry through leap been by full take
many again track every many aim quite able how plus all all life toad than end through if would that fire quite away that away smile take every away quiet quiet toad strong how old every when cry quiet how be pale quiet smell leap hope quite sit to able how but by
Let's say you wanted to edit the word "away", and change it to "yellow", but only if it occurs twice on one line. You want to change the second occurrence only. You want to save the changed file in the same file, test.txt. Here's a script to do that.
#Specify the file
$file = "test.txt";
#Open the file and read data
#Die with grace if it fails
open (FILE, "<$file") or die "Can't open $file: $!\n";
@lines = <FILE>;
close FILE;
#Open same file for writing, reusing STDOUT
open (STDOUT, ">$file") or die "Can't open $file: $!\n";
#Walk through lines, putting into $_, and substitute 2nd away
for ( @lines ) {
    s/(.*?away.*?)away/$1yellow/;
    print;
}
#Finish up
close STDOUT;
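To see what the substitution does without touching a file, it can be tried on a few strings. A minimal sketch follows; `replace_second_away` is a hypothetical helper name, and the regular expression is the one from the script above.

```perl
use strict;
use warnings;

# Replace the second "away" on a line with "yellow"; lines with
# fewer than two occurrences come back unchanged.
sub replace_second_away {
    my ($line) = @_;
    $line =~ s/(.*?away.*?)away/$1yellow/;
    return $line;
}

for my $line ( "that fire quite away that away smile\n", "take every away quiet\n" ) {
    print replace_second_away($line);
}
```

The first non-greedy group swallows everything up to and including the first "away", so only the second one is replaced.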
Let's suppose you want to work with this comma separated numeric data.
Copy this file and save it as test.txt
"Date","Time","O","H","L","C","V","OI"
12/18/1996,1600,1562,1562,1562,1562,0,0
12/19/1996,1600,1800,1800,1800,1800,0,0
12/20/1996,1600,1589,1589,1589,1589,0,0
12/23/1996,1600,1121,1121,1121,1121,0,0
12/24/1996,1600,1298,1298,1298,1298,0,0
12/26/1996,1600,1544,1544,1544,1544,0,0
12/27/1996,1600,1451,1451,1451,1451,0,0
12/30/1996,1600,1402,1402,1402,1402,0,0
12/31/1996,1600,1281,1281,1281,1281,0,0
01/02/1997,1600,784,784,784,784,0,0
01/03/1997,1600,1859,1859,1859,1859,0,0
01/06/1997,1600,1391,1391,1391,1391,0,0
01/07/1997,1600,1476,1476,1476,1476,0,0
Let's say you want to remove the Time, V, and OI columns, change commas to tabs, and append the words " over 1400" to the line if the value in the C column is over 1400. You want to save the output to a different file, testout.txt, and print out a copy of it when you are finished. Here's a script to do that:
$in_file = "test.txt";
$out_file = "testout.txt";
open (IN, "<$in_file") or die "Can't open $in_file: $!\n";
open (OUT, ">$out_file") or die "Can't open $out_file: $!\n";
while ( $line = <IN> ) {
    @fields = split /\s*,\s*/, $line;
    $line = join "\t", $fields[0], @fields[2..5]; #an array slice!
    print OUT $line;
    print OUT " over 1400" if $fields[5] > 1400;
    print OUT "\n";
}
close IN;
close OUT;
#read in output file and print to screen to confirm
open (TEST, "<$out_file") or die "Can't open $out_file: $!\n";
while ( <TEST> ) {
    print;
}
close TEST;
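The per-line work in that script (split, slice, join, conditional suffix) can be isolated in one function and tried on sample rows. This is a sketch, with `convert_row` as a hypothetical name; the numeric check on the C column is an addition so the quoted header row passes through quietly under `use warnings`.

```perl
use strict;
use warnings;

# Keep the Date, O, H, L and C columns, join them with tabs, and
# flag rows whose C value is over 1400. convert_row is a
# hypothetical name.
sub convert_row {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\s*,\s*/, $line;
    my $out = join "\t", $fields[0], @fields[ 2 .. 5 ];

    # Only compare numerically when C looks like a number, so the
    # quoted header row is passed through without a warning.
    $out .= " over 1400" if $fields[5] =~ /^\d+$/ && $fields[5] > 1400;
    return "$out\n";
}

print convert_row($_) for (
    qq{"Date","Time","O","H","L","C","V","OI"\n},
    "12/18/1996,1600,1562,1562,1562,1562,0,0\n",
    "12/23/1996,1600,1121,1121,1121,1121,0,0\n",
);
```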
We will cover more examples in class if time permits. We have one last thing to cover in this section, and that is editing all the files in a directory tree, or recursive editing.
Recursive Editing
Sometimes as a webmaster or an administrator, you'd like to be able to make a particular regular expression substitution in all the files throughout an entire directory tree. This is not too hard, with a little help from a built-in Perl module called File::Find.
There will be a few new concepts introduced in this section and you might have to start digging into the documentation for more details on how these things work.
Perl can be extended from the core language by way of "modules". Modules are additional special-purpose packages of code that can be imported into the Perl environment when you need to use them. Your Perl installation comes with many modules available. They are not used until you ask for them since using modules adds a small performance penalty.
One module that comes with every Perl installation is the module File::Find. To use this module, add the line:
use File::Find;
to your script.
File::Find provides just the directory-walking behaviour we need to traverse a directory tree. Consider this script:
use File::Find;
$dir = "c:/web";
find(\&edits, $dir);
sub edits {
    print "File name is $_\n\t\tFull path is $File::Find::name\n";
}
Alter the $dir="c:/web" to something that makes sense on your system and try running this script. You'll find it recursively lists every file in the directory tree you picked. Let's see how it works.
The line "find(\&edits, $dir)" is the line that walks through the tree specified by "$dir". Find is a function provided by the File::Find module. For parameters, find takes a reference to a function (\&edits), and the directory to walk through. For each file in the tree, it calls the referenced function.
So, for every file, the function "edits" is called. Within the edits function, the default variable, $_, is set to the name of the file. If you want the complete path to the file, look at $File::Find::name. Each time a new subdirectory is entered, the system changes into that directory so the short $_ file names are all you need to open them.
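The callback mechanism described above can also be used with an anonymous sub to collect matching files. The sketch below builds a small scratch tree with the core File::Temp module so it can run anywhere; `find_html` and the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Walk a directory tree and return the full paths of files whose
# names end in .htm or .html. find_html is a hypothetical helper.
sub find_html {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name is the full path.
            push @found, $File::Find::name if -f $_ && /\.html?$/i;
        },
        $dir
    );
    return sort @found;
}

# Build a tiny tree to search: two HTML files and one text file.
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub" or die "Can't create $root/sub: $!";
for my $name ( "index.html", "sub/page.htm", "sub/notes.txt" ) {
    open my $fh, '>', "$root/$name" or die "Can't create $name: $!";
    close $fh;
}

print "$_\n" for find_html($root);
```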
In the above example, we just listed the files, but we have all the tools to do a recursive edit. We just have to add some code to open the file and edit it within the edits function, as follows. Be careful, because this code has the potential to edit a lot of files!
#!/usr/bin/perl
use File::Find;
$dir = "c:/web";
find(\&edits, $dir);
sub edits {
    $seen = 0;
    if ( -f and /\.html?$/ ) {
        $file = $_;
        open FILE, $file;
        @lines = <FILE>;
        close FILE;
        for $line ( @lines ) {
            if ( $line =~ s/Lesson/Chapter/ ) {
                $seen++;
            }
        }
        open FILE, ">$file";
        print FILE @lines;
        close FILE;
    }
    print "Found in $File::Find::name\n" if $seen > 0;
}
This script can be used as a starting point for more useful or powerful scripts. Readers are encouraged to look up this kind of editing in the Perl Cookbook.
Here's a more full-featured script with a few extras. How it works is left as an exercise for the student.
#!/usr/bin/perl
use File::Find;
@ARGV = ('.') unless @ARGV;
$dir = shift @ARGV;
find(\&edits, $dir);
sub edits {
    return unless -f; #skip directories
    $seen = 0;
    $file = $_;
    #Uncomment next line if you want multi-line edits
    #undef $/;
    local $^I=".backup";
    #Warning - heavy magic here
    local @ARGV = ($file);
    while(<>) {
        #Remember to use the s option if doing multiline edits
        $seen++ if s/Lesson/Chapter/;
        print;
    }
    print "Found in $File::Find::name\n" if $seen > 0;
    #Comment out if you want to keep the backup
    #unlink $file.".backup";
}
To decipher the above, look up the $^I variable or the equivalent command-line option, -i. Also, look up @ARGV, and explore the special meaning of "while(<>)" when used with the @ARGV array.
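As a starting point for that exploration, here is the in-place editing idiom on its own, applied to a single scratch file. It is a sketch under stated assumptions: `edit_in_place` is a hypothetical helper, and the file is created with the core File::Temp module so nothing of yours is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Edit one file in place, keeping a copy with a .backup suffix.
# Returns the number of lines changed. edit_in_place is a
# hypothetical helper; the Lesson/Chapter swap mirrors the text.
sub edit_in_place {
    my ($file) = @_;
    my $changed = 0;
    local $^I   = ".backup";    # turn on in-place editing, like -i.backup
    local @ARGV = ($file);      # <> reads the files named in @ARGV
    while (<>) {
        $changed++ if s/Lesson/Chapter/;
        print;                  # goes to the rewritten file, not the screen
    }
    return $changed;
}

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/page.html";
open my $fh, '>', $file or die "Can't write $file: $!";
print $fh "Lesson 1\nNothing here\nLesson 2\n";
close $fh;

my $n = edit_in_place($file);
print "Changed $n lines in $file\n";
```

While the loop runs, print writes to the replacement file; once the input is exhausted, output goes back to STDOUT, which is why the final message appears on the screen.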
Be careful, and make sure you've got a backup of your directory because you can mess up a lot of files quickly with this script. Don't run it twice in a row without checking that the content is preserved, or you can completely clobber your original work!
With care, recursive editing is a very handy addition to your bag of tricks.
Chapter 5. Introduction to CGI
Introduction to CGI > First Page to Browser > Printing HTML > Serving HTML Files > Serving Edited Files
Introduction to CGI
The rest of this introductory Perl course will describe how you can make Perl interact with a browser through the Common Gateway Interface, or CGI.
CGI is a standard interface that sits between the web server and the programs it runs on behalf of the browser, such as your Perl scripts. When the browser makes a request of the server, all of the request details flow into your script through the input side of CGI. When the script responds with its output, the information flows back out through the output side of CGI to the server, and from there to the browser.
When Perl responds to a browser request it sends output to STDOUT which is sent through CGI back to the browser. Because you know how to print data to STDOUT, you can already work with CGI at its most basic level: sending data to it. It is no more complicated than printing to STDOUT.
The real challenges are learning how to interact with the data that comes in, and how to create useful content. We will learn this later, but for now, we will start by printing some very simple text to a browser window.
If you have not yet installed your Personal Web Server and set it up to work with Perl according to the PWS installation instructions then you might as well stop to take the time to do it now. You won't be able to do any more of the course until you do. See you when you are ready.
Sending your first page to the browser
During the setup of Perl and the Personal Web Server, you were asked to verify your setup by sending a simple page to the browser. Let's look at it in detail again here:
print "Content-Type: text/html\n\n";
print "\n";
print "\n";
print "\n";
print "\n";
print "\n";
print "Hello World\n";
print "\n";
print "Your IP Address is $ENV{REMOTE_ADDR}.\n";
print "";
print "Have a nice day\n";
print "\n";
print "\n";
First of all, it should be clear that this Perl program does nothing but print HTML text to STDOUT. That's pretty much all there is to sending data out through CGI!
The output does have to follow certain rules, though. You can't just print anything you want and expect it to be interpreted by the browser correctly. The first thing that your program must do is to tell the browser what kind of information follows.
Since we are sending HTML output, we must inform the browser that the content is text in HTML format. That's what the first line is for. (It's a standard MIME header.) If you were just outputting plain text to your browser, then you could put "text/plain" instead of "text/html".
It is very important to send two newlines after the header, i.e., a single blank line, otherwise the browser will complain of bad header information.
After the content type header, there is just a complete section of HTML. The only odd bit in the script is the line:
print "Your IP Address is $ENV{REMOTE_ADDR}.\n";
As you have seen, this prints your numeric IP address in the browser window. But what is $ENV{REMOTE_ADDR}? The answer will become clear later. For now just accept that it is part of the data delivered to the input of CGI. How you get at data will come later.
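For reference, a complete page of this kind might look like the following. This is only a sketch: the html, head, title, body, h1, p and h3 tags are assumptions chosen for illustration, the `hello_page` helper is hypothetical, and only the header line and the visible text come from the script discussed above.

```perl
use strict;
use warnings;

# Build the whole response as one string. The tags used here are
# assumptions; only the header line and the text content come
# from the script discussed in this section.
sub hello_page {
    my ($addr) = @_;
    return "Content-Type: text/html\n\n"
         . "<html>\n<head>\n<title>Hello World</title>\n</head>\n<body>\n"
         . "<h1>Hello World</h1>\n"
         . "<p>Your IP Address is $addr.</p>\n"
         . "<h3>Have a nice day</h3>\n"
         . "</body>\n</html>\n";
}

# REMOTE_ADDR is only set when a web server runs the script, so fall
# back to a placeholder when it is run from the command line.
my $addr = defined $ENV{REMOTE_ADDR} ? $ENV{REMOTE_ADDR} : 'unknown';
print hello_page($addr);
```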
Let's look at some other examples of content and other techniques of creating content "on-the-fly".
Printing HTML
We've seen the print statement in action in earlier lessons. You can print anything you like in HTML with a sequence of print statements. This was the technique used in the previous script: specifying one print statement after another. As long as you begin by printing the correct Content-Type and a blank line, you will have no problem.
Let's look at a couple of alternatives to repeated print statements because TIMTOWTDI (there is more than one way to do it)! Let's quote the whole thing within a variable and just print once:
$html = "Content-Type: text/html

Hello World

Your IP Address is $ENV{REMOTE_ADDR}

Have a nice day
";
print $html;
This is easier because of two things. We don't have to keep track of quotes around many strings, and we can let the new lines within the block itself specify the \n characters.
But now that we're quoting, suppose we wanted to put the word nice into quotation marks? If we wrote:
Have a "nice" day

in the original script, we'd get a syntax error, because the two quote marks we put in would match with the opening and closing quotes already there. The solution is to escape the new ones with a backslash, like this:
Have a \"nice\" day

This takes away the syntax error since the quotes are interpreted as part of the string itself. Sometimes though, this escaping gets to be too much of a pain when there are lots of quotes, such as in the next example. In that case, you can use a different quoting delimiter, and use the explicit quoting operator, qq{}.
$html = qq{Content-Type: text/html

Hello World

Your IP Address is $ENV{REMOTE_ADDR}

Have a "nice" day

Name:};
print $html;
See how much nicer looking it is when you don't have to escape the quotes? Like s///, the quote operator qq provides you with a complete choice of delimiters, so you can just choose one that doesn't collide with your quoted text.
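A quick way to convince yourself that these quoting styles are interchangeable is to build the same string each way and compare. In this sketch the h3 tag and the sample address are placeholders for illustration.

```perl
use strict;
use warnings;

my $addr = '192.0.2.1';    # placeholder address for the demonstration

# Backslash-escaped quotes inside an ordinary double-quoted string.
my $escaped = "<h3>Have a \"nice\" day, $addr</h3>\n";

# The same string with qq and two different delimiter choices;
# the inner quotes need no escaping.
my $braces = qq{<h3>Have a "nice" day, $addr</h3>\n};
my $pipes  = qq|<h3>Have a "nice" day, $addr</h3>\n|;

print $braces;
print "All three forms match\n" if $escaped eq $braces and $braces eq $pipes;
```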
Just so you've heard of it, you can also use the "here document" form of printing. "Here documents" originate in Unix shell programming and they have been incorporated into Perl. With qq's ability to quote over newlines, I'm not sure what advantage they offer. In any case, here's an example:
print <<EOF;
Content-Type: text/html

Hello World

Your IP Address is $ENV{REMOTE_ADDR}

Have a "nice" day

Name:
EOF
The "here document" always follows the form:
print <<LABEL;
...lines of text...
LABEL
[...]
\n\n};
}
sub body() {
    $body = qq{
$title
};
    for $i ( 1 .. $rows ) {
        $body .= qq{ Row $i};
        $body .= $i*$i;
        $body .= qq{ \n};
    }
    $body .= qq{
};
    return $body;
}
These examples demonstrate Perl's ability to generate HTML on the fly. As powerful as it is, there is a disadvantage in this approach. The technique mixes your HTML coding tightly within your Perl programming. It is generally a good design idea to try to keep each area separate, because you can approach the project either as an HTML author, or as a Perl programmer. So can other people you bring in to work on your project.
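As an illustration of generating HTML on the fly with subroutines, here is a sketch that builds a table of numbers and their squares, using a here document for the header and qq{} for the body. The markup and the argument passing are assumptions made for this example; only the sub names, $title, $rows, the "Row $i" label and the $i*$i calculation are taken from the text.

```perl
use strict;
use warnings;

# Header and opening HTML, written as a here document.
# The markup is an assumption made for this sketch.
sub header {
    my ($title) = @_;
    return <<"EOF";
Content-Type: text/html

<html>
<head><title>$title</title></head>
EOF
}

# Body with a table of row numbers and their squares.
sub body {
    my ( $title, $rows ) = @_;
    my $body = qq{<body>\n<h1>$title</h1>\n<table border="1">\n};
    for my $i ( 1 .. $rows ) {
        $body .= qq{<tr><td>Row $i</td><td>};
        $body .= $i * $i;
        $body .= qq{</td></tr>\n};
    }
    $body .= qq{</table>\n</body>\n</html>\n};
    return $body;
}

print header("Squares"), body( "Squares", 3 );
```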
In the next section we'll look at serving HTML files that have been authored outside of Perl.
Serving HTML Files
Very often in CGI programming, you'll want to call up a particular HTML file and use it as the basis of your response. In its simplest form, this is accomplished with nothing more than sending a header, opening the file and printing it.
Try saving this web page as "serving.html" in your cgi-bin directory. You could actually save it anywhere on your hard drive, but putting it in your cgi-bin directory will allow you to leave out the path to the file in the example.
Next, run this perl script:
print "Content-Type: text/html\n\n";
open HTML, "serving.html" or die $!;
while( <HTML> ) {
    print;
}
close HTML;
Pretty simple, huh? You could display other files just by modifying the path. As an exercise, try serving other html files on your disk. Then try displaying a pure text file, with a header of text/html, and then with text/plain.
Now you've seen how to open and serve files, you're ready to see how to edit files on the fly and use them as templates.
Serving Edited Files
Now that you know how to fetch and serve an existing HTML file, you can use it as the basis for editing. Try saving this web page as "editing.html" in your cgi-bin directory and running this perl script:
print "Content-Type: text/html\n\n";
open HTML, "editing.html" or die $!;
while( <HTML> ) {
    s{}{};
    s{}{Substituted Header!}i;
    print;
}
close HTML;
You will see that the output page now has a new title, and different headers.
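A substitution pass of this kind might look like the sketch below. The patterns are assumptions: it is assumed that the title lives in a title element and the headers in h1 elements, the replacement title text is invented, and `edit_line` is a hypothetical helper. Only the "Substituted Header!" text comes from the script above.

```perl
use strict;
use warnings;

# Apply the two substitutions to one line of HTML. Matching on
# <title> and <h1> is an assumption, as is the replacement title text.
sub edit_line {
    my ($line) = @_;
    $line =~ s{<title>.*?</title>}{<title>New Title</title>}i;
    $line =~ s{<h1>.*?</h1>}{<h1>Substituted Header!</h1>}i;
    return $line;
}

# A stand-in for the lines of editing.html.
my @page = (
    "<html><head><title>Old Title</title></head>\n",
    "<body><H1>Old Header</H1>\n",
    "<p>Body text is left alone.</p></body></html>\n",
);
print "Content-Type: text/html\n\n";
print edit_line($_) for @page;
```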
You can extend this concept in many ways. Consider these few ideas and see if you can think of more:
· substitute other new words for old in the text as above
· put special words in the text, like _name_, especially for "form letter" substitution
· add hyperlinks to a document. See below
· highlight keywords. See below
· put HTML comments in places your Perl script should fill in. These remain invisible if not replaced.
· put paired start and end comments as invisible markers around whole sections of code that should be replaced.
As an example of adding hyperlinks and highlighting words on the fly to existing content, have a look at this next script. It's something that could be done on the fly, as you are about to see, or it could be done as an offline editing process.
This script might be a bit challenging so here are some words of explanation. Don't get thrown by the <DATA> filehandle. It's a filehandle that's opened for you by Perl if you use it. It's a nifty way of attaching data at the bottom of your program under the line __DATA__ (that's DATA with two underscores both before and after). We are using the data under __DATA__ to define the words we want to substitute along with the URL they should point at.
First, we read the "word, URL" pairs in from the end of the script through the DATA filehandle. We loop through them to build up the hash, %hyperlinks, of words and associated URL's.
As an example of printing diagnostics during development, we print the key and value pairs of the hash at the top of the browser window, right after the Content-Type is written. We could print much more comprehensive diagnostic info with more print statements throughout the program. We'd remove the diagnostics after we were happy with the way the script worked.
Once we have our hyperlink hash done, we just loop through each HTML line with the while(<HTML>) loop. Then for each HTML line, we loop through the words that need substitution. The substitution looks for the selected substitution word and replaces it with a hyperlink reference to the URL, with some highlighting.
We put spaces (\s) around the substitution pattern to only catch whole words, and not their occurrences inside other words.
Once you've run the script have a look at the source to see what happened to the substituted words.
print "Content-Type: text/html\n\n";
for ( <DATA> ) {
    #skip blank lines
    next if /^$/;
    ($key, $value) = split /\s*,\s*/;
    $hyperlink{$key} = $value;
    #Once you've printed the header,
    #you can print diagnostics to the browser window!
    print "$key, $value<br>\n";
}
open HTML, "editing.html" or die $!;
while( <HTML> ) {
    for $key ( keys %hyperlink ) {
        s(\s$key\s)( $key );
    }
    print;
}
close HTML;
__DATA__
special, http://www.m-w.com
code, javascript:alert('Code!')
example, course/index.html
This is just the beginning of substituting on the fly. It is a very powerful and useful technique. Sometimes the same kind of editing should be done offline instead of online. Let me know what else you can think of.
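The word-to-hyperlink idea can be sketched with two small functions. Several things here are assumptions rather than the course's code: the `parse_links` and `add_links` names, the a and b markup used for the link and the highlighting, the use of \b word boundaries instead of \s so that words next to punctuation also match, and a single combined pattern so that text inside an already inserted link is never rewritten. The "word, URL" lines are passed in as a list; in a script like the one above they would come from the DATA filehandle.

```perl
use strict;
use warnings;

# Parse "word, URL" lines into a hash, skipping blank lines.
sub parse_links {
    my %links;
    for my $line (@_) {
        chomp $line;
        next if $line =~ /^\s*$/;
        my ( $word, $url ) = split /\s*,\s*/, $line, 2;
        $links{$word} = $url;
    }
    return %links;
}

# Wrap each whole-word occurrence in a bold hyperlink, in one pass.
# The <a> and <b> markup is an assumption.
sub add_links {
    my ( $text, %links ) = @_;
    return $text unless %links;
    my $words = join '|', map { quotemeta } sort keys %links;
    $text =~ s{\b($words)\b}{<a href="$links{$1}"><b>$1</b></a>}g;
    return $text;
}

my %links = parse_links(
    "special, http://www.m-w.com\n",
    "\n",
    "example, course/index.html\n",
);
print add_links( "A special example, for example.\n", %links );
```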
The next chapter will deal with responding to input from the user.
____
Chapter 6. The CGI Module
Introduction to GET and POST > The CGI Module, CGI.pm > Handling Forms > Handling Form Errors > Handling Fatal Errors
Introduction to GET and POST
I assume that readers are familiar with forms in HTML and would know how to build a form with either the GET or POST methods of form submission.
When a browser makes a simple URL request of a server, such as www.servername.com/directory/pagename.html, it sends this URL to the server and gets back a response. When information is provided to the server in this way, it is using the GET method of retrieval. It is the default method for all web requests.
With the GET method, all of the information required to retrieve the URL is in the address line. Sometimes you'll see additional fields trailing the URL, like www.servername.com/directory/pagename.html?term=first, etc. This is still a GET operation, with additional parameters provided in the URL. We will learn how to access these additional parameters inside our Perl scripts later in this chapter.
In contrast, the POST method is used when information is to be uploaded to the server. Because the POST method is intended for uploading much larger quantities of data than the GET method, none of the additional data appears in the URL. Instead it is passed in to the server through a different route, but it is still accessible by your Perl programs.
As examples of how the information is passed in and how it differs between GET and POST, consider the following script, env1.pl. This is a handy little script to have around because it reveals so many interesting details. In addition, it reveals the main structure of CGI input data.
Save the following script in your cgi-bin directory. A good name would be "environment.pl". Then run it in your browser.
use CGI;
$cgi = new CGI;
print qq{Content-type: text/html

};
#print every key in the environment
foreach $key (sort (keys %ENV)) {
    print $key, ' = ', $ENV{$key}, "<br>\n";
}
#print a couple of simple forms: a POST form and a GET form
print qq{
};
print qq{
};
print qq{};
Once you have run the script, you'll see all kinds of details: your IP address, the server's IP address, your browser version, referrer URL, details of the server's environment, etc. Did you know that all this stuff is available to every webserver you ever browse to?
When you are ready, try putting values into either the POST or GET fields. If you look carefully, you'll notice a couple of things change... "CONTENT_LENGTH", "REQUEST_METHOD" and "QUERY_STRING", and the URL itself.
During a GET method, the parameter appears in the URL of the response. QUERY_STRING shows the parameters, and CONTENT_LENGTH is 0.
During a POST method, the parameter does not appear in the URL of the response. QUERY_STRING is missing, and CONTENT_LENGTH is non-zero.
In both cases, the requested parameter has been supplied to the script. It is just accessed in a different way.
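To make the difference concrete, here is a hand-rolled sketch of reading parameters from either route using core Perl only. It is a simplification for illustration, not a replacement for a proper CGI library: the helper names are hypothetical, repeated parameter names keep only the last value, and file uploads are not handled.

```perl
use strict;
use warnings;

# Decode one URL-encoded value: "+" means space, %XX is a hex byte.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Split "a=1&b=two+words" into a hash. Repeated names keep the last
# value in this simplified sketch.
sub parse_pairs {
    my ($raw) = @_;
    my %param;
    for my $pair ( split /[&;]/, $raw ) {
        my ( $name, $value ) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $param{ url_decode($name) } = url_decode( defined $value ? $value : '' );
    }
    return %param;
}

# GET data arrives in QUERY_STRING; POST data arrives on STDIN,
# with its length in CONTENT_LENGTH.
sub read_request {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $raw    = '';
    if ( $method eq 'POST' ) {
        read( STDIN, $raw, $ENV{CONTENT_LENGTH} || 0 );
    }
    else {
        $raw = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    return parse_pairs($raw);
}

my %input = parse_pairs("term=first&note=two+words%21");
print "$_ = $input{$_}\n" for sort keys %input;
```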
Fortunately, all the grunt work of getting the values out of the CGI input parameters is done for you by a very useful Perl module called CGI.pm which is already in this simple script. Let's learn more about using CGI.pm in the next page.
The CGI Module, CGI.pm
The CGI module, introduced quietly in the last example, is a superb module for dealing with the CGI interface.
CGI.pm does everything you need to read user input, create forms, handle cookies, handle redirection, and more. It is a very useful module indeed and it is a standard module supplied with your Perl installation. (From Perl 5.22 onward it is no longer bundled with Perl and is installed separately from CPAN.)
Let's examine another script just a bit more advanced than the previous one. This next script does everything in the previous script, but it also prints out any input parameters supplied to the script through either GET or POST.
Try saving this script as getpost.pl and running it in your browser window. Or you can run the same script already loaded on my website.
use CGI;
$cgi = new CGI;
for $key ( $cgi->param() ) {
    $input{$key} = $cgi->param($key);
}
print qq{Content-type: text/html

};
foreach $key (sort (keys %ENV)) {
    print $key, ' = ', $ENV{$key}, "<br>\n";
}
for $key ( keys %input ) {
    print $key, ' = ', $input{$key}, "<br>\n";
}
print qq{
};
print qq{
};
print qq{};
The first line "use CGI;" calls the CGI.pm module into your perl script so you can use its features. The very first thing you need to do is to create your brand new shiny $cgi variable, which you do with $cgi = new CGI;. (For those of you who want to know, you have just entered into an object-oriented aspect of Perl, but do not be afraid. It doesn't get particularly complicated.)
As soon as you have defined this new $cgi variable, all kinds of things are available to you. For instance the first thing you see is
for $key ( $cgi->param() ) { $input{$key} = $cgi->param($key);}
Called with no arguments, $cgi->param() returns the list of all the parameter names that were submitted with the form, and $cgi->param($key) returns the value for one name. I like to copy these (name, value) pairs into the hash "%input" even though it's a duplication of memory, because %input turns out to be easier to work with. If you want the details, email me.
Next we print out the regular Content-type header and get into the html body. Then we work through all the keys in the environment hash, ENV, and print all its (key, value) pairs to the browser.
Lastly, we print out the %input hash. You'll see any values you submit appear here, already neatly parsed into (name,value) pairs. And nice to know, it doesn't matter whether that data came in as the result of a POST or a GET method. Either way, the data are there and accessible in the same way.
Now you've seen how to get at the input fields, let's see a more realistic example in practice.
Handling Form Input
To see how the CGI module helps you handle form input, let's look at another simple example.
Here's a form. Have a look at it, and look at its source in the source viewing window if you want to. It's just a plain HTML file with a form in it. The form's method is set to POST. Try pushing the "Add Me!" and "Don't Add Me" buttons to see what happens.
Here's the script that gets called as the action of the form.
It creates a web page on the fly for you.
We can even serve up a page, with your name put into it. Try this form instead, and you'll see your details come right back into this page. Here's the script to do this.
Pretty straightforward now that you know how to edit files on the fly. Once you get the data from the user, you can do anything you like with it; you can use it to customize content, or stash it in a database somewhere.
The next topic will discuss how to protect yourself against form errors so you can make the user fill in fields when they are not filled in properly.
Handling Form Errors
Sometimes when a user is filling in a form, he or she will leave out a crucial field or provide erroneous information. It's a fact of life that it's up to you to deal with it.
The best first line of defense against blank inputs is Javascript because the server doesn't even have to get involved in the correction. But sometimes, like when a user selects a username that's already taken, the only way to know there's an error is to check the data on the server.
Once you find an error, you have to prompt the user to enter the data again. You could just put up the original form and ask them to fill in the whole thing, but you'd be very unpopular with that approach. It's much better to fill in the form with the information they already provided, and just prompt them to correct the parts that are wrong.
Substitution of provided form fields is all that's needed for this. Let's say you put up the same form as we've seen before, but with a twist: you'll be expected to fill in your name if you didn't provide it.
Click here to bring up this new form. Try it out without putting in a name. Here's the script to handle it.
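The refill-and-prompt approach can be sketched as follows. Everything specific here is an assumption made for illustration: the name and email fields, the _name_, _email_ and _error_ placeholders, the form.pl action and the helper names. The sketch checks that a name was supplied and puts the submitted values back into the form, escaping them so that user input cannot break the HTML.

```perl
use strict;
use warnings;

# Escape characters that are special in HTML so user input can be
# placed safely inside an attribute value.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Fill _name_, _email_ and _error_ placeholders in a form template.
# The field names and placeholder spelling are assumptions.
sub fill_form {
    my ( $template, %input ) = @_;
    my $error = '';
    $error = 'Please enter your name.'
        unless defined $input{name} && $input{name} =~ /\S/;
    my %fill = (
        name  => html_escape( defined $input{name}  ? $input{name}  : '' ),
        email => html_escape( defined $input{email} ? $input{email} : '' ),
        error => $error,
    );
    $template =~ s/_(name|email|error)_/$fill{$1}/g;
    return $template;
}

my $template = <<'EOF';
<p>_error_</p>
<form method="POST" action="form.pl">
Name: <input name="name" value="_name_">
Email: <input name="email" value="_email_">
<input type="submit" value="Add Me!">
</form>
EOF

print fill_form( $template, name => '', email => 'me@example.com' );
```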
Now that you've seen how to handle the user's errors, you should learn how to handle your own!
Handling Fatal Errors in the Browser
Quite often when a script doesn't work, you get a completely blank page in response. Even if you have included some useful "die" messages, you might still be looking at a blank page of output, with no clue provided as to what went wrong. The error message from your die statement was still generated, but it went to the server log instead of to the browser window.
If you are hosting on a virtual server somewhere, chances are good that you don't have access to this server log. This is unfortunate because the error messages that are stashed there can be of real help when you are trying to debug your script.
Fortunately, the CGI::Carp module that comes with CGI.pm supports a method of redirecting error messages to the browser window as well as to the server log. This can be a real lifesaver if you don't have access to the error log and a nice timesaver even if you do.
To do this, you just have to invoke this behaviour when you use CGI. Consider the script here.
Now try running the script here. See how the die message came out? An extra Content-Type line is printed, just in case none has been printed before. That extra line calling Carp is all it takes.
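With the CGI distribution installed, the extra line is usually written as `use CGI::Carp qw(fatalsToBrowser);`. To show the idea behind it without depending on that module, the sketch below uses a hand-rolled `$SIG{__DIE__}` handler in core Perl. It is a stand-in for illustration, not how CGI::Carp itself is implemented, and `error_page` is a hypothetical helper.

```perl
use strict;
use warnings;

# With the CGI distribution installed, the usual one-line form is:
#   use CGI::Carp qw(fatalsToBrowser);
# The handler below is a hand-rolled stand-in that shows the idea
# using core Perl only.

# Turn a die message into a small HTML error page.
sub error_page {
    my ($message) = @_;
    $message =~ s/&/&amp;/g;
    $message =~ s/</&lt;/g;
    $message =~ s/>/&gt;/g;
    return "Content-Type: text/html\n\n"
         . "<h1>Software error</h1>\n<pre>$message</pre>\n";
}

# Send fatal errors to the browser as well as to the server log.
$SIG{__DIE__} = sub {
    my ($message) = @_;
    return if $^S;    # inside an eval: let the eval handle it
    print error_page($message);
    # die continues after this handler returns, so the message
    # still reaches STDERR and the server log.
};

# A failing operation, trapped with eval so the demonstration
# itself keeps running.
my $ok = eval {
    open my $fh, '<', '/no/such/file.txt' or die "Can't open file: $!\n";
    1;
};
print error_page($@) unless $ok;
```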
As a final example, you can see how to customize the error message that gets displayed here.
Here's the script for it.
That concludes the introduction to the CGI module. It is a large module with many, many features. You are encouraged to look into the documentation if you want to get the most you can out of it.
Chapter 7. CGI In Use
CGI In Use > Site search > Presenting data from a database > Handling cookies > Uploading files to the server > Requesting info from other servers
CGI in Use
We have learned so much by now that you are ready to see how to put CGI programming to use in real world applications. We will spend the rest of the course working with several examples that feature useful applications for CGI programs:
Site search
Presenting data from a database
Handling cookies
Uploading files to the server
Requesting info from other servers
So on to the next page to get started!
Performing a Site Search
Performing a site search is pretty straightforward because finding which pages hold a certain term is easy. We use File::Find to look through all the files, and we just do a match on the searched term to see if it's in the page.
The example below is complicated a little because it's set up to run on a Windows or a Linux server, and some details are different between the two.
There are some new constructs for you to encounter, such as using environment variables and the "||=" operator, which sets a variable to a value only if it does not already have a (true) value.
First you need a form to ask for the search term. There's one in this form. The form's action is "search.pl" and the field is called "search_term".
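The core of such a search can be sketched as follows. This is not the course's search.pl, just a minimal illustration using File::Find, a pattern match and "||=". The sub name search_site, the choice to look only at .html files, and the fallback to the current directory are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return the files under $root whose contents contain $term.
sub search_site {
    my ($root, $term) = @_;
    my @hits;
    find(
        sub {
            # File::Find puts the current file name in $_
            return unless -f $_ && /\.html?$/i;    # only look at HTML pages
            open my $fh, '<', $_ or return;         # skip unreadable files
            local $/;                               # read the whole file at once
            my $text = <$fh>;
            close $fh;
            # \Q...\E makes the visitor's term match literally
            push @hits, $File::Find::name if $text =~ /\Q$term\E/i;
        },
        $root
    );
    @hits = sort @hits;
    return @hits;
}

# Use the web server's document root if it is set. Otherwise fall back
# to the current directory: ||= only assigns when $root is not yet true.
my $root = $ENV{DOCUMENT_ROOT};
$root ||= '.';

# Try it from the command line: perl search.pl some_term
if (@ARGV) {
    print "$_\n" for search_site($root, $ARGV[0]);
}
```

In a CGI script the term would come from the "search_term" form field instead of the command line.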
Presenting Data from a Database
Here's an example where a visitor fills in a form and, when the form is submitted, the server stores the data in a database. The next time a visitor comes to the same page, the data that was submitted by the previous visitor is on display.
Click here to visit a form that works just like this.
Click here to see the script that handles the form.
Look through the code and read the comments for detailed explanations. In summary, the form input is parsed and saved onto the end of a "comma separated value" data file. Then the file contents are read back, turned into html table rows, and substituted into a placeholder in the original form.
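The summary above can be sketched in a few small subs. This is an illustration under stated assumptions, not the course's script: the sub names and the "<!--ROWS-->" placeholder are hypothetical, and the file format is the simplest possible one (no quoting, so commas inside a field are replaced by spaces).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append one record to the end of the data file as a comma separated line.
sub append_record {
    my ($file, @fields) = @_;
    # Commas and newlines inside a field would break this simple format,
    # so replace them with a space before saving.
    s/[,\r\n]+/ /g for @fields;
    open my $out, '>>', $file or die "Error opening $file: $!";
    print $out join(',', @fields), "\n";
    close $out;
}

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Read the file back and turn each line into an HTML table row.
sub table_rows {
    my ($file) = @_;
    my $rows = '';
    open my $in, '<', $file or return $rows;    # no data saved yet
    while ( my $line = <$in> ) {
        chomp $line;
        my @cells = map { escape_html($_) } split /,/, $line;
        $rows .= '<tr><td>' . join('</td><td>', @cells) . "</td></tr>\n";
    }
    close $in;
    return $rows;
}

# Substitute the rows into the placeholder in the form's HTML.
# "<!--ROWS-->" is a made-up placeholder name.
sub fill_template {
    my ($html, $rows) = @_;
    $html =~ s/<!--ROWS-->/$rows/;
    return $html;
}
```

On a busy site, two visitors could submit at the same moment, so a real script would probably also lock the data file with flock while writing.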
Handling Cookies
The CGI module makes setting and retrieving cookies very easy. There are special "cookie" functions that allow you to build a cookie header, and retrieve cookie values.
Basically, there are two functions; one for reading cookies
# Read the cookie variable
$ID = $cgi->cookie('ID');
and one for writing cookies
$ID_cookie = $cgi->cookie(
    -name    => 'ID',
    -value   => $name,
    -path    => '/',
    -expires => '+1M',
);
The function to create a cookie returns a string, in this case saved in $ID_cookie, which is then sent out as part of the header to set the cookie at the browser:
print $cgi->header( -cookie => $ID_cookie );
To see this in action, bring up a small cookie handling form here, and try out the Set, Check and Clear buttons.
Next time you send any form in to the server, the cookie goes along for the ride. This cookie is available to any script on the server from then on.
Click here to look at the perl script that handles these cookies.
(Just for your information, you can also check the value of a cookie with this little bit of JavaScript that runs in your local browser. This check does not involve the server, which makes it quite useful for debugging.)
Note that you can use cookies to track your visitors, at least when they visit pages served by scripts. Just check their cookies and append the details of their visit to a log file.
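As a rough sketch of that idea, the sub below appends one line per visit to a log file. The sub name log_visit, the tab separated layout and the 'unknown' fallback are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Append one line per visit: time, visitor ID from the cookie, page.
sub log_visit {
    my ($logfile, $id, $page) = @_;

    # Visitors without a cookie are logged as 'unknown'.
    $id = 'unknown' unless defined $id && length $id;

    # The cookie comes from the browser, so don't trust it:
    # keep tabs and newlines out of the log line.
    $id =~ s/[\t\r\n]+/ /g;

    open my $log, '>>', $logfile or die "Error opening $logfile: $!";
    print $log join( "\t", scalar( localtime() ), $id, $page ), "\n";
    close $log;
}
```

In a CGI script, the ID would come from $cgi->cookie('ID') and the page could be taken from $ENV{SCRIPT_NAME}.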
Uploading Files to the Server
Usually when web developers think of uploading files to the server, they're thinking in terms of FTP to get them there. But sometimes it's useful to allow your visitors to upload a picture or a document to the server.
Have you ever wondered how sites that allow you to upload files to the server through the browser window actually work? You are about to find out how.
The most important thing is that the HTML form's encoding be set to enctype="multipart/form-data". This advises the server that a file upload is coming in through the CGI interface. An input of type "file" is sufficient to put the Browse button and text box into the form.
When you hit the Upload button, the script at the other end just has to decide where to put the uploaded file, and open a file descriptor to write it there. Then it reads the data in from a specially provided filehandle (courtesy of the CGI module) and it writes it out to the file.
At the end, the directory is opened and the files in the directory are listed. This updated list of files is written back to the upload form so you can see what you just uploaded.
Here's the uploading demo form.
Here's the script that handles the upload.
#!/usr/bin/perl
#upload.pl

use strict;

use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);

my %input;

my $upload_dir = "temp";
my $max_size = 30_000;

my $cgi = new CGI;

print $cgi->header();

for my $key ( $cgi->param() ) {
$input{$key} = $cgi->param($key);
}

if ( $input{upload_demo} =~ /\.(exe|asp|php|jsp|cgi|pl|aspx|config|asax|asa)$/ ) {

die "Invalid file extension. No executable file types permitted";

}

if ( length($input{upload_demo}) > 0 ) {

#We are uploading a file with a name other than ""
#get rid of the leading directories

( my $file_name = $input{upload_demo} ) =~ s/.*\\//;
my $upload_path = "$upload_dir/$file_name";

# open output file
open OUT, ">$upload_path" or die "Error opening $upload_path: $!";
binmode OUT;

my $buffer = '';
my $size = 0;

#In file handle context, the upload_demo value is a file handle
while (my $chars_read = read $input{upload_demo}, $buffer, 4096) {
print OUT $buffer;
$size += $chars_read;

#if size is getting bigger than you want to handle, quit!
if ( $size > $max_size ) {
last;
}
}
close OUT;

if ( -z $upload_path or $size > $max_size ) {
unlink $upload_path;
}
}

#build list of inline file choices
opendir(DIR, $upload_dir) or die "Error opening $upload_dir: $!";

my @files = readdir(DIR);
closedir DIR;

my $file_lines = "

".join ( "\n
", @files) ."\n";

open (HTML, "course/cgi_use/upload_form.html" )
or die "Error opening upload_form.html: $!";

#make substitutions

while ( <HTML> ) {
s//$file_lines/;
print;
}

close HTML;
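The script above takes the file name as the browser sent it and strips only backslash-separated directories. A more defensive version of that step might look like the sketch below. The sub names and the allowed character set are assumptions. The blocked extensions are the same ones the script checks, but matched here regardless of letter case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reduce a browser-supplied file name to a safe base name.
# Returns undef if nothing usable is left.
sub safe_file_name {
    my ($name) = @_;
    return unless defined $name;
    $name =~ s{.*[\\/]}{}s;             # drop leading directories, either slash
    $name =~ s/[^A-Za-z0-9._-]/_/g;     # keep a conservative character set
    $name =~ s/^\.+//;                  # no hidden files, no "." or ".."
    return length($name) ? $name : undef;
}

# True if the name ends in one of the blocked extensions (any letter case).
sub is_blocked {
    my ($name) = @_;
    return $name =~ /\.(exe|asp|php|jsp|cgi|pl|aspx|config|asax|asa)$/i ? 1 : 0;
}
```

With these, the upload script could refuse the request when safe_file_name returns undef or is_blocked returns true, before any file is opened for writing.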

Requesting Info from Other Servers
Usually you visit a website with a browser and view the retrieved information in the browser window. But with Perl, you can send off the same request that your browser would issue and get the results back into your Perl program instead.
Once it's in your Perl program, you can do whatever you need to with it. The appreciation of the value of this ability is left as an exercise for the reader.
The magic is provided by a popular Perl module, LWP.pm. Just "use" it at the beginning of your perl program to enable it. In summary, you create a new request according to some simple rules, you send it off to the remote server, you wait to get the results back, and you put the results to whatever use you choose.
Here's an example of a request that fetches a website of your choice and reveals all tables present in it. It's a great HTML design aid!
Click here for the request form.
Click here to see the script that handles the request. This script uses the "strict" pragma. It forces you to use Perl in a safe way and is something you should strive to use in your development as a Perl programmer.
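Here is a minimal sketch of such a request, assuming the LWP distribution is installed. It uses LWP::UserAgent, which is part of LWP. The sub names are made up, and instead of redrawing the page it simply lists the opening table tags it finds, which is simpler than what the course's script does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fetch a page and return its HTML, or die with the HTTP status line.
sub fetch_page {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded only when a fetch is actually made
    my $ua       = LWP::UserAgent->new( timeout => 10 );
    my $response = $ua->get($url);
    die "Request failed: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Return the opening <table ...> tags found in a chunk of HTML.
sub find_tables {
    my ($html) = @_;
    my @tags = $html =~ /(<table\b[^>]*>)/gi;
    return @tags;
}

# Try it from the command line: perl tables.pl http://www.example.com/
if (@ARGV) {
    my $html = fetch_page( $ARGV[0] );
    print "$_\n" for find_tables($html);
}
```

Once the HTML is in a Perl variable like this, the rest is ordinary string handling.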
Installing Perl and Personal Web Server on Windows
Before you start trying to upload your best Perl-based CGI scripts to a remote web host, wouldn't you like to be able to run your scripts locally and see the output in your own browser? It's a lot quicker, easier, and even potentially less embarrassing to develop on your own machine first!
Best of all, you can do it all for free with readily available software. It just takes a bit of know-how, and that's what I'm going to show you in this section.
Basically there are three steps involved in this installation, and fortunately each one is quite easy. And, like I said earlier, the software is available for free. These instructions work well for Windows 95, 98, and Windows Me. They also work for NT and 2000, but things are not quite so easy, because you'll need administrative privileges to do the installation.
The three steps are these:
1. Install Perl from http://www.activestate.com/
2. Install Personal Web Server from your Windows CD
3. Link Perl and PWS together to run CGI Scripts
Please visit each step in turn. If all goes well, by the end you'll be looking at a simple perl script output in your browser!
Note as of Aug 2005: A lot of people have difficulty with configuring Microsoft's PWS or IIS. Most of the questions I get regarding installation are because of this. Let's face it - Microsoft's documentation on configuring these products is not good.
Nowadays, when someone writes to me for help with their PWS or IIS installation settings, I suggest they save themselves a lot of trouble and use a different webserver instead. It is free, and is well documented. You can find it at http://www.aprelium.com/. Look for the Abyss Web Server X1. It works very well and is much easier to set up than PWS or IIS.