PHP Regular Expression Basic Tutorial
Regular Expressions Tutorial

Basic Syntax of Regular Expressions (as from PHPBuilder.com)

First of all, let's take a look at two special symbols: '^' and '$'. What they do is indicate the start and the end of a string, respectively:

    "^The": matches any string that starts with "The";
    "of despair$": matches a string that ends in the substring "of despair";
    "^abc$": a string that starts and ends with "abc", which could only be "abc" itself;
    "notice": a string that has the text "notice" in it.

You can see that if you don't use either of the two characters we mentioned, you're saying that the pattern may occur anywhere inside the string: you're not "hooking" it to any of the edges.

There are also the symbols '*', '+', and '?', which denote the number of times a character or a sequence of characters may occur. What they mean is "zero or more", "one or more", and "zero or one", respectively:

    "ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.);
    "ab+": same, but there's at least one b ("ab", "abbb", etc.);
    "ab?": there might be a b or not;
    "a?b+$": a possible a followed by one or more b's ending a string.

You can also use bounds, which come inside braces and indicate ranges in the number of occurrences:

    "ab{2}": matches a string that has an a followed by exactly two b's ("abb");
    "ab{2,}": there are at least two b's ("abb", "abbbb", etc.);
    "ab{3,5}": from three to five b's ("abbb", "abbbb", or "abbbbb").

Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Also, as you might have noticed, the symbols '*', '+', and '?' have the same effect as the bounds "{0,}", "{1,}", and "{0,1}", respectively.

Now, to quantify a sequence of characters, put them inside parentheses:

    "a(bc)*": matches a string that has an a followed by zero or more copies of the sequence "bc";
    "a(bc){1,5}": one through five copies of "bc".

There's also the '|' symbol, which works as an OR operator:

    "hi|hello": matches a string that has either "hi" or "hello" in it;
    "(b|cd)ef": a string that has either "bef" or "cdef";
    "(a|b)*c": a string that has any sequence of a's and b's (possibly empty) followed by a c.

A period ('.') stands for any single character:

    "a.[0-9]": matches a string that has an a followed by one character and a digit;
    "^.{3}$": a string with exactly 3 characters.
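The anchors, quantifiers, bounds, grouping, and alternation described above can be tried out with a short script. This is only a sketch under a few assumptions: it uses preg_match() rather than the older ereg(), it wraps each pattern in '/' delimiters (none of the sample patterns contains a slash), it needs PHP 7.1 or later, and the helper name regex_matches() is made up for this illustration.

```php
<?php
// Hypothetical helper (not part of the original tutorial): returns true
// when $pattern matches somewhere inside $subject.
function regex_matches(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

// Each row: pattern, subject to test it against.
$checks = [
    ['^The',        'The end'],         // anchored at the start
    ['of despair$', 'pit of despair'],  // anchored at the end
    ['^abc$',       'abcabc'],          // both anchors: only "abc" itself fits
    ['ab*',         'a'],               // zero b's is allowed
    ['ab+',         'a'],               // at least one b is required
    ['ab{3,5}',     'abb'],             // needs three to five b's
    ['a(bc){1,5}',  'abcbc'],           // quantified group
    ['(b|cd)ef',    'cdef'],            // alternation inside a group
    ['^.{3}$',      'abcd'],            // exactly three characters
];

foreach ($checks as [$pattern, $subject]) {
    $verdict = regex_matches($pattern, $subject) ? 'match' : 'no match';
    printf("%-14s %-16s %s\n", $pattern, $subject, $verdict);
}
```

Changing a subject string and rerunning the script is a quick way to check how each symbol behaves before using a pattern in real code.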
Bracket expressions specify which characters are allowed in a single position of a string:

    "[ab]": matches a string that has either an a or a b (that's the same as "a|b");
    "[a-d]": a string that has lowercase letters 'a' through 'd' (that's equal to "a|b|c|d" and even "[abcd]");
    "^[a-zA-Z]": a string that starts with a letter;
    "[0-9]%": a string that has a single digit before a percent sign;
    ",[a-zA-Z0-9]$": a string that ends in a comma followed by an alphanumeric character.

You can also list which characters you DON'T want: just use a '^' as the first symbol in a bracket expression (for example, "%[^a-zA-Z]%" matches a string with a character that is not a letter between two percent signs).

In order to be taken literally, the characters "^.[$()|*+?{\" must be escaped with a backslash ('\'), as they have special meaning. On top of that, you must escape the backslash character itself in PHP3 strings, so, for instance, the regular expression "(\$|¥)[0-9]+" would have the function call:

    ereg("(\\$|¥)[0-9]+", $str)

(what string does that validate?) Note that ereg() belongs to the POSIX regex extension, which was removed in PHP 7; the preg_ functions described below replace it.

Example 1. Examples of valid patterns

* /<\/\w+>/
* |(\d{3})-\d+|Sm
* /^(?i)php[34]/
* {^\s+(\s+)?$}

Example 2. Examples of invalid patterns

* /href='(.*)' - missing ending delimiter
* /\w+\s*\w+/J - unknown modifier 'J'
* 1-\d3-\d3-\d4| - missing starting delimiter

Some useful PHP Keywords and their use (php.net man pages)

preg_split (PHP 3 >= 3.0.9, PHP 4)

preg_split: Split string by a regular expression

    array preg_split (string pattern, string subject [, int limit [, int flags]])

Returns an array containing substrings of subject split along boundaries matched by pattern. If limit is specified, then only substrings up to limit are returned, and if limit is -1, it actually means "no limit", which is useful for specifying the flags. flags can be any combination of the following flags (combined with the bitwise | operator):

PREG_SPLIT_NO_EMPTY
    If this flag is set, only non-empty pieces will be returned by preg_split().
PREG_SPLIT_DELIM_CAPTURE
    If this flag is set, parenthesized expressions in the delimiter pattern will be captured and returned as well.

PREG_SPLIT_OFFSET_CAPTURE
    If this flag is set, for every occurring match the appendant string offset will also be returned. Note that this changes the return value into an array where every element is an array consisting of the matched string at offset 0 and its string offset into subject at offset 1. This flag is available since PHP 4.3.0.

Example 1. preg_split() example: Get the parts of a search string

    <?php
    // split the phrase by any number of commas or space characters,
    // which include " ", \r, \t, \n and \f
    $keywords = preg_split("/[\s,]+/", "hypertext language, programming");
    ?>

Example 2. Splitting a string into component characters

    <?php
    $str = 'string';
    $chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
    print_r($chars);
    ?>

Example 3. Splitting a string into matches and their offsets

    <?php
    $str = 'hypertext language programming';
    $chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
    print_r($chars);
    ?>

will yield:

    Array
    (
        [0] => Array
            (
                [0] => hypertext
                [1] => 0
            )
        [1] => Array
            (
                [0] => language
                [1] => 10
            )
        [2] => Array
            (
                [0] => programming
                [1] => 19
            )
    )

Note: Parameter flags was added in PHP 4 Beta 3.

Perl-style patterns are wrapped in delimiters, conventionally forward slashes:

    /colou?r/

Usually you'll want to stick with the default, but if you need to use the forward slash a lot in the actual pattern (especially if you're dealing with pathnames) you might want to use something else:

    !/root/home/random!

To make a match case-insensitive, all you need to do is append the option i to the pattern:

    /colou?r/i

Perl-style functions support these extra metacharacters (this is not a full list):

    \b  A word boundary, the spot between word (\w) and non-word (\W) characters.
    \B  A non-word boundary.
    \d  A single digit character.
    \D  A single non-digit character.
    \n  The newline character. (ASCII 10)
    \r  The carriage return character. (ASCII 13)
    \s  A single whitespace character.
    \S  A single non-whitespace character.
    \t  The tab character. (ASCII 9)
    \w  A single word character: alphanumeric and underscore.
    \W  A single non-word character.

Example: /\bhomer\b/

    Have a donut, Homer                 no match (the pattern is case-sensitive)
    A tale of homeric proportions!      no match (no word boundary after "homer")
    Do you think he can hit a homer?    match

Corresponding to ereg() is preg_match(). Syntax:

    preg_match(pattern (string), target (string), optional_array);

Example:

    $pattern = "/\b(do(ugh)?nut)\b.*\b(Homer|Fred)\b/i";
    $target = "Have a donut, Homer.";
    if (preg_match($pattern, $target, $matches)) {
        // $matches[0] is the whole match; $matches[1..3] are the parenthesized groups
        print("<P>Match: $matches[0]</P>");
        print("<P>Pastry: $matches[1]</P>");
        print("<P>Variant: $matches[2]</P>");
        print("<P>Name: $matches[3]</P>");
    } else {
        print("No match.");
    }

Results:

    Match: donut, Homer
    Pastry: donut
    Variant: [blank because there was no "ugh"]
    Name: Homer

If you use the $target "Doughnut, Frederick?" there will be no match, since there has to be a word boundary after Fred. But "Doughnut, fred?" will match, since we've specified the pattern to be case-insensitive.

Contributed code which is applicable (and very useful!)

mkr at binarywerks dot dk

A (AFAIK) correct implementation of IPv4 validation. This one supports optional ranges (CIDR notation), and it validates numbers from 0-255 only in the address part and 1-32 only after the /.

    <?php
    function valid_ipv4($ip_addr)
    {
        // one octet, 0-255
        $num = "([0-9]|1?\d\d|2[0-4]\d|25[0-5])";
        // CIDR prefix length, 1-32
        $range = "([1-9]|1\d|2\d|3[0-2])";

        if (preg_match("/^$num\.$num\.$num\.$num(\/$range)?$/", $ip_addr)) {
            return 1;
        }
        return 0;
    }

    $ip_array[] = "127.0.0.1";
    $ip_array[] = "127.0.0.256";
    $ip_array[] = "127.0.0.1/36";
    $ip_array[] = "127.0.0.1/1";

    foreach ($ip_array as $ip_addr) {
        if (valid_ipv4($ip_addr)) {
            echo "$ip_addr is valid<BR>\n";
        } else {
            echo "$ip_addr is NOT valid<BR>\n";
        }
    }
    ?>

plenque at hotmail dot com

I wrote a function that checks if a given regular expression is valid. I think some of you might find it useful. It changes the error handler and then restores it; I didn't find any other way to do it.
    // Returns true if $sREGEXP compiles as a regular expression.
    function IsRegExp($sREGEXP)
    {
        set_error_handler("TrapError");
        preg_match($sREGEXP, "");
        restore_error_handler(); // takes no arguments
        return !TrapError();
    }

    // Called with arguments (as the error handler): counts the error.
    // Called without arguments: returns the count and resets it.
    function TrapError()
    {
        static $iERRORES = 0;
        if (!func_num_args()) {
            $iRETORNO = $iERRORES;
            $iERRORES = 0;
            return $iRETORNO;
        }
        $iERRORES++;
        return true; // the error has been handled
    }

PHP Get_title tag code which uses simple regex and nice PHP string functions (as from Zend PHP)

    <?php
    function get_title_tag($chaine)
    {
        $fp = fopen($chaine, 'r');
        if (!$fp) {
            return false;
        }
        $contenu = '';
        while (!feof($fp)) {
            $contenu .= fgets($fp, 1024);
            // stop reading once the closing title tag has been seen
            if (stristr($contenu, '</title>')) {
                break;
            }
        }
        fclose($fp);
        // The pattern was lost from the original listing, and eregi() no
        // longer exists in PHP 7+. This preg_match() pattern is an assumption
        // modelled on the one used in return_title() below.
        if (preg_match('/<title>(.*)<\/title>/is', $contenu, $out)) {
            return $out[1];
        }
        return false;
    }
    ?>

My Own 'Visitor Trac' code which uses regex XML parsing methods

    <?php
    $referer  = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    $filename = $_SERVER['REMOTE_ADDR'] . '.txt';
    //print_r($_SERVER);

    // Start a fresh log if this visitor's file is more than 7 days old.
    if (file_exists($filename)) {
        $lastvisit   = filectime($filename);
        $currentdate = date('U');
        $difference  = round(($currentdate - $lastvisit) / 86400); // 86400 seconds per day
        if ($difference > 7) {
            unlink($filename);
        }
    }
    $fp = fopen($filename, "a");

    if (!$referer)
        $url_test = 'http://dinki.mine.nu/weblog/';
    else
        $url_test = $referer;

    $new_title = return_title($url_test);
    //print $new_title;

    // Each entry is stored as "<beg>URL" followed by "<beg>title".
    $new_name = stripslashes("<beg>$new_title\n");
    $new_URL  = stripslashes("<beg>$referer\n");
    fwrite($fp, $new_URL);
    fwrite($fp, $new_name);
    fclose($fp);

    $fp     = fopen($filename, "r");
    $file   = implode('', file($filename));
    $foo    = preg_split("/<beg>/", $file); // $foo[0] is the empty piece before the first <beg>
    $number = count($foo);
    //print $number;
    //print_r($foo);

    if ($number > 11) {
        // More than five URL/title pairs: print the last five and rewrite the file with them.
        fclose($fp);
        $fp = fopen($filename, "w");
        $count = $number - 10;
        while ($count < $number) {
            $print1 = $foo[$count];
            $print2 = $foo[$count+1];
            print " <img src = arrow.gif> ";
            print "<a href=$print1>$print2</a>";
            //print $count;
            $count += 2;
            $new_name = stripslashes("<beg>$print2");
            $new_URL  = stripslashes("<beg>$print1");
            fwrite($fp, $new_URL);
            fwrite($fp, $new_name);
        }
        fclose($fp);
    } else {
        $count = 1;
        while ($count < $number) { // was "<=", which read past the end of $foo
            $print1 = $foo[$count];
            $print2 = $foo[$count+1];
            print " <img src = arrow.gif> ";
            print "<a href=$print1>$print2</a>";
            //print $count;
            $count += 2;
        }
        fclose($fp);
    }

    // Fetches $url and returns the text of its <title> tag ('' if none is found).
    // A leftover debug print of $filename and $difference, which are not in
    // scope inside this function, has been removed.
    function return_title($url)
    {
        $title = '';
        $array = file($url);
        for ($i = 0; $i < count($array); $i++) {
            if (preg_match("/<title>(.*)<\/title>/i", $array[$i], $tag_contents)) {
                $title = strip_tags($tag_contents[1]);
            }
        }
        return $title;
    }
    ?>
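Because return_title() tests one line at a time, it cannot see a title whose opening and closing tags sit on different lines. The sketch below shows one possible way around that, assuming the page is already in a string and PHP 7.1 or later is available. The function name extract_title() is made up for this illustration and is not part of the scripts above.

```php
<?php
// Hypothetical helper (an illustration only): returns the text of the first
// <title> element in $html, or null when there is none.
function extract_title(string $html): ?string
{
    // i   : case-insensitive, so <TITLE> is accepted too
    // s   : lets '.' match newlines, so the title may span several lines
    // .*? : non-greedy, stops at the first closing </title>
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m) === 1) {
        return trim(strip_tags($m[1]));
    }
    return null;
}

$html = "<html><head><TITLE>\n  Regex basics\n</TITLE></head><body></body></html>";
var_dump(extract_title($html));               // string(12) "Regex basics"
var_dump(extract_title('<p>no title</p>'));   // NULL
```

A regular expression is adequate for pulling one well-known tag out of a page like this, but it is not a general HTML parser; badly malformed markup may still defeat it.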