RegularExpressions

last edited April 27, 2010 23:34:11

A regular expression is:
  • a string whose pattern describes a set of strings;
  • a set of characters (a string) that specifies a pattern.
Also known as a regex or regexp.



I have looked into a few regular expression libraries for use in my program, but all the ones I have tried are rather slow. I've fallen back on my original approach: piping the data to a Perl script, which applies the regular expressions and returns the result:

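#!/usr/bin/perl
# Script for removing HTML: reads all of STDIN, strips the tags, prints the rest.
use strict;
use warnings;

my $str = do { local $/; <STDIN> };   # slurp all input at once
$str = '' unless defined $str;
# A tag is '<', then unquoted text or a quoted string (which may contain '>'),
# then '>'. The \1 backreference closes the quote with the same quote character.
$str =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
print $str;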
If anyone knows of a fast way of doing regular expressions that is thread safe, please let me know!
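
For comparison, here is a minimal in-process sketch in C using the POSIX regcomp()/regexec() functions, assuming plain NUL-terminated C strings. POSIX extended regular expressions have no non-greedy *?, so the tag pattern is rewritten (my own formulation, not a line-for-line port of the Perl one): a tag is <, then any mix of characters other than > and quotes, or complete quoted strings, then >. Each call compiles its own regex_t, so no matcher state is shared between threads. strip_tags() is a hypothetical helper name, and its speed relative to the Perl pipe has not been measured.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of `html` with tags
   removed, or NULL on error. The caller must free() the result. */
char *strip_tags(const char *html)
{
    /* '<', then non-quote/non-'>' characters or whole quoted strings
       (which may contain '>'), then '>'. */
    const char *pattern = "<([^>\"']|\"[^\"]*\"|'[^']*')*>";
    regex_t re;   /* local to this call: nothing shared between threads */

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    char *out = malloc(strlen(html) + 1);  /* result is never longer */
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }

    size_t o = 0;
    const char *p = html;
    regmatch_t m;

    /* Every match is at least two characters ("<>"), so p always advances. */
    while (regexec(&re, p, 1, &m, 0) == 0) {
        memcpy(out + o, p, (size_t)m.rm_so);  /* copy text before the tag */
        o += (size_t)m.rm_so;
        p += m.rm_eo;                          /* skip over the tag */
    }
    strcpy(out + o, p);                        /* text after the last tag */

    regfree(&re);
    return out;
}
```

For example, strip_tags("<p class=\"a>b\">Hello, <b>world</b>!</p>") returns a malloc'd "Hello, world!"; the quoted attribute value containing > is skipped as part of the tag.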

Maybe not as easy to work with, but you could use flex to generate the scanner (I think the experimental C++ output is thread safe). Flex and similar tools generate a DFA-based matcher, which makes one state transition per input character instead of backtracking over alternatives, so it is typically much faster than general-purpose regexp libraries (at the cost of features such as backreferences). As for libraries, Perl's is probably the most optimized one!
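
To show what "DFA-based" means, here is a hand-written sketch of the kind of state machine a generator like flex builds, for the illustrative pattern [0-9]+(\.[0-9]+)?. It is not flex output (flex emits table-driven code), and the state names and is_decimal() are hypothetical. Each input character causes exactly one state transition, and the matcher never goes back to try another alternative.

```c
#include <stdbool.h>

/* States of a hand-built DFA for the pattern [0-9]+(\.[0-9]+)? */
enum { ST_START, ST_INT, ST_DOT, ST_FRAC, ST_REJECT };

/* Transition function: one state change per input character. */
static int step(int state, char c)
{
    bool digit = c >= '0' && c <= '9';

    switch (state) {
    case ST_START: return digit ? ST_INT : ST_REJECT;
    case ST_INT:   return digit ? ST_INT : (c == '.' ? ST_DOT : ST_REJECT);
    case ST_DOT:   return digit ? ST_FRAC : ST_REJECT;
    case ST_FRAC:  return digit ? ST_FRAC : ST_REJECT;
    default:       return ST_REJECT;
    }
}

/* Single left-to-right pass over the input; accepts the whole string
   only if it ends in an accepting state (ST_INT or ST_FRAC). */
bool is_decimal(const char *s)
{
    int state = ST_START;

    for (; *s != '\0' && state != ST_REJECT; s++)
        state = step(state, *s);
    return state == ST_INT || state == ST_FRAC;
}
```

is_decimal("3.14") is true, while "3.", ".5" and "1.2.3" are all rejected.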

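P.S. To do the same thing with a Perl one-liner (-0777 reads the whole input at once, so tags that span lines are removed too; the '\'' sequences put literal single quotes inside the shell-quoted program):

perl -0777 -pe 's/<(?:[^>'\''"]*|(['\''"]).*?\1)*>//gs'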


I don't know if it's any faster than what you've already tried, or even usable, but NSString has some undocumented regexp functions.


There is an NSStringRegExp addition: http://homepage.mac.com/jrc/contrib/

Note that this addition is not Unicode-safe: in particular, the returned ranges are incorrect for strings containing non-ASCII characters, which will probably cause exceptions to be thrown when the substring operations fail.
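
One common way ranges go wrong with non-ASCII text (an assumption about the general failure mode, not a reading of that addition's source): regexec() reports rm_so/rm_eo as byte offsets into the C string (for example one obtained from -UTF8String), while NSString ranges count UTF-16 code units. Every non-ASCII character makes the byte offset run ahead of the NSString location. A minimal conversion sketch, assuming valid UTF-8 and an offset that falls on a character boundary; the function name is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: converts a byte offset into a valid UTF-8 string
   (such as rm_so or rm_eo from regexec) into a UTF-16 code-unit offset,
   which is what NSRange locations in an NSString count. */
size_t utf8_to_utf16_offset(const char *utf8, size_t byte_off)
{
    size_t units = 0;

    for (size_t i = 0; i < byte_off; i++) {
        unsigned char b = (unsigned char)utf8[i];

        if ((b & 0xC0) == 0x80)
            continue;                  /* continuation byte of a previous char */
        units += (b >= 0xF0) ? 2 : 1;  /* 4-byte sequences become a surrogate pair */
    }
    return units;
}
```

In "café bar", regexec() would report bar at byte offset 6, but its NSString location is 5: utf8_to_utf16_offset("caf\xc3\xa9 bar", 6) returns 5.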

Another wrapper around the C regexec() function is also available at: http://www.spikesoft.ch/?p=24


No need for an external solution to test an NSString object against a regexp! See Regular Expressions for NSString: http://www.stiefels.net/2007/01/24/regular-expressions-for-nsstring/

Unfortunately in many cases being able to test only for matching without being able to get the location or contents of match groups is as close to useless as makes no difference.
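
Getting match locations and group contents is what the nmatch/pmatch arguments of POSIX regexec() are for. A minimal sketch, assuming plain C strings (so offsets are bytes, with the same Unicode caveat as the NSStringRegExp note). capture_group() is a hypothetical helper that only supports groups 0 to 9.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

#define MAX_GROUPS 10  /* group 0 (the whole match) plus groups 1-9 */

/* Hypothetical helper: returns a malloc'd copy of capture group `group`
   from the first match of the extended regex `pattern` in `text`, or NULL
   if the pattern does not compile, nothing matches, or that group did not
   take part in the match. The caller must free() the result. */
char *capture_group(const char *pattern, const char *text, size_t group)
{
    regex_t re;
    regmatch_t m[MAX_GROUPS];

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (group > re.re_nsub) {          /* pattern has fewer groups than asked */
        regfree(&re);
        return NULL;
    }

    int rc = regexec(&re, text, group + 1, m, 0);
    regfree(&re);                      /* m holds plain offsets, still valid */
    if (rc != 0 || m[group].rm_so == -1)
        return NULL;

    size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, text + m[group].rm_so, len);
    out[len] = '\0';
    return out;
}
```

For example, with the pattern ([a-z]+)=([0-9]+) and the text "size: width=640;", group 1 is "width", group 2 is "640", and group 0 is the whole match "width=640"; m[group].rm_so itself gives the byte location.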