EZY Regular Expressions


 

EZY Prolog has built-in support for regular expressions.

 

Below is a sample EZY Prolog program that demonstrates how to use regular expressions.

 

 

prolog_main():-
    % Build the results file name from the path returned by syspath.
    syspath(PATH, _),
    format(RESULTFILE, "%sRegexpression_results.txt", PATH),
    tell(RESULTFILE),
    write("EZY PROLOG Regular Expressions test\n"),
    print_regexp_test_header,
    MAX = 1,
    % Backtracking loop: run the test MAX times.
    for(I, 1, MAX),
        nl,
        write("Running test ", I, " of ", MAX), nl,
        regexp_test,
        pie_process_events,
    I = MAX,
    % Close the results file, then print its contents.
    told(),
    write("Test completed - see results in ", RESULTFILE), nl,
    file_str(RESULTFILE, RESULTS),
    write(RESULTS), nl,
    !.

 

% Clause 1: print every supported syntax, then fail so that clause 2 runs.
regexp_test():-
    write("Possible Regular Expression syntax:"),
    nl,
    regexp_syntax_values(SYNTAX),
    write("", SYNTAX), nl,
    fail.
% Clause 2: read the test file line by line and evaluate each entry.
regexp_test():-
    write("Start test"), nl,
    syspath(PATH, _),
    format(FULLNAME, "%sRegexpression_Tests.txt", PATH),
    write("Regexp file:", FULLNAME), nl,
    see(FULLNAME),
    repeat,
        readln(Line),
        regular_expression_extract(Line, REGESP, STRING),
        write("/****EVALUATE*/\nRegular Expression:", REGESP), nl,
        write("String:", STRING), nl,
        regular_expression_evaluate(STRING, REGESP, RESULT),
        write("Results:", RESULT), nl,
    end_of_file(FULLNAME),
    !.
regexp_test():-!.

 

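% Split a test-file line into the regular expression and the string to search.
regular_expression_extract(LINE, REGESP, STRING):-
    searchstring(LINE, "", POS),
    P = POS + 1,
    frontstr(P, LINE, REGESP, S1),
    frontchar(S1, _, STRING),
    !.
% Fallback: the line could not be split, so return it unchanged with "Failed".
regular_expression_extract(Line, Line, "Failed"):-
    !.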

 

% Search with the available syntaxes; if none matches, build a failure message.
regular_expression_evaluate(STRING, REGESP, RESULT):-
    regexp_syntax_values(SYNTAX_LIST),
    found_regexp_syntax(SYNTAX_LIST, STRING, REGESP, RESULT),
    !.
regular_expression_evaluate(STRING, _, RESULT):-
    format(RESULT, "Fail in {%s}", STRING),
    !.

 

 

% Walk the syntax list and stop at the first syntax that yields a non-empty match.
found_regexp_syntax([], _, _, _):-!, fail.
found_regexp_syntax([SYNTAX|_], STRING, REGESP, RESULT):-
    regexp_search(SYNTAX, STRING, REGESP, FOUNDPOS, LENGTH),
    LENGTH > 0,
    substring(STRING, FOUNDPOS, LENGTH, FOUND_STRING),
    format(RESULT, "Syntax [%], Found: {%s} at % , Len % ",
        SYNTAX, FOUND_STRING, FOUNDPOS, LENGTH),
    !.
found_regexp_syntax([_|SYNTAX_LIST], STRING, REGESP, RESULT):-
    found_regexp_syntax(SYNTAX_LIST, STRING, REGESP, RESULT),
    !.

 

 

print_regexp_test_header():-
    write("This is a test for Pattern Search via Regular Expressions"),
    nl,
    regexp_version(VERSION),
    write("Regular expression engine: ", VERSION),
    nl,
    !.
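
The control flow of regular_expression_evaluate and found_regexp_syntax can also be restated outside Prolog. The following is a minimal Python sketch, not EZY Prolog code. The two "syntaxes" (literal and python_re) and all helper names are hypothetical stand-ins for whatever regexp_syntax_values returns; they are there only to illustrate the first-success fallback over a list of syntaxes and the LENGTH > 0 check. Python's re module is not the engine EZY Prolog uses, so pattern syntax may differ.

```python
import re

def literal_search(pattern, text):
    """Hypothetical 'syntax' 1: treat the pattern as plain text."""
    index = text.find(pattern)
    return None if index < 0 else (index + 1, len(pattern))

def regex_search(pattern, text):
    """Hypothetical 'syntax' 2: Python's re module (not the EZY engine)."""
    try:
        match = re.search(pattern, text)
    except re.error:
        return None  # an invalid pattern is treated as "no match"
    if match is None:
        return None
    return (match.start() + 1, match.end() - match.start())

# Assumed stand-in for the list produced by regexp_syntax_values.
SYNTAXES = [("literal", literal_search), ("python_re", regex_search)]

def evaluate(text, pattern):
    """Try each syntax in order and report the first non-empty match."""
    for name, search in SYNTAXES:
        hit = search(pattern, text)
        if hit is not None and hit[1] > 0:  # mirrors LENGTH > 0
            pos, length = hit  # pos is 1-based, as in the output listing
            found = text[pos - 1:pos - 1 + length]
            return f"Syntax {name}, Found: {{{found}}} at {pos} , Len {length}"
    # Wording follows the output listing rather than the format string.
    return f"Fail {{{text}}}"
```

In this sketch, evaluate("abc", "^") returns Fail {abc} because the only match is empty. That is consistent with the rows in the output listing where a pattern such as ^ or $ is reported as a failure.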

 

 

 

Output of the regular expressions test

 

 

 

 

Each test is listed below as three consecutive entries: the regular expression, the string searched, and the result.

-?[0-9]+

Valid Integer -93334

Syntax emacs, Found: {-93334} at 15 , Len 6

(\+|-)?[0-9]*\.[0-9]*([Ee](\+|-)?[0-9]+)

Valid Real Value +734.11e-1234

Syntax awk, Found: {+734.11e-1234} at 18 , Len 13

http:+(\\|//)+([a-zA-Z_0-9.]*+(\\|/))+([a-zA-Z_0-9]*).htm*

URL in form http://www.ser.com/dir/file.htm

Syntax awk, Found: {http://www.ser.com/dir/file.htm} at 13 , Len 31

(\([0-9]{3}\))+(-| )+([0-9]{3})+(-| )+([0-9]{4})

Valid Phone number (415)-999-8888

Syntax posix_awk, Found: {(415)-999-8888} at 38 , Len 14

(\([0-9]{3}\))+(-| )+([0-9]{3})+(-| )+([0-9]{4})

Valid Phone number (415) 999-8888

Syntax posix_awk, Found: {(415) 999-8888} at 38 , Len 14

([0-9]{2})([ ]+)([a-z]+)([ ]+)([1-2]([0-9]{3}))

Valid date DD Month Year 12 February 2001

Syntax posix_awk, Found: {12 February 2001} at 44 , Len 19

([0-1][0-2](\\|/|-|.))([0-9]{2}(\\|/|-|.))([1-2][0-9]{3})

Valid date MM-DD-YYYY 02/02/2000

Syntax posix_awk, Found: {02/02/2000} at 41 , Len 10

([0-1][0-2](\\|/|-|.))([0-9]{2}(\\|/|-|.))([1-2][0-9]{3})

Valid date MM-DD-YYYY 12-02/2000

Syntax posix_awk, Found: {12-02/2000} at 41 , Len 10

([0-1][0-2](\\|/|-|.))([0-9]{2}(\\|/|-|.))([1-2][0-9]{3})

Invalid date MM-DD-YYYY 22/02/2000

Fail {Invalid date MM-DD-YYYY 22/02/2000}

Chapter+[ ][0-9]

Chapter followed by a single whitespace character (space, tab, newline, etc), followed by a single digit Chapter 9 bla-bla

Syntax emacs, Found: {Chapter 9} at 106 , Len 9

Chapter+[ ]\w*

Chapter followed by a space, followed by a word character

Syntax emacs, Found: {Chapter followed} at 1 , Len 16

((jan[a-z]*)|(feb[a-z]*)|(mar[a-z]*)|(apr[a-z]*)|(may)|(ju[a-z]*)|(aug[a-z]*)|(sep[a-z]*)|(oct[a-z]*)|(nov[a-z]*)|(dec[a-z]*))+([ ]+|\.|-)+([0-9]{2}([ ]+|.|-))(([1-2][0-9]{3})|[0-9]{2})

Line with march 12 1999 or mar 12 1999

Syntax posix_awk, Found: {march 12 1999} at 29 , Len 13

((jan[a-z]*)|(feb[a-z]*)|(mar[a-z]*)|(apr[a-z]*)|(may)|(ju[a-z]*)|(aug[a-z]*)|(sep[a-z]*)|(oct[a-z]*)|(nov[a-z]*)|(dec[a-z]*))+([ ]+|\.|-)+([0-9]{2}([ ]+|.|-))(([1-2][0-9]{3})|[0-9]{2})

Text with march 12, 99 or march 12, 99

Fail {Text with march 12, 99 or march 12, 99}

((jan[a-z]*)|(feb[a-z]*)|(mar[a-z]*)|(apr[a-z]*)|(may)|(ju[a-z]*)|(aug[a-z]*)|(sep[a-z]*)|(oct[a-z]*)|(nov[a-z]*)|(dec[a-z]*))+([ ]+|\.|-)+([0-9]{2}([ ]+|.|-))(([1-2][0-9]{3})|[0-9]{2})

Para with mar 12, 1999 or mar 12, 99

Fail {Para with mar 12, 1999 or mar 12, 99}

(d{1,2}):(\d\d)\s*(am|pm|\s*)

04:30 pm

Fail {04:30 pm}

(\d{1,2}):(\d\d)\s*(am|pm|\s*)

12:45am

Fail {12:45am}

(\d{1,2}):(\d\d)\s*(am|pm|\s*)

14:45

Fail {14:45}

ab{2}

matches a string that has an a followed by exactly two b's abb

Syntax posix_awk, Found: {abb} at 78 , Len 3

ab{2,}

there are at least two b's "abb", "abbbb", etc.)

Syntax posix_awk, Found: {abb} at 47 , Len 3

ab{3,5}

from three to five b's ("abbb", "abbbb", or "abbbbb")

Syntax posix_awk, Found: {abbb} at 44 , Len 4

a?b+

a possible a followed by one or more b's ending a string

Syntax emacs, Found: {b} at 8 , Len 1

a(bc)*

matches a string that has an a followed by zero or more copies of the sequence bc

Syntax awk, Found: {a} at 2 , Len 1

a(bc){1,5}

one through five copies of bc such as abcbc

Syntax posix_awk, Found: {abcbc} at 64 , Len 5

hi|hello

matches a string that has either hi or hello in it;

Syntax awk, Found: {hi} at 52 , Len 2

(b|cd)ef

a string that has either bef or cdef

Syntax awk, Found: {bef} at 44 , Len 3

(a|b)*c

a string that has bc sequence of alternating a's and b's ending in a c

Syntax awk, Found: {c} at 25 , Len 1

(a|b)*c

a string that has ac sequence of alternating a's and b's ending in a c

Syntax awk, Found: {c} at 25 , Len 1

a.[0-9]

matches a string that has an a followed by one character and a digit a0234

Syntax emacs, Found: {a02} at 88 , Len 3

a..[0-9]

matches a string that has an a followed by one character and a digit a0234

Syntax emacs, Found: {a023} at 88 , Len 4

.{5}

a string with exactly 5 characters

Syntax posix_awk, Found: {a str} at 1 , Len 5

[0-9]%

a string that has a single digit before a percent sign 6%

Syntax emacs, Found: {674} at 2 , Len %

,[a-zA-Z0-9]

a string that ends in a comma followed by an alphanumeric character ,C

Syntax emacs, Found: {,C} at 87 , Len 2

%[^a-zA-Z]%

matches a string with a character that is not a letter between two percent signs %6%

Syntax emacs, Found: { 1003} at % , Len %

$([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?

money as $10000.00

Syntax awk, Found: {$10000} at 10 , Len 6

$([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?

money as $10,000.00

Syntax awk, Found: {$10} at 10 , Len 3

$([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?

money without the cents: $10000

Syntax awk, Found: {$10000} at 26 , Len 6

$([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?

money without the cents: $10,000

Syntax awk, Found: {$10} at 26 , Len 3

\([A-Za-z]\:\)\([A-Za-z]+\)\.\([A-Za-z]+\)

MS DOS file names c:file.txt

Syntax emacs, Found: {c:file.txt} at 19 , Len 10

[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*

Find email address username@servername.com

Syntax awk, Found: {username@servername.com} at 38 , Len 23

[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*

Find email address name.department@servername.com

Syntax awk, Found: {name.department@servername.com} at 38 , Len 30

Chapter [0-9]

Chapter X, Chapter 1, bla-bla-bla

Syntax emacs, Found: {Chapter 1} at 12 , Len 9

Chapter [^0-9]

Chapter 1, Chapter X, bla-bla-bla

Syntax emacs, Found: {Chapter X} at 12 , Len 9

a+x

ax, bx, aax, abx, bax, bbx, aaax, aabx, abax, abbx, baax, babx, bbax, bbbx, aaaax ....

Syntax emacs, Found: {ax} at 1 , Len 2

[abcde]x

ax, bx, cx, dx, ex

Syntax emacs, Found: {ax} at 1 , Len 2

[a-e]x

ax, bx, cx, dx, ex

Syntax emacs, Found: {ax} at 1 , Len 2

[-ae]x

ax, bx, cx, dx, ex

Syntax emacs, Found: {ax} at 1 , Len 2

[ae-]x

-x, ax, ex

Syntax emacs, Found: {-x} at 1 , Len 2

[ae-]x

-x, ax, ex,

Syntax emacs, Found: {-x} at 2 , Len 2

[a-e-[bd]]x

ax, cx, ex

Fail {ax, cx, ex}

[^0-9]x

any non-digit character followed by the character ZY6x ZCVx BBB

Syntax emacs, Found: {Vx} at 59 , Len 2

[0-9]x

any digit character followed by the character ZY6x ZCVx BBB

Syntax emacs, Found: {6x} at 50 , Len 2

\Dx

any non-digit character followed by the character Bx9 BBxF

Fail { any non-digit character followed by the character Bx9 BBxF}

.*abc.*

1x2abc, abc1x2, z3456abchooray ....

Syntax emacs, Found: {1x2abc, abc1x2, z3456abchooray ....} at 1 , Len 35

[0-9][0-9]

two digits 98 bbbb

Syntax emacs, Found: {98} at 12 , Len 2

ab{2}x

abbx

Syntax posix_awk, Found: {abbx} at 1 , Len 4

ab{2,4}x

abbx, abbbx, abbbbx

Syntax posix_awk, Found: {abbx} at 1 , Len 4

ab{2,}x

abbx, abbbx, abbbbx ....

Syntax posix_awk, Found: {abbx} at 1 , Len 4

(ab){2}x

ababx

Syntax posix_awk, Found: {ababx} at 1 , Len 5

a*

aaabc

Syntax emacs, Found: {aaa} at 1 , Len 3

abc

abc

Syntax emacs, Found: {abc} at 1 , Len 3

abc

xbc

Fail {xbc}

abc

axc

Fail {axc}

abc

abx

Fail {abx}

abc

xabcy

Syntax emacs, Found: {abc} at 2 , Len 3

abc

ababc

Syntax emacs, Found: {abc} at 3 , Len 3

ab*c

abc

Syntax emacs, Found: {abc} at 1 , Len 3

ab*bc

abc

Syntax emacs, Found: {abc} at 1 , Len 3

ab*bc

abbc

Syntax emacs, Found: {abbc} at 1 , Len 4

ab*bc

abbbbc

Syntax emacs, Found: {abbbbc} at 1 , Len 6

ab+bc

abbc

Syntax emacs, Found: {abbc} at 1 , Len 4

ab+bc

abc

Fail {abc}

ab+bc

abq

Fail {abq}

ab+bc

abbbbc

Syntax emacs, Found: {abbbbc} at 1 , Len 6

ab?bc

abbc

Syntax emacs, Found: {abbc} at 1 , Len 4

ab?bc

abc

Syntax emacs, Found: {abc} at 1 , Len 3

ab?bc

abbbbc

Fail {abbbbc}

ab?c

abc

Syntax emacs, Found: {abc} at 1 , Len 3

^abc$

abc

Syntax emacs, Found: {abc} at 1 , Len 3

^abc$

abcc

Fail {abcc}

^abc

abcc

Syntax emacs, Found: {abc} at 1 , Len 3

^abc$

aabc

Fail {aabc}

abc$

aabc

Syntax emacs, Found: {abc} at 2 , Len 3

^

abc

Fail {abc}

$

abc

Fail {abc}

a.c

abc

Syntax emacs, Found: {abc} at 1 , Len 3

a.c

axc

Syntax emacs, Found: {axc} at 1 , Len 3

a.*c

axyzc

Syntax emacs, Found: {axyzc} at 1 , Len 5

a.*c

axyzd

Fail {axyzd}

a[bc]d

abc

Fail {abc}

a[bc]d

abd

Syntax emacs, Found: {abd} at 1 , Len 3

a[b-d]e

abd

Fail {abd}

a[b-d]e

ace

Syntax emacs, Found: {ace} at 1 , Len 3

a[b-d]

aac

Syntax emacs, Found: {ac} at 2 , Len 2

a[-b]

a-

Syntax emacs, Found: {a-} at 1 , Len 2

a[b-]

a-

Syntax emacs, Found: {a-} at 1 , Len 2

[k]

ab

Fail {ab}

a[b-a]

-

Fail {-}

a[]b

-

Fail {-}

a[

-

Fail {-}

a]

a]

Syntax emacs, Found: {a]} at 1 , Len 2

a[]]b

a]b

Syntax emacs, Found: {a]b} at 1 , Len 3

a[^bc]d

aed

Syntax emacs, Found: {aed} at 1 , Len 3

a[^bc]d

abd

Fail {abd}

a[^-b]c

adc

Syntax emacs, Found: {adc} at 1 , Len 3

a[^-b]c

a-c

Fail {a-c}

a[^]b]c

a]c

Fail {a]c}

a[^]b]c

adc

Syntax emacs, Found: {adc} at 1 , Len 3

ab|cd

abc

Syntax awk, Found: {ab} at 1 , Len 2

ab|cd

abcd

Syntax awk, Found: {ab} at 1 , Len 2

()ef

def

Syntax awk, Found: {ef} at 2 , Len 2

()*

-

Fail {-}

*a

-

Fail {-}

^*

-

Fail {-}

$*

-

Fail {-}

(*)b

-

Fail {-}

$b

b

Fail {b}

a\

-

Fail {-}

a\(b

a(b

Syntax awk, Found: {a(b} at 1 , Len 3

a\(*b

ab

Syntax awk, Found: {ab} at 1 , Len 2

a\(*b

a((b

Syntax awk, Found: {a((b} at 1 , Len 4

a\\b

a\b

Syntax emacs, Found: {a\b} at 1 , Len 3

abc)

-

Fail {-}

(abc

-

Fail {-}

((a))

abc

Syntax awk, Found: {a} at 1 , Len 1

(a)b(c)

abc

Syntax awk, Found: {abc} at 1 , Len 3

a+b+c

aabbabc

Syntax emacs, Found: {abc} at 5 , Len 3

a**

-

Fail {-}

a*?

-

Fail {-}

(a*)*

-

Fail {-}

(a*)+

-

Fail {-}

(a|)*

-

Fail {-}

(a*|b)*

-

Fail {-}

(a+|b)*

ab

Syntax awk, Found: {ab} at 1 , Len 2

(a+|b)+

ab

Syntax awk, Found: {ab} at 1 , Len 2

(a+|b)?

ab

Syntax awk, Found: {a} at 1 , Len 1

[^ab]*

cde

Syntax emacs, Found: {cde} at 1 , Len 3

(^)*

-

Fail {-}

(ab|)*

-

Fail {-}

)(

-

Fail {-}

 

abc

Fail {abc}

abc

 

Fail {}

a*

 

Fail {}

abcd

abcd

Syntax emacs, Found: {abcd} at 1 , Len 4

a(bc)d

abcd

Syntax awk, Found: {abcd} at 1 , Len 4

([abc])*d

abbbcd

Syntax awk, Found: {abbbcd} at 1 , Len 6

([abc])*bcd

abcd

Syntax awk, Found: {abcd} at 1 , Len 4

a|b|c|d|e

e

Syntax awk, Found: {e} at 1 , Len 1

(a|b|c|d|e)f

ef

Syntax awk, Found: {ef} at 1 , Len 2

((a*|b))*

-

Fail {-}

abcd*efg

abcdefg

Syntax emacs, Found: {abcdefg} at 1 , Len 7

ab*

xabyabbbz

Syntax emacs, Found: {ab} at 2 , Len 2

ab*

xayabbbz

Syntax emacs, Found: {a} at 2 , Len 1

(ab|cd)e

abcde

Syntax awk, Found: {cde} at 3 , Len 3

[abhgefdc]ij

hij

Syntax emacs, Found: {hij} at 1 , Len 3

^(ab|cd)e

abcde

Fail {abcde}

(abc|)ef

abcdef

Syntax awk, Found: {ef} at 5 , Len 2

(a|b)c*d

abcd

Syntax awk, Found: {bcd} at 2 , Len 3

(ab|ab*)bc

abc

Syntax awk, Found: {abc} at 1 , Len 3

a([bc]*)c*

abc

Syntax awk, Found: {abc} at 1 , Len 3

a([bc]*)(c*d)

abcd

Syntax awk, Found: {abcd} at 1 , Len 4

a([bc]+)(c*d)

abcd

Syntax awk, Found: {abcd} at 1 , Len 4

a([bc]*)(c+d)

abcd

Syntax awk, Found: {abcd} at 1 , Len 4

a[bcd]*dcdcde

adcdcde

Syntax emacs, Found: {adcdcde} at 1 , Len 7

a[bcd]+dcdcde

adcdcde

Fail {adcdcde}

(ab|a)b*c

abc

Syntax awk, Found: {abc} at 1 , Len 3

((a)(b)c)(d)

abcd

Syntax awk, Found: {abcd} at 1 , Len 4

[ -~]*

abcy

Syntax emacs, Found: {abc} at 1 , Len 3

[ -~ -~]*

abc

Syntax emacs, Found: {abc} at 1 , Len 3

[ -~ -~ -~]*

abc

Syntax emacs, Found: {abc} at 1 , Len 3

[ -~ -~ -~ -~]*

abc

Syntax emacs, Found: {abc} at 1 , Len 3

[ -~ -~ -~ -~ -~]*

abc

Syntax emacs, Found: {abc} at 1 , Len 3

[ -~ -~ -~ -~ -~ -~]*

abc

Syntax emacs, Found: {abc} at 1 , Len 3

[ -~ -~ -~ -~ -~ -~ -~]*

abc

Syntax emacs, Found: {abc} at 1 , Len 3

[a-zA-Z_][a-zA-Z0-9_]*

alpha

Syntax emacs, Found: {alpha} at 1 , Len 5

^a(bc+|b[eh])g|.h$

abh

Syntax awk, Found: {bh} at 2 , Len 2

(bc+d$|ef*g.|h?i(j|k))

effgz

Syntax awk, Found: {effgz} at 1 , Len 5

(bc+d$|ef*g.|h?i(j|k))

ij

Syntax awk, Found: {ij} at 1 , Len 2

(bc+d$|ef*g.|h?i(j|k))

effg

Fail {effg}

(bc+d$|ef*g.|h?i(j|k))

bcdd

Fail {bcdd}

(bc+d$|ef*g.|h?i(j|k))

reffgz

Syntax awk, Found: {effgz} at 2 , Len 5

((((((((((a))))))))))

-

Fail {-}

(((((((((a)))))))))

a

Syntax awk, Found: {a} at 1 , Len 1

multiple words of text

uh-uh

Fail {uh-uh}

multiple words

multiple words, yeah

Syntax emacs, Found: {multiple words} at 1 , Len 14

(.*)c(.*)

abcde

Syntax awk, Found: {abcde} at 1 , Len 5

\((.*), (.*)\)

(a, b)

Syntax awk, Found: {(a, b)} at 1 , Len 6
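
The "at" and "Len" values in the listing can be cross-checked with any regular expression engine, as long as positions are counted from 1. The sketch below uses Python's re module, which is an assumption made for illustration and not the engine EZY Prolog uses; the helper name locate is hypothetical. Only rows with short strings are used, because some displayed strings evidently differ from what the test read: the phone number row reports position 38 in a string that is 33 characters long as shown here.

```python
import re

def locate(pattern, text):
    """Return (matched text, 1-based position, length), or None if no match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return (match.group(0), match.start() + 1, len(match.group(0)))

# (pattern, string) pairs copied from the listing above.
ROWS = [
    (r"-?[0-9]+", "Valid Integer -93334"),
    (r"a+b+c", "aabbabc"),
    (r"ab?bc", "abbbbc"),
    (r"a[b-d]", "aac"),
]

for pattern, text in ROWS:
    print(pattern, "|", text, "->", locate(pattern, text))
```

With Python's engine these four rows give position 15 and length 6, position 5 and length 3, no match, and position 2 and length 2, which agrees with the corresponding entries in the listing.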

 


 

Copyright 1998-2002 EDMGROUP (Australia)

 

Last Updated: July 29, 2002