Thursday, October 11, 2007

Using regex for matching multiple words anywhere in the sentence

This blog post is meant to save time for developers who want to use regular expressions to check whether ALL of several words occur anywhere in an input string. It took me several hours to get it working for 3 sets with 3 alternative words in each set. Google was not much help; only a couple of pages contained useful examples.

It began when I implemented a regex (regular expression) matching mechanism for our Sametime Bot, to get a more flexible pattern-matching solution than the current wildcard matching. Instead of simply specifying *helpdesk* to match every incoming question that contains the word "helpdesk", a regex can fine-tune the match and handle the questions "What is the phone number to helpdesk?" and "How can I contact helpdesk on weekends?" differently. The matching capabilities of regex are impressive; there are very few text-matching operations it can't do.

Pattern "any one word is enough":
helpdesk|assistance|support

Matches: Can I get some assistance? How can I contact support? Does helpdesk have an email address?
Does not match: What's the time? Can you assist me?
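
As a rough cross-check outside Notes, the same pattern can be tried in Python. This is only a sketch: it uses Python's `re` module rather than the VBScript.RegExp engine the bot uses, and the helper name `matches_any` is made up for illustration. The `re.IGNORECASE` flag is an assumption that mirrors the `IgnoreCase = True` setting in the LotusScript tester further down.

```python
import re

# Pattern copied from the post; IGNORECASE mirrors regexp.IgnoreCase = True.
ANY_WORD = re.compile(r"helpdesk|assistance|support", re.IGNORECASE)


def matches_any(text: str) -> bool:
    """Return True if at least one of the alternative words occurs in text."""
    # search() looks for the pattern anywhere in the string.
    return ANY_WORD.search(text) is not None


# Sentences taken from the post.
for sentence in [
    "Can I get some assistance?",
    "How can I contact support?",
    "Does helpdesk have an email address?",
    "What's the time?",
    "Can you assist me?",
]:
    print(sentence, "->", "Match" if matches_any(sentence) else "Not match")
```

Note that "Can you assist me?" is rejected because "assist" is shorter than the listed word "assistance"; the alternation only matches the words exactly as written.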

-------------------------------------

Pattern "all words must be present":
^(?=.*?(phone|fone|call|contact))(?=.*?(help|assistance|support)).*$

Matches: What is the phone number to helpdesk? Can I call to support department from my cell phone? How can I contact helpdesk?
Does not match: I need help! I want to call my mom. My phone doesn't work. Charlie, Charlie, this is Bravo, send more air support!
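
Each `(?=...)` group is a lookahead: it checks from the start of the string that one of its words occurs somewhere, without consuming any text, so the order of the words in the sentence does not matter. Because the pattern is just one lookahead per word set, it can be generated for any number of sets. The sketch below does that in Python; the function names and the third word set are hypothetical, and Python's `re` is used here instead of VBScript.RegExp.

```python
import re


def build_all_sets_pattern(word_sets):
    """Build a pattern that requires one word from EVERY set, in any order."""
    lookaheads = "".join(
        "(?=.*?(" + "|".join(re.escape(word) for word in words) + "))"
        for words in word_sets
    )
    return "^" + lookaheads + ".*$"


def matches_all(word_sets, text):
    """Return True if text contains at least one word from each set."""
    pattern = build_all_sets_pattern(word_sets)
    return re.search(pattern, text, re.IGNORECASE) is not None


# The two word sets from the post.
CONTACT_WORDS = ["phone", "fone", "call", "contact"]
HELP_WORDS = ["help", "assistance", "support"]

# Reproduces the pattern shown above.
print(build_all_sets_pattern([CONTACT_WORDS, HELP_WORDS]))

print(matches_all([CONTACT_WORDS, HELP_WORDS], "How can I contact helpdesk?"))
print(matches_all([CONTACT_WORDS, HELP_WORDS], "I need help!"))

# A third, made-up set: one more lookahead is appended automatically.
TIME_WORDS = ["today", "now", "urgent"]
print(matches_all([CONTACT_WORDS, HELP_WORDS, TIME_WORDS],
                  "I need phone support now"))
```

Keep in mind that the words match as substrings, which is why "help" is found inside "helpdesk".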

-------------------------------------

Pattern "all words must be present, but NOT that one":
^(?=.*?(phone|fone|call|contact))(?=.*?(help|assistance|support))((?!weekend|night).)*$

Matches: I want to come in contact with support now. I need phone assistance to install ABC software today.
Does not match: What number can I call on weekends to get help with this tool? What phone number can I call to contact helpdesk at night?
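
The last part, `((?!weekend|night).)*$`, walks through the string one character at a time and refuses to step onto any position where "weekend" or "night" begins, so the end of the string can only be reached if neither word is present. The Python sketch below runs the pattern from the post against the example sentences; the helper name `matches_excluding` and the extra "tonight" sentence are my own additions for illustration, and Python's `re` stands in for VBScript.RegExp.

```python
import re

# Pattern copied from the post, split across lines for readability.
EXCLUDING = re.compile(
    r"^(?=.*?(phone|fone|call|contact))"   # one contact word is required
    r"(?=.*?(help|assistance|support))"    # one help word is required
    r"((?!weekend|night).)*$",             # no position may start a banned word
    re.IGNORECASE,
)


def matches_excluding(text: str) -> bool:
    """Return True if both word sets are present and no banned word is."""
    return EXCLUDING.search(text) is not None


for sentence in [
    "I want to come in contact with support now.",
    "I need phone assistance to install ABC software today.",
    "What number can I call on weekends to get help with this tool?",
    "What phone number can I call to contact helpdesk at night?",
    "Can I call support tonight?",  # rejected too: "tonight" contains "night"
]:
    print(sentence, "->", "Match" if matches_excluding(sentence) else "Not match")
```

The banned words are matched as substrings as well, so "tonight" is excluded along with "night". If that is not wanted, word boundaries such as `\bnight\b` could be added to the pattern.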

-------------------------------------


With a little help from a blog post on lekkimworld, I created this LotusScript test button so that whoever writes a pattern can check it: the tester enters a text string, which is matched against the regex stored in the document. The result of the test is either "Match" or "Not match".

Sub Click(Source As Button)
    Dim workspace As New NotesUIWorkspace
    Dim uidoc As NotesUIDocument
    Dim doc As NotesDocument
    Dim regexp As Variant
    Dim userinput As String

    Set uidoc = workspace.CurrentDocument
    Set doc = uidoc.Document

    ' VBScript.RegExp is a COM object, so this button requires a Windows client
    Set regexp = CreateObject("VBScript.RegExp")
    regexp.IgnoreCase = True
    ' The pattern under test is read from the RegmatchSubject field of the current document
    regexp.Pattern = doc.RegmatchSubject(0)

    userinput = Inputbox("Input text to test for pattern match:", "Regex tester", "")

    ' Test returns True when the pattern matches anywhere in the input
    If regexp.Test(userinput) Then
        Msgbox "Regex match found!"
    Else
        Msgbox "Regex match NOT found!"
    End If
    Set regexp = Nothing
End Sub

Online demo of regex questions to Bot: http://www.botstation.com/sametime/bot_regex.html
