Regex Help: Escaping Characters and Matcher Method

The simplest solution is to use \W as your entire expression; it matches any non-word character. Unfortunately, that would also match whitespace characters, and it would not match the underscore (_), because \w counts _ as a word character.
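To make both drawbacks concrete, here is a small sketch. It is written in Java rather than Apex because Apex regular expressions follow Java's java.util.regex syntax and Java can be run anywhere; the class name WordCharDemo and the sample input are illustrative only.

```java
class WordCharDemo {
    // Removes every character matched by \W, i.e. anything outside [a-zA-Z0-9_].
    static String stripNonWord(String input) {
        return input.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        // The spaces disappear along with '!', but '_' survives
        // because it counts as a word character.
        System.out.println(stripNonWord("This is a_test!")); // Thisisa_test
    }
}
```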

Here's the expression I would use:

(_|[^\w\s])

What does it mean?

  • 1st Capturing Group (_|[^\w\s])
    Matches either alternative

    • 1st Alternative _
      • _ matches the character _ literally (case sensitive)
    • 2nd Alternative [^\w\s]
      • Match a single character not present in the list below [^\w\s]
      • \w matches any word character (equal to [a-zA-Z0-9_])
      • \s matches any whitespace character (in Java-style regex, equal to [ \t\n\x0B\f\r])
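If you want to see what the expression matches piece by piece, rather than only stripping it with replaceAll, you can loop over a Matcher with find() and read capturing group 1. The sketch below does that in Java, on the assumption that Apex's Pattern and Matcher follow the same Java regex rules; PunctuationMatcherDemo and findMatches are illustrative names, not part of any Salesforce API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctuationMatcherDemo {
    // The expression (_|[^\w\s]) written as a Java string literal.
    static final Pattern PUNCT = Pattern.compile("(_|[^\\w\\s])");

    // Collects the text of capturing group 1 for every match, in order.
    static List<String> findMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PUNCT.matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Letters, digits and spaces are skipped; '_', '!' and '.' are matched.
        System.out.println(findMatches("a_b c!d.1")); // [_, !, .]
    }
}
```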

Here are some examples:

String expression = '(_|[^\\w\\s])';

String allPunctuation = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';
String input1 = 'This is a test!', output1 = 'This is a test';
String input2 = 'This is a test...', output2 = 'This is a test';
String input3 = '([{This_is_a_test}])', output3 = 'Thisisatest';

System.assertEquals('', allPunctuation.replaceAll(expression, ''));
System.assertEquals(output1, input1.replaceAll(expression, ''));
System.assertEquals(output2, input2.replaceAll(expression, ''));
System.assertEquals(output3, input3.replaceAll(expression, ''));

Given example 3, you may want to replace underscores with spaces instead. Then you can simplify the expression somewhat:

String sanitize(String name)
{
    if (name == null) return name;
    return name.replaceAll('[^\\w\\s]', '')
        .replaceAll('_', ' ').trim();
}

String allPunctuation = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';
String input1 = 'This is a test!     ', output1 = 'This is a test';
String input2 = 'This is a test...   ', output2 = 'This is a test';
String input3 = '([{This_is_a_test}])', output3 = 'This is a test';

System.assertEquals('', sanitize(allPunctuation));
System.assertEquals(output1, sanitize(input1));
System.assertEquals(output2, sanitize(input2));
System.assertEquals(output3, sanitize(input3));

You can match all ASCII punctuation using \p{Punct} (written '\\p{Punct}' inside an Apex string literal), as documented in the Java Pattern class. It matches:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
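You can confirm that list by testing every ASCII character against the class. This sketch does so in Java, whose Pattern documentation defines \p{Punct}; it assumes Apex uses the same definition. PunctClassDemo and asciiPunct are illustrative names.

```java
import java.util.regex.Pattern;

class PunctClassDemo {
    // Builds a string of every ASCII character (0-127) that \p{Punct} matches.
    static String asciiPunct() {
        Pattern punct = Pattern.compile("\\p{Punct}");
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (punct.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the 32 characters in ASCII order.
        System.out.println(asciiPunct());
    }
}
```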

For example, the following code results in an empty String:

String s = '~!@#$%^*()_+|}{":?><`=;/.,][-\'\\';
System.debug(s.replaceAll('\\p{Punct}',''));

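Note that the escapes are not disappearing; they are consumed when the string literal is compiled, so \' becomes ' and \\ becomes a single \ before the regex engine ever sees the pattern. If you want the regex engine to receive a backslash escape, you have to escape it twice: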

String s = '~!@#$%^*()_+|}{":?><`=;/.,][-\\\'\\\\';

Here, \\\' results in the pattern/matcher engine seeing \', and \\\\ results in the engine seeing \\ (which matches one literal backslash).
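The same two layers of escaping exist in Java, whose regex syntax Apex follows, so it can be a convenient place to experiment. Java strings use double quotes, so a single quote needs no escaping in the literal itself, but a backslash still has to be doubled once for the compiler and doubled again for the regex engine. EscapeLevelsDemo and its method names are illustrative.

```java
class EscapeLevelsDemo {
    // "\\\\" in Java source is the two-character regex \\ ,
    // which matches one literal backslash.
    static String replaceBackslashes(String input) {
        return input.replaceAll("\\\\", "/");
    }

    // "\\'" in Java source is the regex \' , which matches a literal quote.
    static String removeQuotes(String input) {
        return input.replaceAll("\\'", "");
    }

    public static void main(String[] args) {
        String s = "a\\b"; // three characters: a, one backslash, b
        System.out.println(s.length());            // 3
        System.out.println(replaceBackslashes(s)); // a/b
        System.out.println(removeQuotes("it's"));  // its
    }
}
```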

Adrian's solution also works, but I think \p{Punct} states the intent of your code (to match any punctuation) more explicitly.
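The two approaches agree on plain ASCII text but can diverge on other characters. In Java's default mode, \w is ASCII-only, so [^\w\s] also matches accented letters, while \p{Punct} covers only ASCII punctuation. The sketch below shows this in Java; Apex is expected to follow the same Java regex rules, but that is an assumption worth checking in your org. The class and method names are illustrative.

```java
class PunctVsNegatedClassDemo {
    // The (_|[^\w\s]) expression discussed earlier in this thread.
    static String stripNegated(String input) {
        return input.replaceAll("(_|[^\\w\\s])", "");
    }

    // The POSIX-style punctuation class.
    static String stripPunct(String input) {
        return input.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        String ascii = "This_is a test!";
        // Same result on ASCII input.
        System.out.println(stripNegated(ascii)); // Thisis a test
        System.out.println(stripPunct(ascii));   // Thisis a test

        String accented = "Caf\u00e9!"; // "Café!"
        // \w is ASCII-only by default, so [^\w\s] also removes the accented e.
        System.out.println(stripNegated(accented)); // Caf
        // \p{Punct} only covers ASCII punctuation, so the accented e is kept.
        System.out.println(stripPunct(accented));   // Café
    }
}
```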