How can I remove all leading and trailing punctuation?

OK. So basically you want to find some pattern in your string and act if the pattern is matched.

Doing this the naive way would be tedious. The naive solution could involve something like

while (myString.startsWith(".") || myString.startsWith(",") || myString.startsWith(";") /* || ... */)
  myString = myString.substring(1);

If you wanted to do something a bit more complex, it might even be impossible to do it the way I mentioned.

That's why we use regular expressions. A regular expression is a "language" in which you can define a pattern, and the computer can then tell whether a string matches that pattern. To learn about regular expressions, just search for the term on Google. One of the first links: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial
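
As a small illustration of checking a string against a pattern, here is a minimal Java sketch. The class name MatchDemo and the helper startsWithPunctuation are made up for this example; \p{Punct} is Java's predefined class for ASCII punctuation.

```java
import java.util.regex.Pattern;

class MatchDemo {
  // One or more punctuation characters at the very start of the input.
  private static final Pattern LEADING_PUNCT = Pattern.compile("^\\p{Punct}+");

  // Hypothetical helper: true if the string begins with punctuation.
  static boolean startsWithPunctuation(String s) {
    return LEADING_PUNCT.matcher(s).find();
  }

  public static void main(String[] args) {
    System.out.println(startsWithPunctuation("...hello")); // true
    System.out.println(startsWithPunctuation("hello...")); // false
  }
}
```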

As for your problem, you could try this:

myString = myString.replaceFirst("^[^a-zA-Z]+", "");

The meaning of the regex:

  • The first ^ means that what comes next in the pattern has to be at the start of the string.

  • The [] define a set of characters. In this case, those are characters that are NOT (the second ^) letters (a-zA-Z).

  • The + sign means that the thing before it must occur one or more times.

You can use a similar regex to remove trailing chars.

myString = myString.replaceAll("[^a-zA-Z]+$", "");

The $ means "at the end of the string".
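
Putting the two calls together, a minimal sketch might look like the following. The class and method names are invented for illustration. Note that [^a-zA-Z] matches everything that is not an ASCII letter, so leading and trailing digits and whitespace are removed too, not just punctuation.

```java
class StripDemo {
  // Hypothetical helper: strips every non-letter from both ends of the string.
  static String stripNonLetters(String s) {
    // Strings are immutable, so each call returns a new string.
    String withoutLeading = s.replaceFirst("^[^a-zA-Z]+", "");
    return withoutLeading.replaceFirst("[^a-zA-Z]+$", "");
  }

  public static void main(String[] args) {
    System.out.println(stripNonLetters("...hello, world!!!")); // hello, world
    System.out.println(stripNonLetters("  42: answer."));      // answer
  }
}
```

Punctuation inside the string is left alone because both patterns are anchored to one end.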


You could use a regular expression:

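import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Optional leading punctuation, a lazily captured middle, optional trailing punctuation.
private static final Pattern PATTERN =
    Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");

public static String trimPunctuation(String s) {
  Matcher m = PATTERN.matcher(s);
  // find() can fail (e.g. a line break in the middle of the input, which . does not match);
  // in that case return the input unchanged instead of letting group(1) throw.
  return m.find() ? m.group(1) : s;
}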

The boundary matchers ^ and $ ensure the whole input is matched.

A dot . matches any single character.

A star * means "match the preceding thing zero or more times".

The parentheses () define a capturing group whose value is retrieved by calling Matcher.group(1).

The ? in (.*?) means you want the match to be non-greedy; otherwise the trailing punctuation would be included in the group.
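
For reference, here is how the method might be wrapped in a class and called. The class name PunctuationTrimmer and the sample inputs are made up, and the guard around find() is a defensive assumption on my part. In Java, \p{Punct} covers only ASCII punctuation by default, so characters such as curly quotes are left in place.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctuationTrimmer {
  private static final Pattern PATTERN =
      Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");

  static String trimPunctuation(String s) {
    Matcher m = PATTERN.matcher(s);
    // Defensive guard: return the input unchanged if the pattern does not match.
    return m.find() ? m.group(1) : s;
  }

  public static void main(String[] args) {
    System.out.println(trimPunctuation("...hello, world!!!")); // hello, world
    System.out.println(trimPunctuation("?!"));                 // prints an empty line
    System.out.println(trimPunctuation("no punctuation"));     // no punctuation
  }
}
```

Unlike the [^a-zA-Z] approach, this keeps leading and trailing digits and whitespace, since only punctuation characters are consumed outside the group.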