Splitting Strings With Substrings

The String.Split() method in C# is probably something with which any C# developer is familiar.
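As a refresher, the basic call looks something like the sketch below. The variable names "x" and "tokens" follow the ones used in this post, the input value is assumed from the examples discussed later, and the top-level statement style assumes C# 9 or later.

```csharp
using System;

// Assumed input, matching the name used throughout the post.
string x = "Erik.Dietrich";

// Split on a single character delimiter.
string[] tokens = x.Split('.');

Console.WriteLine(string.Join(", ", tokens)); // prints "Erik, Dietrich"
```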

Given a string x containing “Erik.Dietrich”, calling x.Split(‘.’) gives back “tokens”, an array of strings containing “Erik” and “Dietrich”. It’s not exactly earth-shattering to tokenize a string in this fashion, and some incarnation or another of this predates .NET, C# and probably even my time on this planet.

It’s Actually Harder Than You’d Think to Split Strings Using Sub-Strings

But what if we want to split over a string instead?

What if I have “..” as a delimiter instead of ‘.’ and I want to split “Erik..Dietrich” in the same way? Probably an overload of String.Split() that takes a string instead of a char, right? Well, actually no. As it turns out, the API for String.Split() is pretty unintuitive.

First of all, that call to x.Split(‘.’) is not actually invoking Split(char), but rather Split(params char[]). (Notwithstanding the fact that this isn’t advertised in the MSDN page unless you drill into the individual method.)

So, calling x.Split(‘.’) and x.Split(‘.’, ‘&’, ‘%’, ‘^’) are equally valid, syntax-wise, in the case of “Erik.Dietrich” (and in this case, both will give me back my first and last name).
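To make the params behavior concrete, here is a small sketch comparing the two calls above with the explicit array that the params syntax stands in for. The variable names are mine, and the binding described in the comments is the one this post talks about; newer runtimes add more Split overloads, so the exact overload chosen for the single-char call may differ there even though the result is the same.

```csharp
using System;
using System.Linq;

string x = "Erik.Dietrich";

// One separator, written in the params style.
string[] single = x.Split('.');

// Several separators in the same style; only '.' actually occurs in x.
string[] several = x.Split('.', '&', '%', '^');

// The explicit array form that the params syntax is shorthand for.
string[] explicitArray = x.Split(new char[] { '.', '&', '%', '^' });

Console.WriteLine(single.SequenceEqual(several));        // prints "True"
Console.WriteLine(several.SequenceEqual(explicitArray)); // prints "True"
```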

So, one might expect an overload Split(params string[]) that allows the same behavior for strings as for zero or more characters. Nope. Instead you have Split(string[] separator, StringSplitOptions options).

What’s Really Not Great about the Default Way to Split Strings with Sub-Strings

Two things suck about this.

  1. I have to specify some enum that I don’t care about in the first place and that has only two options, one of which is “none”. I mean, really? You can’t just assume “none” and let users specify a different case if they want with another overload?
  2. But what sucks even more is that a params parameter has to be the last one in the parameter list, and here the enum takes that spot, so that option is out the window. You no longer get the snazzy params syntax that the char version has, and you have to awkwardly create a string array. So x.Split(‘.’) for a char becomes x.Split(new string[] { “..” }, StringSplitOptions.None) for a string, which is pretty hideous.
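Side by side, the old and new syntax come out roughly as follows. This is a sketch using the example strings from this post; the variable names are mine.

```csharp
using System;

// Old: a char delimiter, written with the params syntax.
string[] byChar = "Erik.Dietrich".Split('.');

// New: a string delimiter needs an explicit array plus the options enum.
string[] byString = "Erik..Dietrich".Split(new string[] { ".." }, StringSplitOptions.None);

Console.WriteLine(string.Join(", ", byChar));   // prints "Erik, Dietrich"
Console.WriteLine(string.Join(", ", byString)); // prints "Erik, Dietrich"
```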

This Gets a Lot Easier and Prettier using Regex.Split

I was getting ready to write something to hide this mess from myself as a client, when I stumbled across a better alternative than rolling my own extension method or string splitting class: Regex.Split(). Here’s how it works:
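A minimal sketch of the idea, using the “..” delimiter from earlier. One wrinkle I am adding here: the second argument is a regular expression, and an unescaped ‘.’ means “any character”, so this sketch runs the delimiter through Regex.Escape() first.

```csharp
using System;
using System.Text.RegularExpressions;

string x = "Erik..Dietrich";

// Regex.Escape turns ".." into the pattern "\.\.", so each dot is matched
// literally instead of as "any character".
string[] tokens = Regex.Split(x, Regex.Escape(".."));

Console.WriteLine(string.Join(", ", tokens)); // prints "Erik, Dietrich"
```

Without the escaping, the pattern “..” would match every pair of characters in the input and you would get back a pile of empty strings.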

No fuss, no muss, and exactly what String.Split() should do. Granted, the arguments to Regex.Split() are both single strings (so if you want to specify multiple delimiters, you’ll have to cook up a regex recipe) and it’s a static method, but it has the advantage of already existing in the framework and being a much, much cleaner API than x.Split(). Just keep in mind that the delimiter is treated as a regular expression, so one containing special characters like ‘.’ needs escaping, for example with Regex.Escape().
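If you do need several delimiters, one possible “regex recipe” is to escape each one and join them with ‘|’ (alternation). The delimiters and input below are made up purely for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical delimiters, chosen only for this example.
string[] delimiters = { "..", "&&" };

// Escape each delimiter so it is matched literally, then join with '|'.
string pattern = string.Join("|", delimiters.Select(d => Regex.Escape(d)));

string[] tokens = Regex.Split("Erik..Dietrich&&Erik", pattern);

Console.WriteLine(string.Join(", ", tokens)); // prints "Erik, Dietrich, Erik"
```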

Use in good health!
