Wednesday, October 29, 2008

Removing items from a collection

A common coding pattern is removing items from a collection based on the value of the items. The simplest way to do this is to enumerate the collection, and remove all the items that you don't want.

Code example:

   1: List<string> colors = new List<string>();
   2: colors.Add("red");
   3: colors.Add("green");
   4: colors.Add("blue");
   5:  
   6: foreach (string color in colors) // WARNING: this code will throw an exception
   7: {
   8:     if (color == "green") colors.Remove("green");
   9: }

However, doing this will throw this exception: System.InvalidOperationException: Collection was modified; enumeration operation may not execute. It seems that the enumerator is invalid after the collection has been modified.

There are several ways to work around this behavior.

1 Store the items to be removed in a temporary collection

This approach is really simple: instead of immedeately removing the items, we keep them in a separate collection. In the end, after the original list has been enumerated, we remove all these items.

Code example:

   1: List<string> colors = new List<string>();
   2: colors.Add("red");
   3: colors.Add("green");
   4: colors.Add("blue");
   5:  
   6: List<string> itemsToRemove = new List<string>();
   7: foreach (string color in colors)
   8: {
   9:     if (color == "green") itemsToRemove.Add("green");
  10: }
  11:  
  12: foreach (string itemToRemove in itemsToRemove) colors.Remove(itemToRemove);

In most situations this approach works very well. Only in very large collections, the itemsToRemove collection may become very big and consume a lot of memory. In my experience this is seldom the case.

2 Iterate the collection from back to front

This alternative uses a for-loop to iterate the collection from the back to the front.

Code example:

   1: List<string> colors = new List<string>();
   2: colors.Add("red");
   3: colors.Add("green");
   4: colors.Add("blue");
   5:  
   6: for (int counter = colors.Count - 1; counter >= 0; counter--)
   7: {
   8:     if (colors[counter] == "green") colors.RemoveAt(counter);
   9: }

Here we do not need a temporary collection. It is also slightly more efficient because we can remove the item based on its index. This means that the collection does not have to search for the item that has to be removed (as was the case in the first approach).

3 Using a predicate

The third alternative is only available from .NET 2 onwards. The List-collection implements a RemoveAll() method that takes a predicate as a parameter. Simply put, a predicate is a delegate that is used for filtering. In this case, the predicate returns true for each item that needs to be removed.

In this code example, the predicate is implemented as an anonymous method:

   1: List<string> colors = new List<string>();
   2: colors.Add("red");
   3: colors.Add("green");
   4: colors.Add("blue");
   5:  
   6: colors.RemoveAll(delegate(string color)
   7: {
   8:     return color == "green";
   9: });

This approach has some benefits:

  • It is self-documenting.
  • We don’t have to worry about the performance, we can assume that the RemoveAll() method is implemented in the most efficient way. If the internals of the collection-class would later change, the RemoveAll() method will likely be changed as well to keep the best performance.

5 comments:

Anonymous said...

I like to add yet another alternative. The functional approach: Here we don't alter the original collection, but create a complete new one.

This code shows a LINQ query and works with C# 3.0:
colors = (
from color in colors
where color != "green"
select color
).ToList();

But even cleaner is it when we use the List{T}.FindAll method. This next example works with .NET 2.0 and C# 2.0.

colors = colors.FindAll(s => s != "green");

Anonymous said...

I'm sorry. Of course lambda expressions only work with C# 3.0. For C# 2.0 the syntax would be like this:

colors = colors.FindAll(delegate(string color) { return color != "green"; });

Anonymous said...

The first method is great to Remove Collection which is in a Collection.

For example i have two Collections

List:Page Pages // Id,PageId,PageName

List:SecuredPage SecurePages // Id,PageId,RoleId

I want to remove the each Page object inside Pages collection where Pages[x].PageId == SecurePages[x].PageId

If you directly loop through Pages and SecurePages(nested) and remove the item x.PageId == y.PageId you get an error about "collection changed, it is cancelled"

so the first method is really a nice tactic.

jtomasko said...

Sorry this is not about the thread, but I just wanted to thank you for such a useful and well written collection of "easily consumable" point of interest that I care about.

Great work Kristof!

www.oracledba.in said...

i like this