Friday, June 4, 2010

I just got back from India...

and that pretty much sums it up.

I'm reading Three Cups of Tea.

It's the first book I've read in a really long time, which makes me both sad and happy about my current state of affairs.

C# Dictionary, Hashtable, and HashSet

C# Data Structures: Dictionary, Hashtable, Set

Dictionary<>

The C# Dictionary is an extremely useful data structure because it lets the programmer choose the index keys. What does that mean? Well, an ArrayList automatically makes its "keys" integers that go up one by one (0, 1, 2, etc.), so to access a value in an ArrayList one writes: myArrayList[2];

So what the C# Dictionary data structure does is let us specify the keys, which can be any type of object. For example:


Dictionary<string, int> myDictionary = new Dictionary<string, int>();
myDictionary.Add("one", 1);
myDictionary.Add("twenty", 20);


Retrieving a value is pretty straightforward:


int myInt = myDictionary["one"];


Notice how convenient the Dictionary data structure is, in that there is no need to cast between types. Also there is nothing stopping you from creating a Dictionary like so:


Dictionary<int, Dictionary<string, int>> nestedDictionary =
    new Dictionary<int, Dictionary<string, int>>();


That is a nested Dictionary C# data structure and it is fair game.
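
To make the nested case a little more concrete, here is a minimal sketch of filling and reading such a structure. The keys 2010 and "posts" and the value 12 are made up purely for illustration; the point is only that the inner Dictionary has to be created before anything can be stored in it.

```csharp
using System;
using System.Collections.Generic;

class NestedDictionaryExample
{
    static void Main()
    {
        Dictionary<int, Dictionary<string, int>> nestedDictionary =
            new Dictionary<int, Dictionary<string, int>>();

        // The inner dictionary must exist before values are added to it.
        // (2010, "posts" and 12 are hypothetical sample data.)
        nestedDictionary.Add(2010, new Dictionary<string, int>());
        nestedDictionary[2010].Add("posts", 12);

        // Two indexers in a row: outer key first, then inner key.
        int posts = nestedDictionary[2010]["posts"];
        Console.WriteLine(posts); // prints 12
    }
}
```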

I understand that it can be confusing to work out how to get all the values out of a Dictionary data structure, since we have no way of knowing the pattern in the keys. Luckily we don't have to. Here is the code to traverse a C# Dictionary:


// The List's element type must match the Dictionary's key type
List<string> keyList = new List<string>(myDictionary.Keys);
for (int i = 0; i < keyList.Count; i++)
{
    int myInt = myDictionary[keyList[i]];
}
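
Copying the keys into a List works, but it is probably not the only way to go. As a sketch of an alternative, a Dictionary can also be walked directly with foreach, which hands back each entry as a KeyValuePair. The example also shows TryGetValue, which is handy when you are not sure a key exists; the key "three" is just a made-up missing key.

```csharp
using System;
using System.Collections.Generic;

class TraverseDictionaryExample
{
    static void Main()
    {
        Dictionary<string, int> myDictionary = new Dictionary<string, int>();
        myDictionary.Add("one", 1);
        myDictionary.Add("twenty", 20);

        // Each entry comes back as a KeyValuePair, so no separate key list is needed.
        foreach (KeyValuePair<string, int> entry in myDictionary)
        {
            Console.WriteLine(entry.Key + " = " + entry.Value);
        }

        // The indexer throws KeyNotFoundException for a missing key;
        // TryGetValue reports the miss through its return value instead.
        int value;
        if (!myDictionary.TryGetValue("three", out value))
        {
            Console.WriteLine("no entry for \"three\"");
        }
    }
}
```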


Hashtable

The C# Hashtable data structure is very much like the Dictionary data structure. A Hashtable also takes in key/value pairs, but it stores both keys and values as plain object references rather than as typed data, so values have to be cast back to their real type when you read them.

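Values are stored according to their key's hash code, meaning that the order in which items are added to a C# Hashtable is not preserved. The same is true of the Dictionary data structure: it is also a hash table internally, and the order in which it returns items is not guaranteed. (If you need the keys kept in sorted order, use SortedDictionary<> instead.)

The real difference is type safety and speed. Because a Hashtable stores everything as object, value types such as int are boxed on the way in and must be cast on the way out, so a Dictionary is usually the faster of the two for value types.

(For those Java programmers, a Dictionary is more or less a HashMap, a SortedDictionary is a TreeMap, and the C# Hashtable resembles Java's legacy Hashtable class.)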


Hashtable myTable = new Hashtable();
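
Here is a small sketch of what working with a Hashtable looks like in practice, mainly to show the casting. The keys and values are arbitrary sample data.

```csharp
using System;
using System.Collections;

class HashtableExample
{
    static void Main()
    {
        Hashtable myTable = new Hashtable();
        myTable.Add("one", 1);   // the int is boxed into an object
        myTable.Add(2, "two");   // nothing stops you from mixing key and value types

        // The indexer returns object, so a cast is required.
        int one = (int)myTable["one"];
        string two = (string)myTable[2];
        Console.WriteLine(one + " " + two); // prints "1 two"

        // A missing key gives back null rather than throwing.
        Console.WriteLine(myTable["missing"] == null); // prints True
    }
}
```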


HashSet

The HashSet data structure was introduced in .NET Framework 3.5. This particular C# data structure strongly resembles the List<> data structure.

So what is the difference? A HashSet has the very important characteristic that it does not allow duplicate values. For example:


HashSet<int> mySet = new HashSet<int>();
mySet.Add(3);
mySet.Add(5);
mySet.Add(3);
mySet.Add(10);

// ToList() is a LINQ extension method, so this needs: using System.Linq;
List<int> myListFromSet = mySet.ToList<int>();
int myInt = myListFromSet[2];


If mySet were a regular List, index 2 would return the value 3 (count it out). But if you run the example you will see that myInt is actually 10, because the HashSet ignored the duplicate addition of the value 3. (Strictly speaking, a HashSet makes no promise about the order of its elements, so don't rely on positions like this in real code.)

You might wonder what the point of this is. After all, you could achieve the same behavior with a List data structure. Something like:


if (!myList.Contains(element))
    myList.Add(element);


The result is indeed the same. What is not apparent is the speed at which this happens. When an element is added to a HashSet, internally the same thing happens: the data structure makes sure the element doesn't already exist. However, a HashSet is not a simple array. It is built on a hash table, so the check takes roughly constant time, whereas List.Contains has to scan the elements one by one. That dramatically improves the performance of checking whether a new element is a duplicate.
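
As a quick sketch of the duplicate check in action: HashSet.Add returns a bool that says whether the value was actually inserted, so you can tell a duplicate from a new element without a separate Contains call.

```csharp
using System;
using System.Collections.Generic;

class HashSetExample
{
    static void Main()
    {
        HashSet<int> mySet = new HashSet<int>();

        // Add returns true when the value was inserted,
        // false when it was already in the set.
        Console.WriteLine(mySet.Add(3));      // True
        Console.WriteLine(mySet.Add(5));      // True
        Console.WriteLine(mySet.Add(3));      // False (duplicate ignored)

        Console.WriteLine(mySet.Count);       // 2
        Console.WriteLine(mySet.Contains(5)); // True
    }
}
```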