[Wednesday, August 19, 2009]

Rethrowing an Exception without resetting the Stack Trace.

Source code: http://www.box.net/shared/kjkgq36itq

Exception handling in .NET is a complicated subject that always spawns all kinds of debates. I won't post my opinion here, but I do want to point out a subtle yet important difference between the two ways you can re-throw an exception:



try
{
    DoSomeExceptionThrowingMethod();
}
catch (Exception ex)
{
    throw ex;
}


try
{
    DoSomeExceptionThrowingMethod();
}
catch (Exception ex)
{
    throw;
}


As you can see, the only difference is in the catch block. In the first case I'm using throw ex, while in the second case I'm simply using throw. What's the difference?

Well, a bare throw effectively means "rethrow": re-throw the exception you just caught, exactly as it was. With throw ex you're throwing the same exception object, but the runtime treats that line as a brand new throw point, so the stack trace recorded on the exception starts over from there. That's easier to see than to explain, so let's whip up a code sample:

First, we'll have a class that has one method which throws an exception:



public class ExceptionThrower
{
    public void InvalidMethod()
    {
        throw new InvalidOperationException("This method is invalid.");
    }
}


Now, we'll add a few layers of method calling to this demo so we can make the actual stack trace a little bigger (I'll explain more soon):



public class Layer1
{
    private ExceptionThrower thrower;

    public Layer1()
    {
        this.thrower = new ExceptionThrower();
    }

    public void Layer1Method()
    {
        this.thrower.InvalidMethod();
    }
}


That's one layer of method calls. Now we'll add one more:



public class Layer2
{
    private Layer1 layer1;

    public Layer2()
    {
        this.layer1 = new Layer1();
    }

    public void Layer2Method()
    {
        this.layer1.Layer1Method();
    }
}


These classes aren't doing much more than just wrapping the method calls from the object they have internally.

Now, let's code up our main method:



class Program
{
    static void Main(string[] args)
    {
        try
        {
            Console.WriteLine("Calling KeepStackTrace");
            KeepStackTrace();
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.StackTrace);
        }

        try
        {
            Console.WriteLine("Calling ResetStackTrace");
            ResetStackTrace();
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.StackTrace);
        }

        Console.ReadKey(true);
    }

    private static void KeepStackTrace()
    {
        Layer2 l2 = new Layer2();
        try
        {
            l2.Layer2Method();
        }
        catch (InvalidOperationException ex)
        {
            throw;
        }
    }

    private static void ResetStackTrace()
    {
        Layer2 l2 = new Layer2();
        try
        {
            l2.Layer2Method();
        }
        catch (InvalidOperationException ex)
        {
            throw ex;
        }
    }
}


So basically we have two methods that call into our Layer2 class. One wraps the method in a try / catch but just calls throw, while the other calls throw ex. In the main method, we output the Stack Trace of the exception. Let's examine the output:

Calling KeepStackTrace

at ExceptionReThrow.ExceptionThrower.InvalidMethod() in C:\Users\Alex\Documents\Visual Studio 2008\Projects\ExceptionReThrow\ExceptionReThrow\ExceptionThrower.cs:line 12
at ExceptionReThrow.Layer1.Layer1Method() in C:\Users\Alex\Documents\Visual Studio 2008\Projects\ExceptionReThrow\ExceptionReThrow\Layer1.cs:line 19
at ExceptionReThrow.Layer2.Layer2Method() in C:\Users\Alex\Documents\Visual Studio 2008\Projects\ExceptionReThrow\ExceptionReThrow\Layer2.cs:line 19
at ExceptionReThrow.Program.KeepStackTrace() in C:\Users\Alex\Documents\Visual Studio 2008\Projects\ExceptionReThrow\ExceptionReThrow\Program.cs:line 48
at ExceptionReThrow.Program.Main(String[] args) in C:\Users\Alex\Documents\Visual Studio 2008\Projects\ExceptionReThrow\ExceptionReThrow\Program.cs:line 19

Calling ResetStackTrace

at ExceptionReThrow.Program.ResetStackTrace() in C:\Users\Alex\Documents\Visual Studio 2008\Projects\ExceptionReThrow\ExceptionReThrow\Program.cs:line 61
at ExceptionReThrow.Program.Main(String[] args) in C:\Users\Alex\Documents\Visual Studio 2008\Projects\ExceptionReThrow\ExceptionReThrow\Program.cs:line 29


As you can tell, when calling KeepStackTrace, which simply did "throw", the entire stack trace with all the layers is kept, and we can follow it all the way down to where the exception originated.

When calling ResetStackTrace though, the method where we do "throw ex", the trace only goes down as far as our ResetStackTrace method. We don't see anything past that, even though that's not really where the exception originated.

Bottom line, for the most part, this isn't all that relevant because you really shouldn't be catching exceptions if you don't plan on doing anything with them. However, if you do want to at least log an exception but let it bubble up, be sure to use "throw" so that when the exception does finally bubble to the top, you have the entire stack trace.
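
The "log it, but let it bubble up" case could look roughly like the sketch below. This is not part of the downloadable sample: RethrowSketch, RunAndLog and RunAndWrap are made-up names, and the log delegate is just a stand-in for whatever logging you actually use. The second method shows the other common option, wrapping the original exception as an InnerException so its stack trace travels along with the new one.

```csharp
using System;

public static class RethrowSketch
{
    // Hypothetical helper: runs an action, logs any failure, then lets it bubble up.
    public static void RunAndLog(Action action, Action<string> log)
    {
        try
        {
            action();
        }
        catch (Exception ex)
        {
            log(ex.GetType().Name + ": " + ex.Message);

            // A bare "throw" keeps the original stack trace; "throw ex" would reset it here.
            throw;
        }
    }

    // Hypothetical helper: adds context by wrapping instead of rethrowing.
    public static void RunAndWrap(Action action)
    {
        try
        {
            action();
        }
        catch (Exception ex)
        {
            // The original exception (and its stack trace) stays reachable via InnerException.
            throw new InvalidOperationException("The operation failed.", ex);
        }
    }
}
```

With RunAndLog the caller still catches the original exception type; with RunAndWrap the caller catches the wrapper and can inspect InnerException for the original details.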

[Thursday, July 30, 2009]

Passing objects using the ref keyword...Wait, aren't objects *always* passed by reference???

For the past month I've been on the job market and have been exposed to all kinds of technical questions. Most of them were run of the mill, but every once in a while I'd get asked a question that made me stop and think. Perfect for a guy with a blog, because now you have tons more stuff to blog about, right?

P.S. For those interested, I accepted an offer earlier this week at a company called BIA. They are a computer forensics firm and I'm real excited to start!

One question that I really liked is the point of this blog post. The interviewer actually informed me that I was only one out of one hundred that got this question right. Not believing that these numbers were true, I asked everyone on my team at my old job, and only one of them sort of half knew. The others were completely stumped. I guess it is something that many developers just gloss over.

OK, so what is this question already?? Well it goes something like this: "What's the difference between passing an object to a method the standard way, and passing an object to a method using the ref keyword?"

Let's break it down. Let's first talk about what the ref keyword does when it comes to value types. Simple example:



static void Main(string[] args)
{
    int number = 10;

    Add5(number);
    Console.WriteLine(number);

    Add5(ref number);
    Console.WriteLine(number);

    Console.ReadKey(true);
}

public static void Add5(int x)
{
    x += 5;
}

public static void Add5(ref int x)
{
    x += 5;
}


Before trying to run this, see if you can guess what the output would be.....


Answer: The first Console.WriteLine outputs 10 and the second one outputs 15. Why? Well, an int is a "value type", so the variable holds the value itself, and by default arguments are passed by value. Meaning, the actual variable isn't passed to the method; a copy of its value is. Therefore, when calling the first method, adding 5 to the parameter does NOT affect the variable in the main method, because we only added 5 to a COPY of number, not to number itself.

In the second method call though, we're passing the variable using the ref keyword. This changes the behavior and actually DOES pass the actual variable itself to the method. Therefore, in the second method, when adding 5 to the value being passed in, you're actually messing with the same exact variable that's in the main method. Therefore, it outputs 15 showing the changes DID take effect.

So now we have an understanding of how value types work, and how ref changes the behavior. Let's talk about objects now, which are passed by reference. I'll start with another example:

First a simple Person class:



public class Person
{
    public Person(string name, int age)
    {
        this.Name = name;
        this.Age = age;
    }

    public override string ToString()
    {
        return String.Format("{0} is {1} years old.", this.Name, this.Age);
    }

    public string Name { get; set; }
    public int Age { get; set; }
}


Simple class with two properties. Name and Age. Also, the ToString is overridden to make it easier to demonstrate.

Now, let's say we had something like this in our Program.cs:



static void Main(string[] args)
{
    var alex = new Person("Alex", 27);

    ChangePerson(alex);
    Console.WriteLine(alex);

    Console.ReadKey(true);
}

public static void ChangePerson(Person p)
{
    p.Age += 5;
}


Try guessing what the output would be for this program.

Answer: Alex is 32 years old.

If you're paying attention you'll notice that this is different than how it was with value types. The change in the method DID affect the one in the main method! That's because objects are passed by reference, meaning that a reference (pointer) to the SAME object is being passed to the method. So in the method itself, you still have a reference to the same object that exists in the main method. Therefore, when you make changes, it does show up back in the main.

So back to the original question: If all objects are passed by reference, what's the point of the ref keyword when passing an object to a method??

So here's the deal. Let's talk in terms of the stack and the heap. In the current version of the CLR, local variables of value types typically live on the stack, and objects (instances of reference types) live on the heap. However, and this is key, for reference types, a *pointer* to that object is ALSO stored on the stack! OK, so why is this so important? Well, I sort of misspoke earlier when I said objects are passed by reference. I didn't give you the whole picture. What's actually happening is that a copy of the pointer is being passed to the method. So while the method and the main are referencing the same object in memory, the POINTER variable on the stack is NOT the same. HOWEVER, when using the ref keyword, the pointer variable itself is passed to the method, not a copy of it! So when you're inside the method itself, you're dealing with the exact same pointer variable that's in the main.

The only way to explain is with an example:



static void Main(string[] args)
{
    var alex = new Person("Alex", 27);

    ChangePerson(alex);
    Console.WriteLine(alex);

    ChangePerson(ref alex);
    Console.WriteLine(alex);

    Console.ReadKey(true);
}

public static void ChangePerson(Person p)
{
    p = new Person("Alex", 35);
}

public static void ChangePerson(ref Person p)
{
    p = new Person("Alex", 45);
}


Here we have two methods that look the same. The only difference is the ref keyword. Each method assigns a NEW Person object to p (the pointer) that was passed in. However, if you run this program you'll see that the first Console.WriteLine still outputs 27 years old, even though we assigned p a Person object that's 35 years old! That's because the pointer itself was passed by VALUE, so when you assign a new Person object, you're assigning it to the method's own copy of the pointer, not to the pointer variable in the main.

In the second case however, since we're using the ref keyword, the pointer in the method is the SAME one that's in the main method. Therefore, the second Console.WriteLine outputs 45 years old, because the pointer in the main is now pointing to the object that was assigned to it in the method.

Personally, I haven't used this in production code yet, but if you understand this, then you understand the nitty gritty details of how parameters are passed around. Very impressive on interviews :-)
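
One place where ref on a reference type does earn its keep is a swap helper, since the whole point there is to change which object each of the caller's variables points to. The sketch below is only an illustration; RefSketch and Swap are made-up names, not something from the interview or the samples above.

```csharp
using System;

public static class RefSketch
{
    // Swaps what the caller's two variables point to.
    // Without ref, only the method's local copies of the pointers would be swapped.
    public static void Swap<T>(ref T first, ref T second)
    {
        T temp = first;
        first = second;
        second = temp;
    }
}
```

Calling RefSketch.Swap(ref a, ref b) with two string variables leaves a holding what b held before and vice versa; drop the ref keywords from the signature and the caller's variables would be untouched.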

[Tuesday, July 14, 2009]

Using jQuery to post a Form with ASP.NET MVC with AJAX

Source code: http://www.box.net/shared/2to3vfajqp In order for this sample to work on your machine, you need to have the Northwind database, and you need to configure the connection string in the HomeController.

When you install ASP.NET MVC, you'll notice that when you create a new project, the latest jQuery libraries get added for you as well. For those of you who don't know what jQuery is, think of it as a layer of abstraction for common tasks you would do with javascript. Instead of having to rewrite tons of javascript to let's say, do some animation, or post to a server with AJAX, jQuery makes it all extremely simple. I've found that there aren't that many tutorials online for getting started with jQuery and AJAX when using ASP.NET MVC, so I figured I'll share what I've learned so far in the hopes that maybe others can get some insight.

The premise here will be simple. We'll be using the Northwind database (specifically the Products table) to display a list of Products and some of their attributes. Then, there will be a textbox on top of the list. When the user enters some text into the textbox, it will post back to the server via AJAX and find any products that match what the user entered. Here's what it will look like:




So first I started with a simple ASP.NET MVC application. I'll be using the standard project for this tutorial. I then added a NorthwindDataContext to the Models folder with only the Products table from the Northwind database. Then, I added a repository that will help us retrieve the items from the database. Here's the code for the repository:



using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

namespace MVCJqueryDemo.Models
{
public class NorthwindRepository : IDisposable
{
#region Members

private NorthwindDataContext dataContext;

#endregion

#region Constructors

public NorthwindRepository()
: this(null)
{
}

public NorthwindRepository(string connectionString)
{
dataContext = String.IsNullOrEmpty(connectionString) ? new NorthwindDataContext()
: new NorthwindDataContext(connectionString);
}

#endregion

#region Methods

public IEnumerable<Product> GetAllProducts()
{
return this.dataContext.Products.ToList();
}

public IEnumerable<Product> GetProductsByName(string name)
{
return this.dataContext.Products.Where(p => p.ProductName.Contains(name)).ToList();
}

#endregion

#region IDisposable

public void Dispose()
{
this.dataContext.Dispose();
}

#endregion
}
}


So this class will help us retrieve what we need from the db. Now, over in the Home controller, I've added two actions:



public class HomeController : Controller
{
#region Members

//CHANGE THIS CONNECTION STRING IF YOUR NORTHWIND IS IN A DIFFERENT LOCATION!!!!
private const string CONNECTIONSTRING = "Data Source=.;Initial Catalog=Northwind;Integrated Security=True";

#endregion

public ActionResult Products()
{
using (var repository = new NorthwindRepository(CONNECTIONSTRING))
{
var allProducts = repository.GetAllProducts();
return View(allProducts);
}
}

[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Search(string name)
{
using (var repository = new NorthwindRepository(CONNECTIONSTRING))
{
var result = repository.GetProductsByName(name);
return View("ProductsPartial", result);
}
}
}


So there are two methods here. One will be ../Home/Products and the other will be a URL we'll post to: ../Home/Search.

The first one is straightforward. It just hits the repository for all the products, and passes them on to the View. We can see from this that there's a Products View. Here's the code for the Products View:



<%@ Page Title="" Language="C#" MasterPageFile="~/Views/Shared/Site.Master" Inherits="System.Web.Mvc.ViewPage<IEnumerable<Product>>" %>

<%@ Import Namespace="MVCJqueryDemo.Models" %>

<asp:Content ID="Content1" ContentPlaceHolderID="TitleContent" runat="server">

Products
</asp:Content>
<asp:Content ID="Content2" ContentPlaceHolderID="MainContent" runat="server">
<h2>
Products</h2>
<form id="searchForm" action="javascript:void(0);">
<input type="text" name="name" id="searchBox" />
</form>
<div id="products" class="productsDiv">
<%Html.RenderPartial("ProductsPartial", this.Model); %>
</div>
</asp:Content>


It has a form with a textbox, and then it has a div where we call RenderPartial to render a partial view called: ProductsPartial. Here's the code for the ProductsPartial.ascx:



<%@ Control Language="C#" Inherits="System.Web.Mvc.ViewUserControl<IEnumerable<Product>>" %>
<%@ Import Namespace="MVCJqueryDemo.Models" %>


<table id="productsTable">
<tr>
<th>Product ID</th>
<th>Product Name</th>
<th>Units in Stock</th>
<th>Unit Price</th>
<th>Being Produced</th>
<th>Units on Order</th>
</tr>
<%foreach (var product in this.Model)%>
<%{%>
<tr>
<td><%=product.ProductID %></td>
<td><%=product.ProductName %></td>
<td><%=product.UnitsInStock %></td>
<td><%=product.UnitPrice.Value.ToString("$#0.00")%></td>
<td><img class="inStockImages" src="<%=product.Discontinued ? "../../Content/x.png" : "../../Content/check.png" %>" /></td>
<td><%=product.UnitsOnOrder %></td>
</tr>
<%}%>
</table>


So basically, what's happening is this. When you go to ../Home/Products, the Products Action gets called on the Home controller. Then, we get a list of Products from the database, and pass that on to the Products View. The Products View then passes that on to the ProductsPartial which actually renders the products in a nice HTML table.

At this point, we haven't done anything fancy yet. If we were to run it now, you'd see a list of all the products displayed. If you were to type anything in the textbox, nothing would happen. Here's where we want to start using some AJAX. The idea is that whenever the keyup event is triggered in the textbox, we'll fire off an AJAX call to the server, and display the results. So first, let's look at the second Action in the Home Controller:



[AcceptVerbs(HttpVerbs.Post)]
public ActionResult Search(string name)
{
    using (var repository = new NorthwindRepository(CONNECTIONSTRING))
    {
        var result = repository.GetProductsByName(name);
        return View("ProductsPartial", result);
    }
}


As you can see, this Action only accepts HTTP POST requests. Again, we call our repository, and get back a list of Products that match the search criteria. We then call return View to display the ProductsPartial, and we pass in the list of products.

That's all very nice, but how do we call this method? How do we hook up an event to our textbox to trigger this method to be called? This is where we'll use jQuery to make the AJAX call. First, in the head section of your master page, you need to add this line (include either the full or the minified build of jQuery, not both):



<script src="../../Scripts/jquery-1.3.2.min.js" type="text/javascript"></script>


This will include the jQuery library in your page. Then, I've added another file called ProductScripts.js into the Scripts folder. Then, I added this line to the head of my page:



<script src="../../Scripts/ProductScripts.js" type="text/javascript"></script>


Here's what the ProductScripts.js file looks like:



$(document).ready(function() {
    $("#searchBox").keyup(function(item) {
        var textValue = $("#searchBox")[0].value;
        var form = $("#searchForm").serialize();
        $.post("/Home/Search", form, function(returnHtml) {
            $("#products").html(returnHtml);
        });
    });
});


Looks a little weird at first, but I'll try to explain. First we call $(document).ready(..). In here is where we hook up all of our jQuery events. This ready function gets called as soon as the DOM is loaded. Then, we get a reference to the searchBox using $("#searchBox"). This is roughly the equivalent of document.getElementById(..) in plain JavaScript, except that it gives you back a jQuery object wrapping the element. We then hook into the keyup event and whenever that event is triggered we call this function:



var textValue = $("#searchBox")[0].value;
var form = $("#searchForm").serialize();
$.post("/Home/Search", form, function(returnHtml) {
    $("#products").html(returnHtml);
});


First, we get the text that was typed into the textbox (textValue isn't actually used after that, since the serialized form already carries the text). Then, we get the entire form (which in this case is just the textbox itself). We then serialize the form and call the post method. This is where the actual AJAX call is happening. The parameters that are passed into the post method are as follows:

URL : in our case it's /Home/Search
Data: in our case it's the serialized form, which will then get sent as a parameter to our Search Action on the Home Controller
Callback: in our case it's a function that we use to set the inner HTML of the products div. Remember, the Search Action in our controller renders the ProductsPartial View. So basically it sends HTML back to the browser as a result of the AJAX request. We take that HTML, and stick it into the products div.

The full source code is available at the link posted at the top, download it and mess with it. It's actually real simple, and real powerful.

In this post I only demonstrated how to send back HTML. Another very common way of sending data is through JSON. I'll cover that in another post. ASP.NET MVC makes it EXTREMELY easy to send JSON across the wire.

[Monday, June 29, 2009]

Factory Pattern with Attributes. Get rid of the ugly switch / case.

Source Code: http://www.box.net/shared/2yh2r6l91c

A very common design pattern used in Object Oriented Programming is the Factory Pattern. The purpose of this post isn't to explain the factory pattern or why it's useful. Rather, I want to show a simple way to eliminate the giant switch / case found in many factory pattern implementations. For some good reading on the Factory Pattern, I suggest these two articles:

Wikipedia : http://en.wikipedia.org/wiki/Factory_method_pattern

MSDN : http://msdn.microsoft.com/en-us/library/ms954600.aspx

To demonstrate a simple example of the Factory Pattern, I've created a few classes. First, I created a base Vehicle class that looks something like this:



public abstract class Vehicle
{
public virtual int TopSpeed
{
get
{
return 150;
}
}

public abstract int Wheels
{
get;
}

public override string ToString()
{
return String.Format("A {0} has {1} wheels, and a top speed of {2} MPH."
, this.GetType().Name, this.Wheels, this.TopSpeed);
}
}


Just a base class that has one virtual property, one abstract property, and it overrides ToString. Then, I've created 4 subclasses:



public class Car : Vehicle
{
public override int Wheels
{
get { return 4; }
}
}

public class SuperCar : Car
{
public override int TopSpeed
{
get
{
return 200;
}
}
}

public class Truck : Vehicle
{
public override int Wheels
{
get { return 18; }
}
}
public class Motorcycle : Vehicle
{
public override int Wheels
{
get { return 2; }
}

public override int TopSpeed
{
get
{
return 190;
}
}
}


So we have Vehicle, Car, SuperCar, Truck and Motorcycle. Now, say we wanted to create a Factory that returns us the correct Vehicle class based on an enum that we'd supply. So let's create an enum:



public enum VehicleType
{
Car,
SuperCar,
Truck,
Motorcyle
}


We'd then have a method that looks something like this:



public static Vehicle GetVehicle(VehicleType vehicle)
{
    switch (vehicle)
    {
        case VehicleType.Car:
            return new Car();
        case VehicleType.SuperCar:
            return new SuperCar();
        case VehicleType.Truck:
            return new Truck();
        case VehicleType.Motorcyle:
            return new Motorcycle();
        default:
            return null;
    }
}

Well, this is all nice, and works fine, but imagine a scenario where you have many, many subclasses. This switch statement would quickly get huge and unmaintainable. Then imagine that new subclasses come along: you'd have to first update the enum, and then remember to update this switch / case. Well, I think there's a better way to do this.

The premise is simple; create an attribute that has one property called Type. This attribute will go on the enum, and will represent which type should be instantiated for each value of the enum. So, let's first create the Attribute class:



public class VehicleInfoAttribute : Attribute
{
    private Type type;

    public VehicleInfoAttribute(Type type)
    {
        this.type = type;
    }

    public Type Type
    {
        get
        {
            return this.type;
        }
    }
}


Nothing fancy, just a simple attribute that will house the Type to be created. Now, let's go back to our enum, and decorate the values with the correct attributes:



public enum VehicleType
{
    [VehicleInfo(typeof(Car))]
    Car,

    [VehicleInfo(typeof(SuperCar))]
    SuperCar,

    [VehicleInfo(typeof(Truck))]
    Truck,

    [VehicleInfo(typeof(Motorcycle))]
    Motorcyle
}


Now, each enum value has an attribute that tells us which type to instantiate for that value. Now, the fun part: the reflection bit in the factory method itself.

First, I've created an extension method for enums that helps with getting custom attributes off of enum values:



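using System;
using System.Reflection;

public static class Extensions
{
    public static T GetAttribute<T>(this Enum enumValue)
        where T : Attribute
    {
        // GetField returns null when the value has no named member (e.g. an undefined cast value).
        FieldInfo field = enumValue.GetType().GetField(enumValue.ToString());
        if (field == null)
        {
            return null;
        }

        object[] attribs = field.GetCustomAttributes(typeof(T), false);

        // Return the first matching attribute, or null when there is none.
        return attribs.Length > 0 ? attribs[0] as T : null;
    }
}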


This allows you to do something like this:

MyCustomAttribute a = myEnumValue.GetAttribute<MyCustomAttribute>();

Now, we have all the pieces in place to write the Factory Method:



public static Vehicle GetVehicle(VehicleType vehicle)
{
    var vehicleAttribute = vehicle.GetAttribute<VehicleInfoAttribute>();
    if (vehicleAttribute == null)
    {
        return null;
    }

    var type = vehicleAttribute.Type;
    Vehicle result = Activator.CreateInstance(type) as Vehicle;

    return result;
}


First we call our extension method to get the attribute value for the enum passed in. Then, we use the handy Activator.CreateInstance() to create an object of that type.

To test it out, we can write a quick app:



static void Main()
{
    Vehicle v = VehicleFactory.GetVehicle(VehicleType.Truck);
    Console.WriteLine(v);
}


This will output:

A Truck has 18 wheels, and a top speed of 150 MPH.

This approach yields two benefits. First, you no longer have a giant ugly switch / case. Secondly, if another subclass is ever added, you just add another enum value (which you'd have to do anyway if you were using the switch / case), slap on the attribute, and you're done. The Factory method doesn't need to change at all.
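
One thing the switch version gave you for free was a visible default case. With attributes, a forgotten or wrong attribute only shows up at runtime, so a stricter factory may be worth considering. The sketch below is a cut-down variation, not the code from the download: StrictVehicleFactory, CreateVehicle and the Unmapped enum value are made up for illustration, and the Vehicle classes are trimmed to the bare minimum so the block stands on its own.

```csharp
using System;
using System.Reflection;

public abstract class Vehicle { public abstract int Wheels { get; } }
public class Car : Vehicle { public override int Wheels { get { return 4; } } }
public class Truck : Vehicle { public override int Wheels { get { return 18; } } }

[AttributeUsage(AttributeTargets.Field)]
public class VehicleInfoAttribute : Attribute
{
    public VehicleInfoAttribute(Type type) { Type = type; }
    public Type Type { get; private set; }
}

public enum VehicleType
{
    [VehicleInfo(typeof(Car))] Car,
    [VehicleInfo(typeof(Truck))] Truck,
    Unmapped // hypothetical value, deliberately left without an attribute
}

public static class StrictVehicleFactory
{
    // Hypothetical variation: throws instead of returning null when the mapping is broken.
    public static Vehicle CreateVehicle(VehicleType vehicle)
    {
        FieldInfo field = typeof(VehicleType).GetField(vehicle.ToString());
        var attribute = field == null
            ? null
            : (VehicleInfoAttribute)Attribute.GetCustomAttribute(field, typeof(VehicleInfoAttribute));

        if (attribute == null)
        {
            throw new ArgumentException("No VehicleInfo attribute found for " + vehicle + ".", "vehicle");
        }

        if (!typeof(Vehicle).IsAssignableFrom(attribute.Type))
        {
            throw new InvalidOperationException(attribute.Type.Name + " is not a Vehicle.");
        }

        return (Vehicle)Activator.CreateInstance(attribute.Type);
    }
}
```

Whether to throw or return null is a judgment call; throwing just moves the failure closer to the enum value that was left undecorated.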

[Tuesday, June 23, 2009]

LINQ Overkill. Why? Ha! Why not??

Source Code: http://www.box.net/shared/tnt1569ql9

Before going to SetFocus, we had to take a few tests before being accepted. One of these tests was a simple coding exercise, which went something like this:

Write a program that accepts a string input from the user. Your program will then count the occurrence of each character in the string, and output the results in this format:

There are 0 A's.
There are 2 B's.
There are 1 C's.
...
...


Simple program really, with a few easy approaches. I personally used a Hashtable to hold the chars and the integers. After going through SetFocus, and learning about generics, I rewrote the app to use a Dictionary<char,int>, and the app basically looked something like this:



private static void CountCharacters(string text)
{
    Dictionary<char,int> chars = new Dictionary<char, int>();
    for (int i = 65; i <= 90; i++) //character integer values for 'A' - 'Z'
    {
        chars.Add((char)i, 0);
    }

    foreach (var character in text.ToUpperInvariant())
    {
        if (chars.ContainsKey(character))
        {
            chars[character]++;
        }
    }

    foreach (var kvp in chars)
    {
        Console.WriteLine("There are {0} {1}'s.", kvp.Value, kvp.Key);
    }
}


Basically, populate the dictionary with the alphabet in upper case, and initialize all the ints to zero. Then, loop through the text, and increment the corresponding integer. Finally, loop through the dictionary, and output the results.

Very nice. Recently however, I was bored one night, and it hit me. Having used LINQ extensively for quite a while now, I figured I could surely find a way to do this in a "LINQ one liner", i.e. one gigantic impossible-to-understand LINQ statement. Why? Why not!!

So, here's what I came up with. I'm warning you, it isn't pretty, but it works. I'll try my best to explain after:



public static StringBuilder CountCharsAlexLinqMethod(string text)
{
    // Format is the "There are {0} {1}'s.\n" constant declared in the full class further down.
    var result = new StringBuilder();
    foreach (var item in (from p in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                          join pa in
                              (from p in text.ToUpperInvariant()
                               group p by p into g
                               select new { Count = g.Count(), Character = g.Key })
                          on p equals pa.Character into w
                          from a in w.DefaultIfEmpty()
                          select new
                          {
                              Count = a == null ? 0 : a.Count,
                              Character = p
                          }))
    {
        result.AppendFormat(Format, item.Count, item.Character);
    }

    return result;
}


Told you it wasn't pretty :) Ok, it looks worse than it is. The basic premise is as follows. First, let's analyze the inner query:


(from p in text.ToUpperInvariant()
 group p by p into g
 select new { Count = g.Count(), Character = g.Key })


Think of this as a "GROUP BY" in SQL. It basically groups all the letters together in the given text, and counts them up. So at this point, if our text for example was the word "Hello", then this collection of anonymous objects would look something like this:

{Character = 'H', Count = 1}
{Character = 'E', Count = 1}
{Character = 'L', Count = 2}
{Character = 'O', Count = 1}

Once we have that, we basically do the equivalent of a LEFT OUTER JOIN on the entire alphabet. Since string implements IEnumerable<char> we can query it just like any other IEnumerable and even join on it. So what's happening is, we're joining our results from the inner query earlier, to a collection that contains the entire alphabet. Wherever they join, meaning wherever the letter is found in the inner query, use that as the count, if not, use 0 as the count:


on p equals pa.Character into w
from a in w.DefaultIfEmpty()
select new
{
    Count = a == null ? 0 : a.Count,
    Character = p
}))
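
For comparison, the same left outer join can be sketched in method syntax with GroupJoin. This is only an illustration, not the code from the download: LeftJoinSketch and CountLetters are made-up names, and it returns a dictionary instead of formatted text so the result is easy to inspect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeftJoinSketch
{
    // Hypothetical method-syntax version of the "alphabet LEFT OUTER JOIN counts" idea.
    public static Dictionary<char, int> CountLetters(string text)
    {
        // Inner query: one entry per distinct character in the text, with its count.
        var counts = text.ToUpperInvariant()
            .GroupBy(c => c)
            .Select(g => new { Character = g.Key, Count = g.Count() });

        // GroupJoin keeps every letter of the alphabet, matched or not.
        // FirstOrDefault() yields 0 when a letter has no match in the inner query.
        return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            .GroupJoin(
                counts,
                letter => letter,
                count => count.Character,
                (letter, matches) => new
                {
                    Character = letter,
                    Count = matches.Select(m => m.Count).FirstOrDefault()
                })
            .ToDictionary(x => x.Character, x => x.Count);
    }
}
```

For the text "Hello" this gives 2 for 'L', 1 each for 'H', 'E' and 'O', and 0 for every other letter.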


Crazy, but not too bad. Well, the story isn't over. Since I thought this was kinda neat, and I HAD to show someone, I decided to bug James Arendt, who was one of my instructors at SetFocus, and I sent him my code. Here was his response:

Fun solution, Alex! I don't use the group join too often so it was neat to see it in action on this solution. I decided to take a stab at the code as well, but I took a different direction.




private static void CountCharacters(string text)
{
    // I could have used a literal, but I wanted to demonstrate some
    // other LINQ routines.
    var letters = Enumerable.Range('A', 26).Select(i => (char)i);
    var sourceChars = letters.Concat(text.ToUpperInvariant());

    var results = from c in sourceChars
                  group c by c into g
                  where char.IsLetter(g.Key)
                  orderby g.Key
                  select new { Char = g.Key, Count = g.Count() - 1 };

    foreach (var result in results)
    {
        Console.WriteLine("There are {0} {1}'s.", result.Count, result.Char);
    }
}


His approach is a little different. First, he builds up an enumerable with the alphabet without using literals (no real difference there). Then however, he tacks on the text to the end of that enumerable. So at this point, if our test text was "Hello", the enumerable would be:

'A','B','C','D'.....'Z','H','E','L','L','O'.

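Then, he does a group by on this enumerable, counting up the duplicates. Because every letter already appears once in the seeded alphabet, he subtracts 1 from each group's count, which also means letters that never show up in the text come out as 0.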

Nice. So, whenever you have more than one way of doing something, what does every computer nerd like me love to do? Why benchmark of course!!

In my initial comparisons between my method and the way James did it, his way was quicker every time. I wrote back to him, and congratulated him that his way was in fact quicker, but he pointed out to me that his way was only quicker with shorter strings. When the strings got really long, my way turned out to be faster. So, I decided to flesh this all out, and really benchmark all the ways of doing it.

Before I do that, just for the sake of really doing it in just one line, I wrote it one more way:



public static StringBuilder CountCharsAlexLinqCompleteOverkill(string text)
{
    var result = new StringBuilder();

    (from p in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
     join pa in
         (from p in text.ToUpperInvariant()
          group p by p into g
          select new { Count = g.Count(), Character = g.Key })
     on p equals pa.Character into w
     from a in w.DefaultIfEmpty()
     select new
     {
         Count = a == null ? 0 : a.Count,
         Character = p
     }).ForEach(item => result.AppendFormat(Format, item.Count, item.Character));

    return result;
}

private static void ForEach<T>(this IEnumerable<T> list, Action<T> action)
{
    foreach (var item in list)
    {
        action(item);
    }
}


Not really THAT much different, but now it really is just one statement. There's a ForEach extension method now, so the actual CountCharacters method is just one big statement. So, here's the entire class I used for benchmarking:



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqOverkill
{
public static class CharacterCounter
{
private const string Format = "There are {0} {1}'s.\n";

public static StringBuilder CountCharsDictionaryMethod(string text)
{
var chars = new Dictionary<char, int>();
for (int i = 65; i <= 90; i++) //character integer values for 'A' - 'Z'
{
chars.Add((char)i, 0);
}

foreach (var character in text.ToUpperInvariant())
{
if (chars.ContainsKey(character))
{
chars[character]++;
}
}

var result = new StringBuilder();


foreach (var kvp in chars)
{
result.AppendFormat(Format, kvp.Value, kvp.Key);
}

return result;
}

public static StringBuilder CountCharsAlexLinqMethod(string text)
{
var result = new StringBuilder();
foreach (var item in (from p in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
join pa in
(from p in text.ToUpperInvariant()
group p by p into g
select new { Count = g.Count(), Character = g.Key })
on p equals pa.Character into w
from a in w.DefaultIfEmpty()
select new
{
Count = a == null ? 0 : a.Count,
Character = p
}))
{
result.AppendFormat(Format, item.Count, item.Character);
}

return result;
}


public static StringBuilder CountCharsJamesLinqMethod(string text)
{
var builder = new StringBuilder();

var letters = Enumerable.Range('A', 26).Select(i => (char)i);
var sourceChars = letters.Concat(text.ToUpperInvariant());

var results = from c in sourceChars
group c by c into g
where char.IsLetter(g.Key)
orderby g.Key
select new { Char = g.Key, Count = g.Count() - 1 };

foreach (var result in results)
{
builder.AppendFormat(Format, result.Count, result.Char);
}

return builder;
}

public static StringBuilder CountCharsAlexLinqCompleteOverkill(string text)
{
var result = new StringBuilder();

(from p in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
join pa in
(from p in text.ToUpperInvariant()
group p by p into g
select new { Count = g.Count(), Character = g.Key })
on p equals pa.Character into w
from a in w.DefaultIfEmpty()
select new
{
Count = a == null ? 0 : a.Count,
Character = p
}).ForEach(item => result.AppendFormat(Format, item.Count, item.Character));

return result;
}

private static void ForEach<T>(this IEnumerable<T> list, Action<T> action)
{
foreach (var item in list)
{
action(item);
}
}
}
}


I then wrote a test method which tested it first with a short string of 24 characters, and then 4135 characters. Here's the main method:



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;

namespace LinqOverkill
{
class Program
{
static void Main(string[] args)
{
string text = "This is some dummy text.";
int iterations = 10000;

BenchmarkMethods(text, iterations);

string bigText = LinqOverkill.Resource.BigText;
BenchmarkMethods(bigText, iterations);

Console.ReadKey(true);
}

private static void BenchmarkMethods(string text, int iterations)
{
Console.WriteLine("Testing with {0} characters.", text.Length);
Stopwatch watch = new Stopwatch();

watch.Start();
for (int i = 1; i <= iterations; i++)
{
CharacterCounter.CountCharsDictionaryMethod(text);
}
watch.Stop();
Console.WriteLine("Dictionary method took: {0} milliseconds.", watch.ElapsedMilliseconds);

watch.Reset();

watch.Start();
for (int i = 1; i <= iterations; i++)
{
CharacterCounter.CountCharsAlexLinqMethod(text);
}
watch.Stop();
Console.WriteLine("Alex LINQ method took: {0} milliseconds.", watch.ElapsedMilliseconds);

watch.Reset();

watch.Start();
for (int i = 1; i <= iterations; i++)
{
CharacterCounter.CountCharsAlexLinqCompleteOverkill(text);
}
watch.Stop();
Console.WriteLine("Alex Overkill method took: {0} milliseconds.", watch.ElapsedMilliseconds);

watch.Reset();

watch.Start();
for (int i = 1; i <= iterations; i++)
{
CharacterCounter.CountCharsJamesLinqMethod(text);
}
watch.Stop();
Console.WriteLine("James LINQ method took: {0} milliseconds.", watch.ElapsedMilliseconds);

watch.Reset();
Console.WriteLine();
}
}
}


Here were the results on my machine:

Testing with 24 characters.
Dictionary method took: 427 milliseconds.
Alex LINQ method took: 1176 milliseconds.
Alex Overkill method took: 1157 milliseconds.
James LINQ method took: 636 milliseconds.

Testing with 4135 characters.
Dictionary method took: 7484 milliseconds.
Alex LINQ method took: 6417 milliseconds.
Alex Overkill method took: 6429 milliseconds.
James LINQ method took: 6738 milliseconds.

As you can see, with 24 characters, using a Dictionary and looping yourself is the quickest way. Second quickest is the way James did it. Then, my way (the Alex LINQ and the overkill are basically the same exact thing) came in last.

When I pumped it up to 4135 characters though, my way was the fastest :) Even faster than using a Dictionary.
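The four timing loops in the benchmark all follow the same pattern, so here's a rough sketch of how they could be folded into one helper. MeasureMilliseconds and BenchmarkSketch are names I made up for this sketch; the untimed warm-up call is an assumption on my part (it keeps one-time JIT compilation out of the measurement) and isn't something the numbers above were taken with:

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Hypothetical helper: runs the action once untimed, then times
    // the requested number of iterations with a Stopwatch.
    public static long MeasureMilliseconds(Action action, int iterations)
    {
        action(); // warm-up call, not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();

        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        string text = "This is some dummy text.";

        // Stand-in workload; in the real benchmark this would be one of the
        // CharacterCounter methods.
        long elapsed = MeasureMilliseconds(() => text.ToUpperInvariant(), 10000);

        Console.WriteLine("Took: {0} milliseconds.", elapsed);
    }
}
```

With a helper like this, each method under test becomes a single line, which also makes it harder to forget a Reset between runs.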

One final note. In production code, this really isn't a good idea. Yes, it's loads of fun to do (I love doing stuff like this), but if another developer, or even you, ever needs to go back and look at this code, it'll take them three times as long to figure out what's going on and how it's doing it.

Finally, I'll leave you with this example of the worst abuse of LINQ ever. It's awesome, but I have no clue what the heck it's doing!! http://blogs.msdn.com/lukeh/archive/2007/10/01/taking-linq-to-objects-to-extremes-a-fully-linqified-raytracer.aspx

[Friday, June 19, 2009]

C# Html Screen Scraping Part 2 / Performing POST with Cookies

1 comments

Source code: http://www.box.net/shared/r7u052y507

I just want to point out that I purposely didn't break this out into separate classes and methods, and I know I'm duplicating a lot of code. I simply wanted to demonstrate each technique on its own.


In my previous post, I demonstrated how to connect to a website and download the HTML (aka Screen Scraping). That all works nicely if it's a simple site you need to connect to. Sometimes, however, you can't simply connect to the site and download the HTML; you need to log in first, or maybe you need to enter some kind of search term into a text box. In the HTML world, the page will generally have a simple form that posts to the server, which queries the database or something based on the info supplied in the post, and then dynamically builds up the page. How do you do that in your C# app? How do you pass along the values that the server is expecting in the form post?

Throughout this post, I'll be referring to the code that's attached to this post. It basically has two projects. One's a simple ASP.NET MVC website, and the other is a winforms app. In order to run this properly, you'll need to first launch the web app, and then launch the windows app. Here's a simple screen shot of what the windows app looks like so you can get an idea:



First, a little note. In order to demonstrate this, I needed a site that had a simple Form with cookies. Since I couldn't find anything that was really simple and that would be easy to demo, I decided to create my own little "Website". It's written in ASP.NET MVC, so if you want to be able to run the code sample supplied in the link, head over to the ASP.NET MVC Website and download it (if you don't already have it.)

The site basically has two URL's that are of interest. The first is ../Home/SimplePost. If you navigate to that page, you'll see a simple textbox with a button. When you click the button, it simply posts the text in the textbox back to the server, and then it just outputs it back to the browser. Here's the HTML rendered for that form:

<form action="/Home/SimplePost" method="post">
<input type="text" id="text" name="text" />
<input type="submit" value="submit" />
</form>

The form will post to the URL /Home/SimplePost on the server. We can also see in the form that the server is expecting a parameter called "text". It's safe to assume that anything with an input field (except for the button) is needed by the server. So, now we have enough info to write our C# function:



private void PostWithoutCookies()
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
String.Format("http://localhost:{0}/Home/SimplePost", port));
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
// Escape the value so characters such as '&' or '=' can't corrupt the form body.
string postData = "text=" + Uri.EscapeDataString(String.IsNullOrEmpty(this.textBoxPost.Text)
? "somerandomemail@address.com" : this.textBoxPost.Text);
byte[] bytes = Encoding.UTF8.GetBytes(postData);
request.ContentLength = bytes.Length;

using (Stream requestStream = request.GetRequestStream())
{
requestStream.Write(bytes, 0, bytes.Length);
}

// Read the response BEFORE the reader is disposed; reading from a
// disposed reader throws an exception.
using (WebResponse response = request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
this.richTextBox1.Text = reader.ReadToEnd();
}
}


(This is all part of the app that's attached at the top of this post. It basically outputs all the results to a richtextbox.)

A bit more complicated than last time, but not too bad. The main difference here is that we're actually writing TO the request stream. This puts the form values into the body of the request (not the headers), which is where the server looks for posted form data. The trick is that for each value that you need to add, you use this syntax:

field1=value1&field2=value2&field3=value3

where field is the name of the input field, and the value is the value you want to send over to the server (ie. the text that would be entered into the textbox).
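Here's a minimal sketch of building that string for any number of fields. BuildFormBody and FormBodySketch are hypothetical names, not part of the attached sample. It runs each name and value through Uri.EscapeDataString, so a value containing '&' or '=' can't be mistaken for a field separator:

```csharp
using System;
using System.Collections.Generic;

public static class FormBodySketch
{
    // Hypothetical helper: joins name/value pairs as field1=value1&field2=value2,
    // escaping each part so reserved characters survive the trip.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var pairs = new List<string>();
        foreach (var field in fields)
        {
            pairs.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }

        return String.Join("&", pairs.ToArray());
    }

    public static void Main()
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("text", "a&b=c"),
            new KeyValuePair<string, string>("email", "somerandomemail@address.com")
        };

        Console.WriteLine(BuildFormBody(fields));
        // prints "text=a%26b%3Dc&email=somerandomemail%40address.com"
    }
}
```

The resulting string is what you'd turn into bytes with Encoding.UTF8.GetBytes and write to the request stream.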

So if we were to run this method now, we'd see the text that we posted to the server (the text that was in the textbox of the app) in the richtextbox.

There is one more thing that we can do with this. Very often, in order to access certain areas of a site, you need to first log in. When you log in, the server sends a cookie to the browser, and then for each subsequent request that is for authenticated users only, the browser sends the cookie back to the server so that you can access those parts of the site. Here, we're acting like a browser, so we need to have the ability to get the cookie, retain it somehow, and then pass that on to the next request.

To demonstrate this, in the web app of this demo, there's a page called "../Home/PostWithCookie". When you access this page, it sends a cookie to the browser. Then, on that page there's a form identical to the first one. When you post back to the server though, it checks if the cookie is there. If it is, it outputs "Cookie Found" along with the cookie value; if not, it outputs "Cookie not found."

So back in our Windows App, we need a way to first access that first page that gets us our cookie, then we need to GET the cookie, and finally we need to pass the cookie on with the form post. Here's the code:



private CookieCollection GetCookies()
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
String.Format("http://localhost:{0}/Home/PostWithCookie", port));
request.CookieContainer = new CookieContainer();
// Dispose of the response once the cookies have been read from it.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
return response.Cookies;
}
}

private void PostWithCookies()
{
CookieCollection cookies = this.GetCookies();
var request = (HttpWebRequest)WebRequest.Create(
String.Format("http://localhost:{0}/Home/PostWithCookie", port));
request.CookieContainer = new CookieContainer();
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
// Escape the value so characters such as '&' or '=' can't corrupt the form body.
string postData = "text=" + Uri.EscapeDataString(String.IsNullOrEmpty(this.textBoxPost.Text)
? "somerandomemail@address.com" : this.textBoxPost.Text);
byte[] bytes = Encoding.UTF8.GetBytes(postData);
request.ContentLength = bytes.Length;
if (cookies != null)
{
request.CookieContainer.Add(cookies);
}

using (Stream requestStream = request.GetRequestStream())
{
requestStream.Write(bytes, 0, bytes.Length);
}

// Dispose of the response and reader once the text has been read.
using (var response = request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream()))
{
this.richTextBox1.Text = reader.ReadToEnd();
}
}


The first bit of code, the GetCookies method, looks JUST like the original Screen Scrape method, except that here we're actually grabbing the cookies. The trick is to new up a CookieContainer before we do the request; without one, the response's Cookies collection comes back empty. Once we have a container and we execute the request, we can get the cookies out of the response.

Now we have the cookie, but we aren't done. We want to pass this cookie back to the server when we post to the form. The only difference again here is that we have to new up a CookieContainer on the request, and add the cookies to that container. Once it's there, when you execute the POST, the cookies will get sent over as well.
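If you want to see what a CookieContainer will actually send without firing up the web app, here's a small offline sketch. The port (8080), the cookie name and value, and the names CookieContainerSketch and CookieHeaderFor are all placeholders I picked for illustration; the real cookie comes from the demo site. It just adds a cookie for a URL and asks the container which Cookie header it would attach to a request for that URL:

```csharp
using System;
using System.Net;

public static class CookieContainerSketch
{
    // Hypothetical helper: stores a cookie for the given URL and returns the
    // Cookie header value the container would send with a request to it.
    public static string CookieHeaderFor(Uri uri, Cookie cookie)
    {
        var container = new CookieContainer();

        // Add(Uri, Cookie) associates the cookie with the URL's host.
        container.Add(uri, cookie);

        return container.GetCookieHeader(uri);
    }

    public static void Main()
    {
        // Placeholder URL and cookie, standing in for the demo site's values.
        var uri = new Uri("http://localhost:8080/Home/PostWithCookie");
        var cookie = new Cookie("demo", "12345", "/");

        Console.WriteLine(CookieHeaderFor(uri, cookie));
    }
}
```

A related design choice: if you keep one CookieContainer around and assign that same instance to every HttpWebRequest you create, cookies received on one response get sent on later requests, so you don't have to copy a CookieCollection around by hand.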

The only way to really understand this all, is to download the sample, and mess with it. It's not very complicated, but you need to just mess with it a bit to understand. Once you do grasp it though, you'll see just how powerful this is. You can access many websites straight from within your app, and get the data right into your application.

C# Html Screen Scraping Part 1

1 comments

This is the first post of a two part series.

Source Code: http://www.box.net/shared/i2p7t9kxkt

Very often, when a particular website has some information you'd like to use in your application, you'd see if they have some kind of API which you can use to query their data. However, it's very common for a website either to not have an API altogether, or not have that little bit of info you need made available in their API. What's generally done to get around this is a technique known as "Screen Scraping". (Screen Scraping is a general term, not just for the web, but for the purpose of this blog, when I say Screen Scraping, I mean HTML Screen Scraping).

The general gist of it is this: when a browser contacts a site, an HTML document is sent back to the browser. The browser then has the (tedious) task of parsing out that HTML and rendering it out to the screen. End of the day though, the HTML is just a text file. What screen scraping basically is, you write an app that "acts" like a web browser, meaning it contacts the web site, downloads the HTML file into memory, at which point you're free to parse it out any which way you like and extract the data you need. In .NET this is incredibly easy to do, and I'll demonstrate a simple sample:


private string GetWebsiteHtml(string url)
{
WebRequest request = WebRequest.Create(url);
WebResponse response = request.GetResponse();
Stream stream = response.GetResponseStream();
StreamReader reader = new StreamReader(stream);
string result = reader.ReadToEnd();
stream.Dispose();
reader.Dispose();
return result;
}


Yup, that's pretty much it. First, you create a WebRequest object with the given URL. Then, you get a Response object out of that Request. Finally you get the response stream and read it with a StreamReader.

I attached a simple app so you can give it a whirl. Basically, it's a simple windows app with a textbox and a button. Enter any url in the textbox (make sure to write the full url including http://....) and hit the Go button. That will get you the entire HTML of that site, and display it in the richtextbox.

This is obviously a rough sample, make sure to add proper error handling, but other than that, it's pretty straightforward and real simple! The only thing to watch out for when scraping, is that your parsing code will rely on the HTML being formatted a VERY specific way. If the site changes in any way, your code WILL break.
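To show just how fragile the parsing side can be, here's a rough sketch that pulls the page title out of some HTML with a regular expression. ExtractTitle and TitleScraperSketch are made-up names and the HTML strings are invented for the demo; a real scraper would run this over the string returned by GetWebsiteHtml:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleScraperSketch
{
    // Hypothetical helper: returns the text between <title> and </title>,
    // or null when the markup doesn't match the exact shape we expect.
    public static string ExtractTitle(string html)
    {
        Match match = Regex.Match(
            html,
            @"<title>\s*(.*?)\s*</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html = "<html><head><title>Demo Page</title></head><body></body></html>";
        Console.WriteLine(ExtractTitle(html) ?? "(no title found)");
        // prints "Demo Page"

        // The site adds one attribute to the tag, and the pattern stops matching.
        string changed = "<html><head><title id=\"t\">Demo Page</title></head></html>";
        Console.WriteLine(ExtractTitle(changed) ?? "(no title found)");
        // prints "(no title found)"
    }
}
```

Returning null for "not found" at least lets the calling code notice the site changed, instead of quietly working with garbage.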

This was a very simple post; in the next post, I'll take this much further, and demonstrate how we can actually POST to a server, and even get the cookies.