Wednesday, July 29, 2015

JavaScript anno 2015 – Gulp

Gulp is a task runner used for build automation. Just like MSBuild, NAnt, make, and pretty much any build automation tool out there, Gulp runs one or more tasks that you define.

Gulp is built on Node.js, which means that you define your tasks in JavaScript. To get started you need to install Gulp and create a gulpfile.js, which will be the home for your automation tasks:
npm install gulp -g
npm install gulp --save-dev
new-item gulpfile.js -type file
invoke-item gulpfile.js


(btw I’m using PowerShell as my shell)

If you run those four commands you should now have your gulp file open and ready to define some tasks. Here is an example of a Gulp task:
var gulp = require('gulp');

gulp.task('default', function() {
    process.stdout.write("Gulp is running the default task\n");
});

Since Gulp is running in Node.js we can use Node’s process object to write to the console. To run this task just run the gulp command in your shell and the default task will run.

Gulp doesn’t really do much itself. In fact, there are only four functions in the Gulp API:
- src
- dest
- task
- watch

Src & Dest


src and dest are filesystem operations. To read files from the filesystem, use gulp.src(). To write files, use gulp.dest(). The src function takes a glob pattern as input (using Node’s glob which again uses the minimatch library), so to read all js-files in a scripts folder you can use this syntax to recursively find all files:
gulp.src('Scripts/**/*.js')

The dest function takes a path as parameter and will output files to that folder (and create the folder if it doesn’t exist). Matching files that already exist will be overwritten. To create a simple copy task, we can chain the src and dest functions together, and Gulp uses the pipeline pattern to do this (similar to pipes in PowerShell and bash):
gulp.src('Scripts/**/*.js').pipe(gulp.dest('Copies'));


This will copy all js-files from the Scripts-folder to a folder called Copies. Note that it will keep the directory structure from the source, so if the file index.js exists in a folder ‘src’ in the Scripts folder it will end up in ‘Copies\src\index.js’.
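To make the path handling concrete, here is a minimal sketch of how such a destination path could be derived. It is not Gulp's actual implementation; the function name destinationFor and its parameters are made up for illustration, and it simply assumes that the part of the glob before the first wildcard ('Scripts') acts as the base folder:

```javascript
var path = require('path');

// Hypothetical helper: works out where a source file would end up
// when the folder structure below the base folder is preserved.
function destinationFor(baseFolder, sourceFile, destFolder) {
    // Path of the file relative to the base folder, e.g. 'src/index.js'
    var relative = path.relative(baseFolder, sourceFile);
    // The same relative path, now below the destination folder
    return path.join(destFolder, relative);
}

console.log(destinationFor('Scripts', 'Scripts/src/index.js', 'Copies'));
```

The printed path uses the separator of the platform you run it on, so on Windows it shows backslashes.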

Task


If we want to make a task for the file copying above we could define it like this:
gulp.task('copy', function(){
    // Return the stream so Gulp knows when the copy has finished
    return gulp.src('Scripts/**/*.js')
        .pipe(gulp.dest('Copies'));
});


To run a specific task, we just use the gulp command and send in the name of the task as parameter:
gulp copy


If you don’t provide a named task, Gulp will look for a task called ‘default’ and run it.

The task function takes three parameters, in this order:
- name: the name of the task
- dependencies (optional): an array of strings that contain the names of other tasks that should be run prior to this one
- function: the function that defines what the task does

As with all build automation tools, tasks can be chained together. If we want the default task to depend on the copy task, we can define it like this:
gulp.task('default', ['copy'], function() {
    process.stdout.write("Gulp is running the default task\n");
});
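If you wonder what "run the dependencies first" boils down to, here is a tiny stand-alone sketch of the idea. It does not use Gulp at all; the task and run functions below are hypothetical stand-ins written for this example, and they are synchronous, whereas Gulp's own scheduler is more involved (it also has to deal with asynchronous tasks):

```javascript
// Registry of task definitions, keyed by task name
var tasks = {};

// Hypothetical stand-in for gulp.task: name, optional dependencies, function
function task(name, deps, fn) {
    if (typeof deps === 'function') { fn = deps; deps = []; }
    tasks[name] = { deps: deps || [], fn: fn || function () {} };
}

// Runs a task after all of its dependencies and returns the execution order
function run(name, alreadyRun, order) {
    alreadyRun = alreadyRun || {};
    order = order || [];
    if (alreadyRun[name]) { return order; }   // each task runs at most once
    var definition = tasks[name];
    if (!definition) { throw new Error('Task not defined: ' + name); }
    alreadyRun[name] = true;
    definition.deps.forEach(function (dep) { run(dep, alreadyRun, order); });
    definition.fn();
    order.push(name);
    return order;
}

task('copy', function () { console.log('copying files'); });
task('default', ['copy'], function () { console.log('running the default task'); });

console.log(run('default'));
```

Running the default task here executes copy first and then default, which is the same ordering Gulp gives you for the two tasks defined above.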


Watch


The watch function is (like src() and dest()) filesystem related and it takes a glob pattern as input. With watch() you can get notified when a file is changed and act accordingly. For instance, if we want the copy task to automatically copy files when changes occur, we can set up a watch for that:
gulp.task('watch', function() {
    gulp.watch('Scripts/**/*.js', ['copy']);
});
When this task is running all changed files will be copied. Any new files will also be copied, but since the copy function only copies (doh!) it will not remove deleted files in the source folder from the destination folder. If we want that, we need to handle the change event ourselves and react to the ‘deleted’ event type:
gulp.task('watch', function() {
    gulp.watch('Scripts/**/*.js', function(event){
        if(event.type === 'deleted') {
            deleteFile(event.path);
        }
        else {
            gulp.start('copy');
        }
    });
});


Note: I haven’t shown the implementation of the function deleteFile as this is just an example of how to react to different types of events. The available event types are: added, changed, deleted and renamed.
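For completeness, this is one hypothetical way deleteFile could be written. It is only a sketch: the extra sourceRoot and destRoot parameters are my own addition (they default to the Scripts and Copies folders from the copy example), and it uses nothing but Node's fs and path modules:

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: removes the copy that corresponds to a deleted source file.
// Returns true if a copy was removed, false if there was nothing to remove.
function deleteFile(sourcePath, sourceRoot, destRoot) {
    sourceRoot = sourceRoot || 'Scripts';
    destRoot = destRoot || 'Copies';
    // Path of the deleted file relative to the source folder, e.g. 'src/index.js'
    var relative = path.relative(sourceRoot, sourcePath);
    // Ignore files that do not live below the source folder
    if (relative.indexOf('..') === 0) { return false; }
    var target = path.join(destRoot, relative);
    if (fs.existsSync(target)) {
        fs.unlinkSync(target);
        return true;
    }
    return false;
}
```

With the defaults in place, the call deleteFile(event.path) from the watch task above would work unchanged, as long as Gulp is started from the folder that contains Scripts and Copies.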

Plugins


Since Gulp itself doesn’t do much, it depends upon plugins to provide the usefulness. And there are a lot of plugins to choose from. At the time of writing there are 1690 plugins listed on the Gulp home page. Let’s start with one of the most popular of them: the JSHint plugin. Since Gulp is running on Node it’s of course using npm, so we install gulp-jshint just like any other npm package:
npm install gulp-jshint --save-dev


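Now back to the Gulp file and let JSHint do some error checking on our JavaScript code (the ‘jshint-stylish’ reporter used here is a separate npm package, so install it with npm install jshint-stylish --save-dev as well):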
var jshint = require('gulp-jshint');

gulp.task('jshint', function(){
    return gulp.src('./src/scripts/*.js')
        .pipe(jshint())
        .pipe(jshint.reporter('jshint-stylish'));
});


Alternatives


Gulp is not the only build automation tool / task runner for JavaScript. Its predecessor is Grunt, which is quite similar to Gulp, but the definition of tasks is configuration-based and therefore more verbose. You configure tasks instead of defining them as JavaScript functions. Gulp is also more flexible than Grunt. Personally I prefer the Gulp way of doing it, but Grunt is still the most popular task runner AFAIK.

Cake is pretty much the same as Gulp, but uses CoffeeScript instead of pure JavaScript (as a side note: you can actually use CoffeeScript with Gulp too, but that’s another story).

Broccoli seems to be the new kid on the block right now and it seems promising. For bigger projects the watch-and-rebuild can take a ‘long’ time (everything is relative), so Broccoli was designed to do incremental builds. That is, it only builds whatever has changed since the last build. Broccoli is still in its early stages (at version 0.16 at the time of writing) and the “Windows support is still spotty” according to their own documentation.

Last but not least; you don’t actually need a dedicated automation tool as npm can already do this for you. I’m not going to go into detail on this one, but you can check out this blog post by Keith Cirkel for more information.

Resources


Gulp plugins – Search for available plugins
Node Glob – Documentation for the glob syntax in Node
Automate your tasks easily with Gulp.js by Justin Rexroad
Smashing Magazine: Building with Gulp
For more information on Node.js and npm, take a look at my previous post in this Getting started with JavaScript series: JavaScript anno 2015 - Node and npm

Saturday, July 25, 2015

JavaScript anno 2015 – Node and npm

Node.js is server-side JavaScript; a “runtime environment for server-side and networking applications” [Wikipedia]. It’s open source, like just about everything in the JavaScript world of frameworks and tools. Node is an amazing runtime and in just about 5 lines of code you can have a simple web server up and running:
var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, '127.0.0.1');



We’re not using Node per se in our project yet. We’re running our ASP.NET web application on IIS, but it might be something we’ll be looking more at for specific tasks later on. For now we’re using it implicitly through the Node Package Manager.

Node Package Manager (NPM)


A package manager is responsible for installing, upgrading, configuring and uninstalling software packages for a given platform. NPM is a package manager for JavaScript, just like NuGet is a package manager for .NET, CPAN for Perl, Maven and Ivy for Java, RubyGems for Ruby, and so on.

NPM is bundled with Node so the way to get NPM on your machine is to install Node. With Node installed you can run npm commands from PowerShell or the command prompt.

NPM differentiates between modules and executables. When you install a module it will be placed in a node_modules folder, whereas executables are placed in a sub-folder called .bin. You can run the npm root and npm bin commands respectively to see where those folders exist on your local machine.

npm install


NPM also differentiates between installing globally or locally. Locally means local to the folder you run the npm install command from (typically your project root folder). A good rule is to never install modules globally. Only executables that need to be available across many projects should be installed globally, but to avoid versioning conflicts one should strive to install most packages locally. To install a package globally you just run the install command with the --global (or -g) option.

Example:

npm install browserify => Downloads and unpacks the Browserify package to a local node_modules folder

npm install grunt-cli -g => Downloads and unpacks the Grunt command line interface to the global install location, and makes the grunt command available from your PATH.

Dependencies


Often a package uses other packages and these dependencies are expressed in the package.json file inside the package. So when npm installs a package it will look in the package.json file and install any packages that it relies on. Therefore an install of for instance Browserify will install no less than 49 direct dependencies, which again install their dependencies.

Unlike NuGet, the dependencies of a package will not be installed at the same directory level as their dependent. Instead they will be installed in a node_modules folder inside the Browserify folder. Inside each dependent module there might be other dependencies. In fact, installing Browserify will create no less than 87 (!) node_modules folders underneath the Browserify folder.

Placing all dependencies inside each module is a great way to prevent versioning issues between dependencies of different modules. In .NET and NuGet this wouldn’t work since there’s no way to reference 2 different versions of the same assembly into the same project. You can however reference assemblies that are dependent on different versions of the same assembly. The conflict between them is solved by assembly binding redirects in web/app.config, but it can lead to problems if a dependent assembly has breaking changes between versions. In the JavaScript world there is no concept of binary assemblies. The modules are just one or more js-files and as long as they are loaded within different scopes, they will not cause any conflicts.
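The reason nested modules stay out of each other's way is the order in which Node searches for a required module: it starts in the node_modules folder next to the requiring file and then walks up the parent folders. The sketch below reproduces that search order in a simplified form. The function name lookupPaths is made up for this example, and the real rules have a few more cases (inside a module Node exposes the actual list as module.paths):

```javascript
var path = require('path');

// Simplified sketch of the folders Node would search, nearest first,
// for a require() made from a file in fromDir.
function lookupPaths(fromDir) {
    var folders = [];
    var current = path.resolve(fromDir);
    var parent = path.dirname(current);
    for (;;) {
        // A folder that is itself called node_modules is not extended again
        if (path.basename(current) !== 'node_modules') {
            folders.push(path.join(current, 'node_modules'));
        }
        if (parent === current) { break; }   // reached the filesystem root
        current = parent;
        parent = path.dirname(current);
    }
    return folders;
}

console.log(lookupPaths('/app/node_modules/browserify'));
```

Since the nearest folder wins, Browserify finds its own copy of a dependency before it ever sees a different version installed higher up in the tree.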

There’s a lot more to say about dependencies in npm, but the one thing you need to be aware of is the difference between dependencies and devDependencies. The first one is all dependencies required to run, while the latter is additional dependencies needed for development (there’s also peerDependencies and bundledDependencies, but I won’t go into that here). Typically devDependencies will include unit tests, test harnesses, minification, transpilers, etc.

Package.json


If you run npm install without any package name, npm will look for a package.json in the directory where you run the install command. If it finds it npm will install all dependencies listed (including devDependencies). The package.json is similar to packages.config in NuGet, but I dare to say that npm seems a lot more sophisticated and solid than NuGet.

Typically you will have a package.json file in the root directory of your project and you will place all your dependencies there. A simple package file might look like this:
{
    "name": "my-app",
    "version": "0.0.1",
    "dependencies": {
        "browserify": "11.0.x"
    }
}

Note the ‘x’ in the Browserify version. This means that any 11.0 version will do, so when you do an npm install (or update), 11.0.0, 11.0.1, etc. are OK, but not 11.1.0 or 12.0.0. Npm follows semantic versioning (semver) and the numbers mean ‘major.minor.patch’. If you’re happy to trust that Browserify will be backward compatible across minor versions (which it should be if they follow semver), you can specify ‘11.x’ instead of ‘11.0.x’. If you just want the newest version – regardless of breaking changes (not recommended!), then you can just put an ‘x’ instead of ‘11.0.x’.
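The matching rule for these x-ranges is simple enough to sketch in a few lines. This is not how npm does it (npm relies on its own semver implementation, which also handles operators, pre-release tags and more); matchesXRange is a made-up helper that only covers the plain ‘major.minor.patch’ patterns discussed here:

```javascript
// Hypothetical helper: does a concrete version satisfy a range like '11.0.x'?
// Only plain numeric versions and 'x' wildcards are handled.
function matchesXRange(version, range) {
    var wanted = range.split('.');
    var actual = version.split('.');
    // Every part of the range must either be a wildcard or match exactly;
    // parts left out of the range (as in '11.x') are not checked at all.
    return wanted.every(function (part, index) {
        return part === 'x' || part === actual[index];
    });
}

console.log(matchesXRange('11.0.1', '11.0.x'));
console.log(matchesXRange('11.1.0', '11.0.x'));
console.log(matchesXRange('11.1.0', '11.x'));
console.log(matchesXRange('12.0.0', 'x'));
```

The second call is the only one that fails to match, since 11.1.0 is a new minor version and ‘11.0.x’ only leaves the patch number open.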

You could create the package.json file manually, but a better way is to use the init command:

npm init
Note: if you create the file manually, be sure that the file is ASCII encoded. Unicode will result in a parsing error in npm and UTF-8 will result in the file not being updated (e.g. when running with the --save flag below).

You could also edit your package.json file manually and add all dependencies by hand, but again, it’s better to let npm handle this. You do this by appending a --save flag (-S for short) to the install command:

npm install browserify --save
If the package is only meant for development purposes and not applicable for the production environment, for instance testing frameworks, you can use the --save-dev flag instead:

npm install jasmine --save-dev
When you have a lot of packages installed, there’s a good chance that some of them share some dependencies. Because of npm’s hierarchical structure, it’s possible to optimize these shared dependencies by moving them further up the tree and thereby get rid of duplicated modules. The command for that is dedupe:
npm dedupe

If you want to remove any packages that are not in your package.json, you can run the prune command:

npm prune
If you run the prune command with the --production flag, all devDependencies will be removed (nice when deploying to production).

If you want to see all installed packages there’s a ls command for that:

npm ls
Note that this will list all top level packages as well as all their dependencies. If you’re only interested in the top level packages you can add the --depth 0 parameter to ls.

If you want to search for available packages, there’s a search:
npm search
…which also can do regular expressions. But for the most part it’s just easier to browse and search on npm’s home page.

npm update


Installing is just one side of the story. Once you’ve added any dependencies to your project you would like to keep them updated as well:

npm update
If you run it without specifying a particular package to update, npm will go through the package.json file and see if any newer versions are available. It will of course respect the versioning you’ve applied, so it won’t upgrade to a newer major version if you have specified that only minor versions are acceptable.

You can update a specific package by providing the name of the package. If the package is installed in the global scope you need to add the -g flag.

npm update will also download any missing packages, but you need to add the --dev flag to get all devDependencies. One thing to be aware of is that npm will not do any recursive update of all package dependencies. It will only update the top level packages, but you can force recursion with the --depth flag.

As with the install command you can let npm update your package.json file with the updated package versions:

npm update --save
If you want to check whether newer versions of any packages exist without updating them, you can run the outdated command:

npm outdated

npm uninstall


Removing packages is as easy as installing. Just run the uninstall command with the name of the package to remove and the package is gone:

npm uninstall browserify
As with install and update you can let npm update package.json:
npm uninstall browserify --save

Wrap up


The three major take-aways from this post should be that

1. Npm packages come in two flavors: executables and modules. Executables are typically command line tools, while modules are libraries that you want to use in your code.

2. Npm has two modes of operation: global and local. In general you should install executables in the global scope and modules in the local.

3. Put a package.json file in the root of your project and add all of your project dependencies there.

What I haven’t talked about is configuring npm. There’s a lot to say about this, but I’m just going to keep it short and say that npm is highly configurable and I’ll just point you to the resources below.

Resources


npmjs.org – The home page for npm where you can search and browse for available packages
docs.npmjs.org – The documentation for npm is darn good if I may say so. I really recommend taking a look at it as I promise you’ll learn a lot from it.
As for configuring npm, here is a starting point for you.
For a great explanation of the difference between the various dependencies in npm, take a look at the top-voted answer to this question on StackOverflow.
To get some insight into the history of npm and why it is as it is, read through the answers from Isaac Schlueter (the main developer on npm) in this thread.
nodejs.org – The home page for Node.

Wednesday, June 3, 2015

Logging to SQL Server with Log4Net

How do you know what’s happening on your production servers? Logging of course (if you wonder; no, ‘debug & breakpoints’ is never the correct answer. Never ever. Ever.).
We have been using Log4Net as our logging tool for 3-4 years now and I just wanted to share how we are using it and how incredibly powerful good logging can be.
First of all, if you are not familiar with Log4Net, it is an open source, free-for-use logging framework under the Apache Foundation umbrella. Among its strengths are that it is fairly easy to get started with, it has a low impact on application performance, and it has a lot of appenders that let you log to a lot of different destinations (console, file, database, event log, etc).
At the beginning we set up logging to console (for those systems that had console output) and file, but after a while we added logging to SQL Server. It is the combination of logs stored in a SQL database and full-text indexing of these logs that really gives us eyes into what happens on our production servers.

Log to console

Logging to console is definitely the easiest way to get started with Log4Net. But writing to the console output is also the one that gives you least payback in form of long-term insight into your production systems. Log4Net can be configured either using xml or code, but xml is by far the most used. Typically you do the xml configuration in your app/web.config, but you can also keep the Log4Net configuration in separate xml files if you prefer. We chose the app/web.config approach and so the xml for console logging looks like this:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <configSections>
        <section 
            name="log4net" 
            type="log4net.Config.Log4NetConfigurationSectionHandler,Log4net" />
    </configSections>
    <log4net>
        <root>
            <level 
                value="DEBUG" />
            <appender-ref 
                ref="ConsoleAppender" />
        </root>
        <appender 
            name="ConsoleAppender" 
            type="log4net.Appender.ConsoleAppender">
            <layout 
                type="log4net.Layout.PatternLayout">
                <param 
                    name="ConversionPattern" 
                    value="%d [%t] %-5p [%x] - %m%n" />
            </layout>
            <filter 
                type="log4net.Filter.LevelRangeFilter">
                <param 
                    name="LevelMin" 
                    value="DEBUG" />
                <param 
                    name="LevelMax" 
                    value="FATAL" />
            </filter>
        </appender>
    </log4net>
</configuration>

You can do this configuration in code as well, but the great benefit of using xml for the configuration is that you can change the settings (for instance the log level threshold) without re-deploying your application. In the case of web hosts you can even change it without restarting the application. If you’ve been a good boy/girl and set up debug-level logging in your code, you can just flip an xml-switch and additional log entries will start flowing in.

Log to file


If you want your logs to survive application restarts (and the console window buffer) and/or have an application that doesn’t have console output, logging to file would be the next step on the logging ladder.
The main thing to keep in mind when logging to file is to set limits on how large each log file can get. Log4Net has some defaults that might not suit your situation, so be sure to check out the documentation on how you can configure logging to file. For one of our systems we chose to have a 10 MB limit on each file, which you can see in this xml config:
<log4net>
    <root>
        <level value="DEBUG" />
        <appender-ref ref="LogFileAppender" />
    </root>
    <appender 
        name="LogFileAppender" 
        type="log4net.Appender.RollingFileAppender">
        <param 
            name="File" 
            value="logs.txt" />
        <param 
            name="AppendToFile" 
            value="true" />
        <!-- Logfiles are rolled over to backup files when size limit is reached -->
        <rollingStyle 
            value="Size" />
        <!-- Maximum number of backup files that are kept before the oldest is erased -->
        <maxSizeRollBackups 
            value="10" />
        <!-- Maximum size that the output file is allowed to reach before being rolled over to backup files -->
        <maximumFileSize 
            value="10MB" />
        <!-- Indicating whether to always log to the same file -->
        <staticLogFileName 
            value="true" />
        <layout type="log4net.Layout.PatternLayout">
            <param 
                name="ConversionPattern" 
                value="%-5p%d{yyyy-MM-dd HH:mm:ss} - %m%n" />
        </layout>
    </appender>
</log4net>

The above config specifies that roughly 110 MB of logs will be kept on file at most: the active log file plus up to 10 backup files, at 10 MB per file.

Log to console and file


There is no problem logging to both console and file simultaneously and you can even set different log levels on each appender. If you want to have different files for different log levels (e.g. ‘debug.log’, ‘info.log’, etc), you can just configure as many file appenders as you need. Here is an example of logging to both console and file at the same time:
<log4net>
    <root>
        <level value="INFO" />
        <appender-ref ref="LogFileAppender" />
        <appender-ref ref="ConsoleAppender" />
    </root>
    <appender name="LogFileAppender" type="log4net.Appender.RollingFileAppender">
        <filter type="log4net.Filter.LevelRangeFilter">
            <param name="LevelMin" value="WARN" />
            <param name="LevelMax" value="FATAL" />
        </filter>
        ...
    </appender>
    <appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender">
        ...
    </appender>
</log4net>    

The default log level is set to INFO, which means that unless otherwise specified in the appenders, messages with level INFO, WARN, ERROR and FATAL will be logged. The file appender is set to only log WARN, ERROR and FATAL though.
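If the interplay between the root level and the appender filter feels a bit abstract, the following sketch models it in a few lines of JavaScript. It is a simplified model written for this post and has nothing to do with Log4Net's actual code; the function names are made up, and only the five levels mentioned above are included:

```javascript
// The levels used in the config above, ordered from least to most severe
var LEVELS = ['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'];

function rank(level) {
    return LEVELS.indexOf(level);
}

// Root level check: the message must be at least as severe as the threshold
function passesThreshold(level, threshold) {
    return rank(level) >= rank(threshold);
}

// Range filter check: the message must lie between min and max (inclusive)
function passesRange(level, min, max) {
    return rank(level) >= rank(min) && rank(level) <= rank(max);
}

// Where would a message of the given level end up with the config above?
function destinations(level) {
    var result = [];
    if (!passesThreshold(level, 'INFO')) { return result; }   // root level is INFO
    result.push('console');                                     // console has no filter
    if (passesRange(level, 'WARN', 'FATAL')) { result.push('file'); }
    return result;
}

LEVELS.forEach(function (level) {
    console.log(level + ' -> ' + destinations(level).join(', '));
});
```

DEBUG messages go nowhere, INFO only reaches the console, and everything from WARN upwards lands in both the console and the file.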

Log to SQL Server


As already mentioned, logging to file and console is easy to get started with and does not take much effort to set up. Setting up logging to a database takes a bit more work, but it is far from difficult. Here is how we configured logging to a SQL database from one of our web hosts:
<log4net>
<root>
    <level value="DEBUG" />
    <appender-ref ref="AdoNetAppender" />
</root>
<appender 
    name="AdoNetAppender" 
    type="log4net.Appender.AdoNetAppender">
    <threshold>INFO</threshold>
    <bufferSize 
        value="50" />
    <connectionType 
        value="System.Data.SqlClient.SqlConnection, System.Data, Version=1.0.3300.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
    <connectionString 
        value="data source=SERVERNAME;initial catalog=DATABASE;integrated security=false;persist security info=True;User ID=USERNAME;Password=PASSWORD" />
    <commandText 
        value="INSERT INTO Logs ([Date],[Thread],[Source],[Level],[Logger],[Message],[Exception],[HostName]) VALUES (@log_date, @thread, 'LOG SOURCE',@log_level, @logger, @message, @exception, @hostname)" />
    <parameter>
        <parameterName value="@log_date" />
        <dbType value="DateTime" />
        <layout type="log4net.Layout.RawTimeStampLayout" />
    </parameter>
    <parameter>
        <parameterName value="@thread" />
        <dbType value="String" />
        <size value="255" />
        <layout type="log4net.Layout.PatternLayout">
            <conversionPattern value="%thread" />
        </layout>
    </parameter>
    <parameter>
        <parameterName value="@hostname" />
        <dbType value="String" />
        <size value="255" />
        <layout type="log4net.Layout.PatternLayout">
            <conversionPattern value="%property{log4net:HostName}" />
        </layout>
    </parameter>
    <parameter>
        <parameterName value="@log_level" />
        <dbType value="String" />
        <size value="50" />
        <layout type="log4net.Layout.PatternLayout">
            <conversionPattern value="%level" />
        </layout>
    </parameter>
    <parameter>
        <parameterName value="@logger" />
        <dbType value="String" />
        <size value="255" />
        <layout type="log4net.Layout.PatternLayout">
            <conversionPattern value="%logger" />
        </layout>
    </parameter>
    <parameter>
        <parameterName value="@message" />
        <dbType value="String" />
        <size value="-1" />
        <layout type="log4net.Layout.PatternLayout">
            <conversionPattern value="%message" />
        </layout>
    </parameter>
    <parameter>
        <parameterName value="@exception" />
        <dbType value="String" />
        <size value="-1" />
        <layout type="log4net.Layout.ExceptionLayout" />
    </parameter>
</appender>
    </log4net>

The xml config is the same whether you are configuring logging in web- or app.config (you need to insert your own values for servername, database and login).
The main thing to point out here is the ‘bufferSize’ element, which tells Log4Net how many log entries to buffer up before writing them to the database. There isn’t any correct number here and you need to figure out what suits your environment the best. The trade-off is performance versus reliability: a low buffer size takes more resources because of the many writes to the database table (and yes, we learned that the hard way of course), while a high buffer size is less reliable because if your application crashes, the logs not yet written will never be written.
Also, it might make sense to have different buffer limits for different environments. In the development and test/QA environments, a low limit might be preferable since the logs will be written faster to the database. And since the number of log entries will be far lower than in the production system, it might take a long time before the logs become available if you run with the same limits as in production. In a production environment, instant logs are in most cases not relevant and performance is more critical. Then again, reliability is also a good thing, so you need to find a good trade-off.
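The trade-off is easy to see in a toy model. The sketch below is plain JavaScript and only illustrates the buffering idea; it is not Log4Net code, createBufferedLogger and writeBatch are names invented for this example, and exactly when Log4Net itself decides to flush is up to its own implementation:

```javascript
// Hypothetical buffered logger: collects entries in memory and hands them
// to writeBatch (think: one database round trip) once the buffer is full.
function createBufferedLogger(bufferSize, writeBatch) {
    var buffer = [];

    function flush() {
        if (buffer.length === 0) { return; }
        writeBatch(buffer);
        buffer = [];
    }

    function log(entry) {
        buffer.push(entry);
        if (buffer.length >= bufferSize) { flush(); }
    }

    function pending() {
        return buffer.length;
    }

    return { log: log, flush: flush, pending: pending };
}

var batches = [];
var logger = createBufferedLogger(3, function (entries) { batches.push(entries); });

['one', 'two', 'three', 'four'].forEach(function (message) { logger.log(message); });

console.log('batches written: ' + batches.length);
console.log('entries still in memory: ' + logger.pending());
```

With a buffer size of 3, the four entries cause a single write, and the fourth entry is still sitting in memory. That last entry is exactly what you lose if the process dies before the next flush, and a larger buffer means fewer writes but more entries at risk.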
Another thing to notice is that we have a lot of subsystems (web hosts, Windows services, message bus, cron jobs, etc) that log to the database. To know where the logs come from, we replace ‘LOG SOURCE’ with the name of the subsystem where the config is defined (e.g. ‘CommandsHost’ for the web host that receives commands from our application).
To get the logs into a database, you will need to create a table that matches the log entry that you have defined in the appender config. Here is the T-SQL to create a table that matches the above config:
CREATE TABLE [dbo].[Logs](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Date] [datetime] NOT NULL,
    [Thread] [varchar](255) NOT NULL,
    [Level] [varchar](50) NOT NULL,
    [Logger] [varchar](255) NOT NULL,
    [Message] [nvarchar](max) NOT NULL,
    [Exception] [nvarchar](max) NULL,
    [Source] [varchar](100) NULL,
    [HostName] [nvarchar](255) NULL,
    CONSTRAINT [PK_Log] PRIMARY KEY CLUSTERED 
    (
        [Id] ASC
    )
)

Xml transforms


Using xml transforms is an easy way to set up different settings for different environments. For web projects this is built into Visual Studio and MSBuild/MSDeploy, so the tooling support for this is pretty good. The only caveat is that the transformation is only run during deployment – not during the build. So if you’re switching between different build configs in Visual Studio, the web host on your dev machine will only use the web.config – not any of the web.debug.config, web.release.config, etc (unless you are actually deploying to your local IIS).
If you are developing a console/WPF/WinForms application you can still take advantage of the same xml transforms as web projects, but the tooling is not built into Visual Studio or MSBuild/MSDeploy. There is however an excellent free tool called SlowCheetah, developed by Sayed Ibrahim Hashimi, that will do this for you. You can download it as a Visual Studio extension, and it has an extra gem that Visual Studio doesn’t have: transformation preview.

SQL Server Full-Text search


The real power when it comes to database log entries is when you pair it with full-text searching. Full-text search will require quite a bit of resources in the form of hardware (disk, memory, cpu), but you don’t have to (and shouldn’t) set up the full-text indexing on your production database server. Instead you should set up log shipping in SQL Server (or some other form of pulling the logs off your production servers) and then do your full-text indexing and searching on a separate database server.

Pair full-text search of logs with a message based (event driven) system, and you have incredible insight into your production system and an invaluable, searchable history.

Resources

Log4Net: http://logging.apache.org/log4net/

SlowCheetah: https://visualstudiogallery.msdn.microsoft.com/69023d00-a4f9-4a34-a6cd-7e854ba318b5

Monday, August 11, 2014

Getting started with Powershell Desired State Configuration (DSC)

I wanted to try out DSC in PowerShell 4.0 on my Windows 8.1 Pro machine, but I got stuck on the ‘getting started’ part. I just couldn’t figure out how to generate the actual configuration files (.mof-files).
I googled around quite a bit before I finally got it: the configuration file that you create is of course just like a normal script file in the sense that it doesn’t actually do anything. All it does is define a function that you need to call!
So in order to actually generate the .mof-files, I ‘dot-sourced’ the script into the current session and called the function from the ps1-file.
Here’s an example configuration file called ‘demoConfig.ps1’:
configuration Demo
{
    Node localhost
    {
        File TestFiles
        {
            SourcePath = "c:\temp\test.txt"
            DestinationPath = "c:\temp\testdir"
            Ensure = "Present"
            Type = "File"
        }
     }
}

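And to generate the .mof files (dot-source the script, then call the configuration; the output ends up as localhost.mof in a sub-folder named after the configuration, here .\Demo):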
PS C:\temp> . .\demoConfig.ps1
PS C:\temp> Demo

Tuesday, June 25, 2013

T-SQL joins

What's the difference between an inner and a full join in T-SQL? Or a right versus a left join? I never have this at the top of my head when I need it, so for future reference I've assembled a little example that shows the resulting difference between them.
Given the following T-SQL:
declare @t1 table (id int)
declare @t2 table (id int)
 
insert into @t1 values(1),(2),(3)
insert into @t2 values(3),(4)
 
select 't1' as 'Table name', * from @t1
select 't2' as 'Table name', * from @t2
 
select 'inner join' as 'Join', t1.id as 'Left', t2.id as 'Right'
from @t1 as t1 inner join @t2 as t2 on t1.id = t2.id
 
select 'left join' as 'Join', t1.id as 'Left', t2.id as 'Right'
from @t1 as t1 left join @t2 as t2 on t1.id = t2.id
 
select 'right join' as 'Join', t1.id as 'Left', t2.id as 'Right'
from @t1 as t1 right join @t2 as t2 on t1.id = t2.id
 
select 'full join' as 'Join', t1.id as 'Left', t2.id as 'Right'
from @t1 as t1 full join @t2 as t2 on t1.id = t2.id

This is the result from the joins:
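As a cross-check that doesn't need a SQL Server instance, the same four joins can be sketched in plain JavaScript over the two id lists from the script. This is only an illustration: the helper names (innerJoin, leftJoin, rightJoin, fullJoin) are made up for the sketch, null stands in for SQL's NULL, and the simple lookup assumes the ids are unique within each table, as they are here. SQL Server doesn't guarantee row order without an ORDER BY, so only the set of rows is comparable.

```javascript
// Sketch only: mimics the four T-SQL joins above on plain arrays of ids.
// The helper names are invented for this example; null plays the role of NULL.
// Assumes ids are unique within each table, as in the example data.
const t1 = [1, 2, 3];
const t2 = [3, 4];

// inner join: only ids present in both tables
function innerJoin(left, right) {
  return left
    .filter((id) => right.includes(id))
    .map((id) => ({ left: id, right: id }));
}

// left join: every left id, with null where the right side has no match
function leftJoin(left, right) {
  return left.map((id) => ({ left: id, right: right.includes(id) ? id : null }));
}

// right join: every right id, with null where the left side has no match
function rightJoin(left, right) {
  return right.map((id) => ({ left: left.includes(id) ? id : null, right: id }));
}

// full join: the left join plus the rows that only exist on the right
function fullJoin(left, right) {
  const rightOnly = right
    .filter((id) => !left.includes(id))
    .map((id) => ({ left: null, right: id }));
  return leftJoin(left, right).concat(rightOnly);
}

console.log('inner join', innerJoin(t1, t2)); // (3, 3)
console.log('left join ', leftJoin(t1, t2));  // (1, null) (2, null) (3, 3)
console.log('right join', rightJoin(t1, t2)); // (3, 3) (null, 4)
console.log('full join ', fullJoin(t1, t2));  // (1, null) (2, null) (3, 3) (null, 4)
```

So the inner join keeps only the shared id 3, the left and right joins keep everything from "their" side, and the full join keeps everything from both.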


Wednesday, November 28, 2012

Unit testing asynchronous operations with the Task Parallel Library (TPL)

Unit testing asynchronous operations has never been easy in C#. The most common methods (or at least the methods I usually end up with) are either:
  1. Write a synchronous version of the method to test, unit test this one and then call the synchronous method from another method that runs it asynchronously in the production code.
  2. Raise an event in the production code when the asynchronous operation has finished, subscribe to the event in the unit test, and use the ManualResetEvent to wait for the event before making any assertions.
Neither is a good solution.
Writing a synchronous version and letting the production code call it is probably the easiest one, but it breaks down once you need to do more than just call the synchronous method in production (e.g. orchestrating several dependent asynchronous operations, or having some logic run when the asynchronous operations complete). And the worst part of it; a vital part of the production code will be untested.
The ManualResetEvent is better, but it takes a lot more code, makes the unit tests harder to read and you need to fire events in the prod code that possibly only unit tests are interested in. And unit tests that depend on ManualResetEvent tend to be fragile when run in parallel.
But with the Task Parallel Library (TPL) the tables have turned; TPL makes unit testing asynchronous code a lot easier. That is; it’s easy if you know how to do it.
Running some code asynchronously without any concern for testability is pretty straightforward with TPL:
Task.Factory.StartNew(MyLongRunningJob);
And in fact; it’s not much harder to make it test-friendly. You only need a bit of insight into what’s going on in the Task Factory. And to have it straight from the horse’s mouth; here’s what MSDN says about it:
Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms (like hill-climbing) that determine and adjust to the number of threads that maximizes throughput. This makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism. To complement this, widely-known work-stealing algorithms are employed to provide load-balancing.

The Task Factory will use a Task Scheduler to queue the tasks and the default scheduler is the ThreadPoolTaskScheduler, which will run the tasks on available threads in the thread pool.

The trick when unit testing TPL code is to not have those tasks running on threads that we have no control over, but to run them on the same thread as the unit test itself. The way we do that is to replace the default scheduler with a scheduler that runs the code synchronously. Enter the CurrentThreadTaskScheduler;

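using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Runs every task synchronously on the thread that schedules it.
public class CurrentThreadTaskScheduler : TaskScheduler
{
    // Called when a task is scheduled; execute it right away instead of queueing it.
    protected override void QueueTask(Task task)
    {
        TryExecuteTask(task);
    }

    // Called when a thread offers to run the task inline; always accept.
    protected override bool TryExecuteTaskInline(
       Task task, 
       bool taskWasPreviouslyQueued)
    {
        return TryExecuteTask(task);
    }

    // Nothing is ever left waiting in a queue, so there is nothing to report.
    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return Enumerable.Empty<Task>();
    }

    public override int MaximumConcurrencyLevel { get { return 1; } }
}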

TaskScheduler is an abstract class that all schedulers must inherit from and it only contains 3 methods that need to be implemented;
  1. void QueueTask(Task)
  2. bool TryExecuteTaskInline(Task, bool)
  3. IEnumerable<Task> GetScheduledTasks()
In the more advanced schedulers like the ThreadPoolTaskScheduler, this is where the heavy lifting of getting tasks to run on different threads in a thread-safe manner happens. But for running tasks synchronously, we really don’t need that. In fact, that’s exactly what we don’t need. So instead of scheduling tasks to run on different threads, both QueueTask and TryExecuteTaskInline will just execute them immediately on the current thread.

Now it’s time to actually use it in the production code;

public TaskScheduler TaskScheduler
{
    get
    {
        return _taskScheduler
            ?? (_taskScheduler = TaskScheduler.Default);
    }
    set { _taskScheduler = value; }
}
private TaskScheduler _taskScheduler;

public Task AddAsync(int augend, int addend)
{
    return new TaskFactory(this.TaskScheduler)
        .StartNew(() => Add(augend, addend));
}

To be able to inject a different TaskScheduler from unit tests, I’ve made the dependency settable through a public property on the class I’ll be testing. If no TaskScheduler has been explicitly set (which it won’t be when executed ‘in the wild’), the default TaskScheduler will be used.

The method Task AddAsync(int, int) is the method we would like to unit test. As you can see it’s a highly CPU intensive computation that will add 2 numbers together. Just the kind of work you’d want to surround with all the ceremony and overhead of running asynchronously.

The important part here is the instantiation of the TaskFactory that will take the TaskScheduler as a constructor parameter.

With that in place we can set the TaskScheduler from the unit tests:

[Test]
public void It_should_add_numbers_async()
{
    var calc = new Calculator
    {
        TaskScheduler = new CurrentThreadTaskScheduler()
    };

    calc.AddAsync(1, 1);

    calc.GetLastSum().Should().Be(2);
}

The System Under Test, SUT, is the Calculator-class that has the AddAsync-method we’d like to unit test. Before calling the AddAsync-method we set the CurrentThreadTaskScheduler that the TaskFactory in the Calculator should use.

Since AddAsync doesn’t return the result of the calculation, I’ve added a method to get the last sum. Not exactly production-polished code, but it’ll do for the purpose of this example.

Anyway; the end result is that the test passes. And if I don’t assign the CurrentThreadTaskScheduler to Calculator.TaskScheduler – that is, if it runs with the default ThreadPoolTaskScheduler – it will most likely fail, because the addition is not guaranteed to have finished before the assertion runs.
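The same idea carries over to other runtimes. Here's a rough JavaScript analogy, only a sketch and not TPL code: the "scheduler" is just a function that decides when a piece of work runs, and Calculator, defaultScheduler and currentThreadScheduler are names I made up for the illustration. The default defers the work with setTimeout (standing in for the thread pool), while the test version runs it on the spot.

```javascript
// Sketch only: a JavaScript analogy of the scheduler injection above.
// Calculator, defaultScheduler and currentThreadScheduler are hypothetical
// names for this example; they are not part of the TPL or of any library.

// Default "scheduler": defers the work to a later turn of the event loop,
// much like the thread pool runs a task some time after StartNew returns.
const defaultScheduler = (work) => { setTimeout(work, 0); };

// Test "scheduler": runs the work immediately on the caller's stack,
// which is the role CurrentThreadTaskScheduler plays in the C# code.
const currentThreadScheduler = (work) => { work(); };

class Calculator {
  constructor(scheduler = defaultScheduler) {
    this.scheduler = scheduler;
    this.lastSum = undefined;
  }

  // Fire-and-forget addition, mirroring AddAsync and GetLastSum in the post.
  addAsync(augend, addend) {
    this.scheduler(() => { this.lastSum = augend + addend; });
  }
}

const deferred = new Calculator();
deferred.addAsync(1, 1);
console.log(deferred.lastSum); // undefined: the work has not run yet

const inline = new Calculator(currentThreadScheduler);
inline.addAsync(1, 1);
console.log(inline.lastSum); // 2: the work ran before addAsync returned
```

The production code never needs to know which scheduler it got; only the test swaps in the synchronous one.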

But don’t trust me on this. I’ve uploaded the complete (absurd) example to GitHub, so you can run the tests and see for yourself; https://github.com/bulldetektor/TplSpike.

References


You can read the MSDN-article that I quoted from here; http://msdn.microsoft.com/en-us/library/dd537609.aspx

I found the code for the CurrentThreadTaskScheduler in the TPL samples here; http://code.msdn.microsoft.com/windowsdesktop/Samples-for-Parallel-b4b76364. The samples contain a dozen or so TaskSchedulers, for instance;


  • QueuedTaskScheduler - provides control over priorities, fairness, and the underlying threads utilized
  • OrderedTaskScheduler - ensures only one task is executing at a time, and that tasks execute in the order that they were queued.
  • ReprioritizableTaskScheduler - supports reprioritizing previously queued tasks
  • RoundRobinTaskSchedulerQueue - participates in scheduling that supports round-robin scheduling for fairness
  • IOCompletionPortTaskScheduler - uses an I/O completion port for concurrency control
  • IOTaskScheduler - targets the I/O ThreadPool
  • LimitedConcurrencyLevelTaskScheduler - ensures a maximum concurrency level while running on top of the ThreadPool
  • StaTaskScheduler - uses STA threads
  • ThreadPerTaskScheduler - dedicates a thread per task
  • WorkStealingTaskScheduler - a work-stealing scheduler, not much more to say about that

Monday, October 1, 2012

Commit-Driven Development

Test-Driven Development, TDD, is the art of writing a test before you start implementing anything in the production code. TDD is often known as Test-First Development.

Behavior-Driven Development, BDD, is the art of describing a behavior in an executable format before you start implementing anything in the production code. Since these behavior-descriptions are often written as scenarios within a feature, BDD is also known as Scenario-Driven Development.

Commit-Driven Development is the art of writing a comment for your next commit before you start implementing. You could say it’s Comment-First Development.

So why all this emphasis on “Something”-First development? Does the order of things really matter that much? As it turns out; it really does. It helps you focus.

By writing a test first – before any of the logic is implemented – you say to yourself; this is what I’m going to do the next 5-10 minutes. I will focus solely on getting this test to pass. Until then I will not do any refactoring or touch any other parts of the code base.

The same thing goes for a scenario; until this scenario passes, I will not focus on anything else. When the tests pass, I can rename classes/methods/variables, move methods into new/other classes, extract smaller methods, and so on.

By writing the comment for the next commit first, you get the same benefit as test- and scenario-first; Focus. When you have a clean sheet with no pending, un-committed changes and you write that comment first, you’re telling yourself (and your subconscious) that this – and preferably only this – is what I will work on now.

If you get interrupted along the way and you lose track of what it was you were actually supposed to work on, you can just take a look at the commit comment and you’re right back on track.

When I leave the office I usually try to end the day with a failing test. I write the test, see it fail, and call it a day. Next morning I can just run the test suite and I know exactly where to start coding. The context switch gets really cheap. With the pending commit comment I also get the bigger picture of what I was working on; I know when I should be done and commit my changes.

Added benefit

There’s another side to writing the commit comment first; it makes it easier to write better comments.

When you write a commit comment after you’ve made all changes, it’s easy to fall into the I-did-this-then-that style of comment. Take a look at this fairly common change-log;

  • Fixed bug #123: Error when saving customer
  • Increased first name max length
  • Added field on customer
  • Added Cancel-button on customer list

Implicitly these comments say ‘I fixed bug…’, ‘I increased…’, etc. Problem is; I don’t care if you did it. I already know that from the change-log. What I want to know is; how does the behavior of the system differ from prior to this commit?

Writing the comment first – before you actually do something – makes it easier to compose comments where the focus is on the behavior of the system. You write the comments describing how the system should act when you’ll do the commit later on.

Here’s an attempt to re-write the comments above;

  • When the user tries to save a customer with an invalid email address, then an error message will be displayed (bug #123)
  • Max length of a customer first name increased from 64 to 256 characters
  • Corporate customers can be assigned a contact person
  • When loading the customer list takes too long, then clicking the Cancel-button will cancel any further loading. All customers loaded at the time of cancellation will be displayed in the list.

I’m not saying that you couldn’t have written these commit comments at the time of committing the changes. It’s just a lot easier to write them if you do it upfront. And it’s a lot easier to verify that you did what you set out to do, than it is to try to figure out what you’ve actually done when it’s time to commit.

When I write my commit comment I try to think of them as release notes. Preferably I could just extract all comments from the repository log since last deploy and paste them right into the release notes for the new deploy.

References

This blog post is highly influenced by Arialdo Martini and his excellent post “Preemptive commit comments”. If you haven’t already, please go read it now.

Monday, October 31, 2011

Auto-Wiring EventAggregator Subscription in Caliburn.Micro

 

Just wanted to make a quick note about how to get auto-wiring of the EventAggregator subscription up and running for Caliburn.Micro. What I want to accomplish is to avoid having to write this:

[Code screenshot: CaliburnMicro_AutoEA_1]

… and instead make this "just happen" when a type implements IHandle. And as you can see from the code above, the IoC I use here is MEF.

So I haven't used MEF before, but I found this post ("Introduction to InterceptingCatalog – Part I") by Piotr Włodek and figured that with a little bit of tweaking this should work.

This code relies on the MEFContrib project up on CodePlex/GitHub, so if you haven't already downloaded it you can get it from there or just NuGet-it into your project;

[Code screenshot: CaliburnMicro_AutoEA_2]

To get any class that implements IHandle to list itself as a subscriber in the EventAggregator, we need to hook into the creation pipeline in MEF. MEF itself doesn't have any hooks that let us do this, but fortunately MEFContrib does, and it's called an InterceptingCatalog.

The InterceptingCatalog takes two arguments; a ComposablePartCatalog and an InterceptionConfiguration. It's the InterceptionConfiguration that lets us provide an interceptor that can do the auto-wiring for us. But first, let's create the class that will do the interception - the EventSubscriptionsStrategy:

[Code screenshot: CaliburnMicro_AutoEA_3]

This object creation strategy will be added to the MEF creation pipeline. This class will be called for every object resolved from MEF, but in this case we're only interested in those that implement the IHandle interface. So if the cast succeeds, we know that this is a class that wants to subscribe to events. By using the Intercept method from the IExportedValueInterceptor interface, we can then tell the EventAggregator that this object is an event subscriber.
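The interception idea itself is not tied to MEF. As a rough illustration, here it is in plain JavaScript. None of this is the MEF, MEFContrib or Caliburn.Micro API: EventAggregator, Container, addInterceptor and the two example classes are hypothetical stand-ins written for the sketch, and "has a handle method" plays the role of "implements IHandle".

```javascript
// Sketch only: the interception idea in plain JavaScript, not the MEF or
// MEFContrib API. EventAggregator, Container and their methods are
// hypothetical stand-ins written for this example.
class EventAggregator {
  constructor() { this.subscribers = []; }
  subscribe(subscriber) { this.subscribers.push(subscriber); }
  publish(message) { this.subscribers.forEach((s) => s.handle(message)); }
}

class Container {
  constructor() { this.interceptors = []; }
  addInterceptor(interceptor) { this.interceptors.push(interceptor); }

  // Every object created through the container passes through each interceptor.
  resolve(Type) {
    return this.interceptors.reduce((value, intercept) => intercept(value), new Type());
  }
}

const events = new EventAggregator();
const container = new Container();

// The "strategy": anything that can handle messages gets subscribed automatically.
container.addInterceptor((value) => {
  if (typeof value.handle === 'function') {
    events.subscribe(value);
  }
  return value;
});

class CustomerListViewModel {
  constructor() { this.received = []; }
  handle(message) { this.received.push(message); }
}

class PlainService {}

const viewModel = container.resolve(CustomerListViewModel);
container.resolve(PlainService); // no handle method, so it is left alone

events.publish('customer saved');
console.log(viewModel.received);        // [ 'customer saved' ]
console.log(events.subscribers.length); // 1
```

The view model never calls subscribe itself; being created through the container is enough.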

The only thing missing then, is to plug our EventSubscriptionStrategy into MEF;

[Code screenshot: CaliburnMicro_AutoEA_4]

This is from the default AppBootstrapper from Caliburn.Micro, with the changes I made to get the EventSubscriptionStrategy registered in MEF marked in red.

References

Caliburn.Micro: http://caliburnmicro.codeplex.com/ - This is a WPF/SL/WP7 framework along the same lines as PRISM from Microsoft Patterns & Practices. Only much smaller (in size, not necessarily feature set) and much more "opinionated".

MEF Contrib: http://mefcontrib.codeplex.com/ and https://github.com/mefcontrib - Open source extensions to the Managed Extensibility Framework (MEF). In other words; extending the extensibility with extensions…

Tuesday, August 30, 2011

Create & Boot VHD

This is just a quick list for creating and booting from a Virtual Hard Disk (VHD) in Windows 7.

1a. Creating VHD from scratch


If you’re creating a VHD from scratch (that is, not a child-disk based on another VHD), these are the commands you need to run from the command prompt:
:\> diskpart

Microsoft DiskPart version 6.1.7601
Copyright (C) 1999-2008 Microsoft Corporation.
On computer: xxxxxxxx


DISKPART> create vdisk file=[FILEPATH] type=expandable maximum=50000

  100 percent completed

DiskPart successfully created the virtual disk file.

DISKPART> exit

The ‘:\>’ means command prompt, typically it would be ‘c:\>’ or something similar, but the actual directory you’re in doesn’t matter for the command.

Running the command ‘diskpart’ will start the command-line utility that we’ll use for creating the VHD. When diskpart starts, the command prompt will change to ‘DISKPART>’. Now you can type the command ‘create vdisk’ to create the actual VHD.

The command has only two required arguments; ‘file’ and ‘maximum’. The ‘file’-argument specifies where to output the vhd and the name of the vhd-file. So [FILEPATH] in the example above will typically be something like ‘D:\VHD\MyNewVhd.vhd’.

The ‘maximum’ argument specifies the size of the VHD in megabytes. You can choose between two types of disks (the ‘type’ argument); either fixed, meaning that the VHD file will be created with the full size specified by the ‘maximum’ argument, or expandable, which means it will only be as big as the data on the disk requires but never larger than ‘maximum’. The default is fixed, and if you have lots of space available it’s the recommended setting as it’s a bit faster than expandable disks. But on my laptop I do not have unlimited storage, so here I’ll go with the expandable option and set the maximum file size to about 50 GB. It’s also a bit cheaper to do backups of smaller files, so expandable has its advantage there.

1b. Creating a differencing VHD


If you already have a base-VHD that you want to use as a parent-disk, these are the commands you need to run from the command prompt:

:\> diskpart

Microsoft DiskPart version 6.1.7601
Copyright (C) 1999-2008 Microsoft Corporation.
On computer: xxxxxxxx


DISKPART> create vdisk file=[FILEPATH] parent=[PARENT_FILEPATH]

100 percent completed

DiskPart successfully created the virtual disk file.

DISKPART> exit

As with the 1a) option above, the [FILEPATH] specifies the path and name of the disk to create. The ‘parent’ argument should be pretty obvious and [PARENT_FILEPATH] is the fully qualified name of the existing VHD you want to use as base-image.

Note that you cannot specify ‘maximum’ or ‘type’ as arguments for a differencing disk because the size of the child disk is set from the parent. It’s also possible to merge a diff-disk with its parent at a later point, to create a new VHD.

Remember to make your parent-disk read-only before you use it for differencing disks. Failing to do so and then accidentally starting up or booting directly into your parent VHD will render your differencing disks useless (and yes; I learned that the hard way).

Another gotcha to be aware of; if you want to boot into the new differencing disk, it needs to be located on the same disk volume as the parent disk. They can be in different folders, but they cannot reside on different volumes – not even if the volumes are on the same physical disk.

2. Boot it up!


When the VHD is ready to use, issue these commands to add it to your boot menu;
:\>bcdedit /copy {current} /d "My virtual boot entry"

The entry was successfully copied to {df399c86-f723-11df-85b6-8d1c50594e14}.

:\>bcdedit /set {df399c86-f723-11df-85b6-8d1c50594e14} device vhd=[locate]\Path\To\Disk.vhd

The operation completed successfully.

:\>bcdedit /set {#GUID#} osdevice vhd=[locate]\Path\To\Disk.vhd

The operation completed successfully.

:\>bcdedit /set {#GUID#} detecthal on

The operation completed successfully.

If you run the ‘bcdedit’ command without any arguments, you should be seeing your new boot entry at the bottom. And it should look something like this;

Windows Boot Loader
-------------------
identifier              {df399c86-f723-11df-85b6-8d1c50594e14}
device                  vhd=[locate]\Path\To\Disk.vhd
path                    \Windows\system32\winload.exe
description             My virtual boot entry
locale                  en-US
inherit                 {bootloadersettings}
recoverysequence        {df399c79-f723-11df-85b6-8d1c50594e14}
recoveryenabled         Yes
osdevice                vhd=[locate]\Path\To\Disk.vhd
systemroot              \Windows
resumeobject            {df399c77-f723-11df-85b6-8d1c50594e14}
nx                      OptIn
detecthal               Yes


The first command, ‘bcdedit /copy’, will copy the current boot entry and create a new one with the description specified by the /d argument. Then you’ll use the /set argument to modify the new entry. You’ll need to copy the id that is displayed after the initial copy command, and pass it as the first argument to each /set command (it goes where the {#GUID#} placeholder is shown above).

There are 3 things that need to be set; the device, the osdevice and the detecthal. The first two are similar and take the path to the VHD you want to boot from as input. Note the ‘[locate]’ syntax in the path; this tells the boot manager to figure out which drive the VHD is located on. So instead of ‘vhd=d:\path\to\disk.vhd’, you need to enter ‘vhd=[locate]\path\to\disk.vhd’.

A little tip if you have spaces in the path or filename for the VHD; surround the path with apostrophes (‘).

The last /set command tells Windows to detect the Hardware Abstraction Layer (HAL) at boot, which is needed on some x86-based systems.
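To make it obvious that the same id goes into all three /set commands, here's a tiny sketch that only builds the command lines as strings. It doesn't run bcdedit; vhdBootCommands is a hypothetical helper for this illustration, and the id and path are the example values from above.

```javascript
// Sketch only: builds the three 'bcdedit /set' lines from the post as strings.
// It does not run bcdedit; vhdBootCommands is a hypothetical helper.
function vhdBootCommands(guid, vhdPath) {
  // [locate] lets the boot manager work out which drive holds the VHD.
  const target = `vhd=[locate]${vhdPath}`;
  return [
    `bcdedit /set ${guid} device ${target}`,
    `bcdedit /set ${guid} osdevice ${target}`,
    `bcdedit /set ${guid} detecthal on`,
  ];
}

const commands = vhdBootCommands(
  '{df399c86-f723-11df-85b6-8d1c50594e14}', // the id printed by 'bcdedit /copy'
  String.raw`\Path\To\Disk.vhd`             // path without the drive letter
);
console.log(commands.join('\n'));
```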
And that’s it; Ready to boot!

Resources


In some of my earlier posts I wrote a bit about how to compact VHDs, how to automate Windows install on VHDs and some pros and cons with virtual machines.

Also, David Longnecker has some good tips in his blog post Tips for Booting/Using VHDs in Windows 7.

Thursday, August 4, 2011

Compact virtual space

Keeping a virtual machine tidy will make it run faster and the VHD file itself smaller. Here’s a little list of things you can do to tidy it up a bit.

The usual suspects


The first things to do are what you’d do on any machine – virtual or not – to keep things as optimal as possible. The usual maintenance jobs include deleting temporary files, emptying the recycle bin, cleaning up the registry, etc. I’ve found Glary Utilities (links at the end of this blog post) to be a good tool for this. Just install the free version and run the ‘Scan for issues’ under ‘1-click maintenance’.

If this is a fresh VHD that hasn’t been used much, you probably won’t gain much from the usual housekeeping, but defragmenting and checking the disk (chkdsk) and Windows system files (sfc /scannow) seldom hurts.

Shrink wrap


To shrink the size of the VHD file you can compact the disk, but before you do that you need to pre-compact it. Compacting the disk is an operation you’ll do on the VHD-file, while pre-compacting is something you’ll do on the running virtual machine to make the compacting operation as effective as possible.

To pre-compact the VHD you need to run the precompact.exe which you can find on an ISO in the Virtual PC install directory (on my box the precompact.iso is in the folder c:\program files (x86)\windows virtual pc\integration components, but the exact location will vary depending on your OS and Virtual PC version).

precompact.exe must be run inside the virtual machine. You can either boot into it or fire it up in Virtual PC. Once you’re running the virtual machine you need to open the command line and navigate to the folder where that precompact.exe resides. Make sure you’re running with administrator privileges and run the following command:

:\> precompact -Silent -SetDisks:C

The last parameter specifies which disks to pre-compact and if not specified it will pre-compact all disks. In my case that wouldn’t be very wise, since I also have access to the host partition from my virtual machines.

When pre-compacting is done (can take quite some time depending on the size of your disk(s)), shut down the virtual machine.

Now it’s time for the actual compact process, so from your host OS fire up the command line and start the diskpart utility;

:\> diskpart

Once diskpart is running, enter the following commands (assuming the VHD file is named ‘parent.vhd’ and is in a folder called ‘vhd’ on partition ‘d’);

DISKPART> select vdisk file="d:\vhd\parent.vhd"
DISKPART> attach vdisk readonly
DISKPART> compact vdisk
DISKPART> detach vdisk
DISKPART> exit

And that’s about it. You should now have a small and fast virtual machine (everything’s relative, right?)

Resources


Glary Utilities – Optimization software for Windows. Comes in two flavors; a free version for the basic optimization that’ll be good enough for most users and a paid version with more advanced tools.

Thursday, May 26, 2011

Installing Windows is boring

So why not automate it?
I’m a big fan of virtual machines and especially booting from VHD natively in Windows 7 (as you might have guessed from my previous post).

On those virtual machines I run Windows 7 for the most part. Creating a new VHD is pretty easy, but installing the OS and everything you need is a tedious task.

One way to solve this is to create a parent VHD with Windows 7 and then create child VHDs based on this one. This is what I do for the most part, but sometimes I just need a fresh install of Windows 7.

I could of course do this by creating a blank VHD, booting into it and installing Windows 7. But where’s the fun in that? Everything boring must be automated, so here’s the way to make your life just a little less boring.
  1. Grab your copy of Windows 7 and extract/copy the content of the install disk into a local directory (if it’s an ISO file, use a tool like 7-zip). As an example I’ll use ‘d:\installs\win7’ as the directory containing the extracted Win7.
  2. Download the ISO for the Windows Automated Installation Kit (AIK) for Windows 7. As the name implies, this one is needed for automating the installation of Win7.
  3. Extract the AIK files from the downloaded ISO file (or burn it to disk) and install it.
  4. Download a neat little tool called WIM2VHD (which stands for Windows Image to Virtual Hard Disk), which is the tool that will actually do the automation for us. 
The WIM2VHD download is a Windows Script File (.wsf), which you will run after you’ve found out the correct SKU in the Windows image file (install.wim) on your install media. A Windows 7 installation disk may contain one or more editions (or stock-keeping units a.k.a. SKUs), e.g. Home, Premium, and Ultimate. To be able to install from the installation source extracted to ‘d:\installs\win7’, the automation tool needs to know which edition you intend to install. And for that you need a tool called ImageX, which came with the AIK you just installed.
  1. Go to the ‘tools’ folder in the AIK install folder (defaults to ‘c:\program files\windows aik\tools’)
  2. Find the appropriate version of ImageX (if you’re running on a 32-bit OS that would be in the ‘x86’-folder, for 64-bit it’s the ‘amd64’-folder; ‘ia64’ is for Itanium) and copy it to the same folder as the WIM2VHD script.
  3. Run ImageX to find out which SKU to install, e.g. imagex /info d:\installs\win7\sources\install.wim
  4. Then run the WIM2VHD script with the path to the WIM and the desired SKU as params;
cscript wim2vhd.wsf /wim:d:\installs\win7\sources\install.wim /sku:ultimate

The script will then start making you a brand new VHD in the same folder you’re running the script from (alternatively you can add a /vhd param to the command above to specify a path to the output vhd).

Some minutes later you’ll have an (almost) pre-installed VHD with Windows 7.

Tuesday, August 31, 2010

Waking up to a New Virtual Reality

My first experience with a virtualized developer environment was back in 2007. I was taking over a project that had been in prod for a year or so. It wasn’t a very large project and I was just in to fix some bugs and add some features.

But anyone who has been added to a project ‘just’ to add some value knows it can be a tedious affair just to get the development environment up and running. This was a project based on EpiServer (a SharePoint’ish Content Management System) running on ASP.NET 2.0 on the frontend and Sql Server 2005 in the backend.

I would probably have had to spend at least 2 or 3 days just to get my dev machine set up for this project. That is; if I could get my dev environment set up at all.

I had never worked on EpiServer before, so just setting this up would have been a long and winding road. The version we were running was a couple of versions old and the web contained little to no information on it. Even getting the correct installer would probably take a couple of days…

Lucky for me this project had been using VMware to virtualize the developer, test and staging environments, and so getting my first dev build up was merely a matter of installing the VMware client, copying over the virtual image and starting it up. Almost too easy!

The dev machine in this case was a Windows XP image with Visual Studio 2008 and Sql Server 2005. An environment quite fit for virtualization. My next project was onsite at a customer with an already set-up machine, and so I didn’t have the chance to virtualize anything there. But starting on a new project again in late 2008, I decided to give VMware a try again.

This time it was on a Vista box (with BitLocker) and a dev environment that also required Vista. Needless to say; Vista proved, to my big disappointment, useless both as a virtualized guest and as the virtualization host. At least for a developer environment.

Then Windows 7 came around and I saw some blog posts on how you could boot directly off a Virtual Hard Disk (VHD), meaning you would only suffer about 3-5% performance loss due to virtualization. The only hardware that is actually virtual is the hard drive. Everything else – CPU, memory, graphics card, network, USB – is non-virtualized. You’re running directly off the hardware.

I didn’t take the time to test out VHD boot as long as I had my dev environment already set up and everything was running fine. But about a month ago I got a brand new Lenovo W510, and so I finally got the ‘excuse’ I needed to give virtualization a new chance.

And I can tell you; so far it’s been pure gold!

From what I’ve experienced so far, here are the pros and cons of VHD native boot on Windows 7:

Pros

- Easy to set up a new dev environment for testing out new tools, frameworks, languages or what-have-you. You just need to keep a copy of your ‘base images’ so that you can start fresh from there.

- Easy to get new members of a team up and running. A little disclaimer here as I haven’t actually tried this, but it should only be a matter of running sysprep with the ‘generalize’ option on the virtual machine.

- Backup is just a file copy operation

- Getting up and running on a new physical machine is just a matter of installing Windows 7, editing the boot manager (bcdedit) and copying the VHD-file over to your new machine

- “Avoid” BitLocker; Now this might seem like a strange thing to do, but as a consultant I have two sets of security manuals to conform to. One for my employer and one for the customer that hires me. Now the security regulations are seldom at the same level between these, and so I always have to be set up to meet whoever has the highest security bar. Most often that would be my employer.

And as every dev knows; the more layers of security you add to your machine, the longer your compile takes. For some reason I was not equipped with a lot of patience at birth – and I haven’t gotten any since – and so sluggish machines do not suit me well. BitLocker, enterprise anti-virus clients, and other well-intended enterprise security apps, can really suck the life out of any machine. If all you need to do your job is Outlook, Word, Excel and a browser, that would be fine. When you need to compile a 65-project-large solution 400 times a day, it isn’t. And so if I’m working for a client who doesn’t require disk encryption and sluggish anti-virus software, then I’m perfectly fine with that.

Cons

- A performance hit; I’ve seen 3-5%, but those numbers apparently came out of Scott Hanselman’s butt (his words, not mine) so take that for what it’s worth (I could make a pun and say that ain’t worth sh**, but I’ll refrain from adding that kind of toilet humor to this blog). What I can say, though, is that I can’t tell the difference between running on my virtual and my ‘real’ Windows 7 installation. In a blind-test I don’t think I’d be able to tell them apart.

- Hibernation is not supported on the virtualized machine

- Calculation of the Windows Experience Index (WEI) not supported

For me the pros outweigh the cons. The loss in performance is leveled out by not running BitLocker (which also gives you a 3-5% perf hit). Hibernation is nice, but I can live without it and I still have WEI on the host.

 

My next blog posts will cover how I created my VHDs and got my multi-boot set up.

Sunday, February 14, 2010

Option Explicit On – Commands in CQRS

The idea behind commands in a CQRS architecture is that they should be very explicit and very specific about their intention. You would try to shy away from generic CRUD operations and rather try to capture the essence of what the user is trying to accomplish. Meaning; instead of a ‘one form to save all data’ you would rather let the user explicitly tell what (s)he wants to achieve.

Ok, example. Say you have an application for car dealers. In here you have the possibility to set the price of the car you want to sell. Now, you can either put this as a ‘price field’ among a bunch of other car related data like registration number, brand, model, horsepower, etc, and save it along with the rest of the data. Or you can make sure that the changing of a car’s price is an operation of its own.

In CQRS you would typically go for the second option. You would make sure that the user’s intent is expressed very explicitly by giving this operation its own command. That is; instead of putting it inside some big SaveCar method, you make it a method of its own. Something like ChangeCarPrice, or even LowerCarPrice and RaiseCarPrice.
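To make the difference a bit more concrete, here is a minimal sketch in JavaScript. Only the names SaveCar, LowerCarPrice and RaiseCarPrice come from the text above; the field names, the in-memory price store and the dispatch function are assumptions made up for illustration, not how any particular CQRS framework does it.

```javascript
// Sketch only: field names and the in-memory store are illustrative assumptions.

// Generic CRUD-style request: the price change is buried among other fields.
const saveCar = {
  type: 'SaveCar',
  carId: 'car-1',
  fields: { registrationNumber: 'AB12345', brand: 'BMW', price: 50000 }
};

// Explicit commands: the name itself states what the user wants to achieve.
function lowerCarPrice(carId, newPrice) {
  return { type: 'LowerCarPrice', carId: carId, newPrice: newPrice };
}

function raiseCarPrice(carId, newPrice) {
  return { type: 'RaiseCarPrice', carId: carId, newPrice: newPrice };
}

// Hypothetical in-memory stand-in for the domain model.
const carPrices = new Map([['car-1', 55000]]);

// Each command gets its own handler, so there is one obvious place for its rules.
const handlers = {
  LowerCarPrice(command) {
    const current = carPrices.get(command.carId);
    if (!(command.newPrice < current)) {
      throw new Error('LowerCarPrice must lower the price');
    }
    carPrices.set(command.carId, command.newPrice);
  },
  RaiseCarPrice(command) {
    const current = carPrices.get(command.carId);
    if (!(command.newPrice > current)) {
      throw new Error('RaiseCarPrice must raise the price');
    }
    carPrices.set(command.carId, command.newPrice);
  }
};

function dispatch(command) {
  const handler = handlers[command.type];
  if (!handler) {
    throw new Error('No handler for command type: ' + command.type);
  }
  handler(command);
}

dispatch(lowerCarPrice('car-1', 50000));
console.log(carPrices.get('car-1')); // 50000
```

Note that the generic saveCar object has no handler here at all; in this sketch it only serves as a contrast to show how little it says about what the user actually intended.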

Wouldn’t that be an awful lot of commands, you say? Will you be making a command for every change of value in the application? Hell, no. That would be a lot of commands. And that’s why we don’t do that. We’re making a specific command for the ‘car price’ value because the change of this very value is something that has a specific meaning in this domain.

Lowering the price of a car is probably an action you need to take because nobody is willing to pay the price you set earlier on. And it can be one of several marketing actions you can take in order to make the car more saleable. Adding more equipment or freshening up the sales description can be other ‘marketing actions’.

Tracking these specific actions can be very valuable for the car dealer, because having a car taking up space in your warehouse for a longer period of time is bad for profit. Having cars in stock for a minimum amount of time is good for profit. And so tracking which marketing actions are most effective over time can be very lucrative for a car dealer (or any kind of dealer I guess).

Behave!

Another way of looking at commands is that they should capture the behavior expressed in your domain model. Patterns like Table Module are arguably more focused on data than behavior, which makes them very well suited for systems where complexity is not that high. Conversely, for complex domains, Domain Model is more focused on behavior and less data-centric.

I would argue that your average Customer Relationship Management system (CRM) or Content Management System (CMS) is an example of a system where data is more important, or rather more valuable, than the behavior of the system. As with all things in life there are exceptions, but from my own experience the typical CRM or CMS would make a good fit for a Table Module or Record Set pattern.

Systems built using data-centric models are far easier to build and maintain. That is, of course, until you start having too much logic – too much behavior – sprinkled around the code. In that case you’re probably better off using something like the Domain Model pattern.

So let’s focus on the Domain Model again, because in a CQRS architecture there will typically be a Domain Model that contains the essential business logic. The core of the business so to speak.

In a sufficiently complex system there will be a lot of behavior and complex rules attached to those behaviors. Let’s take for instance the aforementioned ChangeCarPrice. Larger car dealers can have hundreds of cars for sale, and all cars will have a designated ‘responsible salesman’. Each salesman can have several cars which they are responsible for and they probably will have some kind of bonus arrangement tied to how many cars they sell.

Imagine a scenario where a potential car buyer walks into the shop. Let’s call our potential customer ‘Johnny’. Johnny has some preferences as to what car he wants, but for the most part he’s pretty open to which exact car he’ll end up buying. He’s looking for a 4x4 station wagon, preferably black or dark gray, with diesel engine and leather seats. Johnny’s got about $50.000 to spend on the car - which by the way is a mid-priced car here in Norway. (Yes, I know. It’s an expensive country and everything costs more than it should and blah, blah, blah. It’s a whole other story.)

The salesman of this story, let’s call him Bob, doesn’t have any cars that fit within Johnny’s preferences. At least none that appeal enough to make him leave his $50.000 in the shop. Johnny did however spot a BMW at $55.000 that he really liked, but the extra $5.000 is more than Johnny can afford at the moment. And Bob is not willing to let the BMW go for as little as $50.000, so no business is done.

4 weeks go by and the BMW is still in the shop, but now it’s starting to be costly to have it just standing there, and so the price is lowered to $50.000. Wouldn’t it be nice if Bob’s software were smart enough to notify Johnny about this event?

Yes, it would, but building a system that can handle these kinds of events is actually very tricky. Having an explicit command that triggers when the car price changes makes it a whole lot easier to add a business rule like ‘notify customer if price drops to or below $50.000’, because you know exactly where to put that behavior.

If you have a system where business logic has been randomly added from the UI all the way down to the database, this will be a much tougher job to get done.
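As a rough illustration of where such a rule could live, here is a small JavaScript sketch of a handler for the ChangeCarPrice command. Everything apart from the command name and the Johnny-and-the-BMW numbers is an assumption: the list of interested buyers, the notifyCustomer function and the shape of the car object are hypothetical stand-ins, and a real system would send an e-mail or text message rather than push to an array.

```javascript
// Sketch only: all names below except ChangeCarPrice are illustrative assumptions.

// Hypothetical record of customers who liked a car but found it too expensive.
const interestedBuyers = [
  { name: 'Johnny', carId: 'bmw-1', budget: 50000 }
];

// Stand-in for a real notification channel (e-mail, SMS, ...).
const sentNotifications = [];
function notifyCustomer(name, message) {
  sentNotifications.push({ name: name, message: message });
}

// The one place where a price change happens, and therefore the one
// place where the "notify on price drop" rule needs to be added.
function handleChangeCarPrice(car, command) {
  const oldPrice = car.price;
  car.price = command.newPrice;

  if (command.newPrice >= oldPrice) {
    return; // not a price drop, so nobody needs to be told
  }

  for (const buyer of interestedBuyers) {
    const wasOutOfReach = oldPrice > buyer.budget;
    const nowWithinReach = command.newPrice <= buyer.budget;
    if (buyer.carId === car.id && wasOutOfReach && nowWithinReach) {
      notifyCustomer(
        buyer.name,
        'The price of ' + car.id + ' has dropped to ' + command.newPrice
      );
    }
  }
}

const bmw = { id: 'bmw-1', price: 55000 };
handleChangeCarPrice(bmw, { type: 'ChangeCarPrice', carId: 'bmw-1', newPrice: 50000 });
console.log(sentNotifications);
```

The point of the sketch is not the notification logic itself, but that the rule sits right next to the only code path that can change the price.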

So what about the CRUD?

I believe you can still have your ‘store these 30 fields to the database’-operations in a domain driven CQRS architecture. You can have your SaveData command. But commands like that, CRUD commands, where you don’t care about anything but persisting the data, will not trigger any behavior in your domain model. They will just persist data into your relational database, file system, blob storage, or whatever medium that holds your data.

Then when new requirements arrive and you need to attach behavior to some of the data in that SaveData command, you will just extract those properties out into their own command and you make that new behavior explicit.

Maybe even all the way from the UI and down to the domain model. That way you will capture the user’s intent and you will have the means to encapsulate that precious domain knowledge inside your model.
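One possible intermediate step, before the UI itself is changed, can be sketched like this in JavaScript: a small function that pulls the price out of a generic SaveData payload and turns it into its own ChangeCarPrice command. The function name extractPriceChange, the payload shape and the field names are assumptions for the sake of the example.

```javascript
// Sketch only: extractPriceChange and the payload shape are illustrative assumptions.

// Splits a generic SaveData payload into an explicit ChangeCarPrice command
// (when the price actually changed) plus a SaveData command for the rest.
function extractPriceChange(saveData, currentPrice) {
  const { price, ...remainingFields } = saveData.fields;
  const commands = [];

  if (price !== undefined && price !== currentPrice) {
    commands.push({ type: 'ChangeCarPrice', carId: saveData.carId, newPrice: price });
  }
  if (Object.keys(remainingFields).length > 0) {
    commands.push({ type: 'SaveData', carId: saveData.carId, fields: remainingFields });
  }
  return commands;
}

const commands = extractPriceChange(
  { type: 'SaveData', carId: 'car-1', fields: { brand: 'BMW', horsepower: 200, price: 50000 } },
  55000
);
console.log(commands.map(function (c) { return c.type; })); // [ 'ChangeCarPrice', 'SaveData' ]
```

The remaining fields still travel as plain CRUD data, while the price change now has a name and a single place to attach behavior to.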

Further Reading

For more background and resources on CQRS you can take a look at my previous post “Growth is optional. Choose wisely.”.

I mentioned the Domain Model, Table Module and Record Set patterns and there’s no better way to learn about these – and other patterns – than to read Martin Fowler’s excellent “Patterns of Enterprise Application Architecture”. A short description of the patterns can be found in the P of EAA Catalog’s "Domain Model", "Table Module" and "Record Set".

I also touched on Domain Driven Design a bit, and again; No better source than the source itself. If you haven’t already – go read Eric Evans’ “Domain Driven Design – Tackling Complexity in the Heart of Software”. Just do it. And you can come back and thank me for the tip afterwards :)

Thursday, February 11, 2010

Growth is optional. Choose Wisely.

Command Query Responsibility Segregation, or CQRS for short, is an architectural pattern based on the idea of Command Query Separation, CQS. It’s a pattern currently advocated by people like Udi Dahan, Greg Young, Mark Nijhof and Pål Fossmo (see below for links and resources).

Part of the background for CQRS is the CAP Theorem, put forward as a conjecture by Eric Brewer and later formally proven by Seth Gilbert and Nancy Lynch. It states that;

“You can have at most two of these properties for any shared-data system: Consistency, Availability, and tolerance of network Partitions.”

You can only get two out of three, which basically means that you have to choose between scalability and continuously consistent data. CQRS is an architectural approach that lets you scale out and deliver high availability, but is a bit more relaxed on the consistency. Meaning that Consistency has to step aside for Availability and Scalability.

Wouldn’t inconsistent data be a bad thing and something we would really strive to avoid? Yes, it would – if data were to be permanently inconsistent. But as long as the data eventually becomes consistent, this is no longer such a bad thing.

After all; how long are data in a multiuser application 100% consistent anyway? Think about it; As soon as the data has left the database – or whatever storage you might have – and is heading up to the user’s screen, someone else could have updated or even deleted the records. The data can be inconsistent even before they hit the screen!

Making a clear separation of commands (writes) from queries (reads) in an application gives you the ability to better scale out the parts that turn out to be bottlenecks. In most applications there are far more reads than writes, and so scaling out the read part will for most scenarios give a performance boost.

Now, calling it ‘eventual consistency’ might sound like it will take ‘forever’ before data is consistent, but just as you can scale the command and query parts of the system, you can also scale out the transport mechanism between them.

The transport is typically some kind of queue, for instance MSMQ-based, and so the time before data is consistent is tied to the speed of the transport. Throw in some more power on the queuing machinery, and you get more up-to-date data.
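The moving parts described above can be sketched in a few lines of JavaScript. This is a toy model under loud assumptions: the write store, the read model and the queue are plain in-memory structures (an array stands in for something like MSMQ), and all function and field names are made up for the example.

```javascript
// Sketch only: in-memory stand-ins for the write store, the read model and the queue.

const writeStore = new Map(); // command side: where writes land first
const readModel = new Map();  // query side: what the screens read from
const queue = [];             // transport between the two sides

// Command side: apply the write, then publish an event onto the queue.
function handleCommand(command) {
  writeStore.set(command.carId, command.newPrice);
  queue.push({ type: 'CarPriceChanged', carId: command.carId, newPrice: command.newPrice });
}

// Query side: reads never touch the write store.
function queryPrice(carId) {
  return readModel.get(carId);
}

// Transport: moves events from the queue into the read model.
// The faster this runs, the shorter the window of inconsistency.
function drainQueue() {
  let processed = 0;
  while (queue.length > 0) {
    const event = queue.shift();
    readModel.set(event.carId, event.newPrice);
    processed += 1;
  }
  return processed;
}

handleCommand({ type: 'ChangeCarPrice', carId: 'bmw-1', newPrice: 50000 });
const priceBeforeSync = queryPrice('bmw-1'); // undefined: the read side has not caught up yet
drainQueue();
const priceAfterSync = queryPrice('bmw-1');  // 50000: now eventually consistent
console.log(priceBeforeSync, priceAfterSync);
```

Between handleCommand and drainQueue the two sides disagree, and that gap is exactly the ‘eventual’ part of eventual consistency.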

Further reading

Udi Dahan’s “Clarified CQRS” is a good and thorough intro to CQRS.

A further introduction to CQRS and how it relates to DDD by Pål Fossmo is here; Command and Query Responsibility Segregation (CQRS).

Greg Young gives some clarifications on CQS vs CQRS in "Command Query Separation?".

For some more practical samples check out Mark Nijhof’s blog post "CQRS à la Greg Young", where he introduces his demo app on CQRS and Event Sourcing.

Jonathan Oliver has a run-through of CQRS vs Active Record vs Traditional Domain Model in "DDDD: Why I Love CQRS"

If you’re in the mood for some more background material on Brewer’s CAP Theorem, Julian Browne has an excellent article called "Brewer's CAP Theorem – The cool aid Amazon and Ebay have been drinking".

And just when you’re all pumped up and high on CQRS; Read the “CQRS: Crack for architecture addicts” by Gary Shutler. It might get you down on the ground again. I might not agree with him, but he makes some valid points.

And of course, for all DDD-related topics; The Yahoo Group for Domain Driven Design. Lots of good discussion there – including CQRS.

Tuesday, December 29, 2009

So you think you can Host?

Throughout my career I’ve often come across small-sized dev-shops that believe…

a) That being their own Application Service Provider (ASP) and hosting their product on their own servers is cheaper, easier and safer than letting a third-party handle it

b) That any dev with a bit of interest for servers and hardware is capable of filling the roles of a full-fledged developer and an IT Pro

Throughout my career I’ve never seen this work out particularly well.

The reason why it never works is seldom a lack of talent in the poor dev who ‘stood closest to the server when the last dev-slash-it-guy left’ (freely quoted from Richard Campbell of DotNetRocks). It’s just that being an IT Pro is just as much of a full-time job as being a professional programmer.

In this crazy world of new technologies, languages, frameworks, tools and methodologies that pop up every five minutes, there’s just NO WAY a poor soul can handle two full-time jobs like that and still be GOOD AT BOTH. Something’s just got to suffer.

Being a developer by heart – and by job description – it’s pretty obvious which one of those jobs will suffer. The problem is that you can probably live with this situation for a while before it really hits you. But be sure; it will hit you.

You can have 99,5% uptime for 3 years in a row. But when that server goes up in flames and the backup system won’t restore your last 3 months’ worth of data, you’ve ruined your uptime numbers for the next 3 decades.

Being an IT Pro means being pro-active. It’s a constant fight to stay ahead of any troubles, and to be prepared with fail-over in place when trouble hits you.

Being a dev-slash-it-guy means you’ll have neither the time nor the devotion to be pro-active. Instead you’re being post-active; you’re putting out small fires every now and then, but you’re seldom doing much to prevent them from starting.

If you’re a startup company with most customers on beta-programs and not many paying customers yet, that might be ok. But someday you’ll hopefully find yourself with a nice list of paying customers that depend on that nice little piece of software that you hacked together, er, wrote.

They might not expect your software to be flawless (even though they probably should), but they expect it to be there when they need it. They start demanding uptime guarantees and Service Level Agreements, SLAs (or at least they should demand guarantees and SLAs). And you better take steps to make sure that you can provide a level of expected professionalism when it comes to hosting your own services.

Do you think you can deliver that with an (at most) half-time IT pro? My best guess is ‘probably not’.

From my experience in the field, here are some questions you should start asking yourself if you find yourself at this stage;

(Now, here comes a full disclosure up front; I’m definitely no IT Pro myself – and I have no intentions what-so-ever to become one. This list might therefore not be 100%-water-and-bulletproof, but if you find some misjudgments or something you’d like to add to the list, please feel free to correct me or give suggestions in the comments below)

  • How many ports are open and how many services are running and available from the outside on your public server(s)? (The server(s) that host your software, that is.) Do you for instance allow remote desktop connections to your public server(s) to be able to troubleshoot them?
  • What happens if someone from the outside takes control over your public server? Do they get access to your local network and domain as well?
  • How many servers are actually accessible from the outside?
  • Do you have a working Virtual Private Network, VPN, that anyone in your business can use? And if so; Is it secure enough?
  • How many times in the last 6 months have you verified that you can actually restore all the data from your backup device? And how sure are you that you’re actually backing up everything you need? Or put it this way; if your office burns down today, will you have all the necessary data available to do business-as-usual tomorrow?
  • How often do you scan your network for suspicious activities? Are you sure you’re alone on your network?
  • Do you have a wireless network available in your office? If so; what minimum level of security does it demand? Do you have just a pre-shared key which then gives you full access to the domain, or do you have something that is actually secure enough to prevent teenage hackers from accessing your file servers?

I’m not saying that every 3-5 man shop must hire a full-time IT Pro to handle this. This is of course a question of cost. But just like you’re probably out-sourcing accounting to some professional book-keeper, you should also out-source other areas that are just as critical for your business.

If you’re a small- or medium-sized dev-shop, hosting is in my experience always handled better by professional ASPs. And the same goes for securing and managing your IT infrastructure.

Don’t get blinded by your luck so far; sooner or later your luck will run out. Then it will be neither cheaper, easier nor safer to handle hosting and infrastructure by yourself – and there’s nothing you can do about it.