Wednesday, June 12, 2013

.Net's Built-In JPEG Encoder: Convenient and Terrible

Those coding in .Net may not have discovered the System.Drawing namespace, which lets you load up an image in any popular web format (gif, jpg, png) without writing any extra code, manipulate it in as elaborate a way as you'd like, and save it back out to any of those web formats.

If you have discovered this you may also have noticed that the jpg codec is just terrible. For my task I was loading up a picture and drawing something geometric on it - the picture being the motivation to save it out as a jpeg. But for these examples I'll just use a blue background and a yellow box. I realize something that plain and geometric is terrible on jpeg (png would be preferred), but if it were a picture in the background png would be a poor choice, and the jpeg results from this bad codec would be no better.

This is a development blog so if you're wondering how to draw a yellow rectangle on a blue background, here's the code:

using System.Drawing;
using System.Drawing.Imaging;

...

var image = new Bitmap(400, 400);
var g = Graphics.FromImage(image);
g.FillRectangle(new SolidBrush(Color.FromArgb(0xa2, 0xbf, 0xdf)), 0, 0, 400, 400);
g.DrawRectangle(new Pen(Color.FromArgb(0xf2, 0x9a, 0x02), 19), 40, 40, 100, 100);

g.Flush();
var jpegCodec = ImageCodecInfo.GetImageEncoders().First(enc => enc.FormatID == ImageFormat.Jpeg.Guid);
var jpegParams = new EncoderParameters(1);
jpegParams.Param = new[] { new EncoderParameter(Encoder.Quality, 100L) };
image.Save(App.AppRoot + @"\test.jpg", jpegCodec, jpegParams);

App.AppRoot is a class I include in all my apps - how you decide where to write files is up to you, but you can determine the project root here.

Note: Blogger isn't a very good blogging platform, and amongst its flaws is the fact it reencodes images and adds an annoying white border to them, as you'll see in the images below. Each will also be linked to the actual output file, which I've uploaded separately outside of Blogger. You can also see a lossless PNG with all 6 here.

.Net 4.5 Windows, JPEG 60% quality

Terrible, so let's get rid of all the compression artifacts by turning quality up to 100%.

.Net 4.5 Windows, JPEG 100% quality

Still pretty bad, and really not acceptable for "100% quality" given that's the max you can possibly ask for. You can easily see the edges of the box are still blurry, and there are still noticeable artifacts inside the box itself. Were we using a photo background instead, artifacts might be less noticeable in the photo - but the photo would likely worsen the artifacts in the box itself. This would be fine if this were the 60% or 80% setting, but 100% should sacrifice file size to get you as close to the original image as possible.

But perhaps the problem is the JPEG format itself. Here's Photoshop:

Photoshop CS5 60%

Photoshop CS5 100%

So clearly the JPEG standard is not the issue - you can render a standard JPEG that's nearly identical to the original. It's also worth noting that Photoshop's 100% comes out at 6.3k, while .Net's comes out at 7.3k, despite the quality disparity.

But that's closed source and it's clear Adobe's invested heavily in their JPEG encoder. How about an open source JPEG encoder with a rag tag bunch of open source coders working on it?

Gimp 2.8 60%


Gimp 2.8 100%

.Net's default JPEG encoder is so bad that Gimp's 60% effort is about equivalent to the top quality level .Net can deliver. If Microsoft were to just use Gimp's codec as-is (available cross-platform including on Windows), that would be a major step up in quality.

I experimented with other encoder implementations; ArpanJpegEncoder is an OK reference project, but as quality goes it's barely better than the built-in .Net encoder. LibJpeg, one of the most popular encoders on the Linux side, has been partially converted to pure C# and is available for free at BitMiracle. Its output is substantially better, and closer to the Gimp output above. Unsurprising given Gimp appears to use the C++ version of LibJpeg. However, BitMiracle's high-level API forces using LowDetail Chroma Subsampling. I forked the code to add support for HighDetail at the top JpegImage level.

By far the best implementation I've been able to get working is unfortunately not C# at all - Magick.Net. You can install it via NuGet pretty easily - you want the AnyCPU q8 version if you feel unsure. And, it requires not just one but 2 special installs on any server you use it on, both from the Visual Studio SDKs - see the docs. Because it's just a wrapper around the famous ImageMagick C++ project you likely won't be making many improvements yourself, but it is able to deliver relatively high-quality, low-filesize images (Photoshop still beats it though, more for some images than others for some reason).

Thursday, June 6, 2013

Getting Started with Google Closure on Windows - 2013

Google Closure Library, Buried Away

Google has a library of highly tested, highly performant code called Closure that's been open source for several years now. I don't see a lot of examples of people using it, and I think that's in part because the documentation is pretty lacking, and it's all done in its own Google-internal sort of way that makes it pretty hard to get started with or blend with other builds. Here's a walkthrough for getting started with it on Windows.

In this example I'm going to get a build of the WYSIWYG editor compiled out, but it should be easy enough to follow the same steps for any major feature of the Closure library. I want to take that minified script I compile rarely (maybe just the once), pull it into a .Net project and use it with jQuery and other more common tools. This might sound like overkill - afterall a big foundational library like jQuery is going to overlap a lot of functionality with the sprawling Google Closure library. But one nicety of Google's Closure approach is that it does dead code removal - that is, if you properly tell it exactly what parts you intend to use, it eliminates everything else in the minification process leaving you just the narrow slice of the library you wanted, even if that's made up of bits across 100s of files.

"If you properly tell it" turns out to be a tall order though, as we'll see below. Let's start with getting a basic build.

Environment

You'll need Java installed (the basic JRE download on the homepage is fine - you aren't writing any Java yourself so the JDK isn't necessary, though it won't hurt), and Python 2.x - not Python 3.x or above. As of this writing, I cannot find anywhere in the documentation that specifies this, but Python 3.x will fail with obscure UTF8 Decode errors when you attempt to build. And yes, you do need Python to get a Closure build going. I'll also recommend that you put them someplace that ignores Windows/Java issues with spaces that's convenient to type - for example I put mine at C:\System\Java\jre\1.6 and C:\System\Python\27.

(Side note: One alternative you might find is to use the Google Page Speed tool which minifies scripts on a page for you. This alternative doesn't really do the same thing - it makes many individual minified files for you instead of one minified package.)

If you don't have Git installed, you'll need to install it - for example TortoiseGit.

Next you'll want to make a folder where you want to put this all together, and check out the closure lib to a subfolder - I called mine goog. The remote origin should point to:

https://code.google.com/p/closure-library/

So at this point you've got a workspace folder with just one subfolder, goog, that has the entire Closure library in it including some of the build tools.

Next you'll need a build of the compiler itself, which I happened to put in a subfolder named compiler.

You now have all the tools you need.

Tell It What You Want To Use

The next step is to tell the compiler what parts you want to use, so it knows what to minify and what to leave out as dead code. You'll probably also want to be able to test this works both minified and not, so we'll cover building a simple test page.

In your workspace folder, make 3 files: index.html, script.js, and externs.js. None of these will end up in your final build so don't worry about the stupid names.

In script.js, your goal is to declare the parts of the library you need. You may want to start by just messing around in index.html to test and see what methods you need to call to get your work done. Here's the source I ended up with for index.html and script.js, which I'll walk you through:

index.html:

<!DOCTYPE html>
<html>
<head>
<title>Basic Editor</title>
<style>
body {
font: 10pt Verdana;
}

#editor {
padding: 5px;
border: 1px solid #000;
}
</style>
</head>
<body>

<p>Edit below.</p>

<div id=editor>Some sample text here.</div>

<p>Edit above.</p>

<script src="goog/closure/goog/base.js"></script>
<script>
goog.require('goog.editor.SeamlessField');
</script>
<script src="script.js"></script>
<!--<script src="editor.min.js"></script>-->
<script>
(new Editor('editor')).makeEditable();
</script>
</body>
</html>

script.js:

goog.provide('Editor');

goog.require('goog.editor.SeamlessField');

window['Editor'] = goog.editor.SeamlessField;
goog.editor.SeamlessField.prototype['makeEditable'] = goog.editor.SeamlessField.prototype.makeEditable;

So let's walk through what's going on. In the HTML file, I start by including base.js - you need to include this to get all the goog.whatever calls to work. I then open a script tag and call goog.require on the one class I intend to use on this page. That call implicitly calls a lot of document.write() calls for all the dependencies that script has, and finally a document.write() call for the SeamlessField file itself. All these calls to document.write() are why the calls to goog.require() must go in their own script tag, with calls depending on those scripts places in a separate, following script tag. Don't worry about all this inefficient lazy loading - it won't be part of your final build.

I then include script.js, and finally, I run some test code that mimics in a very basic way how I intend to use the minified results. Make sure to call all the Closure methods you'll be using on your actual site to verify they export correctly.

So now script.js, the file that's technically going to be minified by our build. In this example it's a pretty spartan file that's just telling the compiler what to do, but you can write actual code here as well and it will work fine (and be minified).

The call to goog.provide() is a requirement of the compiler - you have to provide at least one class. In this case we're renaming goog.editor.SeamlessField to Editor to get rid of the namespace for easier minification - if you don't you'll have to jump through further hoops to export the entire namespace, which I don't recommend doing.

The call to goog.require() you may notice is redundant - it's called both on the page and in script.js. This is harmless when debugging - the second call is smart enough to just return after taking no action. In the minified script this is necessary for the compiler to know what dependency tree to search during minification.

If you had multiple things to provide or require, you would just call them again - always provide calls first, then require calls.

The following 2 lines are called exports in Closure terminology. You're basically abusing the fact that window['Editor'] is for all intents and purposes the same thing as window.Editor, or simply, Editor, in Javascript - window['Editor'] minifies out as simply Editor and guarantees that function by that exact name is available to external code.

Likewise the second line is abusing the fact that goog.editor.SeamlessField.prototype['makeEditable'] is the same thing as goog.editor.SeamlessField.prototype.makeEditable, and causes the compiler to guarantee that the method won't be renamed so you can reliably call it from outside code (or, it may be renamed then exported by assigning it to a method of the same name at the end, if that's more efficient).

Note that you can't use this shorter declaration to export a method:

Editor.prototype['makeEditable'] = Editor.prototype.makeEditable;

Nor this:

window['Editor'].prototype['makeEditable'] = window['Editor'].prototype.makeEditable;

The first will fail to minify correctly (the method is lost), and the second will minify inefficiently - it will work but use up more bytes than necessary.

Make It Work With JQuery

Before moving on to compiling, there's one last detail to sort out - the compiler is going to use whatever single-character names JS allows to rename everything in the local and global namespace. One of those is $ sign, which if you intend to use this with jQuery is a problem - since you very likely already have a lot of code that assumes $ refers to jQuery. You can prevent the minifier from using symbols like this via externs.

externs.js:

function $(selector, context) {};

Simple as that - you're not actually building jQuery here, just telling Closure there's an outside function with that name, so exclude it from the eligible list that global variables can be renamed to. It technically doesn't even need to be this much - you could even just do this:

$ = {};

But there are scenarios where you actually call these externs from your code, at which point you may want the compiler to help you verify you're calling it correctly - thus the function and arguments.

Compiling

The compile command is not going to be short, so I recommend you actually work with it in a .bat file to keep your life simple. That way as you rework it you have a lot less to retype each time.

build.bat:

python goog\closure\bin\build\closurebuilder.py -c compiler\compiler.jar --root=.\ --namespace="Editor" -o compiled -f "--externs=externs.js" -f "--compilation_level=ADVANCED_OPTIMIZATIONS" --output_file=editor.min.js

So here it is all coming together. This calls the closurebuilder script via Python, which searches all the --root arguments you specify for dependencies. Only the js files in these folders are considered (HTML and CSS files are fine for testing but are ignored so far as the compilation is concerned - this isn't full-page minification). The namespace argument tells the script which of these files provides something you actually want to keep - the code in all other files is considered eligible for dead code removal. The -f arguments are flags you want to pass directly to the compiler - the most important being ADVANCED_OPTIMIZATIONS to ensure you get the dead code removal benefit. This Python script builds the dependencies tree then calls the compiler in Java with the (lengthy) arguments necessary to get exactly what you wanted.

It may help to know that if you Shift + right-click a folder, "Open command window here" is one of the options - you can then run build.bat easily from that window.

Here's a link to the minified version I ended up with.

Testing

To verify the minified result is what you want, just comment out all the scripts in the HTML page and uncomment the minified one:

<!--<script src="goog/closure/goog/base.js"></script>
<script>
goog.require('goog.editor.SeamlessField');
</script>
<script src="script.js"></script>-->
<script src="editor.min.js"></script>
<script>
(new Editor('editor')).makeEditable();
</script>
</body>

Run it and there you go, a minified slice of Google's Closure Library you can take and use in other projects with other libraries and even other minifiers.

My minified results were just 11k, as opposed to the 17.4mb the entire library adds up to unminified.