Friday, April 4, 2014

Why jaggies around rasterized lines?

Jaggies

After a PDF document is rasterized, why do we see jaggies around lines?



The answer starts with a bit of theory on coordinates. In a PDF document, coordinates denote positions on a plane using an X and a Y axis. These positions are floating-point numbers, so they can point to any location: not only at the grid lines but also anywhere in between.

However, at the moment that a PDF document is printed or viewed on a screen, it has to be rasterized to what we call the 'device space'. This device space is a bitmap that consists of discrete pixels, each with its own coordinate. So in device space the X and Y coordinates are integers.

So how exactly are the floating point coordinates mapped to the pixels then? 

If we forget about transformations for a moment, the definition is as follows: "Pixel boundaries always fall on integer coordinates in device space. A pixel is a square region identified by the location of its corner with minimum horizontal and vertical coordinates. The region is half-open, meaning that it includes its lower but not its upper boundaries." In other words, it looks like this:

What happens if we draw a black vertical line, 1 pixel wide, positioned exactly at the PDF coordinates (2, 1) to (2, 6)?
Then you get a line with a width of 2 pixels in which all pixels are gray instead of black, because each pixel is only 50% covered by the line. However, if you draw the line exactly halfway between two integer coordinates, you get what you might have expected in the first place:
Note that you usually don't want to do this kind of exact positioning, because the PDF coordinates will most likely be scaled by one or more transformations. When you print a PDF document, it will be scaled to A4, letter, or some other format. On screen, the result depends on the 'dots per inch' and the size of the window in which it is displayed. So these exact coordinates will most likely be changed into something entirely different somewhere in the process of viewing. For this reason, just don't rely on halfway coordinates.
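
The 50% coverage described above can be checked with a little arithmetic. The following is a minimal sketch, not code from our rasterizer: the helper `PixelCoverage.OfColumn` is a hypothetical name, and it only looks at the horizontal extent of a vertical line.

```csharp
using System;

// Hypothetical helper for illustration; not part of any PDF library.
public static class PixelCoverage
{
    // Fraction (0..1) of the pixel column [pixel, pixel + 1) that is covered
    // by a vertical line of the given width centered on centerX.
    public static double OfColumn(int pixel, double centerX, double width)
    {
        double left = centerX - width / 2.0;
        double right = centerX + width / 2.0;
        double overlap = Math.Min(right, pixel + 1) - Math.Max(left, pixel);
        return Math.Max(0.0, overlap);
    }
}
```

For a 1 pixel wide line centered on x = 2, `OfColumn(1, 2.0, 1.0)` and `OfColumn(2, 2.0, 1.0)` both yield 0.5, which gives two gray columns. Centered on x = 2.5, pixel column 2 is covered completely and its neighbours not at all.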

The jaggies 

The jaggies become visible if the PDF is rendered to a black-and-white bitmap, e.g. for a printed page. Gray values cannot be used then, so dithering has to be used instead, like this:

Why do this dithering instead of moving the line half a pixel?

The question is: is this dithering really what we want, or can we do something better here? It is clear that a decision has to be made about what is most important:
  • Either the rasterizer tries to approach the exact position of the lines by means of dithering,
  • or it rounds the position of the line to the nearest pixel. 
Neither of these gives perfect results: dithering looks strange if you zoom in on the details, and rounding results in lines with variations in location and thickness (as a 2.5 pixel wide line may be rounded to either 2 or 3 pixels). So there is no way to render the following line at a thickness of 2.5 pixels without dithering:
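
To see why rounding makes the thickness vary, here is a small sketch of one possible snapping rule (an assumption for illustration, not necessarily what any particular rasterizer does): both edges of the line are rounded to the nearest pixel boundary. `LineSnapping.SnappedWidth` is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration; one possible snapping rule among many.
public static class LineSnapping
{
    // Rounds both edges of a vertical line to the nearest pixel boundary
    // (halves round up) and returns the resulting width in whole pixels.
    public static int SnappedWidth(double centerX, double width)
    {
        double left = Math.Floor(centerX - width / 2.0 + 0.5);
        double right = Math.Floor(centerX + width / 2.0 + 0.5);
        return (int)(right - left);
    }
}
```

With this rule a 2.5 pixel wide line centered on x = 3.0 becomes 2 pixels wide, while the same line centered on x = 3.25 becomes 3 pixels wide. Two identical lines a quarter of a pixel apart end up with visibly different thicknesses.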

The answer:

The reason that we have chosen these jaggies (that is, dithering) in our implementation is that we believe it results in better readability, because:
  • we think that unintended variations in location and thickness are more noticeable than jaggies.
  • the dithering is supposed to be performed at such resolution that the individual pixels are indistinguishable, so you would not see it in normal situations. 

Therefore the answer is: we do dithering as it gives the appearance of sub-pixel precision at the cost of strange pixels, which you will only see if you zoom in too much.

Tuesday, March 11, 2014

Transformations change the world, or everything in it.

 

In most – if not all – graphical systems, it is possible to apply some transformation on graphical elements in order to render them in a certain way. These transformations can often be combined in a particular order to yield a new transformation.



We often notice, however, that programmers find it hard to combine these transformations correctly. Graphical transformations are not commutative, so the order of transformations matters. In addition, the correct order depends on the way that a graphical system deals with them. A particular order cannot simply be carried over from one graphical system to another, and this is not always explained very well. This typically leads to a lot of trial and error before the correct combination of transformations is found.

Below, we will have a look at the graphical transformation systems in PDF, WPF and PDFKit.NET.

Graphical Transformations in PDF

In PDF, as well as in PostScript, it is possible to transform graphical objects via matrix transformations. Here, however, one does not really transform the objects, but the coordinate system in which they get drawn.

This may seem weird at first sight. Why transform the coordinate system, and not the objects themselves? The main reason is that this allows one to draw a group of objects using the same transformation, without having to specify the transformation for each and every object. Within the group, all objects can be drawn relative to each other as if they are drawn in an untransformed world. And by applying a coordinate system transformation upfront, e.g. a scaling transformation, the entire scene gets scaled.

The order of the transformations is important. The PDF reference manual shows the difference as follows:
image
In general, transformations in PDF should be done in the following order:
  1. Translate
  2. Rotate
  3. Scale or Skew
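
The effect of this order can be modelled in a few lines. The sketch below assumes the behaviour of the PDF `cm` operator as described in the PDF Reference: each new matrix is multiplied in front of the current transformation matrix (CTM), and points are row vectors multiplied by the CTM. It uses `System.Numerics` as a stand-in; `PdfCtm` is a hypothetical name and not a PDFKit.NET API.

```csharp
using System.Numerics;

// Illustrative model of the PDF 'cm' operator; not a PDFKit.NET API.
public static class PdfCtm
{
    // Concatenates the given matrices the way successive 'cm' operators do:
    // each new matrix is multiplied in front of the current CTM.
    public static Matrix3x2 Concatenate(params Matrix3x2[] cmOperands)
    {
        Matrix3x2 ctm = Matrix3x2.Identity;
        foreach (Matrix3x2 m in cmOperands)
        {
            ctm = m * ctm;
        }
        return ctm;
    }

    // Maps a point from user space to device space (row vector times CTM).
    public static Vector2 Map(Vector2 point, Matrix3x2 ctm)
    {
        return Vector2.Transform(point, ctm);
    }
}
```

If we first translate by (50, 75) and then scale by 3 in the x direction, the point (10, 10) ends up at (80, 85): it is scaled within the already translated coordinate system. In the opposite order the translation itself gets scaled as well, and the same point lands at (180, 85).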

Transformation order in WPF

WPF also has transformations. MSDN, however, is not very clear about combined transformations. It suggests that WPF also transforms the coordinate system, and it contains some samples, but there is no discussion of what exactly happens. See for example:

http://msdn.microsoft.com/en-us/library/ms750596(v=vs.110).aspx
http://msdn.microsoft.com/en-us/library/ms750975(v=vs.110).aspx

What we see however, is that XAML does not transform the coordinate system. Take for example the following XAML code.
<Window x:Class="WpfApplication1.MainWindow"
      xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
      xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525">
    <Canvas Width="200" Height="200">
        <Rectangle Canvas.Left="0" Canvas.Top="0" Width="200" Height="200" Stroke="Black" Opacity="1.0"/>
        <Rectangle Canvas.Left="0" Canvas.Top="0" Width="50" Height="50" Stroke="RoyalBlue" Opacity="1.0">
          <Rectangle.RenderTransform>
             <TransformGroup>
               <TranslateTransform X="50" Y="75"/>
               <ScaleTransform ScaleX="3"/>
             </TransformGroup>   
          </Rectangle.RenderTransform>
        </Rectangle>
    </Canvas>
</Window>


This has the following effect. Note that the black rectangle has a size of 200 x 200 and that the blue rectangle has been translated much further to the right than 50 units. It has been translated by 150 units to be exact.

image

If however, we reverse the transformations, we get what we expect:

<TransformGroup>
       <ScaleTransform ScaleX="3"/>
       <TranslateTransform X="50" Y="75"/>
</TransformGroup>   
image
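
The two results above can be reproduced numerically. The sketch below is a stand-in for a TransformGroup built on `System.Numerics`, so that it runs without WPF; `TransformGroupModel` is a hypothetical name, and the model only assumes that the child transforms are applied to the object one after another in list order.

```csharp
using System.Numerics;

// Stand-in for a WPF TransformGroup, built on System.Numerics so it runs
// without WPF; the class and method names are hypothetical.
public static class TransformGroupModel
{
    // Applies the transforms to the point one after another, in list order,
    // always measured against the same fixed coordinate system.
    public static Vector2 Apply(Vector2 point, params Matrix3x2[] transforms)
    {
        foreach (Matrix3x2 t in transforms)
        {
            point = Vector2.Transform(point, t);
        }
        return point;
    }
}
```

Applying the translation (50, 75) first and the scale (3, 1) second moves the top left corner of the blue rectangle from (0, 0) to (150, 75), which is the unexpected shift of 150 units. With the scale first and the translation second, the corner ends up at (50, 75).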

What we see here is that WPF applies both transformations in sequence to the object, without changing the coordinate system in which the object resides. In fact, if we want to rotate the object in its current location and without distortion, we need to do this after scaling and before translating, which is exactly the opposite of the correct order in PDF.

<TransformGroup>
       <ScaleTransform ScaleX="3"/>
       <RotateTransform Angle="-30"/>
       <TranslateTransform X="50" Y="75"/>
</TransformGroup>   
image

If we rotate before scaling, the object gets distorted (skewed) because scaling then takes place on the rotated object along the untransformed, horizontal x-axis:

 image

Whereas, if we rotate at the end, after translating, the object gets placed at a different location, because the center of the rotation is still the original origin of the coordinate system: the top left of the black rectangle.

image

This means that the information in MSDN about these transformations is actually incorrect.

There is however a reason to do things differently in WPF.

In PDF we do not really have graphical objects with Transform properties. We only have graphical operations that operate in a certain environment, of which the transformed coordinate system is just one of the aspects. The main reason to do this is compactness. By having an environment with all sorts of graphical context, the operations themselves can be kept simple and short.

In WPF however, we have an object-oriented system where we want all objects to be as independent as possible, and where – consequently – each object has its own transformation, next to properties like color and transparency, instead of these being part of some environment. And in that case it makes sense to not have transformations change the coordinate system but just the object itself.

This also has the advantage that transformations can be combined in a more natural order, and programmers can reason within a fixed coordinate system.

Shape Transformations in PDFKit.NET

In PDFKit.NET one can draw with shapes, and just as in WPF, each shape is a complete graphical element that has its own transformation property. This means that the correct order of doing transformations in PDFKit.NET is similar to WPF:
  1. Scale or Skew
  2. Rotate
  3. Translate
Below we have shown the effect of applying these transformations in that particular order:

Pen pen = new Pen(RgbColor.Black, 1);
FreeHandShape charAsShape = new FreeHandShape { Pen = pen, Brush = null };
charAsShape.Paths.AddRange(Font.TimesRoman.CreatePaths('n', 100));
charAsShape.Transform = new TransformCollection
{  
   new ScaleTransform(3,1),
   new RotateTransform(-30),
   new TranslateTransform(50, 75)
};

Original:
image

Step 1: scaling by a factor 3 in the x direction:
image

Step 2: Rotation by 30 degrees counterclockwise:
image

Step 3: Translation by 50 units in the x direction and 75 units in the y direction:
image

Wednesday, February 19, 2014

Splitting Hairlines

You have probably all seen PDF documents that have very fat graphics when viewed at a low zoom level:

image

And, when zooming in, the fat graphics disappear, and slim down to something more sensible.

image

The effect that you see here is that of a “hairline”. Hairlines can be surprisingly fat at low resolutions but thin at high resolutions. The PDF Reference manual says the following about them:

“A line width of 0 shall denote the thinnest line that can be rendered at device resolution: 1 device pixel wide. However, some devices cannot reproduce 1-pixel lines, and on high-resolution devices, they are nearly invisible. Since the results of rendering such zero-width lines are device-dependent, they should not be used.”

The reality is however that since the PDF specification allows this, documents with these hairlines do exist, and this leads to the effects seen above.

Luckily, this effect is not really that problematic in practice, because the rendering results are pretty much the same for all viewers.
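
The fat-then-thin behaviour follows directly from the rule quoted above. The sketch below is a simplified model of it, assuming a uniform zoom factor expressed in device pixels per user-space unit; `Hairline` and its methods are hypothetical names.

```csharp
using System;

// Simplified model of hairline rendering; names are hypothetical.
public static class Hairline
{
    // Stroke width in device pixels for a line width given in user-space units.
    // A width of 0 is a hairline: always one device pixel, whatever the zoom.
    public static double DeviceWidth(double lineWidth, double pixelsPerUnit)
    {
        return lineWidth == 0.0 ? 1.0 : lineWidth * pixelsPerUnit;
    }

    // The same stroke expressed back in user-space units, i.e. how thick the
    // line appears relative to the page content at this zoom level.
    public static double ApparentWidth(double lineWidth, double pixelsPerUnit)
    {
        return DeviceWidth(lineWidth, pixelsPerUnit) / pixelsPerUnit;
    }
}
```

At a low zoom of 0.25 pixels per unit a hairline looks as thick as a 4 unit wide line, whereas at 8 pixels per unit it looks like a line of only 0.125 units.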

Empty Rectangles

There is another related effect, which is more subtle, and a bit trickier because the PDF reference does not precisely say what to do in this case: rectangles with a zero width or height that are “filled” with a color.

At first sight, one would assume that these should not be painted at all. It seems logical. GDI in fact does not render anything for such zero-width fills. Adobe Reader however shows these rectangles as hairlines. The closest hint we get from the PDF Reference manual is the following information about the fill operator “f”:

“If a subpath is degenerate (consists of a single-point closed path or of two or more points at the same coordinates), f shall paint the single device pixel lying under that point; the result is device-dependent and not generally useful.”

Extending this reasoning to rectangles with a zero width, this seems to imply that indeed, filling these should result in a hairline. And in fact, we have encountered several PDF documents that confirm this.

PDFRasterizer.NET

But now take a look at the following graphics. Adobe Reader renders them as follows:

image

Our software (PDFRasterizer.NET) however produces this:

image

Is this wrong? Well apparently. But what is happening here? Closer inspection reveals that this area is covered with zero-width rectangles that are all filled with black. The image below has been taken from the same area, but rendered at a higher resolution by PDFRasterizer.NET.

image

To make things more complex, this area is also covered with very narrow (non empty!) rectangles filled with white. The black and white rectangles are painted alternately.

Logically, the zero-width black rectangles should not be visible, because the white rectangles are wider and painted over them. But officially, as these zero-width black rectangles must be painted as hairlines, they will be rendered “thicker” than the white rectangles at low resolutions, and thus become visible. That is exactly what you see in PDFRasterizer.NET.

Note that strictly, one can never render anything thinner than a pixel. What happens normally, is that elements smaller than a pixel get interpolated with their background pixel with a factor that reflects how much they cover that background pixel. At low resolutions however, the black hairlines cause the background pixels to become entirely black, while the thin white rectangles only cover that background for a small fraction, leading to a largely dark-gray appearance.
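
The interpolation described above amounts to a weighted average. The following sketch uses gray values between 0 (black) and 1 (white); it illustrates the principle only and is not the actual blending code of GDI+ or PDFRasterizer.NET. `CoverageBlend.Over` is a hypothetical name.

```csharp
// Illustration of coverage-based blending; the name is hypothetical.
public static class CoverageBlend
{
    // Blends a paint value over a background value (0 = black, 1 = white),
    // weighted by the fraction of the pixel that the painted shape covers.
    public static double Over(double background, double paint, double coverage)
    {
        return background * (1.0 - coverage) + paint * coverage;
    }
}
```

On a white pixel, a black hairline counts as full coverage and turns the pixel black: `Over(1.0, 0.0, 1.0)` is 0. A white rectangle that covers only a quarter of that pixel then brings it back to just 0.25, a dark gray. If the zero-width rectangle were given zero coverage instead, the pixel would simply stay white.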

The question for us here is not so much why our software renders these lines at low resolutions, but rather why Adobe Reader does not. The PDF Reference manual does not give any clue. It could be that Adobe does not take into account hairlines when interpolating new pixels with the existing background. This in itself would make sense, as these hairlines do not really constitute any real area. Unfortunately, if that is the case here, we will probably not be able to solve this in GDI+, as it does not offer us that kind of control.

The good news is however, that these graphical constructs are very exceptional, and there is a workaround by rendering at higher resolutions.

And essentially, this kind of problem is exactly why Adobe indicates that hairlines should not be used.

Tuesday, February 11, 2014

Securing PDF Documents with Passwords

PDF files can be secured with a user password and an owner password. These passwords are closely related, as they both control the level of access that you have to a document. But there are some differences, and we sometimes get questions about these, in particular about what they mean for encryption.

In principle, the use of these passwords is rather straightforward:
  • The user password is needed to access a document at all. If a user password has been specified, you will not be able to open the document without providing this password. So typically, you will set such a password if you intend a document to be read by one specific person, or group, that knows the password.
  • The owner password controls the document restrictions. You will find these restrictions if you open the Document Properties of a document in Adobe Reader (ctrl-D) and select the Security tab. Amongst others it is possible to restrict printing of the document, or copying of content. Adobe Acrobat will not allow you to change the document restrictions without providing the owner password. The owner password however, does not restrict opening or viewing of a document.
image
In real life, we hardly ever encounter documents with a user password. PDF is mostly used for distributing information, and the least that you normally want people to be able to do is to open a document and view it. So if you encounter a secured document, chances are that it has only an owner password.

Note that if a document has a user password, you will not be able to ignore this, because all readers will prompt you for it upon opening:

image

Encryption

As soon as one sets one of these passwords, the PDF document will be encrypted. This sometimes gives rise to confusion, because there is software that is able to remove the document restrictions (controlled by the owner password) without providing this password. PDFKit.NET is able to do this, as are some other software packages.

Adobe Acrobat, on the other hand, does not allow you to do this. It will always require you to provide the owner password in order to change the permissions.
There are 2 aspects that need to be considered here.
  • Technically, there is an important difference between the role of the user password and the role of the owner password during encryption. In essence, only the user password plays a role in computing the encryption key. Without the correct user password the PDF file cannot be decrypted, and thus it cannot be viewed at all. If only an owner password is provided, however, the file will still get encrypted, but in that case the encryption key is computed from a number of other properties of the file itself, so in essence the encryption key is no secret then. This means that if no user password has been specified, anyone can decrypt the file, which is effectively as good as no encryption at all.
  • The owner password is meant as a lock on the document restrictions, but as it plays no role in encryption this is merely a convention. This comes as no surprise: if the owner password had played a role in encryption, it would have become impossible to open the document without providing it. So technically, there really is no way to secure the document restrictions better than by convention.
Users that set document restrictions need to be aware that technically nothing prevents these from being removed without the owner password, although many software applications will restrict this by convention.
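
To make the first point concrete, here is a sketch of the key computation for the oldest variant of the standard security handler (revision 2, 40-bit RC4), as we understand it from the PDF Reference. Later revisions add further steps and use longer keys, so treat this as an illustration rather than a complete implementation. The class and parameter names are hypothetical, and passwords are assumed to be plain ASCII.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Sketch of the revision 2 key derivation; names are hypothetical.
public static class PdfKeySketch
{
    // Fixed 32-byte padding string defined by the PDF Reference.
    private static readonly byte[] Padding =
    {
        0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
        0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
        0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
        0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A
    };

    // ownerEntry (/O), permissions (/P) and fileId (first /ID string) are all
    // stored unencrypted in the file itself.
    public static byte[] ComputeKey(string userPassword, byte[] ownerEntry,
                                    int permissions, byte[] fileId)
    {
        // Truncate or pad the user password to exactly 32 bytes.
        byte[] password = Encoding.ASCII.GetBytes(userPassword);
        byte[] padded = new byte[32];
        int used = Math.Min(password.Length, 32);
        Array.Copy(password, padded, used);
        Array.Copy(Padding, 0, padded, used, 32 - used);

        // The permission flags enter the hash as 4 bytes, low-order byte first.
        byte[] p = unchecked(new[]
        {
            (byte)permissions, (byte)(permissions >> 8),
            (byte)(permissions >> 16), (byte)(permissions >> 24)
        });

        byte[] input = padded.Concat(ownerEntry).Concat(p).Concat(fileId).ToArray();
        using (MD5 md5 = MD5.Create())
        {
            // Revision 2 uses the first 5 bytes of the hash as the key.
            return md5.ComputeHash(input).Take(5).ToArray();
        }
    }
}
```

With an empty user password, every input to the hash is either a published constant or a value that can be read from the file, so anyone can recompute the key. The owner password does not appear as an input here at all.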

TallComponents software

One of the questions you may have now, is why our software does not play by these conventions? The answer is actually quite simple: we do not provide end-user applications, but merely tools for building them, and there is no way that we can effectively enforce these conventions.

Take PDFRasterizer.NET for example. It provides methods for drawing PDF pages to a System.Drawing.Graphics instance. There is no way however that our software is able to control how the result will be used. It could be used for viewing, but also for printing. So if some printing restrictions are set in the document, it is up to the final application to deal with this. Our software cannot control this.

For PDFKit.NET the situation is similar, although less apparent at first sight. PDFKit.NET allows you to remove the security settings of a document without providing the owner password. We could have made an API that only allows this when a correct owner password gets passed. This, however, would only seemingly have solved the issue: even with the security settings intact, it would still have been possible to create an unsecured copy of the document using all sorts of other API calls in PDFKit.NET, calls that can also be used to do things that are allowed for a secured document.

This means that dealing with these document restrictions is not something that our software components can, or should enforce. It is up to the application developer to make sure that these conventions are met. And it is up to the creators of these PDF documents to be aware that the document restrictions cannot be enforced technically.

Tuesday, February 4, 2014

PDF files do not have layers like an onion.

PDF documents have the ability to place particular graphics in layers. When they do, it becomes possible to select certain layers for viewing in a viewer. This feature is often very useful for complex drawings so that users can restrict the view to the parts that they are interested in. The image below shows a typical example.

image

Layers in PDF

So how do layers work in PDF?

Despite their name, layers are not actually layers. Adobe calls them that way, but internally layers are optional content. The contents of a single layer are not stored in one collection at a particular z-order in the file. Instead, the content of a single layer may be located at various places in the file. What is shown on top is what happens to be drawn last. Temporal order defines the z-order. This is the case for all graphics in a PDF file, and layers are no exception. This means that one part of a layer may be drawn on top, while another part of the same layer is drawn at the back. The following illustration shows this effect.


There are two structures in the PDF format related to layers:
  • One defines a name for each layer.
  • One associates particular graphics with a layer using the name above.
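
The consequence of this structure can be shown with a toy model. The types below are hypothetical; they correspond neither to actual PDF objects nor to the PDFKit.NET API. The model only captures the two facts stated above: drawing order defines the z-order, and a layer merely decides whether a drawing operation is skipped.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model of optional content; the types are hypothetical.
public sealed class DrawOp
{
    public DrawOp(string what, string layerName)
    {
        What = what;
        LayerName = layerName;
    }

    public string What { get; }
    public string LayerName { get; } // null means: not part of any layer
}

public static class OptionalContent
{
    // Returns what gets painted, back to front. The z-order is simply the
    // order of the operations; a layer only decides whether an operation
    // is skipped.
    public static List<string> Paint(IEnumerable<DrawOp> ops,
                                     ISet<string> visibleLayers)
    {
        return ops
            .Where(op => op.LayerName == null || visibleLayers.Contains(op.LayerName))
            .Select(op => op.What)
            .ToList();
    }
}
```

If a "Grid" layer contributes an operation before and another one after a shape from a "Graphics" layer, part of the grid ends up behind the shape and part of it in front. Hiding "Grid" removes both parts, but the remaining content keeps its place in the drawing order.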

PDFKit.NET

PDFKit.NET has support for PDF layers as well.  One of our most basic design principles is that one should not try to abstract away too much from the original format. Certain abstractions may look wise in the beginning, but at some point they will start to hinder you. And of course, one needs to balance this. There is no golden rule.

We have chosen to have the layer support in PDFKit.NET closely follow the way that layers are present in PDF as described above. There are two places where you will encounter them in PDFKit.NET.
  • TallComponents.PDF.Shapes.LayerShape: This is basically a collection that you can put shapes in. These shapes then become part of that layer.
  • TallComponents.PDF.Document.Layers: This is a collection of layers that are present in the document. In essence these are layer names.
The Layer collection in the document defines the names, whereas the LayerShape defines the association. This also means that there can be multiple LayerShapes that are associated with the same Layer.

The following code shows how to use PDFKit.NET to put graphics in a particular layer:

// create a Drawing and make it occupy all space within the margins of the page
ShapeCollection shapes = new ShapeCollection( 100, 100, 400, 400 );
page.Overlay.Add( shapes );

// Create layers; each LayerShape associates its content with a Layer
Layer gridLayer = new Layer("Grid");
LayerShape gridLayerShape = new LayerShape(gridLayer);
document.Layers.Add(gridLayer);

Layer graphicsLayer = new Layer("Graphics");
LayerShape graphicsLayerShape = new LayerShape(graphicsLayer);
document.Layers.Add(graphicsLayer);

shapes.Add(gridLayerShape);
shapes.Add(graphicsLayerShape);

// coordinate (0, 0) lies at the bottom left of the shapes.

// draw a grid of 25 x 25 points
Pen gridPen = new Pen( System.Drawing.Color.LightGray, 1 );

// horizontal lines with axis numbering; one loop per direction,
// so that each line is added only once
for( double y = 0; y < shapes.Height; y += 25 )
{
   gridLayerShape.Add(new LineShape(0, y, shapes.Width, y, gridPen));
   gridLayerShape.Add(new TextShape(0, y, y.ToString(), Font.Helvetica, 8));
}

// vertical lines with axis numbering
for( double x = 0; x < shapes.Width; x += 25 )
{
   gridLayerShape.Add(new LineShape(x, 0, x, shapes.Height, gridPen));
   gridLayerShape.Add(new TextShape(x, 0, x.ToString(), Font.Helvetica, 8));
}

// add a line
graphicsLayerShape.Add(new LineShape(25, 25, 75, 50));

// add an ellipse
graphicsLayerShape.Add(new EllipseShape(150, 50, 50, 25));

// add a rectangle
graphicsLayerShape.Add(new RectangleShape(250, 25, 75, 50, new Pen(System.Drawing.Color.Black), new SolidBrush(System.Drawing.Color.Yellow)));


Removing a Layer

A consequence of this design decision is that it is relatively difficult to remove layers. If you remove a Layer from Document.Layers, all you basically do is remove a layer name. The graphics that were associated with that name will then simply become disconnected from that layer and become part of the “ordinary” graphics of the document. And this means that you will no longer be able to make them invisible.

If you do want to remove a layer from a page, you will have to call Page.CreateShapes() for that page, and remove all the LayerShapes that are associated with that layer name.

So there you have it: PDF documents are not like Ogres.

Thursday, January 30, 2014

How to display PDF in a WPF app and stay responsive

Rendering a PDF page may take a long time (>100 ms). To keep the UI responsive, rendering should not be performed on the UI thread.

BackgroundWorker

At first glance WPF seems to have a nice solution for this: The BackgroundWorker is the recommended way to run time-consuming tasks on a separate, dedicated thread, leaving the UI responsive. A background worker is created as follows:
BackgroundWorker worker = new BackgroundWorker();
worker.DoWork += new DoWorkEventHandler(doWork);
worker.RunWorkerAsync();
Event handler doWork would do the actual rendering to WPF graphics. It turns out that it is not possible to create even a simple rectangle from the worker thread, let alone all possible elements of a PDF page. If we use the following doWork event handler:
void doWork(object sender, DoWorkEventArgs e)
{
  Rectangle rectangle = new Rectangle();
}
an InvalidOperationException is thrown: "The calling thread must be STA, because many UI components require this." It is not possible to change the background worker thread to STA: it uses a thread pool thread, and these are always MTA and cannot be changed.

STA Thread

So let's create our own background thread, set its apartment state to STA, and render the PDF pages to WPF graphical elements from there.
Thread thread = new Thread(new ThreadStart(doWork));
thread.SetApartmentState(ApartmentState.STA);
thread.Start();

private void doWork()
{
  Rectangle rectangle = new Rectangle();
}
This code runs nicely. But now we need to display our rectangle in the UI.

Dispatcher

We need to pass the result to the UI thread. This is what the dispatcher is designed to do. Each thread in a WPF application has a dispatcher which queues a piece of code along with some data to the user interface thread. The user interface thread will schedule and run the code, which uses the data to update the screen.
private void doWork()
{
  Rectangle rectangle = new Rectangle();

  this.Dispatcher.Invoke( (Action) delegate ()
  {
    drawResult(rectangle);
  });
}

private void drawResult(Rectangle rectangle)
{
  canvas1.Children.Add(rectangle);
}
Now we get the exception "The calling thread cannot access this object because a different thread owns it". What you see here is a protection against a typical multi-threading problem: one thread can change data at the same time that another thread is reading it, which may give unpredictable results.
The usual locking using semaphores could be useful here but apparently this is not the WPF way of working. Instead a "freeze" must be performed to make objects unchangeable. The user interface thread will know then that a graphical element is not going to change anymore and can draw it safely on the screen. Unfortunately, not all graphical objects are freezable. E.g. a Brush is, but a Rectangle isn't.

Take an XPS or XAML detour

We could take a detour: the background thread creates visual elements and converts these to XPS, which the user interface thread can read. It would have been convenient to be able to write XPS to a memory stream, but this is not possible; Microsoft insists that XPS is written to a file first.
A bit less awkward is to use XAML instead of XPS, because XAML can be written to a memory stream. Here is the code:
public void StartTextAndRectangleDrawingThread()
{
  var t = new Thread
        (new ThreadStart(this.TextAndRectanglesCreatingThread))
        { 
           IsBackground = true 
        };
  t.SetApartmentState(ApartmentState.STA);
  t.Start();
}

private void TextAndRectanglesCreatingThread()
{
  var canvas = new Canvas { Width = 300, Height = 300 };
  var text = new TextBlock { Text = "Hello", FontSize = 32 };
  canvas.Children.Add(text);
  Canvas.SetLeft(text, 100);
  Canvas.SetTop(text, 100);
  var brush = new SolidColorBrush(Color.FromRgb(200, 20, 50));
  var rectangle = new System.Windows.Shapes.Rectangle 
        { 
          Width = 40, 
          Height = 50, 
          Fill = brush, 
        };
  canvas.Children.Add(rectangle);
  Canvas.SetLeft(rectangle, 50);
  Canvas.SetTop(rectangle, 50);

  var stream = new MemoryStream();
  System.Windows.Markup.XamlWriter.Save(canvas, stream);
  Application.Current.Dispatcher.BeginInvoke(
        DispatcherPriority.Normal,
        (Action<MemoryStream>)
                        this.TextAndRectangleDrawingThreadHasData, 
                        stream);
}

void TextAndRectangleDrawingThreadHasData(MemoryStream stream)
{
  stream.Seek(0, SeekOrigin.Begin);
  var textAndRectangle = 
       (Canvas)System.Windows.Markup.XamlReader.Load(stream);
  this.PdfLayer.Children.Add(textAndRectangle);
}
For code that draws some text and rectangles this works fine. However, converting a PDF first to WPF and then to XAML gives many problems, which may be solvable, but it ends up as a long and inefficient chain of conversions. Especially for PDF documents with large amounts of complex content, this is not the way to go.

Solution: Render to a bitmap in the background

The best solution that we found was to move as much as possible to the background thread, including the rendering to a bitmap for display on screen. We use RenderTargetBitmap for this purpose (which is "freezable").
The order of events is now:
  1. The user interface thread starts a background thread with a file name and the dimensions of the bitmap.
  2. This background thread opens a PDF document and:
    - parses its content,
    - converts the page to WPF (TALLcomponents -> ConvertToWpf),
    - creates a transformation matrix for scaling the PDF page into the bitmap,
    - renders the page into a bitmap (WPF -> RenderTargetBitmap -> Render),
    - freezes the bitmap.
  3. The background thread dispatches a piece of code and the frozen bitmap to the user interface thread.
  4. This code is scheduled in the user interface thread and it draws the bitmap on a canvas.
We made a simple sample app that renders a PDF document while the user interface handles drawing on an ink canvas. This works smoothly whatever the size and complexity of the PDF document.

The code for the background thread is:
// Create bitmap
private void OpenPdf()
{
  var openFileDialog = new Microsoft.Win32.OpenFileDialog
      {
        DefaultExt = ".pdf", 
        Filter = "PDF files (*.pdf)|*.pdf|All files (*.*)|*.*"
      };
  bool? fileOpenResult = openFileDialog.ShowDialog();
  if (fileOpenResult == true)
  {
     _threadThatRendersThePdf = new Thread(() => RenderPdfDocument(
                       pdfFileName: openFileDialog.FileName,
                       imageWidth: this.CanvasWidth,
                       imageHeight: this.CanvasHeight));
     _threadThatRendersThePdf.SetApartmentState(ApartmentState.STA);
     _threadThatRendersThePdf.Start();
  }
}

void RenderPdfDocument(string pdfFileName, 
                       double imageWidth, 
                       double imageHeight)
{
  const double WpfDpi = 96.00; // WPF measures in 96 DPI
  const double PdfDpi = 72.00; // PDF measures in 72 DPI
  using (var pdfStream = new FileStream(pdfFileName, 
                       FileMode.Open, 
                       FileAccess.Read))
  {
    var pdfDocument = new TallComponents.PDF.Rasterizer.Document(pdfStream);
    var pdfPage = pdfDocument.Pages[0];

    // Get the scale to let it fit into the image size,
    // WPF measures things in 96th of an inch, not in pixels
    // PDF width and height are measured in points
    double scaleX = (imageWidth / WpfDpi) / (pdfPage.Width / PdfDpi);
    double scaleY = (imageHeight / WpfDpi) / (pdfPage.Height / PdfDpi);
    double scale = Math.Min(scaleX, scaleY);

    // Resize the bitmap area so that it has the same width/height ratio 
    // as the PDF page. This possibly makes either the width or the height
    // of the bitmap smaller. 
    // The rendered PDF will fit exactly in the image this way.
    double ratioPdf = pdfPage.Height / pdfPage.Width;
    double ratioImage = imageHeight / imageWidth;
    double bitmapWidth = ratioPdf > ratioImage 
                       ? imageHeight / ratioPdf 
                       : imageWidth;
    double bitmapHeight = ratioPdf > ratioImage 
                       ? imageHeight 
                       : imageWidth * ratioPdf;

    // render the page to WPF into the resized image
    var renderSettings = new RenderSettings();
    var convertToWpfOptions = new ConvertToWpfOptions();
    var summary = new TallComponents.PDF.Rasterizer.Diagnostics.Summary();
    convertToWpfOptions.ConvertToImages = false;
    var wpfPage = pdfPage.ConvertToWpf(renderSettings, 
                       convertToWpfOptions, 
                       summary);

    // Make a bitmap renderer for the WPF page (but don't render it yet)
    // and wrap a container around the wpfPage so that we can scale it 
    // to the dimensions of the bitmap. Then let WPF render it and finally 
    // freeze it so it can be passed to the UI thread.
    var wpfPageAsBitmap = 
                   new System.Windows.Media.Imaging.RenderTargetBitmap(
                       pixelWidth: (int)bitmapWidth,
                       pixelHeight: (int)bitmapHeight,
                       dpiX: WpfDpi,
                       dpiY: WpfDpi,
                 pixelFormat: System.Windows.Media.PixelFormats.Default);
    var container = new System.Windows.Media.ContainerVisual
         { Transform = new ScaleTransform(scale, scale) };
    container.Children.Add(wpfPage);
    wpfPageAsBitmap.Render(container);
    wpfPageAsBitmap.Freeze();

    // Transport the frozen bitmap of the page to the UI thread 
    BitmapThreadDelegate delegateToDrawInUiThread = BitmapUIThreadTask;
    Application.Current.Dispatcher.BeginInvoke(
                       DispatcherPriority.Background, 
                       delegateToDrawInUiThread, 
                       wpfPageAsBitmap);
  }
}

public void BitmapUIThreadTask(
               System.Windows.Media.Imaging.RenderTargetBitmap bitmap)
{
  var image = new Image() { Source = bitmap, Stretch = Stretch.None };
  this.PdfLayer.Children.Add(image);
}

private delegate void BitmapThreadDelegate(
              System.Windows.Media.Imaging.RenderTargetBitmap bitmap);

private Thread _threadThatRendersThePdf;
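
To make the scaling easier to follow, here is a small standalone sketch of the same fitting idea on plain numbers. The class `PageFit` and its `Fit` method are hypothetical names made up for this illustration; they are not part of PDFRasterizer.NET or WPF. The sketch assumes the page size is given in points and the image size in WPF units, as in the code above.

```csharp
using System;

// Hypothetical helper for illustration only: it works on plain numbers
// and does not touch PDFRasterizer.NET or WPF.
public static class PageFit
{
    const double WpfDpi = 96.0; // WPF units per inch
    const double PdfDpi = 72.0; // PDF points per inch

    public static (double Scale, double BitmapWidth, double BitmapHeight) Fit(
        double pageWidthPt, double pageHeightPt,
        double imageWidth, double imageHeight)
    {
        // Compare both sizes in inches and take the smaller factor,
        // so that the whole page fits into the image.
        double scaleX = (imageWidth / WpfDpi) / (pageWidthPt / PdfDpi);
        double scaleY = (imageHeight / WpfDpi) / (pageHeightPt / PdfDpi);
        double scale = Math.Min(scaleX, scaleY);

        // Shrink one side of the bitmap so it gets the ratio of the page.
        double ratioPdf = pageHeightPt / pageWidthPt;
        double ratioImage = imageHeight / imageWidth;
        bool heightLimited = ratioPdf > ratioImage;
        double bitmapWidth = heightLimited ? imageHeight / ratioPdf : imageWidth;
        double bitmapHeight = heightLimited ? imageHeight : imageWidth * ratioPdf;

        return (scale, bitmapWidth, bitmapHeight);
    }
}
```

For an A4 page (595 x 842 points) and an image area of 300 x 350, `PageFit.Fit(595, 842, 300, 350)` gives a scale of about 0.312 and a bitmap of about 247 x 350: the page is relatively taller than the image area, so the height is the limiting side.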

The canvas that contains the PDF may be placed below an ink panel so that a user can draw on top of a PDF page:
<InkCanvas 
  Height="350" 
  HorizontalAlignment="Left" 
  Margin="10,10,0,0" 
  Name="ViewboxInkAndPDF" 
  Strokes="{Binding Path=InkStrokes}" 
  VerticalAlignment="Top" Width="300">
  <Canvas>
    <ContentPresenter Content="{Binding PdfLayer}" />
  </Canvas>
</InkCanvas>

Tuesday, January 21, 2014

.NET’s Platform Independence

One of the reasons for choosing .NET as a development platform has always been platform independence. In many ways, .NET indeed offers this. But there are some issues and the following article shows one of them.

A few days ago, one of our customers submitted an issue for PDFRasterizer.NET. A particular document rendered fine on Windows XP, but not on Windows 7. This is what we got on Windows 7:
[Image: rose1]
While on good old XP:
[Image: rose2]
Our software, however, has no notion whatsoever of these different systems. It just renders to whatever Graphics instance .NET offers us. So this had to be an issue in .NET itself, and one that only manifests itself on particular Windows platforms.

When one of our developers picked this up, I asked him to focus on the newer Windows systems. Windows XP is quickly becoming obsolete now that Microsoft is going to stop supporting it. If a fix was going to break rendering on Windows XP, then so be it.

It turned out that the problem was related to indexed images on 64-bit systems. If there are semi-transparent colors in the color table of an indexed image, these are sometimes rendered as an opaque color, often white or some other basic color. Interestingly, the problem only occurred if the alpha value of a color was between zero and 240. We tried to nail this down further, but we were unable to discover any particular logic in it.

Now, this looks fairly nasty, but the solution to this issue was simple. If we encounter an indexed image with these transparent colors, we simply convert it to a non-indexed variant. This is somewhat less efficient, because the entire image needs to be converted. Luckily, this situation does not happen all that often, so it is hardly noticeable in the grand scheme of things.
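
To illustrate what such a conversion amounts to, here is a minimal sketch on raw arrays. It is not the actual PDFRasterizer.NET code. The class and method names are hypothetical, the pixels are assumed to be 8-bit indices into a palette of 32-bit ARGB values, and "semi-transparent" is assumed to mean an alpha strictly between 0 and 255.

```csharp
using System;

// Hypothetical sketch, not the PDFRasterizer.NET implementation.
public static class IndexedImageSketch
{
    // True if any palette entry is neither fully transparent nor fully
    // opaque (assumption: alpha is stored in the top 8 bits).
    public static bool HasSemiTransparentEntries(uint[] paletteArgb)
    {
        foreach (uint argb in paletteArgb)
        {
            uint alpha = argb >> 24;
            if (alpha > 0 && alpha < 255) return true;
        }
        return false;
    }

    // Replaces every palette index by the ARGB color it refers to, which
    // yields a non-indexed image with one 32-bit value per pixel.
    public static uint[] ExpandToArgb(byte[] indices, uint[] paletteArgb)
    {
        var pixels = new uint[indices.Length];
        for (int i = 0; i < indices.Length; i++)
        {
            pixels[i] = paletteArgb[indices[i]];
        }
        return pixels;
    }
}
```

A caller would first check the palette with `HasSemiTransparentEntries` and only expand the image when that returns true, so that ordinary indexed images keep their compact form.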

And for what it is worth, it also works on XP.