Tuesday, June 3, 2014

Printing PDF documents in WPF with a preview

Printing PDF documents in WPF applications

It is easy to print PDF documents in an application using the standard 'PrintDialog' in .NET. How to do so is described in this knowledge base article. However, the dialog being used is the standard one and is rather limited in its functionality. It does not give you a preview of how the document will look when it is printed, and some common print options are missing.
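For reference, driving the standard dialog can be as small as the sketch below. This is a minimal illustration, not the knowledge base article's actual code; `RenderFirstPage` is a placeholder for whatever rendering call your PDF library provides.

```csharp
using System.Windows.Controls;
using System.Windows.Media;

// Minimal WPF printing through the standard PrintDialog.
// RenderFirstPage(dc) is a placeholder, not a real API: it stands in
// for the PDF library call that draws a page into the DrawingContext.
void PrintWithStandardDialog()
{
    var dialog = new PrintDialog();
    if (dialog.ShowDialog() != true)
        return;

    var visual = new DrawingVisual();
    using (DrawingContext dc = visual.RenderOpen())
    {
        RenderFirstPage(dc); // placeholder for the actual rendering call
    }
    dialog.PrintVisual(visual, "PDF document");
}
```

Note what is missing here: no preview, no scaling, no paper or duplex options. That is exactly what the custom dialog described below adds.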

Printing with a preview

What you may want instead is a preview of how the document will be printed on paper that is larger than the PDF document, like:


Or you may want to see what part will be printed on smaller paper, like:

Or you may want to see what the output will look like when the document is rotated to landscape, vertically centered, and automatically scaled to the printable area of the paper:


Extra print options

The preview is also useful when you want to print only one or a few pages of the document, like this:
In addition, extra print options such as duplex printing, as shown above, should always be available to reduce paper usage.

The WPF code sample

The code sample 'ScaledPrintingWPF' in PDFRasterizer.NET shows an implementation of the features above. It looks like this:




It is implemented using the following MVVM architecture:

The XAML view
A custom print dialog as shown above is implemented in XAML and has a menu to open a PDF file. It uses bindings for all printer settings and for the preview box. It also has the usual code-behind part, which contains only a small amount of code.

The view controller
The bindings connect the dialog to the view controller, which contains the user interface logic and interfaces to the print model.

The print model
The print model is where it starts to get more interesting. Here is a list of what it does:
  • It gets a list of available printers and sets the selected one.
  • It gets a list of available paper formats and sets the selected one.
  • It gets a list of available paper sources and sets the selected one.
  • It creates a print ticket and makes every item of it available to the view controller.
  • It holds the opened PDF document.
  • It renders the currently selected page of the PDF document, transforms it according to the settings that the user has made, combines it with the printer paper and the printable area, and scales it so it will fit in the preview image.
  • It presents the print ticket and a couple of other settings to the document paginator at the moment the document is printed.
  • It passes print progress information back to the view controller via delegates so the user can be notified of the progress.
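The scaling mentioned in the rendering step boils down to a uniform fit-to-box calculation. A sketch, with illustrative names rather than the sample's actual code:

```csharp
using System;

// Uniform scale factor that fits the paper (with its printable-area
// overlay) into a fixed-size preview box while preserving aspect ratio.
static double FitToPreview(double paperWidth, double paperHeight,
                           double previewWidth, double previewHeight)
{
    double scaleX = previewWidth / paperWidth;
    double scaleY = previewHeight / paperHeight;
    return Math.Min(scaleX, scaleY); // never overflow the preview box
}
```

The same calculation, applied to the printable area instead of the preview box, gives the "automatic scaling to the printable area" shown earlier.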
The document paginator
This renders the PDF document one page at a time and actually spools it to the printer queue along with the print ticket. It also notifies the print model of the progress being made.
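A paginator along these lines can be sketched as a subclass of WPF's DocumentPaginator. This is a simplified skeleton, not the sample's code; `RenderPageToVisual` and the progress callback are placeholders for the actual rendering and notification logic.

```csharp
using System;
using System.Windows;
using System.Windows.Documents;
using System.Windows.Media;

// Skeleton of a DocumentPaginator that spools a PDF one page at a time.
class PdfPaginator : DocumentPaginator
{
    readonly int pageCount;
    Size pageSize;
    public Action<int> PageSpooled; // progress callback to the print model

    public PdfPaginator(int pageCount, Size pageSize)
    {
        this.pageCount = pageCount;
        this.pageSize = pageSize;
    }

    public override DocumentPage GetPage(int pageNumber)
    {
        var visual = new DrawingVisual();
        using (DrawingContext dc = visual.RenderOpen())
        {
            RenderPageToVisual(dc, pageNumber); // placeholder
        }
        PageSpooled?.Invoke(pageNumber + 1);
        return new DocumentPage(visual, pageSize,
            new Rect(pageSize), new Rect(pageSize));
    }

    public override bool IsPageCountValid => true;
    public override int PageCount => pageCount;
    public override Size PageSize
    {
        get => pageSize;
        set => pageSize = value;
    }
    public override IDocumentPaginatorSource Source => null;

    void RenderPageToVisual(DrawingContext dc, int pageNumber) { /* ... */ }
}
```

Returning a plain DrawingVisual from GetPage, rather than wrapping pages in FixedPage elements, is one way to sidestep the "FixedPage cannot contain another FixedPage" exception.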

Even more print options

Not all the possibilities of the print ticket are implemented in this code sample, e.g. the settings for stapling and page order. However, these can easily be implemented by taking the duplex setting and replicating it for any other setting you might need. The duplex setting is a good template for this kind of extension.

Conclusion

Implementing such a WPF print dialog is not entirely trivial, as there are a couple of hurdles you can run into. For example, the DocumentPaginator class that is used, which allows you to print large documents without excessive memory usage, can raise the infamous "FixedPage cannot contain another FixedPage" exception. Second, the transformations needed to scale and position the PDF in the preview involve a bit of complexity.

This code sample is part of the PDFRasterizer evaluation download.





Thursday, April 24, 2014

TechDays 2014, The Netherlands

The Hague has seen some interesting events lately. First, the World Forum centre hosted the Nuclear Summit, which Obama called ‘gezellig’ (cozy). And last week, the Microsoft TechDays 2014 flooded it with Microsoft developers. Since we are located in the Netherlands, the TallComponents engineering team took two days off from support and development and headed towards The Hague. As one of the members of this team, I am sharing my thoughts with you.

foto

Since Metro came out, I have had some doubts about the direction that Microsoft is taking with .Net, and in particular C#. It appeared a bit as though Microsoft started to promote C++ development more, and as if it started to lose its commitment to businesses, pushing out a desktop environment that was basically a step back for doing serious work.

But being at the TechDays takes you far away from these concerns. First of all because there are so many developments in terms of .Net and C# that it is impossible to talk about all of them here. Secondly, because there were so many developers there that it is hard to consider Windows a platform in trouble.

Going to the TechDays is not specifically about hearing stuff that you never heard about. In that sense I have no real news here. Instead, it is mainly about getting some deeper understanding about various technologies, some of which only linger in the back of your head because you only read about them once or twice.

Just a few highlights:
  • We learned a bit more about the ability to create universal applications for both Windows 8.1 and Windows Phone 8.1. The gap between these has become quite small, so it appears, and this is great news for those wishing to target both.
  • We saw developments in terms of compiler technology. Not only is Microsoft's move to open source a lot of code interesting, but so are the efforts they are making to generate faster code (RyuJIT, for example). No sign of a lack of interest in C# at all.
  • We have seen some very informative presentations about intricate implementation details behind some constructs like async/await, closures in lambda’s and more.
  • TypeScript developments, and Microsoft's apparent commitment in general to supporting JavaScript-based solutions (based on node.js and other popular frameworks). In essence, this is a move away from solutions that are completely centered around Microsoft technologies like Silverlight. And while this might appear to be a move away from C#, I cannot see this as a bad move at all. It is basically a sign that Microsoft is acknowledging that JavaScript complements the .Net platform. Apart from that, Microsoft is probably keen on attracting and keeping additional developers. Or at least on not losing existing ones. In the end, it is not about the languages, it is about the platform they are offering. There is a place for JavaScript next to C#.
  • Microsoft's partnership with Xamarin. Like the interest in JavaScript, this shows that Microsoft is no longer focusing on its ‘own’ hardware, and apparently not even exclusively on its own OS. I doubt that they really wanted to do this, but if Microsoft can offer a platform that is able to target all these devices, including Windows Phone, so much the better for developers. No doubt they are going to try to make their own platform just a little bit more attractive than the rest. Or, and that is not entirely unlikely either, they aim at offering the best backend for all these devices. A connected portable device is, after all, often nothing more than a fancy peripheral for the actual computer system.
All this together does give confidence that .Net is going to stay an extremely competent platform for offering serious business solutions.




Tuesday, April 22, 2014

Does Heartbleed vulnerability affect our libraries?

A customer noticed that we use Bouncy Castle internally and asked us whether the Heartbleed vulnerability affects our libraries.

Short answer: No

Long answer:

We use Bouncy Castle for encrypting and decrypting data only, not for SSL connections. In addition, the Heartbleed bug is part of OpenSSL, not of Bouncy Castle. Although Bouncy Castle does implement SSL, it is a different implementation than OpenSSL. This means that the bug is part of a library we do not use.

The Heartbleed bug is a manifestation of C's lack of automatic buffer bounds checking. The bug allows attackers to read sensitive data that is located outside the bounds of the affected buffer.

Our code is 100% managed. We do not use unsafe blocks like these:

unsafe static void FastCopy(byte[] src, byte[] dst, int count)
{
  // unsafe context: can use pointers here.
}

This means that reading memory out of bounds is impossible. The CLR performs bounds checking before accessing an array and throws an exception when anyone tries to access memory outside these bounds. This means that no sensitive information can be obtained this way.
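The difference is easy to demonstrate. In managed code, a Heartbleed-style over-read never silently returns neighboring memory; a minimal illustration:

```csharp
using System;

static void Demo()
{
    byte[] buffer = { 1, 2, 3, 4 };
    try
    {
        // A Heartbleed-style over-read: ask for an index past the end.
        byte b = buffer[100];
        Console.WriteLine(b); // never reached
    }
    catch (IndexOutOfRangeException)
    {
        // The CLR bounds check turns the over-read into an exception
        // instead of leaking adjacent memory.
        Console.WriteLine("Out-of-bounds access rejected by the CLR");
    }
}
```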

Friday, April 4, 2014

Why jaggies around rasterized lines?

Jaggies

After a PDF document is rasterized, why do we see jaggies around lines?



The answer starts with a bit of theory on coordinates: in a PDF document, coordinates denote positions on a plane using an X and a Y axis. These positions use floating-point numbers, which makes it possible to point to any location, not only on the grid lines but also anywhere in between.

However, at the moment that a PDF document is printed or viewed on a screen, it has to be rasterized to what we call the 'device space'. This device space is a bitmap consisting of discrete pixels, each with its own coordinate. So in device space, the X and Y coordinates are integers.

So how exactly are the floating point coordinates mapped to the pixels then? 

If we forget about the transformations for a moment, the definition is as follows: "Pixel boundaries always fall on integer coordinates in device space. A pixel is a square region identified by the location of its corner with minimum horizontal and vertical coordinates. The region is half-open, meaning that it includes its lower but not its upper boundaries." In other words, it looks like this:

What happens if we draw a black vertical line with a width of 1 pixel positioned exactly at the PDF coordinates (2, 1) to (2, 6)?
Then you get a line with a width of 2 pixels, with all pixels gray instead of black, because each pixel is only covered for 50% by the line. However, if you draw the line exactly halfway between coordinates, you get what you might have expected in the first place:
Note that you usually don't want to do this kind of exact positioning, as it is most likely that the PDF coordinates will be scaled by one or more transformations. When you print a PDF document, it will be scaled to A4, letter, or another format. On screen, the result depends on the 'dots per inch' and the size of the window in which the document is displayed. So these exact coordinates will most likely be changed into something entirely different somewhere in the viewing process. For this reason, just don't rely on halfway coordinates.
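The 50% coverage in the first case is simple arithmetic. A sketch of how a rasterizer can compute the coverage of one device pixel by a 1-unit-wide vertical line (illustrative only, not our actual rasterizer code):

```csharp
using System;

// Coverage of device pixel [px, px+1) by a 1-unit-wide vertical line
// whose center is at x, so the line spans [x - 0.5, x + 0.5).
static double Coverage(double x, int px)
{
    double left = Math.Max(x - 0.5, px);
    double right = Math.Min(x + 0.5, px + 1);
    return Math.Max(0, right - left);
}

// Line centered at integer coordinate 2: half in pixel 1, half in pixel 2.
//   Coverage(2.0, 1) == 0.5, Coverage(2.0, 2) == 0.5  -> two gray pixels.
// Line centered halfway between coordinates, at 2.5: all in pixel 2.
//   Coverage(2.5, 2) == 1.0                           -> one black pixel.
```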

The jaggies 

The jaggies become visible if the PDF is rendered to a black-and-white bitmap, e.g. for a printed page. It is not possible to use gray values then, so dithering has to be used instead:

Why do this dithering instead of moving the line half a pixel?

The question is: is this dithering really what we want, or can we do something better here? It is clear that a decision has to be made about what is most important:
  • Either the rasterizer tries to approach the exact position of the lines by means of dithering,
  • or it rounds the position of the line to the nearest pixel. 
Neither of these gives perfect results: dithering looks strange if you zoom into the details, and rounding results in lines with variations in location and thickness (as a 2.5-pixel-wide line may be rounded to either 2 or 3 pixels). So there is no way to render the following line at a thickness of 2.5 pixels without dithering:

The answer:

The reason that we have chosen these jaggies (that is, dithering) in our implementation is that we believe it results in better readability:
  • we think that unintended variations in location and thickness are more noticeable than jaggies.
  • the dithering is supposed to be performed at such a resolution that the individual pixels are indistinguishable, so you would not see it in normal situations.

Therefore the answer is: we use dithering because it gives the appearance of sub-pixel precision at the cost of odd-looking pixels, which you will only see if you zoom in very far.

Tuesday, March 11, 2014

Transformations change the world, or everything in it.

 

In most – if not all – graphical systems, it is possible to apply some transformation on graphical elements in order to render them in a certain way. These transformations can often be combined in a particular order to yield a new transformation.



We often notice, however, that programmers perceive it as hard to correctly combine these transformations. Graphical transformations are not commutative, so the order of transformations matters. In addition, the correct order depends on the way a graphical system deals with them. One cannot simply carry over a particular order from one graphical system to another, and this is not always explained very well. This typically leads to a lot of trial and error before the correct combination of transformations is found.
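Non-commutativity is easy to verify with WPF's own Matrix struct (a minimal demonstration; WPF treats points as row vectors, so in a product the left-hand matrix is applied first):

```csharp
using System;
using System.Windows;
using System.Windows.Media;

static void Demo()
{
    var scale = new Matrix();
    scale.Scale(3, 1);           // scale x by 3
    var translate = new Matrix();
    translate.Translate(50, 75); // move right 50, down 75

    var p = new Point(10, 0);

    // Scale first, then translate: (10,0) -> (30,0) -> (80,75).
    Point a = (scale * translate).Transform(p);

    // Translate first, then scale: (10,0) -> (60,75) -> (180,75).
    Point b = (translate * scale).Transform(p);

    Console.WriteLine(a); // not equal to b: order matters
    Console.WriteLine(b);
}
```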

Below, we will have a look at the graphical transformation systems in PDF, WPF and PDFKit.NET.

Graphical Transformations in PDF

In PDF, as well as in PostScript, it is possible to transform graphical objects via matrix transformations. Here, however, one does not really transform the objects, but the coordinate system in which they get drawn.

This may seem weird at first sight. Why transform the coordinate system, and not the objects themselves? The main reason is that this allows one to draw a group of objects using the same transformation, without having to specify the transformation for each and every object. Within the group, all objects can be drawn relative to each other as if they are drawn in an untransformed world. And by applying a coordinate system transformation upfront, e.g. a scaling transformation, the entire scene gets scaled.

The order of the transformations is important. The PDF reference manual shows the difference as follows:
image
In general, transformations in PDF should be done in the following order:
  1. Translate
  2. Rotate
  3. Scale or Skew

Transformation order in WPF

WPF also has transformations. MSDN, however, is not very clear about combined transformations. It suggests that WPF also transforms the coordinate system, and it contains some samples, but there is no discussion of what exactly happens. See for example:

http://msdn.microsoft.com/en-us/library/ms750596(v=vs.110).aspx
http://msdn.microsoft.com/en-us/library/ms750975(v=vs.110).aspx

What we see however, is that XAML does not transform the coordinate system. Take for example the following XAML code.
<Window x:Class="WpfApplication1.MainWindow"
      xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
      xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="MainWindow" Height="350" Width="525">
    <Canvas Width="200" Height="200">
        <Rectangle Canvas.Left="0" Canvas.Top="0" Width="200" Height="200" Stroke="Black" Opacity="1.0"/>
        <Rectangle Canvas.Left="0" Canvas.Top="0" Width="50" Height="50" Stroke="RoyalBlue" Opacity="1.0">
          <Rectangle.RenderTransform>
             <TransformGroup>
               <TranslateTransform X="50" Y="75"/>
               <ScaleTransform ScaleX="3"/>
             </TransformGroup>   
          </Rectangle.RenderTransform>
        </Rectangle>
    </Canvas>
</Window>


This has the following effect. Note that the black rectangle has a size of 200 x 200 and that the blue rectangle has been translated much further to the right than 50 units. It has been translated by 150 units to be exact.

image

If however, we reverse the transformations, we get what we expect:

<TransformGroup>
       <ScaleTransform ScaleX="3"/>
       <TranslateTransform X="50" Y="75"/>
</TransformGroup>   
image

What we see here is that WPF applies both transformations in sequence on the object, without changing the coordinate system in which the object resides. In fact, if we want to rotate the object in its current location and without distorting it, we need to do this after scaling and before translating, which is exactly the opposite of the correct order in PDF.

<TransformGroup>
       <ScaleTransform ScaleX="3"/>
       <RotateTransform Angle="-30"/>
       <TranslateTransform X="50" Y="75"/>
</TransformGroup>   
image

If we rotate before scaling, the object gets distorted (skewed) because scaling then takes place on the rotated object along the untransformed, horizontal x-axis:

 image

Whereas, if we rotate at the end, after translating, the object gets placed at a different location, because the center of the rotation is still at the original origin of the coordinate system: the top left of the black rectangle.

image

This means that the information in MSDN about these transformations is actually incorrect.

There is however a reason to do things differently in WPF.

In PDF we do not really have graphical objects with Transform properties. We only have graphical operations that operate in a certain environment, of which the transformed coordinate system is just one of the aspects. The main reason to do this is compactness. By having an environment with all sorts of graphical context, the operations themselves can be kept simple and short.

In WPF however, we have an object-oriented system where we want all objects to be as independent as possible, and where – consequently – each object has its own transformation, next to properties like color and transparency, instead of these being part of some environment. And in that case it makes sense to not have transformations change the coordinate system but just the object itself.

This also has the advantage that transformations can be combined in a more natural order, and programmers can reason within a fixed coordinate system.

Shape Transformations in PDFKit.NET

In PDFKit.NET one can draw with shapes, and just as in WPF, each shape is a complete graphical element that has its own transformation property. This means that the correct order of doing transformations in PDFKit.NET is similar to WPF:
  1. Scale or Skew
  2. Rotate
  3. Translate
Below we have shown the effect of applying these transformations in that particular order:

Pen pen = new Pen(RgbColor.Black, 1);
FreeHandShape charAsShape = new FreeHandShape { Pen = pen, Brush = null };
charAsShape.Paths.AddRange(Font.TimesRoman.CreatePaths('n', 100));
charAsShape.Transform = new TransformCollection
{  
   new ScaleTransform(3,1),
   new RotateTransform(-30),
   new TranslateTransform(50, 75)
};

Original:
image

Step 1: scaling by a factor 3 in the x direction:
image

Step 2: Rotation by 30 degrees counterclockwise:
image

Step 3: Translation by 50 units in the x direction and 75 units in the y direction:
image

Wednesday, February 19, 2014

Splitting Hairlines

You probably all have seen PDF documents that have very fat graphics when viewed at a low zoom level:

image

And, when zooming in, the fat graphics disappear, and slim down to something more sensible.

image

The effect that you see here is that of a “hairline”. Hairlines can be surprisingly fat at low resolutions but thin at high resolutions. The PDF Reference manual says the following about them:

“A line width of 0 shall denote the thinnest line that can be rendered at device resolution: 1 device pixel wide. However, some devices cannot reproduce 1-pixel lines, and on high-resolution devices, they are nearly invisible. Since the results of rendering such zero-width lines are device-dependent, they should not be used.”

The reality is however that since the PDF specification allows this, documents with these hairlines do exist, and this leads to the effects seen above.

Luckily, this effect is not really that problematic in practice, because the rendering results are pretty much the same for all viewers.

Empty Rectangles

There is another related effect, which is more subtle, and a bit trickier because the PDF reference does not precisely say what to do in this case: rectangles with a zero width or height that are “filled” with a color.

At first sight, one would assume that these should not be painted at all. It seems logical. GDI in fact does not render anything for such zero-width fills. Adobe Reader however shows these rectangles as hairlines. The closest hint we get from the PDF Reference manual is the following information about the fill operator “f”:

“If a subpath is degenerate (consists of a single-point closed path or of two or more points at the same coordinates), f shall paint the single device pixel lying under that point; the result is device-dependent and not generally useful.”

Extending this reasoning to rectangles with a zero width, this seems to imply that indeed, filling these should result in a hairline. And in fact, we have encountered several PDF documents that confirm this.
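The GDI+ behavior mentioned above is easy to observe directly. A small sketch: filling a zero-width rectangle through System.Drawing touches no pixels at all.

```csharp
using System.Drawing;

// Returns true if filling a zero-width rectangle painted any pixel.
// With GDI+ this stays false: the degenerate fill renders nothing,
// unlike Adobe Reader, which shows such rectangles as hairlines.
static bool ZeroWidthFillPaintsAnything()
{
    using (var bmp = new Bitmap(10, 10))
    using (var g = Graphics.FromImage(bmp))
    {
        g.Clear(Color.White);
        g.FillRectangle(Brushes.Black, 5, 0, 0, 10); // width == 0

        for (int x = 0; x < 10; x++)
            for (int y = 0; y < 10; y++)
                if (bmp.GetPixel(x, y).ToArgb() != Color.White.ToArgb())
                    return true;
    }
    return false;
}
```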

PDFRasterizer.NET

But now take a look at the following graphics. Adobe reader renders them as follows:

image

Our software (PDFRasterizer.NET) however produces this:

image

Is this wrong? Apparently. But what is happening here? Closer inspection reveals that this area is covered with zero-width rectangles that are all filled with black. The image below has been taken from the same area, but rendered at a higher resolution by PDFRasterizer.NET.

image

To make things more complex, this area is also covered with very narrow (non-empty!) rectangles filled with white. The black and white rectangles are painted alternately.

Logically, the zero-width black rectangles should not be visible, because the white rectangles are wider and painted over them. But officially, as these zero-width black rectangles must be painted as hairlines, they will be rendered "thicker" than the white rectangles at low resolutions, and thus become visible. That is exactly what you see in PDFRasterizer.NET.

Note that strictly, one can never render anything thinner than a pixel. What happens normally, is that elements smaller than a pixel get interpolated with their background pixel with a factor that reflects how much they cover that background pixel. At low resolutions however, the black hairlines cause the background pixels to become entirely black, while the thin white rectangles only cover that background for a small fraction, leading to a largely dark-gray appearance.

The question for us here is not so much why our software renders these lines at low resolutions, but rather why Adobe Reader does not. The PDF Reference manual does not give any clue. It could be that Adobe does not take into account hairlines when interpolating new pixels with the existing background. This in itself would make sense, as these hairlines do not really constitute any real area. Unfortunately, if that is the case here, we will probably not be able to solve this in GDI+, as it does not offer us that kind of control.

The good news is however, that these graphical constructs are very exceptional, and there is a workaround by rendering at higher resolutions.

And essentially, this kind of problem is exactly why Adobe indicates that hairlines should not be used.

Tuesday, February 11, 2014

Securing PDF Documents with Passwords

PDF files can be secured with a user password and an owner password. In a way these passwords are very related, as they both control the level of access that you have on a document. But there are some differences, and we sometimes get questions about these, in particular what it means for encryption.

In principle, the use of these passwords is rather straightforward:
  • The user password is needed to access a document at all. If a user password has been specified, you will not be able to open the document without providing this password. So typically, you will set such a password if you intend a document to be read by one specific person, or group, that knows the password.
  • The owner password controls the document restrictions. You will find these restrictions if you open the Document Properties of a document in Adobe Reader (ctrl-D) and select the Security tab. Amongst others it is possible to restrict printing of the document, or copying of content. Adobe Acrobat will not allow you to change the document restrictions without providing the owner password. The owner password however, does not restrict opening or viewing of a document.
image
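In code, applying the two passwords typically looks something like the sketch below. This is a hypothetical API: the type and property names are assumptions for illustration, not necessarily PDFKit.NET's actual members; consult the library reference for the real names.

```csharp
// Hypothetical API sketch: type and property names are assumptions.
var security = new PasswordSecurity
{
    UserPassword = "open-secret",   // required to open the document
    OwnerPassword = "owner-secret", // required to change restrictions
    AllowPrint = false,             // example document restriction
    AllowContentCopying = false
};
document.Security = security;
document.Write(outputStream);
```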
In real life, we hardly encounter documents with a user password. PDF is mostly used for distributing information, and the least that you normally want people to be able to do is open a document and view it. So if you encounter a secured document, chances are that it only has an owner password specified.

Note that if a document has a user password, you will not be able to ignore this, because all readers will prompt you for it upon opening:

image

Encryption

As soon as one sets one of these passwords, the PDF document will be encrypted. This sometimes gives rise to confusion, because there is software that is able to remove the document restrictions (controlled by the owner password) without providing this password. PDFKit.NET is able to do this, as are some other software packages.

Adobe Acrobat, on the other hand, does not allow you to do this. It will always require you to provide the owner password in order to change the permissions.
There are two aspects that need to be considered here.
  • Technically, there is an important difference between the role of the user password and the role of the owner password during encryption. In essence, only the user password plays a role in computing the encryption key. Without the correct user password the PDF file cannot be decrypted, and thus it cannot be viewed at all. If only an owner password is provided, however, the file will still get encrypted, but in that case the encryption key will be computed from a number of other properties of the file itself, so in essence the encryption key is no secret then. This means that if no user password has been specified, anyone can decrypt the file, which is effectively as good as no encryption at all.
  • The owner password is meant as a lock on the document restrictions, but as it plays no role in encryption this is merely a convention. This comes as no surprise: if the owner password had played a role in encryption, it would have become impossible to open the document without providing it. So technically, there really is no way to secure the document restrictions better than by convention.
Users who set document restrictions need to be aware that technically there is no obstruction to removing them without providing the owner password, although many software applications will respect the restrictions by convention.
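The role of the user password in the key computation can be made concrete. For the standard security handler (revision 3, RC4), the key is essentially an MD5 digest over the padded user password plus entries taken from the document itself. A simplified sketch following Algorithm 2 of the PDF reference (error handling and the standard 32-byte padding string omitted):

```csharp
using System;
using System.Security.Cryptography;

// Simplified sketch of the PDF standard security handler's encryption
// key computation (Algorithm 2, revision 3). When no user password is
// set, paddedUserPassword is just the standard padding string.
static byte[] ComputeEncryptionKey(byte[] paddedUserPassword, // 32 bytes
                                   byte[] oEntry,             // /O, 32 bytes
                                   int pFlags,                // /P
                                   byte[] fileId,             // first /ID element
                                   int keyLengthBytes)
{
    using (var md5 = MD5.Create())
    {
        var input = new byte[32 + 32 + 4 + fileId.Length];
        Buffer.BlockCopy(paddedUserPassword, 0, input, 0, 32);
        Buffer.BlockCopy(oEntry, 0, input, 32, 32);
        // /P as 4 bytes, low-order byte first.
        input[64] = (byte)pFlags;
        input[65] = (byte)(pFlags >> 8);
        input[66] = (byte)(pFlags >> 16);
        input[67] = (byte)(pFlags >> 24);
        Buffer.BlockCopy(fileId, 0, input, 68, fileId.Length);

        byte[] key = md5.ComputeHash(input);
        // Revision 3: re-hash the first n bytes of the digest 50 times.
        for (int i = 0; i < 50; i++)
        {
            var truncated = new byte[keyLengthBytes];
            Buffer.BlockCopy(key, 0, truncated, 0, keyLengthBytes);
            key = md5.ComputeHash(truncated);
        }

        var result = new byte[keyLengthBytes];
        Buffer.BlockCopy(key, 0, result, 0, keyLengthBytes);
        return result;
    }
}
```

Note that every input other than the user password comes from the file itself (/O, /P, /ID), which is exactly why anyone can recompute the key when no user password has been set. The owner password appears nowhere in this computation.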

TallComponents software

One of the questions you may have now is why our software does not play by these conventions. The answer is actually quite simple: we do not provide end-user applications, but merely tools for building them, and there is no way that we can effectively enforce these conventions.

Take PDFRasterizer.NET for example. It provides methods for drawing PDF pages to a System.Drawing.Graphics instance. There is no way however that our software is able to control how the result will be used. It could be used for viewing, but also for printing. So if some printing restrictions are set in the document, it is up to the final application to deal with this. Our software cannot control this.

For PDFKit.NET the situation is similar, although less apparent at first sight. PDFKit.NET allows you to remove the security settings of a document without providing the owner password. We could have made an API that only allows this when a correct owner password is passed. This, however, would only seemingly have solved the issue. For even with the security settings intact, it would still have been possible to create an unsecured copy of the document using all sorts of other API calls in PDFKit.NET; calls that can also be used to do things that are allowed for a secured document.

This means that dealing with these document restrictions is not something that our software components can, or should enforce. It is up to the application developer to make sure that these conventions are met. And it is up to the creators of these PDF documents to be aware that the document restrictions cannot be enforced technically.