Thursday, August 20, 2015

Adobe Reader requires Asian font packs for Western Documents

We recently received reports that Adobe Reader XI wants to install Asian language packs for files that were generated by our software and that are, in fact, completely “western”.

Adobe Reader X did not require such a thing, so we wondered what had happened.

East Meets West

It turns out that this happens for documents that have non-embedded TrueType fonts. In PDF, one must incorporate some information for each font that is used in a document, even if it is not embedded. This can be done in various ways. In the past, we chose to incorporate so-called “CID”-style font information. This has the advantage that it can address the complete range of Unicode characters, so in principle one can safely write out all Unicode text with such a font without worrying about it any further (mixing Western, Asian, and even Klingon, provided that the font contains definitions for these glyphs).

One of the requirements that the PDF specification imposes on this type of font is that it needs to have an encoding from a predefined set of encodings. This limited set happens to contain only encodings that are classified as “Eastern” (Chinese, Japanese, Korean, etc.). There is, however, nothing particularly “Eastern” about these. An encoding just specifies which sequence of bytes in a text maps to which glyph ID in the font. Languages are not really involved at this point.

And so we used these “Eastern” encodings for many years, and this never led to any issues such as needlessly having to install Eastern language packs. After all, what really matters is not the encoding, but the characters that are actually used in the document.

Well, Adobe Reader XI changed all that. It now confuses our customers, prompting them to install an Asian language pack for documents that do not have a single eastern character in them. All because the encoding for the used CID fonts is “Eastern” (it has to be, because Adobe requires it for CID fonts).

To avoid this, it appears that we will have to start avoiding CID fonts for western documents. This means that we will have to investigate each piece of text, check whether its characters fit within a particular western single-byte code page, and output a TrueType font description with that single-byte encoding. It also means that we may have to output several of these font descriptions into a document if not all characters fit within a single western single-byte code page.
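The check described above could look roughly like the following minimal sketch. It assumes ISO-8859-1 as a stand-in for "a particular western single-byte code page"; which code pages are actually tried is not stated here, and the class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    // Assumption: ISO-8859-1 stands in for the western single-byte code page.
    // The exception fallback makes the encoder throw instead of writing '?'.
    private static readonly Encoding Latin1 = Encoding.GetEncoding(
        "iso-8859-1",
        new EncoderExceptionFallback(),
        new DecoderExceptionFallback());

    // Returns true when every character of the text can be encoded in ISO-8859-1.
    public static bool FitsInLatin1(string text)
    {
        try
        {
            Latin1.GetByteCount(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // "Résumé", written with escapes to stay independent of the source file encoding.
        Console.WriteLine(FitsInLatin1("R\u00E9sum\u00E9"));
        // Three Japanese characters, which have no ISO-8859-1 representation.
        Console.WriteLine(FitsInLatin1("\u65E5\u672C\u8A9E"));
    }
}
```

Text that fails this check would either need another single-byte code page or fall back to a CID font.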

It seems like an awkward step to take, given that Unicode should have solved all these code page troubles many years ago.

Tuesday, September 2, 2014

Why is Adobe LiveCycle DRM not supported?

Adobe LiveCycle Rights Management is a server-based security system that provides dynamic control over PDFs. It is a complex solution that, besides PDF, supports many document formats, e.g. Microsoft Word.

Its key capabilities, according to their website, are

  • Restrict access of a document to certain individuals only.
  • Monitor the usage of a given document.
  • And most notably, revise usage rights or revoke access to documents after they have been distributed.

All of these dynamic capabilities require some kind of server-based solution: the usage rights and the access policy are, instead of being part of the document, provided by an external policy server.

In practice, that requires the clients (e.g. a PDF viewer application) and the policy servers to implement a communication protocol to obtain decryption keys, access policy and so on.
Obviously, Adobe has the technology for relatively safe communication between the client and the policy servers. However, this technology is not public, so third-party software cannot implement it.

If you encounter the exception "Unsupported encryption found" in any of TallComponents' PDF components, it is very likely that you stumbled upon a PDF treated with this DRM. All you can do is double-check whether this is the case. To do so, open your document with Adobe Reader (it might take some time, as it has to connect to the policy server to check the access restrictions). Open Security Settings, then select Permission Details:

The Security Method field contains the applied encryption algorithm. If it says "Adobe LiveCycle Rights Management", you are unfortunately bound to Adobe Reader.

Tuesday, June 3, 2014

Printing PDF documents in WPF with a preview

Printing PDF documents in WPF applications

It is easy to print PDF documents in an application using the standard 'PrintDialog' in .NET. How to do so is described in this knowledge base article. However, this standard dialog is rather limited in its functionality. It does not give you a preview of how the document will look when it is printed, and some common print options are missing.

Printing with a preview

What you may want instead is a preview of how the document will be printed on paper that is larger than the PDF document, like:

Or you may want to see what part will be printed on smaller paper, like:

Or, what will the output look like when the document is rotated to landscape and vertically centered and with automatic scaling to the printable area on the paper:

Extra print options

The preview is also useful when you want to print only one or a few pages of the document, like this:

In any case, extra print options such as duplex printing, as shown above, should always be available to reduce paper usage.

The WPF code sample

The code sample 'ScaledPrintingWPF' in PDFrasterizer.NET shows an implementation of the features above. It looks like this:

It is implemented using the following MVVM architecture:

The XAML view
A custom print dialog as shown above is implemented in XAML and has a menu to open a PDF file. It uses bindings for all printer settings and the preview box. It also has the usual code-behind part, which contains only a small amount of code.

The view controller
The bindings connect the dialog to the view controller, which holds the user interface logic and interfaces to the print model.

The print model
The print model is where it starts to get more interesting. Here is a list of what it does:
  • It gets a list of available printers and sets the selected one.
  • It gets a list of available paper formats and sets the selected one.
  • It gets a list of available paper sources and sets the selected one.
  • It creates a print ticket and makes any item of it available to the view controller.
  • It holds the opened PDF document.
  • It renders the currently selected page of the PDF document, transforms it according to the settings that the user has made, combines it with the printer paper and the printable area, and scales it so that it fits in the preview image.
  • It presents the print ticket and a couple of other settings to the document paginator at the moment when the document is printed.
  • It passes the print progress information as delegates back to the view controller so the user can be notified of the print progress.

The document paginator
This renders the PDF document one page at a time and actually spools it to the printer queue along with the print ticket. It also notifies the print model of the progress being made.

Even more print options

Not all the possibilities of the print ticket are implemented in this code sample, e.g. the settings for stapling and page order. However, these can easily be implemented by taking the duplex setting and replicating it for any other setting you might need. The duplex setting is a good example for this kind of extension.


Implementing such a WPF print dialog may not be that simple, as there are a couple of hurdles you run into. For example, the DocumentPaginator class used here, which allows you to print large documents without too much memory usage, can raise the infamous "FixedPage cannot contain another FixedPage" exception. Second, getting the transformations needed to scale and position the PDF in the preview is somewhat complex.
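The core of the scale-and-position step can be sketched as follows. This is not the code from the 'ScaledPrintingWPF' sample; it is a minimal illustration with hypothetical names that assumes a uniform scale and a centered paper, and it leaves out rotation and the printable area.

```csharp
using System;

public static class PreviewFit
{
    // Uniform scale that makes a paper of the given size fit inside the preview box.
    public static double Scale(double paperWidth, double paperHeight,
                               double boxWidth, double boxHeight)
    {
        return Math.Min(boxWidth / paperWidth, boxHeight / paperHeight);
    }

    // Offset that centers the scaled paper along one axis of the preview box.
    public static double CenterOffset(double paperSize, double boxSize, double scale)
    {
        return (boxSize - paperSize * scale) / 2.0;
    }

    public static void Main()
    {
        // A4 paper in points (595 x 842) shown in a hypothetical 200 x 200 preview box.
        double scale = Scale(595, 842, 200, 200);
        double offsetX = CenterOffset(595, 200, scale);
        double offsetY = CenterOffset(842, 200, scale);
        Console.WriteLine("scale={0:F4} offsetX={1:F2} offsetY={2:F2}", scale, offsetX, offsetY);
    }
}
```

Taking the smaller of the two ratios keeps the aspect ratio intact, so the paper always fits in both directions.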

This code sample is part of the PDFRasterizer evaluation download.

Thursday, April 24, 2014

TechDays 2014, The Netherlands

The Hague has seen some interesting events lately. First, the World Forum centre hosted the Nuclear Summit, which Obama called ‘gezellig’ (cozy). And last week, the Microsoft TechDays 2014 flooded it with Microsoft developers. Since we are located in the Netherlands, the TallComponents engineering team took two days off from support and development and headed towards The Hague. As one of the members of this team, I am sharing my thoughts with you.

Since Metro came out, I have had some doubts about the direction that Microsoft is taking with .Net, and in particular C#. It appeared as though Microsoft had started to promote C++ development a bit more, and as if it had started to lose its commitment to businesses, pushing out a desktop environment that was basically a step back for doing serious work.

But being at the TechDays takes you far away from these concerns. First of all, because there are so many developments in terms of .Net and C# that it is impossible to talk about all of them here. Secondly, because there were so many developers there that it is hard to consider Windows a platform that is in trouble.

Going to the TechDays is not specifically about hearing stuff that you never heard about. In that sense I have no real news here. Instead, it is mainly about getting some deeper understanding about various technologies, some of which only linger in the back of your head because you only read about them once or twice.

Just a few highlights:
  • We learned a bit more about the ability to create universal applications for both Windows 8.1 and Windows Phone 8.1. The gap between these has become quite small, so it appears. And this is great news for those wishing to target both.
  • We saw developments in terms of compiler technology. Not only is Microsoft's step to open source a lot of code interesting, but so are the efforts they are making to generate faster code (RyuJIT for example). No sign of a lack of interest in C# at all.
  • We have seen some very informative presentations about intricate implementation details behind some constructs like async/await, closures in lambdas and more.
  • TypeScript developments, and Microsoft's apparent commitment in general to supporting JavaScript-based solutions (based on Node.js and other popular frameworks). In essence, this is a move away from solutions that are completely centered around Microsoft technologies like Silverlight. And while this might appear to be a move away from C#, I cannot see this as a bad move at all. It is basically a sign that Microsoft is acknowledging that JavaScript complements the .Net platform. Apart from that, Microsoft is probably keen on attracting and keeping additional developers. Or at least on not losing existing ones. In the end, it is not about the languages, it is about the platform they are offering. There is a place for JavaScript next to C#.
  • Microsoft's partnership with Xamarin. Like the interest in JavaScript, this shows that Microsoft is no longer focusing on its ‘own’ hardware, and apparently not even exclusively on its own OS. I doubt that they really wanted to do this, but if Microsoft can offer a platform that is able to target all these devices, including Windows Phone, so much the better for developers. No doubt they are going to try and make their own platform just a little bit more attractive than the rest. Or, and that is not entirely unlikely either, they aim at offering the best backend for all these devices. A connected portable device is, after all, often nothing more than a fancy peripheral for the actual computer system.

All this together does give confidence that .Net is going to stay an extremely competent platform for offering serious business solutions.

Tuesday, April 22, 2014

Does Heartbleed vulnerability affect our libraries?

A customer noticed that we use Bouncy Castle internally and asked us whether the Heartbleed vulnerability affects our libraries.

Short answer: No

Long answer:

We use Bouncy Castle for encrypting and decrypting data only, not for SSL connections. In addition, the Heartbleed bug is part of OpenSSL, not of Bouncy Castle. Although Bouncy Castle does implement SSL, it is a different implementation than OpenSSL. This means that the bug is part of a library we do not use.

The Heartbleed bug is a manifestation of the C buffer length checking issue. This bug allows attackers to read sensitive data that is located outside the bounds of the affected buffer.

Our code is 100% managed. We do not use unsafe blocks like these:

unsafe static void FastCopy(byte[] src, byte[] dst, int count)
{
  // unsafe context: can use pointers here.
}

This means that reading memory out of bounds is impossible. The CLR does bounds checking before accessing an array, and it throws an exception when anyone tries to access memory outside these bounds. This means that no sensitive information can be obtained this way.
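The bounds check is easy to observe in a small, self-contained program. The sketch below is purely illustrative (the class and method names are hypothetical and not part of our libraries): reading one element past the end of a managed array raises an IndexOutOfRangeException instead of returning neighbouring memory.

```csharp
using System;

public static class BoundsCheckDemo
{
    // Tries to read buffer[index]; the CLR rejects any index outside the array.
    public static bool TryRead(byte[] buffer, int index, out byte value)
    {
        try
        {
            value = buffer[index];
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            // No data beyond the array is ever exposed to the caller.
            value = 0;
            return false;
        }
    }

    public static void Main()
    {
        byte[] buffer = { 1, 2, 3, 4 };
        byte value;
        Console.WriteLine(TryRead(buffer, 3, out value)); // last valid index
        Console.WriteLine(TryRead(buffer, 4, out value)); // one past the end
    }
}
```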

Friday, April 4, 2014

Why jaggies around rasterized lines?


After a PDF document is rasterized, why do we see jaggies around lines?

The answer starts with a bit of theory on coordinates. In a PDF document, coordinates denote positions on a plane using an X and a Y axis. These positions use floating point numbers, which makes it possible to point to any location: not only on the grid lines but also anywhere in between.

However, at the moment that a PDF document is printed or viewed on a screen, it has to be rasterized to what we call the 'device space'. This device space is a bitmap that consists of discrete pixels, each with its own coordinate. So in device space, the X and Y coordinates are integers.

So how exactly are the floating point coordinates mapped to the pixels then? 

If we forget about the transformations for a moment, the definition is as follows: "Pixel boundaries always fall on integer coordinates in device space. A pixel is a square region identified by the location of its corner with minimum horizontal and vertical coordinates. The region is half-open, meaning that it includes its lower but not its upper boundaries." In other words, it looks like this:

What happens if we draw a black vertical line with a width of 1 pixel, positioned exactly at the PDF coordinates (2, 1) to (2, 6)?

Then you get a line with a width of 2 pixels in which all pixels are grey instead of black, because each pixel is only 50% covered by the line. However, if you draw the line exactly halfway between two coordinates, you get what you might expect in the first place:

Note that you usually don't want to do this kind of exact positioning, as the PDF coordinates will most likely be scaled by one or more transformations. When you print a PDF document, it is scaled to A4, Letter, or some other format. On screen, the result depends on the 'dots per inch' and the size of the window in which the document is displayed. So these exact coordinates most likely change into something entirely different somewhere in the process of viewing. For this reason, just don't use halfway coordinates.
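The 50% coverage in the example above follows directly from the half-open pixel definition. The following sketch (hypothetical names, vertical lines only, no transformations) computes how much of a pixel column is covered by a line with a given center and width:

```csharp
using System;

public static class PixelCoverage
{
    // Fraction of pixel column [pixel, pixel + 1) covered by a vertical line
    // that is centered at x = center and has the given width.
    public static double Coverage(double center, double width, int pixel)
    {
        double left = center - width / 2.0;
        double right = center + width / 2.0;
        double overlap = Math.Min(right, pixel + 1) - Math.Max(left, pixel);
        return Math.Max(0.0, overlap);
    }

    public static void Main()
    {
        // Line centered on the grid line x = 2: it straddles pixel columns 1 and 2.
        Console.WriteLine(Coverage(2.0, 1.0, 1));
        Console.WriteLine(Coverage(2.0, 1.0, 2));
        // Line centered halfway, at x = 2.5: it fills pixel column 2 completely.
        Console.WriteLine(Coverage(2.5, 1.0, 2));
    }
}
```

A line centered on x = 2 spans 1.5 to 2.5, so columns 1 and 2 each get half of it, while a line centered on x = 2.5 spans exactly one column.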

The jaggies 

The jaggies will become visible if the PDF is rendered to a black and white bitmap, e.g. for a printed page. It is not possible to use gray values then so dithering has to be used instead like:

Why do this dithering instead of moving the line half a pixel?

The question is: is this dithering really what we want, or can we do something better here? It is clear that a decision has to be made about what is most important:
  • Either the rasterizer tries to approach the exact position of the lines by means of dithering,
  • or it rounds the position of the line to the nearest pixel.

Neither of these gives perfect results. Dithering looks strange if you zoom into the details, and rounding results in lines with variations in location and thickness (as a 2.5 pixel wide line may be rounded to either 2 or 3 pixels). So there is no way to render the following line to a thickness of 2.5 pixels without dithering:

The answer:

The reason that we have chosen these jaggies (that is, dithering) in our implementation is that we believe that it results in better readability because:
  • we think that unintended variations in location and thickness are more noticeable than jaggies.
  • the dithering is supposed to be performed at such resolution that the individual pixels are indistinguishable, so you would not see it in normal situations. 

Therefore the answer is: we do dithering as it gives the appearance of sub-pixel precision at the cost of strange pixels, which you will only see if you zoom in too much.
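The variation in thickness that rounding would cause can be illustrated with a few lines of code. This is a sketch of the rejected alternative, not of our rasterizer: it assumes that both edges of a line are rounded independently to the nearest pixel boundary, and the names are hypothetical.

```csharp
using System;

public static class EdgeRounding
{
    // Width in whole pixels after rounding both edges to the nearest pixel boundary.
    public static int RoundedWidth(double leftEdge, double width)
    {
        int left = (int)Math.Floor(leftEdge + 0.5);
        int right = (int)Math.Floor(leftEdge + width + 0.5);
        return right - left;
    }

    public static void Main()
    {
        // The same 2.5 pixel wide line at two slightly different positions.
        Console.WriteLine(RoundedWidth(0.2, 2.5)); // edges 0.2 and 2.7 round to 0 and 3
        Console.WriteLine(RoundedWidth(0.6, 2.5)); // edges 0.6 and 3.1 round to 1 and 3
    }
}
```

Two lines of identical width end up 3 and 2 pixels wide depending only on where they happen to fall on the pixel grid.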

Tuesday, March 11, 2014

Transformations change the world, or everything in it.


In most – if not all – graphical systems, it is possible to apply some transformation on graphical elements in order to render them in a certain way. These transformations can often be combined in a particular order to yield a new transformation.

We often notice, however, that programmers find it hard to combine these transformations correctly. Graphical transformations are not commutative, so the order of transformations matters. In addition, the correct order depends on the way that a graphical system deals with them. One cannot always carry a particular order over from one graphical system to another, and this is not always explained very well. This typically leads to a lot of trial and error before the correct combination of transformations is found.

Below, we will have a look at the graphical transformation systems in PDF, WPF and PDFKit.NET.

Graphical Transformations in PDF

In PDF, as well as in PostScript, it is possible to transform graphical objects via matrix transformations. Here, however, one does not really transform the objects, but the coordinate system in which they get drawn.

This may seem weird at first sight. Why transform the coordinate system, and not the objects themselves? The main reason is that this allows one to draw a group of objects using the same transformation, without having to specify the transformation for each and every object. Within the group, all objects can be drawn relative to each other as if they are drawn in an untransformed world. And by applying a coordinate system transformation upfront, e.g. a scaling transformation, the entire scene gets scaled.

The order of the transformations is important. The PDF reference manual shows the difference as follows:

In general, transformations in PDF should be done in the following order:
  1. Translate
  2. Rotate
  3. Scale or Skew
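The reason for this order is that each new matrix in a PDF content stream is premultiplied onto the current transformation matrix (CTM), so the transformation specified last is the first one applied to a point. The sketch below mimics this with System.Numerics.Matrix3x2, which uses the same row-vector convention as PDF. Using this type as a stand-in for the CTM is an assumption made for illustration, and the class and method names are hypothetical.

```csharp
using System;
using System.Numerics;

public static class PdfCtmDemo
{
    // Mimics the PDF "cm" operator: the new matrix is premultiplied onto the CTM.
    public static Matrix3x2 Concat(Matrix3x2 ctm, Matrix3x2 m)
    {
        return m * ctm;
    }

    // Builds the CTM for the order translate, rotate, scale.
    public static Matrix3x2 BuildCtm(float tx, float ty, float degrees, float sx, float sy)
    {
        Matrix3x2 ctm = Matrix3x2.Identity;
        ctm = Concat(ctm, Matrix3x2.CreateTranslation(tx, ty));
        ctm = Concat(ctm, Matrix3x2.CreateRotation(degrees * (float)Math.PI / 180f));
        ctm = Concat(ctm, Matrix3x2.CreateScale(sx, sy));
        return ctm;
    }

    public static void Main()
    {
        // Translate by (50, 75), rotate by 90 degrees, scale by 3 in x.
        Matrix3x2 ctm = BuildCtm(50, 75, 90, 3, 1);
        // The point (1, 0) is scaled to (3, 0), rotated to (0, 3), then moved to about (50, 78).
        Vector2 p = Vector2.Transform(new Vector2(1, 0), ctm);
        Console.WriteLine(p);
    }
}
```

Seen from the point being drawn, the effective order is scale, rotate, translate, even though the matrices are specified as translate, rotate, scale.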

Transformation order in WPF

WPF also has transformations. MSDN however is not very clear about combined transformations. It suggests that WPF also transforms the coordinate system, and it contains some samples, but there is no discussion what happens exactly. See for example:

What we see, however, is that XAML does not transform the coordinate system. Take for example the following XAML code:
<Window x:Class="WpfApplication1.MainWindow" Title="MainWindow" Height="350" Width="525"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
    <Canvas Width="200" Height="200">
        <Rectangle Canvas.Left="0" Canvas.Top="0" Width="200" Height="200" Stroke="Black" Opacity="1.0"/>
        <Rectangle Canvas.Left="0" Canvas.Top="0" Width="50" Height="50" Stroke="RoyalBlue" Opacity="1.0">
            <Rectangle.RenderTransform>
                <TransformGroup>
                    <TranslateTransform X="50" Y="75"/>
                    <ScaleTransform ScaleX="3"/>
                </TransformGroup>
            </Rectangle.RenderTransform>
        </Rectangle>
    </Canvas>
</Window>

This has the following effect. Note that the black rectangle has a size of 200 x 200 and that the blue rectangle has been translated much further to the right than 50 units. It has been translated by 150 units to be exact.
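The 150 units can be verified by applying the two transformations to the top left corner of the blue rectangle one after the other. The sketch below does this outside WPF, using System.Numerics as a stand-in for the WPF transform classes (an assumption made so that the example runs as a console program; the class and method names are hypothetical).

```csharp
using System;
using System.Numerics;

public static class TransformOrderDemo
{
    // Applies the transforms to the point one after the other, in the given order.
    public static Vector2 Apply(Vector2 point, params Matrix3x2[] transforms)
    {
        foreach (Matrix3x2 t in transforms)
        {
            point = Vector2.Transform(point, t);
        }
        return point;
    }

    public static void Main()
    {
        Matrix3x2 translate = Matrix3x2.CreateTranslation(50, 75);
        Matrix3x2 scale = Matrix3x2.CreateScale(3, 1);

        // Translate first, then scale: the offset of 50 is scaled as well, giving x = 150.
        Console.WriteLine(Apply(Vector2.Zero, translate, scale));
        // Scale first, then translate: the corner ends up at x = 50.
        Console.WriteLine(Apply(Vector2.Zero, scale, translate));
    }
}
```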

If however, we reverse the transformations, we get what we expect:

       <ScaleTransform ScaleX="3"/>
       <TranslateTransform X="50" Y="75"/>

What we see here is that WPF applies both transformations in sequence to the object, without changing the coordinate system in which the object resides. In fact, if we want to rotate the object in its current location and without distorting it, we need to do this after scaling and before translating, which is exactly the opposite of the correct order in PDF.

       <ScaleTransform ScaleX="3"/>
       <RotateTransform Angle="-30"/>
       <TranslateTransform X="50" Y="75"/>

If we rotate before scaling, the object gets distorted (skewed) because scaling then takes place on the rotated object along the untransformed, horizontal x-axis:

Whereas, if we rotate at the end, after translating, the object gets placed at a different location because the center of the rotation is still at the original origin of the coordinate system: the top left of the black rectangle.

This means that the information in MSDN about these transformations is actually incorrect.

There is however a reason to do things differently in WPF.

In PDF we do not really have graphical objects with Transform properties. We only have graphical operations that operate in a certain environment, of which the transformed coordinate system is just one of the aspects. The main reason to do this is compactness. By having an environment with all sorts of graphical context, the operations themselves can be kept simple and short.

In WPF however, we have an object-oriented system where we want all objects to be as independent as possible, and where – consequently – each object has its own transformation, next to properties like color and transparency, instead of these being part of some environment. And in that case it makes sense to not have transformations change the coordinate system but just the object itself.

This also has the advantage that transformations can be combined in a more natural order, and programmers can reason within a fixed coordinate system.

Shape Transformations in PDFKit.NET

In PDFKit.NET one can draw with shapes, and just as in WPF, each shape is a complete graphical element that has its own transformation property. This means that the correct order of doing transformations in PDFKit.NET is similar to WPF:
  1. Scale or Skew
  2. Rotate
  3. Translate

Below we show the effect of applying these transformations in that particular order:

Pen pen = new Pen(RgbColor.Black, 1);
FreeHandShape charAsShape = new FreeHandShape { Pen = pen, Brush = null };
charAsShape.Paths.AddRange(Font.TimesRoman.CreatePaths('n', 100));
charAsShape.Transform = new TransformCollection
{
   new ScaleTransform(3,1),
   new RotateTransform(-30),
   new TranslateTransform(50, 75)
};


Step 1: scaling by a factor 3 in the x direction:

Step 2: Rotation by 30 degrees counterclockwise:

Step 3: Translation by 50 units in the x direction and 75 units in the y direction: