Monday, January 28, 2013

Cross-platform limitations of Mono.

Some customers indicated that they were having issues running PDFRasterizer.NET on Mono. Although we do not officially support Mono, we decided to run some experiments with it nonetheless.

Running a sample in MonoDevelop on Windows

So, we installed MonoDevelop, and we tried to run some of the PDFRasterizer.NET samples. This immediately led to the following error:

Error CS1566: Error reading resource file '…\TallComponents.PDF.Rasterizer_3.0.84.2\Code Samples\CS\ViewPDF\MainForm.resources' -- 'The system cannot find the file specified. ' (CS1566) (ViewPDF_vs2010)

In order to solve this issue we had to manually call resgen on all the .resx files. This allowed us to build the project, but when trying to debug it, we got the following:

image

Luckily, this appeared to be an issue with the MonoDevelop debugger rather than anything else. Without debugging, things appeared to be fine, so we did not pursue debugging on Windows any further.

image

Running a sample in MonoDevelop on OSX

Then, we figured that we should also try things on Mac OSX. At first this gave some weird compile errors, like:

…/TallComponents.PDF.Rasterizer_3.0.84.2/Code Samples/CS/ViewPDF/MainForm.cs(35,35): Error CS0584: Internal compiler error: Method not found: 'TallComponents.PDF.Rasterizer.Document.ConvertToWpf'. (CS0584) (ViewPDF_vs2010)

This was strange because we used a sample that made no reference to any WPF functionality at all. It occurred to us, however, that this sample referenced a .NET 4.0 assembly, and we remembered some old blog posts about MonoDevelop having issues with that. So we removed the .NET 4.0 PDFRasterizer.NET assembly and added the .NET 2.0 one instead. This looked quite hopeful at first, and interestingly we had no trouble running the debugger here:

image

But then we noticed that the evaluation banner was missing. The banner should have been there because we were running in evaluation mode. We started looking further. On OSX, PDFRasterizer.NET turned out to be plagued by two problems:

  • Some pages would render completely blank, or only partially.
  • Mono would just quit on some pages, leaving no further option to debug:

image

Compiling in MonoDevelop

In order to track this down, we decided to try and get our PDFRasterizer.NET sources compiled under MonoDevelop. We started doing so on Windows, and soon discovered that the MonoDevelop compiler (3.0.5) has some limitations compared to the standard Microsoft .NET compiler. Among others:

  • it gives errors on vars
  • it gives errors on extension methods
  • one needs to specify brackets for some #if evaluations

After “fixing” these, we moved on to OSX with our sources. Here some further issues were flagged. These were relatively minor, but it was certainly a surprise to find that such differences exist between the PC and the Mac version of MonoDevelop, all the more so because the Windows issues mentioned above were not flagged by the compiler on OSX at all. We used the latest stable version on each platform, which at this point means 3.0.5 for Windows and 3.1.1 for OSX.

Debugging

In any case, after having fixed these compilation issues, it became more or less possible to debug PDFRasterizer.NET on OSX. Apart from the occasional crash, we were able to identify the following two runtime issues:

  • There were runtime differences in casts from (negative) doubles to uints. Once we were aware of this, it was easy to fix.
  • The System.Drawing implementation on OSX turned out to behave differently with respect to clipping. On OSX the clipping area becomes empty at unpredictable times, leading to loss of graphics. If we turned off clipping altogether, we were able to render additional graphics on some problematic pages, but part of the page would still be rendered incorrectly due to improper clipping handling.
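
The cast difference in the first bullet is the kind of problem that disappears once the conversion is made explicit. In C#, an unchecked cast from a double to a uint gives an unspecified result when the value lies outside the uint range, so two runtimes may legitimately disagree. The following is only a sketch of one possible fix, assuming that clamping is the desired behavior; the helper name ToUInt32Clamped is made up for this example and is not part of PDFRasterizer.NET.

```csharp
using System;

public static class SafeCast
{
    // Hypothetical helper (not part of any product API): converts a double
    // to uint with a defined result for every input, instead of relying on
    // an unchecked (uint) cast of an out-of-range value.
    public static uint ToUInt32Clamped(double value)
    {
        // NaN and all values at or below zero map to 0.
        if (double.IsNaN(value) || value <= 0.0) return 0u;

        // Values at or above the maximum map to uint.MaxValue.
        if (value >= uint.MaxValue) return uint.MaxValue;

        // In range: the cast truncates toward zero.
        return (uint)value;
    }
}
```

With this helper, SafeCast.ToUInt32Clamped(-3.7) yields 0 and SafeCast.ToUInt32Clamped(3.7) yields 3, regardless of the runtime.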

The following pictures illustrate the latter. With clipping enabled, we would get the following on OSX:

image

Whereas with clipping entirely disabled, we would get:

image

Notice that the last result does show additional images (as it should), but also that the colors that are projected in the topmost images are wrong in both cases. This happens because clipping is used to draw the “intersections” of various colors. Clearly this goes wrong no matter whether clipping is turned on or not. For comparison, the same code, with clipping enabled, renders the following in MonoDevelop on Windows:

image

In itself this issue was already a showstopper for us. Unfortunately, compiling the sources in MonoDevelop itself did not prevent the crashes that we had encountered earlier either. There were still PDF files that we were unable to debug at all, because Mono just quits on them.

Conclusion

All in all, it was very disappointing to discover that Mono is not as platform-independent as it claims to be, while also being very unstable on OSX. This means that it is currently impossible for us to have a workable solution for Mono on various platforms.

Apart from improvements in Mono itself, our hopes now rest mainly on our own “TallBitmap” renderer, which will allow us to render PDF pages without relying on System.Drawing, and thus without depending on Mono’s implementation of this functionality on various platforms.

Tuesday, December 11, 2012

Making PDF rasterizer extensible - First design meeting

Starting with version 4.0 of PDFRasterizer.NET, we will make our PDF render engine extensible in the sense that you can plug in your own output device. This should enable developers to render (or convert) PDF to e.g. SVG, HTML5 or something else. This also serves an internal purpose: conversions that we already offer (e.g. render PDF to a GDI bitmap or XPS) will be implemented using this same plug-in API.

This article reports on our first design meeting. Much of what is included here will evolve over time. We will make small iterations and implement multiple output devices to continuously validate the plug-in API. We will try to release these iterations as often as possible. To join the beta program, send a request to sales@tallcomponents.com. Obviously, we welcome any feedback.

Whiteboard notes

OutputDevice

The starting point of the plug-in API will be the OutputDevice class that must be specialized (inherited). It represents the device/format to which you want to render. By overriding a set of methods you implement your own output device (class MyOutputDevice). These methods will likely fall into one of two categories: 1. resource factories and 2. drawing methods. We will make the model such that you may choose to not implement everything and rely on the base class to simplify things if desired. A typical example is to not draw glyphs but rely on the base class to convert them to paths and render those instead. Another example is to support RGB only and have the base class do the color conversions for you.

Resources

The PDF imaging model includes the following resource types:
  • Paths
  • Fonts
  • Images
  • Color spaces
  • Colors
  • Patterns
  • Shadings
  • Groups
The OutputDevice class has virtual methods that return instances of these resources. These resource factory methods will be called when the given resource is required during rendering. The API model includes a base class for each resource type. The implementation of the override returns a specialization of the resource (e.g. SvgFont). If the resource is not supported, the virtual is not overridden.

Drawing

The second category of methods consists of the drawing methods. These are virtual methods of OutputDevice and can be overridden in MyOutputDevice. The drawing methods include at least:
  • DrawText
  • DrawPath (stroke and fill)
  • DrawImage
When these methods are called, the corresponding resources (e.g. Path when DrawPath is called and Font when DrawText is called) will either be passed with the method or be accessible through the graphics state. This is yet to be decided.
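
To make the idea of resource factories and drawing methods a bit more concrete, here is a rough sketch of what a specialization could look like. Everything in it is an assumption made for illustration: the names PathResource, FontResource and CreateFont, the choice to pass resources as method arguments (which, as said, is not decided), and the convention that a factory returning null means "not supported". The actual API may turn out quite differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource base classes; the real plug-in API is still being designed.
public class PathResource
{
    public readonly string Description;
    public PathResource(string description) { Description = description; }
}

public class FontResource
{
    public readonly string Name;
    public FontResource(string name) { Name = name; }
}

public abstract class OutputDevice
{
    // Resource factory. In this sketch the default returns null,
    // which stands for "this device has no font support of its own".
    public virtual FontResource CreateFont(string name) { return null; }

    // The one drawing method that every device must provide in this sketch.
    public abstract void DrawPath(PathResource path);

    // Drawing method with a base-class fallback: each glyph is turned
    // into a (placeholder) path and drawn through DrawPath instead.
    public virtual void DrawText(string text, FontResource font)
    {
        foreach (char glyph in text)
            DrawPath(new PathResource("outline of '" + glyph + "'"));
    }
}

// A device that only implements path drawing and inherits the rest.
public class MyOutputDevice : OutputDevice
{
    public readonly List<string> Log = new List<string>();

    public override void DrawPath(PathResource path)
    {
        Log.Add("path: " + path.Description);
    }
}
```

Calling DrawText("Hi", null) on a MyOutputDevice ends up as two DrawPath calls, one per glyph, which is the "rely on the base class" behavior described earlier for glyphs.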

Graphics State

Like many other imaging models, PDF has the notion of a graphics state. Here is part of what is included in the graphics state:
  • current transformation matrix (CTM)
  • current path
  • current font and font size
  • current color space and color (both for stroking and filling)
Operations inside a PDF document modify the graphics state in order to achieve the desired result when a drawing operation is performed.

How the graphics state will be part of the plug-in API is not decided yet. There are several options:

  1. We offer it as a base property of OutputDevice. 
  2. We offer overridables that will be called when the graphics state is modified. These methods can either be members of OutputDevice or of a class called GraphicsState that can be specialized.
  3. We pass the relevant information to the drawing methods.
  4. A combination of the three above.

Finally, there are operations that explicitly save and restore the graphics state and there are operations that do this implicitly. This results in a stack of graphics states. Consequently, there may also be a property that reflects this stack while the current state is simply the top element of this stack.
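
The stack behavior can be sketched in a few lines. This is only an illustration under our own assumptions: the class and member names are hypothetical, and the state is reduced to two fields, whereas a real graphics state also holds the CTM, the current path, color spaces and more.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, heavily reduced graphics state.
public class GraphicsState
{
    public double FontSize = 12.0;
    public string FillColor = "black";

    // A shallow copy is enough here because all fields are values or immutable strings.
    public GraphicsState Clone() { return (GraphicsState)MemberwiseClone(); }
}

public class GraphicsStateStack
{
    private readonly Stack<GraphicsState> stack = new Stack<GraphicsState>();

    public GraphicsStateStack() { stack.Push(new GraphicsState()); }

    // The current state is simply the top element of the stack.
    public GraphicsState Current { get { return stack.Peek(); } }

    // Save: push a copy, so that later changes do not touch the saved state.
    public void Save() { stack.Push(Current.Clone()); }

    // Restore: drop the current state and return to the saved one.
    public void Restore()
    {
        if (stack.Count == 1)
            throw new InvalidOperationException("No saved graphics state to restore.");
        stack.Pop();
    }
}
```

After Save(), a change to Current.FillColor is visible until Restore() is called, at which point the previous color is back. In a PDF content stream, the explicit save and restore operations are the q and Q operators.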

Final words

As said, we will release iterations of the plug-in API in the coming weeks and months. To join the beta program, send a request to sales@tallcomponents.com.

Looking forward to your feedback!

Author: Frank Rem


Tuesday, December 4, 2012

TeamCity Security

We were busy setting up a TeamCity server and an agent in order to get some continuous integration going. The setup we wanted was:


  • Have the TeamCity server set up in the cloud. We are not all working in the same office, so we need to have general access to it.

  • Run the agents locally in one or more of our offices. Fast hardware is not that expensive, and machines in the cloud cost money too. And after setting up these local machines, there is basically no need to attend them further.

A colleague had already set up TeamCity on our server, and I was going to set up the first agent locally. I thought that it was going to be simple to do this securely, but unfortunately I hit a few bumps.


Service Account


The first bump was a minor one. During the installation, I was asked which account the agent was going to run under. I could either specify “system account” (the default), or some specific account. I did not want to use the system account, because that would give the agent administrative rights. Specifying another account, however, gave an authorization error.

The way to solve this in Windows is rather simple, but you have to know how to do it. Instead of specifying a specific account, you will have to tell the installer to use the system account and let it proceed. Then, after it is finished, you can go to the services control panel and change the logon account for the TeamCity Build Agent service. If you pass the right credentials, you will see a popup that mentions that the account has been given the right to run as a service.


Agent Connection


When I fired up the new agent, it did not get a connection to the server. I looked in the logs. It said that the connection was lost. I remembered that the installer asked for an agent port. I looked up the documentation:

http://confluence.jetbrains.net/display/TCD7/Setting+up+and+Running+Additional+Build+Agents

It says: “Server should be able to open HTTP connections to the agent”.

I could not find an option to turn this off. I can understand that it is sometimes useful to give the server a direct link to the agent. But in this case, I would have to open up a port in our office to a machine in our own local network. I expected that this was just an option, and that I could turn the connection to the agent off. After all, other web-based apps can also do their thing without opening connections to their clients. The agent would just have to poll for work. But alas, no option.

And then I read:  "Please note that by default, these connections are not secured and thus are exposing possibly sensitive data to any third party that can listen to the traffic between the server and the agents. Moreover, since the agent and server can send "commands" to each other an attacker that can send HTTP requests and capture responses can in theory trick agent into executing arbitrary command and perform other actions with a security impact."

The connection to the server can be configured to use https, but for the connection to the agent this option does not exist. All the settings that are configured in the TeamCity web GUI are passed over this link. This includes configured authentication information for the version control system.

In short: in order to do this securely, I would have to set up a VPN between the server and the agent.

This, however, would require changes at our remote server, and if not done properly it might disrupt it. I might also have to configure our local router, which could cause trouble on our local network. That was a bit too much for an initial setup.


Firewall


To be able to deal with this, I set up the agent firewall so that it only accepts connections for a single port from our TeamCity server. I also limited outgoing connections to one or two ports on our server. Our internal network is now invisible to the agent machine, and if it gets compromised, there is nowhere to go (assuming an attacker cannot intercept responses to our server).

Also, we avoided the standard TeamCity mechanism for checking out sources. Its server-side checkout might send sources to the agent out in the open, while for its agent-side checkout it might tell the world about some credentials used for our version control system.

Instead, we let the agent check out the sources over https via TortoiseSVN, using locally stored credentials. Sure, if the machine gets compromised over the agent link it may be possible to find them, but given the firewall rules it will be hard to get them out.

And finally, we made sure that the TeamCity agent does not have write access to our version control system. A matter of damage control.


Compromise


I know that this setup is not ideal, and it may well be that we go for a VPN solution after all. But a VPN raises questions too. Our TeamCity server is basically a “public” machine. If it gets compromised we will not be happy, but it will just be the server. If it were part of a VPN on the other hand, the situation might be worse: via the compromised server it may be possible to easily reach other machines within the VPN and compromise these too.

In any case, what amazes me the most is that a well-known system like TeamCity requires a connection to the agent. Without it, we could just have used https to the server and be done with it. No VPN setup, no special router configuration, no firewall restrictions (or at least less severe ones), and no restrictions on using TeamCity’s check-out mechanism.

It surely would have made things a lot simpler.

Author: Marco Kesseler

Thursday, November 15, 2012

WinRT edition of PDFRasterizer now 3-4 times faster


We have just released maintenance update 4.0.0.4 of PDFRasterizer.NET. The most significant changes are:
  • There is now an unsafe variant of the WinRT edition to speed things up. We measured 3-4 times faster rendering! We are still investigating implications for the Windows 8 App Store Certification process. Any feedback is welcome!
  • When drawing using the WinRT edition, you can now cancel draw jobs in progress. You typically do this when flipping away from a page before it has finished.

To join the beta program, just send a request!

Thursday, October 18, 2012

Rasterizing PDF on WinRT

Today we released a beta version of PDFRasterizer.NET 4.0 (4.0.0.3) that includes a WinRT edition. In addition, it includes a Silverlight 4 edition. It took us a little longer than we had hoped, and I want to explain why. If you are interested in the beta, send a request to sales@tallcomponents.com and we will provide download instructions.

Known restrictions of the current beta (4.0.0.3)

This version does not support:
  • Soft masks.
  • Non-embedded fonts. This is related to not having access to system fonts from WinRT.
  • Shadings other than gradient, radial and function-based (shadings are slow).
  • ICC profiles. We revert to the alternate color space.
  • Special colors (such as CMYK) are not entirely correct, e.g. too bright or too pale.
  • Blend modes.

So why did it take us so long?

The main problem with rasterizing PDF documents on WinRT lies in the restrictions of the WinRT graphics API itself. Among others, it only supports rectangular clipping paths. This alone is a showstopper.

We struggled with similar problems when using GDI+ or WPF as the graphics API, but their shortcomings always came down to not being able to render edge cases (exotic shadings and blend modes). Even then we could revert to a bitmap-based workaround: if a graphics API is able to render to bitmaps, one can implement extensions in terms of operations on these bitmaps. This is not an ideal solution (especially w.r.t. printing), but it often works well for on-screen solutions.

This bitmap-based solution is unavailable when developing for WinRT. There is no equivalent for GDI+ code like this:

using System.Drawing;
using System.Drawing.Imaging;

// Create bitmap
Bitmap bitmap = new Bitmap(500, 500, PixelFormat.Format32bppRgb);

// Create graphics object from bitmap
Graphics graphics = Graphics.FromImage(bitmap);

// Create pen (black, 3 pixels wide)
Pen blackPen = new Pen(Color.Black, 3);

// Create points that define line
Point point1 = new Point(100, 100);
Point point2 = new Point(500, 100);

// Draw line to bitmap
graphics.DrawLine(blackPen, point1, point2);

When developing for WinRT, you can create a WriteableBitmap which can only be manipulated by setting pixel values. There is no way to e.g. stroke or fill a path.
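
To give an impression of what "setting pixel values" means in practice, here is a sketch of the horizontal line from the GDI+ example drawn by hand. It is our own illustration, not code from TallBitmap: a plain int array stands in for the pixel data of the bitmap, the PixelLine class is hypothetical, and it only handles the trivial case of a horizontal line. Arbitrary paths, anti-aliasing, clipping and blending all have to be built on top of this kind of loop.

```csharp
using System;

public static class PixelLine
{
    // Draws a horizontal line into a row-major 32-bit pixel buffer of
    // width * height entries by setting pixels one by one.
    // Pixels that fall outside the buffer are skipped.
    public static void DrawHorizontalLine(int[] pixels, int width, int height,
                                          int x1, int x2, int y,
                                          int thickness, int color)
    {
        // Center the line thickness around y, as a pen would.
        int top = y - thickness / 2;
        int left = Math.Max(0, x1);
        int right = Math.Min(width - 1, x2);

        for (int row = top; row < top + thickness; row++)
        {
            if (row < 0 || row >= height) continue;
            for (int x = left; x <= right; x++)
                pixels[row * width + x] = color;
        }
    }
}
```

For a 500 x 500 buffer, DrawHorizontalLine(pixels, 500, 500, 100, 500, 100, 3, black) fills rows 99 to 101 from column 100 up to the right edge of the buffer.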

Because of the shortcomings of GDI+ and WPF, we had considered developing our own graphics engine before. WinRT gave us that last push.

On May 15, I sent off an internal e-mail to the development team announcing a new internal project. Here is the first part of this e-mail:
Until now we relied heavily on the graphics API as provided by Microsoft such as GDI+ and WPF for rendering PDF documents. Because of this we were not able to render all PDF features such as complex shadings and complex blend modes. With the introduction of new platforms such as WinRT and Silverlight, even less exotic graphical features have become unavailable. Because of this we have decided to develop an internal module (100% managed code) that frees us from the graphical capabilities that Microsoft decides to offer. Let's call this module TallBitmap.
The graphical feature set of TallBitmap should cover all graphical features of PDF such as:
- path construction and painting (PDF spec 4.4)
- fill rules (non-zero winding number, even-odd)
- clipping
- patterns (PDF spec 4.6). Including shading patterns.
- transparency and blend modes (PDF spec Chapter 7)
- dash patterns
- join style / cap style
- images (PDF spec 4.8). Note that the library does not have to include decoding. The library assumes that the images have been decoded.
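
As an aside, the two fill rules mentioned in this list can be illustrated with a small sketch. This is not TallBitmap code; the FillRules class and its method names are invented for this example, and it only handles polygons made of straight line segments. It computes the winding number of a closed polygon around a point and then applies either rule to that number.

```csharp
using System;

public static class FillRules
{
    // Winding number of the closed polygon (xs[i], ys[i]) around point (px, py).
    // The last vertex is implicitly connected back to the first.
    public static int WindingNumber(double[] xs, double[] ys, double px, double py)
    {
        int winding = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            int j = (i + 1) % xs.Length;

            // Positive when the point lies to the left of the edge i -> j.
            double cross = (xs[j] - xs[i]) * (py - ys[i])
                         - (px - xs[i]) * (ys[j] - ys[i]);

            if (ys[i] <= py && ys[j] > py && cross > 0) winding++;       // upward crossing
            else if (ys[i] > py && ys[j] <= py && cross < 0) winding--;  // downward crossing
        }
        return winding;
    }

    // Non-zero winding number rule: inside when the winding number is not zero.
    public static bool InsideNonZero(double[] xs, double[] ys, double px, double py)
    {
        return WindingNumber(xs, ys, px, py) != 0;
    }

    // Even-odd rule: inside when the number of crossings is odd, which has
    // the same parity as the winding number.
    public static bool InsideEvenOdd(double[] xs, double[] ys, double px, double py)
    {
        return WindingNumber(xs, ys, px, py) % 2 != 0;
    }
}
```

The rules differ for a path that goes around the same square twice in the same direction: the center has winding number 2, so it is filled under the non-zero rule but not under the even-odd rule.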
 
For five months, two of our engineers spent 50% of their time implementing TallBitmap. They worked in sprints of five days and alternated this with support.

Today, this work paid off when we released a WinRT edition of our PDF render engine that is based on our internal module TallBitmap.

Frank Rem, CEO

Wednesday, October 17, 2012

WinRT edition of PDFRasterizer.NET available

We have just released a PDFRasterizer.NET 4.0 Beta that includes a WinRT edition. To join the beta group, send a request to sales@tallcomponents.com.

Monday, October 1, 2012

PDFWebViewer.NET Discontinued

As of today, we have discontinued PDFWebViewer.NET 1.0. We will not release version 2.0.

Here is why:

  1. PDFWebViewer.NET's sales were negligible compared to flagship products such as PDFKit.NET and PDFRasterizer.NET, while the support and maintenance effort was disproportionately high.
  2. We strongly believe in separation of concerns when it comes to designing software components. PDFWebViewer.NET does not meet this design criterion.
The first reason is obvious. Let's elaborate on the second one. 

PDFWebViewer.NET is an ASP.NET control that provides a browser-side view on a server side PDF document. It implements features such as zooming, panning, interactive links, etc. 

On the browser side, we need to support different types of browsers and different versions of individual browsers. Nowadays this includes browsers on mobile devices in addition to browsers on desktops. On the server side, we also need to support different types of web application frameworks such as plain vanilla ASP.NET, MVC and whatever will be released next.

In contrast to this dependency on browser flavors and web application frameworks, components such as PDFKit.NET and PDFRasterizer.NET have a fully UI-less API and thus are much less coupled to technologies that come and go. 

In order to not spread ourselves too thin, we need to focus on core functionality. That is why we decided to discontinue PDFWebViewer.NET 1.0 and not release its successor 2.0.


Existing Customers


Internally, PDFWebViewer.NET 1.0 and PDFWebViewer.NET 2.0 are built on top of PDFKit.NET 4.0 and PDFRasterizer.NET 3.0. We have released the source code of both major versions on CodePlex under the Microsoft Public License. You will see that the downloads include the PDFKit.NET 4.0 and PDFRasterizer.NET 3.0 assemblies. Contact support to convert your PDFWebViewer.NET licenses to PDFKit.NET 4.0 and PDFRasterizer.NET 3.0 licenses at no cost.

Download source code: