Thursday, January 28, 2016

Making 3D Parts in VCarve, a 2D Editor

In my quest to get certified to use the MultiCam CNC router table at the local Makerspace, I need to create some kind of part that requires use of at least a couple different types of end mills, plus do various cuts such as pockets, profiles, V-carves, engraving, and so on.

First, a bit about the Makerspace movement, in case you haven’t heard: Makerspaces (or Hackerspaces) are community-oriented places where people share knowledge, resources, and tools to further DIY activities such as programming, electronics, 3D printing, machining, or anything else where people make something.  Makerspaces come in various flavors: some are set up as startup incubators, others as for-profit centers where paid employees build things for clients, and still others where members mostly come to work on something completely different from their day jobs.

The tools you would find at a makerspace were traditionally owned by companies or by individuals who had spent a long time honing their craft; as such, they sat in someone’s garage or behind a company’s doors, available to very few people.  By crowdsourcing money from members through dues and/or fundraising drives, makerspaces can afford tools to share with the community (that is, anyone willing to become a member) and offer training on getting the most out of these tools, not to mention proper use and safety.  People who live in small apartments, are otherwise constrained for space, or don’t have thousands of dollars for tools now have access to all sorts of equipment that would be impractical for them to own outright.  This attracts talent and people with ideas, who often form groups that can do much more than any one individual could alone, though there are still lots of individual projects happening at makerspaces as well.

Tabling VCarve for the moment


Our MultiCam 3000 CNC router table, like most other CNC milling machines, 3D printers, and similar devices, requires instructions sent to it in the form of G-code.  This code specifies things like feed rate, spindle speed, and tool position so the machine will mill (or extrude plastic, etc.) in the location you want at the time it needs to be there, hopefully without breaking the tool from too much stress or damaging the machine.  The toolchain we use at the Makerspace to produce the required G-code for the router table involves a program called VCarve.  It is a nice program that allows you to design your part and produce G-code to run on the machine to make it.

VCarve is great for designing fairly simple projects.  This can take you a long way, because what is “simple” in the world of milling can often yield astonishingly detailed and fantastic results, usually by use of the actual operation called “V Carve” (which of course the program VCarve can help you do).  Even a pinball playfield could count as a simple part using this metric.  However, the part I want to make for my test is essentially a replica of the classic Nintendo game controller for the NES, which involves several contoured buttons.  Look closely at the controller, and you will see that the A and B buttons are slightly concave so as to cradle your fingertip nicely.  The D-pad has directional arrows imprinted into the plastic and a hemisphere carved out of the center, not to mention each direction bends slightly upward out of the center to give you nice tactile feedback for exactly where to press to move in that direction.  After trying hard to make these types of 3D cuts (which VCarve doesn’t support inherently) using ramps and other effects, I temporarily gave up on VCarve.

Ok, so what else is there to make the 3D shape?


Versions of VCarve prior to 8.0 don’t support 3D objects at all.  Luckily, my Makerspace has VCarve Pro V8 available for us to use.  With its license, I am able to upload one STL file or bitmap image for it to translate into G-code.  I created the contoured buttons using Blender in three simple steps:
  • Use negative Boolean operations to subtract a large sphere from a small cylinder to create the slight contour of the A & B buttons (then import this button into VCarve and add two instances of it)
  • Use transformations to slightly elevate desired edges of basic cubes to make the contoured D-pad shape
  • Use transformations on cubes to create arrows, and then negative Boolean operations to subtract the arrows from the D-pad shape
While on the topic of Blender, here are two other quick hints about it:
  • When doing a negative Boolean operation, the negative shape does not immediately disappear (unlike in other 3D rendering environments I’ve worked with).  You have to move the negative shape out of the way in order to see the imprint it made in the positive shape.  Otherwise, you’ll think the negative Boolean operation is not working, attempt to upgrade Blender to the latest version, and find that doesn’t help either.
  • When exporting to STL format, you need to have all the desired elements of your object selected while in Object Mode.  Otherwise, Blender won’t export anything in your STL file and VCarve will complain that your STL file is invalid because it contains nothing.

Bringing It All Back In


[EDIT 2/2/16] IMPORTANT NOTE: When I imported the STL files into VCarve and ran the MultiCam router, the router attempted to drill deeply into the material all at once and move the tool around so forcefully that, despite the table vacuum being run, the part still moved around the table quite a bit.  This made it very difficult to tell if what it tried to do actually worked.  However, since the hole was so deep, I believe it went way farther into the part than I told it to.  I will slow down the feed rate of the tool in 3D mode and see what happens, but for now, use caution when milling 3D parts.

Once you have made the STL files with Blender, import them into VCarve by going to “Model” -> “Import Component / 3D Model”.  Remember that with certain versions, you might only be able to import one STL file or bitmap per part.  I would get around this by making n parts of exactly the same dimensions with n different STL files, placing each imported 3D object in its desired location on its own part, generating n different G-code outputs, and then either merging the G-code files by hand or simply having the machine run all n files sequentially, possibly without even changing the tool.  Anyway, once you import the STL file, VCarve will give you several options regarding how to treat it.  The most interesting ones to me are:

  • Model Size: Allows you to scale your model if you used the wrong units in Blender or want to make a last-minute adjustment.
  • Zero Plane Position in Model: This allows you to describe how far into the material the 3D model should be cut.  If your model needs to be a specific height that is pre-defined by the Z height it was imported with, then adjust this parameter so that the bottom of the model touches the bottom of the material.  To line this up exactly, you could use the formula [model height - (material height / 2)] to calculate the “Depth Below Top” value, provided your model is at least half the height of the material.
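To sanity-check that “Depth Below Top” arithmetic away from the machine, the formula from the bullet above is trivial to script.  This is pure arithmetic, and the function name is mine, not a VCarve term:

```python
def depth_below_top(model_height, material_height):
    """Depth Below Top so the model's bottom touches the material's
    bottom, per the formula above: model height - (material height / 2).
    Only meaningful when the model is at least half the material's height."""
    if model_height < material_height / 2:
        raise ValueError("model must be at least half the material height")
    return model_height - material_height / 2

# e.g. a 15 mm tall model sitting in 20 mm thick stock:
print(depth_below_top(15, 20))  # 5.0
```

Use the same units (mm or inches) for both arguments, matching whatever you set up your VCarve job in.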
With your 3D object in the right place and with its desired specifications, you can now treat it like any other vector object.  In my case, I actually made all the other vectors describing the controller’s profile, pockets for the buttons, etc. prior to importing the STL, so that I could finely tune how VCarve makes the G-code for the cuts.  Two things you might want to do with your imported 3D object:

[EDIT 2/2/16] IMPORTANT NOTE: Beware, as noted above, that the results I expected were not the same as what the mill produced.  What ended up happening was slightly dangerous and could have resulted in a broken tool.  If you wish to try it out, please slow down the feed rate of the tool during 3D Finishing, since it tends to plunge all the way in rather than slowly descending in passes like most other cuts do.
  • To actually mill the shape you imported, you need to select the 3D object in the drawing tab and then select the “3D Finishing Toolpath” operation in the Toolbox at right.  [EDIT 2/2/16] Until I can research other guidance to give you, make sure you set the feed rate of this operation really slow for reasons described above.
  • You might want to profile this shape (i.e. cut material around it so it exists by itself) as well.  If so, create a vector of its outline by going to the “Modeling” tab at left, selecting the desired instance of your 3D model, and clicking the “Create vector boundary from selected components” button under “3D Model Tools”.  Then, with that vector selected, select the “Profile” operation in the Toolbox at right.  Finally, you can describe all the desired parameters of your Profile operation as usual.  Remember to do the Profile last; otherwise, you would be cutting into a piece that is mostly disconnected from the surrounding material and could come loose.
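As an aside, the G-code merging workaround mentioned earlier (one STL per part, n output files) can be sketched in a few lines of Python.  The handling of M2/M30 is an assumption about what your post-processor emits at end-of-program, so verify against your actual files first:

```python
import os, tempfile

def merge_gcode(paths, out_path):
    """Concatenate single-model G-code files into one program, dropping
    the end-of-program command (M2/M30) from all but the last file.
    Assumes setup lines (units, etc.) are safe to repeat -- check your
    post-processor's output before trusting this on a real machine."""
    merged = []
    for i, path in enumerate(paths):
        with open(path) as f:
            lines = [ln.rstrip("\n") for ln in f]
        if i < len(paths) - 1:
            lines = [ln for ln in lines
                     if ln.strip().upper() not in ("M2", "M02", "M30")]
        merged.extend(lines)
    with open(out_path, "w") as f:
        f.write("\n".join(merged) + "\n")

# Tiny demonstration with throwaway files
d = tempfile.mkdtemp()
a, b, out = (os.path.join(d, n) for n in ("a.nc", "b.nc", "out.nc"))
with open(a, "w") as f: f.write("G21\nG1 X10\nM30\n")
with open(b, "w") as f: f.write("G21\nG1 X20\nM30\n")
merge_gcode([a, b], out)
print(open(out).read())
```

Running the files sequentially on the machine avoids this entirely, so treat merging as a convenience, not a requirement.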

What kind of 3D models did you import into VCarve and then have milled?  Take a moment to show them off here!

Thursday, January 7, 2016

Interrupts for the Arduino Uno - More than you might think!

Are you looking to make a program that requires a bunch of interrupts using an ATmega328 or ATmega168 chip (such as on the Arduino Uno, Nano, or Mini platforms)?  If so, you may have been disappointed by the basic documentation you can find on this matter, and tempted to buy a more advanced Arduino such as the Mega2560, Zero, or even Due.  But have you seen how much their chips cost on Mouser?  If you're looking to do a small run of boards with what you ultimately produce, you will be taken aback to find that ATmega2560-16AU chips cost over 5x more than ATmega168A-AU chips!  In fact, I just recently bought an Arduino Mega2560 for less than what you can buy its chip for.  Having seen this, I knew there had to be a better way to leverage a cheaper chip.

The Problem, In Short


I need an application that can read from multiple sensor inputs, each of which will pulse with a high-frequency square wave (i.e. toggle on & off very rapidly) upon activation of the particular magnetic device they sit next to.

Given the sensors' behavior, and that there's no guarantee each sensor will produce more than one pulse if the underlying magnetic device does not stay on for a very long time, the best way to look for the changing state of the sensors is to use interrupts that can quickly capture exactly which pin just changed state, and then during the CPU's downtime (when it's not handling interrupts), it can go and take the appropriate action given which sensor(s) just pulsed.

Another problem to battle on the road to solving my problem!


By reading the standard documentation, you might be led to believe the Arduino Uno or any other ATmega328-based platform only has two useful interrupts, existing on digital pins 2 & 3.  Specifically, it says that the attachInterrupt() function only works on those two pins.  This, however, is misleading.  In fact, any of the ATmega's I/O pins can be used as interrupts -- it only really makes a difference if you need to use an external interrupt versus simply a pin change interrupt.

An external interrupt has the capability to fire upon a state transition, such as the rising edge or falling edge of a signal.  It can also be triggered upon the change of value of an input pin, and by the signal going to logic level low.  Since external interrupts on pins 2 & 3 of the ATmega328 have different interrupt vectors (addresses where the routines related to these interrupts are stored), such interrupts on these pins are distinguishable from each other, and also distinguishable from pin change interrupts you might be listening for on those pins as well.  External interrupts also have a higher priority compared to pin change interrupts, so they will be processed first.

A pin change interrupt happens when the value of a pin changes state from low to high or vice versa.  The interrupt does not tell you exactly what the new value is (and the value is subject to change again by the time the interrupt can be processed), and as you will read below, pin change interrupts share just a few vectors, so you might need a way to reconcile exactly which pin caused the interrupt.

Since my application only really needs to know about pin changes, especially since there's a small probability these sensors might get stuck High instead of being pulled back to Low upon the end of the magnetic pulse, I can leverage any and all of the input pins for my purpose.

Nota Bene: For my particular sensor, it drives High given one magnetic polarity, Low given the other polarity, and can become Undefined in the absence of the magnetic field.  I thought the sensor would pulse on its own each time with no further action from me, but the pulses did not actually appear until I used a pullup resistor on the input pin.  This is easily achieved in Arduino-land by replacing INPUT with INPUT_PULLUP, like so:

pinMode(p, INPUT_PULLUP);

Once this was done, I was ready to move on. However, I faced yet another problem: the documentation would lead you to believe there are only three interrupt vectors you can use across all the pins. Here's what the Arduino Playground says about the topic:

  • ISR (PCINT0_vect) pin change interrupt for D8 to D13
  • ISR (PCINT1_vect) pin change interrupt for A0 to A5
  • ISR (PCINT2_vect) pin change interrupt for D0 to D7
Unfortunately, in my application, I need to read from at least four sensors.  What can I do?

The Shining Light


The Arduino documentation suggests using libraries, but links to a very scant piece of code with little documentation.  A quick Google search for this, though, yielded me a much more up-to-date and comprehensive solution: the EnableInterrupt library.  I suspected there would be an answer in here as to how to reconcile exactly which pin fired the interrupt, and sure enough, I wasn't disappointed.  It just looked a little bit different than I expected:

// These two statements must be written in this exact order!
#define EI_ARDUINO_INTERRUPTED_PIN
#include <EnableInterrupt.h>

// It's OK to initialize this to 0,
// since the library doesn't really support interrupts on pin 0
// because of the implications for serial TX/RX
volatile uint8_t pinChangeIndex = 0;

void interruptFunction() {
  // Record which pin fired; do the real work later in loop()
  pinChangeIndex = arduinoInterruptedPin;
}

void setup() {
  // put this in a loop, perhaps, to initialize more than just pin "p"
  const uint8_t p = 8;  // any input pin you like; 8 is just an example
  pinMode(p, INPUT_PULLUP);
  // Register the pin change interrupt with the EnableInterrupt library
  enableInterrupt(p, interruptFunction, CHANGE);
  // ...
}

Simple, right?

It turns out that the EnableInterrupt library works out exactly which pin toggled the interrupt (from the pin change flags and port state) and exposes that index for your consumption.  Now, in your loop() function, all you need to do is branch off of (i.e. write an if statement utilizing) the pinChangeIndex variable, and you don't have to process any of the application logic in the interrupt at all.  If you want to listen for multiple devices at once, you can replace the uint8_t with a bool[] array and then replace the interrupt function's contents with pinChanged[arduinoInterruptedPin] = true;.

Incidentally, the volatile keyword in this context tells the compiler not to assume it knows the variable's value between accesses, since an interrupt can change it at any time.  This way, the variable's value will always be re-read from memory right before any comparison or operation is done on it, such as branching to particular spots depending on its value; the compiler will never reuse a stale copy cached in a register from long ago.

May your project dreams be more attainable by unlocking cheaper microcontrollers to handle many I/O devices!

Thursday, December 10, 2015

Is It Broken? Try Leaving It On

Recently, I’ve acquired a bunch of vintage computer hardware from various sources, whether donated to me personally or stumbled upon during scavenger hunts through crazy places that used to be companies whose owners have pretty much turned into hoarders.  It’s been quite a tedious process getting some of these things working, but surprisingly, there have been very few instances lately where my skills with a soldering iron or my cache of loose parts has actually come in handy to fix something.  In fact, most things have come back to life surprisingly by simply plugging them in and giving them some time.

Cases 1 & 2: Some Commodore 64 Computers


Last week, a kind fellow who was moving granted me three Commodore 64 computers belonging to him and his brothers.  They grew up with these machines, and had a large collection of games and utilities on floppy disk.  There were also accessories such as floppy disk drives, joysticks, plenty of power supplies & A/V cables, printers, and some original documentation.  On the first night I had them, I tested all three.  One wouldn’t even load BASIC.  The other two loaded BASIC just fine, but the keyboards were all messed up and I couldn’t type anything.  I assumed they needed to be cleaned, but went on to do something else that night before going to bed.

In following up with these machines the next day, I realized I’d left one of them plugged in overnight.  I would imagine it’s wise not to leave something plugged in that you don’t necessarily trust to be in good shape, but since I was still alive and the house hadn’t burned down, I tried turning it on.  Lo and behold, the keyboard worked flawlessly!  However, the second machine still exhibited many problems with its keyboard.  One most amusing issue was upon pressing the special Commodore “C” key, the text on screen would quickly oscillate between uppercase and lowercase, thus it was apparently random what case the letters would be in once you let up on the key.  I had an inkling of how to treat this one, and so the experiment commenced.

Sure enough, after the second Commodore 64 (a 64c to be specific) had been plugged in for some time, its keyboard also started working perfectly.  Without much effort at all, I suddenly had two working Commodore machines.  (The third one must have a more substantial problem; it never came back to life.)

In the end, I kept one of them (a standard 64 with a switch to choose between regular Commodore BASIC & JiffyDOS), and gave the other two away to members of the local Vintage Computer Club (these machines were a regular 64 and the 64c) along with some of the accessories.  (It’s good karma to give stuff to the Computer Club for free, because perhaps they’ll consider you later if you ask them for something.)

Case 3: An Amiga 500


Last month or so, the Computer Club was offered an Amiga 500 and a number of accessories and disks by someone looking to part with it.  I was the first one to respond to the listing, and also the pickup location was fairly convenient to my office, so it fell into my hands quite easily.  Never having really seen an Amiga prior to that point, I spent a while musing over the hardware and physical characteristics, not to mention trying to find a proper 23-pin RGB cable, before attempting to do anything with it.  (I finally gave up on the cable and am simply using a composite cable for a B&W picture at the time of this writing.)

To do anything with a stock Amiga 500, the chip that stores the initial program (the Kickstart) insists that you insert the Amiga Workbench OS disk.  After finding such a disk in the donated stash, I inserted it and BOOM, the whole system just shuts off.  This became a repeatable & invariant event, so I tried several other floppy drives in the system to no avail.  I even started to pick out floppy drives with long Eject buttons so that I might be able to poke them with the case on (the Eject button can’t be reached by hand on a standard drive if you just randomly put one in the case since it sits pretty far back).  After attempting and failing to modify a PC’s floppy disk drive to work on the Amiga (you have to modify the outputs of Pin 2 & 34 on the bus), and finding that the long-button drives I picked actually had the buttons in the wrong spot (dremel time, then, perhaps?) I took a step back and put the original drive in the system.

Mind you, I had just also cleaned the system with a can of compressed air, plus replaced the electrolytic capacitors that weren’t already Nichicon brand, but I don’t think the recapping actually helped anything.  Honestly, it was probably the light cleaning I did — even still, absolutely nothing technical went into restoring the system back to working order.

Actually, that’s a bit of a lie: the previous owner had, a long time ago, replaced the stock processor with a Motorola 68000 and modified the clock speed to run at 14MHz.  Since some of the wires had become brittle over time and had fallen out of place, I got to hunt for documentation as to how this modification worked and how to rebuild it for myself.  It required poring over really old forums and message boards and downloading files in formats foreign to modern computers, but eventually I got the information I needed.  It is now happily running Amiga Workbench 1.3.2.

Yay, at last, it loads Workbench!  Now if only I could Retr0bright this whole picture... :-P

On the Flip Side…


The one project of mine that’s actually gotten worse over time is the IBM 5150.  The keyboard port must need cleaning, since with any of the keyboards I have that are compatible, it only ever outputs gibberish anymore.  I also attempted to use its floppy disk drives for the first time in probably 20 years, after having purchased a floppy disk controller (and then found two more for free shortly thereafter).  The drives are in poor shape, despite having cleaned the heads with a Q-tip and 99% isopropyl alcohol.  It is likely that the belts need replacement too, and possible that it needs to be recalibrated (which should be a lot of “fun”…)  One of the drives had its closing mechanism fall apart because it couldn’t hang onto the little plastic rice-like “bolts” that keep the mechanism together.  I’ll probably need to cast or 3D-print new ones, since they must be worn down to the point where they no longer fit properly.

I also found a 5151 MDA monitor (Monochrome Display Adapter) and an MDA adapter card.  Unfortunately, either that monitor or the controller doesn’t work, because I don’t get a picture.  I suspect it’s the monitor because the computer would (should?) emit unusual beep codes if the card were faulty.


[Edit 12/12] As a follow-up, after leaving the IBM 5150 & 5151 plugged in for a couple days (the 5151 has a special power plug that connects to the 5150 power supply), sure enough, now that monitor works too!  I fiddled with the brightness & contrast knobs after seeing a green trail of "ooze" upon turning it off while testing it: you know (or maybe you've forgotten) the remnant of the video signal left on the monitor in the brief flash of time when the electron gun finds its resting position and powers down.  That little green flash indicated the monitor was doing something, and after playing with the knobs, I now have a working green-screen monitor.


Yes it works now, but that doesn't fix the burn-in... :-(  But hey, can't really complain when it cost $0!

Thursday, November 19, 2015

Enough to be Dangerous: Open a different browser during a Protractor test

Those of you looking to test AngularJS apps may have particular use cases where multiple instances of the page need to be opened to simulate multiple instances of an application running.  Say you have a chat client, and you wish to simulate multiple users on different instances of the application.  Or, perhaps you want to run two separate windows so that one represents a user interacting with a service and the other represents an admin panel watching over the user.  No matter what your use case is, Protractor makes this easy.  Protractor is an end-to-end testing framework for Angular applications that integrates with the Selenium WebDriver for powerful browser automation and ties in tightly with Angular internals for very powerful testing possibilities.


A Simple Case: More of the Same


Current versions of Protractor as of this writing easily support the ability to add more browser instances of the type you defined in your configuration file's capabilities section.  Recall that your capabilities section might look like this (it's OK if you choose to use multiCapabilities instead):


// protractor-conf.js

exports.config = {
    ...
    capabilities: {
        'browserName': 'chrome'
    }
}

The interesting part exists in your spec (test code) file:

var browserCount = NUM_OF_BROWSERS_YOU_WANT_TO_SPAWN;
var browsers = [];
describe("a suite of tests", function() {
    it("should open multiple browser instances", function() {
        for (var i = 0; i < browserCount; i++) {
            // Make the new browser instance
            browsers[i] = browser.forkNewDriverInstance();
            var newBrowser = browsers[i];
            newBrowser.get('http://www.example.com');
            // Make sure it is separate from the original browser
            expect(newBrowser).not.toEqual(browser);
            expect(newBrowser.driver).not.toEqual(browser.driver);
            // Go on manipulating the controls on your new instance
            newBrowser.element(by.id('firstThingToClick')).click();
            ...
        }
    })
})

The important part here is the use of the forkNewDriverInstance() function on browser.  This spawns the new browser instance.  Now, in case you want to properly close all your spawned browser instances, run this either at the end of your it() test case, or in your afterAll() or afterEach() function:


var closedCount = 0;
var done = function() {
    // some kind of done routine, if you need it
};


for (b in browsers) {
    if (browsers[b]) {
        // if browsers[b] is actually a browser instance, close it
        console.log("Attempting to close browser #" + b);
        browsers[b].quit().then(function() {
            closedCount += 1;
            if (closedCount == browserCount)
                done();
        });
    } else {
        // In case you already wiped out your browsers after the 1st test case
        // and haven't yet written subsequent test cases to reinitialize them,
        // this'll save you from getting ugly errors.
        // Just increment closedCount so the test can end once we reach browserCount.
        closedCount += 1;
        if (closedCount == browserCount)
            done();
    }
}

Pretty darn simple and elegant, right?


More Browsers, More Fun!


Of course, running on just one browser is less interesting.  Your users and perhaps your admins are going to run your application on whatever their favorite browser is, and you'd better be prepared.  However, the Protractor documentation doesn't seem to detail this anywhere.  To handle all possible interactions between browsers, you can actually initialize another Runner object in the middle of your test.  This Runner object will initialize a new driverProvider for the browser type you specify in the capabilities of the configuration you give to Runner.  I did quite a bit of poring over the Protractor code in order to figure out how to do this, but the solution ends up being even shorter:


it("Should open both Chrome & Firefox", function() {
    // browser is defined by the framework, so go ahead and use it
    browser.get('http://www.foo.com');
    // Find where runner.js is relative to your test case
    var Runner = require('./path/to/node_modules/protractor/lib/runner');
    var myConfig = {
        allScriptsTimeout: 30000,
        getPageTimeout: 30000,
        capabilities: { browserName: 'firefox', count: 1 }
    };
    var ffRunner = new Runner(myConfig);
    var ffBrowser = ffRunner.createBrowser();
    ffBrowser.get("http://www.bar.com"); 
    // Do various operations on your page to run the test case you want
    // Pretend like this is the last operation in your test; the important part is the .then()
    ffBrowser.sleep(SLEEP_TIME).then(function() {
        ffBrowser.quit().then(function() {
            ffRunner.shutdown_();
        });
    });
});

A couple things to point out about this code:

  1. The contents of myConfig are the minimum contents I found in order to get the test case working and passing.  You could remove getPageTimeout, but I wouldn't recommend it in case the page under test goes down.
  2. Without putting ffRunner.shutdown_() inside all those Promises (.then()), it will run out of sync with the browser operations and actually shut down the browser before anything in your test happens.  If you put ffBrowser.quit() and ffRunner.shutdown_() in series, the framework will freak out that the browser was not uninitialized properly.  Might as well keep your output clean.
  3. The shutdown_() routine seems to me to be private, based on the underscore.  I feel bad calling it, but it's not exposed anywhere else, and it seems to work, so whatever.  Have a piece of cake or a bag of junk food if you feel so bad about using it. :-P

Now, of course, you could probably put new Runner(myConfig) in beforeAll() and ffRunner.shutdown_() in afterAll(), but for convenience's sake, I put everything in the it() test case.

Enjoy running multiple browser instances in your Protractor tests!

Thursday, October 22, 2015

Hacking TrueType Fonts For Character Information

Those of you who have ever been curious about making your own font should know that doing so on the computer isn't easy.  Sure, there are several good programs out there that can help you take your design and digitize it, but a well-made font has been crafted with as much care and attention to detail by a computer scientist as by a designer.  Some considerations that need to be made on the technical side include, for instance, how to "hint" rendering at very large or small sizes, accounting for grayscale devices in such hinting, making characters by compositing glyphs to save on file size (e.g. fi = f + i), and dealing with different platforms and character encodings so the font can be portable across Windows, Mac, and other systems.

Now, think back to one of my long-time projects that relates to displaying text and images.  Yes, BriteBlox can certainly be capable of displaying messages set with TrueType fonts, and this has been supported in the development version for quite some time.  However, to make it scale well for any message at all, it is important to know what the width of each character is.  As such, the efforts described here were undertaken for the sake of improving BriteBlox.


Why?



The simplest way to render TTFs in Python is to use PIL (the Python Imaging Library).  With this, you can establish an Image object and then instruct PIL to render text with the desired typeface onto the image.  However, you need to know in advance what the width of each character is so you can make a correctly sized Image object; otherwise, you render text onto it only to discover that either the image is too small and the text is chopped off, or it is so large you run out of memory.  In the BriteBlox PC Tools, this feature was disabled in releases for a long time because I had to manually guess and check the correct size of the bounding box for my text.  Soon, that will no longer be required!


The High-Level Solution



[Important note] There may be, in fact, a better solution for those of you using Qt, an application framework.  Unfortunately, my implementation of the Qt 5 libraries in PyQt5 seg-faults (or tries to access a null pointer) when I try to run the appropriate commands, so I will have to write about that in the future once I upgrade Qt and hopefully get it working.

Along with PIL (or Pillow) in Python, you can use the fonttools and ttfquery libraries (which depend on numpy) in order to fetch the width of a particular character glyph.  (The glyph is the artistic rendering of the character; the character is more of just a concept in the realm of typography.)  To get the required width (and height) for the container image, begin by using this code:



from ttfquery import describe, glyphquery
myfont = describe.openFont("C:\\Windows\\Fonts\\arial.ttf")
glyphquery.charHeight(myfont)       # height of the font, in font units
glyphquery.width(myfont, 'W')       # width of the glyph 'W', in font units


Now you have the width of a character from your TTF file.  If you actually run this, though, you may notice the values seem really odd -- in fact, very large.  This is because the values being retrieved (I'll tell you exactly where they come from later) are scaled in "font units" or "EM units", which relate to the "em square".  Remember your em-dashes and en-dashes from English class?  Well, it turns out they're incredibly important in typography too.  EM units are derived from the "EM square", a square the size of the em-dash.  Back when fonts were cast into metal stamps and then pressed into paper, the em-dash was typically the widest character you could have.  In digital media, though, characters are allowed to be wider than the em-dash, so you have to look at each character specifically to find out how wide it is.  Nothing can be taken for granted.



EM units are simply little divisions of the EM square such that the EM square is divided up into a grid.  There are several acceptable values for how many units exist along one side: the TrueType spec allows any integer from 16 to 16,384, with powers of two recommended for TrueType outlines.  The typical "resolution" of the EM square, as defined by the "unitsPerEm" field in the TTF specification, is 2,048 units per side.  Again, though, this value cannot be taken for granted; I will explain ways to fetch it later.  Once you have the correct unitsPerEm value, put it into the following equations:


pixel_size = point_size * resolution / 72
pixel_coordinate = grid_coordinate * pixel_size / EM_size

[Source: http://chanae.walon.org/pub/ttf/ttf_glyphs.htm]

Remember that fonts are generally measured in points rather than pixels, a tradition that dates back to at least the 1700s.  Nowadays, a point is defined as 1/72 inch, thus the point_size / 72 ratio in the first equation.  Next, you need to cancel out the "inch" in the unit by multiplying by something measured in 1/inch (remember dimensional analysis from chemistry or physics?).  The perfect unit for this is pixels per inch (PPI), which is defined differently on different computing platforms.  For instance, Microsoft typically defines a logical inch as 96 pixels in Windows, so as monitors are made with ever-higher resolution, the on-screen distance spanned by those 96 pixels gets noticeably smaller.  Now, if you take the right edge of your glyph as the grid coordinate of interest, you can finish off the equation.  Let's see how this works for the capital letter "W" at 12 point:

>>> glyphquery.width(myfont, 'W')
1933
>>> 1933.0 * 12 * (96.0/72.0) / 2048
15.1015625

And now at 24 point:

>>> 1933.0 * 24 * (96.0/72.0) / 2048
30.203125

IMPORTANT NOTE: To avoid truncation error, you must get Python to treat your numbers as floating-point values rather than integers (under Python 2, dividing two integers discards the remainder).  You can do this by simply adding ".0" to the end of an integer literal, and the result will automatically be "promoted" to the more detailed data type.  If I were to write the first calculation with plain integers, 1933 * 12 * (96/72) / 2048, I would get the answer 11, which is definitely wrong: my empirical observation of the character "W" indicates it needs at least 13 pixels of width at 12 point, even with anti-aliasing turned off.
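The two equations above can be wrapped into a small helper; note the floating-point literals to sidestep the integer-division pitfall, and the default PPI of 96 reflecting the Windows convention discussed earlier (the function name is mine, not from any library):

```python
def glyph_width_px(grid_units, point_size, em_size=2048.0, ppi=96.0):
    """Convert a glyph's advance width in font units to pixels."""
    # pixel_size = point_size * resolution / 72
    pixel_size = point_size * ppi / 72.0
    # pixel_coordinate = grid_coordinate * pixel_size / EM_size
    return grid_units * pixel_size / em_size

print(glyph_width_px(1933, 12))  # the "W" example above: 15.1015625
print(glyph_width_px(1933, 24))  # 30.203125
```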

Finding the EM Size Of Your Font


To get the correct value for unitsPerEm (a.k.a. EM_size in the equation), there are some nice tools you can go search for.  Readytext.co.uk offers some nice suggestions, including SIL ViewGlyph for Windows.  Simply open the font file, go to View -> Statistics, then look for "units per Em".

If you have a hex editor handy, open your font file in it.  Toward the beginning of the file, look for the four characters "head" in plain ASCII (0x68 0x65 0x61 0x64).  Skip the four bytes after this (the table's checksum), and the next four bytes are the table's starting offset (e.g. my copy of Arial gives the head table offset as 0x00 0x00 0x01 0x8C, thus 0x18C).  Navigate 18 bytes past that starting position, and the next two bytes (a big-endian unsigned short, 0 to 65535) are your unitsPerEm value.  Remember, this value is typically 2048, or 0x800.
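The same hunt can be automated in a few lines of Python.  This is a sketch of exactly the byte-level walk described above, leaning on the layout in the TrueType spec (a 12-byte offset table, then 16-byte table records of tag, checksum, offset, and length):

```python
import struct

def units_per_em(data):
    """Extract unitsPerEm from the raw bytes of a TTF file."""
    num_tables = struct.unpack_from(">H", data, 4)[0]
    for i in range(num_tables):
        rec = 12 + 16 * i          # table records start after the 12-byte header
        if data[rec:rec + 4] == b"head":
            # skip the 4-byte tag and 4-byte checksum to reach the offset
            offset = struct.unpack_from(">I", data, rec + 8)[0]
            # unitsPerEm sits 18 bytes into the head table, big-endian uint16
            return struct.unpack_from(">H", data, offset + 18)[0]
    raise ValueError("no head table found")

# Usage: units_per_em(open("C:\\Windows\\Fonts\\arial.ttf", "rb").read())
# should report 2048 for Arial, per the hex-editor walk above.
```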

Trust Me, This Is Correct


I spent long enough simply trying to find out in the spec where this magical "EM_size" parameter could be found.  After spending days poring over the Apple TrueType Reference Manual and Microsoft TrueType documentation (warning: .DOC file), it finally became apparent.  This was just an exercise in being comprehensive, though, as Arial obviously had a unitsPerEm value of 2048.

Because I originally didn't know that Microsoft uses a standard of 96 PPI rather than 72, my initial results from the formulas above always seemed wrong (too small).  I set out to find another way to get at this data, so I read the TTF spec as well as some supporting documentation (including this page and the source of the equations listed above), and went hunting for the bounding boxes (bbox) of each glyph, as defined by the xMin, yMin, xMax, and yMax values for each glyph in the GLYF table.  This proved unsatisfactory because the docs don't really tell you how to parse the GLYF table.
  • The raw data seems to just launch right into the 1st glyph without any nice header info as to what glyph(s) belongs to what character, or how many bytes define each glyph in advance.
  • The data I gleaned for the first glyph (which I don't even know what it is) seemed out of whack, with a total height of slightly over the EM size and a total width of almost 3 times the EM size!
I was leery of those results and decided to take another route.  The "OS/2" table (its header literally reads thus in the font file data) contains properties such as sTypoAscender, sTypoDescender, and sTypoLineGap.  Although the OS/2 table is consulted only on Microsoft platforms, the values it contains should be platform-agnostic.  However, comparing my Arial font file to the documentation I had, something seemed fishy.  Maybe its OS/2 table is an older version that doesn't contain as many fields, but because these three fields sit so far down the table, I didn't want to take any chances on having counted incorrectly or misread one of the data types.  I soon abandoned this idea too.

Yet another idea was to go to the CMAP table, which contains the mappings of characters to glyph indices.  (Otherwise I would have to sit and parse this table just to figure out what the very first glyph in GLYF is, and there's no need to work backwards like that.)  This table contains at least one sub-table (Arial, in fact, has three), so there is quite a lot of header data to get through before you reach the good stuff.  You still need to step through it carefully, though, or you will be misled into meaningless data.  For Microsoft devices, look for the sub-table with a Platform ID of 3 and a Platform Encoding ID of 1.  After finding the byte offset to this sub-table (which is relative to the start of CMAP, not to the start of the file), I had to solve some equations to find which character (as defined by ASCII or the compatible Unicode code points) maps to which glyph.

I'm not going to go into the math here since it's described in the documentation, but I found out that in Arial, most printable characters we normally care about (specifically, those with ASCII codes between 0x20 and 0xFE) all exist sequentially and contiguously with glyph IDs ranging from 3 to 0x61.  The letters I cared about testing, the extreme-width cases of "W" and "i", happen to have glyph indices of 0x3A and 0x4C respectively, according to the algorithm.

With this information, it's time to scour the HMTX table for horizontal metrics.  The first thing in this table is an array with one entry per glyph, each holding the glyph's advance width and left side bearing.  These values take two bytes apiece, thus from the beginning of the HMTX table, the offset to the glyph you care about is (glyph index * 4).  With the table at offset 0x268, the path to the letter W leads me (0x3A * 4 = 0xE8) bytes further, to a total offset of 0x350.  Here, I quickly learn the advance width of the letter W is:

1933

That's exactly what the Python program said with ttfquery & fonttools!
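If you would rather not walk CMAP and HMTX by hand, the fonttools library (which ttfquery uses under the hood) exposes both tables directly.  A sketch, assuming the same Windows path to Arial used earlier; the function name is mine:

```python
from fontTools.ttLib import TTFont

def advance_width(font_path, ch):
    """Look up a character's advance width plus the font's unitsPerEm."""
    font = TTFont(font_path)
    em_size = font["head"].unitsPerEm            # same field found in the hex editor
    cmap = font["cmap"].getBestCmap()            # CMAP: character code -> glyph name
    glyph_name = cmap[ord(ch)]
    advance, lsb = font["hmtx"][glyph_name]      # HMTX: (advance, left side bearing)
    return advance, em_size

# e.g. advance_width("C:\\Windows\\Fonts\\arial.ttf", "W") -> (1933, 2048)
```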

By this time, I had discovered (only just, by sheer coincidence, auspicious timing, serendipity, or whatever you want to call such good fortune) that Microsoft scales its PPI to 96 rather than the 72 I had originally expected.  After trying (and failing) to find a particular DPI attached to image objects generated by PIL, I simply stuck (96.0/72.0) into the equation and confirmed visually that the values seen here in the HMTX table are in fact the values you can use to calculate the width of a TrueType font on a Microsoft Windows system.

It remains to be seen how this'll perform on Macs.  I anticipate the PPI will need to be something different; perhaps it will in fact be 72 on that platform.  We'll see...



An Aside


In researching the equation of fi = f + i, I stumbled across the notion of ligatures.  "ﬁ" is in fact a ligature, designed so that the parts of the "f" and the "i" that run together look coherent.  This brought me back to a time when I was very young and concerned with Evenflo products -- I am not a parent at this time, thus I was indeed a child the last time I dealt with them.  They had a very odd and poorly-designed "fi" ligature in their trademarked logo that led me to believe it was some kind of weird-looking "A".  It confused me, since it seemed odd that anyone would name their product "EvenAo", as it's awkward to say, and I wondered what special significance that A had that it should be written so much more fancifully than the other letters.  Just to jog your memory, here it is:

The Evenflo logo from when I was little

In my Google search, it seems apparent that they have adopted a new logo anyway, ditching an awkward ligature for something with nicer aesthetics overall and a modern vibe.  However, then another logo struck my fancy, especially with what turned up next to it:

Oh, how titillating.

Obviously having seen all these baby products, not to mention the mother with child, led me to believe the Tous Designer House logo was being quite suggestive.  As it turns out, the Tous logo is in fact a teddy bear.  Google, stop offering such awkward juxtapositions!

Thursday, October 15, 2015

Observing OCR Technologies for PDF Parsing

I’ve gotten the opportunity to investigate some Java-based OCR technologies recently for the purpose of analyzing PDFs, and wanted to write about some aspects of them that aren’t very well-documented.  I hope to incorporate this into these tools' documentation at some point, but for now, here it is... in loooong prose.

TightOCR


I couldn’t get this one working at all.  I was hoping to run it from Python, but it claims that certain functions for parsing JPGs, TIFFs, and PNGs do not exist, even though Tesseract on the command line obviously handles these file types adroitly.  It also depends on CTesseract, which seems not to have been updated for the revised Tesseract APIs (function headers with more arguments) introduced in Tesseract 3.03, so you have to install Tesseract 3.02 for it to work with CTesseract.

Tess4J


This was a real hassle to install on my Mac.  I first started by trying to compile everything from scratch and use GCC, but faced a number of weird compilation problems.  Here was the (backwards) dependency chart:

  • libtool
    • Leptonica
      • Tesseract
      • Ghostscript
        • Tess4J


Once I installed Homebrew (brew) and used it to install libtool, I was able to successfully compile the other libraries.  Then Tess4J still required some Java dependencies which weren’t easily resolved.  What did the trick was switching to a Maven project and simply letting it install Tess4J by adding this to my pom.xml file:

<dependency>
  <groupId>net.sourceforge.tess4j</groupId>
  <artifactId>tess4j</artifactId>
  <version>2.0.1</version>
</dependency>

After letting Maven configure Tess4J, I was faced with configuring the location of Tess4J’s native dependencies (various .dylib files on the Mac).  Since Ghostscript & Tesseract ended up installing themselves in two different locations, and Eclipse doesn't properly split on ; or : in the path given to -Djava.library.path, I couldn't simply use a command-line flag.  Instead, I set an environment variable on the VM called LD_LIBRARY_PATH to /opt/local/lib:/usr/local/Cellar/ghostscript/9.16/lib -- the value I had hoped to put on the “command line” when running Java.

Once I reached this stage, it was time to utilize it to read from PDFs.  The results were very Tesseract-y (i.e. L’s tend to become |_), but luckily, it seemed to do a fairly good job overall.  However, it couldn’t read any data contained inside tables, which renders it relatively useless if you’re trying to parse data from, say, tax returns or product datasheets.  At first, I was thinking of finding a way to expose image-cropping tools from Leptonica to Java.  There is a nice solution for this in the Tess4J API, though, that’ll allow you to crop a PDF down to the specific area you care about:

import java.awt.Rectangle;
import java.io.File;
import net.sourceforge.tess4j.Tesseract;

File imageFile = new File("/path/to/my.pdf");
Tesseract instance = Tesseract.getInstance();
String result = instance.doOCR(imageFile, rectangle);  // rectangle is a java.awt.Rectangle

Of course, one thing that’s not mentioned anywhere in the documentation about this bounding rectangle (yet is very important) is the units you need to use to specify it.  Want to know the Tess4J bounding box rectangle units?  They're dots at the document's DPI.  As such, if you want a 2"x2" rectangle starting 1" down and 1" in from the top left, and your PDF is 300dpi, you would define your rectangle as follows:

instance.doOCR(imageFile, new Rectangle(300, 300, 600, 600));

Note that the rectangle is defined as (X distance from left, Y distance from top, width (to the right), height (downward)), all in "dpi-dots" (i.e. 300 "dpi-dots" per inch with a document of 300dpi).
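Since this trips people up, here is a tiny conversion helper, sketched in Python for brevity (the same arithmetic applies to the java.awt.Rectangle you hand to doOCR(); the function name is mine):

```python
def inches_to_dots(x_in, y_in, w_in, h_in, dpi=300):
    """Convert an (x, y, width, height) box in inches, measured from the
    top-left corner, into the "dpi-dots" Tess4J expects."""
    return (round(x_in * dpi), round(y_in * dpi),
            round(w_in * dpi), round(h_in * dpi))

print(inches_to_dots(1, 1, 2, 2))  # (300, 300, 600, 600), as in the example above
```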

Overall, once the installation headaches were solved, it works pretty nicely and does exactly as expected when reading from fields.  The results, however, are still Tesseract-y, comparatively slow, and fetch exactly whatever happens to fall within the rectangle -- meaning letters and symbols lying partly out of bounds may be cropped.

Another interesting note is how some facets of this library appear to be aging: the Tesseract object’s doOCR() function takes a java.io.File, which was superseded by the java.nio.file API (Path and Files) in Java 7.  This also seems to hold true for the slightly different Tesseract1 object.

iTextPDF


This is an extremely simple library to install if you have a Maven project.  All you need to do is add the following dependency:

<dependency>
   <groupId>com.itextpdf</groupId>
   <artifactId>itextpdf</artifactId>
   <version>5.0.6</version>
</dependency>

Then add these imports:

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.*;

It is fairly simple to read an entire document.  The Java code is a touch more complex to set up for reading from a particular user-defined rectangle, though:

PdfReader reader = new PdfReader("/path/to/my.pdf");
RenderFilter filter = new RegionTextRenderFilter(rectangle);  // rectangle: the region to read
TextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
String output = PdfTextExtractor.getTextFromPage(reader, pageNum, strategy);  // pages are 1-indexed

Nevertheless, it works flawlessly once you get it going.  Finding the correct specification for the bounding rectangle was a bit tricky, though, because, of course, the units iText prefers have nothing to do with the ones Tess4J uses.  And, as with Tess4J, the units for the rectangle are not specified in the documentation; it's as if we're expected to read the minds of the original developers.  Through experimentation (made difficult by the fact that it returns all text from any object overlapping the rectangle, rather than strictly the text within it), I found that iText wants not DPI-dots but points, of which there are always 72 per inch.  Also, the Y origin sits at the bottom of each page, which is actually the standard for PDF files (rather than at the top, which is how Tess4J counts).

Also, as mentioned earlier, iText pulls all text contained within any object whose bounds overlap the rectangle you specify, rather than simply the text within the rectangle.  I imagine this is because they’re actually reading the data from the PDF and pulling text directly from the objects rather than doing OCR.  As such, I haven’t seen any errors in the results from iText (e.g. no “L” -> “|_”), and it runs much faster than Tess4J.

To specify a bounding box for the same area as above (1” in from the top left corner, and 2” on each side), we must now also assume your page is 11” tall (US Letter size, Portrait orientation).  In that case, you would use:

...
RenderFilter filter = new RegionTextRenderFilter(new Rectangle(72, 576, 144, 144));
...

As these arguments go, 72 sets your X distance 1 inch from the left edge, 576 sets your Y distance 8 inches up from the bottom edge, 144 is the width extending to the right of X, and 144 is the height extending up from Y.
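The corresponding conversion for iText, again sketched in Python: points instead of dpi-dots, with the Y origin flipped to the bottom of the page (the 11-inch page height is an assumption, as above, and the function name is mine):

```python
def inches_to_itext_rect(x_in, y_from_top_in, w_in, h_in, page_height_in=11.0):
    """Convert a top-left-origin box in inches to iText's bottom-origin points."""
    pts = 72  # PDF points per inch, always
    x = x_in * pts
    # flip Y: distance from the bottom edge to the bottom of the box
    y = (page_height_in - y_from_top_in - h_in) * pts
    return (round(x), round(y), round(w_in * pts), round(h_in * pts))

print(inches_to_itext_rect(1, 1, 2, 2))  # (72, 576, 144, 144), matching the snippet
```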


Hopefully you find this useful in your quest to extract data from PDFs.  May your data-scraping activities go much smoother!