Cross-tab data visualisation

Parallel Sets

Parallel Sets (ParSets) is a visualization application for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. ParSets provides a simple, interactive way to explore and analyze such data.
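
For anyone who hasn't run into the term, a cross-tabulation is just a count of how many records fall into each combination of categories. Here's a minimal sketch in Python; the survey rows and the category names are invented for illustration and have nothing to do with ParSets itself.

```python
from collections import Counter

# Hypothetical survey rows: (housing, commute) pairs, made up for this example.
rows = [
    ("own", "car"), ("own", "car"), ("own", "bus"),
    ("rent", "bus"), ("rent", "car"), ("rent", "bus"),
]

def crosstab(records):
    """Count how many records fall into each (row category, column category) cell."""
    return Counter(records)

def print_table(counts):
    """Print the counts as a small text table, one row per row category."""
    row_keys = sorted({r for r, _ in counts})
    col_keys = sorted({c for _, c in counts})
    print("".ljust(8) + "".join(c.ljust(6) for c in col_keys))
    for r in row_keys:
        print(r.ljust(8) + "".join(str(counts[(r, c)]).ljust(6) for c in col_keys))

table = crosstab(rows)
print_table(table)
```

Each cell of that table is the sort of number a tool like ParSets would draw as a band between two categories.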


123 Main Street, Anytown, USA – now with pictures


This is the most interesting (to me, that is) free map application I’ve seen.  You’re looking at my neighborhood in Appleton in several ways, simultaneously.

I found it because the city of Appleton links to it from their "My Appleton" site, which sits on one of the canonical ESRI GIS systems (I only know that because I saw the upgrade costs in the city budget docs).


I’m updating Cygnipede and have been spending time looking at how this sort of data can be used.

BlackBerry – smartest brick on the block

I woke up my BB Storm yesterday and it said there was an OS update available, and would I pretty please install it?  It’s the 5.0 upgrade, which does have some nifty new features where Cygnipede is concerned, so I let the upgrade proceed.

After the “upgrade”, the BB powered up and…what?  It’s just sitting there with a scary icon and the message “Reload Software : 507”.  Oh no.  Not again.  Last year I bricked Chris Smith’s personal phone doing an OS upgrade from Verizon.

Except…you can’t.  The Desktop Manager, which is how you load software, doesn’t recognize that a device is plugged in.  Catch-22.

As it turns out, Googling for this message turned up a frighteningly large number of people with the same problem.  The general response from carriers seems to be to replace the phone, but several hours of searching on some phone hacking boards turned up useful advice: download the old operating system from the carrier and run the installer.  Which would be useful advice, except Verizon only posts the current one.

So I download it and run the installer and it hangs.  Reboot and unplug all my USB devices that might look like a phone.  Run the installer again, and 15 minutes later, the BB is alive again.  Pretty cool.  Verizon support’s online resources didn’t mention any of this, but that’s par for the course.

It looks as though the installer has some way to interrogate USB ports directly and that the device knows when it has no OS and is smart enough to be able to load the OS binaries via USB.

Software tools from AT&T Labs Research

Found these while looking for something to generate C++ dependency graphs.  Found some tools that use Graphviz (below).
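
To give an idea of what "generate a dependency graph" means in practice, here's a rough sketch in Python that scans source text for #include lines and writes Graphviz DOT. This is not one of the AT&T tools, just an illustration; the file names and contents are made up, and the regex deliberately ignores conditional compilation and include paths.

```python
import re

# Matches lines like: #include <vector>   or   #include "net.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"])([^>"]+)[>"]', re.MULTILINE)

def include_edges(sources):
    """Return (includer, included) pairs for a dict of file name -> file text."""
    edges = []
    for name, text in sources.items():
        for _delim, target in INCLUDE_RE.findall(text):
            edges.append((name, target))
    return edges

def to_dot(edges):
    """Render the edges as a Graphviz digraph in DOT syntax."""
    lines = ["digraph includes {"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical source files, inlined so the example runs on its own.
sources = {
    "main.cpp": '#include <vector>\n#include "net.h"\n',
    "net.h": '#include "socket.h"\n',
}
dot = to_dot(include_edges(sources))
print(dot)
```

Save the output as includes.dot and something like `dot -Tpng includes.dot -o includes.png` should turn it into a picture.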


You are welcome to download and use the software tools appearing on this page that have been developed by AT&T Labs researchers.  Please reference the individual project web pages for specific license agreements. If an available license agreement does not meet your needs, please contact for assistance with a customized license.

In addition to the software tools available through Open Source and Non-Commercial licenses as listed on this page, AT&T has additional software and technology solutions available for licensing.

Open Source Licenses

AST: Advanced Software Technologies Open Source Collection

Cdt: Container Data Types Library

ECharts: A state machine-based programming language

ECharts is a state machine-based programming language for event-driven systems derived from the standardized UML Statecharts language. ECharts distinguishes itself from other Statecharts dialects by focusing on implementation issues such as determinism and code re-use. Like Statecharts, ECharts supports hierarchical state machines, concurrent machines and a graphical syntax. Unlike Statecharts, ECharts supports a simple textual syntax, machine reuse, multiple transition priority levels to minimize non-determinism, machine arrays, and a new approach to inter- and intra- machine communication. ECharts is a hosted language which means that it is dependent on an underlying programming language such as Java. ECharts has a proven track-record in a large-scale commercial deployment.

GGobi: Data visualization for high-dimensional data

GGobi is an open source visualization program for exploring high-dimensional data. It provides highly dynamic and interactive graphics such as tours, as well as familiar graphics such as the scatterplot, barchart and parallel coordinates plots. Plots are interactive and linked with brushing and identification.

GSDjVu/DjVuDigital: Ghostscript driver to convert PS and PDF files to DjVu files

gsdjvu contains the source code for a Ghostscript driver that converts PostScript(tm) and Portable Document Format (PDF) electronic document files into DjVu files.

Graphviz: Tools for viewing and interacting with graph diagrams

Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. Automatic graph drawing has many important applications in software engineering, database and web design, networking, and in visual interfaces for many other domains. Graphviz is open source graph visualization software. It has several main graph layout programs.

PADS: Processing Arbitrary Data Streams

PADS is a system that simplifies processing ad hoc data sources. Its users can declaratively describe data sources and then use generated tools to understand, parse, translate, and format data.

Sfio: Portable library for performing I/O

UWIN: Unix on Windows 95 and NT Machines

Vcodex: Software package for data transformation

WSP: Web Scraping Proxy

Programmers often need to use information on Web pages as input to other programs. This is done by Web Scraping, writing a program to simulate a person viewing a Web site with a browser. It is often hard to write these programs because it is difficult to determine the Web requests necessary to do the simulation. The Web Scraping Proxy (WSP) solves this problem by monitoring the flow of information between the browser and the Web site and emitting Perl code fragments that can be used to write the Web Scraping program. A developer would use the WSP by browsing the site once with a browser that accesses the WSP as a proxy server. He then uses the emitted code as a template to build a Perl program that accesses the site.

Yoix: The Yoix Scripting Language and Interpreter

The Yoix scripting language is a general-purpose programming language that uses syntax and functions familiar to users of C and Java. It is not an object oriented language, but makes use of over 150 object types that provide access to most of the standard Java classes.

iPlots: Interactive graphics for data analysis in R

iPlots is a package for the R statistical environment which provides high interaction statistical graphics, written in Java. It offers a wide variety of plots, including histograms, barcharts, scatterplots, boxplots, fluctuation diagrams, parallel coordinates plots and spineplots. All plots support interactive features, such as querying, linked highlighting, color brushing, and interactive changing of parameters.

vmalloc: Region Memory Allocator

Non-Commercial Binary Licenses

BoosTexter: A general purpose machine-learning program

BoosTexter is a general purpose machine-learning program based on boosting for building a classifier from text and/or attribute-value data.

Hancock: A language for processing large-scale data

Hancock is a C-based domain-specific language designed to make it easy to read, write, and maintain programs that manipulate large amounts of relatively uniform data. Because Hancock is embedded in C, it inherits all the functionality of C. Valid C programs are also valid Hancock programs, and Hancock programs can use libraries written for C. But Hancock is more than C. In addition to C constructs, Hancock provides domain-specific forms to facilitate large-scale data processing.

Non-Commercial Source Licenses

Hancock: A language for processing large-scale data

You are in a maze of twisty little #include statements, all alike

While sussing out the IPv4->IPv6 changes in the current code base, I’ve run across the problem of not knowing enough about the entire product to be able to easily track the include dependencies.  I’ve been using a neat tool that tracks these down and graphs them directly in Visual Studio.  Probably one of the best $40 I’ve ever spent.

In the tool diagram, all the drawing entities are “live”.  Square boxes are user header files and diamonds are system includes.  You can hover the mouse to see the full path, open the file, etc.  If you hover over a line connecting two files, it will display the file position (line number) where one includes the other.

The most general use of the tool during building is to see which other files will rebuild if one is touched.  There’s also a “build impact” line graph, which they describe thusly:

“The build impact is essentially an approximation of the cost of including a file, relative to the total cost of compiling a base source file. The estimation is based on token counts from the preprocessor, as a compiler-centric equivalent to the popular ‘lines of code’ metric. Its primary use is in discovering build bottlenecks and determining why some files take a long time to compile.”
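
The "which other files will rebuild if one is touched" question is easy to sketch once you have the include graph: flip the edges and walk outward from the touched file. Here's a minimal Python version. I have no idea how the tool actually does it, and the include map below is hypothetical.

```python
from collections import defaultdict, deque

def rebuild_set(includes, touched):
    """Return every file that directly or indirectly includes `touched`."""
    # Reverse the graph: for each file, record who includes it.
    included_by = defaultdict(set)
    for src, targets in includes.items():
        for target in targets:
            included_by[target].add(src)

    # Breadth-first walk up the reversed edges from the touched file.
    seen = set()
    queue = deque([touched])
    while queue:
        current = queue.popleft()
        for parent in included_by[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Hypothetical include map: file -> files it includes directly.
includes = {
    "main.cpp": ["net.h", "util.h"],
    "client.cpp": ["net.h"],
    "net.h": ["socket.h"],
    "util.h": [],
}
affected = rebuild_set(includes, "socket.h")
print(sorted(affected))
```

Touching socket.h here drags in net.h and both .cpp files, while touching util.h only affects main.cpp.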


CygNet NET.H analyses

Here’s a high-level view of the include graph for CygNetSourceSupportNetNet.h


Here’s a slightly more useful zoomed-in view
