Creating Vector Polygons with rendering performance like GISCloud?

I have seen this technique used in the past. It was explained to me by Zain Memon (from Trulia), who gave some input when Michal Migurski was creating TileStache. Zain went over it while explaining his Trulia demo, which uses this technique, at one of our older SF GeoMeetup meetings a while back. In fact, if you are in SF next week, he will touch on this, so feel free to show up (this is my lame attempt at a plug :)

OK, now to the explanation.

First, you are looking in slightly the wrong place when you inspect the JSON files above.

Let me explain why, as briefly as I can.

The tiles are passed as regular rendered tiles; no big deal there, we know how to do that, so I don't need to explain it.

If you inspect it in Firebug, you will see that you also get a whole bunch of images that seem to be blank, like this one.

Why is it blank? It is not. The pixels contain data - just not traditional visible image data. They are using a very clever technique to pass data encoded in the pixels themselves.

What has been going on over the past decade is that people have been favoring readability and portability of data formats at the expense of storage efficiency.

Take this example of xml sample data:

<data>

  <feature>
    <point>
      <x> -32.1231 </x>
      <y> 10.31243 </y>
    </point>
    <type>
      sold
    </type>
  </feature>

  <feature>
    <point>
      <x> -33.1231 </x>
      <y> 11.31243 </y>
    </point>
    <type>
      available
    </type>
  </feature>

</data>

OK, how many bytes does it take to transfer this? Assuming UTF-8 (1 byte per character for this kind of content), we have around 176 characters (without counting tabs or spaces), which makes this 176 bytes (and that is being optimistic, for various reasons I will omit for the sake of simplicity). Mind you, this is for 2 points!

Still, some smart ass somewhere who doesn't understand what he is talking about will claim that "JSON gives you higher compression".

Fine, let's express the same XML nonsense as JSON:

{ "data": [
            "feature" : { "x" : -32.1231, "y" : 10.31243 , "type": "sold" },
            "feature" : { "x" : -33.1231, "y" :11.31243, "type": "avail" },
          ]
}

How many bytes here? Say ~115 characters. I even cheated a bit and made it smaller.

Say that my area covers 256x256 pixels, that I am at a zoom level where each feature renders as a single pixel, and that I have so many features that the tile is full. How much data do I need to show those 65,536 features?

54 characters (or UTF-8 bytes, and I am even ignoring some other things) per feature entry, multiplied by 65,536, gives 3,538,944 bytes, or about 3.4 MB.

I think you get the picture.

But this is how we transport data in a service oriented architecture. Readable bloated crap.

What if I wanted to transport everything in a binary scheme I invented myself? Say that instead I encoded that information in a single-band image (i.e. black and white), and I decided that 0 means sold, 1 means available, and 2 means I do not know. Heck, in 1 byte I have 256 values I can use, and I am only using two or three of them for this example.

What is the storage cost of that? 256 x 256 x 1 (one band only) = 65,536 bytes, or about 0.06 MB. And this doesn't even take into consideration the compression techniques I get for free from several decades of research in image compression.
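As a rough sketch of what such an encoding could look like on the server side (the tile size and status codes come from the example above; the feature list and the helper name are made up for illustration):

var TILE_SIZE = 256;
var STATUS = { SOLD: 0, AVAILABLE: 1, UNKNOWN: 2 };

// Pack one status byte per pixel into a 256x256 single-band buffer:
// 65,536 bytes for a completely full tile.
function encodeStatusTile(features) {
  var band = new Uint8Array(TILE_SIZE * TILE_SIZE);
  band.fill(255); // 255 = "no feature here" in this sketch
  features.forEach(function (f) {
    // f.px / f.py are the feature's pixel coordinates within this tile.
    band[f.py * TILE_SIZE + f.px] = f.status;
  });
  return band; // this buffer would then be written out as a grayscale PNG
}

// The same two features as in the XML/JSON samples above:
var tile = encodeStatusTile([
  { px: 12, py: 200, status: STATUS.SOLD },
  { px: 13, py: 201, status: STATUS.AVAILABLE }
]);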

At this point you should be asking yourself why people do not simply send data encoded in a binary format instead of serializing it to JSON. Well, it turns out that JavaScript has historically sucked big time at transporting binary data, and that is why people have not done this.

An awesome workaround appeared when the new features of HTML5 came out, particularly canvas. So what is this awesome workaround? It turns out you can send data over the wire encoded in what appears to be an image, then shove that image into an HTML5 canvas, which allows you to manipulate the pixels directly! Now you have a way to grab that data, decode it on the client side, and generate the JSON objects in the client.
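A minimal sketch of that decoding step, assuming the data tile is served as an ordinary image URL and that 255 marks an empty pixel (both assumptions, matching the encoding sketch above):

// Load a "data tile" image, draw it to an off-screen canvas, and read the raw pixels back.
function decodeDataTile(url, callback) {
  var img = new Image();
  img.crossOrigin = 'anonymous'; // the tile server must allow CORS or getImageData will refuse
  img.onload = function () {
    var canvas = document.createElement('canvas');
    canvas.width = img.width;
    canvas.height = img.height;
    var ctx = canvas.getContext('2d');
    ctx.drawImage(img, 0, 0);

    // getImageData exposes the pixels as a flat RGBA array, 4 bytes per pixel.
    // For a grayscale PNG the value shows up identically in R, G and B.
    var pixels = ctx.getImageData(0, 0, img.width, img.height).data;
    var features = [];
    for (var y = 0; y < img.height; y++) {
      for (var x = 0; x < img.width; x++) {
        var status = pixels[(y * img.width + x) * 4]; // red channel
        if (status !== 255) {
          features.push({ x: x, y: y, status: status });
        }
      }
    }
    callback(features);
  };
  img.src = url;
}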

Stop a moment and think about this.

You have a way of encoding a huge amount of meaningful geo-referenced data in a highly compressed format, orders of magnitude smaller than anything traditionally done in web applications, and you can manipulate it in JavaScript.

The HTML5 canvas doesn't even need to be used for drawing; it is only used as a binary decoding mechanism!

That is what all those images you see in Firebug are about: one image, with the data encoded, for every single tile that gets downloaded. They are super small, but they carry meaningful data.

So how do you encode these on the server side? Well, you do need to generalize the data on the server side and create a meaningful tile, with the data encoded, for every zoom level. Currently you have to roll your own - an out-of-the-box open source solution doesn't exist - but all the tools you need are available. PostGIS will do the generalization through GEOS, and TileCache can be used to cache the tiles and help you trigger their generation. On the client side, you will need to use an HTML5 canvas to decode the special "fake tiles", and then you can use OpenLayers to create real client-side JavaScript objects that represent the vectors, with mouse-over effects.
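To make that last client-side step concrete, here is a rough sketch of turning the decoded pixels into vector objects; the tile origin and resolution parameters are assumptions, and the output is plain GeoJSON, which OpenLayers can read and turn into interactive features:

// Convert decoded pixel hits into GeoJSON point features.
// tileOriginX/Y are the map coordinates of the tile's top-left corner and
// resolution is map units per pixel, both assumed to be known for the tile.
function pixelsToGeoJSON(decoded, tileOriginX, tileOriginY, resolution) {
  return {
    type: 'FeatureCollection',
    features: decoded.map(function (p) {
      return {
        type: 'Feature',
        geometry: {
          type: 'Point',
          coordinates: [
            tileOriginX + p.x * resolution,
            tileOriginY - p.y * resolution // screen y grows downward
          ]
        },
        properties: { status: p.status }
      };
    })
  };
}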

If you need to encode more data, remember that you can always generate RGBA images (which gives you 4 bytes per pixel, or 4,294,967,296 values you can represent per pixel). I can think of several ways to use that :)
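For instance, a rough sketch of packing and unpacking a 32-bit value (say, a feature id) into one RGBA pixel; the function names are just illustrative:

function packRGBA(value) {
  return [
    (value >>> 24) & 0xff, // R
    (value >>> 16) & 0xff, // G
    (value >>> 8) & 0xff,  // B
    value & 0xff           // A
  ];
}

function unpackRGBA(r, g, b, a) {
  // ">>> 0" keeps the result as an unsigned 32-bit number in JavaScript.
  return ((r << 24) | (g << 16) | (b << 8) | a) >>> 0;
}

One practical caveat: canvases premultiply the alpha channel, which can round the RGB values when alpha is below 255, so in practice you may want to keep alpha at 255 and use only the remaining 3 bytes for data.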

Update: Answering the QGIS question below.

QGIS, like most other desktop GISes, does not have a fixed set of zoom levels. It has the flexibility of zooming to any scale and just rendering. Can it show data from WMS or tile-based sources? Sure it can, but most of the time it is really dumb about it: zoom to a different extent, calculate the bounding box, calculate the required tiles, grab them, show them. Most of the time these clients ignore other things, like HTTP cache headers, that would let them avoid refetching. Sometimes they implement a simple cache mechanism (store the tile; if it is asked for again, check for the tile first and don't refetch it). But this is not enough.
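For reference, the tile arithmetic such a client runs after every pan or zoom looks something like this (the standard OSM/web-mercator "slippy map" convention, shown purely for illustration):

// Which tile column/row covers a given lon/lat at a given zoom level.
function lonLatToTile(lon, lat, zoom) {
  var n = Math.pow(2, zoom);
  var latRad = lat * Math.PI / 180;
  var x = Math.floor(n * (lon + 180) / 360);
  var y = Math.floor(n * (1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2);
  return { x: x, y: y, z: zoom };
}
// A client does this for the corners of the view's bounding box and then
// requests every tile in between.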

With this technique the tiles and the vectors need to be refetched at every zoom level. Why? Because the vectors have been generalized to accommodate each zoom level.

As for the whole trick of putting the tiles into an HTML5 canvas so you can access the buffers, none of that is necessary. QGIS allows you to write code in Python and C++, and both languages have excellent support for handling binary buffers, so this workaround is simply irrelevant on that platform.

**UPDATE 2**:

There was a question about how to create the generalized vector tiles in the first place (baby step 1, before being able to serialize the results into images). Perhaps I did not clarify enough. TileStache will allow you to create effective "vector tiles" of your data at every zoom level (it even has an option that lets you either clip or not clip the data when it crosses the tile boundary). This takes care of separating the vectors into tiles at the various zoom levels. I would choose the "not clip" option (though it will assign the feature to an arbitrary tile, the one where it covers more area). Then you can feed every vector through the GEOS generalization (simplify) operation with a large tolerance; in fact, you want it large enough that polylines and polygons collapse onto themselves, because if they do, you can remove them from that zoom level entirely since they are irrelevant at that scale. TileStache even allows you to write easy Pythonic data providers where you can put this logic. At that stage, you can choose to serve the tiles as JSON files (like they do with some of the African map samples) or as geometries serialized into PNGs, like they do in the other samples (or the Trulia one) I gave above.
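To illustrate that "generalize or collapse" rule, here is a rough sketch of the logic only, written in JavaScript for consistency with the client-side examples above (in TileStache it would live in a Python data provider); simplify() and pixelArea() are assumed stand-ins for a GEOS-style simplification and an area-in-pixels calculation:

function generalizeForZoom(features, resolution) {
  var tolerance = resolution; // roughly one pixel, expressed in map units
  return features
    .map(function (f) {
      return { geometry: simplify(f.geometry, tolerance), properties: f.properties };
    })
    .filter(function (f) {
      // Anything that collapses below ~1 pixel is dropped from this zoom level;
      // it can be represented by a single encoded pixel instead.
      return pixelArea(f.geometry, resolution) >= 1;
    });
}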


Direct from the developer, Dino Ravnic, in a recent mailing list post:

It's not a big secret how we did it, so I would be happy to share it with you. The key is in two things:

  1. removing from a tile all vectors which are too small to be visible, i.e. their area, when calculated in pixels, is less than 1 px; we drop such a vector and place a pixel in its stead, hence the "pixels" property in our JSON tile

  2. vectors which will actually be visible are generalized and then written into the tile with their coordinates in pixels

On the client side we render those static pixels and the visible vectors on canvas. On top of the vectors we implemented mouse event handling to achieve hovering, i.e. interactivity. And that's it.

Our backend map engine does all the heavy lifting, because we don't use any precaching and all tiles are generated on the fly. It's very important to us to have a map that can be refreshed quickly.

So it sounds like the client side is the easy part. It's impressive that the data is rendered without any caching.

He also mentions a hosting service which may be of interest to you. You may want to weigh the cost of trying to recreate this against the cost of using a ready-made service.
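Based on his description, a hypothetical "vector JSON tile" might look something like the following; the field names are my guess for illustration, not GIS Cloud's actual format:

var tile = {
  // Features too small to see at this zoom: just a pixel position (plus a style/class index).
  pixels: [
    [12, 200, 0],
    [13, 201, 1]
  ],
  // Features that are actually visible: generalized geometry in tile pixel coordinates.
  vectors: [
    {
      type: 'polygon',
      coords: [[10, 10], [60, 12], [58, 70], [10, 65]],
      id: 48271 // used on the client for hover/interactivity
    }
  ]
};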


As I described on the OSGeo list, the key is in delivering data as vector JSON tiles that carry pixels for subpixel geometry and generalized geometry for those features that will actually be visible at a certain level. Performance is great because this technique eliminates all unnecessary vector information and leaves only those vectors that will actually have a visual impact on the map. Pixels are there to fill the gaps, placed in the spots of the subpixel vectors. That is it regarding the tile format.

The backend side is where the true heavy lifting happens. We are not using TileStache or any other map engine, since we wrote our own which can, with a number of optimizations, produce these vector graphics in real time.

We first started by delivering map tiles as SWFs, and lately we enabled JSON output so we could use the HTML5 canvas to render the graphics. Below you can find a benchmark comparing this kind of vector technology with raster technology (Mapnik). For a fair comparison, look only at the results in CGI mode.

http://www.giscloud.com/blog/realtime-map-tile-rendering-benchmark-rasters-vs-vectors/

We are planning to provide this technology as a map tile hosting service. The idea is to host your geo data in the cloud and deliver it through HTML5 into any map client at high speed, without any need to precache the tiles. If you are interested in joining this beta, feel free to contact us here: http://www.giscloud.com/contact/