It takes only a little time working in the geospatial industry before you hear the name Paul Ramsey. His work on PostGIS, and in helping to cultivate an amazing community around open source geospatial technology, is humbling. Beyond that, a quick scan of his blog posts shows that he thinks deeply not only about open source, but also about how it does and should affect many aspects of business and government. He can move from a conversation about personally identifiable information to a conversation about the procurement process of local governments with impressive ease.
We have been longtime fans of Paul, and today we finally get to announce that he is joining us here at CartoDB. To share this moment with you, we thought we would present a special blog post: 5 Questions with Paul Ramsey.
PostGIS 2.2 is still a work in progress, but among the currently completed features, the most interesting are the raster overview features that allow you to build overviews in the database (previously you had to build them externally in GDAL). I also quite like the aggregate function for raster summary statistics (ST_SummaryStatsAgg), because it really cleans up the SQL needed to pull stats from a raster collection.
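As a rough illustration of what that aggregate looks like in use, here is a minimal sketch that holds one such query in a Python string. The table name `elevation_tiles` and its raster column `rast` are assumptions made up for this example. The call uses the `ST_SummaryStatsAgg(raster, band, exclude_nodata_value)` form, which returns a single `summarystats` composite for all the tiles.

```python
# Hypothetical table "elevation_tiles" with a raster column "rast";
# neither name comes from the interview.
SUMMARY_SQL = """
SELECT (stats).count, (stats).mean, (stats).min, (stats).max
FROM (
    -- One aggregate call over every tile: band 1, NODATA pixels excluded.
    SELECT ST_SummaryStatsAgg(rast, 1, TRUE) AS stats
    FROM elevation_tiles
) AS agg;
"""

# Print the query so it can be pasted into psql or handed to a database driver.
print(SUMMARY_SQL.strip())
```

The query can be run from any PostgreSQL client against a database with the PostGIS raster support installed.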
I was excited to learn that CartoDB is actually using Foreign Data Wrappers (FDW) in production, since I just recently learned something about how they work, so I’m looking forward to getting my hands dirty there, teaching the native PostgreSQL FDW how to speak spatial. I also have a big performance improvement to ST_Distance/ST_DWithin calculations that has been in my private tree for almost two years, which I would love to unleash on the world (perhaps for PostGIS 2.2).
People who can take clean data and make a map or visualization are a dime a dozen. People who can clean up dirty data and make it tractable in the first place are the really valuable ones. Knowing some kind of scripting language that can access files, databases, and web services and apply regular expressions is the baseline for being able to work with data. Everything else is icing on the cake.
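To make that baseline concrete, here is a minimal sketch of regular-expression cleanup in Python. The "name, phone, postal code" record layout and the sample values are assumptions invented for illustration, not data from the interview.

```python
import re


def clean_record(raw: str) -> dict:
    """Normalize one messy 'name, phone, postal code' line (hypothetical layout)."""
    name, phone, postal = [field.strip() for field in raw.split(",")]
    # Collapse runs of whitespace and normalize capitalization.
    name = re.sub(r"\s+", " ", name).title()
    # Keep only the digits of the phone number.
    digits = re.sub(r"\D", "", phone)
    # Drop spaces and punctuation from the postal code, then uppercase it.
    postal = re.sub(r"[^A-Za-z0-9]", "", postal).upper()
    return {"name": name, "phone": digits, "postal": postal}


rows = ["  jane   DOE , (250) 555-0199, v8w 1p6 "]
cleaned = [clean_record(row) for row in rows]
print(cleaned)
```

The same pattern extends to rows read from a file, a database cursor, or a web service response.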
To my mind the key skills remain in scripting: you can explore a data manipulation process visually for a while in an interface, but after a while you need to automate it, if only so you can efficiently handle updates to the source data. During my vacation I did a big data analysis project using PostgreSQL and R. I ended up with a couple files of SQL and R commands, and could run the whole analysis from start to finish with one command. This was really useful when it turned out that some of the effects I was seeing at the end of the analysis in R were actually driven by mistakes I made early in the process during the spatial SQL steps of merging the data. A few small changes to the SQL, and 5 minutes of CPU time later, I was back in business.
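A one-command workflow like this can be sketched with a small driver script. The version below is an assumption about how such a setup might look, not the actual scripts from that project: the database name `analysis` and the three file names are hypothetical, and it relies on the standard `psql` and `Rscript` command-line tools being installed.

```python
import subprocess

# Hypothetical database and file names; substitute your own.
STEPS = [
    ["psql", "-v", "ON_ERROR_STOP=1", "-d", "analysis", "-f", "01_load.sql"],
    ["psql", "-v", "ON_ERROR_STOP=1", "-d", "analysis", "-f", "02_spatial_merge.sql"],
    ["Rscript", "03_model.R"],
]


def run_pipeline(steps, runner=subprocess.run):
    """Run each step in order, stopping at the first one that fails."""
    for cmd in steps:
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(cmd, check=True)
    return len(steps)
```

Calling `run_pipeline(STEPS)` reruns everything from the raw data, so a fix to an early SQL step flows through to the final results without any manual bookkeeping.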
PostgreSQL remains an incredible (and underappreciated) data integration environment, and PostGIS is the geospatial part of that, but still only one part. The sheer variety of data transformation and analysis that can be done in the database alone is amazing, and because PostgreSQL is so extensible, the number of options keeps growing. My little extension side projects, like pgsql-http and pgsql-ogr-fdw, are about adding to the reach of PostgreSQL as an integration point. So are language bindings like PL/R and PL/V8.
We got the general elephant theme from PostgreSQL, which in turn seems to have gotten it from a suggestion on a mailing list many years ago, noting that “an elephant never forgets”. The community PostGIS elephant – the friendly one balancing a globe – was drawn by the wife of one of the initial developers (he wrote the first cut of the shapefile loader) shortly after our first release, and we’ve kept it ever since.
A huge welcome to Paul from both our New York and Madrid offices today. We are all excited to get this new collaboration started.