PGCon 2009
The PostgreSQL Conference
University of Ottawa
Ottawa
2009-05-19 2009-05-23 5
Final Release
09:00 00:30
09:00 03:00 DMS 1110 control
PostgreSQL Access Controls (AuthN, AuthZ, Perms)
Controlling Access to your database - Roles; Kerberos, LDAP integration; Database, Schema, Table, and Column-level(!) permissions
other en
PostgreSQL offers many options for controlling access, from authentication and log in to the role system and finally the hierarchy of authorization to specific resources. System Administrators and Database Administrators need to understand these complexities to ensure their system is both robust and secure. With 8.4 there have been many changes to existing options (Kerberos mapping support, finally!) as well as whole new capabilities (column-level privileges, TRUNCATE privilege!). We will go through all of the authentication options that PostgreSQL offers, followed by how to integrate with existing single-sign-on and other infrastructure (Kerberos, LDAP), then walk through setting up roles following best practices and privilege separation, and finally go through the privilege system from database-level down to column-level.
An introduction and thorough review of access control in PostgreSQL. All access control will be covered, but special attention will be paid to the new features in 8.4.
PostgreSQL offers many options in pg_hba.conf for how users can be authenticated to the database, including the ability to be integrated with existing authentication systems. We will go over the basics and then go through how to integrate PostgreSQL into an existing Kerberos implementation (including MIT Kerberos, Microsoft Active Directory, both, and the pitfalls) and how to use an LDAP directory for authentication. We will also cover changes to the options in postgresql.conf and pg_hba.conf which are in 8.4, including the new ability to map Kerberos principals to PostgreSQL users and cross-realm support.
PostgreSQL also has a robust Roles system for user and group management. We will go through some of the complexities involved in using roles (and how you can get yourself into trouble with them) as well as best practices for administrators and application developers. Stored Procedures using Security Definer will also be covered, what they are, and what to be careful of when using them. Also in this talk, we will cover the distinction between the actions an Owner can perform and those which can be granted to others, and how you can have multiple Owners of a single object.
PostgreSQL has recently gone through some changes to the privilege/GRANT system, especially with 8.4. The whole gamut will be covered in this discussion, including database, schema, table, and the new-in-8.4 column-level privileges. We'll go over the default permissions, the privileges which can be GRANT'd and REVOKE'd (including the new TRUNCATE permission and why it's a separate and distinct permission), and the implications and side-effects of certain commands. We will also discuss what can, and can't, be done with column-level permissions including what happens in JOINs and partial INSERTs.
Stephen Frost
13:00 03:00 DMS 1110 whackamole
Performance Whack-a-Mole II
The Moles Are Back, and Bigger Than Ever
other en
"The database is slow!"
How often have you heard that plaintive cry from your co-workers, customers, or clients? In this tutorial, Josh Berkus will explain his simple methodology for resolving database application performance problems. He'll also demonstrate some of the OS and PostgreSQL tools available for troubleshooting performance problems ("Moles") and show how to use them.
Topics covered will include:
1. The Principles of Mole-hunting.
2. Baselining
3. How to start looking for Moles
4. Tools for Mole-hunting
a. Operating system tools
b. PostgreSQL tools
5. Examples of mole-hunting
In this second edition of the talk, Josh will also cover new tools available with PostgreSQL versions 8.3 and 8.4, including CSV logging, pg_stat_statements and auto_explain, as well as improved Linux tools for performance analysis.
Josh Berkus
09:00 03:00 DMS 1110
SkyTools: Queues (PgQ)
other en
An introduction to PgQ and SkyTools: an efficient queue implementation on top of
PostgreSQL and the tools to manage it. A description of PgQ, the different kinds
of consumers, queue applications and their usage scenarios, and how to tie it all
together with SkyTools.
SkyTools is a batch processing framework, initially developed at Skype to
simplify the management of distributed databases. Its most prominent
features are PgQ queuing and the Londiste replication system. SkyTools has
been open source since 2007 and is now approaching release 3.0.
This presentation introduces PgQ: producing and consuming
events, different consumer types, and batch processing. It covers solving common problems
with pre-packaged applications (replication, archiving, and data loading),
writing custom queue applications with the SkyTools framework, and other programming
interfaces: SQL, Java, PHP.
We will also discuss upcoming features of SkyTools 3: distributed queues,
cooperative consuming and cascaded consumers.
Marko Kreen
Martin Pihlak
SkyTools PgFoundry project
SkyTools wiki, documentation and tutorials
13:00 03:00 DMS 1110 haz
Writing a Procedural Language
I CAN HAZ P.L.Z PLZ?
other en
PostgreSQL allows development in many different programming languages, to suit the tastes of each aspiring hacker. Learn how to write a PostgreSQL procedural language, using one of the latest, and certainly the least, of these languages, PL/LOLCODE.
This talk covers how to write a procedural language for PostgreSQL, using PL/LOLCODE to provide examples. Not only will attendees be exposed to the workings of the procedural language system, they will also see how to make C code work within the PostgreSQL backend, how to access system catalogs and use the SPI interface, how Bison and Flex combine to create a parser, and how to work with PostgreSQL's internal data types -- all topics of interest to those hacking within PostgreSQL, whether writing a new procedural language or not.
Although PL/LOLCODE's value as a programming language for production use is questionable at best, its simplicity makes it an excellent didactic tool. Attendees can see how to function in the PostgreSQL backend without being bogged down in syntactic difficulty or strange special cases.
Joshua Tolley
15:00 03:00 Royal Oak registration
Registration pickup
The social way to register: at the pub
other en
Pick up your registration pack
Stop by the Royal Oak Pub and get your registration pack. You'll help us avoid long line ups on Friday morning and you get to have a beer, and chat with your fellow attendees. We guarantee you'll spot someone famous.
Dan Langille
10:00 01:00 DMS 1110
No More Waiting A Guide To PostgreSQL 8.4
en
Another year, another PostgreSQL release, and once again this release is packed full of new features that will help you whether you develop apps inside or outside the database, or you just need to keep them running. This talk will give an overview of the new features available in 8.4, and give you pointers to talks during the rest of the conference you'll want to focus on to get the most out of 8.4. Full description pending stable tree, but we'll have something for sure.
Robert Treat
audio
11:30 01:00 DMS 1110 amazonrainforest
Saving the Amazon with PostGIS
Spatial Manipulation in the Rain Forest
lecture en
Spatial databases differ from other databases in that they must store spatial data features that describe an object's location in space and its representation. The projects presented highlight the use of PostgreSQL with PostGIS. This lecture will discuss the use of these tools in managing spatial databases and the application of spatial analysis techniques to data stored in environmental databases.
Spatial databases differ from regular databases in that they must store spatial data features that describe an object's location and representation. Spatial databases are generally created to support Geographic Information Systems (GIS) applications.
For some time, closed-source database management systems (DBMS) dominated spatial database implementation projects, but the powerful combination of PostgreSQL and its spatial extension PostGIS now appears as an excellent option. However, creating a spatial database and later manipulating the stored data through a GIS goes beyond choosing a DBMS. Some of these factors will be discussed during the lecture.
Spatial data analysis is the process of searching for patterns and associations in maps that help to characterize, understand, and predict spatial phenomena. With PostGIS it is possible to apply new spatial analysis techniques directly to the stored data. This can make systems more flexible and create new ways to use spatial data.
This lecture will show the use of PostgreSQL with PostGIS to manage and manipulate spatial data in applications used to monitor and protect the environment. Practical examples of the use of these tools will be presented, drawn from the environmental surveillance of the Brazilian Amazon.
Luis Bueno
audio
13:30 01:00 DMS 1110
Database Hardware Benchmarking
How big is the elephant in your room?
lecture en
PostgreSQL DBAs and hardware vendors seem like they're speaking different languages some days. Doing your own low-level hardware benchmarking can provide the secret decoder ring needed to translate all the way from database transactions to disk commits.
You're a DBA, used to talking in terms of query runtime and transactions per second. Your server vendor babbles to you about cores and GHz. Your storage vendor, RPM and IOPS, and don't forget that all your problems would go away if only you had a SAN, right? Meanwhile, your boss wants an ROI on a server upgrade, and you're not even sure what the bottleneck on your existing one is. This is where benchmarking should help out, but who can you trust?
Nobody but yourself, that's who. It's about time you learned how to translate MIPS and IOPS into terms that make sense to you, so that you can convert between the language of your application and that of the hardware world. "Database Hardware Benchmarking" is all about how to measure the system you've got, compare against the system you might have, and confirm your shiny new system does what it's promised to do. All using easy to replicate hardware benchmarks, so that if you don't get what you expect, you can easily show your vendors what's wrong. Forget about SPEC and the TPC; it's simple tools like memtest86, dd, and bonnie++ that give you a real handle on what your hardware is doing, and you can tie that to PostgreSQL performance using generate_series and pgbench.
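As a rough illustration of the dd-style measurement mentioned above, here is a minimal Python sketch that times a sequential write followed by an fsync and reports throughput. It is not dd itself; the helper name, the 16 MB default size, and the 8 kB block size are assumptions chosen for illustration, and a meaningful run would write far more data than fits in the operating system's cache.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=16, block_kb=8):
    """Write total_mb of zero-filled blocks to a temp file in `directory`,
    fsync it, and return the observed throughput in MB/s.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # count the time needed to reach stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file
    return total_mb / elapsed


if __name__ == "__main__":
    rate = sequential_write_mb_per_s(tempfile.gettempdir())
    print(f"sequential write: {rate:.1f} MB/s")
```

Pointing `directory` at the volume that will hold the database gives a number that can be compared before and after a hardware change, or shown to a vendor when the result falls short of expectations.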
Greg Smith
audio
14:30 01:00 DMS 1110
OpenStreetMap with PostgreSQL
The free wiki world map & PostgreSQL
lecture en
The OpenStreetMap project is trying to build a free (as in beer) editable electronic map of the world. Data is collected by ordinary people running around with GPS units, tracing imagery and by integrating existing datasets. This talk will give an introduction to OpenStreetMap and an overview of how the OpenStreetMap data can be used with PostgreSQL/PostGIS.
The OpenStreetMap project has over 100 thousand registered users who have mapped over 2 million road segments, 24 thousand pubs and at least 2 Elephants. The talk will start out explaining the OpenStreetMap project, the types of data in OpenStreetMap and an overview of how it is stored.
OpenStreetMap believes that users should be able to apply any tag to any piece of data. The system and schema enforce very little structure on user data, instead depending on community consensus for consistent tagging. OSM provides 3 basic data types (nodes, ways and relations), which can be associated with each other. Any set of key/value tags can be added to instances of any of these types.
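The three data types and their free-form tags can be sketched in a few lines of Python. This is only an in-memory illustration, not the OpenStreetMap schema or the talk's PostgreSQL layout; the field names (`lat`, `lon`, `node_ids`, `members`) and the sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)  # arbitrary key/value tags


@dataclass
class Way:
    id: int
    node_ids: List[int]  # ordered references to nodes
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    id: int
    members: List[Tuple[str, int, str]]  # (member type, member id, role)
    tags: Dict[str, str] = field(default_factory=dict)


def find_tagged(elements, key, value):
    """Return the elements whose tags contain key=value."""
    return [e for e in elements if e.tags.get(key) == value]


# Hypothetical sample data: one pub, one untagged node, a street, a route.
pub = Node(1, 45.4215, -75.6972, {"amenity": "pub", "name": "Example Pub"})
corner = Node(2, 45.4220, -75.6980)
street = Way(10, [1, 2], {"highway": "residential"})
route = Relation(100, [("way", 10, "")], {"type": "route"})

pubs = find_tagged([pub, corner], "amenity", "pub")
```

Because nothing constrains which keys appear in `tags`, consistent results from a search like `find_tagged` depend entirely on the community tagging conventions the abstract describes.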
Steve Singer
Open Streetmap Wiki
The Map
audio
16:00 01:00 DMS 1110
Spatial Analysis with PostGIS
lecture en
In this lecture we shall describe some use cases of PostGIS -- a brief overview of how it is used in various ways such as for location aware applications, property management, and environmental risk assessment. PostGIS is a very popular spatial database extender for PostgreSQL and the preferred database backend for Open Source GIS-based applications. It is well supported by most Open Source Geospatial products as well as many commercial products.
In this lecture we shall provide a survey of how it is used in academia, research and commercial applications for analyzing collected data, geo-coding collected data and creating derivative geo-datasets.
We shall end with doing some common case spatial queries -- extruding, fixing malformed geometries, morphing of geometries by spatial characteristics as well as attribute characteristics and some simple geo-coding exercises.
We shall demonstrate the use of OpenJump as a complementary geospatial desktop analysis tool for viewing PostGIS results from ad-hoc queries. In our OpenJump exercise we will do some statistical analysis queries and demonstrate how to view them and generate thematic maps from them using this visualization tool. We will also demonstrate exporting geospatial data for consumption by other tools. We will also demonstrate how the new functionality in PostgreSQL 8.4 complements and extends the power of PostGIS.
Leo Hsu and Regina Obe
Boston GIS site
PostGIS
OpenJump Desktop GIS
Postgres OnLine Journal
audio
17:00 01:00 DMS 1110
SkyTools3: Replication (Londiste3)
Overview of Londiste and what's new in version 3
lecture en
The replication tool in the SkyTools package, code-named Londiste. A general overview of Londiste and what's new in version 3.
Londiste in general
-- usability
-- setup
-- reference
New features in version 3
-- cascaded replication
Priit Kustala
skytools 3 info
18:00 03:00 DMS 1110 ssd
Solid State Drives
A BOF
workshop en
Solid State Drives are the biggest change in storage technology since the hard drive was invented in the 1950s.
While enterprise-class SSDs offer immediate improved performance, the database can help them perform even better. Come along and discuss ways to help solid-state storage perform at its best!
Matthew Wilcox
10:00 01:00 DMS 1150
Some recent advances in full-text search
lecture en
We will present some recent improvements in full-text search that we developed during the 8.4 development stage and after. The main goal of our talk is to explain some innovations in full-text search and present several advanced techniques from real-life applications, among them a new operator for full-text queries, phrase search, prefix search, and
a new framework for lexeme processing for better text search management. Also, we'll explain how to write new dictionaries and present several useful dictionaries.
We'll show how to solve some typical problems, for example, how to correctly work
with diacritical signs (accents), which are common in European languages, and how to scale full-text search. The new B-tree emulation feature of GIN allows effective usage of the multicolumn GIN indexes introduced in 8.4 and faster full-text search.
Oleg Bartunov
Teodor Sigaev
audio
11:30 01:00 DMS 1150
Visualizing Postgres
lecture en
Postgres provides a variety of metrics that are available via SQL queries. Much of this useful data is transient. By tracking these metrics over time, database, relation, and index trends can be easily visualized using simple graphs and charts, which in turn can be used to assess current server performance, assist in tuning, diagnose performance issues, and anticipate potential future issues. An application for such tracking and visualization will be presented.
Postgres database, relation, and index statistics are constantly updated by the statistics collector, providing useful data through Postgres system views and functions. Additional information is updated by the ANALYZE process. By collecting this data over time, database administrators can better quantify the behavior of their database installation. In OLTP environments that have loads which vary with time, this data can be particularly useful in noting correlations between overall system performance (such as CPU idle and IOWait) and specific database objects (e.g., heap hits, index utilization, and locks on particular relations).
The raw collected data can be unwieldy on its own given its volume and time-related nature. By taking advantage of a simple web framework and JavaScript functionality, one can more easily view the data and quickly see relationships between the various database objects and system performance.
A simple example of a data-collecting and viewing application will be presented along with explanations of the various statistics gathered by Postgres and how they can be used to identify database server performance issues.
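Since the statistics views expose cumulative counters, tracking them over time mostly means differencing successive snapshots. The Python sketch below shows one way this could look, assuming snapshots have already been read into dictionaries; the counter names follow the `heap_blks_read` and `heap_blks_hit` columns of `pg_statio_user_tables`, while the relation name, the sample numbers, and the helper functions are hypothetical and unrelated to the application presented in the talk.

```python
def counter_deltas(previous, current):
    """Per-relation difference between two snapshots of cumulative counters.

    Each snapshot maps relation name -> {counter name: cumulative value}.
    Relations missing from `previous` are treated as starting from zero.
    """
    deltas = {}
    for rel, counters in current.items():
        before = previous.get(rel, {})
        deltas[rel] = {
            name: value - before.get(name, 0)
            for name, value in counters.items()
        }
    return deltas


def heap_hit_ratio(delta):
    """Fraction of heap block requests served from shared buffers
    during the interval, or None if there was no activity."""
    total = delta["heap_blks_hit"] + delta["heap_blks_read"]
    return delta["heap_blks_hit"] / total if total else None


# Hypothetical snapshots taken some minutes apart.
snap_1 = {"orders": {"heap_blks_read": 100, "heap_blks_hit": 900}}
snap_2 = {"orders": {"heap_blks_read": 150, "heap_blks_hit": 1350}}

interval = counter_deltas(snap_1, snap_2)
ratio = heap_hit_ratio(interval["orders"])
```

A real collector would also need to handle counters that go backwards after a statistics reset, which this sketch ignores.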
Michael Glaesemann
posuta-0.1 initial public release
audio
13:30 01:00 DMS 1150
Post Facto, version control for PostgreSQL
lecture en
Post Facto is an open-source version control system for PostgreSQL databases. Unlike traditional VC tools, which operate on filesystems, Post Facto operates directly on databases, easing the management of multiple schema versions.
I. Intro
I'll begin with a review of schema-versioning methods I'm aware of, including where and why they fall short. Postfacto's interface has some similarities with Subversion, and I'll draw some comparisons to help guide people into the topic.
II. Demonstration
Here, I'll do a basic demo: create a db (or at least show its DDL), place it under version control, make some changes, commit, checkout a second copy.
III. Basic Architecture
1. SQL code as VC deltas. I'll open up the repository that postfacto created for the demo, show the SQL it generated, and explain how that fits in the overall architecture with lots of oo-la-la diagrams.
2. Repository design. This will cover the nuts and bolts of the repository database, plus the "working copy" DB and scratch DB's used to diff, merge, etc.
3. Commands and Workflow. Here, I'll look over the complete command-line set and explain any outliers not covered in the above demo and discussion.
IV. Q & A
Robert Brewer
website
audio
14:30 01:00 DMS 1150 pgtap
Unit Test Your Database!
Use a flexible database unit testing framework to assure database integrity and functionality
lecture en
Given that the database, as the canonical repository of data, is the most important part of many applications, why is it that we don't write database unit tests? This talk promotes the practice of implementing tests to directly test the schema, storage, and functionality of databases.
We're all used to unit testing our applications by now. The Extreme and Agile programming movements have done a great deal to promote unit testing, to the extent that many of us are now dependent on tests to assure that our applications work reliably. But how often do we test the database underlying our applications? Given that the database, as the repository for all of the knowledge and data for an application, just might be the single most important part of that application, the time for standardized database unit testing has come.
This talk promotes the practice of writing and running unit tests that directly test the schema, storage, and functionality of application databases. Following a review of the available PostgreSQL unit testing frameworks, we'll review examples of testing tables, views, columns, constraints, indexes, triggers, and functions. The idea is to promote complete test coverage of every aspect of a database, independent of application unit tests, to ensure reliably canonical data integrity.
David E. Wheeler
pgTAP
PGUnit
Epic
Presentation on SlideShare.
audio
16:00 01:00 DMS 1150
Introducing Windowing Functions
Internals and Externals
lecture en
PostgreSQL 8.4 has much of the functionality of SQL:2008 windowing functions.
In this presentation, you'll learn what windowing functions are for, how they
work, both from the end-user and the RDBMS implementation perspective, and
plans for future enhancements. You will also learn how to create
user-defined windowing functions in C.
Before windowing functions, SQL had only rudimentary support for ordering and
lists, which made it complicated to do operations referring to other rows in
the result set. In OLAP contexts, this was a severe limitation, which
windowing functions address. You will get a more thorough grounding in the
theoretical background of the problems and of how windowing solves them.
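To make "operations referring to other rows in the result set" concrete, here is a small Python emulation of what a query such as `SELECT name, rank() OVER (PARTITION BY dept ORDER BY salary DESC) FROM staff` computes. It is a sketch of the semantics only, not of PostgreSQL's implementation; the `staff` table, its columns, and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def rank_within_partition(rows, partition_key, order_key):
    """Emulate rank() OVER (PARTITION BY partition_key ORDER BY order_key DESC).

    Unlike GROUP BY, every input row is kept; each one gains a "rank"
    computed relative to the other rows in its partition.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(partition_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        part = sorted(group, key=itemgetter(order_key), reverse=True)
        rank = 0
        prev = object()  # sentinel that never equals a real value
        for position, row in enumerate(part, start=1):
            if row[order_key] != prev:
                rank = position  # ties share a rank; the next rank skips ahead
                prev = row[order_key]
            out.append({**row, "rank": rank})
    return out


# Hypothetical rows standing in for a staff table.
staff = [
    {"dept": "a", "name": "x", "salary": 10},
    {"dept": "a", "name": "y", "salary": 20},
    {"dept": "a", "name": "z", "salary": 20},
    {"dept": "b", "name": "w", "salary": 5},
]
ranked = rank_within_partition(staff, "dept", "salary")
```

The point of the window version is that the ranking happens inside the query, without a self-join or a pass over the result set in application code.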
Next, the Magic Inside: the internal design of PostgreSQL's window functions.
To implement windowing functions, it was necessary to make changes through
most of PostgreSQL, starting from the parser, on through the analyzer,
optimizer, planner, and executor. You will get a broad overview of how these
parts work in the windowing context.
Third, you'll learn how to use the built-in windowing functions, present and
future, with their keywords and syntax in light of the SQL:2008 standard and
the behavior of other RDBMSs, including things PostgreSQL does not yet
support.
No presentation would be complete without a demonstration of PostgreSQL's
extensibility, and you'll see how to implement your very own windowing
function in C, touching on the APIs you'll use to do so.
David Fetter
Hitoshi Harada
audio
17:00 01:00 DMS 1150
Power psql
All about everyone's favorite utility
lecture en
Covering everything there is to know about psql, the best database command-line utility in existence. Learn tips
and tricks to make your sessions more productive and make your coworkers envious.
psql is an extremely powerful and versatile tool that allows you to talk to a Postgres database. We'll take a quick tour of its
features, and explore some corners and features you might not have known about. We'll also take a quick look at its history
and, more importantly, its future.
Selena Deckelmann
audio
18:00 03:00 DMS 1150 unit
PostgreSQL Unit Testing
A BOF
workshop en
Come gather with your fellow testing fanatics to share the
latest tips and tricks for testing your database.
David E. Wheeler
09:30 00:30 DMS 1160
Opening Session
Welcome
other en
Welcome to PGCon 2009
Be there early for the prizes!
Dan Langille
audio
10:00 01:00 DMS 1160 aster
Building PetaByte Warehouses with Unmodified PostgreSQL
lecture en
The data warehousing startup Aster Data will explain how they have built a petabyte-scale data warehousing platform on top of regular open source PostgreSQL.
Emmanuel Cecchet
11:30 01:00 DMS 1160
Encrypted PostgreSQL
lecture en
Need to store or access your PostgreSQL data with extra confidentiality? This talk will look at the options available for data and transport encryption in PostgreSQL.
This talk will go into how to design and deploy encrypted solutions using PostgreSQL. It will cover setting up and using transport encryption options (using SSL/TLS or outside solutions), as well as the advanced cryptography functions available inside the database itself (using the pgcrypto module). We'll look at both the theoretical side and specific examples of how and when to use the different methods.
Magnus Hagander
13:30 01:00 DMS 1160
pg_similarity
Functions and Operators for Executing Similarity Queries
lecture en
Similarity query is a fundamental operation in many application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. pg_similarity is a tool that makes available user-friendly methods such as functions and operators for similarity queries. More than a dozen functions are currently available.
There has been considerable interest in similarity queries in the research community recently. Similarity query is a fundamental operation in many
application areas, such as data integration and cleaning, bioinformatics, and pattern recognition. pg_similarity is a tool that makes available user-friendly methods for similarity queries.
pg_similarity is a set of functions and operators for matching similar strings. The following functions are available: Block Distance, Cosine, Dice, Euclidean, Hamming, Jaccard, Jaro, Jaro-Winkler, Monge-Elkan, Needleman-Wunsch, q-Gram, Smith-Waterman, Smith-Waterman-Gotoh, and Soundex. A set of auxiliary functions is available too. They allow flexible control over the similarity thresholds, tokenizer, and normalization of each function.
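Two of the listed measures are simple enough to sketch in Python, which may help show what a threshold, tokenizer, and normalization control. This is textbook Jaccard and Hamming, not pg_similarity's code; the bigram tokenizer, the lowercasing step, the 0.5 default threshold, and all function names here are assumptions.

```python
def qgrams(text, q=2):
    """Tokenize into the set of overlapping q-character substrings,
    after lowercasing (an assumed normalization step)."""
    text = text.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def jaccard(a, b, q=2):
    """Jaccard similarity: shared q-grams divided by all distinct q-grams."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0  # two strings too short to tokenize count as identical
    return len(ga & gb) / len(ga | gb)


def hamming(a, b):
    """Hamming distance: number of positions at which the strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_similar(a, b, threshold=0.5):
    """Similarity predicate of the kind an operator could expose."""
    return jaccard(a, b) >= threshold
```

For example, `jaccard("night", "nacht")` shares one bigram ("ht") out of seven distinct ones, so it falls well below the assumed threshold, while identical strings score 1.0.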
It will be released under the BSD license at pgFoundry soon. The not-yet-released code can be downloaded from http://www.inf.ufrgs.br/~etoliveira/pg_similarity/
Euler Taveira de Oliveira
source code
http://pgfoundry.org/projects/pgsimilarity/
14:30 01:00 DMS 1160
PostgreSQL 8.4Beta Vs PostgreSQL 8.3
A view from the Performance world
lecture en
PostgreSQL 8.4beta will be released in February 2009. We do a quick analysis of how PostgreSQL 8.4B compares to the released 8.3.x product using various workloads and other open source products.
With the release of PostgreSQL 8.4beta, early adopters will be interested in data points on how PostgreSQL 8.4 compares to PostgreSQL 8.3. In this presentation we will compare 8.4B vs. PostgreSQL 8.3.x on the same system using various workloads and other open source software:
* pgbench
* sysbench
* iGen
* OpenBravo ERP
etc
Jignesh K. Shah
16:00 01:00 DMS 1160
The Lives of Others
Open-Source Development Practices Elsewhere
lecture en
A survey of current and emerging development practices in open-source projects, and ideas that the PostgreSQL project could pick up.
The importance of actively managing and continuously reevaluating the development process has become increasingly apparent, particularly over the
last year. Interestingly, the PostgreSQL project is not alone in that trend. It can therefore be insightful to examine what kinds of process development other open-source projects have gone through, and what lessons have been learned.
I have examined various aspects that are involved in running an open-source project, such as release management practices and methodologies, choice of
development tools, team building, and legal aspects, in a range of other open-source projects and contrasted them with PostgreSQL. This session can
provide new ideas for how to organize and run the PostgreSQL project and also teach about various tools and software engineering practices.
Peter Eisentraut
audio
17:00 01:00 DMS 1160 upgrade
Upgrade Melee
A free-for-all on PostgreSQL's biggest remaining problem
meeting en
By version 8.5, most of the major issues affecting PostgreSQL usability and adoption will be fixed. Except upgrading. Join our round-table of PostgreSQL hackers to discuss the project's plans to make upgrading PostgreSQL less painful.
Panelists include:
Tom Lane, lead PostgreSQL hacker
Bruce Momjian, current lead for pg_migrator
Zdenek Kotala, author of pg_upgrade
Bruce Momjian
Josh Berkus
Tom Lane
Zdeněk Kotala
audio
19:00 03:30 The Velvet Room dinner
PGCon 2009 dinner
sponsored by EnterpriseDB
other en
The PGCon 2009 dinner.
EnterpriseDB are sponsoring this dinner for PGCon 2009 attendees.
Dan Langille
11:30 01:00 DMS 1110 vacuum
VACUUM Strategy
Autovacuum, FSM, Visibility Map
lecture en
VACUUM is an important topic for new and seasoned users of Postgres. This talk will focus on changes in Postgres from version 8.0 on, how to best take advantage of configuration parameters related to VACUUM, autovacuum, the updated Free Space Map in 8.4, and the brand new Visibility Map.
This talk will cover some internals, but will focus on practical effects of changing parameters, and how to best tune VACUUM-related parameters in Postgres for better performance.
Selena Deckelmann
audio
13:30 01:00 DMS 1110
PostgreSQL and Temporal Data
lecture en
PostgreSQL has had support for storage of temporal data, including a "time travel" capability, since even before it was called PostgreSQL, and PostgreSQL offers a fairly good set of operators to work on temporal data, at least when comparing with SQL standards.
This talk describes several of the ways that PostgreSQL may be used to record temporal data, along with advantages and disadvantages.
This talk will discuss:
- What sorts of functionality "base" PostgreSQL offers;
- What temporal functionality is added by the PGTemporal project (e.g. - the "period" type, comprising (from/to))
- Additional functionality that would be desirable (e.g. - temporally-aware foreign keys)
- Other useful representations of temporality, and their usage:
* Pointers to transactions
* Serial Number temporality
Chris Browne
PGTemporal
audio
15:00 01:00 DMS 1110
The design, architecture, and tradeoffs of FluidDB
The database with the heart of a wiki.
lecture en
FluidDB is a hosted database that Fluidinfo (http://fluidinfo.com) will
launch in alpha early this year. In this talk I will describe: the aspects
of FluidDB that make it novel, the reasoning behind this approach to
working with data, and the architecture of FluidDB. The system is
currently deployed on top of Amazon EC2 and S3, and we are using PostgreSQL
as a key component in the architecture.
Heart of a wiki
---------------
We call FluidDB "the database with the heart of a wiki" because it is
writable by any application. That's to say that when an application (or
user) encounters an object in FluidDB, they can always add information to
it. No need to ask permission, no need for anyone to anticipate needs or to
decide in advance how information should be structured, organized, or
combined. There are no schema, and there is no distinction between metadata
and data. Unlike a wiki though: there are permissions on object attributes
to prevent damage to existing content, and there is an extremely simple query
language - making new information directly searchable.
Wikis changed how people can create content online. FluidDB, I hope, will
do a similar thing for applications and the people who use them. It has the
wiki advantage of general writability and encouraging information sharing
and augmentation, but without the main problems of wikis.
Why is this interesting?
------------------------
Having a database that's fully writable in this way changes several
important things:
- It gives programmers the flexibility to easily change applications in
unpredictable ways after they have been deployed.
- It means that a successful application (e.g., Twitter) does not need to
write a specialized restrictive API for 3rd party apps - they use the
FluidDB API just like the original app.
- It means 3rd party apps can store their data together with the data of
the original app (imagine being able to add ratings to Twitter tweets,
and then search based on rating and tweet content).
- It allows us to build a family of apps that can share data. In this way
it makes it simple to build mashups that add information to the world and
then put it somewhere useful (i.e., onto the original objects), enabling
more mashups instead of making further hoops for subsequent programmers.
But wait, there's more!
-----------------------
Allowing anyone or any application to add information to objects also has a
big impact on control and organization of information.
- Control: UIs and APIs provide people and programs with access to
information. On the other hand, they also limit what we can do. FluidDB changes
this because it allows search on anything - including whatever attributes
a normal user or app has put into the system. So you might do a search
for Amazon books that are out in paperback, and which have been read by
Fred, and which have been mentioned in Slashdot, and which Sally has
looked at on Amazon but not bought. That kind of search is impossible
today because we do not have an underlying information architecture that
lets us put all that disparate information in the one place and search on
it. In FluidDB no-one can stop anyone from adding whatever they like to
objects, or searching as they please, so this kind of thing is easy.
- Organization: Programmers have traditionally used data structures and
pointers to organize information. While very fast at runtime, that
approach is very rigid. People, especially different people, want to
organize things differently, or in multiple ways, and on-the-fly. Because
FluidDB allows you to essentially tag anything you want, you can use
unique tags and search instead of data structure fields and pointer
following. For example: want to know what's in a folder? Search for
objects tagged as belonging to the folder. You can build all data
structures in this way, and these organizations can co-exist
simultaneously without interfering with one another.
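As a rough illustration of the folder example above, here is a minimal in-memory sketch in Python. It is not the FluidDB API: the `TagStore` class, the `user/tag` naming, and the method names are all hypothetical, chosen only to show how tags plus search can stand in for a container data structure.

```python
from collections import defaultdict


class TagStore:
    """Toy in-memory object store (hypothetical, not the FluidDB API)."""

    def __init__(self):
        # object id -> {tag name -> value}
        self._objects = defaultdict(dict)

    def tag(self, object_id, tag_name, value=None):
        """Attach a tag (with an optional value) to an object."""
        self._objects[object_id][tag_name] = value

    def search(self, tag_name, value=None):
        """Return sorted ids of objects carrying tag_name (and value, if given)."""
        return sorted(
            oid
            for oid, tags in self._objects.items()
            if tag_name in tags and (value is None or tags[tag_name] == value)
        )


store = TagStore()
# Two users organize the same objects differently, without interfering.
store.tag("report.pdf", "fred/folder", "work")
store.tag("photo.jpg", "fred/folder", "holiday")
store.tag("report.pdf", "sally/folder", "to-read")

# "What's in the folder?" becomes a search rather than pointer following.
work_folder = store.search("fred/folder", "work")
sally_to_read = store.search("sally/folder", "to-read")
```

Here the same object sits in Fred's "work" folder and in Sally's "to-read" folder at once, because each organization is just a set of tags.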
Terry Jones
Company site
Founder blog
Some thoughts on representation
Founder personal site
audio
10:00 | 01:00 | DMS 1150
Reconciling and comparing databases
Using schemas, Slony, DBI-link, pgTAP and other tools to compare different databases
lecture | en
The Millburn Corporation is a hedge fund which uses complex data-driven trading models based on the daily prices of various commodities, currencies and other inputs. As part of our application development process, we use independent stage and test instances of our production database to let us have a smoother and less mistake-prone deployment of new models and price streams. In this talk, I will discuss our setup, which makes heavy use of schemas, Slony, DBI-link, pgTAP and custom deployment tools in order to compare and reconcile data across databases.
Most of our applications make use of various input tables (e.g., prices, market_hours) to write to output tables (e.g., signals). Using Slony, we replicate our production database to all the other nodes. Stage and test have output tables in independent schemas. Then by changing search paths, it's easy to compare the differences between our production database and each node.
For comparisons between stage and test, which cannot be compared in the same database, we use DBI-Link. I will show how using ROW() comparison and recordset casting allowed us to build a custom data deployment tool that allows the creation, deployment and reversal of changesets from one database instance to another.
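The changeset idea can be sketched independently of DBI-Link. The following Python is an illustration under stated assumptions, not the speaker's tool: tables are modeled as dictionaries mapping a primary key to a row tuple, the sample rows are made up, and `make_changeset`, `apply_changeset`, and `reverse_changeset` are hypothetical names. Whole rows are compared for equality, in the spirit of a ROW() comparison.

```python
def make_changeset(source, target):
    """Describe the edits that would make `target` match `source`.

    Both arguments map primary key -> row tuple. Each entry is
    (key, old_row, new_row); None stands for "row absent".
    """
    changes = []
    for key in sorted(source.keys() | target.keys()):
        old, new = target.get(key), source.get(key)
        if old != new:  # whole-row comparison
            changes.append((key, old, new))
    return changes


def apply_changeset(table, changes):
    """Return a copy of `table` with the changeset applied."""
    result = dict(table)
    for key, _old, new in changes:
        if new is None:
            result.pop(key, None)
        else:
            result[key] = new
    return result


def reverse_changeset(changes):
    """Swap old and new rows so that applying the result undoes the changes."""
    return [(key, new, old) for key, old, new in changes]


# Made-up sample rows: (instrument, price)
stage = {1: ("gold", 930.5), 2: ("oil", 61.2), 3: ("corn", 4.1)}
test = {1: ("gold", 930.5), 2: ("oil", 60.0), 4: ("wheat", 5.9)}

changes = make_changeset(stage, test)
deployed = apply_changeset(test, changes)
rolled_back = apply_changeset(deployed, reverse_changeset(changes))
```

Because every change records both the old and the new row, reversal needs no extra bookkeeping: swapping the two sides yields the undo changeset.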
I will also discuss more complex instances of data modeling, where, for example, only certain prices in a large dataset should be changed. In these cases, we make use of inherited tables with non-overlapping sequences and custom accessor functions to access different price streams in different schemas.
Lastly, I will also discuss how we deploy schema changes, function changes and trigger changes, which are more complex and cannot be compared through DBI-link. This will include a discussion of our pgTAP test framework for functions, which is still under very active development at the moment, but I hope will be mature by PgCon.
Norman Yamada
audio
11:30 | 01:00 | DMS 1150
Predicting Postgres Performance
Practical Queueing Theory for Postgres DBAs
lecture | en
So, you've solved all of your current performance problems, but are you ready for tomorrow? Suppose your website traffic increases three-fold, or your company has a huge end-of-year processing job to run, at what point will your system become unusable? In this case, is it a CPU problem, I/O problem, or both? Should you add additional CPUs or can you just get faster CPUs? How fast does your I/O subsystem need to be in order to handle your workload? How much would query tuning help? This session is designed to help you answer those questions.
While often misunderstood and underutilized, queueing theory has long-since provided us with the ability to model behaviors and performance characteristics of complex computing systems. Unfortunately, many people shy away from queueing theory because it is often expressed using complex mathematical equations. However, in this session, we'll cover the basics of queueing theory and demonstrate how to apply its concepts and equations to practical, real-world situations.
Specifically, we'll cover:
- Queueing Theory Overview
- Response Times
- Baselines
- Distribution Patterns
- Standard Averages vs. Weighted Averages
- Workload Characterization
- Statistics
- Sampling
- Operating System Statistic Collection
- Postgres Statistic Collection
- Queueing System Notation
- Queueing System Equations
- Erlang C/Little's Law
- Modeling CPU subsystems
- Modeling I/O subsystems
- Determining when a server will no longer be able to handle a given workload.
- Determining whether a system could handle an increased workload.
- Determining whether more CPUs or faster CPUs are needed.
- Determining I/O subsystem needs.
- Gotchas (Physical vs. Effective CPUs, ...)
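To make the "more CPUs or faster CPUs" question concrete, here is a small Python sketch of the Erlang C formula and Little's Law. It assumes a textbook M/M/c model (Poisson arrivals, exponential service times, identical CPUs), which may not be the model used in the session. The function names and the workload numbers are hypothetical.

```python
from math import factorial


def erlang_c(servers, arrival_rate, service_rate):
    """Probability that an arriving request must queue in an M/M/c system."""
    offered = arrival_rate / service_rate  # offered load in Erlangs
    utilization = offered / servers
    if utilization >= 1:
        raise ValueError("unstable: arrivals exceed total service capacity")
    top = offered ** servers / factorial(servers) / (1 - utilization)
    bottom = sum(offered ** k / factorial(k) for k in range(servers)) + top
    return top / bottom


def response_time(servers, arrival_rate, service_rate):
    """Mean response time: queueing delay plus service time."""
    wait = erlang_c(servers, arrival_rate, service_rate) / (
        servers * service_rate - arrival_rate
    )
    return wait + 1 / service_rate


# Hypothetical workload: 30 queries/s arriving.
# Option A: 4 CPUs, each completing 10 queries/s.
# Option B: 2 CPUs, each completing 20 queries/s (same total capacity).
more_cpus = response_time(4, 30, 10)
faster_cpus = response_time(2, 30, 20)

# Little's Law: mean number of requests in the system = arrival rate * response time.
in_system_more = 30 * more_cpus
in_system_faster = 30 * faster_cpus
```

Under these assumptions the two options have the same total capacity, yet the option with fewer, faster CPUs gives the lower mean response time, because each request is served more quickly once it reaches a CPU.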
Jonah H. Harris
audio
13:30 | 01:00 | DMS 1150
SQL/MED
Doping for PostgreSQL
lecture | en
SQL/MED is Management of External Data, a part of the SQL standard that deals with how a database management system can integrate data stored outside the
database. The implementation of this specification has begun in PostgreSQL 8.4 and will over time introduce powerful new features into PostgreSQL.
Applications of this feature set include linking to other database systems (PostgreSQL or others), configuring proxy servers, presenting XML or other
structured data as SQL tables, and managing file system data through SQL.
This session will introduce the specification and interfaces, the current implementation plan, and will discuss applications and use cases.
Peter Eisentraut
design draft of the features in PostgreSQL 8.4
relevant part of the minutes of last year's developer meeting
audio
15:00 | 01:00 | DMS 1150
Introduction to Golconde
Golconde is a queue-based data distribution system
lecture | en
Learn about Golconde, a queue-based data distribution system developed at myYearbook.com. In implementing data distribution via external message queues, Golconde differentiates itself from traditional PostgreSQL replication tools, moving the workload for replication management outside of PostgreSQL itself. We will review the design decisions that resulted in Golconde as well as see it in action.
Golconde is a queue based replication solution for PostgreSQL written in Python. It is designed to be loosely coupled and rely upon existing enterprise messaging systems that have STOMP protocol support. Designed to scale easily and with multi-data center implementations in mind, the application and message queues for distribution live outside of the database. By decoupling Golconde from PostgreSQL it is differentiated from existing replication solutions, moving the workload from the database tier, where CPU, RAM and IO overhead can be very expensive, to a commodity layer where the operational cost for performing the data distribution work is much less expensive.
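The general shape of queue-based distribution can be sketched in a few lines of Python. This is not Golconde's code or message format: the standard-library `queue.Queue` stands in for an external STOMP-capable broker, and the `publish` and `consume` helpers and the JSON layout are hypothetical.

```python
import json
import queue

# Stand-in for an external message broker; a real deployment would use
# a STOMP-capable messaging system, which this sketch does not.
broker = queue.Queue()


def publish(action, table, row):
    """Serialize a data change and hand it to the queue."""
    broker.put(json.dumps({"action": action, "table": table, "row": row}))


def consume(target):
    """Drain the queue and apply each change to `target` (table -> {id: row})."""
    while not broker.empty():
        msg = json.loads(broker.get())
        rows = target.setdefault(msg["table"], {})
        if msg["action"] == "delete":
            rows.pop(msg["row"]["id"], None)
        else:  # "add" and "update" both store the latest version of the row
            rows[msg["row"]["id"]] = msg["row"]


publish("add", "users", {"id": 1, "name": "ann"})
publish("update", "users", {"id": 1, "name": "anne"})
publish("add", "users", {"id": 2, "name": "raj"})
publish("delete", "users", {"id": 2})

replica = {}
consume(replica)
```

The producer never talks to the destination directly, so the work of applying changes happens outside the source database. A single queue and a single consumer are used here for brevity; fanning out to several destinations would need one queue or subscription per destination.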
In this session we will review the architecture and implementation details of Golconde and examples of its use at myYearbook.com, including a live demonstration that illustrates the ease of setup and operation.
Gavin M. Roy
Golconde Homepage
audio
10:00 | 01:00 | DMS 1160 | patches
How to Get Your PostgreSQL Patch Accepted
patch submission accepted
lecture | en
This talk will cover the intricacies of getting your patch accepted by the PostgreSQL community and included in the next software release.
Developing a patch for the PostgreSQL community is a fairly complex process. There are many ways a patch can fail, and only a few ways it can succeed. Many people have been through the patch submission process and learned what it takes to make a successful patch. This talk will cover many of the gotchas that lead to a patch being rejected in the hope that attendees will have more success in submitting patches in the future.
Bruce Momjian
Tom Lane
audio
11:30 | 01:00 | DMS 1160
Converting 100 databases to PostgreSQL
Wisconsin State Courts implementation
lecture | en
The Wisconsin State Courts converted their databases, physically located at 72 county circuit court sites, as well as several central sites, to PostgreSQL in a very short time without major problems. All county servers were converted over the course of a few months.
Users, many of whom were very worried in advance, were surprised that they had no down time or disruption of their work. Many commented that the only change they noticed is that applications were "snappier" or "crisper". This session presents the environment in which this occurred, the preparation and planning involved, and the techniques used to roll out PostgreSQL to production. A major factor in the success of this effort was the amazing support of the PostgreSQL community through the PostgreSQL mailing lists.
Kevin Grittner
Central replication of the 72 county circuit court databases accessible to the public
audio
13:30 | 01:00 | DMS 1160
Database Refactoring
An idea on how to redesign an existing database with minimal downtime
lecture | en
Most DBAs have areas of the databases that they manage that they would like to change. However, very few are actually given the liberty, resources, and time to make these changes due to the impact it would have on the existing system's code and the impact on the customers. What if there were a way to get around these obstacles and make the structure changes you want and need without requiring an application rewrite?
Many DBAs inherit databases with less than optimal designs. Unfortunately, most of these databases are servicing pre-existing applications or websites where management cannot or will not provide the resources to migrate the existing code to a more normalized and optimized database structure, or there is not enough downtime available to do this conversion or migration. However, for DBAs stuck in this position, there may be hope.
This presentation will show you how you can redesign and normalize your database without long periods of downtime and without having to have your developers rework all of the existing SQL. We will demonstrate a way to use many of the concepts from materialized views and other areas of Postgres to make your desired changes behind the scenes. These changes should be quick and relatively painless and will allow your application to function without having all of your data migrated to the new schema. Best of all, we will show you how you can do this one table/area at a time!
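One common way to change a table's structure without touching application SQL is a compatibility view that keeps the old name and column layout. The Python sketch below shows that idea only. It is not necessarily the technique presented in the talk, it uses the standard-library `sqlite3` module purely so it runs without a server, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Legacy, denormalized table that the application queries directly.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_name TEXT, item TEXT);
    INSERT INTO orders VALUES (1, 'Ann', 'lamp'), (2, 'Ann', 'desk'), (3, 'Raj', 'chair');
""")

# The application's existing query, which must keep working unchanged.
legacy_sql = "SELECT id, customer_name, item FROM orders ORDER BY id"
before = conn.execute(legacy_sql).fetchall()

conn.executescript("""
    -- Normalize behind the scenes: customers move to their own table.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO customer (name) SELECT DISTINCT customer_name FROM orders;

    CREATE TABLE order_new (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO order_new
        SELECT o.id, c.id, o.item
        FROM orders o JOIN customer c ON c.name = o.customer_name;

    -- Swap: the old name now refers to a view with the old column layout.
    DROP TABLE orders;
    CREATE VIEW orders AS
        SELECT n.id, c.name AS customer_name, n.item
        FROM order_new n JOIN customer c ON c.id = n.customer_id;
""")

after = conn.execute(legacy_sql).fetchall()
```

The legacy query returns the same rows before and after the swap. Only reads are covered here; in PostgreSQL, writes through such a view would additionally need rules or triggers, which this sketch omits.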
Chris Hoover
audio
15:00 | 01:00 | DMS 1160
Introduction to Recursive Queries
Recursion: see recursion
lecture | en
Introduction to new 8.4 feature "WITH RECURSIVE" with examples of querying recursive database data structures such as trees or graphs.
Recursive Queries is just one of several long-awaited SQL features planned for PostgreSQL 8.4. But what is a recursive query anyways? What could it possibly mean to have a query refer to its own result set? What could such a thing be useful for?
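A small tree walk shows what it means for a query to refer to its own result set. The sketch below runs a `WITH RECURSIVE` query from Python using the standard-library `sqlite3` module, chosen only so the example is self-contained. The query itself uses standard syntax that should carry over to PostgreSQL 8.4 largely unchanged, though that is an assumption rather than something tested here. The `employee` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'Alice', NULL),
        (2, 'Bob',   1),
        (3, 'Carol', 1),
        (4, 'Dave',  2);
""")

rows = conn.execute("""
    WITH RECURSIVE reports(id, name, depth) AS (
        -- Non-recursive term: start at the root of the tree.
        SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
        UNION ALL
        -- Recursive term: join the table against the rows found so far.
        SELECT e.id, e.name, r.depth + 1
        FROM employee e JOIN reports r ON e.manager_id = r.id
    )
    SELECT name, depth FROM reports ORDER BY depth, name
""").fetchall()
```

Each pass of the recursive term adds the direct reports of the rows found in the previous pass, and the query stops when a pass adds nothing new.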
Greg Stark
16:30 | 01:00 | DMS 1160 | lightning
Lightning talks
Short sharp descriptions of short topics
lightning | en
Lightning talks - details to follow
The closing session will immediately follow this session.
Josh Berkus
Leo Hsu and Regina Obe
audio
17:30 | 00:30 | DMS 1160
Closing sessions
prizes, auctions, fun, games
other | en
The Traditional Closing Session
This event will follow on directly after the Lightning Talks and therefore may start earlier than the scheduled time.
Dan Langille
audio
18:00 | 05:00 | Royal Oak | pub
Pub Night!
Last chance for social intercourse before the Touristy stuff tomorrow
other | en
The last big social event...
Be there or miss out. :)
Dan Langille
10:00 | 04:00 | Out and about | tourist
Tourist Time
See the sights in and/or around Ottawa
other | en
Play Tourist
We will organize a visit to one of the many local tourist attractions.
Dan Langille