PGCon2007 - Confirmed

PGCon 2007
The PostgreSQL Conference

Speakers: Mark Wong, Rilson Nascimento

Schedule
Day: 3
Room: SITE G0103
Start time: 13:30
Duration: 01:00

Info
ID: 15
Event type: Lecture
Track: Hackers
Language: English

Digesting an open-source fair-use TPC-E implementation.

TPC-E is the new On-Line Transaction Processing workload being developed by the Transaction Processing Performance Council (TPC). It is intended to replace the aging TPC-C.

This talk starts by describing the TPC-E specification in high-level terms and explaining why TPC-E represents a modern database server workload. It continues by explaining how the open source development community, and the PostgreSQL community in particular, benefits from having an open-source workload based on the industry-standard TPC-E benchmark. The implementation and architecture of the workload are outlined, and motivating experimental results are presented.

Given these recent innovations in the field of database benchmarks, it seems appropriate that the open source development community take advantage of them. Open source workloads based on industry-standard benchmarks enable people to measure system performance and verify resource utilization. In addition, by taking the guesswork out of performance analysis, benchmarks can save people a tremendous amount of time that would otherwise be spent on ineffective trial-and-error solutions.

Some benefits to the FOSS community:

  • Enable more people to use the workload, since it is more cost-effective to run (TPC-E is designed to be less I/O-dependent than TPC-C; that is, it was engineered to provide a balanced mixture of disk input/output and processor usage).
  • Enable users to test modern database server workloads.
  • Help people to learn system tuning by offering a controlled environment to test new configurations.
  • Act as a sanity check on claims of performance leadership.
  • Provide fertile ground for engineering improvements.

The workload was written mostly in C++ and PL/pgSQL. The latter was used to implement the TPC-E transactions. Libpqxx was used as the C++ client API for PostgreSQL. The database loader that accompanies TPC-E was extended to load a PostgreSQL database directly, which relieves the system of the burden of generating and loading auxiliary files. The data collection and reporting functionality for system resources was based on Mark Wong's DBT-2 workload, which provides a rich set of scripts to collect performance data and generate charts.
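
The abstract does not say how the extended loader feeds rows to PostgreSQL. One plausible approach, shown here purely as an assumption, is to format each generated row in PostgreSQL's COPY text format in memory and stream it to the server instead of writing it to an intermediate file. The minimal C++17 sketch below covers only the formatting step; the names `copy_escape` and `copy_row` are hypothetical and are not taken from the actual implementation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not from the actual workload): escape one value for
// PostgreSQL's COPY text format, where backslash, tab, newline and carriage
// return must be written as backslash sequences.
std::string copy_escape(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char ch : value) {
        switch (ch) {
            case '\\': out += "\\\\"; break;
            case '\t': out += "\\t";  break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            default:   out += ch;     break;
        }
    }
    return out;
}

// Hypothetical helper: format one row as a single COPY text line.
// Columns are tab-separated, the row ends with a newline, and a missing
// value (std::nullopt) is written as \N, the default NULL marker.
std::string copy_row(const std::vector<std::optional<std::string>>& fields) {
    assert(!fields.empty());  // a row needs at least one column
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) line += '\t';
        line += fields[i] ? copy_escape(*fields[i]) : std::string("\\N");
    }
    line += '\n';
    return line;
}
```

Lines produced this way could be sent to the server as part of a `COPY ... FROM STDIN` operation, so the generated data never has to touch the file system. Whether the real loader works this way is not stated in the abstract.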