Monday 24 Aug at the VLDB 2009 conference

On Monday I attended the TPC workshop.

"First TPC Technology Conference on Performance Evaluation & Benchmarking (TPC TC)"

"Enterprise data and user generated data levels continue to grow exponentially. This has challenged researchers and industry experts to develop innovative techniques to evaluate and benchmark software and hardware technologies for 2010 and beyond."

They (TPC) are trying to identify the new areas where benchmarks are needed.

There were 4 sessions during the day.
The papers will be available in LNCS by Springer Verlag in October.

Active benchmarks: TPC-C, TPC-E, TPC-H
TPC benchmarks in development: TPC-ETL, TPC-Energy

The question was asked whether TPC is ready for multi-processing/multi-core processing. Also, should it be ready?

There are new areas: cloud computing, very large memory systems, etc.

Benchmarks are needed in these new fields.

My observation: It seems to me that TPC never mentions open source RDBMSs and object databases... why isn't there an official TPC benchmark for OODBs?

A few of the companies that submitted papers: IBM, HP, Oracle, TPC, Microsoft, Teradata, Vertica (Stonebraker company..), Dell, VMware, etc.

Keynote Speech: A New Direction for TPC?
Michael Stonebraker (MIT)
  • Nice link: http://www.openlinksw.com/weblog/oerling/?id=1576
  • He coins the term PAFS. These are the features a benchmark needs to be good and widely used.
  • He states that current benchmarks (or specifically TPC-H) don't include load performance, which is important in data warehouses
  • Relational databases are not easy to use and have too many knobs
  • TPC doesn't check out-of-the-box usage
  • Benchmarks need to be able to scale on the fly (add more resources...extra nodes?)
  • Nobody recovers using the log. They use replication. The TPC benchmarks don't include replication
  • Gartner says TPC-H is irrelevant
  • Large vendors don't use these benchmarks
  • TPC-DS is worse than TPC-H
  • Looking at the PAFS criteria TPC-H and TPC-C fail
  • Remote sensing people hate relational databases
  • Science guys: astronomy, biology (DNA), Human Genome Project - for them tables are wrong. RDBMS is the wrong data model!!! :)
  • Arrays and graphs are needed in these fields
  • "Re-grid(?)" in RDBMS: joins and projections are very hard
  • Relational databases have the wrong features: scientists need provenance (operations) and repeatability (no-overwrite storage, timestamps)
  • The science field uses file systems. They don't use RDBMSs
  • TPC could change this
  • RDF -> not going in rows ...better in columns
  • WEB 2.0 -> xml data
  • MapReduce-style computing (very good for these types of applications)
  • ..there is a benchmark for this (?)....CMM?
  • Nobody uses relational databases in real time systems....
  • Madden/Abadi...RDF ..work/paper?
  • MR benchmark (MapReduce vs RDBMS...who won this?)
  • PAFS - He looked at the original benchmark work done by Jim Gray. The elements/features are: the benchmark is created by one person. There is a clear Application (find a pressing need/application). The benchmark FOCUSED vendors. System?
  • Must reinvent oneself every decade.
  • Benchmarks should be simple; designing one is a real art, and it must capture the real essence
  • He is busy creating a science benchmark
  • Find a place where areas are inadequate and then improve
  • He hopes he has angered some people in the room....:)


The State of Energy and Performance Benchmarking for Enterprise Servers
Andrew Fanara (US EPA), Evan Haines (ICF International), Arthur Howard (ICF International)

  • Presented by Liam Newcombe (British Computer Society)
  • This presentation looked at saving power and at how to benchmark this...green issues
  • Mentions SPEC JBB

Overview of TPC Benchmark E: The Next Generation of OLTP Benchmarks
Patricia Hogan (IBM)

  • TPC-E uses real data: 2000 census data for the US + Canada, and actual listings of the NY exchange
  • If TPC-E adds an energy metric it will grow...

Measuring Database Performance in Online Services: a Trace-Based Approach
Swaroop Kavalanekar (Microsoft), Dushyanth Narayanan (Microsoft Research), Sriram Sanka (Microsoft), Eno Thereska (Microsoft Research), Kushagra Vaid (Microsoft), Bruce Worthington (Microsoft)
  • Microsoft research at Cambridge
  • This was a nice paper/presentation
  • They traced runs of TPC-C, TPC-E and TPC-H
  • They traced MS IM db and MSN db
  • TPC does not match apps in the real world
  • Traces give you a more realistic benchmark
  • Pvz: Traces were already mentioned in 1994 at a benchmark workshop related to ODBMSs
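
To make the trace idea concrete, here is a minimal sketch of replaying a recorded workload against a database and timing each operation. My notes don't record at what level the authors captured their traces, so this assumes a simple statement-level trace; the `messages` table, the trace contents and the `replay` helper are all made up for illustration, and SQLite only stands in for a real server.

```python
import sqlite3
import time

# Hypothetical statement-level trace: (SQL text, parameters), in recorded order.
trace = [
    ("INSERT INTO messages(sender, body) VALUES (?, ?)", ("alice", "hi")),
    ("INSERT INTO messages(sender, body) VALUES (?, ?)", ("bob", "hello")),
    ("SELECT COUNT(*) FROM messages WHERE sender = ?", ("alice",)),
    ("INSERT INTO messages(sender, body) VALUES (?, ?)", ("alice", "bye")),
    ("SELECT body FROM messages WHERE sender = ?", ("alice",)),
]

def replay(conn, trace):
    """Re-execute every traced statement in order and record its latency."""
    results = []
    for sql, params in trace:
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        elapsed = time.perf_counter() - start
        # Tag each latency with the statement type (first keyword).
        results.append((sql.split()[0].upper(), elapsed))
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (sender TEXT, body TEXT)")

latencies = replay(conn, trace)
reads = sum(1 for kind, _ in latencies if kind == "SELECT")
writes = len(latencies) - reads
print(f"replayed {len(latencies)} statements: {reads} reads, {writes} writes")
```

The read/write mix and the order of operations come from the recording rather than from a benchmark specification, which is the appeal of the approach.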

Issues in Benchmark Metric Selection
Alain Crolotte (Teradata)
  • Discussed the difference between arithmetic mean, geometric mean and harmonic mean
  • Did not like geometric mean
  • Recommends using arithmetic mean
  • Must read this paper...
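
Until I have read the paper, here is a small sketch of how the three means behave on the same set of query timings. The timings are made up and this is not taken from the paper; it only illustrates one well-known property: the geometric mean rewards a speed-up of a short query exactly as much as the same relative speed-up of a long one, while the arithmetic mean follows total elapsed time.

```python
from statistics import fmean, geometric_mean, harmonic_mean

def summarize(times):
    """Return the three classical means of a list of positive timings."""
    return {
        "arithmetic": fmean(times),
        "geometric": geometric_mean(times),
        "harmonic": harmonic_mean(times),
    }

# Hypothetical per-query elapsed times in seconds (illustrative values only).
baseline = [1.0, 2.0, 4.0, 100.0]
# Same workload after a tuning change that halves only the fastest query.
tuned = [0.5, 2.0, 4.0, 100.0]

before = summarize(baseline)
after = summarize(tuned)

for name in before:
    change = (after[name] / before[name] - 1.0) * 100.0
    print(f"{name:10s} before={before[name]:8.3f} after={after[name]:8.3f} change={change:6.1f}%")
```

Total run time barely changes here (107 s to 106.5 s), yet the geometric mean improves by roughly 16%, which is the kind of effect a metric choice has to account for.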

Benchmarking Database Performance in a Virtual Environment
Sharada Bose (Hewlett-Packard), Priti Mishra (VMware), Priya Sethuraman (VMware), Reza Taheri (VMware)

  • Used TPC-C and TPC-E...?
  • Must enable cloud computing and live load balancing
  • Use shadow paging....guest paging...
  • Paravirtualization
  • Benchmarks that exist: SPECvirt, VMmark (mix of workloads...like SPEC?)
  • They ask if virtualization is ready for a TPC benchmark?
  • They state that users want to put DBs on VMs and want to know the performance on these VMs
  • There is a need for benchmarks of DBs on VMs

The Star Schema Benchmark and Augmented Fact Table Indexing
Patrick O'Neil (University of Massachusetts at Boston), Elizabeth O'Neil (University of Massachusetts at Boston), Xuedong Chen (University of Massachusetts at Boston), Stephen Revilak (University of Massachusetts at Boston)

  • Star schema is related to data marts
  • Kimball and Ross (...the bible :) )
  • TPC-H has joins
  • TPC-DS is a snowflake
  • clustering is very important
  • Cube, dimensions, Vertica dbms
  • ADC weakness (what is this?)
  • 2/3 of the papers presented at this workshop had to use Product A, B, C because of license issues....so sad...
  • One size fits all - Stonebraker 2007....
  • They also mention SSD solid state disks...
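
For readers who have not met the term, here is a minimal sketch of what a star schema looks like: one fact table whose foreign keys point at small dimension tables, queried by joining the fact table to its dimensions and aggregating. The table names, columns and rows are invented for illustration and are not the actual Star Schema Benchmark schema; SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive attributes (hypothetical names).
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_part (part_key INTEGER PRIMARY KEY, category TEXT);

-- Fact table: one row per sale, keyed by the dimensions.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    part_key INTEGER REFERENCES dim_part(part_key),
    revenue  INTEGER
);

INSERT INTO dim_date VALUES (1, 1997), (2, 1998);
INSERT INTO dim_part VALUES (10, 'widgets'), (11, 'gadgets');
INSERT INTO fact_sales VALUES (1, 10, 100), (1, 11, 50), (2, 10, 70), (2, 10, 30);
""")

# Typical star query: join the fact table to each dimension, then aggregate.
rows = conn.execute("""
    SELECT d.year, p.category, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_part AS p ON p.part_key = f.part_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()

for year, category, total in rows:
    print(year, category, total)
```

In a snowflake schema the dimension tables would themselves be normalized into further tables, adding more joins to the same query.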
