Wednesday 26 Aug at the VLDB 2009 conference

August 28, 2009

In my blog posts about VLDB I will mention:

what was discussed,
some of my thoughts,
and some keywords that could be used to search for further information.

The talks on Wednesday:

09:00-10:30
Bringing Database Research to Computer Games and Simulations
Johannes Gehrke (Cornell Univ.)

They have created SGL to make scripting easier in games.
Multi-script optimization techniques.

To me it seems as if there is an object-relational impedance mismatch here, or did he mention that as well? A game uses objects, so how do you store them on disk? Do you use an RDBMS?

SecondLife uses StreamBase: What is StreamBase?
StreamBase’s Event Processing Platform™ is high-performance software for rapidly building systems that analyze and act on real-time streaming data. StreamBase combines a rapid application development environment, a low-latency high-throughput event server, and enterprise connectivity to real-time and historical data. With StreamBase, organizations rapidly build real-time systems that can generate millions of dollars in new profits and are deployed at a fraction of the cost and risk of alternatives

Isn't SecondLife using Versant as well? I need to check.

SGL uses the state-effect pattern and is an imperative language.
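
To make the state-effect pattern concrete for myself, here is a minimal Python sketch. It is not SGL, and the names (Unit, tick, attack_range, damage) are my own hypothetical choices. The assumption is that within one tick every unit only reads the current state and only writes effects, that effects on the same unit are combined with an aggregate (a sum here), and that the new state is computed from the old state plus the combined effects at the end of the tick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: state cannot be modified during a tick
class Unit:
    name: str
    x: float
    health: int


def tick(units, attack_range=1.0, damage=3):
    """One simulation step: read state, collect effects, then apply them."""
    # Query phase: each unit reads the (read-only) state and emits effects.
    damage_effects = {u.name: 0 for u in units}
    for attacker in units:
        for target in units:
            in_range = abs(target.x - attacker.x) <= attack_range
            if target is not attacker and in_range:
                # Effects on the same target are combined by summation.
                damage_effects[target.name] += damage

    # Update phase: the new state is built from old state + combined effects.
    return [Unit(u.name, u.x, u.health - damage_effects[u.name]) for u in units]


if __name__ == "__main__":
    world = [Unit("a", 0.0, 10), Unit("b", 0.5, 10), Unit("c", 5.0, 10)]
    for unit in tick(world):
        print(unit)
```

Because nothing is updated in place during the query phase, the order in which units are processed does not matter, which is presumably what lets the scripts be treated like queries and optimized as a set.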

algebraic optimization, improve query plans

Mmm: you must write parallel programs today to make use of multiple cores.

11:00-12:30 Query Processing
Enhanced Subquery Optimizations in Oracle
Srikanth Bellamkonda (Oracle), Rafi Ahmed (Oracle), Andrew Witkowski (Oracle), Mohamed Zait (Oracle), Angela Amor (Oracle), Chun Chieh Lin (Oracle)

They used TPC-H queries.

View merging, subquery removal, null-aware anti-join (NAAJ)
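
A small Python sketch of why an anti-join has to be null-aware, assuming NAAJ is about evaluating NOT IN subqueries. This only mimics the SQL semantics of NOT IN with None standing in for NULL; it says nothing about how Oracle implements the operator, and the function name is my own.

```python
def null_aware_anti_join(outer, inner):
    """Values of `outer` kept by: WHERE x NOT IN (SELECT y FROM inner).

    None plays the role of SQL NULL.
    """
    if not inner:
        # NOT IN over an empty set is TRUE for every row, even a NULL one.
        return list(outer)
    if any(y is None for y in inner):
        # A NULL in the subquery makes "x <> NULL" UNKNOWN, so no row qualifies.
        return []
    inner_set = set(inner)
    # A NULL outer value compares as UNKNOWN against a non-empty set: dropped.
    return [x for x in outer if x is not None and x not in inner_set]


if __name__ == "__main__":
    print(null_aware_anti_join([1, 2, 3, None], [2]))
    print(null_aware_anti_join([1, 2, 3, None], [2, None]))
    print(null_aware_anti_join([1, 2, 3, None], []))
```

A plain anti-join would return 1 and 3 in the second case as well, which is wrong for NOT IN, so the NULL checks are the whole point.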

Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs
Changkyu Kim (Intel Corporation), Eric Sedlar (Oracle), Jatin Chhugani (Intel Corporation), Tim Kaldewey (Oracle), Anthony Nguyen (Intel), Andrea Di Blas (Oracle, UCSC), Victor Lee (Intel Corporation), Nadathur Satish (Intel Corporation), Pradeep Dubey (Intel Corporation)

Must read this paper

They compared sort vs hash. Is there anything using trees that they could compare it against?

TLB misses and L2 cache is mentioned

Query load balancing.

Mm:

Single query on many cores -> (focus on in-memory processing; Intel seems to recommend this?? Read the paper to make sure)

Multiple queries on many cores -> another paper shows cache misses, so not the best option...

O(n) -> hash sort better for large data
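
As a reminder of the two algorithms being compared, here is a textbook sketch of a hash join and a sort-merge join in Python. It is single-threaded and ignores everything the talk was really about (SIMD, TLB misses, cache sizes, multiple cores), so it only shows the algorithmic difference: the hash join builds and probes in roughly linear time, while the sort-merge join pays for sorting first. The function names and the (key, payload) tuple layout are my own assumptions.

```python
from collections import defaultdict


def hash_join(left, right):
    """Equi-join on the key: build a hash table on `left`, probe with `right`."""
    table = defaultdict(list)
    for key, payload in left:
        table[key].append(payload)
    return [(key, l, r) for key, r in right for l in table.get(key, [])]


def sort_merge_join(left, right):
    """Equi-join on the key: sort both inputs, then merge runs of equal keys."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on both sides.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, l in left[i:i_end]:
                for _, r in right[j:j_end]:
                    out.append((lk, l, r))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (2, "c")]
    customers = [(2, "x"), (3, "y"), (1, "z")]
    print(sorted(hash_join(orders, customers)))
    print(sorted(sort_merge_join(orders, customers)))
```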

Efficient Outer Join Data Skew Handling in Parallel DBMS
Yu Xu (Teradata), Pekka Kostamaa (Teradata)

Parallel databases, OJSO, SMP nodes, skewed processing.

14:00-15:30: Research sessions - Parallelism

I wanted to attend the MapReduce session as well but had to choose :)

Adaptively Parallelizing Distributed Range Queries

Ymir Vigfusson (Cornell Univ.), Adam Silberstein (Yahoo! Research), Brian Cooper (Yahoo! Research), Rodrigo Fonseca (Yahoo! Research)

Must read this paper

This is more of an OLTP talk. Yahoo! Research work related to PNUTS.

They looked at range queries and asked how parallel the queries should be.

ASA: adaptive server allocation

Used a synthetic benchmark. Used data from Flickr.

Schedulers

Mining Tree-Structured Data on Multicore Systems

Shirish Tatikonda (Ohio State Univ.), Srinivasan Parthasarathy (Ohio State Univ.)

Related to the bioinformatics field.

Frequent subtree mining algorithm

Predictable Performance for Unpredictable Workloads

Philipp Unterbrunner (ETH Zurich), Georgios Giannikis (ETH Zurich), Gustavo Alonso (ETH Zurich), Dietmar Fauser (Amadeus IT Group SA), Donald Kossmann (ETH Zurich)

I liked the ETH Zurich talks. They were clear and confident.

Check Crescando.

RDBMS SENSITIVE TO:

full table scan
read-write concurrency

They worked with the Amadeus travel agency system and its data.
Partitioning and clustering of data
INDEX QUERIES RATHER THAN DATA - this was mentioned in other talks as well, I think also in the context of column databases
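
My understanding of "index queries rather than data", as a hedged Python sketch: instead of building an index on the table and running each query against it, you collect a batch of pending queries, index their predicates, and then scan the data once, probing each record against the query index. This is only an illustration of the idea under that assumption, not the system from the talk; the function name, the record layout and the restriction to equality predicates on one column are all mine.

```python
from collections import defaultdict


def scan_with_query_index(records, queries, column):
    """Answer many equality queries on `column` with one pass over `records`.

    records: list of dicts
    queries: dict mapping query id -> value wanted in `column`
    returns: dict mapping query id -> list of matching records
    """
    # Index the queries: predicate value -> ids of the queries asking for it.
    query_index = defaultdict(list)
    for query_id, wanted in queries.items():
        query_index[wanted].append(query_id)

    results = {query_id: [] for query_id in queries}
    # Single scan of the data; each record is probed against the query index.
    for record in records:
        for query_id in query_index.get(record[column], ()):
            results[query_id].append(record)
    return results


if __name__ == "__main__":
    bookings = [
        {"id": 1, "dest": "ZRH"},
        {"id": 2, "dest": "LYS"},
        {"id": 3, "dest": "ZRH"},
    ]
    pending = {"q1": "ZRH", "q2": "CDG", "q3": "ZRH"}
    print(scan_with_query_index(bookings, pending, "dest"))
```

The cost of the scan is shared by all queries in the batch, so the time per scan depends mostly on the data size and not on which queries happen to arrive, which fits the "predictable performance" theme of the talk.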

Panel 2

How Best to Build Web-Scale Data Managers? A Panel Discussion.

Daniel J. Abadi (Yale), Michael J. Cafarella (U. of Washington), Joseph M. Hellerstein (U.C. Berkeley), Donald Kossmann (ETH Zürich), Samuel Madden (Massachusetts Institute of Technology). Moderator: Philip A. Bernstein (Microsoft)

Read abstract: http://vldb2009.org/?q=node/23/index72dc.html?q=node/23

Philip A. Bernstein mentions that the db community is late to this area: Web-Scale Data Managers, or WDMs as he has coined them.
WDMs are self-managing and have no transactions; they started in the operating systems community
He asks what the db community can add to WDM
Seems WDM systems do not use transactions
Examples of these WDM systems: Google's BigTable, HBase, PNUTS from Yahoo
SAP uses the db as a key value store
WDM systems have abandoned ACID
So one question is: should we improve classical databases to include these WDM features (I assume map-reduce and key-value store features)?
Or add to WDM systems? Add transaction features? Like HadoopDB.
Madden mentions http://highscalability.com
http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster ... Twitter uses memcache and no transactions. 300 inserts/tweets per second
... mentions between the lines that open source dbs don't scale ... this seems to be the perception in some of the talks ...
MySQL and PostgreSQL don't scale out of the box
Streaming databases
Twitter partitions the database. Mm, read more.
proprietary databases are too expensive
mentions Parallel Databases
http://pgfoundry.org/projects/bizgres/
provides this classification:
FN execution (low level): Google File System + MapReduce
Stores: PNUTS,BigTable,Dynamo
Analysis: Pig,Hive,SCOPE
WDM FN exec features: Easy scaling, costs less to scale, unix-like, fast, easy data loading
These systems mainly lack db features, which is good for the db community as we can add them
Adding: queries, indexing, tx support, schema management
Data loading is an extremely underappreciated barrier to everyday db usage
Hellerstein.
You need developers to support the cloud platform
Cloud is a data-centric programming challenge
Read: the Cloud goes boom
Brian Cooper
Drop ACID :)
Drop SQL
Many web apps don't need ACID
ACID does not scale
today there is cross continental communication
need weak consistency
simple queries should use a RESTful API
complex queries: write in Pig http://hadoop.apache.org/pig/
get a good benchmark to test...
hierarchical dbs were a good idea, persistent queues, materialized views?....
