Wednesday 26 Aug at the VLDB 2009 conference

In my blog posts about VLDB I will mention:
  • what was discussed,
  • some of my thoughts,
  • and some keywords that could be used to search for further information.

The talks on Wednesday:

09:00-10:30
Bringing Database Research to Computer Games and Simulations
Johannes Gehrke (Cornell Univ.)


They have created SGL to make scripting easier in games.
Multi-script optimization techniques.

To me it seems as if there is an object-relational impedance mismatch (or did he mention it as well?): games use objects, so how do you store them on disk? Do you use an RDBMS?

SecondLife uses StreamBase: What is StreamBase?
StreamBase’s Event Processing Platform™ is high-performance software for rapidly building systems that analyze and act on real-time streaming data. StreamBase combines a rapid application development environment, a low-latency high-throughput event server, and enterprise connectivity to real-time and historical data. With StreamBase, organizations rapidly build real-time systems that can generate millions of dollars in new profits and are deployed at a fraction of the cost and risk of alternatives.


Isn't SecondLife using Versant as well? I need to check.

SGL uses the state-effect pattern and is an imperative language.
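As I understand the state-effect pattern: during a tick every script reads the same frozen game state and only emits effects; the effects are combined with an aggregate (here a sum) and applied at the end of the tick, so the order in which the scripts run does not matter. Below is a minimal Python sketch under that assumption. Unit, script and tick are made-up names for illustration, not SGL syntax.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Unit:
    hp: int
    damage: int
    target: str  # hypothetical: name of the unit this one attacks


def script(name, state):
    """A unit's per-tick script: reads the (read-only) state, returns effects."""
    me = state[name]
    if me.hp <= 0:
        return []
    return [(me.target, me.damage)]


def tick(state):
    # Query phase: every script sees the same old state and only emits effects.
    effects = defaultdict(int)
    for name in state:
        for target, dmg in script(name, state):
            effects[target] += dmg  # combine effects with an aggregate (sum)
    # Update phase: apply the combined effects to produce the next state.
    return {name: replace(u, hp=u.hp - effects[name]) for name, u in state.items()}
```

Because no script sees another script's writes within a tick, reordering the units gives the same next state, which is what makes this style easy to optimize and parallelize.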

Algebraic optimization to improve the query plans.
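To get a feel for what "improving the query plan" can mean for game scripts, take a typical per-unit question such as "how many units are within distance r of me?". Written naively it is a nested loop over all units; a planner that sees it as a query can instead build an index first. The sketch below is my own illustration, assuming 2D points and a uniform grid with cell size r (so every neighbour lies in the surrounding 3x3 cells); the function names are hypothetical and this is not the plan SGL actually generates.

```python
import math
from collections import defaultdict


def count_neighbors_naive(points, r):
    """Nested loop: for each unit, scan all other units (O(n^2))."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= r)
        for i, p in enumerate(points)
    ]


def count_neighbors_grid(points, r):
    """Same answer, but first build a uniform grid index with cell size r,
    so each unit only checks the 3x3 block of cells around it."""
    grid = defaultdict(list)
    for j, (x, y) in enumerate(points):
        grid[(math.floor(x / r), math.floor(y / r))].append(j)
    counts = []
    for i, (x, y) in enumerate(points):
        cx, cy = math.floor(x / r), math.floor(y / r)
        n = 0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j != i and math.dist(points[i], points[j]) <= r:
                        n += 1
        counts.append(n)
    return counts
```

Both functions return the same counts; the grid version just avoids comparing units that are far apart.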

Mmm, today you must write parallel programs to make use of multiple cores.

11:00-12:30 Query Processing
Enhanced Subquery Optimizations in Oracle

Srikanth Bellamkonda (Oracle), Rafi Ahmed (Oracle), Andrew Witkowski (Oracle), Mohamed Zait (Oracle), Angela Amor (Oracle), Chun Chieh Lin (Oracle)

They used TPC-H queries.

View merging, subquery removal, null-aware anti-join (NAAJ).
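The tricky part of a null-aware anti-join is the SQL semantics of `x NOT IN (subquery)` when NULLs are involved: if the subquery is empty every row qualifies; a NULL `x` never qualifies against a non-empty subquery; and a single NULL in the subquery makes every non-matching row UNKNOWN, so nothing qualifies. A minimal Python sketch of those rules for a single-column key, with `None` standing in for NULL; the function name is mine and this is not Oracle's implementation.

```python
def null_aware_anti_join(left, right):
    """Rows of `left` for which `x NOT IN (right)` is TRUE under SQL semantics.

    `None` plays the role of SQL NULL in both inputs."""
    if not right:
        return list(left)  # NOT IN over an empty set is TRUE, even for NULL x
    right_has_null = any(r is None for r in right)
    right_keys = {r for r in right if r is not None}  # hash table on the inner side
    result = []
    for x in left:
        if x is None:
            continue  # NULL NOT IN (non-empty set) is UNKNOWN -> filtered out
        if x in right_keys:
            continue  # a match makes NOT IN FALSE
        if right_has_null:
            continue  # no match, but a NULL on the right makes it UNKNOWN
        result.append(x)
    return result
```

A plain anti-join would wrongly return rows in the last two cases, which is why the optimizer needs a special null-aware variant before it can unnest NOT IN subqueries.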


Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs
Changkyu Kim (Intel Corporation), Eric Sedlar (Oracle), Jatin Chhugani (Intel Corporation), Tim Kaldewey (Oracle), Anthony Nguyen (Intel), Andrea Di Blas (Oracle, UCSC), Victor Lee (Intel Corporation), Nadathur Satish (Intel Corporation), Pradeep Dubey (Intel Corporation)


Must read this paper

They compared sort-based vs. hash-based joins. Is there a tree-based approach they could have compared against?

TLB misses and the L2 cache were mentioned.

Query load balancing.

Mm:

Single query on many cores -> (focus on in-memory processing; Intel seems to recommend this?? Read the paper to make sure.)

Multiple queries on many cores -> another paper shows cache misses, so this is not the best option...

O(n) -> hash (rather than sort, which is O(n log n)) is better for large data
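To make the two approaches in this talk concrete, here is a minimal Python sketch of a hash join, a radix-partitioned hash join (partitioning both inputs first is a common trick for keeping each hash table inside the cache and limiting TLB misses) and a sort-merge join, all on lists of (key, value) pairs. The function names and the `bits` parameter are mine. As far as I understood, the paper's implementations are heavily tuned native code using SIMD; none of the cache or TLB effects show up in Python, so this only shows the algorithmic shape.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Equi-join: build a hash table on `build`, then probe it with `probe`."""
    table = defaultdict(list)
    for k, v in build:
        table[k].append(v)
    out = []
    for k, pv in probe:
        for bv in table.get(k, ()):
            out.append((k, bv, pv))
    return out


def radix_partitioned_hash_join(build, probe, bits=4):
    """Split both inputs on the low `bits` bits of hash(key), then join each
    pair of partitions; in native code each small table would fit in cache."""
    n = 1 << bits
    build_parts = [[] for _ in range(n)]
    probe_parts = [[] for _ in range(n)]
    for t in build:
        build_parts[hash(t[0]) & (n - 1)].append(t)
    for t in probe:
        probe_parts[hash(t[0]) & (n - 1)].append(t)
    out = []
    for b, p in zip(build_parts, probe_parts):
        out.extend(hash_join(b, p))
    return out


def sort_merge_join(left, right):
    """Equi-join by sorting both inputs on the key and merging them."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on both sides; emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out
```

The hash joins do expected O(n) work, while the sort-merge join pays O(n log n) for sorting, which matches the note above; which one wins on real hardware depends on the constant factors the paper was measuring.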


Efficient Outer Join Data Skew Handling in Parallel DBMS
Yu Xu (Teradata), Pekka Kostamaa (Teradata)

Parallel databases, OJSO (outer join skew optimization), SMP nodes, processing of skewed data.
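A minimal sketch of one general way to handle a skewed join key in a parallel left outer join, assuming the skew is on the preserved (left) side. Normally both inputs are redistributed by hashing the join key, so all rows with a very frequent key land on one node. Here rows with a skewed key stay on the node where they already are, and the matching rows from the other side are duplicated to every node. This is my own illustration of the idea, not the OJSO algorithm from the talk; "nodes" are just Python lists, and `find_skewed_keys` with its threshold is a hypothetical helper.

```python
from collections import Counter, defaultdict


def find_skewed_keys(left_nodes, threshold):
    """Keys that make up more than `threshold` (a fraction) of all left rows."""
    counts = Counter(k for rows in left_nodes for k, _ in rows)
    total = sum(counts.values())
    return {k for k, c in counts.items() if c / total > threshold}


def local_left_outer_join(left, right):
    table = defaultdict(list)
    for k, v in right:
        table[k].append(v)
    out = []
    for k, v in left:
        matches = table.get(k)
        if matches:
            out.extend((k, v, rv) for rv in matches)
        else:
            out.append((k, v, None))  # null-pad unmatched preserved rows
    return out


def parallel_left_outer_join(left_nodes, right_nodes, skewed_keys):
    """left_nodes/right_nodes: per-node lists of (key, value) rows."""
    n = len(left_nodes)
    new_left = [[] for _ in range(n)]
    new_right = [[] for _ in range(n)]
    for node, rows in enumerate(left_nodes):
        for k, v in rows:
            # Skewed rows of the preserved side stay local; the rest are hashed.
            dest = node if k in skewed_keys else hash(k) % n
            new_left[dest].append((k, v))
    for rows in right_nodes:
        for k, v in rows:
            if k in skewed_keys:
                for dest in range(n):  # duplicate join partners of skewed keys
                    new_right[dest].append((k, v))
            else:
                new_right[hash(k) % n].append((k, v))
    out = []
    for l_rows, r_rows in zip(new_left, new_right):
        out.extend(local_left_outer_join(l_rows, r_rows))
    return out
```

Because rows of the preserved side are never duplicated, each left row is joined (or null-padded) exactly once, so the result matches a centralized left outer join.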


14:00-15:30: Research sessions - Parallelism


I wanted to attend the MapReduce session as well but had to choose :)

Adaptively Parallelizing Distributed Range Queries

Ymir Vigfusson (Cornell Univ.), Adam Silberstein (Yahoo! Research), Brian Cooper (Yahoo! Research), Rodrigo Fonseca (Yahoo! Research)

Must read this paper

This is more of an OLTP talk. Yahoo! Research, related to PNUTS.

They looked at range queries: how parallel should such queries be?

ASA: adaptive server allocation

They used a synthetic benchmark and data from Flickr.

Schedulers
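The question of "how parallel should a range query be?" can be pictured as a search for the point where adding more servers stops helping, assuming that more parallelism only helps until some bottleneck (for example, the client consuming the results) is reached. The sketch below is a simple hill-climbing stand-in I wrote to illustrate that idea; it is not the ASA algorithm from the paper, and `measure_throughput` and `min_gain` are hypothetical.

```python
def choose_parallelism(measure_throughput, max_servers, min_gain=0.05):
    """Hill-climb the number of servers scanned in parallel.

    measure_throughput(k) is a hypothetical callback returning the observed
    result rate when k servers are queried concurrently."""
    k = 1
    best = measure_throughput(k)
    while k < max_servers:
        candidate = measure_throughput(k + 1)
        if candidate < best * (1 + min_gain):
            break  # one more server no longer helps enough: stop here
        k, best = k + 1, candidate
    return k
```

For example, with a simulated workload whose throughput grows linearly up to four servers and then stays flat, the search stops at four.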


Mining Tree-Structured Data on Multicore Systems

Shirish Tatikonda (Ohio State Univ.), Srinivasan Parthasarathy (Ohio State Univ.)

Related to the bioinformatics field.

Frequent subtree mining algorithm
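To show what "frequent subtree mining" counts, here is a minimal Python sketch of only its first non-trivial level: finding parent-child label pairs (subtrees of size 2) that occur in at least `min_support` trees of a forest. Trees are assumed to be `(label, children)` tuples and the function names are mine; the algorithm in the talk grows larger candidate subtrees and parallelizes the work over many cores, which this sketch does not attempt.

```python
from collections import Counter


def edges(tree):
    """Set of (parent_label, child_label) edges in a tree given as (label, children)."""
    label, children = tree
    found = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found


def frequent_edges(forest, min_support):
    """Size-2 subtrees (edges) that occur in at least `min_support` trees."""
    support = Counter()
    for tree in forest:
        support.update(edges(tree))  # a set, so each tree counts once per edge
    return {e: s for e, s in support.items() if s >= min_support}
```

Support is counted per tree rather than per occurrence, which is the usual definition in frequent pattern mining.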

Predictable Performance for Unpredictable Workloads

Philipp Unterbrunner (ETH Zurich), Georgios Giannikis (ETH Zurich), Gustavo Alonso (ETH Zurich), Dietmar Fauser (Amadeus IT Group SA), Donald Kossmann (ETH Zurich)

I liked the ETH Zurich talks. They were clear and confident.

Check out Crescando.

RDBMSs are sensitive to:

  • full table scan
  • read-write concurrency
They worked with the Amadeus travel agency system and its data.
Partitioning and clustering of data
INDEX QUERIES RATHER THAN DATA: this was mentioned in other talks as well, and I think also in the context of column databases.
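"Index queries rather than data" can be illustrated with a shared scan: the pending queries are put into an index, and a single pass over the records probes each record against that index, so the cost of one scan is shared by all queries. A minimal Python sketch, assuming only equality predicates on single attributes; `shared_scan` and the query format are hypothetical, and Crescando's actual scan is, as far as I understand, more elaborate (it also handles updates and runs continuously).

```python
from collections import defaultdict


def shared_scan(records, queries):
    """Answer many equality queries with a single pass over the data.

    records: list of dicts; queries: {query_id: (attribute, value)}.
    Instead of indexing the data, the queries are indexed by (attribute, value)."""
    query_index = defaultdict(lambda: defaultdict(list))
    for qid, (attr, value) in queries.items():
        query_index[attr][value].append(qid)
    results = {qid: [] for qid in queries}
    for rec in records:  # one scan, shared by all queries
        for attr, by_value in query_index.items():
            for qid in by_value.get(rec.get(attr), ()):
                results[qid].append(rec)
    return results
```

The cost per record depends on the number of indexed attributes rather than the number of queries, which is one way to get predictable performance when the query mix is unpredictable.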

Panel 2

How Best to Build Web-Scale Data Managers? A Panel Discussion.

Daniel J. Abadi (Yale), Michael J. Cafarella (U. of Washington), Joseph M. Hellerstein (U.C. Berkeley), Donald Kossmann (ETH Zürich), Samuel Madden (Massachusetts Institute of Technology). Moderator: Philip A. Bernstein (Microsoft)

Read the abstract: http://vldb2009.org/?q=node/23/index72dc.html?q=node/23
  • Philip A. Bernstein mentions that the DB community is late to this field: Web-Scale Data Managers, or WDMs, as he has coined them.
  • WDMs are self-managing and have no transactions; WDMs came from the operating systems community
  • He asks what the DB community can add to WDMs
  • It seems WDM systems do not use transactions
  • Examples of these WDM systems: Google's BigTable, HBase, PNUTS from Yahoo
  • SAP uses the DB as a key-value store
  • WDM systems have abandoned ACID
  • So, one question is: should we extend classical databases with these WDM features (I assume MapReduce and key-value store features)?
  • Or should we add to WDM systems instead, e.g. transaction features? Like HadoopDB.
  • Madden mentions http://highscalability.com
  • http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster ...Twitter uses memcached and does not use transactions; 300 inserts (tweets) per second
  • ...mentions between the lines that open-source DBs don't scale; this seems to be the perception in some of the talks...
  • MySQL and PostgreSQL don't scale out of the box
  • Streaming databases
  • Twitter partitions the database. Mm, read more.
  • Proprietary databases are too expensive
  • Mentions parallel databases
  • http://pgfoundry.org/projects/bizgres/
  • provides this classification:
  • FN execution (low level): Google File System + MapReduce
  • Stores: PNUTS,BigTable,Dynamo
  • Analysis: Pig,Hive,SCOPE
  • WDM FN exec features: Easy scaling, costs less to scale, unix-like, fast, easy data loading
  • These systems mainly lack DB features, which is good for the DB community, as we can add them
  • Adding: queries, indexing, transaction support, schema management
  • Data loading is an extremely underappreciated barrier to everyday DB usage
  • Hellerstein:
  • You need developers to support the cloud platform
  • Cloud is a data-centric programming challenge
  • Read: the Cloud goes boom
  • Brian Cooper:
  • Drop ACID :)
  • Drop SQL
  • Many web apps don't need ACID
  • ACID does not scale
  • Today there is cross-continental communication
  • Need weak consistency
  • Simple queries should use a RESTful API
  • Complex queries: write them in Pig http://hadoop.apache.org/pig/
  • get a good benchmark to test...
  • Hierarchical DBs were a good idea, persistent queues, materialized views?...
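Several points in the panel are about scaling out by partitioning data over many machines (Twitter partitioning its database, key-value stores such as BigTable, PNUTS and Dynamo). A minimal sketch of the simplest variant, hash partitioning of a key-value store, assuming string keys and a fixed number of shards; each shard is a plain dict standing in for a server, and the class and method names are hypothetical. Real systems add replication, rebalancing (for example consistent hashing, as in Dynamo) and weaker consistency on top, and some (like BigTable) partition by key range instead.

```python
import zlib


class PartitionedKVStore:
    """A toy key-value store hash-partitioned over a fixed number of shards."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # crc32 is stable across processes, unlike Python's salted str hash,
        # so a key always maps to the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)
```

Every get or put touches exactly one shard and there are no cross-shard transactions, which is the trade-off the panel kept coming back to: easy scaling in exchange for giving up ACID.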
