Wednesday 26 Aug at the VLDB 2009 conference
- what was discussed,
- some of my thoughts
- and some key words that could be used as a search for further information.
The talks on Wednesday:
09:00-10:30
Bringing Database Research to Computer Games and Simulations
Johannes Gehrke (Cornell Univ.)
They have created SGL to make scripting easier in games.
Multi-script optimization techniques.
To me it seems as if there is an object-relational impedance mismatch (or did he mention it as well?): a game uses objects, so how do you store them on disk? Do you use an RDBMS?
SecondLife uses StreamBase: What is StreamBase?
StreamBase’s Event Processing Platform™ is high-performance software for rapidly building systems that analyze and act on real-time streaming data. StreamBase combines a rapid application development environment, a low-latency high-throughput event server, and enterprise connectivity to real-time and historical data. With StreamBase, organizations rapidly build real-time systems that can generate millions of dollars in new profits and are deployed at a fraction of the cost and risk of alternatives
Isn't SecondLife using Versant as well? I need to check.
SGL - uses the state effect pattern, is an imperative language
algebraic optimization, improve query plans
mmm Must write parallel programs today to make use of multiple cores
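A minimal Python sketch of how I understand the state-effect pattern. This is not SGL itself, and the Unit class, the attack rule and the numbers are made up for illustration. During a tick, scripts only read state and only write effects; effects on the same target are combined (here by summing), and the state is updated once at the end of the tick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    # State attributes: read-only while a tick is being computed.
    name: str
    x: int
    health: int


def tick(units, attack_range=1, damage=10):
    """Run one simulation tick under the state-effect pattern."""
    # Effect phase: every unit reads the old state and emits effects.
    # Effects on the same target are combined with a sum, so the order
    # in which units are processed does not change the result.
    damage_taken = {u.name: 0 for u in units}
    for attacker in units:
        for target in units:
            in_range = abs(attacker.x - target.x) <= attack_range
            if attacker is not target and in_range:
                damage_taken[target.name] += damage
    # Update phase: fold the combined effects into the next state.
    return [Unit(u.name, u.x, u.health - damage_taken[u.name]) for u in units]


units = [Unit("a", 0, 100), Unit("b", 1, 100), Unit("c", 5, 100)]
after = tick(units)
```

Because no script sees another script's writes within a tick, the effect phase could in principle be split across cores, which is how I read the link to the parallelism remark above.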
11:00-12:30 Query Processing
Enhanced Subquery Optimizations in Oracle
Srikanth Bellamkonda (Oracle), Rafi Ahmed (Oracle), Andrew Witkowski (Oracle), Mohamed Zait (Oracle), Angela Amor (Oracle), Chun Chieh Lin (Oracle)
They used TPC-H queries.
View merging, subquery removal, null-aware anti-join (NAAJ)
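A small sketch of why an anti-join has to be null-aware, assuming NAAJ stands for null-aware anti-join. It uses SQLite from Python rather than Oracle, and the customers/orders tables are hypothetical. With a NULL in the subquery, NOT IN returns no rows, while NOT EXISTS behaves like an ordinary anti-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (1), (NULL);
""")

# NOT IN: the NULL in orders makes "id NOT IN (...)" UNKNOWN for every
# customer without a matching order, so none of them qualifies.
not_in = conn.execute(
    "SELECT name FROM customers "
    "WHERE id NOT IN (SELECT customer_id FROM orders) "
    "ORDER BY name"
).fetchall()

# NOT EXISTS: a plain anti-join; the NULL row simply never matches.
not_exists = conn.execute(
    "SELECT name FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id) "
    "ORDER BY name"
).fetchall()

print(not_in)
print(not_exists)
```

An optimizer that wants to run NOT IN as an anti-join therefore has to preserve the NULL behaviour of the first query.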
Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs
They compared sort vs hash. Is there anything using trees that they could compare it against?
TLB misses and L2 cache is mentioned
Query load balancing.
Mm:
Single Query on many cores -> ( focus on in memory, Intel seems to recommend this?? read paper to make sure)
Multiple queries on many cores -> other paper shows cache misses. so not the best option...
O(n) -> hash sort better for large data
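For reference, a plain Python sketch of the two join strategies being compared. This is only the textbook form of each algorithm, not the cache-aware multi-core implementation from the talk, and the sample rows are made up.

```python
from collections import defaultdict


def hash_join(left, right):
    """Join two lists of (key, value) pairs using a hash table on left."""
    table = defaultdict(list)
    for key, lval in left:
        table[key].append(lval)
    # Probe phase: look up each right row in the hash table.
    return [(key, lval, rval)
            for key, rval in right
            for lval in table.get(key, [])]


def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs by sorting and merging."""
    left = sorted(left, key=lambda pair: pair[0])
    right = sorted(right, key=lambda pair: pair[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit
            # the cross product of the two runs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    out.append((lkey, lval, rval))
            i, j = i_end, j_end
    return out


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (3, "y"), (4, "z"), (2, "w")]
```

Both functions return the same rows, possibly in a different order; the hash join does one pass per input, while the sort-merge join pays for two sorts first.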
Yu Xu (Teradata), Pekka Kostamaa (Teradata)
Parallel databases, OJSO, SMP nodes, skewed processing.
14:00-15:30: Research sessions - Parallelism
I wanted to attend the MapReduce session as well but had to choose :)
Adaptively Parallelizing Distributed Range Queries
Must read this paper
This is more of an OLTP talk. Yahoo research related to PNUTS
They looked at range queries: how parallel should the queries be?
ASA: adaptive server allocation
Used a synthetic benchmark. Used data from flickr
Schedulers
Mining Tree-Structured Data on Multicore Systems
Related to the bioinformatics field.
Frequent subtree mining algorithm
Predictable Performance for Unpredictable Workloads
I liked the ETH Zurich talks. They were clear and confident.
Check Crescando.
RDBMS SENSITIVE TO:
- full table scan
- read-write concurrency
Partitioning and clustering of data
INDEX QUERIES RATHER THAN DATA - this was mentioned in other talks as well. I think also in column databases
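A rough Python sketch of what "index queries rather than data" could look like. This is my own reading of the idea, not the algorithm from the talk, and the records and queries are made up. The pending equality queries are put into an index, and a single scan over the data probes that index for each record.

```python
from collections import defaultdict


def scan_with_query_index(records, queries):
    """Answer many equality queries with one pass over the data.

    records: list of dicts
    queries: dict of query_id -> (column, value)
    """
    # Build the index over the queries, not over the records.
    query_index = defaultdict(list)
    for query_id, (column, value) in queries.items():
        query_index[(column, value)].append(query_id)
    columns = {column for column, _ in queries.values()}

    results = {query_id: [] for query_id in queries}
    # One scan: each record probes the query index once per column.
    for record in records:
        for column in columns:
            probe = (column, record.get(column))
            for query_id in query_index.get(probe, []):
                results[query_id].append(record)
    return results


records = [
    {"id": 1, "city": "Lyon", "status": "open"},
    {"id": 2, "city": "Zurich", "status": "open"},
    {"id": 3, "city": "Lyon", "status": "closed"},
]
queries = {
    "q1": ("city", "Lyon"),
    "q2": ("status", "open"),
    "q3": ("city", "Oslo"),
}
results = scan_with_query_index(records, queries)
```

The cost of the scan is shared by all queries in the batch, so adding more queries grows the index but not the number of passes over the data.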
Panel 2
How Best to Build Web-Scale Data Managers? A Panel Discussion.
Read abstract: http://vldb2009.org/?q=node/23/index72dc.html?q=node/23
- Philip A. Bernstein mentions that the db field is late to this field: Web Scale Data Managers. WDM as he has coined it.
- WDMs are self-managing and have no transactions; WDMs came from the operating systems community
- He asks what the db community can add to WDM
- Seems WDM systems do not use transactions
- Examples of these WDM systems: Google's BigTable, HBase, PNUTS from Yahoo
- SAP uses the db as a key value store
- WDM systems have abandoned ACID
- So, one question is should we improve classical databases to include these WDM features (I assume map-reduce, key-value store features)
- Or add to WDM systems? Add transaction features? Like HadoopDB.
- Madden mentions http://highscalability.com
- http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster ...twitter uses memcache. Not using transactions. 300 inserts/tweets per second
- ....mentions between the lines that open source dbs don't scale....seems to be the perception in some of the talks...
- MySQL and PostgreSQL don't scale out of the box
- Streaming databases
- twitter partitions the database. Mm read more.
- proprietary databases too expensive
- mentions Parallel Databases
- http://pgfoundry.org/projects/bizgres/
- provides this classification:
- FN execution (low level): Google File System + MapReduce
- Stores: PNUTS,BigTable,Dynamo
- Analysis: Pig,Hive,SCOPE
- WDM FN exec features: Easy scaling, costs less to scale, unix-like, fast, easy data loading
- These systems mainly lack db features, which is good for the db community as we can add them
- Adding: queries , indexing, tx support, schema management
- Data loading is an extremely underappreciated barrier to everyday db usage
- Hellerstein.
- You need developers to support the cloud platform
- Cloud is a data-centric programming challenge
- Read: the Cloud goes boom
- Brian Cooper
- Drop ACID :)
- Drop SQL
- Many web apps don't need ACID
- ACID does not scale
- today there is cross continental communication
- need weak consistency
- simple queries should use a restful API
- complex queries: write in PIG http://hadoop.apache.org/pig/
- get a good benchmark to test...
- hierarchical dbs were a good idea, persistent queues, materialized views?....
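Several panel points (key-value stores, Twitter partitioning its database, no transactions) come down to spreading keys over many nodes. A toy Python sketch of hash partitioning, with the PartitionedStore class and its in-memory dicts standing in for real servers; nothing here reflects how Twitter, PNUTS or BigTable actually do it.

```python
import hashlib


class PartitionedStore:
    """Toy key-value store that spreads keys over a fixed set of partitions."""

    def __init__(self, num_partitions=4):
        # Each dict stands in for one server.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Stable hash, so a key maps to the same partition on every run.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self._partition_for(key)].get(key, default)


store = PartitionedStore()
for user in ["alice", "bob", "carol", "dave"]:
    store.put(user, f"latest tweet from {user}")
```

Each put and get touches exactly one partition, which is what makes it easy to scale; anything that spans partitions (a transaction, a join) is where the missing db features would come in.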