Tuesday 25 Aug at the VLDB 2009 conference

The day started with a great keynote:

9:00-10:30

Keynote 1

Cloud Data Serving: Key-Value Stores to DBMSs
Raghu Ramakrishnan (Yahoo! Research)


  • He is part of the cloud computing effort at Yahoo
  • This talk was amazing. I wish I could find the presentation.
  • Mentions his colleague Brian Cooper
  • Key stores -> write a map reduce system...?
  • Typical apps: User login, posting in message boards
  • His presentation included a few graphs.
  • One of the graphs shows the following architecture components:
  • # Pnuts/Sherpa db. There can be a batch export to the Grid from the db for further processing?
  • # Storage: MObStor
  • # Search: Vespa
  • # Cache: memcached
  • # messaging: Tribble
  • Demand for cloud storage has led to simplified key-value (KV) stores
  • Horizontal "platform" cloud services -> Amazon
  • Functional "platform" cloud services ->
  • Cloud stack: YCS+vm/os+pnuts+hadoop (and YDOT FS, zookeeper)
  • Yahoo Search Index (Popular Searches) uses Hadoop
  • Yahoo content optimizer (main photo on the Yahoo main page) uses a key store
  • Spam detection uses machine learning
  • New conf in 2010: ACM symposium on Cloud Computing (http://www.nitc.ac.in/nitc/sac2010/cc.htm or SIGMOD 2010 ACM symposium on Cloud Computing)
  • Need for flexible schemas (relational databases not good at this...)
  • Shows graph that has these 3 components: Large Hadoop, Structured Record Storage Pnuts, Blob storage
  • Mentions CAP
  • Brewer's paper on CAP: http://portal.acm.org/citation.cfm?id=343502&dl=GUIDE&coll=GUIDE&CFID=51841926&CFTOKEN=44318240
  • Restricted transactions (sharded MySQL)
  • Object timelines (pnuts)
  • Eventual consistency (Amazon Dynamo)
  • http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  • http://glinden.blogspot.com/
  • NB: http://glinden.blogspot.com/2006/03/i-want-big-virtual-database.html
  • Pnuts/Sherpa -> parallel dbs
  • web services (restful)
  • Index management ...async..there is a paper on this...
  • Bulk read, Range Queries in YDOT, Bulk load-pre allocated tablets
  • Clustering -> asynch consistency model ...in Pnuts...
  • Pnuts uses versions
  • Distribution factors: load balancing, overfull tables grow or split...moving of "tablets"
  • Consistency check: per record mastering
  • He gave an example with one person living on the west coast and another on the east coast. He showed how a person's data could be moved from a server on the west coast to one on the east coast if, for example, that person moved east. He also explained how failover works in such an environment. The system performs these tasks automatically
  • Yahoo compared Pnuts, Cassandra (Facebook: http://www.facebook.com/note.php?note_id=24413138919) and HBase. Sharded MySQL server, Azure, Google MegaStore
  • Also see http://www.mail-archive.com/hbase-user@hadoop.apache.org/msg05800.html
  • Sharding, resharding....need to read up on this
  • Router in Pnuts
  • Sharded MySQL performed well.
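
The per-record mastering, versioning and asynchronous replication ideas noted above can be illustrated with a toy sketch. This is not Pnuts code: the `Replica` and `Cluster` classes, the region names and all method names are hypothetical, and replication is simulated by an explicit `replicate()` call rather than a real messaging layer.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One region's copy of the table: key -> (version, value)."""
    region: str
    rows: dict = field(default_factory=dict)


class Cluster:
    """Toy model of per-record mastering (all names are hypothetical)."""

    def __init__(self, regions):
        self.replicas = {r: Replica(r) for r in regions}
        self.master_of = {}   # key -> region that currently masters the record
        self.pending = []     # updates not yet applied at the other regions

    def write(self, key, value, from_region):
        # Every write for a record is ordered by that record's master region.
        master = self.master_of.setdefault(key, from_region)
        rows = self.replicas[master].rows
        version = rows.get(key, (0, None))[0] + 1
        rows[key] = (version, value)
        self.pending.append((key, version, value, master))
        return version

    def replicate(self):
        # Asynchronous propagation, simulated here as an explicit step.
        for key, version, value, master in self.pending:
            for region, replica in self.replicas.items():
                if region == master:
                    continue
                if replica.rows.get(key, (0, None))[0] < version:
                    replica.rows[key] = (version, value)
        self.pending.clear()

    def read(self, key, region):
        # A local read may return an older version until replicate() runs.
        return self.replicas[region].rows.get(key, (0, None))

    def move_master(self, key, new_region):
        # E.g. the user moved coasts: catch up, then hand over mastership.
        self.replicate()
        self.master_of[key] = new_region


cluster = Cluster(["west", "east"])
cluster.write("alice", "status: hello", from_region="west")
stale = cluster.read("alice", "east")   # east has not seen the write yet
cluster.replicate()
fresh = cluster.read("alice", "east")
cluster.move_master("alice", "east")
v2 = cluster.write("alice", "status: moved", from_region="west")
```

In this sketch a read in a non-master region can lag behind until replication catches up, which is the trade-off the asynchronous model accepts. After `move_master`, later writes are ordered by the east region even when they originate in the west.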


Research sessions - Stream Processing I

Chair: Yanlei Diao (UMass Amherst) -- Room: Rhône 3A
Tagging Stream Data for Rich Real-Time Services

Rimma Nehme (Microsoft Jim Gray Systems Lab), Elke Rundensteiner (WPI), Elisa Bertino (Purdue Univ.)


  • Mentions solid state storage: no mechanical latency
  • Need to decide what to place on the HDD or SSD for the right cost
  • One better at random access and the other at sequential access
  • Placement plan
  • Used the TPC-H and TPC-C benchmarks (20 warehouses of 2.2 GB...)
  • MTTF?
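
The HDD/SSD placement decision noted above can be sketched as a simple greedy heuristic. This is only an illustration and not the method from the talk: the function name, the object names, the sizes and the "estimated saving" figures are all made up, and a real advisor would derive such estimates from profiling a workload.

```python
def plan_placement(objects, ssd_capacity_gb):
    """Greedy sketch: put objects with the best saving per GB on the SSD.

    `objects` is a list of (name, size_gb, estimated_saving) tuples, where
    estimated_saving is a hypothetical measure of I/O time saved by moving
    the object from HDD to SSD. All numbers used below are invented.
    """
    ranked = sorted(objects, key=lambda o: o[2] / o[1], reverse=True)
    on_ssd, on_hdd, free = [], [], ssd_capacity_gb
    for name, size_gb, _saving in ranked:
        if size_gb <= free:
            on_ssd.append(name)
            free -= size_gb
        else:
            on_hdd.append(name)
    return on_ssd, on_hdd


objects = [
    ("orders_index", 2.0, 90.0),     # randomly accessed, large saving
    ("lineitem_table", 20.0, 40.0),  # mostly scanned sequentially
    ("customer_table", 4.0, 60.0),
    ("history_log", 6.0, 6.0),
]
on_ssd, on_hdd = plan_placement(objects, ssd_capacity_gb=8.0)
```

With 8 GB of SSD, the small randomly accessed objects end up on the SSD, while the large sequentially scanned table stays on the HDD.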

StatAdvisor: Recommending Statistical Views
Amr El-Helw (Univ. of Waterloo), Ihab Ilyas (Univ. of Waterloo), Calisto Zuzarte (IBM Toronto)

Better to read the paper. It was a bit too technical for me and not in my field

An Object Placement Advisor for DB2 Using Solid State Storage
Mustafa Canim (Univ. of Texas at Dallas), Bishwaranjan Bhattacharjee (IBM T.J. Watson Research Center), George Mihaila (IBM T.J.Watson Research Center), Christian Lang (IBM T.J.Watson Research Center), Ken Ross (Columbia Univ.)

Better to read the paper. It was a bit too technical for me and not in my field

Research sessions - Cloud Computing and Data Warehousing

Chair: Daniel Abadi (Yale U., USA)

Consistency Rationing in the Cloud: Pay only when it matters
Tim Kraska (ETH Zurich), Martin Hentschel (ETH Zurich), Gustavo Alonso (ETH Zurich), Donald Kossmann (ETH Zurich)


Locking Key Ranges with Unbundled Transaction Services
David Lomet (Microsoft Research), Mohamed Mokbel (Univ. of Minnesota)

  • Key ranges have been in SQL Server for the last 10 years
  • they wanted to modify it
  • multi-core architecture, cloud datastore with transactions
  • New locking protocol in this field: the partition lock protocol
  • Shared lock of ranges
  • As in most talks at VLDB, this paper included a performance analysis

On-the-fly Progress Detection in Iterative Stream Queries
Badrish Chandramouli (Microsoft Research), Jonathan Goldstein (Microsoft Research), David Maier (Portland State Univ.)


  • Involved pattern matching
  • DAGs
  • Iterative queries
  • Reachability will also work with this....
  • FFP
  • high watermark
  • Work included in MS-Stream
  • Recursive queries

Research sessions - Query Processing on Modern Hardware

Chair: Anastasia Ailamaki (EPFL)
Lazy-Adaptive Tree: An Optimized Index Structure for Flash Devices
Devesh Agrawal (UMass Amherst), Deepak Ganesan (UMass Amherst), Ramesh Sitaraman (UMass Amherst), Yanlei Diao (UMass Amherst), Shashi Singh (UMass Amherst)
  • Indexing over flash
  • SSD fast random reads
  • Out-of-place updates
  • Expensive random writes - Birrell
  • May 2007. Andrew Birrell, Michael Isard, Chuck Thacker and Ted Wobber. A Design for High-performance Flash Disks. ACM SIGOPS Operating Systems Review
  • LA-Tree -> B+Tree
  • Lazy updates
  • used micro benchmarks (I have info on this in my MSc dissertation)
  • Flash increases GC
  • They used the TPC-C
  • index trace
  • FlashDB based on B+tree
  • BFTL?
  • Interesting paper and research
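
The lazy-update idea noted above (avoid many small random writes to flash by batching them) can be sketched with a toy index. This is not the LA-Tree itself: the `LazyIndex` class, its fixed `buffer_limit` and the use of plain dictionaries are all assumptions made for illustration.

```python
class LazyIndex:
    """Toy index that batches updates before writing them out."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit
        self.buffer = {}        # recent updates held in memory
        self.stored = {}        # stands in for the index kept on flash
        self.flush_count = 0    # number of batched writes to "flash"

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        self.stored.update(self.buffer)   # one batched write instead of many
        self.buffer.clear()
        self.flush_count += 1

    def get(self, key):
        # The newest data may still be in the buffer, so check it first.
        if key in self.buffer:
            return self.buffer[key]
        return self.stored.get(key)


index = LazyIndex(buffer_limit=4)
for i in range(10):
    index.put(f"k{i}", i)
```

Ten updates here cause only two batched writes, with the last two updates still waiting in the buffer. The cost is that every lookup has to consult the buffer as well as the stored index.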

MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases
Rubao Lee (The Ohio State Univ.), Xiaoning Ding (The Ohio State Univ.), Feng Chen (The Ohio State Univ.), Qingda Lu (The Ohio State Univ.), Xiaodong Zhang (The Ohio State Univ.)

  • Check IEEE Spectrum May 2008
  • There is an increase in the number of cores
  • Star Schema benchmark
  • Hash join, index join
  • Modifications were made to both PostgreSQL and the Linux kernel so that cache conflicts can be minimized and the cache balanced between the cores. Cache allocation.
  • My question: how do you determine if a query has good or bad locality in the cache?
  • Page colouring for cache in the OS
  • MUST READ THIS PAPER

SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units
Thomas Willhalm (Intel GmbH), Nicolae Popovici (Intel GmbH), Yazan Boshmaf (SAP AG), Hasso Plattner (Hasso-Plattner-Institut), Alexander Zeier (Hasso-Plattner-Institut), Jan Schaffner (Hasso-Plattner-Institut)
  • Column store is mentioned...
  • Data is stored in memory as columns and compressed
  • SAP Netweaver BWA........
  • SIMD +Intel SSE
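
The idea of scanning a compressed in-memory column can be sketched as follows. The notes do not say which compression scheme the paper uses, so dictionary encoding is an assumption here, as are the function names and the sample data. The sketch is plain Python and uses no SIMD instructions. It only shows why a scan over small integer codes is a natural fit for vector units.

```python
def encode_column(values):
    """Dictionary-encode a column: returns (sorted dictionary, list of codes)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def scan_equals(dictionary, codes, wanted):
    """Return the row positions where the column equals `wanted`."""
    try:
        target = dictionary.index(wanted)
    except ValueError:
        return []   # value never occurs in the column
    # The scan compares small integer codes, not the original values.
    return [row for row, code in enumerate(codes) if code == target]


country = ["DE", "FR", "DE", "US", "FR", "DE"]
dictionary, codes = encode_column(country)
matches = scan_equals(dictionary, codes, "DE")
```

The predicate is translated to a code once, and the scan then runs over a dense array of integers.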
