Database week 13

Comp 305/488: Database Administration, Corboy 208, 4:15 Tuesdays

Week 13

April 14

Read in Elmasri & Navathe (EN)

Chapter 19

Facebook

A couple of their blog posts:

Data-center fabric, with some great diagrams
facebook data-center fabric

Scaling the Facebook data warehouse to 300 PB

EXPLAIN output on Linux versus Windows systems

The same query can produce different EXPLAIN output on the two systems. Apparently any of several things may be going on:

The size of the tables is used in the query-plan calculation.
The history of past queries may be used, if statistics are being collected.
There are differences in versions (and, more likely, in sub-versions built with different compilation options).
There is at least one documented bug filed over this.
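
To make the first point concrete, here is a toy cost comparison in Python. It is only a sketch under stated assumptions: the function name, the page size, the index fanout, and the cost formulas are all invented for illustration and are not MySQL's actual optimizer model. It shows how the same query shape can get a different plan purely because the table has a different number of rows.

```python
import math

def choose_plan(table_rows, matching_rows, rows_per_page=100, index_fanout=100):
    """Pick the cheaper access path under a toy cost model (not MySQL's)."""
    # Full scan: read every page of the table once.
    full_scan_cost = max(1, math.ceil(table_rows / rows_per_page))
    # Index lookup: descend the B-tree, then fetch one page per matching row.
    depth = max(1, math.ceil(math.log(max(table_rows, 2), index_fanout)))
    index_cost = depth + matching_rows
    return "index lookup" if index_cost < full_scan_cost else "full scan"

# Same query (one matching row), different table sizes.
small = choose_plan(table_rows=50, matching_rows=1)
large = choose_plan(table_rows=1_000_000, matching_rows=1)
# Large table, but half the rows match: the index no longer pays off.
unselective = choose_plan(table_rows=1_000_000, matching_rows=500_000)
```

Under these assumptions the 50-row table is scanned in full while the million-row table uses the index, so two installations loaded with different amounts of data could plausibly report different plans for the same statement.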

MySQL Cluster

I'm still working on this, but I have a basic system set up:

One management node
One mysqld node
Two data nodes

The straightforward way to use MySQL Cluster is to use the mysqld node as an SQL interface to the data, just like using MySQL itself. However, it is also possible to access the data directly, using other APIs; this is where MySQL Cluster becomes a "NoSQL" environment (here quite definitely in the sense of "Not only SQL").

The data nodes can be used for:

"sharded" data: different subsets of the data, partitioned by a hash of the primary key, are kept on different nodes
replicated data
load balancing
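
The three uses above can be sketched together in a few lines of Python. This is a minimal illustration, not how the NDB engine is implemented: the node names, the function names, the use of crc32 as the hash, and the "next node is the backup" rule are all assumptions made for the example.

```python
import zlib

# Hypothetical names for the two data nodes in the setup above.
NODES = ["data-node-1", "data-node-2"]

def partition_of(key, num_partitions):
    """Map a key to a partition. crc32 is a stand-in hash, chosen because
    it is deterministic across runs (Python's built-in hash() of str is not)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def replicas_of(partition, nodes=NODES):
    """Return (primary, backup) nodes for a partition.
    Primaries alternate between nodes, which spreads the load."""
    primary = nodes[partition % len(nodes)]
    backup = nodes[(partition + 1) % len(nodes)]
    return primary, backup

def place(keys, num_partitions=2):
    """Compute, for each key, its partition and the nodes that hold it."""
    placement = {}
    for key in keys:
        p = partition_of(key, num_partitions)
        placement[key] = (p,) + replicas_of(p)
    return placement

placement = place(["alice", "bob", "carol", "dave"])
for key, (p, primary, backup) in placement.items():
    print(f"{key}: partition {p}, primary {primary}, backup {backup}")
```

Each key lands in exactly one partition (sharding), each partition lives on two nodes (replication), and the primary role alternates between the nodes (load balancing).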

MySQL Cluster does transparent auto-sharding; the DBA can supposedly remain unaware of it.

The mysqld node is just one of many possible "application layers" used to access the data nodes. It uses the NDB (NDBCLUSTER) storage engine, which is somewhat different from InnoDB.

Other APIs include:

Native NDB library in C++
ClusterJ/JNI
JavaScript/Node.js
Apache/mod_ndb (for interfacing with Apache web servers)
memcached/ndb_eng

Transactions.html

databases.html#pwhash

Permissions and Security

Recovery and ARIES