Comp 305/488: Database Administration, Corboy 208, 4:15 Tuesdays
Week 13
April 14
Read in Elmasri & Navathe (EN)
Facebook
A couple of their blog posts:
- Data-center fabric, with some great diagrams
- Scaling the FB datacenter to 300 PB
EXPLAIN output for Linux versus Windows systems
The same query can produce different EXPLAIN output on the two systems.
Apparently any of several things may be going on:
- The size of the tables is used in the query-plan calculation.
- The history of past queries may be used, if statistics are
being collected
- There are differences in versions (and, more likely, sub-versions with
different compilation options)
- There is at least one documented bug filed over this.
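When comparing the two systems, it helps to see exactly which EXPLAIN columns differ. Here is a minimal sketch of that comparison. It assumes each plan has already been fetched as a list of per-row dictionaries keyed by the usual EXPLAIN column names; the function name diff_explain and both sample plans are made up for illustration and are not output from any real server.

```python
def diff_explain(plan_a, plan_b,
                 columns=("table", "type", "possible_keys", "key", "rows", "Extra")):
    """Return (row_index, column, value_a, value_b) for every differing cell."""
    diffs = []
    # Compare the plans row by row, over the rows they have in common.
    for i, (row_a, row_b) in enumerate(zip(plan_a, plan_b)):
        for col in columns:
            if row_a.get(col) != row_b.get(col):
                diffs.append((i, col, row_a.get(col), row_b.get(col)))
    # A different number of plan rows is itself a difference worth reporting.
    if len(plan_a) != len(plan_b):
        diffs.append((min(len(plan_a), len(plan_b)), "<row count>",
                      len(plan_a), len(plan_b)))
    return diffs


# Hypothetical EXPLAIN rows (invented for this example).
linux_plan = [
    {"table": "employee", "type": "ALL", "possible_keys": "dno",
     "key": None, "rows": 8, "Extra": "Using where"},
]
windows_plan = [
    {"table": "employee", "type": "ref", "possible_keys": "dno",
     "key": "dno", "rows": 3, "Extra": ""},
]

for row, col, a, b in diff_explain(linux_plan, windows_plan):
    print(f"row {row}, {col}: linux={a!r} windows={b!r}")
```

If the rows estimates differ, table size or collected statistics are the likely cause; if the estimates match but the chosen key differs, a version or build difference is the more plausible suspect.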
MySQL Cluster
I'm still working on this, but I have a basic system set up:
- One management node
- One mysqld node
- Two data nodes
The straightforward way to use MySQL Cluster is to use the mysqld node as
an SQL interface to the data, just like using MySQL itself. However, it is
also possible to access the data directly, using other APIs; this is where
MySQL Cluster becomes a "NoSQL" (here pretty definitively "Not only
SQL") environment.
The data nodes can be used for
- "sharded" data: different subsets of the data, partitioned by key,
are kept on different nodes (MySQL Cluster partitions by a hash of the
primary key by default, rather than by key range)
- replicated data
- load balancing
MySQL Cluster does transparent auto-sharding; the DBA can supposedly
remain unaware of it.
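To make the idea of key-based sharding concrete, here is a minimal sketch of how rows might be assigned to data nodes by hashing the primary key. This is an illustration only, not MySQL Cluster's actual algorithm: the MD5-based hash is a stand-in, the node names are hypothetical, and replication between nodes is ignored.

```python
import hashlib


def partition_for(primary_key, num_partitions):
    """Map a primary key to a partition number in [0, num_partitions)."""
    # A stable hash is used so the same key always lands in the same partition.
    digest = hashlib.md5(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def place_rows(keys, nodes):
    """Group keys by the node that would store them (one partition per node)."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[nodes[partition_for(key, len(nodes))]].append(key)
    return placement


DATA_NODES = ["ndbd-1", "ndbd-2"]  # hypothetical names for the two data nodes

placement = place_rows(range(1, 1001), DATA_NODES)
for node, keys in placement.items():
    print(node, "holds", len(keys), "rows")
```

Because the hash spreads keys roughly evenly, neither node has to hold the entire table, and a lookup by primary key can be routed to one node without consulting the others. That routing is the part the DBA does not have to manage by hand.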
The mysqld node is just one of many possible "application layers" used to
access the data nodes. It uses the NDB/NDB-Cluster storage engine, which
is somewhat different from InnoDB.
Other APIs include:
- Native NDB library in C++
- ClusterJ/JNI
- JavaScript/Node.js
- Apache/mod_ndb (for interfacing with Apache web servers)
- memcached/ndb_eng
Transactions.html
databases.html#pwhash
Permissions and Security
Recovery and ARIES