Comp 305-001/488-305: Database Administration

Spring 2018: Tuesdays, 4:15-6:45, Comm 013

Week 14

April 24


Amazon transactions per second

The number isn't quite as high as one might think, though Amazon doesn't like to publish figures. In 2016, Amazon peaked at around 600 items sold per second. In 2017, during the five-day post-Thanksgiving sales period, they sold perhaps 200 million items in 120 hours, or roughly 460 items/second. Other sources suggest a peak rate of about 300 transactions per second.


SQL naming conventions

    launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions

Note the observation that the lifetime of a naming scheme may be decades, and the recommendation that tables be named in the singular.


Simple examples of normalization by factoring

Suppose you have a table (K,A,B,C), with key K, and a functional dependency A→B. How do you factor this to eliminate the dependency? Answer: decompose it into (K,A,C) and (A,B), where A becomes the key of the second table.
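As a concrete sketch in SQL (the table and column names here are hypothetical, since the example above is abstract):

```sql
-- Original: r(k, a, b, c) with key k and the functional dependency a -> b.
-- Factor out a -> b into its own table:

create table r2 (
    k integer primary key,
    a integer,
    c integer
);

create table ab (
    a integer primary key,   -- a determines b, so a is the key here
    b integer
);

-- The original rows can be reconstructed with a join:
-- select r2.k, r2.a, ab.b, r2.c from r2 join ab using (a);
```

Each (a,b) pair is now stored once in ab, rather than repeated in every row of r that shares that value of a.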

Normalization

    Facebook timeline and denormalization:
    https://www.facebook.com/note.php?note_id=10150468255628920



Tuning postgres to handle SSDs vs Spinning Iron: see postgresql.conf, Planner Cost Constants

Documentation at postgresql.org/docs/9.5/static/runtime-config-query.html.
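The standard advice (an illustration based on the documentation above, not a setting from the course notes) is to lower random_page_cost on SSDs, since random page reads there cost nearly the same as sequential ones; the default of 4.0 was tuned for spinning disks:

```
# postgresql.conf -- Planner Cost Constants
seq_page_cost = 1.0        # cost of a sequential page fetch (the baseline)
random_page_cost = 1.1     # default is 4.0; values near 1.0 suit SSDs
```

Lowering random_page_cost makes the planner more willing to choose index scans over sequential scans.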


To inspect the locks associated with a particular transaction (X stands for a transaction ID):

    select * from pg_locks where transactionid = X;

Postgres serializable

Locking and deadlocks

From the Postgres manual:

PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete. (Exactly which transaction will be aborted is difficult to predict and should not be relied upon.)

The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order.

Deadlock detection via wait-for graph loops

    How Postgres detects deadlock

    Wait-for graphs versus transaction-conflict graphs (Ti→Tj if Ti accesses X and later Tj does and at least one of the accesses is a write)
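A wait-for graph has an edge Ti→Tj when Ti is blocked waiting for a lock that Tj holds; a deadlock is exactly a cycle in this graph. A minimal sketch of cycle detection by depth-first search (illustrative only, not Postgres's actual implementation):

```python
def has_deadlock(wait_for):
    """wait_for maps each transaction to the set of transactions it waits on.
    Returns True if the wait-for graph contains a cycle (a deadlock)."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current DFS path / finished
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:        # back edge: cycle found
                return True
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in list(wait_for))

# The three-way deadlock below: T1 waits on T2, T2 on T3, T3 on T1.
print(has_deadlock({'T1': {'T2'}, 'T2': {'T3'}, 'T3': {'T1'}}))   # True
print(has_deadlock({'T1': {'T2'}, 'T2': {'T3'}, 'T3': set()}))    # False
```

Postgres runs this check only after a transaction has waited on a lock for deadlock_timeout (default 1 s), since building the graph is relatively expensive.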

Three-way deadlock (each transaction runs in its own session; if the first updates all succeed, then T1 blocks on T2, T2 blocks on T3, and T3 blocks on T1):

T1:

begin;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 101;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 102;

T2:

begin;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 102;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 103;

T3:

begin;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 103;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 101;
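Following the manual's advice above, the deadlock disappears if every transaction acquires its row locks in a consistent order, say ascending partnum. A sketch of T3 rewritten:

```sql
-- T3 now locks partnum 101 before 103, matching the order T1 and T2
-- already follow, so no cycle of waits can form.
begin;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 101;
update part set quan_in_stock = quan_in_stock - 1 where partnum = 103;
commit;
```

With this ordering, whichever transaction locks a given part first simply makes the others wait; none of them can hold a lock the first one needs.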


While we're at it, here are two queries that do the same thing:

-- really SLOOOW query:
select distinct pu.invnum from partupdate pu where (select count(*) from partupdate p where p.invnum = pu.invnum) <> 10 order by pu.invnum limit 200;

                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Limit  (cost=9633821.86..9633822.36 rows=200 width=4)
   ->  Sort  (cost=9633821.86..9633826.83 rows=1990 width=4)
         Sort Key: pu.invnum
         ->  HashAggregate  (cost=9633715.95..9633735.85 rows=1990 width=4)
               Group Key: pu.invnum
               ->  Seq Scan on partupdate pu  (cost=0.00..9633666.43 rows=19808 width=4)
                     Filter: ((SubPlan 1) <> 10)
                     SubPlan 1
                       ->  Aggregate  (cost=483.88..483.88 rows=1 width=0)
                             ->  Seq Scan on partupdate p  (cost=0.00..483.85 rows=10 width=0)
                                   Filter: (invnum = pu.invnum)


-- faster:
select pu.invnum from partupdate pu group by pu.invnum having count(*) <> 10 order by pu.invnum limit 200;

                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Limit  (cost=645.06..645.56 rows=200 width=4)
   ->  Sort  (cost=645.06..650.06 rows=2000 width=4)
         Sort Key: invnum
         ->  HashAggregate  (cost=533.62..558.62 rows=2000 width=4)
               Group Key: invnum
               Filter: (count(*) <> 10)
               ->  Seq Scan on partupdate pu  (cost=0.00..434.08 rows=19908 width=4)

The second took 10 ms; the first took 92403 ms.

But the real reason the first query was so slow is that the table had no primary key: I had forgotten to declare (invnum, partnum) as a key of partupdate. With the key (and thus an index) declared, we get

                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..70941.06 rows=200 width=4)
   ->  Unique  (cost=0.29..705860.96 rows=1990 width=4)
         ->  Index Only Scan using partupdate2_pkey on partupdate2 pu  (cost=0.29..705811.44 rows=19808 width=4)
               Filter: ((SubPlan 1) <> 10)
               SubPlan 1
                 ->  Aggregate  (cost=35.39..35.40 rows=1 width=0)
                       ->  Bitmap Heap Scan on partupdate2 p  (cost=4.37..35.36 rows=10 width=0)
                             Recheck Cond: (invnum = pu.invnum)
                             ->  Bitmap Index Scan on partupdate2_pkey  (cost=0.00..4.36 rows=10 width=0)
                                   Index Cond: (invnum = pu.invnum)
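The declaration that makes the difference looks like this (a sketch; the actual DDL isn't shown above, though the plan suggests the fixed table was named partupdate2):

```sql
-- Declaring the composite primary key creates a unique index, which the
-- planner then uses for the Index Only Scan and Bitmap Index Scan above.
alter table partupdate2 add primary key (invnum, partnum);
```

The correlated subquery is then answered from the index rather than by rescanning the whole table once per row.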

    



NoSQL and CAP