April 24
The number isn't quite as high as one might think, though Amazon doesn't like to publish figures. In 2016, Amazon hit 600 items per second. In 2017, during the post-Thanksgiving five-day sales period they hit maybe 200 million items in 120 hours, for about 400-500 items/second. Other sources suggest 300 transactions per second for a peak rate.
launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions
Note the observation that the lifetime of a naming schema may be decades, and the recommendation that tables be named in the singular.
Suppose you have table (K,A,B,C) and a functional dependency A→B. How do you factor this to eliminate the dependency?
Facebook timeline and denormalization:
https://www.facebook.com/note.php?note_id=10150468255628920
Tuning postgres to handle SSDs vs Spinning Iron: see postgresql.conf, Planner Cost Constants
Documentation at postgresql.org/docs/9.5/static/runtime-config-query.html.
Postgres serializable
Locking
and deadlocks
From the Postgres manual:
PostgreSQL automatically detects deadlock situations and resolves them by aborting one of the transactions involved, allowing the other(s) to complete. (Exactly which transaction will be aborted is difficult to predict and should not be relied upon.)
The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order.
Deadlock detection via wait-for graph loops
How Postgres detects deadlock
Wait-for graphs versus transaction-conflict graphs (Ti→Tj if Ti accesses X and later Tj does and at least one of the accesses is a write)
Three-way deadlock:
T1:
begin;
update part set quan_in_stock = quan_in_stock -1 where partnum = 101;
update part set quan_in_stock = quan_in_stock -1 where partnum = 102;
T2:
update part set quan_in_stock = quan_in_stock -1 where partnum = 102;
update part set quan_in_stock = quan_in_stock -1 where partnum = 103;
T3:
update part set quan_in_stock = quan_in_stock -1 where partnum = 103;
update part set quan_in_stock = quan_in_stock -1 where partnum = 101;
While we're at it, here are two queries that do the same thing:
-- really SLOOOW query:
select distinct pu.invnum from partupdate pu where (select count(*) from
partupdate p where p.invnum = pu.invnum) <> 10 order by pu.invnum
limit 200;
QUERY PLAN ----------------------------------------------------------------------------------------------- Limit (cost=9633821.86..9633822.36 rows=200 width=4) -> Sort (cost=9633821.86..9633826.83 rows=1990 width=4) Sort Key: pu.invnum -> HashAggregate (cost=9633715.95..9633735.85 rows=1990 width=4) Group Key: pu.invnum -> Seq Scan on partupdate pu (cost=0.00..9633666.43 rows=19808 width=4) Filter: ((SubPlan 1) <> 10) SubPlan 1 -> Aggregate (cost=483.88..483.88 rows=1 width=0) -> Seq Scan on partupdate p (cost=0.00..483.85 rows=10 width=0) Filter: (invnum = pu.invnum)
-- faster:
select pu.invnum from partupdate pu group by pu.invnum having count(*)
<> 10 order by pu.invnum limit 200;
QUERY PLAN ------------------------------------------------------------------------------------- Limit (cost=645.06..645.56 rows=200 width=4) -> Sort (cost=645.06..650.06 rows=2000 width=4) Sort Key: invnum -> HashAggregate (cost=533.62..558.62 rows=2000 width=4) Group Key: invnum Filter: (count(*) <> 10) -> Seq Scan on partupdate pu (cost=0.00..434.08 rows=19908 width
The second took 10 ms; the first took 92403 ms.
But the real reason the first was so slow was that there was no primary key, because I forgot to declare in partupdate that (invnum, partnum) was a key. With the key (and thus index) declared, we get
QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Limit (cost=0.29..70941.06 rows=200 width=4) -> Unique (cost=0.29..705860.96 rows=1990 width=4) -> Index Only Scan using partupdate2_pkey on partupdate2 pu (cost=0.29..705811.44 rows=19808 width=4) Filter: ((SubPlan 1) <> 10) SubPlan 1 -> Aggregate (cost=35.39..35.40 rows=1 width=0) -> Bitmap Heap Scan on partupdate2 p (cost=4.37..35.36 rows=10 width=0) Recheck Cond: (invnum = pu.invnum) -> Bitmap Index Scan on partupdate2_pkey (cost=0.00..4.36 rows=10 width=0) Index Cond: (invnum = pu.invnum)