When designing a database schema, choosing an appropriate primary key is critical. A primary key must be unique, efficient for indexing, and fit the use case’s security and performance requirements. This article delves into why UUIDs (Universally Unique Identifiers) are often used and how they compare to other primary key strategies like auto-incremented integers or longs.
Why Not Use a Natural Key (e.g., Email) as Primary Key?
Using a natural key, like an email address, as a primary key is problematic for several reasons:
- String Indexing Overhead: Databases automatically create an index on the primary key. Indexing a variable-length string such as an email is typically slower than indexing a fixed-width numeric type, because string comparisons cost more and each key takes more space, both in the index and in every foreign key that references it.
- Changes to Key: Emails can change. Updating a primary key creates a ripple effect in all related tables.
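- Security Risks: Primary keys tend to appear in URLs, logs, and API responses, so using an email as the key exposes personal data and makes it easier to probe for known addresses (an enumeration attack).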
Why Auto-Incremented Integers or Longs?
Benefits:
- Efficient Indexing: Numeric data types (e.g., integers) are compact and allow fast indexing.
- Simple to Implement: Auto-incrementing integers or longs are straightforward for database management.
Problems:
- Limited Range: A 32-bit integer can store $2^{32}$ values (about 4.3 billion, or about 2.1 billion positive values when signed), which may not suffice for large systems. 64-bit longs ($2^{64}$, roughly $1.8 \times 10^{19}$ values) remove the range problem but come with their own challenges.
- Predictability: Sequential IDs can expose patterns. For example, someone could easily guess the next ID and scrape sensitive data (e.g., Twitter streams or e-commerce product details).
- Scalability: Auto-increment doesn’t scale well across distributed systems. Different databases or shards need unique ID ranges, which introduces complexity and imbalance.
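To put the range limits above in perspective, here is a small back-of-the-envelope sketch. The insert rates are made-up assumptions for illustration, and `years_until_exhausted` is a hypothetical helper, not part of any library.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

INT32_MAX = 2**31 - 1      # largest signed 32-bit value: 2,147,483,647
UINT32_COUNT = 2**32       # number of distinct 32-bit values: 4,294,967,296
INT64_MAX = 2**63 - 1      # largest signed 64-bit value: 9,223,372,036,854,775,807


def years_until_exhausted(max_id, inserts_per_second):
    """Years until an auto-increment counter starting near 0 reaches max_id."""
    return max_id / inserts_per_second / SECONDS_PER_YEAR


# Assumed workloads: 1,000 inserts/s for the 32-bit case, 1,000,000 inserts/s for 64-bit.
print(f"int32 at 1k inserts/s: {years_until_exhausted(INT32_MAX, 1_000):.3f} years")
print(f"int64 at 1M inserts/s: {years_until_exhausted(INT64_MAX, 1_000_000):,.0f} years")
```

Under these assumptions a signed 32-bit key runs out in under a month, while a 64-bit key lasts far longer than any realistic system lifetime, so for longs the practical concerns are predictability and distribution rather than range.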
UUID: A Better Alternative
A UUID (Universally Unique Identifier) is a 128-bit number designed to be unique across systems without central coordination. UUIDs are usually written as 32 hexadecimal digits, grouped by hyphens into a 36-character string, and 128 bits allow $2^{128}$ (about $3.4 \times 10^{38}$) distinct values.
Example of Hexadecimal Representation:
Hexadecimal uses base-16:
- Decimal 0 to 15 maps to:
0, 1, 2, ..., 9, A, B, C, D, E, F
- Example UUID:
550e8400-e29b-41d4-a716-446655440000
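As a rough illustration of this representation, the example UUID can be parsed with Python's standard `uuid` module to compare its textual and binary forms. The variable names are only for illustration.

```python
import uuid

u = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")

hex_digits = u.hex   # 32 hexadecimal digits, no hyphens
canonical = str(u)   # hyphenated form, 36 characters
raw = u.bytes        # the underlying 128 bits as 16 bytes
as_int = u.int       # the same value as a single integer

print(len(hex_digits), len(canonical), len(raw))  # 32 36 16
print(u.version)                                  # 4
```

The version digit is the first character of the third group (`41d4`), so this particular example is a UUIDv4.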
Types of UUIDs:
- UUIDv1: Based on timestamp and node (often the MAC address). It can leak system information.
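- UUIDv4: Randomly generated (122 random bits). The values are hard to guess, but because they arrive in no particular order, they can slow down insertion in sorted tables.
- UUIDv7: Combines a millisecond Unix timestamp with random bits, so newly generated UUIDs generally sort after older ones. Ordering within the same millisecond, or across machines with skewed clocks, is not guaranteed unless the generator adds a monotonic counter.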
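For a quick look at the first two versions, Python's standard `uuid` module provides `uuid1()` and `uuid4()`. UUIDv7 support in the standard library depends on the Python version, so it is not used in this sketch.

```python
import uuid

u1 = uuid.uuid1()  # timestamp + node identifier
u4 = uuid.uuid4()  # random

# The node field of a v1 UUID is 48 bits wide. It is often the MAC address,
# though Python falls back to a random value when no MAC can be determined.
print("v1:", u1, "node:", hex(u1.node))
print("v4:", u4)
print("versions:", u1.version, u4.version)
```

Printing the node of a v1 UUID shows why that version can leak information about the machine that generated it.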
Why UUIDv7?
- Sequential Order: Because the timestamp occupies the most significant bits, newer UUIDv7 values sort after older ones (to millisecond precision), reducing the performance overhead of inserting rows in sorted tables.
- Scalability: Works seamlessly across distributed systems without coordination.
- Security: The random portion makes individual IDs hard to guess, although the embedded timestamp does reveal roughly when each ID was created.
Example of UUIDv7:
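- The first 48 bits (12 hexadecimal digits) hold the timestamp, which determines the sort order; the 7 that starts the third group marks the version: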
01890c9f-7000-7000-8000-000000000000
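The layout of such a value can be sketched in a few lines of Python. This is a minimal illustration of the bit layout as described in RFC 9562 (48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits), not a production generator: `uuid7_sketch` and `uuid7_timestamp_ms` are hypothetical helper names, and the sketch makes no attempt at monotonic ordering within a single millisecond.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style value: timestamp in the high bits, randomness below."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0xFFF                 # 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)


def uuid7_timestamp_ms(u):
    """Extract the 48-bit millisecond timestamp from a UUIDv7-style value."""
    return u.int >> 80


example = uuid.UUID("01890c9f-7000-7000-8000-000000000000")
print(example.version, hex(uuid7_timestamp_ms(example)))

older, newer = uuid7_sketch(1_000), uuid7_sketch(2_000)
print(older < newer)  # True: the later timestamp sorts higher
```

Because the timestamp sits in the most significant bits, comparing two values compares their creation times first and only falls back to the random bits when the timestamps are equal.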
Challenges with UUIDs in SQL Databases
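- Storage Overhead: UUIDs require 128 bits (16 bytes), compared with 4 bytes for a 32-bit integer or 8 bytes for a long. Storing them as text rather than in a binary or native UUID column takes 36 characters.
- Insertion Performance: In SQL tables sorted by primary key, a random UUID (e.g., UUIDv4) lands at an arbitrary position, so existing entries must be moved or index pages split to maintain order, causing significant disk I/O overhead.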
How SQL Tables Are Stored
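Many SQL databases store rows on disk sorted by the primary key (a clustered index, as in MySQL's InnoDB or, by default, SQL Server). Others, such as PostgreSQL, keep rows in an unordered heap but still maintain a sorted index on the key. In either case, when a new row is inserted:
- If the primary key value is less than existing keys, the new entry must go into the middle of the sorted structure, which means making room by shifting entries or splitting a full page.
- This extra work increases write latency for random keys.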
Why Sequential IDs Help:
Sequential IDs (like UUIDv7 or auto-increment) avoid shifting rows, improving insertion speed.
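The effect can be illustrated with a toy model. The sketch below uses a plain sorted Python list as a stand-in for an ordered table, which is a simplifying assumption (real engines use B-tree pages, not a flat array), and counts how many inserts land somewhere other than the end. `count_mid_inserts` is a hypothetical helper written for this example.

```python
import bisect
import random


def count_mid_inserts(keys):
    """Count inserts that land before the current end of a sorted list."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos != len(ordered):
            mid_inserts += 1  # existing entries must move to make room
        ordered.insert(pos, key)
    return mid_inserts


rng = random.Random(0)  # fixed seed so the run is repeatable
sequential_keys = list(range(1_000))                       # auto-increment style
random_keys = [rng.getrandbits(128) for _ in range(1_000)]  # UUIDv4 style

print("sequential:", count_mid_inserts(sequential_keys))
print("random:    ", count_mid_inserts(random_keys))
```

Sequential keys are always appended, so the count is zero, while nearly every random key falls somewhere in the middle.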
Solutions for Distributed Systems
In distributed environments:
- Partitioned Ranges: Assign a disjoint ID range to each node (e.g., Node 1: 1-10,000; Node 2: 10,001-20,000). However, this can create hotspots, because the most recently issued IDs tend to be fetched more often than older ones.
- Composite Keys: Combine machine ID, timestamp, and a local counter to ensure uniqueness.
- UUIDv7: Allows distributed databases to generate unique IDs independently while maintaining order.
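The composite-key option can be sketched as follows. The bit widths (10 bits for the machine ID, 12 for the counter, with the millisecond timestamp above them) are assumptions chosen for illustration, in the spirit of Snowflake-style IDs, and `CompositeIdGenerator` is a hypothetical class rather than an existing library.

```python
import threading
import time


class CompositeIdGenerator:
    """Packs timestamp | machine ID | local counter into one integer."""

    MACHINE_BITS = 10   # assumed: up to 1,024 machines
    COUNTER_BITS = 12   # assumed: up to 4,096 IDs per millisecond per machine

    def __init__(self, machine_id, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self._clock = clock          # returns the current time in milliseconds
        self._last_ms = -1
        self._counter = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = max(self._clock(), self._last_ms)  # never move backwards
            if now == self._last_ms:
                self._counter += 1
                if self._counter >= (1 << self.COUNTER_BITS):
                    now += 1             # counter exhausted: use the next millisecond
                    self._counter = 0
            else:
                self._counter = 0
            self._last_ms = now
            return (
                (now << (self.MACHINE_BITS + self.COUNTER_BITS))
                | (self.machine_id << self.COUNTER_BITS)
                | self._counter
            )


node_a = CompositeIdGenerator(machine_id=1)
node_b = CompositeIdGenerator(machine_id=2)
print(node_a.next_id(), node_b.next_id())
```

Each node needs only its own machine ID and clock, so no coordination is required at generation time. The trade-off is that machine IDs must be assigned uniquely up front.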