DocForge
Programmer's Wiki

GUID

From DocForge

A GUID, or Globally Unique Identifier, is a set of characters that identifies an object or data element and is guaranteed to be unique within a system. The scope of that guarantee varies: a GUID may be unique within one small system, such as one computer or one database, or across multiple, even very diverse, systems. Some GUIDs are truly global, being unique among all computer systems on the planet, but in practice they are more often unique only within a smaller set of systems.

Benefits

  • Uniqueness of identifiers prevents conflicts. Referencing something by just its identifier is sufficient because nothing else within the system can have the same identifier.
  • Comparison of data can often be simplified with GUIDs. To compare any two objects within the system, only the GUIDs need to be compared to determine if they are referring to the exact same thing. The other properties of each object do not need to be compared.
  • In larger systems with multiple subsystems, ranges of GUIDs can be used within each subsystem to prevent clashes in the overall system. This is especially useful with multi-master database replication.

Drawbacks

  • Implementation of GUIDs can sometimes be difficult. Algorithms that maintain uniqueness across multiple disconnected systems are more complex, and often require each system to have its own unique identifier. For example, two separate systems may generate the same local identifier, but the combined identifiers remain unique if each system appends something that is unique to itself, such as a processor serial number.
  • Maintaining uniqueness across multiple systems can add administrative complexity.
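
As a rough illustration of the appending idea described above, the following Python sketch combines a per-system counter with a system identifier. The class name `LocalGuidGenerator` and the identifiers `sysA` and `sysB` are hypothetical; a real system would have to obtain a value that is reliably unique per machine, which is where much of the difficulty lies.

```python
import itertools


class LocalGuidGenerator:
    """Builds identifiers of the form '<counter>-<system_id>' (illustrative only)."""

    def __init__(self, system_id):
        # system_id is assumed to be unique per system, e.g. a serial number.
        self.system_id = system_id
        self._counter = itertools.count(1)

    def next_guid(self):
        # The counter alone may collide with another system's counter;
        # the appended system_id is what keeps the combined value unique.
        return f"{next(self._counter)}-{self.system_id}"
```

Two generators created with different system identifiers can both hand out counter value 1, yet the resulting strings differ because of the suffix.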

Implementations

There are a variety of ways to implement and use GUIDs. It is typically critical to consider GUIDs early in a system's design.

Database Implementation

For software using shared databases it's possible to maintain a simple GUID within one database table. This is especially useful with multi-master database replication, where each master must provide unique record identifiers that will not later conflict when the databases are synchronized.

One table stores the current GUID for each master database server. Assume there are three master servers: the first starts with GUID 1, the second with 1 million, and the third with 2 million. The first and second servers then each have about 1 million GUIDs to work with before reaching the next server's starting value, and the third has the rest of the integer range above 2 million.

Note that this is just an example. A real implementation must carefully consider how quickly a range of GUIDs will be used. For some systems with vast amounts of data, even the largest integer type may not be large enough when shared among multiple databases. This implementation, with an unsigned 8-byte integer (maximum value 18,446,744,073,709,551,615), is typically sufficient for a reasonably large website.

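-- One row per master server; id holds the next unused GUID for that server.
CREATE TABLE guid (
  server VARCHAR(255) NOT NULL,
  id BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (server));
 
-- Seed each master with the start of its own range.
INSERT INTO guid (server, id) VALUES ('db1', 1);
INSERT INTO guid (server, id) VALUES ('db2', 1000000);
INSERT INTO guid (server, id) VALUES ('db3', 2000000);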
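
To gauge how quickly a range such as the ones seeded above would be used up, a back-of-the-envelope calculation can help. The Python sketch below is only an estimate under the assumption of a steady consumption rate; the function name and the rate in the usage note are made up for illustration.

```python
# Largest value an unsigned 8-byte integer (BIGINT UNSIGNED) can hold.
MAX_BIGINT_UNSIGNED = 2**64 - 1


def days_until_exhausted(next_id, range_end, ids_per_day):
    """Whole days before the ids in [next_id, range_end) run out at a steady rate."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = max(range_end - next_id, 0)
    return remaining // ids_per_day
```

For instance, a server that starts at 1, must stay below 1,000,000, and consumes a hypothetical 50,000 GUIDs per day has `days_until_exhausted(1, 1_000_000, 50_000)` whole days of headroom before its range needs to be moved.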

The software written to use this database system must now always retrieve GUIDs from the guid table by locking it for writes, selecting the current id based on the server name, increasing the id by the number of GUIDs required, and then unlocking the table.
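
The lock, select, increase, unlock sequence might look roughly like the following Python sketch. It uses the standard `sqlite3` module only so that the example is self-contained: `BEGIN IMMEDIATE` stands in for the table write lock, and the column types are SQLite equivalents, both assumptions rather than part of the MySQL setup described here. The names `allocate_guids` and `demo_connection` are hypothetical.

```python
import sqlite3


def allocate_guids(conn, server, count):
    """Reserve `count` consecutive GUIDs for `server` and return the first one."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # Take the write lock before reading, so no other client can read the
    # same id between our SELECT and UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM guid WHERE server = ?", (server,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no guid row for server {server!r}")
        first = row[0]
        # Advance the stored id past the block that was just reserved.
        conn.execute(
            "UPDATE guid SET id = ? WHERE server = ?", (first + count, server)
        )
        conn.execute("COMMIT")  # releases the lock
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return first


def demo_connection():
    """In-memory stand-in for one master, seeded like the example table."""
    # isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE guid (server TEXT PRIMARY KEY, id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO guid (server, id) VALUES (?, ?)",
        [("db1", 1), ("db2", 1000000), ("db3", 2000000)],
    )
    return conn
```

A caller that needs five identifiers on `db1` would receive the first of them and may use that value and the four that follow; the next caller starts after the reserved block.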

Of course, there will come a time when one of the servers runs out of GUIDs. Before that point, each server's id must be advanced to a new range that does not overlap any range already in use.

LOCK TABLES guid WRITE;
UPDATE guid SET id = 3000000 WHERE server = 'db1';
UPDATE guid SET id = 4000000 WHERE server = 'db2';
UPDATE guid SET id = 5000000 WHERE server = 'db3';
UNLOCK TABLES;

This effectively gives each server another batch of 1 million GUIDs to work with.
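
Choosing the new starting values by hand is easy to get wrong if one server has already consumed more than expected. One possible approach, sketched here in Python, is to start the new ranges at the first block boundary above the highest id currently stored. The function name `plan_new_ranges` and the sample ids in the usage note are hypothetical, and the fixed block size of 1 million simply mirrors the example.

```python
def plan_new_ranges(current_ids, block_size=1_000_000):
    """Map each server to the start of a fresh, non-overlapping block.

    current_ids maps server name -> id currently stored in the guid table.
    """
    highest = max(current_ids.values())
    # First block boundary strictly above every id stored so far.
    base = (highest // block_size + 1) * block_size
    # Hand out consecutive blocks in a stable (sorted) server order.
    return {
        server: base + i * block_size
        for i, server in enumerate(sorted(current_ids))
    }
```

With stored ids of 999,500 for `db1`, 1,800,000 for `db2`, and 2,400,000 for `db3`, this yields 3,000,000, 4,000,000, and 5,000,000, the same values used in the statements above.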

A GUID table is typically updated very frequently, and each master database does not need the IDs of the other master databases. The GUID table can therefore often be excluded from database replication. To simplify even further, the non-replicated table can hold just one row on each server. Storing the server name in the table can still help with maintenance, and it is also very helpful for redundancy and for recovery from system failures.

See Also