Windows Phone 7 Developer Launch - Learn More
kick it on DotNetKicks.com   Shout it  

Sequential Guid Algorithm – Improving the algorithm

My last post addressed generating Guid values which were sequential across system boundaries. The problem with other algorithms is two-fold. First, they are only sequential for a single system. Second, they are based off the MAC address of the system on which they were generated. When you need to generate client id values and store them in an SQL database these two properties create fragmented indexes and increase the number of page splits.

My last attempt was OK, but a problem with the algorithm was that the values were only sequential to an accuracy of 1 millisecond. While this would not be a problem for most systems, it still bugged me. Even though I could still generate sequential values at a rate of 1000/sec, real world systems are not so kind as to be so evenly spaced. Values are most likely to be farther apart, but generated in bursts.

I wanted an algorithm that was more accurate and allowed for more values to be generated per second, thus reducing page splits and fragmentation even more.

I took some time to research hi-resolution timers and found that the recommended way of using a timer was the QueryPerformanceCounter Windows API function. But during my research I found that the System.Diagnostics.Stopwatch already uses QueryPerformanceCounter, so I thought I’d give it a try.

In my previous set of tests in order to reduce fragmentation I added a Thread.Sleep(1) to prevent generating values at a rate of more than 1/millisecond. Using Stopwatch, I was able to remove the call to Thread.Sleep and had all the generated values were sequential.

Here’s the new implementation:

public class Comb
{
    private static Stopwatch perfTimer;
    private static long startTime;
 
    static Comb()
    {
        perfTimer = new Stopwatch();
        startTime = DateTime.UtcNow.Ticks;
        perfTimer.Start();
    }
 
    private static long GetTicks()
    {
        return startTime + perfTimer.ElapsedTicks;
    }
 
    /// <summary>
    /// Creates a new sequential guid (aka comb) <see cref="http://www.informit.com/articles/article.aspx?p=25862&seqNum=7"/>.
    /// </summary>
    /// <remarks>A comb provides the benefits of a standard Guid w/o the database performance problems.</remarks>
    /// <returns>A new sequential guid (comb).</returns>
    public static Guid NewComb()
    {
        byte[] uid = Guid.NewGuid().ToByteArray();
        byte[] binDate = BitConverter.GetBytes(GetTicks()); // use UTC now to prevent conflicts w/ date time savings
 
        // create comb in SQL Server sort order
        byte[] comb = new byte[uid.Length];
 
        // the first 7 bytes are random - if two combs
        // are generated at the same point in time
        // they are not guaranteed to be sequential.
        // But for every DateTime.Tick there are
        // 72,057,594,037,927,935 unique possibilities so
        // there shouldn't be any collisions
        comb[3] = uid[0];
        comb[2] = uid[1];
        comb[1] = uid[2];
        comb[0] = uid[3];
        comb[5] = uid[4];
        comb[4] = uid[5];
        comb[7] = uid[6];
 
        // set the first 'nibble of the 7th byte to '1100' so 
        // later we can validate it was generated by us
        comb[6] = (byte)(0xc0 | (0xf & uid[7]));
 
        // the last 8 bytes are sequential,
        // these will reduce index fragmentation
        // to a degree as long as there are not a large
        // number of Combs generated per millisecond
        comb[9] = binDate[0];
        comb[8] = binDate[1];
        comb[15] = binDate[2];
        comb[14] = binDate[3];
        comb[13] = binDate[4];
        comb[12] = binDate[5];
        comb[11] = binDate[6];
        comb[10] = binDate[7];
 
        return new Guid(comb);
    }
 
    /// <summary>
    /// Validates if comb was generated by this class
    /// </summary>
    /// <remarks>
    /// Guids generated by Guid.NewGuid() have a value of 
    /// 0100 for the first 4 bits of the 7th byte. Ours will
    /// have a value of 1100 for the 6th byte. We're checking that here.
    /// 
    /// We could do additional validation by verifying that
    /// the value of a new Guid is greater than the
    /// one being validated (or that the last 6 bytes
    /// resolve to a valid DateTime), but this should
    /// be enough for now.
    /// </remarks>
    public static bool IsComb(Guid value)
    {
        // get the 7th byte
        byte b = value.ToByteArray()[6];
 
        // make sure the first 'nibble' == 1100
        return (0xc0 & b) == 0xc0;
    }
 
    /// <summary>
    /// Validates Guid to determine the supplied
    /// value was generated by Comb.NewComb. If
    /// invalid an ArgumentException is thrown.
    /// </summary>
    /// <param name="value"></param>
    public static void ValidateComb(Guid value)
    {
        if (!Comb.IsComb(value))
            throw new ArgumentException("The supplied Id value was not generated by Comb.NewComb.");
    }
}
 

There is now a static constructor, two private members and a GetTicks() method. Stopwatch is not a DateTime, but a counter representing the elapsed time since the Start() method was called. So I record DateTime.UtcNow.Ticks at “startup” and then add the number of ticks from Stopwatch to the start time.

Running the test in a single thread for 100,000 records for the previous version resulted in 31% fragmentation. The new version is now down to 1.2%. Also, the records are all sequential whereas about half the records in the previous version were not indexed in the same order they were inserted.

However, using Task.Factory.StartNew on a dual core machine the fragmentation is about the same – 45%. Also the number of “mis-ordered” records is closer (60% verses 50% – about a 10% improvement).

The newer version definitely improves on the previous, but if you’re in a high-throughput environment with a large number of clients you’ll see a good amount of fragmentation still. However, the important thing to remember is that compared to standard Guid values (non-sequential) you’ll see much better performance. The reason is because your data distribution will be completely random across your indexes, while using a Comb ensures your data is always being added at the end of your index, even if values generated at the exact same moment will be fragmented. If you can afford to defrag your indexes you’ll be fine – whereas without using a Comb value your performance will still suffer.

Update: 10/29/2010

When I first wrote the NewComb method (before I came across the Sql Server sort order issue) I was using the 8th byte as a way to validate a Guid value and determine if it were generated by the NewComb method or not. During the discussion in the comments below I realized that when I changed the method to change the byte order to use the Sql Server sort order I was setting the 7th byte. This breaks the validation since, as it was pointed out to me, 1 in 16 values generated by Guid.NewGuid() could match my validation. So below is an update to fix that problem. This works because, as I said before, Guid.NewGuid(), sets the first nibble of the 8th byte to 0x40. Obviously this won’t work with Guid values generated outside of the .NET Framework. But if that is your situation, then this won’t work and you’ll need to find another means or just skip the validation altogether.

public static Guid NewComb()
{
    byte[] uid = Guid.NewGuid().ToByteArray();
    byte[] binDate = BitConverter.GetBytes(GetTicks()); // use UTC now to prevent conflicts w/ date time savings
 
    // create comb in SQL Server sort order
    byte[] comb = new byte[uid.Length];
 
    // the first 7 bytes are random - if two combs
    // are generated at the same point in time
    // they are not guaranteed to be sequential.
    // But for every DateTime.Tick there are
    // 72,057,594,037,927,935 unique possibilities so
    // there shouldn't be any collisions
    comb[0] = uid[0];
    comb[1] = uid[1];
    comb[2] = uid[2];
    comb[3] = uid[3];
    comb[4] = uid[4];
    comb[5] = uid[5];
    comb[6] = uid[6];
 
    // set the first 'nibble of the 8th byte to '1100' so 
    // later we can validate it was generated by us
    comb[7] = (byte)(0xc0 | (0xf & uid[7]));
 
    // the last 8 bytes are sequential,
    // these will reduce index fragmentation
    // to a degree as long as there are not a large
    // number of Combs generated per millisecond
    comb[9] = binDate[0];
    comb[8] = binDate[1];
    comb[15] = binDate[2];
    comb[14] = binDate[3];
    comb[13] = binDate[4];
    comb[12] = binDate[5];
    comb[11] = binDate[6];
    comb[10] = binDate[7];
 
    return new Guid(comb);
}
 
public static bool IsComb(Guid value)
{
    // get the 8th byte
    byte b = value.ToByteArray()[7];
 
    // make sure the first 'nibble' == 1100
    return (0xc0 & b) == 0xc0;
}
 

Update: 6/9/2011

By eliminating the Stopwatch (as suggested by Augi) and optimizing the steps to copy the arrays to the new Guid I’ve been able to reduce the time it takes to generate a new Comb by about 66%. The old version takes 22-25 milliseconds to generate 10,000 Combs, while the new version takes 6-9 milliseconds on my Phenom X4 with 4GB of RAM. Here’s the modified version:

public class Comb
{
    private static int counter;
 
    private static long GetTicks()
    {
        int i = Interlocked.Increment(ref counter);
        // use UTC now to prevent conflicts w/ daylight savings
        return DateTime.UtcNow.Ticks + i;
    }
 
    /// <summary>
    /// Creates a new sequential guid (aka comb) <see cref="http://www.informit.com/articles/article.aspx?p=25862&seqNum=7"/>.
    /// </summary>
    /// <remarks>A comb provides the benefits of a standard Guid w/o the database performance problems.</remarks>
    /// <returns>A new sequential guid (comb).</returns>
    public static Guid NewComb()
    {
        byte[] uid = Guid.NewGuid().ToByteArray();
        byte[] binDate = BitConverter.GetBytes(GetTicks()); 
 
        // create comb in SQL Server sort order
        return new Guid(new[]{
            // the first 7 bytes are random - if two combs
            // are generated at the same point in time
            // they are not guaranteed to be sequential.
            // But for every DateTime.Tick there are
            // 72,057,594,037,927,935 unique possibilities so
            // there shouldn't be any collisions                
            uid[0],
            uid[1],
            uid[2],
            uid[3],
            uid[4],
            uid[5],
            uid[6],
 
            // set the first 'nibble of the 7th byte to '1100' so 
            // later we can validate it was generated by us
            (byte)(0xc0 | (0xf & uid[7])),
            
            // the last 8 bytes are sequential,
            // these will reduce index fragmentation
            // to a degree as long as there are not a large
            // number of Combs generated per millisecond                
            binDate[1],
            binDate[0],
            binDate[7],
            binDate[6],
            binDate[5],
            binDate[4],
            binDate[3],
            binDate[2]
        });
    }
 
    /// <summary>
    /// Validates if comb was generated by this class
    /// </summary>
    /// <remarks>
    /// Guids generated by Guid.NewGuid() have a value of 
    /// 0100 for the first 4 bits of the 7th byte. Ours will
    /// have a value of 1100 for the 6th byte. We're checking that here.
    /// 
    /// We could do additional validation by verifying that
    /// the value of a new Guid is greater than the
    /// one being validated (or that the last 6 bytes
    /// resolve to a valid DateTime), but this should
    /// be enough for now.
    /// </remarks>
    public static bool IsComb(Guid value)
    {
        // get the 7th byte
        byte b = value.ToByteArray()[7];
 
        // make sure the first 'nibble' == 1100
        return (0xc0 & b) == 0xc0;
    }
 
    /// <summary>
    /// Validates Guid to determine the supplied
    /// value was generated by Comb.NewComb. If
    /// invalid an ArgumentException is thrown.
    /// </summary>
    /// <param name="value"></param>
    public static void ValidateComb(Guid value)
    {
        if (!Comb.IsComb(value))
            throw new ArgumentException("The supplied Id value was not generated by Comb.NewComb.");
    }
}

I haven’t tested it, but it’s likely that in a distributed environment (web farm or similar) where there are a large number of value generated w/in the same “tick window” the new version is likely to have different fragmentation statistics. Still it should not be a problem for 99% of the applications out there.

tags: Guid, Comb, SQL Server, Uniqueidentifier, .NET

kick it on DotNetKicks.com   Shout it  

Feedback

# Sequential Guid Algorithm – Improving the algorithm

Gravatar Thank you for submitting this cool story - Trackback from DotNetShoutout 10/13/2010 2:45 PM | DotNetShoutout

# re: Sequential Guid Algorithm – Improving the algorithm

Gravatar Hi,

I was solving the sequential UUID issue too because I'm using Cassandra database. The problem can be reduced to hi-res time obtaining problem at all :)

I have three remarks on your solution:

1) You mix DateTime.UtcNow.Ticks and perfTimer.ElapsedTicks. One tick from DateTime always takes 100 ns. One tick from Stopwatch takes amount of time that depends on Stopwatch.Frequency value. See the yellow box here: msdn.microsoft.com/.../...s.stopwatch.elapsedticks(v=VS.90).aspx

2) Performance Counters are precise in relative but inaccurate in absolute. So you should synchronize the time using DateTime.UtcNow, i.e. every 10 seconds. See one of the answers here: stackoverflow.com/.../how-frequent-is-datetime-...

3) There is big issue with Performance Counters on multi-core machines. I really had this issue so we should matter. See remarks here: http://msdn.microsoft.com/en-us/library/ms644904(VS.85).aspx

The last two points are the reasons why I don't use Performance Counter for time obtaining. Instead, I'm using incrementing number as Ayende, But I set this additional number to zero (in thread-safe manner of course) after DateTime.UtcNow.Ticks changes. In this way, I'm able to get unique time 10000x during one tick (I suppose Windows NT). 10/14/2010 12:56 AM | Augi

# re: Sequential Guid Algorithm – Improving the algorithm

Gravatar Additional info to 1) - If you just want to generate sequential GUID then your solution is probably ok (if we don't take 2 and 3 into account). In my case, I want to extract time from UUID too and then time preciseness does matter. 10/14/2010 1:19 AM | Augi

# re: Sequential Guid Algorithm – Improving the algorithm

Gravatar You have the implication of that 0xC0 nibble backwards.

If the the object is a Comb, then it has nibble==0xC0. You can't infer that if the nibble is 0xC0 then it is a Comb. There is a 1 in 16 chance that a non-Comb has a nibble of 0xC0.

As a thought experiment, consider if instead of setting 4 bits as a signature you set only 1. Would you still think that something was a Comb-style GUID if the one-bit signature matched?
10/14/2010 8:27 AM | Hugh Brown

# re: Sequential Guid Algorithm – Improving the algorithm

Gravatar @hugh, the thing about the decision about the nibble is that when the Guid is generated by Guid.NewGuid() the value of that nibble is always 0x40 (0100). I realize I am taking a risk that someone could generate their own guid using 0xC0 but that is a different concern than the possibility that the value generated by Guid.NewGuid would have the same value of Comb.NewComb. Guid.NewGuid will never have 0xC0. Give it a try for yourself.

As for additional verification beyond checking the nibble, I have considered generating a new Comb, if the new one is "greater than" the value passed in then it would pass validation. Again, it is not a guarantee, but with the two checks the probability increases.

Finally, I could also deconstruct the entire value and try and get a valid datetime from the past. For my current environment this is more checking than is needed and so the solution I have chosen is adequate. 10/14/2010 1:29 PM | MarkJMiller

# re: Sequential Guid Algorithm – Improving the algorithm

Gravatar @Augi, thanks for the feedback. My intent, for now, is just to create a sequential guid. If I need to do additional validation, as suggested by Hugh then I might also be interested in extracting the exact datetime from the generated guid. So DateTime.UtcNow serves as a starting point and then the stopwatch counter serves as my incremental counter, but based on time rather than a controlled sequence.

However, I had not looked into the accuracy over the lifetime of the appdomain. I'll look at taking your revisions to see if they make a difference, especially across app domains. 10/14/2010 1:35 PM | MarkJMiller

# re: Sequential Guid Algorithm – Improving the algorithm

Gravatar UPDATE - I've considered what @Augi mentioned and I made me realize that I made a mistake. Because I am reordering the bytes to match Sql Server sort order, I have moved the location of the 8th byte (index 7) which makes @Augi right. In order to prevent the problem he describes I shouldn't be reordering that byte.

Actually, there really isn't any point in reordering any of the first 8 bytes since they are random anyhow. I'll update the post to correct the error. I'll still be using the same technique to validate the comb, I'll just make sure the "tracking byte" is in the correct location. 10/25/2010 2:52 PM | MarkJMiller

# แท็บเล็ต

Gravatar Sequential Guid Algorithm - Improving the algorithm 10/4/2014 12:50 AM | แท็บเล็ต

Comments have been closed on this topic.
 

 

Copyright © Mark J. Miller