Features and Patterns for Managing Ordered Lists

The last is a technique used quite often, and it works. If you're afraid of running out of decimals, you can refine it to detect when the average of the previous and following values cannot be distinguished from one of them (i.e. you have run out of decimals).

When this happens, you should resequence as many rows up or down as needed.

Let's assume you work with just 2 decimals, and you have two rows with Seq 1.00 and 1.01, the next value being 2.00. If you use your technique to move a row between those first two, you would need Seq = (1.00, 1.005, 1.01, 2.00). That needs more decimals than are available and, depending on the rounding rules, 1.005 would be rounded to either 1.00 or 1.01. Either way, you can detect that the inserted value is indistinguishable from one of its neighbours.
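
That detection can be as simple as comparing the rounded midpoint with its neighbours. A minimal sketch in T-SQL, using a DECIMAL(10,2) to match the example above:

declare @prev decimal(10,2) = 1.00, @next decimal(10,2) = 1.01;
declare @mid  decimal(10,2) = (@prev + @next) / 2;   -- 1.005 cannot be stored; SQL Server rounds it to 1.01

if @mid in (@prev, @next)
    print 'Out of decimals here: make room by moving neighbouring rows';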

In that case, you need to "push down (or up) and make room". That is, you change one more row than strictly necessary and end up with (1.00, 1.25, 1.50, 2.00): you've spread the change across two rows instead of just one.
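
For the example above, that "make room" step could be as small as this. The values are hard-coded purely for illustration, assuming a table called MyTable with Id and Seq columns; @newId stands for the new row's key:

update MyTable set Seq = 1.50 where Seq = 1.01;        -- push the existing row down to make room
insert into MyTable (Id, Seq) values (@newId, 1.25);   -- the new row now fits between 1.00 and 1.50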

This might need to be done recursively (or iteratively) if you ever run into a situation such as (1.01, 1.02, 1.03, 1.04, ...).

With this extension to your original technique, you don't need to renormalize. Resequencing happens only on an "as needed" basis, and it scales quite well.

Notes:

  1. Implementing the algorithm can be a bit tricky, because you need to take into account being at the first or last row and how to handle those cases, as well as what to do when your Seq values reach the maximum or minimum the column can hold.

  2. The algorithm works with any number of decimals, including 0. That is, you can just as well use integer values instead of decimals, which might prove more computationally efficient or, depending on how decimals are implemented, more space efficient as well; a short sketch follows below.
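
For note 2, a minimal sketch of the integer flavour; the starting gap of 1000 is an arbitrary choice for illustration:

-- initial load gives Seq = 1000, 2000, 3000, ...
declare @prev int = 1000, @next int = 2000;
declare @mid  int = @prev + (@next - @prev) / 2;   -- 1500, by integer division

if @mid in (@prev, @next)
    print 'Gap exhausted: re-sequence neighbouring rows as described above';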


My first suggestion would be to tweak the existing processes. Then I'll describe a re-design.

I don't think you need to re-sequence the whole set on every write. As long as you're prepared to accept gaps in the sequence and varying numbers of decimal places, you can avoid many of those re-sequencing runs.

For DELETE there is no need to change the sequences of the other rows. Just accept the gap in the numbers.

I take it the user cannot UPDATE the sequence directly so this is a non-issue.

For Move Up / Down simply swap the sequence numbers of the rows involved -- two single-row updates.
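
A sketch of that swap as a single statement (T-SQL UPDATE ... FROM syntax), assuming the table is called MyTable with Id and Seq columns and that @thisId/@otherId identify the row being moved and its neighbour:

update t
set Seq = o.Seq
from MyTable t
join MyTable o
  on (t.Id = @thisId and o.Id = @otherId)
  or (t.Id = @otherId and o.Id = @thisId);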

For Move Top allow the sequence to become negative. This may look weird but the computer will handle it just fine. Similarly for Move Bottom just let the sequence become large.
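
For example (same assumed table and columns; @id is the row being moved):

-- Move Top: step just below the current minimum
update MyTable set Seq = (select min(Seq) from MyTable) - 1 where Id = @id;
-- Move Bottom: step just above the current maximum
update MyTable set Seq = (select max(Seq) from MyTable) + 1 where Id = @id;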

INSERT is the only tricky one. Now you cannot rely on neat additions of 0.5. Since we may now have gaps you have to split the difference between the sequences of the adjoining rows (or add/subtract 1 when inserting at the top/bottom). As you say this could produce numbers with many decimal places. I don't think this is a concern, however. These numbers are not for human consumption (they're not, right?). Only the computer has to deal with them. By expanding the sequence to NUMERIC(38,28) you can have up to 90 adjacent inserts before the differences degenerate to zero. Since NUMERIC is exact (as opposed to FLOAT) there is no chance of increasingly small numbers being confused with each other. The application can track the number of decimal places in a newly-added sequence and trigger the re-sequencing when required.
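
A sketch of that INSERT against a Seq column widened to NUMERIC(38,28); @prevId, @nextId and @newId are illustrative parameters identifying the neighbouring rows and the new row, and any other columns are omitted:

declare @prevSeq numeric(38,28) = (select Seq from MyTable where Id = @prevId);
declare @nextSeq numeric(38,28) = (select Seq from MyTable where Id = @nextId);
-- T-SQL's decimal precision rules may shave a digit or two of scale off the
-- intermediate result, which only brings the re-sequencing point a little closer.
declare @newSeq numeric(38,28) = (@prevSeq + @nextSeq) / 2;

if @newSeq in (@prevSeq, @nextSeq)
    raiserror('No room left between neighbours; trigger the re-sequencing task', 16, 1)
else
    insert into MyTable (Id, Seq) values (@newId, @newSeq);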

The task which re-distributes the sequence numbers can be taken out of the interactive user session. A separate asynchronous process can take care of this. It can run on a schedule if writes are infrequent or be triggered after a certain number of user edits.

Further, it does not need to touch every row in every execution. Only rows after the changed rows ever need to receive a new sequence number. Yes, a "move to top" action will affect all rows, but a "move to bottom" will not require any further updating.

If many rows are inserted consecutively it is likely the gaps between their sequences will become increasingly small: 10.5, 10.25, 10.125 ... 10.0009765625. If you know which rows have changed since the last re-sequencing (if there is a last_updated column, for instance) the impact of re-sequencing can be further reduced. Take the FLOOR() of the lowest changed sequence number and the CEILING() of the highest, then evenly distribute the changed rows between those limits. This will not return every sequence to an integer but it will reduce the number of decimal places involved, and fewer rows will be touched by the UPDATE.
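
A sketch of that bounded re-sequencing, leaning on the hypothetical last_updated column; @since marks the previous run of the asynchronous task (an illustrative parameter), and the table and columns are assumed as before:

declare @lo numeric(38,28) = (select floor(min(Seq))   from MyTable where last_updated >= @since);
declare @hi numeric(38,28) = (select ceiling(max(Seq)) from MyTable where last_updated >= @since);

with renumbered as (
    select Seq,
           row_number() over (order by Seq) as rn,
           count(*)     over ()             as cnt
    from MyTable
    where Seq between @lo and @hi
)
update renumbered
set Seq = @lo + rn * (@hi - @lo) / (cnt + 1);   -- spread the rows evenly inside the limits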


My suggestion for a re-design would be to use a linked list, or better still a doubly-linked list. Remove the sequence column. The table becomes

MyTable
  Id    int not null primary key
  Prev  int null
  Next  int null

The first row has a Prev value of NULL. The last row has a Next value of NULL. All other rows have a valid Id in both columns, pointing at the previous and next rows respectively.

Changing ordering is then an exercise in pointer management. At most four rows are updated. Say we have rows whose "Next" pointers link them as A -Next-> B -Next-> C -Next-> D.

To DELETE C we update B and D:
update B set Next = D
update D set Prev = B

To move C to top:
update A set Prev = C
update C set Prev = NULL, Next = A
update B set Next = D
update D set Prev = B

Ideally Prev and Next would be declared as foreign keys, and each would have a UNIQUE constraint too. This makes writing the above UPDATE statements tricky, as every statement has to respect the constraints at all times. It may be easier to construct the movements as a series of DELETEs and INSERTs.
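
One way those constraints might be declared (SQL Server syntax; the constraint names are invented). Note that a UNIQUE constraint permits a single NULL, which suits the single head row (Prev IS NULL) and single tail row (Next IS NULL):

alter table MyTable add constraint FK_MyTable_Prev foreign key (Prev) references MyTable (Id);
alter table MyTable add constraint FK_MyTable_Next foreign key (Next) references MyTable (Id);
alter table MyTable add constraint UQ_MyTable_Prev unique (Prev);
alter table MyTable add constraint UQ_MyTable_Next unique (Next);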

Reconstructing the ordered list can be done with a recursive CTE. Over time it is likely that the list order and the clustered key will diverge, so CTE performance will not be great. But as the use-case is interactive user display I imagine the list is paged, so a recursive query over a page's worth of rows (100 rows?) will still be quick. Separate CTEs can be written for paging forward (following the Next links) and backward (following the Prev links).
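
As an illustration, a forward-paging query might look like this: a sketch assuming a page of 100 rows starting at the head of the list; the backward version would follow the Prev links instead:

with page_fwd as (
    select Id, Next, 1 as rn
    from MyTable
    where Prev is null                    -- head of the list
    union all
    select t.Id, t.Next, p.rn + 1
    from page_fwd p
    join MyTable t on t.Id = p.Next       -- follow the Next pointer
    where p.rn < 100
)
select Id
from page_fwd
order by rn
option (maxrecursion 100);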