Use SQL Server to Sort Alphanumeric Values

·

·

,

Sort Alphanumeric Values with SQL Server

Recently a reader asked me how to sort an alpha numeric field in SQL in natural order.  Alphanumeric values are commonly found and don’t sort naturally using numeric methods.

When we have a numeric fields we expect values to be sorted in ordinal order, for example: 1,2,5,7,10,11,15,20,21

However when these numbers are in character field, such as char or varchar, the sort becomes  alphabetic and the ordering not what we may wish:  1,10,11,15,2,20,21,5,7.

For numerals in a text field we could easily convert the values to numeric values and sort; however, this option isn’t available if the column values contain alphanumeric values.

NOTE:  This is somewhat of an intermediate to advanced article.  I’m assuming that you already know of the LEFT, SUBSTRING, and CONVERT functions.  All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks2012 database.  You can get started using these free tools using my Guide Getting Started Using SQL Server.

Sorting Alphabetic versus Alphanumeric

For this example we’ll assume we have defined a table named Section that consists of one varchar(50) field named section.  Here is the definition:

Section Table Definition

Our table contains the following alphanumeric values

1
AB1
A1
B2
A11
B20
B21
AB10
B3
AB100
2
B1
B32
11
A10

Which we want to sort alphanumerically as:

1
2
11
A1
A10
A11
AB1
AB10
AB100
B1
B2
B3
B20
B21
B32

But when you try

SELECT Section
FROM Section
ORDER BY Section

The results aren’t as we expect

1
11
2
A1
A10
A11
AB1
AB10
AB100
B1
B2
B20
B21
B3
B32

The items in red aren’t sorted to according to alphanumeric order.  The reason for this is that SQL sorts character by character from left to right.  In its mind, the correct alphabetical sequence is …B1, B2, B20, B21, B3,…

Once nice feature about SQL is that you can sort an expression.  Because of this we can create our own formula to sort.

There are two approaches.  The first is a simple approach that will work with my example and the second is much more complex, but works in general with any alphanumeric data.

First the simple case:

In the simple case the idea is to separate the alpha characters from the numeric, then sort by the alpha characters, convert the numeric portion to a number and sort.

Conceptually our SQL statement will look like:

SELECT Section
FROM Section
ORDER BY SectionAlpha, SectionNumeric

We really don’t have columns for SectionAlpha and SectionNumeric, instead we are going to use two expressions to separate the character portion of the data from the numeric.

The trick is to know where the number begins.  To do this we can use a special formula called PATINDEX.

This formula can be used to find the character position within a string that matches a particular pattern, such as ‘%[0-9]%’

We talked about patterns in our article on pattern matching.  You may recall that %[0-9]% means to return a match, if any position in the string is in the range of characters 0-9.

For instance PATINDEX(‘SQL2005’, ‘%[0-9]%’) returns 4, since the numeral 2 is in the fourth position.

Stated differently, the first character to match %[0-9]% is 2, which is in position 4.

PATINDEX - Alphanumeric

Knowing where the number starts is half the story.  We now need to separate the value into a character and numeric portion.   To do this we’ll use two string functions:

  • LEN(expression)  – Returns the number of characters found in an expression.  LEN(‘MICHIGAN’) returns 8.
  • LEFT(expression, length) – Starting from the left, position 1, this function returns the length amount of characters.  LEFT(‘Airplane’,3) returns ‘Air’
  • SUBSTRING(expression, start, length) – Starting at a specified position, this function returns the specified amount of characters from a character expression.  SUBSTRING(‘Airplane’,4,2) returns ‘pl’

Using the diagram above, you can see that everything to the left of the number is a character.

To get the alpha portion we use the following expression:

LEFT(Section,PATINDEX('%[0-9]%',Section)-1)

The expression instructs SQL to get the all the characters to the left of the first numeric character in the section column.

Then numeric portion is obtained using

SUBSTRING(Section,PATINDEX('%[0-9]%',Section),LEN(Section))

This instructs SQL to get all characters, starting where the first numeric character is found until the end of the value.

Finally we take the numeric string we find and convert it to an integer using the CONVERT function.

The final format for the selection statement is

SELECT Section
FROM dbo.Section
ORDER BY LEFT(Section, PATINDEX('%[0-9]%', Section)-1), -- alphabetical sort
         CONVERT(INT, SUBSTRING(Section, PATINDEX('%[0-9]%', Section), LEN(Section))) -- numerical

If you want to try out the example, start up SQL Server Management Studio and paste the following code into a new query window.

USE [AdventureWorks2012_Data]
CREATE TABLE dbo.Section
(
       Section varchar(50) NULL
)
INSERT INTO dbo.Section (Section.Section) VALUES ('1')
INSERT INTO dbo.Section (Section.Section) VALUES ('AB1')
INSERT INTO dbo.Section (Section.Section) VALUES ('A1')
INSERT INTO dbo.Section (Section.Section) VALUES ('B2')
INSERT INTO dbo.Section (Section.Section) VALUES ('A11')
INSERT INTO dbo.Section (Section.Section) VALUES ('B20')
INSERT INTO dbo.Section (Section.Section) VALUES ('B21')
INSERT INTO dbo.Section (Section.Section) VALUES ('AB10')
INSERT INTO dbo.Section (Section.Section) VALUES ('B3')
INSERT INTO dbo.Section (Section.Section) VALUES ('AB100')
INSERT INTO dbo.Section (Section.Section) VALUES ('2')
INSERT INTO dbo.Section (Section.Section) VALUES ('B1')
INSERT INTO dbo.Section (Section.Section) VALUES ('B32')
INSERT INTO dbo.Section (Section.Section) VALUES ('11')
INSERT INTO dbo.Section (Section.Section) VALUES ('A10')
SELECT Section
FROM dbo.Section
--Show normal Sort
SELECT Section
FROM dbo.Section
ORDER BY Section
--Show AlphaNumberic Sort
SELECT Section
FROM dbo.Section
ORDER BY LEFT(Section,PATINDEX('%[0-9]%',Section)-1), -- alphabetical sort
         CONVERT(INT,SUBSTRING(Section,PATINDEX('%[0-9]%',Section),LEN(Section))) -- numerical sort
--cleanup our work
DROP Table dbo.Section

When you run this code you’ll see three results:  1.  The unsorted table, 2.  The table sorted with the built-in sort, and 3.  The table sorted alphanumerically.

Hopefully through this example you’ve see that it is possible to construct a customize sort expression to sort columns alphanumerically.

However, as you have noticed this example only works when the alphanumeric value consists of one alpha and one numeric section.  What about values such as MI10.5AB23 or MI200.5AB500?  Is there an easy way to sort these values alphanumerically?

General Case for Sorting Alphanumeric Values

I recently came across a really good article by Craig Finck on Alphanumeric Sorting In MSSQL, (update 2022: Article is no longer available).  This article is really advanced, and the code complicated, but the short of the matter is that you can create a UDF (User Defined Function) to create the alpha numeric sort value.

Once you read the article, you’ll see how you can define a UFD called fn_CreateAlphanumericSortValue, which you can then call from your SQL.  Later on on teach you how to create and define UDF’s, but since this article really provides a great general solution to the problem, I wanted to included it as part of this article, so those looking for a really workable solution would know of it, and be able to use it.

Using our example, you could use the UDF to sort as:

SELECT Section
FROM dbo.Section
ORDER BY dbo.fn_CreateAlphanumericSortValue(section)

At this point you may be saying hey!  Why did you go through that first example and all those functions instead of just showing us this general case and the UDF?

Two reasons:

  1. The UDF is pretty complicated and very advanced.  It would be hard to teach, and I wanted you to learn something.  Hopefully you leaned some more about sorts, and also how to use PATINDEX, LEFT, and SUBSTRING to manipulate character values.
  2. Using user defined functions can solve a lot of problem, but beware that they can cause performance issue. If you inspect this definition you’ll see there is a lot of program logic.  This logic has to be run for each record.  This can severely impact the performance of any large query that used this function to soft values.

Because of this, I wouldn’t recommend using this function for every case;  It works good for complicated cases, but for simple cases, such as the one I explained, it is easier to use simple string functions to break out the value and sort.

15 responses to “Use SQL Server to Sort Alphanumeric Values”
  1. Is it good idea to have primary key by varchar. They told it would be a heavy duty for sql performance ?

  2. sanam shaikh

    Hi Kris,

    Your article is very informative.. thanks for sharing..
    but my query bit different, I would appreciate it if you could help me.

    my input is this . and i want it to be sorted alphanumerically

    PRC42995_1
    PRC42995_11
    PRC42995_12
    PRC42995_2
    PRC42995_3
    PRC42995_4
    PRC42995_5
    PRC42995_6
    PRC42995_7
    PRC42995_8
    PRC42995_10
    PRC42995_9

  3. Ahammad Ali

    The sort query will give you an error if you have a value without number , say AB . The error will be like , “invalid length parameter passed to the LEFT or SUBSTRING function”

  4. BHUSHAN

    If fails when we add one more new section value as below

    INSERT INTO dbo.Section (Section.Section) VALUES (‘7AB10’)

    idealy it must be start after numaric range ends. like

    1
    2
    3
    7AB10
    AB
    AB71
    AB71

  5. Harsh Kumar Sharma

    i have a problem if you can help me then please try to solve this one and give a proper code for my problem

    create database db5

    create table Employee(id int, Registration_no varchar(50),Name varchar(50))

    insert into Employee values(1,’DLW/TTC/19/3′,’RAMESH’)
    insert into Employee values(2,’DLW/TTC/19/2′,’RAJEEV’)
    insert into Employee values(3,’DLW/TTC/19/1′,’RUPAK’)
    insert into Employee values(4,’DLW/TTC/19/4′,’RAMLAAL’)
    insert into Employee values(5,’DLW/TTC/19/8′,’RITESH’)
    insert into Employee values(6,’DLW/TTC/19/6′,’HRITIK’)
    insert into Employee values(7,’DLW/TTC/19/9′,’ROSHAN’)
    insert into Employee values(8,’DLW/TTC/19/7′,’RUPALI’)
    insert into Employee values(9,’DLW/TTC/19/5′,’SHRISTI’)
    insert into Employee values(10,’DLW/TTC/19/10′,’ROSHNI’)

    select * from Employee

  6. Harsh Kumar Sharma

    Its not working brother.
    i have pasted your code in sql but its not as you explained above
    it just sorted it like an usual ORDER BY do

  7. AJ Behzadi

    OK this is awesome. Now what do I do when I have the number before the (a-)?
    This is what I have:
    1 – OUTCO
    10 – OUT
    11 – MATCH
    12 – UNRSL
    2 – INCO
    3 – UNDEL
    4 – MAIL
    5 – NOTSU
    6 – NOMSI
    7 – RSRCH
    9 – IN
    9 – INCOM

    this is what I wrote after I read your article:

    GROUP BY CONVERT(int,vi.import_code + ‘ – ‘ + vi.import_reason) with rollup
    ORDER BY CONVERT(INT,Left((vi.import_code + ‘ – ‘ + vi.import_reason),2))

    But I got this Error:

    Conversion failed when converting the varchar value ‘IN’ to data type int.

    Any ideas?

    PLEASE.

    Thank you.

  8. Tony

    I am sorting track positions from LP record albums. No problem with trk1, trk2, etc, or with trkA1, trkB2, etc. The problem is when there is a subtrack like trkA1a, trkA1b, trkA1c, etc. I get a “cant convert to int error”
    How can I rewrite your ORDER BY string to correct for this?
    Thanks

    1. I would look at this article, http://technologycraftsmen.net/blog/2010/10/19/alphanumeric-sorting-in-mssql/, they show a good way to sort alphanumeric values.

  9. Chad

    This solution has two “holes”:
    As has already been mentioned if the numeric portion of the text field exceeds the constraints of the data type used and it will also fail if there is no numeric portion in the text field. A more bullet proof solution would be to change the
    “–Show AlphaNumberic Sort” section to read:

    SELECT Section
    FROM @Section
    ORDER BY
    LEFT(Section,CASE PATINDEX(‘%[0-9]%’,Section) WHEN 0 THEN LEN(Section) ELSE PATINDEX(‘%[0-9]%’,Section)-1 END),
    RIGHT(REPLICATE(‘0’,50) + SUBSTRING(Section, CASE PATINDEX(‘%[0-9]%’,Section) WHEN 0 THEN LEN(Section) ELSE PATINDEX(‘%[0-9]%’,Section) END,LEN(Section)),50)

    Changing the second arguments of the REPLICATE and the RIGHT functions from 50 to the length of your varchar field.

    HTH.

    1. Chad

      Whoops, I should have changed the FROM statement from
      FROM @Section
      to
      FROM dbo.Section

  10. suma

    I am getting The conversion of the nvarchar value ‘11014110141’ overflowed an int column.
    please help

    1. Hi, The issue is that the text value is being converted to a number larger than an int datatype column is able to hold. The maximum value for an int is 2,147,483,647. The value you are trying to convert from is 11,014,110,141, which is too large. Two solutions come to mind:
      1. Change the columns data type from int to big int (see http://msdn.microsoft.com/en-us/library/ms187745.aspx)
      2. Filter your input and don’t attempt to convert rows having values larger than int can convert.

      KRis.

      1. suma

        I have already tried with bigint…I got” Error converting data type varchar to bigint.”

        1. If the varchar column has non numeric data within it, and you doing an operation where the database is expecting a number, it will fail.
          Without knowing all the details about what you’re trying to do, I would say you first have to filter out any non-numeric data.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SqlServer