Organizational Research By

Surprising Reserch Topic

oracle how can i implement a natural order by in a sql query


oracle how can i implement a natural order by in a sql query  using -'sql,oracle'

e.g,

foo1
foo2
foo10
foo100


rather than

foo1
foo10
foo100
foo2


Update: not interested in coding the sort myself (although that's interesting in its own right), but having the database to do the sort for me.
    

asked Sep 30, 2015 by deven.bendale
0 votes
65 views



Related Hot Questions

5 Answers

0 votes

You can use functions in your order-by clause. In this case, you can split the non-numeric and numeric portions of the field and use them as two of the ordering criteria.

select * from t
 order by to_number(regexp_substr(a,'^[0-9]+')),
          to_number(regexp_substr(a,'[0-9]+$')),
          a;

You can also create a function-based index to support this:

create index t_ix1
    on t (to_number(regexp_substr(a, '^[0-9]+')),
          to_number(regexp_substr(a, '[0-9]+$')), 
          a);
answered Sep 30, 2015 by shegokar.anjeet
0 votes

For short strings, small number of numerics

If number of "numerics" and the maximum length are limited, there is a regexp-based solution.

The idea is:

  • Pad all numerics with 20 zeroes
  • Remove excessive zeroes using another regexp. This might be slow due to regexp backtracking.

Assumptions:

  • Maximum length of numerics is known beforehand (e.g. 20)
  • All the numerics can be padded (in other words, lpad('1 ', 3000, '1 ') will fail due do unable to fit padded numerics into varchar2(4000))

The following query is optimized for "short numerics" case (see *?) and it takes 0.4 seconds. However, when using such approach, you need to predefine padding length.

select * from (
  select dbms_random.string('X', 30) val from xmltable('1 to 1000')
)
order by regexp_replace(regexp_replace(val, '(\d+)', lpad('0', 20, '0')||'\1')
                      , '0*?(\d{21})', '\1');

"Clever" approach

Even though separate natural_sort function can be handy, there is a little-known trick to do that in pure SQL.

Key ideas:

  • Strip leading zeroes from all the numerics so 02 is ordered between 1 and 3: regexp_replace(val, '(^|\D)0+(\d+)', '\1\2'). Note: this might result in "unexpected" sorting of 10.02 > 10.1 (since 02 is converted to 2), however there is no single answer how things like 10.02.03 should be sorted
  • Convert " to "" so text with quotes works properly
  • Convert input string to comma delimited format: '"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
  • Convert csv to the list of items via xmltable
  • Augment numeric-like items so string sort works properly
  • Use length(length(num))||length(num)||num instead of lpad(num, 10, '0') as the latter is less compact and does not support 11+ digit numbers. Note:

Response time is something like 3-4 seconds for sorting list of 1000 random strings of length 30 (the generation of the random strings takes 0.2 sec itself). The main time consumer is xmltable that splits text into rows. If using PL/SQL instead of xmltable to split string into rows the response time reduces to 0.4sec for the same 1000 rows.

The following query performs natural sort of 100 random alpha-numeric strings:

select *
  from (
    select (select listagg(case when regexp_like(w, '^[0-9]')
                                then length(length(w))||length(w)||w else w
                           end
                   ) within group (order by ord)
              from xmltable(t.csv columns w varchar2(4000) path '.'
                                        , ord for ordinality) q
           ) order_by
         , t.*
    from (
           select '"'||regexp_replace(replace(
                                          regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
                                        , '"', '""')
                                    , '([^0-9]+)', '","\1","')||'"' csv
                , t.*
           from (
                  select dbms_random.string('X', 30) val from xmltable('1 to 100')
                ) t
         ) t
  ) t
order by order_by;

The fun part is this order by can be expressed without subqueries, so it is a handy tool to make your reviewer crazy:

select *
  from (select dbms_random.string('X', 30) val from xmltable('1 to 100')) t
 order by (
   select listagg(case when regexp_like(w, '^[0-9]')
                       then length(length(w))||length(w)||w else w
                  end
          ) within group (order by ord)
     from xmltable('$X'
            passing xmlquery(('"'||regexp_replace(replace(
                                                     regexp_replace(t.val, '(^|\D)0+(\d+)', '\1\2')
                                                   , '"', '""')
                                                , '([^0-9]+)', '","\1","')||'"')
                             returning sequence
                    ) as X
            columns w varchar2(4000) path '.', ord for ordinality) q
);
answered Sep 30, 2015 by okesh.badhiye
0 votes

I use the following function to 0-pad all sequences of digits shorter than 10 that could be found in the value, so that the total length of each to become 10 digits. It is compatible even with mixed sets of values that have one, many or none sequences of digits in them.

CREATE OR replace function NATURAL_ORDER(
    P_STR   varchar2
) return varchar2
IS
/** --------------------------------------------------------------------
    Replaces all sequences of numbers shorter than 10 digits by 0-padded
    numbers that exactly 10 digits in length. Usefull for ordering-by
    using NATURAL ORDER algorithm.
 */
    l_result  varchar2( 32700 );
    l_len     integer;
    l_ix      integer;
    l_end     integer;
begin
    l_result := P_STR;
    l_len := LENGTH( l_result );
    l_ix := 1;
    while l_len > 0 loop
        l_ix := REGEXP_INSTR( l_result, '[0-9]{1,9}', l_ix, 1, 0 );
        EXIT when l_ix = 0;
        l_end := REGEXP_INSTR( l_result, '[^0-9]|$', l_ix, 1, 0 );
        if ( l_end - l_ix >= 10 ) then
            l_ix := l_end;
        else
            l_result := substr( l_result, 1, l_ix - 1 )
                     || LPAD( SUBSTR( l_result, l_ix, l_end-l_ix ), 10, '0' )
                     || substr( l_result, l_end )
                     ;
            l_ix := l_ix + 10;
        end if;
    end loop;
    return l_result;
end;
/

For example:

select 'ABC' || LVL || 'DEF' as STR
  from (
          select LEVEL as LVL
            from DUAL
           start with 1=1
           connect by LEVEL <= 35
       )
 order by NATURAL_ORDER( STR )
answered Sep 30, 2015 by android_master
0 votes

A related question - which sort algorithm works best on mostly sorted data?

Edit

This is now a separate question: Which sort algorithm works best on mostly sorted data

answered Sep 30, 2015 by sanjaypal1983
0 votes

I keep seeing "solutions" to this problem that involve trying to find the first numbers in the string. But sometimes you have strings like "123_456_7" vs. "123_456_10" vs. "123_8_10", and a "natural sort order" would do:

123_8_10 123_456_7 123_456_10

while alphabetical sort would do 123_8_10 123_456_7 123_456_10

What we really need is for Oracle to implement natural sort order into their product so that it is available inherently. This would also allow indexes to be built on the natural sort order. As it is, we all have to do tricks (0-pad; "to_number" and substring, etc.).

answered Sep 30, 2015 by ukohale

...