Organizational Research By

Surprising Reserch Topic

optimal way to delete specified rows from oracle


optimal way to delete specified rows from oracle  using -'oracle'

I have a project that needs to occasionally delete several tens of thousands of rows from one of six tables of varying sizes but that have about 30million rows between them. Because of the structure of the data I've been given, I don't know which of the six tables has the row that needs to be deleted in it so I have to run all deletes against all tables. I've built an INDEX against the ID column to try and speed things up, but it can be removed if that'll speed things up.

My problem is, that I can't seem to find an efficient way to actually perform the delete. For the purposes of my testing I'm running 7384 delete rows against single test-table which has about 9400 rows. I've tested a number of possible query solutions in Oracle SQL Developer:

7384 separate DELETE statements took 203 seconds:

delete from TABLE1 where ID=1000001356443294;
delete from TABLE1 where ID=1000001356443296;
etc...


7384 separate SELECT statements took 57 seconds:

select ID from TABLE1 where ID=1000001356443294
select ID from TABLE1 where ID=1000001356443296
etc...


7384 separate DELETE from (SELECT) statements took 214 seconds:

delete from (select ID from TABLE1 where ID=1000001356443294);
delete from (select ID from TABLE1 where ID=1000001356443296);
etc...


1 SELECT statement that has 7384 OR clauses in the where took 127.4s:

select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...


1 DELETE from (SELECT) statement that has 7384 OR clauses in the where took 74.4s:

delete from (select ID from TABLE1 where ID=1000001356443294 or ID = 1000001356443296 or ...)


While the last may be the fastest, upon further testing its still very slow when scaled up from the 9000 row table to even just a 200,000 row table (which is still < 1% of the final tableset size) where the same statement takes 14mins to run. While > 50% faster per row, that still extrapolates up to about a day when being run against the full dataset. I have it on good authority that the piece of software we used to us to do this task could do it in about 20mins.

So my questions are:


  
  Is there a better way to delete?
  Should I use a round of SELECT statements (i.e., like the second test) to discover which table any given row is in and then shoot off delete queries? Even that looks quite slow but...
  Is there anything else I can do to speed the deletes up? I don't have DBA-level access or knowledge.
  

    

asked Sep 30, 2015 by rajeevr642
0 votes
28 views



Related Hot Questions

3 Answers

0 votes

In advance of my questions being answered, this is how I'd go about it:

Minimize the number of statements and the work they do issued in relative terms.

All scenarios assume you have a table of IDs (PURGE_IDS) to delete from TABLE_1, TABLE_2, etc.

Consider Using CREATE TABLE AS SELECT for really large deletes

If there's no concurrent activity, and you're deleting 30+ % of the rows in one or more of the tables, don't delete; perform a create table as select with the rows you wish to keep, and swap the new table out for the old table. INSERT /*+ APPEND */ ... NOLOGGING is surprisingly cheap if you can afford it. Even if you do have some concurrent activity, you may be able to use Online Table Redefinition to rebuild the table in-place.

Don't run DELETE statements you know won't delete any rows

If an ID value exists in at most one of the six tables, then keep track of which IDs you've deleted - and don't try to delete those IDs from any of the other tables.

CREATE TABLE TABLE1_PURGE NOLOGGING
AS 
SELECT ID FROM PURGE_IDS INNER JOIN TABLE_1 ON PURGE_IDS.ID = TABLE_1.ID;

DELETE FROM TABLE1 WHERE ID IN (SELECT ID FROM TABLE1_PURGE);

DELETE FROM PURGE_IDS WHERE ID IN (SELECT ID FROM TABLE1_PURGE);

DROP TABLE TABLE1_PURGE;

and repeat.

Manage Concurrency if you have to

Another way is to use PL/SQL looping over the tables, issuing a rowcount-limited delete statement. This is most likely appropriate if there's significant insert/update/delete concurrent load against the tables you're running the deletes against.

declare
  l_sql varchar2(4000);
begin
  for i in (select table_name from all_tables 
             where table_name in ('TABLE_1', 'TABLE_2', ...)
             order by table_name);
  loop
    l_sql := 'delete from ' || i.table_name || 
             ' where id in (select id from purge_ids) ' || 
             '   and rownum <= 1000000';
    loop
      commit;
      execute immediate l_sql;
      exit when sql%rowcount <> 1000000;  -- if we delete less than 1,000,000
    end loop;                             -- no more rows need to be deleted!
  end loop;
  commit;
end;
answered Sep 30, 2015 by android_master
0 votes

Store all the to be deleted ID's into a table. Then there are 3 ways. 1) loop through all the ID's in the table, then delete one row at a time for X commit interval. X can be a 100 or 1000. It works on OLTP environment and you can control the locks.

2) Use Oracle Bulk Delete

3) Use correlated delete query.

Single query is usually faster than multiple queries because of less context switching, and possibly less parsing.

answered Sep 30, 2015 by 20shahi
0 votes

First, disabling the index during the deletion would be helpful.

Try with a MERGE INTO statement :
1) create a temp table with IDs and an additional column from TABLE1 and test with the following

MERGE INTO table1 src
USING (SELECT id,col1
         FROM test_merge_delete) tgt
ON (src.id = tgt.id)
WHEN MATCHED THEN
  UPDATE
     SET src.col1 = tgt.col1
  DELETE
   WHERE src.id = tgt.id
answered Sep 30, 2015 by sumit_jaiswalmca

...