Rails – Table partitioning with Postgresql

Sometime you have to deal with a huge table with dozen of million rows or hundred of million rows. And that’s also the time the performance of the database go down. The most common technique to help improve the db performance is do the indexing properly. However in this case, there’s another technique that will become very powerful which is table partitioning.

Here is the short definition of table partitioning: “Partitioning allows a table, index, or index-organized table to be subdivided into smaller pieces, where each piece of such a database object is called a partition. Each partition has its own name, and may optionally have its own storage characteristics.”

In this post, I will present a way that we could employ this idea in a ruby on rails application.

# run this to generate an empty Rails app
rails new demo-partition -d postgresql
# Add this to your Gemfile
# This gem help to populate some fake data
gem 'faker'

After having project setup, we gonna need to create a user model. And here is the migration file I got:

class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users, id: :uuid do |t|
      t.string :name, null: false
      t.string :email, null: false
      t.string :address
      t.string :birthday

      t.timestamps
    end

    add_index :users, :id, unique: true
    add_index :users, :created_at
  end
end

Here is come to the interesting part. I will create 5 different partitions for the users table.

create table users_1 ( like users including all ) inherits (users);
alter table users_1 add constraint partition_check check (created_at >= '2013-09-01' and created_at < '2014-04-01'); 

create table users_2 ( like users including all ) inherits (users); 
alter table users_2 add constraint partition_check check (created_at >= '2014-04-01' and created_at < '2014-07-01'); 

create table users_3 ( like users including all ) inherits (users); 
alter table users_3 add constraint partition_check check (created_at >= '2014-07-01' and created_at < '2014-10-01'); 

create table users_4 ( like users including all ) inherits (users); 
alter table users_4 add constraint partition_check check (created_at >= '2014-10-01' and created_at < '2015-03-01'); 

create table users_5 ( like users including all ) inherits (users); 
alter table users_5 add constraint partition_check check (created_at > '2015-03-01');

Let me explain a bit what the code above does. It will create 5 different sub-table which will inherit all the fields and indexes from the users table (master) and add a constraint to each sub-tables. As you can see the constraint is the time when user created. So each sub-table will hold data for each period of time.

And the question is how do we insert into the correct partition. That’s gonna be another step which require us to do a trigger on Postgresql insertion.

CREATE OR REPLACE FUNCTION on_users_insert() RETURNS TRIGGER AS $$
BEGIN
   IF (NEW.created_at >= '2013-09-01' AND NEW.created_at < '2014-04-01') THEN 
      INSERT INTO users_1 VALUES (NEW.*); 
   ELSIF (NEW.created_at >= '2014-04-01' AND NEW.created_at < '2014-07-01') THEN 
      INSERT INTO users_2 VALUES (NEW.*); 
   ELSIF (NEW.created_at >= '2014-07-01' AND NEW.created_at < '2014-10-01') THEN 
      INSERT INTO users_3 VALUES (NEW.*); 
   ELSIF (NEW.created_at >= '2014-10-01' AND NEW.created_at < '2015-03-01') THEN 
      INSERT INTO users_4 VALUES (NEW.*); 
   ELSIF (NEW.created_at > '2015-03-01') THEN
      INSERT INTO users_5 VALUES (NEW.*);
   ELSE
   END IF;

   RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insert_trigger
BEFORE INSERT ON users
FOR EACH ROW EXECUTE PROCEDURE on_users_insert()

One last thing we need to do is to turn on the Constraint Exclusion, otherwise all of the above will not work.

SET constraint_exclusion = on;

Cool, now we can put all the sql above into one single migration file like below

class CreatePartition1ForUser < ActiveRecord::Migration def change execute %Q{ create table users_1 ( like users including all ) inherits (users); alter table users_1 add constraint partition_check check (created_at >= '2013-09-01' and created_at < '2014-04-01'); create table users_2 ( like users including all ) inherits (users); alter table users_2 add constraint partition_check check (created_at >= '2014-04-01' and created_at < '2014-07-01'); create table users_3 ( like users including all ) inherits (users); alter table users_3 add constraint partition_check check (created_at >= '2014-07-01' and created_at < '2014-10-01'); create table users_4 ( like users including all ) inherits (users); alter table users_4 add constraint partition_check check (created_at >= '2014-10-01' and created_at < '2015-03-01'); create table users_5 ( like users including all ) inherits (users); alter table users_5 add constraint partition_check check (created_at > '2015-03-01');
    }

    execute %Q{
      CREATE OR REPLACE FUNCTION on_users_insert() RETURNS TRIGGER AS $$
      BEGIN
          IF (NEW.created_at >= '2013-09-01' AND NEW.created_at < '2014-04-01') THEN INSERT INTO users_1 VALUES (NEW.*); ELSIF (NEW.created_at >= '2014-04-01' AND NEW.created_at < '2014-07-01') THEN INSERT INTO users_2 VALUES (NEW.*); ELSIF (NEW.created_at >= '2014-07-01' AND NEW.created_at < '2014-10-01') THEN INSERT INTO users_3 VALUES (NEW.*); ELSIF (NEW.created_at >= '2014-10-01' AND NEW.created_at < '2015-03-01') THEN INSERT INTO users_4 VALUES (NEW.*); ELSIF (NEW.created_at > '2015-03-01') THEN
              INSERT INTO users_5 VALUES (NEW.*);
          ELSE
          END IF;

          RETURN NULL;
      END;
      $$ LANGUAGE plpgsql;
    }

    # Insert trigger
    execute %Q{
      CREATE TRIGGER insert_trigger
      BEFORE INSERT ON users
      FOR EACH ROW EXECUTE PROCEDURE on_users_insert()
    }

    # Constraint Exclusion
    execute %Q{
      SET constraint_exclusion = on;
    }
  end
end

Now let generate some sample data, so we can test the result of what we did above. I got the following rake file. It will populate 100000 users at random time and insert into the correct partition as we defined above.

require 'faker'

desc "Populate sample data"
task :sample => :environment do

  ### Insert into master users ###
  inserts = []

  100_000.times do
    name    = Faker::Name.name.gsub('\'',' ')
    address = Faker::Address.street_address.gsub('\'', ' ')
    time    = rand(2.year.ago..Time.now).to_formatted_s('db')

    inserts.push("('#{name}', '#{Faker::Internet.email}', '#{address}', '#{time}', '#{time}')")
  end

  sql = "INSERT INTO users (name, email, address, created_at, updated_at) VALUES #{inserts.join(", ")};"
  User.connection.execute(sql)
  ### END ###
end

Now you can run the following to create, migrate, and populate the data.

rake db:create db:migrate sample

Great! It’s time to check out the result:

rails c

# Under your rails console
t = User.last.created_at
User.where(created_at: t).explain
User.where(id: "00005f11-66d1-4a2e-92d1-3b5973b54b33").explain

And here is what I got:

Screen Shot 2015-08-22 at 5.19.06 PM

Screen Shot 2015-08-22 at 5.22.59 PM

In the first case, it’s only look at the right partition. This will improve the db performance significantly. Especially for the big table. And in the second case, it look at all partitions. In this example, I partitioned the table into 5 sub smaller tables by timestamp. However we can partition by id, username, or whatever that we think it suitable for our case. We just need to add the proper constraint for the partitions. Again, table partitioning is a great technique that can help us to improve the database performance.

Advertisements
Rails – Table partitioning with Postgresql

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s