Lookup table for a one-to-many relationship

Your table will work fine for this purpose, but you probably want to add an index. If the primary reason for using this table is to take an outside_data_id and get the corresponding ticket_ids, I would add the following clustered index:

CREATE CLUSTERED INDEX [CL_Lookup_OD_ID] on [lookup](outside_data_id)
GO

If the primary lookup will be the other way around (trying to find the outside_data_id from a ticket_id), place the clustered index on the other column.

Oh, sorry, I just noticed this is Postgres. The syntax above is for SQL Server. For Postgres, create an index on the column, then issue the CLUSTER command, like so:

CREATE INDEX IX_outside_data_id on lookup(outside_data_id);
CLUSTER lookup using IX_outside_data_id;

You may also want to look at the fillfactor of the index, depending on how heavy the insert load is here. But that is a large topic worth exploring on its own...
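For reference, this is roughly what setting a fillfactor looks like in Postgres. It is only a sketch: it uses a hypothetical TEMP scratch table lookup_ff (so nothing permanent is touched), and the values 70 and 80 are illustrative, not recommendations. The B-tree default is 90.

```sql
-- Hypothetical scratch table; TEMP, so it disappears at the end of the session.
CREATE TEMP TABLE lookup_ff (
   outside_data_id integer NOT NULL
 , ticket_id       integer NOT NULL
);

-- Leaf pages are filled only to 70% when the index is built (and when it
-- grows at the right end), leaving room for later inserts.
CREATE INDEX ix_ff_outside_data_id ON lookup_ff (outside_data_id)
   WITH (fillfactor = 70);

-- CLUSTER also rebuilds the table's indexes, honoring their fillfactor.
CLUSTER lookup_ff USING ix_ff_outside_data_id;

-- Changing the setting later only affects pages written from then on;
-- REINDEX rebuilds the whole index with the new value.
ALTER INDEX ix_ff_outside_data_id SET (fillfactor = 80);
REINDEX INDEX ix_ff_outside_data_id;
```

A lower fillfactor means a larger index up front in exchange for fewer page splits under inserts, which is why the right value depends on the insert load.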


A structure like yours should probably be implemented with:

  • a multi-column primary key constraint on the m-table (lookup) and
  • a foreign key constraint referencing the primary key of the 1-table.

The primary key of the lookup table automatically provides an optimal index for lookups in one direction (by its leading column, outside_data_id).

CREATE TABLE ticket (
   ticket_id integer PRIMARY KEY  -- possibly serial instead of integer
 , stuff text
);

CREATE TABLE lookup (
   outside_data_id integer
 , ticket_id integer REFERENCES ticket(ticket_id)
      -- ON UPDATE CASCADE  -- optional
      -- ON DELETE CASCADE  -- optional
 , CONSTRAINT lookup_pkey PRIMARY KEY (outside_data_id, ticket_id)
);

Using ticket_id as a descriptive column name.
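A quick way to see both constraints at work, as a sketch on hypothetical TEMP copies of the two tables (ticket_demo and lookup_fk_demo are made-up names; both must be temporary, because a temporary table may only reference other temporary tables). The foreign key rejects an unknown ticket, and the primary key rejects a duplicate pair.

```sql
CREATE TEMP TABLE ticket_demo (
   ticket_id integer PRIMARY KEY
 , stuff     text
);

CREATE TEMP TABLE lookup_fk_demo (
   outside_data_id integer
 , ticket_id       integer REFERENCES ticket_demo (ticket_id)
 , CONSTRAINT lookup_fk_demo_pkey PRIMARY KEY (outside_data_id, ticket_id)
);

INSERT INTO ticket_demo VALUES (1, 'first ticket'), (2, 'second ticket');

-- One outside_data_id mapped to many tickets: accepted.
INSERT INTO lookup_fk_demo VALUES (42, 1), (42, 2);

-- Each inner BEGIN ... EXCEPTION block runs as a subtransaction, so the
-- expected errors are caught and the script keeps going.
DO $$
BEGIN
   BEGIN
      INSERT INTO lookup_fk_demo VALUES (42, 999);  -- there is no ticket 999
      RAISE EXCEPTION 'orphan row was accepted';
   EXCEPTION WHEN foreign_key_violation THEN
      RAISE NOTICE 'foreign key rejected ticket_id 999';
   END;

   BEGIN
      INSERT INTO lookup_fk_demo VALUES (42, 1);    -- pair already present
      RAISE EXCEPTION 'duplicate pair was accepted';
   EXCEPTION WHEN unique_violation THEN
      RAISE NOTICE 'primary key rejected duplicate (42, 1)';
   END;
END $$;
```

Afterwards lookup_fk_demo still holds exactly the two valid rows.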

If you "just need to know if the association exists", then a plain foreign key might be all you need. It takes care of that automatically and delivers the additional bonus of relational integrity being enforced no matter what, plus more options (such as ON UPDATE / ON DELETE CASCADE).
It also insists on a related item for each and every filter_id, so you may not be able to use it.

The indexes you actually need depend on your workload, which is still unclear to me after reading your question multiple times. In particular, the order of columns in a multi-column index (or a primary key constraint, for that matter) is relevant, as we have discussed in depth under a related question.

For optimal performance (neglecting the cost of index maintenance, i.e. assuming the table isn't updated much), and if your queries go both ways, you would create another index in addition to the primary key defined above:

CREATE INDEX lookup_reverse_idx ON lookup (ticket_id, outside_data_id);

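While you could cover most additional use cases with just a single-column index on ticket_id, due to data alignment in PostgreSQL storage an index on two integer columns is the same size on disk as an index on just one. So the second column costs hardly anything, and in return queries that look up outside_data_id by ticket_id can be answered from the index alone (index-only scans, available since PostgreSQL 9.2).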
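The size claim is easy to check yourself. A sketch, assuming a hypothetical TEMP table lookup_sizes filled with synthetic data (10 tickets per outside_data_id, an arbitrary choice). On typical 64-bit builds an index tuple has an 8-byte header and is padded to a multiple of 8 bytes, so one integer key (8 + 4 = 12 bytes) and two integer keys (8 + 8 = 16 bytes) both occupy 16 bytes.

```sql
CREATE TEMP TABLE lookup_sizes (
   outside_data_id integer
 , ticket_id       integer
 , PRIMARY KEY (outside_data_id, ticket_id)
);

-- Synthetic data: outside_data_id = ticket_id / 10 (integer division).
INSERT INTO lookup_sizes (outside_data_id, ticket_id)
SELECT g / 10, g
FROM   generate_series(1, 10000) AS g;

-- Built after loading, so both indexes are created the same way.
CREATE INDEX lookup_sizes_ticket_idx  ON lookup_sizes (ticket_id);
CREATE INDEX lookup_sizes_reverse_idx ON lookup_sizes (ticket_id, outside_data_id);

-- Compare on-disk sizes of the one-column and two-column index.
SELECT pg_relation_size('lookup_sizes_ticket_idx')  AS one_column_bytes
     , pg_relation_size('lookup_sizes_reverse_idx') AS two_column_bytes;

-- A reverse lookup like this can be answered from the two-column index alone
-- (index-only scan), once the visibility map is current, e.g. after VACUUM.
SELECT outside_data_id FROM lookup_sizes WHERE ticket_id = 4242;
```

Running EXPLAIN on the last query shows which index the planner actually picks; with both indexes present it is free to choose either.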

Aside

Experience gathered with different RDBMSes is not always applicable across platforms.

PostgreSQL doesn't have a CLUSTERED INDEX like SQL Server.

PostgreSQL's CLUSTER command is loosely related, but works differently. It's a one-time operation and does not keep the table clustered: rows inserted or updated later are not kept in index order. It also completely rewrites the table, with all the effects of a VACUUM FULL (in modern versions, since Postgres 9.0).
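To illustrate the one-time nature, a sketch on a hypothetical TEMP table lookup_cl: rows added after CLUSTER are not kept in order, so re-clustering has to be repeated by hand (for instance from a scheduled job). CLUSTER without USING reuses the index recorded by the previous run.

```sql
CREATE TEMP TABLE lookup_cl (
   outside_data_id integer NOT NULL
 , ticket_id       integer NOT NULL
);
CREATE INDEX lookup_cl_odid_idx ON lookup_cl (outside_data_id);

INSERT INTO lookup_cl SELECT g % 100, g FROM generate_series(1, 1000) AS g;

-- One-time physical reorder; also marks lookup_cl_odid_idx as the cluster index.
-- Note: CLUSTER holds an ACCESS EXCLUSIVE lock on the table while it runs.
CLUSTER lookup_cl USING lookup_cl_odid_idx;

-- These rows go wherever there is free space; they are NOT kept in order.
INSERT INTO lookup_cl SELECT g % 100, g FROM generate_series(1001, 2000) AS g;

-- Re-cluster by hand; without USING, the previously recorded index is used.
CLUSTER lookup_cl;

-- Which index is recorded as the clustering index?
SELECT c.relname AS cluster_index
FROM   pg_index i
JOIN   pg_class c ON c.oid = i.indexrelid
WHERE  i.indrelid = 'lookup_cl'::regclass
AND    i.indisclustered;
```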

Depending on your actual workload, CLUSTER may or may not be useful. It can help when looking up multiple related rows (outside_data_id -> ticket_id), especially for tables that are not updated a lot.

The advice on FILLFACTOR is good, though, especially if you use CLUSTER, which may otherwise actually hurt performance if you UPDATE a lot.
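A sketch of combining the two, on a hypothetical TEMP table lookup_upd with an illustrative table fillfactor of 80. A table's fillfactor (default 100) only affects pages written after it is set, and CLUSTER rewrites the whole heap, so setting it before clustering leaves free space on every page for the new row versions that later UPDATEs create.

```sql
CREATE TEMP TABLE lookup_upd (
   outside_data_id integer NOT NULL
 , ticket_id       integer NOT NULL
);
CREATE INDEX lookup_upd_odid_idx ON lookup_upd (outside_data_id);

INSERT INTO lookup_upd SELECT g % 100, g FROM generate_series(1, 1000) AS g;

-- Keep ~20% of each heap page free (80 is an illustrative value).
ALTER TABLE lookup_upd SET (fillfactor = 80);

-- The rewrite done by CLUSTER honors the new fillfactor.
CLUSTER lookup_upd USING lookup_upd_odid_idx;

-- ticket_id is not indexed here, so an update like this can often be done
-- as a HOT update, with the new row version staying on the same page.
UPDATE lookup_upd SET ticket_id = ticket_id + 100000 WHERE outside_data_id = 7;
```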


The structure you've created seems like a perfectly reasonable lookup table.

CREATE TABLE lookup1
(
   outside_data_id integer NOT NULL 
   , ticket_id integer NOT NULL
);

I would probably add an index to this table like:

CREATE CLUSTERED INDEX IX_Lookup1 
    ON Lookup1 (outside_data_id, ticket_id) 
    WITH (ALLOW_PAGE_LOCKS=ON, ALLOW_ROW_LOCKS=ON, FILLFACTOR=100);

(My CREATE INDEX sample is for SQL Server; it will most likely need some changes for PostgreSQL!)
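A rough PostgreSQL counterpart might look like this. It is a sketch, assuming PostgreSQL 9.5 or later (for the IF NOT EXISTS clauses) and the lookup1 table defined above. PostgreSQL has no ALLOW_PAGE_LOCKS / ALLOW_ROW_LOCKS index options, so those are simply dropped, and the closest thing to a clustered index is a plain index plus a one-time CLUSTER.

```sql
-- Same table as above; IF NOT EXISTS makes the sketch safe to re-run.
CREATE TABLE IF NOT EXISTS lookup1
(
   outside_data_id integer NOT NULL
 , ticket_id integer NOT NULL
);

-- Plain B-tree index on the same two columns. fillfactor = 100 packs leaf
-- pages completely (the B-tree default is 90).
CREATE INDEX IF NOT EXISTS ix_lookup1
   ON lookup1 (outside_data_id, ticket_id)
   WITH (fillfactor = 100);

-- One-time physical reorder of the table by that index.
CLUSTER lookup1 USING ix_lookup1;
```

If duplicate pairs should be impossible anyway, a PRIMARY KEY (outside_data_id, ticket_id) constraint would create an equivalent unique index instead of the plain one.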