"I found that when I talk to the little flower or to the
little peanut they will give up their secrets..."
-George Washington Carver (1804-1903)
|
What is a cluster table?
A cluster is a schema object that contains one or more tables that all
have one or more columns in common. Rows of one or more tables that
share the same value in these common columns are physically stored
together within the database.
Generally, you should only cluster tables that are frequently joined
on the cluster key columns in SQL statements. Clustering multiple
tables improves the performance of joins, but it is likely to reduce
the performance of full table scans, INSERT statements, and UPDATE
statements that modify cluster key values. Before clustering, consider
its
Cluster Keys
The columns defined by the CREATE CLUSTER command make up the cluster
key. These cluster columns must correspond in both datatype and size
to columns in each of the clustered tables, although they need not
correspond in name.
Types of Clusters
A cluster can be either an indexed cluster or a hash cluster.
Indexed Clusters
In an indexed cluster, Oracle stores together rows having the same
cluster key value. Each distinct cluster key value is stored only once
in each data block, regardless of the number of tables and rows in
which it occurs. This saves disk space and improves performance for
many operations.
You may want to use indexed clusters in the following cases:
·
Your queries retrieve rows over a range of cluster key values.
·
Your clustered tables may grow unpredictably.
After you create an indexed cluster, you must create an index on the
cluster key before you can issue any data manipulation language (DML)
statements against a table in the cluster. This index is called the
cluster index.
A cluster index provides quick access to rows within a cluster based
on the cluster key. If you issue a SQL statement that searches for a
row in the cluster based on its cluster key value, Oracle searches the
cluster index for the cluster key value and then locates the row in
the cluster based on its ROWID.
Hash Clusters
In a hash cluster, Oracle stores together rows that have the same hash
key value. The hash value for a row is the value returned by the
cluster's hash function. When you create a hash cluster, you can
either specify a hash function or use the Oracle internal hash
function. Hash values are not actually stored in the cluster, although
cluster key values are stored for every row in the cluster.
You may want to use hash clusters in the following cases:
·
Your queries retrieve rows based on equality conditions involving all
cluster key columns.
·
Your clustered tables are static or you can determine the maximum
number of rows and the maximum amount of space required by the cluster
when you create the cluster.
The hash function provides access to rows in the table based on the
cluster key value. If you issue a SQL statement that locates a row in
the cluster based on its cluster key value, Oracle applies the hash
function to the given cluster key value and uses the resulting hash
value to locate the matching rows. Because multiple cluster key values
can map to the same hash value, Oracle must also check the row's
cluster key value. This process often results in less I/O than the
process for the indexed cluster, because the index search is not
required.
Oracle's internal hash function returns values ranging from 0 to the
value of HASHKEYS - 1. If you specify a column with the HASH IS
clause, the column values need not fall into this range. Oracle
divides the column value by the HASHKEYS value and uses the remainder
as the hash value. The hash value for null is HASHKEYS - 1. Oracle
also rounds the HASHKEYS value up to the nearest prime number to
obtain the actual number of hash values. This rounding reduces the
likelihood of hash collisions, or multiple cluster key values having
the same hash value.
You cannot create a cluster index for a hash cluster, and you need not
create an index on a hash cluster key.
If you cannot fit all rows for one hash value into a data block, do
not use hash clusters. Performance is very poor in this circumstance
because an insert or update of a row in a hash cluster with a size
exceeding the data block size fills the block and performs row
chaining to contain the rest of the row.
Cluster Size
Oracle uses the value of the SIZE parameter to determine the space
reserved for rows corresponding to one cluster key value or one hash
value. This space then determines the maximum number of cluster or
hash values stored in a data block. If the SIZE value is not a divisor
of the data block size, Oracle uses the next largest divisor. If the
SIZE value is larger than the data block size, Oracle uses the
operating system block size, reserving at least one data block per
cluster or hash value.
Oracle also considers the length of the cluster key when determining
how much space to reserve for the rows having a cluster key value.
Larger cluster keys require larger sizes. To see the actual size,
query the KEY_SIZE column of the USER_CLUSTERS data dictionary view.
This does not apply to hash clusters because hash values are not
actually stored in the cluster.
Although the maximum number of cluster and hash key values per data
block is fixed on a per `-cluster basis, Oracle does not reserve an
equal amount of space for each cluster or hash key value. Varying this
space stores data more efficiently, because the data stored per
cluster or hash key value is rarely fixed.
A SIZE value smaller than the space needed by the average cluster or
hash key value may require the data for one cluster key or hash key
value to occupy multiple data blocks. A SIZE value much larger results
in wasted space.
When you
create a hash cluster, Oracle immediately allocates space for the
cluster based on the values of the SIZE and HASHKEYS parameters.
Adding Tables to a Cluster
You can add tables to an existing cluster by issuing a CREATE TABLE
statement with the CLUSTER clause. A cluster can contain as many as 32
tables, although the performance gains of clustering are often lost in
clusters of more than four or five tables.
All tables in the cluster have the cluster's storage characteristics
as specified by the PCTUSED, PCT, INITRANS, MAXTRANS, TABLESPACE,
and STORAGE parameters.
Example I
The following statement creates an indexed cluster named PERSONNEL
with the cluster key column DEPARTMENT_NUMBER, a cluster size of 512
bytes, and storage parameter values:
CREATE CLUSTER personnel
( department_number
NUMBER(2) )
SIZE 512
STORAGE (INITIAL 100K NEXT 50K PCTINCREASE 10);
The following statements add the EMP and DEPT tables to the cluster:
CREATE TABLE emp
(empno
NUMBER
PRIMARY KEY,
ename
VARCHAR2(10)
NOT NULL
CHECK (ename = UPPER(ename)),
job
VARCHAR2(9),
mgr
NUMBER
REFERENCES scott.emp(empno),
hiredate DATE
CHECK (hiredate >= SYSDATE),
sal
NUMBER(10,2) CHECK
(sal > 500),
comm
NUMBER(9,0)
DEFAULT NULL,
deptno
NUMBER(2)
NOT NULL )
CLUSTER personnel (deptno);
CREATE TABLE dept
(deptno NUMBER(2),
dname VARCHAR2(9),
loc
VARCHAR2(9))
CLUSTER personnel (deptno);
The following statement creates the cluster index on the cluster key
of PERSONNEL:
CREATE INDEX idx_personnel ON CLUSTER personnel;
After creating the cluster index, you can insert rows into either the
EMP or DEPT tables.
Example II
The following statement creates a hash cluster named PERSONNEL with
the cluster key column DEPARTMENT_NUMBER, a maximum of 503 hash key
values, each of size 512 bytes, and storage parameter values:
CREATE CLUSTER personnel
( department_number
NUMBER )
SIZE 512 HASHKEYS 500
STORAGE (INITIAL 100K
NEXT 50K PCTINCREASE 10);
Because the above statement omits the HASH IS clause, Oracle uses the
internal hash function for the cluster.
Example III
The following statement creates a hash cluster named PERSONNEL with
the cluster key made up of the columns HOME_AREA_CODE and HOME_PREFIX,
and uses a SQL expression containing these columns for the hash
function:
CREATE CLUSTER personnel
( home_area_code
NUMBER,
home_prefix
NUMBER )
HASHKEYS 20
HASH IS MOD(home_area_code + home_prefix, 101);
|