"It is better to
have a permanent income than to be fascinating." - Oscar
Wilde (1854-1900) |
Cluster table
Introduction
A cluster is a schema
object that contains one or more tables that all have one or more
columns in common. Rows of one or more tables that share the same
value in these common columns are physically stored together within
the database.
Generally, you should only
cluster tables that are frequently joined on the cluster key columns
in SQL statements. Clustering multiple tables improves the performance
of joins, but it is likely to reduce the performance of full table
scans, INSERT statements, and UPDATE statements that modify cluster
key values.
Cluster Keys
The columns defined by the
CREATE CLUSTER command make up the cluster key. These cluster columns
must correspond in both datatype and size to columns in each of the
clustered tables, although they need not correspond in name.
Types of Clusters
A cluster can be either an
indexed cluster or a hash cluster.
Indexed Clusters
In an indexed cluster,
Oracle stores together rows having the same cluster key value. Each
distinct cluster key value is stored only once in each data block,
regardless of the number of tables and rows in which it occurs. This
saves disk space and improves performance for many operations.
You may want to use indexed
clusters in the following cases:
Your queries retrieve rows
over a range of cluster key values.
Your clustered tables may
grow unpredictably.
After you create an indexed
cluster, you must create an index on the cluster key before you can
issue any data manipulation language (DML) statements against a table
in the cluster. This index is called the cluster index.
A cluster index provides
quick access to rows within a cluster based on the cluster key. If you
issue a SQL statement that searches for a row in the cluster based on
its cluster key value, Oracle searches the cluster index for the
cluster key value and then locates the row in the cluster based on its
ROWID.
Hash Clusters
In a hash cluster, Oracle
stores together rows that have the same hash key value. The hash value
for a row is the value returned by the cluster's hash function. When
you create a hash cluster, you can either specify a hash function or
use the Oracle internal hash function. Hash values are not actually
stored in the cluster, although cluster key values are stored for
every row in the cluster.
Cases to use Hash Clusters
You may want to use hash
clusters in the following cases:
1- Your queries retrieve
rows based on equality conditions involving all cluster key columns.
2- Your clustered tables
are static or you can determine the maximum number of rows and the
maximum amount of space required by the cluster when you create the
cluster.
The hash function provides
access to rows in the table based on the cluster key value. If you
issue a SQL statement that locates a row in the cluster based on its
cluster key value, Oracle applies the hash function to the given
cluster key value and uses the resulting hash value to locate the
matching rows. Because multiple cluster key values can map to the same
hash value, Oracle must also check the row's cluster key value. This
process often results in less I/O than the process for the indexed
cluster, because the index search is not required.
Oracle's internal hash
function returns values ranging from 0 to the value of HASHKEYS - 1.
If you specify a column with the HASH IS clause, the column values
need not fall into this range. Oracle divides the column value by the
HASHKEYS value and uses the remainder as the hash value. The hash
value for null is HASHKEYS - 1. Oracle also rounds the HASHKEYS value
up to the nearest prime number to obtain the actual number of hash
values. This rounding reduces the likelihood of hash collisions, or
multiple cluster key values having the same hash value. You cannot
create a cluster index for a hash cluster, and you need not create an
index on a hash cluster key. If you cannot fit all rows for one hash
value into a data block, do not use hash clusters. Performance is very
poor in this circumstance because an insert or update of a row in a
hash cluster with a size exceeding the data block size fills the block
and performs row chaining to contain the rest of the row.
Cluster Size
Oracle uses the value of
the SIZE parameter to determine the space reserved for rows
corresponding to one cluster key value or one hash value. This space
then determines the maximum number of cluster or hash values stored in
a data block. If the SIZE value is not a divisor of the data block
size, Oracle uses the next largest divisor. If the SIZE value is
larger than the data block size, Oracle uses the operating system
block size, reserving at least one data block per cluster or hash
value. Oracle also considers the length of the cluster key when
determining how much space to reserve for the rows having a cluster
key value. Larger cluster keys require larger sizes. To see the actual
size, query the KEY_SIZE column of the USER_CLUSTERS data dictionary
view. This does not apply to hash clusters because hash values are not
actually stored in the cluster. Although the maximum number of cluster
and hash key values per data block is fixed on a per `-cluster basis,
Oracle does not reserve an equal amount of space for each cluster or
hash key value. Varying this space stores data more efficiently,
because the data stored per cluster or hash key value is rarely fixed.
A SIZE value smaller than the space needed by the average cluster or
hash key value may require the data for one cluster key or hash key
value to occupy multiple data blocks. A SIZE value much larger results
in wasted space.
When you create a hash
cluster, Oracle immediately allocates space for the cluster based on
the values of the SIZE and HASHKEYS parameters.
Adding Tables to a Cluster
You can add tables to an
existing cluster by issuing a CREATE TABLE statement with the CLUSTER
clause. A cluster can contain as many as 32 tables, although the
performance gains of clustering are often lost in clusters of more
than four or five tables.
All tables in the cluster
have the cluster's storage characteristics as specified by the PCTUSED,
PCT, INITRANS, MAXTRANS, TABLESPACE, and STORAGE parameters.
Example I
The following statement
creates an indexed cluster named PERSONNEL with the cluster key column
DEPARTMENT_NUMBER, a cluster size of 512 bytes, and storage parameter
values:
SQL> CREATE CLUSTER
personnel
( department_number
NUMBER(2) )
SIZE 512
STORAGE (INITIAL 100K NEXT
50K PCTINCREASE 10);
The following statements
add the EMP and DEPT tables to the cluster:
SQL> CREATE TABLE emp
(empno NUMBER PRIMARY KEY,
ename VARCHAR2(10) NOT NULL
CHECK (ename = UPPER(ename)),
job VARCHAR2(9),
mgr NUMBER REFERENCES
scott.emp(empno),
hiredate DATE CHECK (hiredate
>= SYSDATE),
sal NUMBER(10,2) CHECK (sal
> 500),
comm NUMBER(9,0) DEFAULT
NULL,
deptno NUMBER(2) NOT NULL )
CLUSTER personnel (deptno);
SQL> CREATE TABLE dept
(deptno NUMBER(2),
dname VARCHAR2(9),
loc VARCHAR2(9))
CLUSTER personnel (deptno);
The following statement
creates the cluster index on the cluster key of PERSONNEL:
SQL> CREATE INDEX
idx_personnel ON CLUSTER personnel;
After creating the cluster
index, you can insert rows into either the EMP or DEPT tables.
Example II
The following statement
creates a hash cluster named PERSONNEL with the cluster key column
DEPARTMENT_NUMBER, a maximum of 503 hash key values, each of size 512
bytes, and storage parameter values:
SQL> CREATE CLUSTER
personnel
( department_number NUMBER
)
SIZE 512 HASHKEYS 500
STORAGE (INITIAL 100K
NEXT 50K
PCTINCREASE 10);
Because the above statement
omits the HASH IS clause, Oracle uses the internal hash function for
the cluster.
Example III
The following statement
creates a hash cluster named PERSONNEL with the cluster key made up of
the columns HOME_AREA_CODE and HOME_PREFIX, and uses a SQL expression
containing these columns for the hash function:
SQL> CREATE CLUSTER
personnel
( home_area_code NUMBER,
home_prefix NUMBER )
HASHKEYS 20
HASH IS MOD(home_area_code
+ home_prefix, 101);
"You got to be
careful if you don't know where you're going, because you might
not get there." - Yogi Berra |
Questions:
Q: What is a cluster table
in the Oracle database?
Q: What is a cluster key?
Q: What are the types of
clusters?
Q: Describe an indexed
cluster?
Q: Describe an hash
cluster?
Q: When do you use a hash
cluster?
Q: What is the Cluster Size
parameter?
Q: How do you add a table
to a cluster table?
Q: What do the following
SQL statements do?
SQL> CREATE CLUSTER
personnel
( department_number
NUMBER(2) )
SIZE 512
STORAGE (INITIAL 100K NEXT
50K PCTINCREASE 10)
/
SQL> CREATE TABLE emp
(empno NUMBER PRIMARY KEY,
ename VARCHAR2(10) NOT NULL
CHECK (ename = UPPER(ename)),
job VARCHAR2(9),
mgr NUMBER REFERENCES
scott.emp(empno),
hiredate DATE CHECK (hiredate
>= SYSDATE),
sal NUMBER(10,2) CHECK (sal
> 500),
comm NUMBER(9,0) DEFAULT
NULL,
deptno NUMBER(2) NOT NULL )
CLUSTER personnel (deptno)
/
SQL> CREATE TABLE dept
(deptno NUMBER(2),
dname VARCHAR2(9),
loc VARCHAR2(9))
CLUSTER personnel (deptno)
/
SQL> CREATE INDEX
idx_personnel ON CLUSTER personnel
/
SQL> CREATE CLUSTER
personnel
( department_number NUMBER
)
SIZE 512 HASHKEYS 500
STORAGE (INITIAL 100K
NEXT 50K
PCTINCREASE 10)
/
|