Welcome to mapoid.com on July 9 2009.
This is an internet experiment running to monitor browsing habbits of individuals through wikipedia contents.

Serializability

From Wikipedia, the free encyclopedia

Jump to: navigation, search

In databases and transaction processing, a transaction schedule (history) is serializable, has the Serializability property, if its outcome (the resulting database state, the values of the database's data) is equal to the outcome of its transactions executed serially, i.e., sequentially without overlapping in time. Transactions are normally executed concurrently (they overlap), since this is the most efficient way. Serializability is the major correctness criterion for concurrent transactions' executions. It is considered the highest level of isolation between transactions, and plays an essential role in concurrency control. As such it is supported in all general purpose database systems.

Contents

[edit] Correctness

[edit] Correctness - serializability

For this discussion a database transaction is a specific intended run (with specific parameters, e.g., with transaction identification, at least) of a computer program (or programs) that accesses a database (or databases). Such a program is written with the assumption that it is running in isolation from other executing programs, i.e., when running, its accessed data are not changed by other running programs. Without this assumption the transaction's results are unpredictable and can be wrong. A same transaction can be executed in different situations, e.g., in different times, in parallel with different programs. Serializability of a schedule - i.e., equivalence (in the outcome, the database state, data values) to a serial schedule with the same transactions - is the major criterion for the correctness of concurrent transactions' schedule, and is supported in all general purpose database systems.

The rationale behind serializability is the following:

If each transaction is correct by itself, then a schedule that comprises any serial execution of these transactions is correct (no transaction overlap in time, i.e, a complete isolation between each other; any order of the transactions is legitimate). As a result, a schedule that comprises any execution (not necessarily serial) that is equivalent (in its outcome) to a serial execution, is correct.

Schedules that are not serializable are likely to generate erroneous outcomes. Well known examples are with transactions that debit and credit accounts with money: If the related schedules are not serializable, then the total sum of money may not be preserved. Money could disappear, or be generated from nowhere. This and violations of possibly needed other invariant preservations are caused by one transaction writing, and "stepping on" and erasing what has been written by another transaction before it has become permanent in the database. It does not happen if serializability is maintained.

Comment: If any specific order between some transactions is requested by an application, then it is enforced independently of the underlying serializability mechanisms. These mechanisms are typically indifferent to any specific order, and generate some unpredictable partial order that is typically compatible with multiple serial orders of these transactions. This partial order results from the scheduling orders of concurrent transactions' data access operations, which depend on many factors.

[edit] Correctness - recoverability

A major characteristic of a database transaction is atomicity, which means that it either commits, i.e., all its operations' results take effect in the database, or aborts (rolled-back), all its operations' results do not have any effect on the database ("all or nothing" semantics of a transaction). In all real systems transactions can abort for many reasons, and serializability by itself is not sufficient for correctness. Schedules also need to possess the recoverability property. Recoverability means that committed transactions have not read data written by aborted transactions (whose effects do not exist in the resulting database states). While serializability is currently compromised on purpose in many applications for better performance (only in cases when application's correctness is not harmed), compromising recoverability would quickly violate the database's integrity, as well as that of transactions' results.

[edit] Relaxing serializability

In many applications, unlike with finances, absolute correctness is not needed. For example, when retrieving a list of products according to specification, in most cases it does not matter much if a product, whose data was updated a short time ago, does not appear in the list, even if it meets the specification. It will typically appear in such a list when tried again a short time later. Commercial databases provide concurrency control with a whole range of isolation levels which are in fact (controlled) serializability violations in order to achieve higher performance. Higher performance means better transaction execution rate and shorter average transaction response time (transaction duration).

Classes of schedules defined by relaxed serializability properties either contain the serializability class, or are incomparable with it.

[edit] View and conflict serializability

Mechanisms that enforce serializability need to execute in real time, or almost in real time, while transactions are running at high rates. In order to meet this requirement special cases of serializability, sufficient conditions for serializability which can be enforced effectively, are utilized.

Two major types of serializability exist: view-serializability, and conflict-serializability. View-serializability matches the general definition of serializability given above. Conflict-serializability is a broad special case, i.e., any schedule that is conflict-serializable is also view-serializable, but not necessarily the opposite. Conflict-serializability is widely utilized because it is easier to determine. Determining view-serializability of a schedule is an NP-complete problem (a class of problems with only difficult-to-compute known solutions).

View-serializability of a schedule is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that respective transactions in the two schedules read and write the same data values ("view" the same data values).
Conflict-serializability is defined by equivalence to a serial schedule (no overlapping transactions) with the same transactions, such that both schedules have the same sets of respective chronologically-ordered pairs of conflicting operations (same precedence relations of respective conflicting operations).

Operations upon data are read or write (a write: either insert or modify or delete). Two operations are conflicting, if they are of different transactions, upon the same datum (data item), and at least one of them is write. Each such pair of conflicting operations has a conflict type: It is either a read-write, or write-read, or a write-write conflict. The transaction of the second operation in the pair is said to be in conflict with the transaction of the first operation. A more general definition of conflicting operations (also for complex operations, which may consist each of several "simple" read/write operations) requires that they are noncommutative (changing their order also changes their combined result). Each such operation needs to be atomic by itself (by proper system support) in order to be considered an operation for a commutativity check. For example, the operations increment and decrement of a counter are both write operations (both modify the counter), but do not need to be considered conflicting (write-write conflict type) since they are commutative (e.g., already supported in the old IBM's IMS "fast path"). Only precedence (time order) in pairs of conflicting (noncommutative) operations is important when checking equivalence to a serial schedule, since changing orders of commutative operations does not change the operation sequence result, i.e., the schedule outcome.

[edit] Enforcing conflict serializability

[edit] Testing conflict serializability

Schedule compliance with conflict serializability can be tested with the precedence graph (serializability graph, conflict graph) for committed transactions of the schedule. It is the directed graph representing precedence of transactions in the schedule, as reflected by precedence of conflicting operations in the transactions.

In the precedence graph transactions are nodes and precedence relations are directed edges. There exists an edge from a first transaction to a second transaction, if the second transaction is in conflict with the first (see Conflict serializability above), and the conflict is materialized (i.e., if the requested conflicting operation is actually executed: in many cases a requested/issued conflicting operation by a transaction is delayed and even never executed, typically by a lock on the operation's object, held by another transaction).
Comment: In many text books only committed transactions are included in the precedence graph. Here all transactions are included for convenience in later discussions.

The following observation is a key characterization of conflict serializability:

A schedule is conflict-serializable if and only if its precedence graph of committed transactions (when only committed transactions are considered) is acyclic. This means that a cycle consisting of committed transactions only is generated in the (general) precedence graph, if and only if conflict-serializability is violated.

Cycles of committed transactions can be prevented by aborting an undecided (neither committed, nor aborted) transaction on each cycle in the precedence graph of all the transactions, which can otherwise turn into a cycle of committed transactions (and a committed transaction cannot be aborted). One transaction aborted per cycle is both required and sufficient number to break and eliminate the cycle (more aborts are possible, and can happen in some mechanisms, but unnecessary for serializability). The probability of cycle generation is typically low, but nevertheless, such a situation is carefully handled, typically with a considerable overhead, since correctness is involved. Transactions aborted due to serializability violation prevention are restarted and executed again immediately.

Serializability enforcing mechanisms typically do not maintain a precedence graph as a data structure, but rather prevent or break cycles implicitly (e.g., SS2PL below).

[edit] Common mechanism - SS2PL

Strong strict two phase locking (SS2PL) is a common mechanism utilized in database systems to enforce both conflict serializability and strictness (a special case of recoverability) of a schedule. SS2PL is the name of the resulting schedule property as well, which is also called rigorousness. In this mechanism each datum is locked by a transaction before accessing it (any read or write operation): The item is marked by, associated with a lock of a certain type, depending on operation (and the specific implementation; various models with different lock types exist; in some models locks may change type during the transaction's life). As a result access by another transaction may be blocked, typically upon a conflict (the lock delays or completely prevents the conflict from being materialized and be reflected in the precedence graph by blocking the conflicting operation), depending on lock type and the other transaction's access operation type. Employing an SS2PL mechanism means that all locks on data on behalf of a transaction are released only after the transaction has ended (either committed or aborted).

Mutual blocking between transactions results in a deadlock, where execution of these transactions is stalled, and no completion can be reached. Thus deadlocks need to be resolved to complete these transactions' execution and release related computing resources. A deadlock is a reflection of a potential cycle in the precedence graph, that would occur without the blocking. A deadlocks is resolved by aborting a transaction involved with such potential cycle, and breaking the cycle. It is often detected using a wait-for graph (a graph of conflicts blocked by locks from being materialized; conflicts not materialized are not reflected in the precedence graph and do not affect serializability), which indicates which transaction is "waiting for" lock release by which transaction, and a cycle means a deadlock. Aborting one transaction per cycle is sufficient to break the cycle. Transactions aborted due to deadlock resolution are restarted and executed again immediately.

[edit] Other enforcing techniques

Other known mechanisms include:

[edit] Global serializability and commitment ordering

In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple (and possibly distributed) databases. Enforcing global serializability in such heterogeneous system, where databases may use different types of concurrency control, is problematic. Even if every local schedule of a single database is serializable, the global schedule of a whole system is not necessarily serializable. The massive communication exchanges of conflict information needed between databases to reach conflict serializability would lead to unacceptable performance, primarily due to computer and communication latency. The problem of achieving global serializability effectively in such systems had been characterized as open until the introduction of commitment ordering (CO; or commit ordering, or commit-order-serializability) in 1991 (see Global serializability).

A schedule having the CO property means that the order in time of transactions' commitment events is compatible with the precedence (partial) order of the respective transactions as determined by their local acyclic conflict graph. Any conflict serializable schedule can be made a CO compliant one, without aborting any transaction in the schedule, by delaying commitment events to comply with the needed partial order.

Enforcing the CO property in each local schedule is an effective way to enforce conflict serializability globally. CO is a broad special case of conflict serializability, and enforcing it locally in each local schedule also enforces it, and hence serializability, globally. The only needed communication between the databases for this purpose is that of the atomic commitment protocol (such as the two-phase commit protocol), that exists in most distributed environments and already must be utilized for each distributed transaction. Thus CO incurs no communication overhead. In each single database, local CO algorithm can run beside any local concurrency control mechanism (serializability enforcing mechanism) without interfering with its resource access scheduling strategy. As such CO provides a general, high performance, fully distributed solution. No central processing component or central data structure is needed. Moreover, CO works also in heterogeneous environments with different database system types and other multiple transactional objects that employ different serializability mechanisms. The CO solution scales up effectively with network size and the number of databases without any negative impact on performance (assuming the statistics of a single distributed transaction, e.g., the average number of databases involved with such transaction, are unchanged). This makes CO instrumental for global concurrency control.

CO implementation by itself is not sufficient as a concurrency control mechanism, since it lacks the important recoverability property.

SS2PL implies CO, and any SS2PL-compliant database can participate in multidatabase systems that utilize the CO solution for global serializability without any modification or addition of a CO algorithm component. As a matter of fact global serializability has been achieved in all-SS2PL multidatabase environments with the two phase commit (2PC) protocol since the eighties, and SS2PL in conjunction with 2PC is the de facto standard to reach global serializability across (the common, SS2PL-compliant) databases. Also the following is true: any CO-compliant database system can transparently join any such existing SS2PL based solution for global serializability.

The commitment event of a distributed transaction is always generated by some atomic commitment protocol, utilized to reach consensus among its processes on whether to commit or abort it. This procedure is always carried out for distributed transactions, independently of CO. The atomic commitment protocol plays a central role in the distributed CO algorithm. In case of incompatible local commitment orders in two or more databases (no global partial order can embed the respective local partial orders together), which implies a global cycle (a cycle that spans two or more database) in the global conflict graph, CO generates a voting-deadlock for the atomic commitment protocol (resulting in missing votes), and the protocol resolves that deadlock by aborting a transaction on the cycle and breaking the cycle. Furthermore: with CO, the global augmented conflict graph provides a complete characterization of voting-deadlocks. In this graph also being blocked by a lock to prevent a conflict from being materialized is represented by an edge as a materialized conflict. It is the union of the regular precedence graph with (reversed edge) regular wait-for graph, and reflects both materialized and non-materialized conflicts. This graph is a (reversed edge) wait-for graph for voting (an edge from a first transaction to a second transaction indicates that either the voting or local commit of the first is waiting to the second to end), and a global cycle means a voting-deadlock. Thus also global deadlocks due to locking (when at least one edge for lock blocking exists on a global cycle) generate voting-deadlocks and are resolved automatically by the same mechanism (such locking based global deadlocks are resolved automatically also in the common, completely SS2PL based environments, but no research article besides the CO articles is known to notice this fact). No implementation of such global graph is needed, and it is used only to explain the behavior of CO and its effectiveness in both guaranteeing global serializability and resolving locking-based global deadlocks.

[edit] References

Personal tools

Visit joltnews for the latest headlines
Visit bloit.com for company information
Geed Media does computer consulting on long island.
This page viewed times. See Logs