6. In-Memory Server Mode
The in-memory server option is an entirely new mode of operation for improving the speed of inserts and updates at the cost of durability. You can create and load a database file in "SQL Anywhere Classic" mode and then run it in one of the two in-memory modes: Never Write or Checkpoint Only. You can switch between in-memory modes and back to Classic mode by stopping and restarting the engine... it's the same software, just using different command-line parameters.
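For example, here's how you might start the same database file in each mode (a sketch; the server name demo11 and the file demo.db are placeholders, and -im nw and -im c are the switches shown in Figure 4):

   dbsrv11 -n demo11 demo.db             (Classic mode, the default)
   dbsrv11 -im c -n demo11 demo.db       (Checkpoint Only)
   dbsrv11 -im nw -n demo11 demo.db      (Never Write)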
Before anyone gets all upset, SQL Anywhere 11 isn't introducing some kind of cheesy non-transactional auto-commit storage engine like MyISAM in MySQL. SQL Anywhere is still a fully relational database management system, fully transactional, almost fully ACId even when running in an in-memory mode.
ACId means Atomic, Consistent, Isolated and durable, the Four Horsemen of the Transaction. Durability means a COMMIT is permanent. I'm using a lowercase "d" in ACId because the Checkpoint Only flavor of in-memory mode preserves all committed transactions up to the last checkpoint, but doesn't allow forward recovery beyond that point. And the Never Write flavor preserves nothing, no recovery at all... the name kind of gives that away.
In-memory mode is most effective when your database RAM cache is large enough to hold the entire database, when you are doing lots of inserts and updates and commits, and when you're willing to sacrifice some safety to get a lot of speed. I'm thinking of continuous monitoring as a candidate application, where huge amounts of data are received in short periods of time, and where it's more important to keep up with the input than to guarantee that 100% of the data is recovered after a crash.
Both in-memory modes allow updates to the data in the cache; in other words, "Never Write" doesn't mean "read only". With Checkpoint Only mode, updated data is written to the database file on a CHECKPOINT. With Never Write, it's up to you to save your data if permanence is important; three speedy ways to do that are database backup, the database unload utility and individual UNLOAD TABLE statements, sketched after Figure 4. Figure 4 shows the details of how Never Write and Checkpoint Only stack up against one another, and against Classic mode.
Figure 4: Comparing Modes of Server Operation

                     -im nw    -im c
                     Never     Checkpoint
                     Write     Only         Classic
                     ======    ==========   =======
Transactional        yes       yes          yes
.db file updated     no        yes          yes
Transaction log      no        no           yes
Temporary file       no        no           yes
Checkpoint log       no        yes          yes
Checkpoints          no        yes          yes
Dirty page flushes   no        yes(a)       yes
Unlimited growth     no        yes(b)       yes
Recoverable          no        yes(c)       yes
(a) Dirty pages are flushed only on checkpoint.
(b) Without a temporary file, checkpoints are the only way to keep the cache from being exhausted.
(c) After a crash, a restart will perform the usual automatic recovery to the
last good checkpoint, but without a transaction log, it is not possible to
run a forward recovery from the last good checkpoint to the last committed transaction.
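Speaking of saving your data under Never Write, here's a sketch of the three approaches mentioned above (the backup directory, file path, table name and connection string are all placeholders):

   BACKUP DATABASE DIRECTORY 'c:\bkup';                     -- full online backup
   UNLOAD TABLE monitor_reading TO 'c:\data\reading.dat';   -- one table at a time

and, from the command line, the unload utility:

   dbunload -c "ENG=demo11;UID=dba;PWD=sql" c:\unload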
7. MERGE Statement
The new MERGE statement brings the power of set-oriented processing to bear on INSERT, UPDATE and DELETE logic when you need to copy data from one table to another. Here is a simple example that starts with an input table containing a set of transactions:
CREATE TABLE transaction (
part_number INTEGER NOT NULL PRIMARY KEY,
action VARCHAR ( 6 ) NOT NULL,
description LONG VARCHAR NULL,
price DECIMAL ( 11, 2 ) NULL );
INSERT transaction VALUES ( 1, 'Add', 'widget', 10.00 );
INSERT transaction VALUES ( 2, 'Add', 'gizmo', 20.00 );
INSERT transaction VALUES ( 3, 'Add', 'thingie', 30.00 );
INSERT transaction VALUES ( 4, 'Add', 'bouncer', 40.00 );
Here's the output table, plus the MERGE statement which turns the four input 'Add' transactions into INSERT operations:
CREATE TABLE part (
part_number INTEGER NOT NULL PRIMARY KEY,
description LONG VARCHAR NOT NULL,
price DECIMAL ( 11, 2 ) NOT NULL,
exceptions LONG VARCHAR NOT NULL DEFAULT '' );
MERGE INTO part USING transaction ON PRIMARY KEY
WHEN NOT MATCHED AND transaction.action = 'Add'
THEN INSERT ( part_number,
description,
price )
VALUES ( transaction.part_number,
transaction.description,
transaction.price )
WHEN MATCHED AND transaction.action = 'Change'
THEN UPDATE SET part.description
= COALESCE ( transaction.description,
part.description ),
part.price
= COALESCE ( transaction.price,
part.price )
WHEN MATCHED AND transaction.action = 'Delete'
THEN DELETE
WHEN MATCHED AND transaction.action = 'Add'
THEN UPDATE SET part.exceptions
= STRING ( part.exceptions,
'Duplicate Add ignored. ' )
WHEN NOT MATCHED AND transaction.action = 'Change'
THEN INSERT VALUES ( transaction.part_number,
transaction.description,
transaction.price,
'Change accepted, new part inserted. ' )
WHEN NOT MATCHED AND transaction.action = 'Delete'
THEN SKIP;
The MERGE statement works as follows:

- The INTO clause specifies the output or target table.
- The USING clause names the input or source table.
- The ON clause specifies how row matching is done, in this case using the primary keys for both tables.
- The first WHEN clause says what to do when an 'Add' transaction doesn't match any row in the part table: INSERT a new row.
- The second WHEN says what to do with a matching 'Change': UPDATE the description and/or price columns.
- The third WHEN turns a matching 'Delete' transaction into a DELETE operation.
- The fourth WHEN records an error: 'Duplicate Add ignored.'
- The fifth WHEN turns a mismatching 'Change' into an INSERT.
- The last WHEN simply ignores an attempt to 'Delete' a part that isn't there.
In the first run of the MERGE, all it does is execute the first WHEN clause four times to fill the part table:
part_number  description  price  exceptions
===========  ===========  =====  ==========
          1  widget       10.00
          2  gizmo        20.00
          3  thingie      30.00
          4  bouncer      40.00
Here's where it gets interesting: six new transactions that exercise all the remaining WHEN clauses:
TRUNCATE TABLE transaction;
INSERT transaction VALUES ( 1, 'Change', NULL, 15.00 );
INSERT transaction VALUES ( 2, 'Change', 'gizmoid', NULL );
INSERT transaction VALUES ( 3, 'Delete', NULL, NULL );
INSERT transaction VALUES ( 4, 'Add', 'bouncer', 45.00 );
INSERT transaction VALUES ( 5, 'Change', 'gong', 55.00 );
INSERT transaction VALUES ( 6, 'Delete', NULL, NULL );
Here's what the part table looks like after running the MERGE again:
part_number  description  price  exceptions
===========  ===========  =====  ==========
          1  widget       15.00
          2  gizmoid      20.00
          4  bouncer      40.00  Duplicate Add ignored.
          5  gong         55.00  Change accepted, new part inserted.
There is a dark side to MERGE, however: it won't let you have multiple input rows updating one output row; the sketch below shows how easily that can happen. That's not SQL Anywhere's fault, it's a Pointy-Haired Committee decision, and I predict it's going to be the Number One complaint about MERGE: "The ANSI SQL/2003 standard does not allow rows in target-object to be updated by more than one row in source-object during a merge operation." The Number Two request? Folks will ask for the ability to call a procedure from a WHEN clause.
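Here's a sketch of how easy it is to trip over that rule; it assumes a hypothetical staging table like transaction but without the primary key, so the same part number can arrive twice:

CREATE TABLE transaction_feed (   -- hypothetical: no PRIMARY KEY
   part_number INTEGER NOT NULL,
   action      VARCHAR ( 6 ) NOT NULL,
   description LONG VARCHAR NULL,
   price       DECIMAL ( 11, 2 ) NULL );

INSERT transaction_feed VALUES ( 1, 'Change', NULL, 16.00 );
INSERT transaction_feed VALUES ( 1, 'Change', NULL, 17.00 );

A MERGE INTO part USING transaction_feed with an explicit ON ( part.part_number = transaction_feed.part_number ) clause now fails with the error quoted above, because both source rows try to update the same row in part.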
8. CREATE EVENT ... TYPE DEADLOCK
A new EVENT type has been introduced for catching and reporting on cyclical deadlocks. Deadlocks sometimes occur on busy systems when two different connections each lock a row and then immediately ask for a lock on the row the other guy's got. Both connections are blocked, neither one can get what it wants, and neither one will ever proceed until some third party takes action to break the deadlock.
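Here's the classic pattern as a sketch, assuming a hypothetical table t with primary key values 1 and 2:

   -- Connection A                        -- Connection B
   UPDATE t SET c = 1 WHERE pk = 1;       UPDATE t SET c = 2 WHERE pk = 2;
   UPDATE t SET c = 1 WHERE pk = 2;       UPDATE t SET c = 2 WHERE pk = 1;
   -- A now waits for B's row...          -- ...and B waits for A's: deadlock.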
SQL Anywhere itself is that third party; it has always been able to detect deadlocks, and it instantly "resolves" them by picking one of the connections as the victim, issuing a ROLLBACK and returning an error message to that application.
So far, so good; one user wins, the other can try again, and the maintenance programmer can start looking for the error... and it almost always is a programming error, some flaw in the order in which SQL statements are executed. The problem is that the standard deadlock error message gives absolutely no clue about what tables or rows were involved in the deadlock, or what either connection was doing at the time. The new CREATE EVENT ... TYPE DEADLOCK statement fixes that by giving you the ability to record and report on all the pertinent details: the connection numbers and user ids for both connections, and the actual SQL statements that were executing at the time the deadlock occurred.
CREATE EVENT deadlock TYPE DEADLOCK HANDLER
BEGIN
-- Copy fresh entries from sa_report_deadlocks()
INSERT deadlock (
snapshotId,
snapshotAt,
waiter,
who,
what,
object_id,
record_id,
owner,
is_victim,
rollback_operation_count )
SELECT *
FROM sa_report_deadlocks()
WHERE NOT EXISTS (
SELECT * FROM deadlock
WHERE deadlock.snapshotId
= sa_report_deadlocks.snapshotId
AND deadlock.snapshotAt
= sa_report_deadlocks.snapshotAt );
COMMIT;
MESSAGE STRING ( 'DIAG ', CURRENT TIMESTAMP,
' deadlock' ) TO CONSOLE;
END;
The built-in sa_report_deadlocks() procedure isn't new, but it has been greatly improved for SQL Anywhere 11. It returns a result set containing at least two rows for each deadlock that has occurred since the server started: one for the victim, one for the other connection. The INSERT copies that data into a permanent user-defined table called "deadlock". The WHERE NOT EXISTS is needed to prevent old rows from being copied again when another deadlock happens.
Here's what the table looks like; it has one extra column "row_number" to act as an artificial primary key, plus columns for everything returned by sa_report_deadlocks():
CREATE TABLE deadlock (
row_number BIGINT NOT NULL
DEFAULT AUTOINCREMENT,
snapshotId BIGINT NOT NULL,
snapshotAt TIMESTAMP NOT NULL,
waiter INTEGER NOT NULL,
who VARCHAR ( 128 ) NOT NULL,
what LONG VARCHAR NOT NULL,
object_id UNSIGNED BIGINT NOT NULL,
record_id BIGINT NOT NULL,
owner INTEGER NOT NULL,
is_victim BIT NOT NULL,
rollback_operation_count UNSIGNED INTEGER NOT NULL,
PRIMARY KEY ( row_number ) );
In order to make this work, you have to tell the server to gather the report data:
SET OPTION PUBLIC.LOG_DEADLOCKS = 'ON';
It also helps a lot if you enable the 'LastStatement' connection property so the report data will show the actual SQL statements that were blocked. The easiest way to do that is to specify the dbsrv11 -zl option when starting the server.
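Either of these should do the trick (a sketch; demo11.db is a placeholder, and the second form assumes the RememberLastStatement server option, which is the setting -zl turns on at startup):

   dbsrv11 -zl demo11.db

   CALL sa_server_option ( 'RememberLastStatement', 'Yes' );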
Figure 5 shows what the following query displayed after a deadlock; "waiter" is the connection number, "who" is the user name, "what" is the CONNECTION_PROPERTY ( 'LastStatement' ) and "is_victim" tells you which connection got the ROLLBACK:
SELECT waiter,
who,
what,
is_victim
FROM deadlock
ORDER BY row_number;
Figure 5: Displaying Deadlock Details
9. It's The Little Things That Count
In every release of SQL Anywhere, many features are introduced with little or no fanfare because nobody thinks they are significant. Sometimes, one of these features becomes hugely popular and important, like the magic DEFAULT TIMESTAMP, which works with UPDATE as well as INSERT. I call these features The Little Things That Count, and lurking somewhere in the following list is at least one that deserves to be in the Top 10 Cool list, only we don't know it yet:
- The function call PROPERTY ( 'TCPIPAddresses' ) answers the question "have I connected to the right server?" by telling you what address the server's listening on; e.g., 192.168.1.51:49270.
- You can press Ctrl- to comment and uncomment blocks of code in dbisql.
- You can use the DEFAULT keyword in an UPDATE as well as an INSERT; e.g., UPDATE t SET c = DEFAULT.
- ENDIF and END IF can be used interchangeably; no longer do you need to remember which one goes with the IF statement and which one with the IF expression.
- Line numbers are displayed next to your code in the dbisql "SQL Statements" frame.
- You can use the ampersand "&" as a line continuation character in @configuration files, which really helps when you're setting up long parameter strings for High Availability and the built-in HTTP server.
- JSON, or JavaScript Object Notation, is now available for web service result sets, in addition to HTML, SOAP, XML and raw formats... maybe JSON will save us all from XML.
- The new PRIORITY option can be used by an application to increase or decrease the priority level at which its SQL requests are executed. For example, a high-speed OLTP connection can set its priority to 'Critical' while a long-running OLAP query can use 'Background'... or any one of the five other levels between those two extremes (see the sketch after this list).
- Even if you don't care about UNLOAD into a variable, one of the other enhancements to LOAD and UNLOAD is sure to make you happy: encryption, compression, using LOAD and UNLOAD in dbisql... how about optional transaction logging so you can use LOAD and UNLOAD in a mirrored database environment?
- Speaking of mirrored databases: now you can run read-only queries against the secondary server in a High Availability setup, which means you can move your heavy OLAP sessions off your primary OLTP server and use the secondary server for something other than just failover.
- The NewPassword connection parameter lets the user specify a new password even if the current one's expired.
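Here's that PRIORITY sketch for a long-running OLAP session (big_fact_table is a hypothetical table name):

   SET TEMPORARY OPTION PRIORITY = 'Background';
   SELECT COUNT ( * ) FROM big_fact_table;   -- runs at the lowest priority
   SET TEMPORARY OPTION PRIORITY = 'Normal';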
Oh, and that business about expired passwords? There's a whole mess of new features under the topic of "Login Policies". It's not in the Top 10 list because, let's face it, security is not cool... but logging in without having to call the DBA, that's priceless.
10. Full Text Search
If this were a "Top 1 List" then full text search would still be on it: it's the number one coolest feature in SQL Anywhere 11. With full text search, you don't have to use a separate piece of software to implement fast searches across multiple LONG VARCHAR columns, and you don't have to store the data outside the database to do it. If you've ever run SELECT ... LIKE '%word%' queries against a large table, you know what "slow" means, and you're going to love full text search.
Here is an example of how it works, using a table that contains all 6.5 million entries downloaded from the English version of Wikipedia:
CREATE TABLE enwiki_entry ( -- 6,552,490 rows, 17.2G total
page_number BIGINT NOT NULL,
from_line_number BIGINT NOT NULL,
to_line_number BIGINT NOT NULL,
page_title VARCHAR ( 1000 ) NOT NULL,
page_id VARCHAR ( 100 ) NOT NULL,
page_text LONG VARCHAR NOT NULL,
PRIMARY KEY CLUSTERED ( page_number ) );
The first step is to define a text index on the columns to be searched:
CREATE TEXT INDEX tx_page_text
ON enwiki_entry ( page_text )
MANUAL REFRESH;
The second step is to build the text index, a process that can take quite a long time if the table is very large:
REFRESH TEXT INDEX tx_page_text
ON enwiki_entry
WITH EXCLUSIVE MODE
FORCE BUILD;
Once the index is built, however, queries that use the index are very fast. The following SELECT uses the new CONTAINS clause to find all the Wikipedia entries containing the exact phrase "Ayn Rand":
SELECT score,
enwiki_entry.page_title,
LEFT ( enwiki_entry.page_text, 500 ) AS excerpt
FROM enwiki_entry
CONTAINS ( enwiki_entry.page_text,
'"Ayn Rand"' )
ORDER BY score DESC;
The CONTAINS clause applies the query string '"Ayn Rand"' to the enwiki_entry.page_text column using the full text index previously defined on that column, thus limiting the result set to rows that match. The CONTAINS clause also returns an implicit "score" column, which measures how closely each row matches the query string. The ORDER BY clause uses that column to sort the best matches to the top, and Figure 6 shows the results in a browser display produced by a SQL Anywhere 11 web service.
Figure 6: Full Text Search of Wikipedia
Here's a tip: Don't do what I did and leave an important column out of the CREATE TEXT INDEX statement. In Figure 6 the main Wikipedia entry entitled "Ayn Rand" doesn't appear on the first page, but it should, and it would have if I hadn't forgotten to include the enwiki_entry.page_title column in the index. When a full text index covers multiple columns the CONTAINS clause calculates a score that takes all of them into account, and in this case the row with "Ayn Rand" in the title would have received a very high score for the query string '"Ayn Rand"'.
There are many, many options available with full text searching; I've only touched on the basics. Here's one of the extras: If you build an index on two columns, you can refer to just one column in the CONTAINS clause and the search will only look at that column, as the sketch below shows... another reason not to make the mistake I did: index more columns, not fewer, and you'll have more freedom when designing your queries.
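Here's a sketch of that, assuming a hypothetical two-column index named tx_title_and_text, built and refreshed the same way as before:

CREATE TEXT INDEX tx_title_and_text
   ON enwiki_entry ( page_title, page_text )
   MANUAL REFRESH;

SELECT score,
       enwiki_entry.page_title
  FROM enwiki_entry
       CONTAINS ( enwiki_entry.page_title,   -- search the title only
                  '"Ayn Rand"' )
 ORDER BY score DESC;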
Oh, and by the way: The default boolean operator is AND, just like in Google. In other words, the query strings 'Ayn Rand' and 'Ayn AND Rand' are the same, and they're different from 'Ayn OR Rand'.
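For example, here's a sketch of the OR form against the same index as before:

SELECT score,
       enwiki_entry.page_title
  FROM enwiki_entry
       CONTAINS ( enwiki_entry.page_text,
                  'Ayn OR Rand' )   -- matches either word
 ORDER BY score DESC;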
About the Author
Breck Carter is principal consultant at RisingRoad Professional Services, providing consulting and support for SQL Anywhere databases and MobiLink synchronization with Oracle, SQL Server and SQL Anywhere. He is also author of the SQL Anywhere blog. Breck can be reached at breck.carter@gmail.com.