Database Journal
Sybase

Posted Nov 3, 2008

Top 10 Cool New Features In SQL Anywhere 11 - Page 2

By DatabaseJournal.com Staff

6. In-Memory Server Mode

The in-memory server option is an entirely new mode of operation for improving the speed of inserts and updates at the cost of durability. You can create and load a database file in "SQL Anywhere Classic" mode and then run it in one of the two in-memory modes: Never Write or Checkpoint Only. You can switch between in-memory modes and back to Classic mode by stopping and restarting the engine... it's the same software, just using different command-line parameters.

Before anyone gets all upset, SQL Anywhere 11 isn't introducing some kind of cheesy non-transactional auto-commit storage engine like MyISAM in MySQL. SQL Anywhere is still a fully relational database management system: fully transactional, and almost fully ACId even when running in an in-memory mode.

ACId means Atomic, Consistent, Isolated and durable, the Four Horsemen of the Transaction. Durability means a COMMIT is permanent. I'm using a lowercase "d" in ACId because the Checkpoint Only flavor of in-memory mode preserves all committed transactions up to the last checkpoint, but doesn't allow forward recovery beyond that point. And with the Never Write flavor, nothing is preserved and there's no recovery at all... the name kind of gives that away.

In-memory mode is most effective when your database RAM cache is large enough to hold the entire database, when you are doing lots of inserts and updates and commits, and when you're willing to sacrifice some safety to get a lot of speed. I'm thinking of continuous monitoring as a candidate application, where huge amounts of data are received in short periods of time, and where it's more important to keep up with the input than to guarantee that 100% of the data is recovered after a crash.

Both in-memory modes allow updates to the data in the cache; in other words, "Never Write" doesn't mean "read only". With Checkpoint Only mode, updated data is written to the database file on a CHECKPOINT. With Never Write, it's up to you to save your data if permanence is important; three speedy ways to do that are database backup, the database unload utility and individual UNLOAD TABLE statements. Figure 4 shows the details of how Never Write and Checkpoint Only stack up against one another, and against Classic mode.

       Figure 4: Comparing Modes of Server Operation
       
                          -im nw  -im c
                          Never   Checkpoint
                          Write   Only        Classic
                          ======  ==========  =======
       Transactional       yes     yes         yes
       .db file updated    no      yes         yes
       Transaction log     no      no          yes
       Temporary file      no      no          yes
       Checkpoint log      no      yes         yes
       Checkpoints         no      yes         yes
       Dirty page flushes  no      yes(a)      yes
       Unlimited growth    no      yes(b)      yes
       Recoverable         no      yes(c)      yes
       
       (a) Dirty pages are flushed only on checkpoint.
       
       (b) Without a temporary file, checkpoints are the only way to keep the cache from being exhausted.
       
       (c) After a crash, a restart will perform the usual automatic recovery to the 
       last good checkpoint, but without a transaction log, it is not possible to 
       run a forward recovery from the last good checkpoint to the last committed transaction.

7. MERGE Statement

The new MERGE statement brings the power of set-oriented processing to bear on INSERT, UPDATE and DELETE logic when you need to copy data from one table to another. Here is a simple example that starts with an input table containing a set of transactions:

CREATE TABLE transaction (
   part_number  INTEGER NOT NULL PRIMARY KEY,
   action       VARCHAR ( 6 ) NOT NULL,
   description  LONG VARCHAR NULL,
   price        DECIMAL ( 11, 2 ) NULL );
INSERT transaction VALUES ( 1, 'Add', 'widget',  10.00 );
INSERT transaction VALUES ( 2, 'Add', 'gizmo',   20.00 );
INSERT transaction VALUES ( 3, 'Add', 'thingie', 30.00 );
INSERT transaction VALUES ( 4, 'Add', 'bouncer', 40.00 );

Here's the output table, plus the MERGE statement which turns the four input 'Add' transactions into INSERT operations:

CREATE TABLE part (
   part_number  INTEGER NOT NULL PRIMARY KEY,
   description  LONG VARCHAR NOT NULL,
   price        DECIMAL ( 11, 2 ) NOT NULL,
   exceptions   LONG VARCHAR NOT NULL DEFAULT '' );
MERGE INTO part USING transaction ON PRIMARY KEY
 WHEN NOT MATCHED AND transaction.action = 'Add' 
    THEN INSERT ( part_number,
                  description,
                  price )
         VALUES ( transaction.part_number,
                  transaction.description,
                  transaction.price )
 WHEN MATCHED AND transaction.action = 'Change' 
    THEN UPDATE SET part.description 
                       = COALESCE ( transaction.description,  
                                    part.description ),
                    part.price 
                       = COALESCE ( transaction.price,  
                                    part.price )
 WHEN MATCHED AND transaction.action = 'Delete' 
    THEN DELETE
 WHEN MATCHED AND transaction.action = 'Add' 
    THEN UPDATE SET part.exceptions 
                   = STRING ( part.exceptions,
                          'Duplicate Add ignored. ' )
 WHEN NOT MATCHED AND transaction.action = 'Change' 
    THEN INSERT VALUES ( transaction.part_number,
                         transaction.description,
                         transaction.price,
                         'Change accepted, new part inserted. ' )
 WHEN NOT MATCHED AND transaction.action = 'Delete' 
    THEN SKIP;

The MERGE statement works as follows:

  • The INTO clause specifies the output or target table.
  • The USING clause names the input or source table.
  • The ON clause specifies how row matching is done, in this case using the primary keys for both tables.
  • The first WHEN clause says what to do when an 'Add' transaction doesn't match any row in the part table: INSERT a new row.
  • The second WHEN says what to do with a matching 'Change': UPDATE the description and/or price columns.
  • The third WHEN turns a matching 'Delete' transaction into a DELETE operation.
  • The fourth WHEN records an error: 'Duplicate Add ignored.'
  • The fifth WHEN turns a mismatching 'Change' into an INSERT.
  • The last WHEN simply ignores an attempt to 'Delete' a part that isn't there.
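The WHEN-clause dispatch above can also be sketched in ordinary code. Here is a minimal Python simulation (purely an illustration of the matching logic, not SQL Anywhere itself), using a dict keyed by part_number as the target table and a list of dicts as the source rows:

```python
# Toy simulation of the MERGE above: match each source row against the
# target by part_number, then route it by (matched, action), exactly
# like the six WHEN clauses.

def merge_into_part(part, transactions):
    for txn in transactions:
        pn, action = txn["part_number"], txn["action"]
        matched = pn in part
        if not matched and action == "Add":
            part[pn] = {"description": txn["description"],
                        "price": txn["price"],
                        "exceptions": ""}
        elif matched and action == "Change":
            row = part[pn]
            # COALESCE: keep the old value when the new one is NULL (None)
            if txn["description"] is not None:
                row["description"] = txn["description"]
            if txn["price"] is not None:
                row["price"] = txn["price"]
        elif matched and action == "Delete":
            del part[pn]
        elif matched and action == "Add":
            part[pn]["exceptions"] += "Duplicate Add ignored. "
        elif not matched and action == "Change":
            part[pn] = {"description": txn["description"],
                        "price": txn["price"],
                        "exceptions": "Change accepted, new part inserted. "}
        # not matched and action == "Delete": SKIP, i.e. do nothing
    return part
```

Feeding the article's two transaction batches through this function reproduces the part table results shown in the article.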

On the first run, the MERGE simply executes the first WHEN clause four times to fill the part table:

part_number  description  price  exceptions
===========  ===========  =====  ==========
1            widget       10.00 
2            gizmo        20.00 
3            thingie      30.00 
4            bouncer      40.00 

Here's where it gets interesting; six new transactions that exercise all the remaining WHEN clauses:

TRUNCATE TABLE transaction;
INSERT transaction VALUES ( 1, 'Change', NULL,      15.00 );
INSERT transaction VALUES ( 2, 'Change', 'gizmoid', NULL );
INSERT transaction VALUES ( 3, 'Delete', NULL,      NULL );
INSERT transaction VALUES ( 4, 'Add',    'bouncer', 45.00 );
INSERT transaction VALUES ( 5, 'Change', 'gong',    55.00 );
INSERT transaction VALUES ( 6, 'Delete', NULL,      NULL );

Here's what the part table looks like after running the MERGE again:

part_number  description  price  exceptions
===========  ===========  =====  ==========
1            widget       15.00 
2            gizmoid      20.00 
4            bouncer      40.00  Duplicate Add ignored. 
5            gong         55.00  Change accepted, new part inserted. 

There is a dark side to MERGE, however: it won't let multiple input rows update one output row. That's not SQL Anywhere's fault; it's a Pointy-Haired Committee decision, and I predict it's going to be the Number One complaint about MERGE: "The ANSI SQL/2003 standard does not allow rows in target-object to be updated by more than one row in source-object during a merge operation." The Number Two request? Folks will ask for the ability to call a procedure from a WHEN clause.

8. CREATE EVENT ... TYPE DEADLOCK

A new EVENT type has been introduced for catching and reporting on cyclical deadlocks. Deadlocks sometimes occur on busy systems when two different connections each lock a row and then immediately ask for a lock on the row the other guy's got. Both connections are blocked, neither one can get what it wants, and neither one will ever proceed until some third party takes action to break the deadlock.

SQL Anywhere itself is that third party; it has always been able to detect deadlocks, and it instantly "resolves" them by picking one of the connections as the victim, issuing a ROLLBACK and returning an error message to that application.

So far, so good; one user wins, the other can try again, and the maintenance programmer can start looking for the error... and it almost always is a programming error, some flaw in the order in which SQL statements are executed. The problem is that the standard deadlock error message gives absolutely no clue about what tables or rows were involved in the deadlock, or what either connection was doing at the time. The new CREATE EVENT ... TYPE DEADLOCK statement fixes that by giving you the ability to record and report on all the pertinent details: the connection numbers and user ids for both connections, and the actual SQL statements that were executing at the time the deadlock occurred.
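The "cyclical" in cyclical deadlock refers to a cycle in the wait-for graph: connection A is blocked on B, and B is blocked on A. As a concept illustration only (this is not SQL Anywhere's internal algorithm), a hypothetical cycle check over such a graph might look like:

```python
# Toy wait-for graph check: a deadlock exists when following the
# "blocked on" edges leads back to a connection already visited.

def find_deadlock(waits_for):
    """waits_for maps a connection id to the connection it is blocked on.
    Returns the list of connections forming a cycle, or None."""
    for start in waits_for:
        seen = []
        conn = start
        while conn in waits_for:
            if conn in seen:
                return seen[seen.index(conn):]  # the cycle itself
            seen.append(conn)
            conn = waits_for[conn]
    return None
```

A real engine would also pick one member of the cycle as the victim and roll it back, which is exactly what SQL Anywhere does automatically.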

CREATE EVENT deadlock TYPE DEADLOCK HANDLER
BEGIN
   -- Copy fresh entries from sa_report_deadlocks()
   INSERT deadlock (
      snapshotId,
      snapshotAt,
      waiter,
      who,
      what,
      object_id,
      record_id,
      owner,
      is_victim,
      rollback_operation_count )
   SELECT *
     FROM sa_report_deadlocks()
    WHERE NOT EXISTS ( 
       SELECT * FROM deadlock
        WHERE deadlock.snapshotId
            = sa_report_deadlocks.snapshotId        
          AND deadlock.snapshotAt
            = sa_report_deadlocks.snapshotAt );         
   COMMIT;
   MESSAGE STRING ( 'DIAG ', CURRENT TIMESTAMP, 
      ' deadlock' ) TO CONSOLE;
END;

The built-in sa_report_deadlocks() procedure isn't new, but it has been greatly improved for SQL Anywhere 11. It returns a result set containing at least two rows for each deadlock that has occurred since the server started: one for the victim, one for the other connection. The INSERT copies that data into a permanent user-defined table called "deadlock". The WHERE NOT EXISTS is needed to prevent old rows from being copied again when another deadlock happens.
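The effect of that WHERE NOT EXISTS is an anti-join on the (snapshotId, snapshotAt) key. In plain Python terms (a sketch of the set logic, not the actual SQL execution), it is equivalent to:

```python
# Keep only report rows whose (snapshotId, snapshotAt) key has not
# already been copied into the deadlock table.

def fresh_rows(deadlock_table, report_rows):
    seen = {(r["snapshotId"], r["snapshotAt"]) for r in deadlock_table}
    return [r for r in report_rows
            if (r["snapshotId"], r["snapshotAt"]) not in seen]
```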

Here's what the table looks like; it has one extra column "row_number" to act as an artificial primary key, plus columns for everything returned by sa_report_deadlocks():

CREATE TABLE deadlock (
   row_number      BIGINT NOT NULL 
                      DEFAULT AUTOINCREMENT,
   snapshotId      BIGINT NOT NULL,
   snapshotAt      TIMESTAMP NOT NULL,
   waiter          INTEGER NOT NULL,
   who             VARCHAR ( 128 ) NOT NULL,
   what            LONG VARCHAR NOT NULL,
   object_id       UNSIGNED BIGINT NOT NULL,
   record_id       BIGINT NOT NULL,
   owner           INTEGER NOT NULL,
   is_victim       BIT NOT NULL,
   rollback_operation_count UNSIGNED INTEGER NOT NULL,
   PRIMARY KEY ( row_number ) );

In order to make this work, you have to tell the server to gather the report data:

SET OPTION PUBLIC.LOG_DEADLOCKS = 'ON';

It also helps a lot if you enable the 'LastStatement' connection property so the report data will show the actual SQL statements that were blocked. The easiest way to do that is to specify the dbsrv11 -zl option when starting the server.

Figure 5 shows what the following query displayed after a deadlock; "waiter" is the connection number, "who" is the user name, "what" is the CONNECTION_PROPERTY ( 'LastStatement' ) and "is_victim" tells you which connection got the ROLLBACK:

SELECT waiter,
       who,
       what,
       is_victim
  FROM deadlock 
 ORDER BY row_number;

Figure 5: Displaying Deadlock Details

9. It's The Little Things That Count

In every release of SQL Anywhere, many features are introduced with little or no fanfare because nobody thinks they are significant. Sometimes, one of these features becomes hugely popular and important, like the magic DEFAULT TIMESTAMP, which works with UPDATE as well as INSERT. I call these features The Little Things That Count, and somewhere in the following list is lurking at least one that deserves to be in the Top 10 Cool list, only we don't know it yet:

  • The function call PROPERTY ( 'TCPIPAddresses' ) answers the question "have I connected to the right server?" by telling you what address the server's listening on; e.g.: 192.168.1.51:49270.
  • You can press Ctrl- to comment and uncomment blocks of code in dbisql.
  • You can use the DEFAULT keyword in an UPDATE as well as INSERT; e.g., UPDATE t SET c = DEFAULT.
  • ENDIF and END IF can be used interchangeably; no longer do you need to remember which one goes with the IF statement and which one with the IF expression.
  • Line numbers are displayed next to your code in the dbisql "SQL Statements" frame.
  • You can use the ampersand "&" as a line continuation character in @configuration files, which really helps when you're setting up long parameter strings for High Availability and the built-in HTTP server.
  • JSON or JavaScript Object Notation is now available for web service result sets, in addition to HTML, SOAP, XML and raw formats... maybe JSON will save us all from XML.
  • The new PRIORITY option can be used by an application to increase or decrease the priority level at which its SQL requests are executed. For example, a high-speed OLTP connection can set its priority to 'Critical' while a long-running OLAP query can use 'Background'... or any one of the five other levels between those two extremes.
  • Even if you don't care about UNLOAD into a variable, one of the other enhancements to LOAD and UNLOAD is sure to make you happy: encryption, compression, using LOAD and UNLOAD in dbisql... how about optional transaction logging so you can use LOAD and UNLOAD in a mirrored database environment?
  • Speaking of mirrored databases: now you can run read-only queries against the secondary server in a High Availability setup; now you can move your heavy OLAP sessions off your primary OLTP server and use the secondary server for something other than just failover.
  • The NewPassword connection parameter lets the user specify a new password even if the current one's expired.

Oh, and that business about expired passwords? There's a whole mess of new features under the topic of "Login Policies". It's not in the Top 10 list because, let's face it, security is not cool... but logging in without having to call the DBA, that's priceless.

10. Full Text Search

If this were a "Top 1 List" then full text search would still be on it: it's the number one coolest feature in SQL Anywhere 11. With full text search, you don't have to use a separate piece of software to implement fast searches across multiple LONG VARCHAR columns, and you don't have to store the data outside the database to do it. If you've ever run SELECT statements with LIKE '%word%' predicates against a large table, you know what "slow" means, and you're going to love full text search.

Here is an example of how it works, using a table that contains all 6.5 million entries downloaded from the English version of Wikipedia:

CREATE TABLE enwiki_entry ( -- 6,552,490 rows, 17.2G total 
   page_number      BIGINT NOT NULL,
   from_line_number BIGINT NOT NULL,
   to_line_number   BIGINT NOT NULL,
   page_title       VARCHAR ( 1000 ) NOT NULL,
   page_id          VARCHAR ( 100 ) NOT NULL,
   page_text        LONG VARCHAR NOT NULL,
   PRIMARY KEY CLUSTERED ( page_number ) );

The first step is to define a text index on the columns to be searched:

CREATE TEXT INDEX tx_page_text 
   ON enwiki_entry ( page_text ) 
   MANUAL REFRESH; 

The second step is to build the text index, a process that can take quite a long time if the table is very large:

REFRESH TEXT INDEX tx_page_text 
   ON enwiki_entry 
   WITH EXCLUSIVE MODE 
   FORCE BUILD;

Once the index is built, however, queries that use the index are very fast. The following SELECT uses the new CONTAINS clause to find all the Wikipedia entries containing the exact phrase "Ayn Rand":

SELECT score,
       enwiki_entry.page_title,
       LEFT ( enwiki_entry.page_text, 500 ) AS excerpt
  FROM enwiki_entry 
       CONTAINS ( enwiki_entry.page_text, 
                  '"Ayn Rand"' )
 ORDER BY score DESC;

The CONTAINS clause applies the query string '"Ayn Rand"' to the enwiki_entry.page_text column using the full text index previously defined on that column, thus limiting the result set to rows that match. The CONTAINS clause also returns an implicit "score" column, which measures how closely each row matches the query string. The ORDER BY clause uses that column to sort the best matches to the top, and Figure 6 shows the results in a browser display produced by a SQL Anywhere 11 web service.

Figure 6: Full Text Search of Wikipedia

Here's a tip: Don't do like I did and leave an important column out of the CREATE TEXT INDEX statement. In Figure 6 the main Wikipedia entry entitled "Ayn Rand" doesn't appear on the first page, but it should, and it would have if I hadn't forgotten to include the enwiki_entry.page_title column in the index. When a full text index covers multiple columns, the CONTAINS clause calculates a score across all of them, and in this case the row with "Ayn Rand" in the title would have received a very high score for the query string '"Ayn Rand"'.

There are many, many options available with full text searching; I've only touched on the basics. Here's one of the extras: if you build an index on two columns, you can refer to just one column in the CONTAINS clause and the search will look only at that column... another reason not to make the mistake I did. Index more columns, not fewer, and you'll have more freedom when designing your queries.

Oh, and by the way: The default boolean operator is AND, just like in Google. In other words, the query strings 'Ayn Rand' and 'Ayn AND Rand' are the same, and they're different from 'Ayn OR Rand'.
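The speed difference between a LIKE '%word%' scan and a text index comes from the index's structure: an inverted index maps each word to the rows containing it, so a multi-word query becomes a set intersection instead of a full table scan. Here is a toy Python version (purely illustrative, and nothing like SQL Anywhere's actual implementation) of the index, the AND-by-default query, and a crude occurrence-count score:

```python
from collections import defaultdict

def build_text_index(rows):
    # rows: {row_id: text}; index: word -> set of row_ids containing it
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return index

def contains(rows, index, query):
    # Default operator between words is AND, as in the CONTAINS clause.
    words = query.lower().split()
    hits = set.intersection(*(index[w] for w in words))
    # Crude score: total occurrences of the query words in each row.
    def score(row_id):
        tokens = rows[row_id].lower().split()
        return sum(tokens.count(w) for w in words)
    return sorted(hits, key=score, reverse=True)
```

Phrase matching, stemming and the rest of a real full text engine are left out; the point is only why indexed search scales where LIKE doesn't.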

About the Author

Breck Carter is principal consultant at RisingRoad Professional Services, providing consulting and support for SQL Anywhere databases and MobiLink synchronization with Oracle, SQL Server and SQL Anywhere. He is also author of the SQL Anywhere blog.

Breck can be reached at breck.carter@gmail.com.


