MS SQL
February 3, 2004
Data Mining Algorithms: Microsoft SQL Server 2000 vs. "Yukon" SQL Server
By Alexzander Nepomnjashiy

This article reviews a well-known topic, the data mining algorithms built into Microsoft SQL Server 2000 Analysis Services, and describes what I would like to see in the final "Yukon" SQL Server release (that is, my expectations for new and improved data mining algorithms).

What do we know already? According to SQL Server 2000 Books Online: "Central to the data mining process, data mining algorithms determine how the cases for a data mining model are analyzed. Data mining model algorithms provide the decision-making capabilities needed to classify, segment, associate and analyze data for the processing of data mining columns that provide predictive, variance, or probability information about the case set...

Many data mining algorithms are goal-oriented; given a case set, a data-mining algorithm will predict something about the case, usually an attribute of the case itself. Most algorithms require a training set of cases where the attributes to be predicted are already known, at which point the algorithm constructs a data mining model capable of predicting these attributes for cases in which the attributes are unknown."

Two data mining algorithms are built into Microsoft SQL Server 2000 Analysis Services: Microsoft Decision Trees and Microsoft Clustering.

Just a theory . . .

Cluster

A set of similar cases.

Clustering

The development of a model that labels a new instance as a member of a group of similar records (a cluster). See clustering algorithms. For example, a company could use clustering to group customers according to income, age, and prior purchase behavior. Cluster detection rarely provides actionable information by itself; rather, it feeds information to other data mining tasks. (Reference: Berry, M. and Linoff, G. Data Mining Techniques. 1997. Chapter 10, "Automatic Cluster Detection".)

Clustering Algorithms

Given a data set, these algorithms induce a model that classifies a new instance into a group of similar instances. Commonly, the algorithms require that the number of clusters (c) to be identified is specified in advance, for example: find the c = 10 best clusters. Given a distance metric, these algorithms try to find groups of records that have small distances to each other within the cluster but large distances to the records of other clusters. (Reference: Hair, J. F. et al. (1998), "Multivariate Data Analysis", 5th edition, Chapter 9, pages 469-517.)
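The idea of a pre-specified number of clusters and a distance metric can be sketched in a few lines of Python. This is only an illustration using the classic k-means procedure with Euclidean distance; it is not the algorithm that Analysis Services implements, and the customer records (age, income in thousands) and the function names `k_means` and `assign` are invented for this example.

```python
import math
import random


def euclidean(a, b):
    """Distance metric: straight-line distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(records, c, iterations=20, seed=0):
    """Find c cluster centers (illustrative k-means, not the Microsoft algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(records, c)  # start from c randomly chosen records
    for _ in range(iterations):
        # Step 1: put every record into the group of its nearest center.
        groups = [[] for _ in range(c)]
        for record in records:
            nearest = min(range(c), key=lambda i: euclidean(record, centroids[i]))
            groups[nearest].append(record)
        # Step 2: move each center to the mean of its group.
        new_centroids = []
        for i, group in enumerate(groups):
            if group:
                new_centroids.append(tuple(sum(col) / len(group) for col in zip(*group)))
            else:
                new_centroids.append(centroids[i])  # keep the center of an empty group
        if new_centroids == centroids:
            break  # centers stopped moving
        centroids = new_centroids
    return centroids


def assign(record, centroids):
    """Label a new instance with the index of the closest cluster center."""
    return min(range(len(centroids)), key=lambda i: euclidean(record, centroids[i]))


# Hypothetical customers: (age, yearly income in thousands).
customers = [(25, 30), (27, 32), (24, 28), (52, 90), (55, 95), (50, 88)]
centroids = k_means(customers, c=2)
new_customer_cluster = assign((26, 31), centroids)
```

Here c = 2 is chosen by hand, which mirrors the requirement above that the number of clusters is fixed before the algorithm runs. The new customer (26, 31) ends up in the same cluster as the three younger, lower-income customers.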

Decision Tree

A model made up of a root, branches and leaves. Decision trees are similar to organization charts, with statistical information presented at each node.

Decision Tree Algorithm

An algorithm, drawn from the fields of Machine Learning and Statistics, that generates classification or estimation models. The basic approach is to use a splitting criterion to determine the most predictive factor and place it as the first decision point in the tree (the root), and then to repeat this search for predictive factors to build the branches of the tree until there is no more data to continue with. Tree pruning raises accuracy on noisy data and can be performed while the tree is being constructed (pre-pruning) or after construction (post-pruning). The algorithm is commonly used for classification problems that require the model to be represented in a human-readable form . . .
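The search for the most predictive factor can be sketched as follows. This is a minimal illustration that assumes information gain (entropy reduction) as the splitting criterion and does no pruning; the text above does not say which criterion any particular product uses. The credit data, the attribute names and the functions `build_tree` and `predict` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Impurity of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """How much splitting on `attribute` reduces the entropy of `target`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def build_tree(rows, attributes, target):
    """Recursively place the most predictive attribute at each decision point."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree([r for r in rows if r[best] == value],
                                     remaining, target)
                   for value in {r[best] for r in rows}}}


def predict(tree, row):
    """Walk from the root to a leaf (raises KeyError for unseen attribute values)."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


# Hypothetical training cases in which the attribute to predict is already known.
applicants = [
    {"late_payments": "yes", "homeowner": "no", "risk": "high"},
    {"late_payments": "yes", "homeowner": "yes", "risk": "high"},
    {"late_payments": "no", "homeowner": "no", "risk": "low"},
    {"late_payments": "no", "homeowner": "yes", "risk": "low"},
]
tree = build_tree(applicants, ["late_payments", "homeowner"], "risk")
new_applicant_risk = predict(tree, {"late_payments": "yes", "homeowner": "no"})
```

In this toy data set `late_payments` separates the two risk classes perfectly while `homeowner` tells us nothing, so `late_payments` becomes the root. The resulting nested dictionary is the human-readable form mentioned above.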

How does SQL Server Books Online describe these two algorithms? Let's take a look . . .

Microsoft Decision Trees

The Microsoft Decision Trees algorithm uses classification techniques to analyze data. It then constructs one or more decision trees that can be used to predict attributes or values for new data. For example, you can use this algorithm to analyze credit history data and predict the credit risk of new applicants . . .

Microsoft Clustering

The Microsoft Clustering algorithm uses the nearest neighbor method to group records into clusters that share similar characteristics. Often, these characteristics may be hidden or not intuitive . . .

That is all (only two algorithms) for the current SQL Server release. What about "Yukon" SQL Server? For now, that is unknown, but I would like to see the following in "Yukon":

  • The ability to use data mining algorithms from third-party providers and to integrate them into the SQL Server environment;
  • A set of NEW (!) data mining algorithms that build mining models more quickly than the current ones;
  • Data mining algorithms that combine sequence analysis and clustering analysis;
  • Data mining algorithms based on modern "artificial intelligence" techniques.

Finally, for those who are interested in Business Intelligence / Data Mining topics, I would like to provide a few excerpts from a past Microsoft TechNet Chat (in my opinion, they contain interesting links and information). The full TechNet chat transcript is at: http://www.microsoft.com/technet/treeview/default.asp?url=/technet/itcommunity/chats/trans/sql/sql0123.asp

Q: Hi! Can you say anything about the new Data Mining algorithms to be included in Yukon?

A: We will have some new DM algorithms in Yukon; however, at this stage we are not yet ready to give the list of new features in Yukon, as we are in the middle of the development cycle.

Q: Which third party companies' tools work best with MS Data Mining Tools?

A: You can try Angoss and DBMiner's products. They both have algorithm providers. Angoss also has some UI controls.

A: Here is the link to the Data Mining Performance paper: http://www.microsoft.com/SQL/evaluation/compare/AnalysisDMWP.asp

Q: From my own experience as a SQL Server instructor, I have seen that most SQL Server users do not think about the advantages they could have by using data mining techniques. Perhaps if they had some clear case studies about it, you could

A: Actually MSDN just posted an excellent example using Microsoft Data Mining for cross-sell at an online bookstore - check out http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnduwon/html/D52design.asp

Other good resources are the newsgroup microsoft.public.sqlserver.datamining and the community site http://communities.msn.com/AnalysisServicesDataMining. These are monitored frequently by the DM dev team . . .

Just before writing this article for DatabaseJournal.com, I found an article describing what to expect from "Yukon" SQL Server in the area of new and improved data mining algorithms. "Overview of Business Intelligence and Data Warehousing in SQL Server Yukon", authored by Joy Mundy, gives an overview of the new "Yukon" SQL Server features from a data warehousing professional's point of view. The TechNet online article is at http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/next/DWSQLSY.asp
