Introduction
Data explosion brought a need to store both structured and un-structured data together in the database so that the benefits of the database system (like transactional support, backup and restore support, integrated security support, Full-Text Search support, etc.) can be leveraged. SQL Server 2008 introduced the FileStream data type to store unstructured data, such as documents, presentations, videos, audios and images, on the file system with a pointer to the data in the database.
Storage of un-structured data in FileStream improves the performance by leveraging the NTFS APIs streaming along with additional benefits of the database system.
SQL Server 2012 enhanced this capability even further by introducing FileTable, which lets an application integrate its storage and data management components to allow non-transactional access, and provide integrated SQL Server services – including full-text search and semantic search – over unstructured data and metadata. I am going to talk about these two new features in detail.
Understanding FileStream in SQL Server 2008
FileStream data type is implemented as a varbinary(max) column in which the data is stored as BLOBs in the NTFS file system and pointer to the data in the database. In this case the storage is not limited to 2GB as in the case of regular BLOB storage but the size of FileStream columns are limited only by the volume size of the NTFS file system. To specify a column to store BLOB data on the NTFS file system, you need to specify the FileStream attribute on a varbinary(max) column. This causes the SQL Server Database Engine to store all data for that column on the NTFS file system instead of storing it in the database itself and pointer in the database. Storing the BLOB data on the NTFS file system brings several benefits as mentioned below:
- Performance matches the streaming performance of the NTFS file system; Data stored in the FileStream column is not cached in the SQL Server buffer cache but rather is cached in the Windows NT system cache. In other words, SQL Server buffer pool is not used by FileStream data therefore SQL Server buffer pool is available for query processing and is not impacted by the humongous size of FileStream data.
- BLOB size is limited only by the NTFS file system volume size.
- Transactional consistency together with other data in database.
- Integrated security model.
- Backup and restore of the file group, which contains FileStream BLOB data.
- Full-text search works well with the FileStream data column in the same way as it does with other varbinary(max) columns.
- Standard SELECT, INSERT, UPDATE, and DELETE statements work well with FileStream data.
In order to use this new feature, we need to first enable it at the instance level (it can be done during installation, or by modifying the Properties of the SQL Server instance, or by using the sp_configure to change the instance properties) and then creating or modifying the database to have a file group with the FileStream property and then finally creating a table with a column of varbinary(max) data type specified with the FileStream attribute.
Understanding FileTable in SQL Server 2012
FileTable is implemented using FileStream and enhances its capability even further; it allows direct, transactional, access to large data. However, in contrast to FileStream columns, the FileTable table can be configured to also allow non-transactional access, i.e. the ability to access files without prior authorization from the Database Engine.
FileTable is a specialized type of table with a fixed schema, unlike regular a user table with its contents exposed to the Operating System, as if it were a folder in the file system. A FileTable contains FileStream data along with several file level attributes (file_id, name, path locator, created date, modified date, last access time, etc.) and file & directory hierarchy. It means every row in the FileTable represents either a file or directory on the file system.
The file & directory data stored in a FileTable is exposed through a Windows share location for non-transactional file access (the ability to access files without prior authorization from the Database Engine) for Windows API based applications. For a Windows application, this looks like a normal network share location with its files and directories. Applications can use a rich set of Windows APIs to manage the files and directories under this network share location.
Windows API operations are non-transactional in nature, and are not associated with database user transactions. However, transactional access to FileStream data stored in a FileTable is fully supported, as is the case for any other FileStream column in a regular table. FileTable can also be queried and updated through normal Transact-SQL commands. They are also integrated with SQL Server management tools, and features such as backup and restore.
We need to configure FileTable separately from FileStream at the database level. This means, we can continue to use the FileStream feature without enabling non-transactional access or creating FileTable.
Getting Started with FileTable in SQL Server 2012
As I said before, FileStream can be enabled during installation of SQL Server or by modifying Properties of the SQL Server instance or using the sp_configure to change the instance properties with the command given below:
USE master GO EXEC sp_configure 'filestream access level', 2 Go RECONFIGURE GO --You can use this statement to see current --config value and running value EXEC sp_configure filestream_access_level; GO
Once FileStream is enabled at the instance level, we can create or modify a database to have filegroup for FileStream data storage as shown below; also as I want to create a FileTable in the database I have enabled it as well.
USE master GO IF EXISTS (SELECT name FROM sys.databases WHERE name = N'LearnFileTable') DROP DATABASE LearnFileTable GO CREATE DATABASE LearnFileTable --Details of primary file group ON PRIMARY ( NAME = LearnFileTable_Primary, FILENAME =N'D:\FileTable\LearnFileTable_Data.mdf', SIZE = 10MB, MAXSIZE = 50MB, FILEGROWTH = 5MB), --Details of additional filegroup to be used to store data FILEGROUP DataGroup ( NAME = LearnFileTable_Data, FILENAME =N'D:\FileTable\LearnFileTable_Data.ndf', SIZE = 10MB, MAXSIZE = 50MB, FILEGROWTH = 5MB), --Details of special filegroup to be used to store FILESTREAM data FILEGROUP FSDataGroup CONTAINS FILESTREAM ( NAME = FileStream, --FILENAME refers to the path and not to the actual file name. It --creates a folder which contains a filestream.hdr file and --also a folder $FSLOG folder as depicted in image below FILENAME =N'D:\FileTable\FSData') --Details of log file LOG ON (Name = LearnFileTable_Log, FILENAME = 'D:\FileTable\LearnFileTable_Log.ldf', SIZE = 5MB, MAXSIZE = 25MB, FILEGROWTH = 5MB ) WITH FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'LearnFileTable') --Other option for NON_TRANSACTED_ACCESS is READ_ONLY or OFF GO
You can verify the FileStream and FileTable options enabled for a database using this command:
-- Check the Filestream/FileTable Options SELECT DB_NAME(database_id), non_transacted_access, non_transacted_access_desc, directory_name FROM sys.database_filestream_options WHERE DB_NAME(database_id) = 'LearnFileTable'
We use the following command to create a FileTable in database that has FileStream enabled already. We provide the name of FileTable, FileTable directory name (which will appear on the network share location and which will contain the data for this FileTable) and the name of the collation to be used for file names in the Name column of the FileTable.
USE LearnFileTable GO CREATE TABLE MyFirstFileTable AS FileTable WITH ( FileTable_Directory = 'MyFirstFileTable', FileTable_Collate_Filename = database_default ); GO
If we try creating FileTable without first enabling FileStream on the dataase level, it will fail with following exception:
Msg 1969, Level 16, State 1, Line 1 Default FILESTREAM filegroup is not available in database '<database_name>'
Once you have created a FileTable, you can query the table using the SELECT statement as shown below. As we don’t have any data placed into this table, you can see the query returns no record:
USE LearnFileTable SELECT * FROM [dbo].[MyFirstFileTable]
To explore the share location where the data for a particular FileTable is stored, right click on the FileTable that you created under the FileTables node in the Object Explorer and then click on the “Explorer FileTable Directory” link as shown below:
Here you can see the share location for the FileTable that we created. I have added some files here, now let’s revert back to SSMS to see the data appear on the FileTable in SQL Server:
As you can see in the image above, I have copied three files to the share location and now if I run the SELECT statement again, I see three records appear here, one record for each file; each record has information like name of the file, file type, size, created date, etc. as you can see below:
USE LearnFileTable SELECT * FROM [dbo].[MyFirstFileTable]
Conclusion
In this article, I talked about FileStream and FileTable features of SQL Server to store unstructured data in SQL Server along with other data. The FileStream feature stores unstructured data in the file system and keeps a pointer of the data in the database whereas FileTable extends this feature even further to allow non-transactional access (the ability to access files without prior authorization from the Database Engine from the shared location). In other words, with this feature it will appear that we are managing files in file system rather than in SQL Server and a the same time you can access these files in a transactional way in SQL Server as well.