Data development in a pure open source environment is possible. In larger enterprises and consulting firms, expect to interface with internal and external systems based on a mix of open source, homegrown code, and commercial technologies. Thousands of words could be devoted to pontificating on the ideal of pure environments. That is not this article. Open source database tools are found in many categories, including engine, modeling, script management, metadata, interactive query, semantic layer development, governance and more.
Open source database management tools are proper for professional developers. Is it safe to assume that your team uses software project management, whether Atlassian's JIRA or an open source alternative? While defining the problem, the users, and the approach to solving the problem from the view of each audience, developers select appropriate software tools and environments. A team environment is initiated, including roles for user, application, cloud processing/storage, source code, data, and interchange management, among others.
_____ AS A SERVICE
A growing focus is on XaaS. The '-as-a-Service' in any of Software, Infrastructure, Data, Platform, and other services that might or might not be based on open source technologies while providing a managed service to role-based users. Data integration through a secure and metered API request is offered for most services. There is often a 'Community' or 'Express' version of services and tools. These limited capability versions always limit the number of rows/users/widgets that are allowed in the environment. Most allow a proof of concept to be demonstrated. Ongoing support and access to updated source code often requires licensing with fees. Years ago, open source was expected to be free. Today, supported implementations of open source databases and environments have monthly fees that often exceed $1,000. Altruistic free is not sustainable for complex environments, as seen with Hadoop distributions, integration suites, and performance management assessment tools.
Database management tools are offered in two primary forms: bloatware and components. Bloatware developers think of all possible scenarios and create dialogs and functionality for each of them. What could go wrong? Components do something very well, but it is up to you to build the supply chain. YOU get to define integration. YOU get to map the semantic layer. YOU manage the metadata. YOU get the idea.
In the course of developing the underlying database schema, open source tools for modeling are adept at producing scripts. The scripts are used to programmatically define structured and unstructured data storage. It is possible to open a blank sheet in any of the the modeling tools and begin to create a data model that implements logical and physical models is valuable only after a conceptual data model has been developed and refined through discussions with users and product management. Many of the open source components create precise scripts but are of no help in defining the business domains that make up the conceptual data model. In those cases, X-a-a-S shines. LucidChart is a valued example of an online service for modeling plus conceptual diagramming, including a selection of templates for data models. A superb modeling tool for many would approach bloatware for others. Consider the features and templates and data models provided at mockaroo.com. When building the logical and physical data models, the integration between data generation and modeling is highly valuable for the real-world datatypes and generated data to test a system.
The scripts from most modeling tools are directed toward commercial data engines like Oracle and SQL Server. Open source engines such as PostgreSQL, MariaDB, and Firebird are also widely supported. Storage of JSON packets or documents or audio and video files in a variety of formats is often made in data lakes. Metadata regarding the unstructured data is often written to structured schemas for unmetered historical analysis. Languages support many parsing options on stored, unstructured packets and files.
There are commercial engines that ship with tools for developers, administrators, and users. Data development seldom drives development, and developers have coalesced to environments that are closely tied to languages and testing, including various editions of Visual Studio and Eclipse. Database administrators need to see metadata and to alter scripts. Multi-tab dashboards are populated with out-of-the-box reference scripts. Some of the best ideas for backend scaffolding comes from the aforementioned mockaroo's wealth of schemas.
On the user side, the low-code and no-code services are very informative. Users are increasingly targeted by software vendors that offer low-code and no-code services, usually as a remote system accessed through a browser. Airtable, Salesforce, and ServiceNow provide dozens of commonly used business scenarios through templates and dashboards. Often, the functionality is implemented by user departments without regard to maturities in quality and governance. The schemas can serve as starting points for applications across industries.
The Open Semantic Framework and Open Semantic Search provide the basis for datamining of local documents and datasets. User communities are familiar with search strings and results, so it serves their needs to provide a user experience with which they are familiar as they seek answers - and questions - that are in their [proprietary] data. Is searching local data a prelude to artificial intelligence?
Interactive query is useful to the developer, administrator, and advanced user. Commercial engines provide multiple tools for these audiences. For instance, the Microsoft developer will use SQL Server Developer Tools for integration and reporting tasks, while the administrator will use SQL Server Management Studio for configuration, data profiling, and performance assessments. Similarly, Oracle developers use Oracle Enterprise Manager for many aspects of that underlying engine. Dozens of database management tools are available from developers other than those that develop engines. Open source and commercial tools for data developers have been available for decades. Many of the tools have been acquired by other firms. Some have been changed focus. The Japanese giant, Hitachi, acquired Pentaho, and continues to grow the tools. erwin became a data governance company when then they became a private company and associated with several consultancies focused on governance.
Which tools should you use on your projects? Professional developers will know which tools are able to be supported and fit the culture. The maturity of the people and processes is indicated. From the indicators, a selection of open source, homegrown code, and commercial systems are selected based on needs defined in the project initiation with the customers.
About the Author:
Dave Leininger has been a Data Consultant for 30 years. In that time, he has discussed data issues with managers and executives in hundreds of corporations and consulting companies in 20 countries. Mr. Leininger has shared his insights on data warehouse, data conversion, and knowledge management projects with multi-national banks, government agencies, educational institutions and large manufacturing companies. Reach him at firstname.lastname@example.org.