MonetDB

MonetDB is an open source column-oriented database management system developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It was designed to provide high performance on complex queries against large databases, such as combining tables with hundreds of columns and millions of rows. MonetDB has been applied in high-performance applications for online analytical processing, data mining, geographic information systems (GIS), [1] Resource Description Framework (RDF), [2] text retrieval and sequence alignment processing. [3]

History

Data mining projects in the 1990s required analytical support. This resulted in a CWI spin-off called Data Distilleries, which used early MonetDB implementations in its analytical suite. Data Distilleries eventually became a subsidiary of SPSS in 2003, which in turn was acquired by IBM in 2009. [4]

MonetDB in its current form was first created in 2002 by doctoral student Peter Boncz and professor Martin L. Kersten as part of the 1990s MAGNUM research project at the University of Amsterdam. [5] It was initially called simply Monet, after the painter Claude Monet. The first version under an open source software license (a modified version of the Mozilla Public License) was released on September 30, 2004. When MonetDB version 4 was released into the open source domain, many extensions to the code base were added by the MonetDB/CWI team. These included a new SQL frontend, supporting the SQL:2003 standard. [6]

MonetDB introduced innovations in all layers of the DBMS: a storage model based on vertical fragmentation and a query execution architecture tuned for modern CPUs, which often gives MonetDB a speed advantage over a typical interpreter-based RDBMS running the same algorithm. It was one of the first database systems to tune query optimization for CPU caches. MonetDB includes automatic and self-tuning indexes, run-time query optimization, and a modular software architecture. [7] [8]

By 2008, a follow-on project called X100 (MonetDB/X100) had started, which evolved into the VectorWise technology. VectorWise was acquired by Actian Corporation, integrated with the Ingres database and sold as a commercial product. [9] [10]

In 2011 a major effort to renovate the MonetDB codebase was started. As part of it, the code for the MonetDB 4 kernel and its XQuery components was frozen. In MonetDB 5, parts of the SQL layer were pushed into the kernel. [6] The resulting changes altered the internal APIs, as the system transitioned from MonetDB Instruction Language (MIL) to MonetDB Assembly Language (MAL). Older, no longer maintained top-level query interfaces were also removed. The first was XQuery, which ran on MonetDB 4 and was never ported to version 5. [11] The experimental Jaql interface support was removed with the October 2014 release. [12] With the July 2015 release, MonetDB gained support for read-only data sharding and persistent indices. In this release the deprecated data streaming module DataCell was also removed from the codebase in an effort to streamline the code. [13] In addition, the license was changed to the Mozilla Public License, version 2.0.

Architecture

The MonetDB architecture is organized in three layers, each with its own set of optimizers. [14] The front-end is the top layer, providing a query interface for SQL, with SciQL and SPARQL interfaces under development. Queries are parsed into domain-specific representations, like relational algebra for SQL, and optimized. The resulting execution plans are then translated into MonetDB Assembly Language (MAL) instructions, which are passed to the next layer. The middle or back-end layer provides cost-based optimizers for the MAL. The bottom layer is the database kernel, which provides access to the data stored in Binary Association Tables (BATs). Each BAT is a table consisting of an object-identifier column and a value column, representing a single column in the database. [14]
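The BAT idea can be illustrated with a minimal sketch. It assumes a BAT can be modelled as a plain list of (object-identifier, value) pairs; the names `decompose` and `reconstruct` are hypothetical helpers chosen for illustration and do not reflect MonetDB's actual storage format or API.

```python
from typing import Any, Dict, List, Tuple

# A BAT is modelled here as a list of (oid, value) pairs. This is an
# illustrative simplification, not MonetDB's actual storage format.
BAT = List[Tuple[int, Any]]


def decompose(rows: List[Dict[str, Any]]) -> Dict[str, BAT]:
    """Split row-oriented records into one BAT per column."""
    bats: Dict[str, BAT] = {}
    for oid, row in enumerate(rows):
        for column, value in row.items():
            bats.setdefault(column, []).append((oid, value))
    return bats


def reconstruct(bats: Dict[str, BAT], oid: int) -> Dict[str, Any]:
    """Reassemble one row by looking up the same oid in every BAT."""
    return {column: dict(bat)[oid] for column, bat in bats.items()}


rows = [
    {"name": "alice", "age": 31},
    {"name": "bob", "age": 27},
]
bats = decompose(rows)
```

A query that touches only `age` needs to read only `bats["age"]`, which is the main attraction of vertical fragmentation; a full row is recovered by joining the BATs on the shared object identifier, as `reconstruct(bats, 1)` does.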

MonetDB's internal data representation also relies on the memory addressing ranges of contemporary CPUs, using demand paging of memory-mapped files, thus departing from traditional DBMS designs that involve complex management of large data stores in limited memory.
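As a rough illustration of demand paging over memory-mapped files, the sketch below stores a column of 64-bit integers in a file and reads a single value through Python's standard `mmap` module. The file layout and the helper names `write_column` and `read_value` are assumptions made for this example, not MonetDB's on-disk format.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # one little-endian 64-bit signed integer per value


def write_column(path: str, values: list) -> None:
    """Store the column as a dense array of fixed-width integers."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


def read_value(path: str, position: int) -> int:
    """Read one value through a memory map instead of an explicit buffer."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Only the page holding this offset has to be brought into memory.
            return ITEM.unpack_from(mapped, position * ITEM.size)[0]


tmp_dir = tempfile.mkdtemp()
column_path = os.path.join(tmp_dir, "age.col")  # hypothetical file name
write_column(column_path, [31, 27, 45, 19])
third_value = read_value(column_path, 2)
```

The operating system decides which pages of the file are resident, so the code needs no buffer pool of its own.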

Query Recycling

Query recycling is an architecture for reusing the byproducts of the operator-at-a-time paradigm in a column store DBMS. Recycling makes use of the generic idea of storing and reusing the results of expensive computations. Unlike low-level instruction caches, query recycling uses an optimizer to pre-select instructions to cache. The technique is designed to improve query response times and throughput, while working in a self-organizing fashion. [15] The authors from the CWI Database Architectures group, Milena Ivanova, Martin Kersten, Niels Nes and Romulo Goncalves, won the “Best Paper Runner Up” award at the ACM SIGMOD 2009 conference for their work on query recycling. [16] [17]
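The core idea of reusing intermediates can be sketched as a small cache keyed by operator name and arguments. This is a simplified illustration only: the `Recycler` class, the `select_range` operator and the sample column are hypothetical, and the sketch keeps every result, whereas the architecture described above relies on an optimizer to choose what to keep.

```python
from typing import Callable, Dict, Hashable, List, Tuple


class Recycler:
    """Keeps intermediate results keyed by (operator name, arguments)."""

    def __init__(self) -> None:
        self.pool: Dict[Tuple[Hashable, ...], List[int]] = {}
        self.hits = 0

    def run(self, op: str, func: Callable[..., List[int]], *args: Hashable) -> List[int]:
        key = (op,) + args
        if key in self.pool:
            # The same operator with the same arguments was already executed.
            self.hits += 1
            return self.pool[key]
        result = func(*args)
        self.pool[key] = result
        return result


COLUMN = (5, 12, 7, 30, 18)  # hypothetical column values


def select_range(low: int, high: int) -> List[int]:
    """Return the positions of values in the half-open range [low, high)."""
    return [i for i, v in enumerate(COLUMN) if low <= v < high]


recycler = Recycler()
first = recycler.run("select_range", select_range, 5, 20)
second = recycler.run("select_range", select_range, 5, 20)
```

The second call is answered from the pool without scanning the column again.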

Database Cracking

MonetDB was one of the first databases to introduce database cracking. Database cracking is an incremental partial indexing and/or sorting of the data. It directly exploits the columnar nature of MonetDB. Cracking is a technique that shifts the cost of index maintenance from updates to query processing. The query pipeline optimizers are used to massage the query plans to crack and to propagate this information. The technique allows for improved access times and self-organized behavior. [18] The work on database cracking received the ACM SIGMOD 2011 Jim Gray Doctoral Dissertation Award. [19]
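A minimal sketch of the cracking idea follows, assuming a single integer column and half-open range queries with `low <= high`. The class `CrackedColumn` and its fields are hypothetical names for illustration; the real implementation works in place on BATs and is considerably more refined.

```python
import bisect
from typing import List


class CrackedColumn:
    """Each range query partitions only the piece of the column it touches."""

    def __init__(self, values: List[int]) -> None:
        self.values = list(values)   # cracker copy, reordered by queries
        self.pivots: List[int] = []  # sorted pivot values seen so far
        self.splits: List[int] = []  # splits[i]: first position with value >= pivots[i]

    def _crack(self, pivot: int) -> int:
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.splits[i]  # already cracked on this pivot
        # Only the piece between the neighbouring splits can contain
        # values on both sides of the new pivot.
        lo = self.splits[i - 1] if i > 0 else 0
        hi = self.splits[i] if i < len(self.splits) else len(self.values)
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.splits.insert(i, split)
        return split

    def select_range(self, low: int, high: int) -> List[int]:
        """Return all values in [low, high), cracking the column as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]


column = CrackedColumn([13, 4, 55, 9, 27, 1, 42])
result = column.select_range(5, 30)
```

After this query the column is partitioned into three pieces (below 5, from 5 up to 30, and 30 or above), so later queries whose bounds fall on existing pivots are answered by a slice, and others only need to partition one smaller piece.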

Components

A number of extensions exist for MonetDB that extend the functionality of the database engine. Due to the three-layer architecture, top-level query interfaces can benefit from optimizations in the backend and kernel layers.

SQL

MonetDB/SQL is a top-level extension, which provides complete support for transactions in compliance with the SQL:2003 standard. [14]

GIS

MonetDB/GIS is an extension to MonetDB/SQL with support for the Simple Features Access standard of the Open Geospatial Consortium (OGC). [1]

SciQL

SciQL is an SQL-based query language for science applications with arrays as first-class citizens. SciQL allows MonetDB to effectively function as an array database. SciQL is used in the European Union PlanetData and TELEIOS projects, together with the Data Vault technology, providing transparent access to large scientific data repositories. [20] Data Vaults map the data from the distributed repositories to SciQL arrays, allowing for the handling of spatio-temporal data in MonetDB. [21] SciQL will be further extended for the Human Brain Project. [22]

Data Vaults

Data Vault is a database-attached external file repository for MonetDB, similar to the SQL/MED standard. The Data Vault technology allows for transparent integration with distributed or remote file repositories. It is designed for scientific data exploration and mining, specifically for remote sensing data. [21] There is support for the GeoTIFF (Earth observation), FITS (astronomy), MiniSEED (seismology) and NetCDF formats. [21] [23] The data is stored in the file repository in the original format and loaded into the database in a lazy fashion, only when needed. The system can also process the data upon ingestion, if the data format requires it. [24] As a result, even very large file repositories can be analyzed efficiently, as only the required data is processed in the database. The data can be accessed through the MonetDB SQL or SciQL interfaces. The Data Vault technology was used in the European Union's TELEIOS project, which was aimed at building a virtual observatory for Earth observation data. [23] Data Vaults for FITS files have also been used for processing astronomical survey data for the INT Photometric H-Alpha Survey (IPHAS). [25] [26]
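The lazy loading behaviour can be sketched as a catalogue that records attached files but reads their contents only on first access. Everything here is hypothetical and for illustration: the `DataVault` class, its methods and the in-memory `FAKE_REPOSITORY` stand in for a real file repository and format-specific loaders.

```python
from typing import Callable, Dict, List


class DataVault:
    """Catalogue of external files whose contents are loaded on first access."""

    def __init__(self, loader: Callable[[str], List[float]]) -> None:
        self.loader = loader
        self.catalogue: List[str] = []
        self.loaded: Dict[str, List[float]] = {}

    def attach(self, path: str) -> None:
        # Attaching records only the file name; no data is read yet.
        self.catalogue.append(path)

    def read(self, path: str) -> List[float]:
        if path not in self.catalogue:
            raise KeyError(path)
        if path not in self.loaded:
            # First access: ingest the file and keep the result.
            self.loaded[path] = self.loader(path)
        return self.loaded[path]


# Hypothetical repository standing in for remote FITS or GeoTIFF files.
FAKE_REPOSITORY = {"a.fits": [1.0, 2.0], "b.fits": [3.0, 4.0, 5.0]}

vault = DataVault(loader=lambda path: list(FAKE_REPOSITORY[path]))
vault.attach("a.fits")
vault.attach("b.fits")
values = vault.read("b.fits")
```

Both files are known to the vault, but only `b.fits` has been ingested, which mirrors how a query over a large repository pays only for the files it actually touches.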

SAM/BAM

MonetDB has a SAM/BAM module for efficient processing of sequence alignment data. Aimed at bioinformatics research, the module has a SAM/BAM data loader and a set of SQL UDFs for working with DNA data. [3] The module uses the popular SAMtools library. [27]

RDF/SPARQL

MonetDB/RDF is a SPARQL-based extension for working with linked data, which adds support for RDF and allows MonetDB to function as a triplestore. It is under development for the Linked Open Data 2 (LOD2) project. [2]

R integration

The MonetDB/R module allows UDFs written in R to be executed in the SQL layer of the system. This is done using the native R support for running embedded in another application, inside the RDBMS in this case. Previously, the MonetDB.R connector allowed using MonetDB data sources and processing them in an R session. The newer R integration feature of MonetDB does not require data transfer between the RDBMS and the R session, reducing overhead and improving performance. The feature is intended to give users access to the functions of the R statistical software for in-line analysis of data stored in the RDBMS. It complements the existing support for C UDFs and is intended to be used for in-database processing. [28]

Python integration

Similarly to the embedded R UDFs in MonetDB, the database now has support for UDFs written in Python/NumPy. The implementation uses NumPy arrays (themselves Python wrappers for C arrays), providing a functional Python integration with speed matching native SQL functions. The embedded Python functions also support mapped operations, allowing users to execute Python functions in parallel within SQL queries. In practice, the feature gives users access to the Python/NumPy/SciPy libraries, which provide a wide selection of statistical and analytical functions. [29]
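The shape of a mapped operation can be sketched in plain Python: the function receives a whole chunk of a column rather than a single value, and chunks are processed concurrently. This is only an illustration of the pattern under stated assumptions. The helper `apply_mapped` and the function `times_two` are hypothetical, plain lists stand in for NumPy arrays, and a thread pool stands in for the database's own parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def apply_mapped(
    udf: Callable[[Sequence[int]], List[int]],
    column: Sequence[int],
    chunks: int = 2,
) -> List[int]:
    """Split a column into chunks, apply the UDF to each chunk, and concatenate."""
    size = max(1, -(-len(column) // chunks))  # ceiling division
    pieces = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        # pool.map preserves the order of the input chunks.
        results = list(pool.map(udf, pieces))
    return [value for part in results for value in part]


def times_two(values: Sequence[int]) -> List[int]:
    """Column-at-a-time function: one call handles many values."""
    return [v * 2 for v in values]


doubled = apply_mapped(times_two, [1, 2, 3, 4, 5], chunks=2)
```

Calling the function once per chunk rather than once per row keeps the per-call overhead small, which is the reason column-at-a-time UDFs can approach the speed of built-in functions.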

MonetDBLite

Following the release of the R connector (MonetDB.R) and R UDFs in MonetDB (MonetDB/R), the authors created an embedded version of MonetDB in R called MonetDBLite. It is distributed as an R package, removing the need to manage a database server, which was required for the previous R integrations. The DBMS runs within the R process itself, eliminating socket communication and serialization overhead and greatly improving efficiency. The idea behind it is to deliver an SQLite-like package for R, with the performance of an in-memory optimized columnar store. [30]

Former extensions

A number of former extensions have been deprecated and removed from the stable code base over time. Some notable examples include an XQuery extension removed in MonetDB version 5, a Jaql extension, and a streaming data extension called DataCell. [14] [31] [32]

See also

  • List of relational database management systems
  • Comparison of relational database management systems
  • Database management system
  • Column-oriented DBMS
  • Array DBMS

References

  1. “GeoSpatial – MonetDB”. March 4, 2014.
  2. “MonetDB – LOD2 – Creating Knowledge out of Interlinked Data”. March 6, 2014.
  3. “Life Sciences in MonetDB”. November 24, 2014.
  4. “A short history about us – MonetDB”. March 6, 2014.
  5. Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications (PDF). Ph.D. Thesis. University of Amsterdam. May 2002.
  6. MonetDB historic background.
  7. Stefan Manegold (June 2006). “An Empirical Evaluation of XQuery Processors” (PDF). Proceedings of the International Workshop on Performance and Evaluation of Data Management Systems (ExpDB). ACM. 33 (2): 203-220. doi:10.1016/j.is.2007.05.004. Retrieved December 11, 2013.
  8. P. A. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, J. Teubner. MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, June 2006.
  9. Marcin Zukowski; Peter Boncz (May 20, 2012). “From x100 to vectorwise: opportunities, challenges and things most researchers do not think about, chapter: From x100 to vectorwise”. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM: 861–862. doi:10.1145/2213836.2213967. ISBN 978-1-4503-1247-9.
  10. Inkster, D.; Zukowski, M.; Boncz, P. A. (September 20, 2011). “Integration of VectorWise with Ingres” (PDF). ACM SIGMOD Record. ACM. 40 (3): 45. doi:10.1145/2070736.2070747.
  11. “XQuery”. December 12, 2014.
  12. “MonetDB Oct2014 Release Notes”. December 12, 2014.
  13. “MonetDB July 2015 Released”. August 31, 2015.
  14. Idreos, S.; Groffen, F. E.; Nes, N. J.; Manegold, S.; Mullender, K. S.; Kersten, M. L. (March 2012). “MonetDB: Two Decades of Research in Column-Oriented Database Architectures” (PDF). IEEE Data Engineering Bulletin. IEEE: 40-45. Retrieved March 6, 2014.
  15. Ivanova, Milena G.; Kersten, Martin L.; Nes, Niels J.; Goncalves, Romulo A. P. (2010). “An architecture for recycling intermediates in a column-store”. ACM Transactions on Database Systems. ACM. 35 (4): 24. doi:10.1145/1862919.1862921.
  16. “CWI database team wins Best Paper Runner Up at SIGMOD 2009”. CWI Amsterdam. Retrieved 2009-07-01.
  17. “SIGMOD Awards”. ACM SIGMOD. Retrieved 2014-07-01.
  18. Idreos, Stratos; Kersten, Martin L.; Manegold, Stefan (2007). Database cracking. Proceedings of CIDR.
  19. “SIGMOD Awards”. ACM SIGMOD. Retrieved 2014-12-12.
  20. Zhang, Y.; Scheers, L. H. A.; Kersten, M. L.; Ivanova, M.; Nes, N. J. (2011). “Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data”. Astronomical Data Analysis Software and Systems.
  21. Ivanova, Milena; Kersten, Martin; Manegold, Stefan (2012). Data vaults: a symbiosis between database technology and scientific file repositories. Springer Berlin Heidelberg. pp. 485-494.
  22. “SCIQL.ORG”. March 4, 2014.
  23. Ivanova, Milena; Kargin, Yagiz; Kersten, Martin; Manegold, Stefan; Zhang, Ying; Datcu, Mihai; Molina, Daniela Espinoza (2013). “Data Vaults: A Database Welcome to Scientific File Repositories”. SSDBM. ACM. doi:10.1145/2484838.2484876. ISBN 978-1-4503-1921-8.
  24. Kargin, Yagiz; Ivanova, Milena; Zhang, Ying; Manegold, Stefan; Kersten, Martin (August 2013). “Lazy ETL in Action: ETL Technology Dates Scientific Data”. Proceedings of the VLDB Endowment. 6 (12). VLDB Endowment. pp. 1286-1289. doi:10.14778/2536274.2536297. ISSN 2150-8097.
  25. “Astronomical data analysis with MonetDB Data Vaults”. 2015-09-09.
  26. “Data Vaults”. 2015-09-09.
  27. “SAM/BAM installation”. November 24, 2014.
  28. “Embedded R in MonetDB”. November 13, 2014.
  29. “Embedded Python/NumPy in MonetDB”. January 11, 2015.
  30. “MonetDBLite for R”. November 25, 2015.
  31. “Xquery (obsolete)”. MonetDB. Retrieved 2015-05-26.
  32. “Announcement: New Oct2014 Feature release of MonetDB suite”. MonetDB. Retrieved 2015-05-26.