Industrial big data

Industrial big data refers to a large amount of diversified time series generated at high speed by industrial equipment,[1] known as the Internet of things.[2] The term emerged in 2012 along with the concept of “Industry 4.0”, and refers to “big data”, a term popular in information technology marketing, in that data created by industrial equipment might hold more potential business value.[3] Industrial big data takes advantage of industrial Internet technology. It uses raw data to support management decision making, so as to reduce maintenance costs and improve customer service.[2]


Big data refers to data generated in high volume, high variety, and high velocity that require new processing technologies to enable better decision making, knowledge discovery and process optimization.[4] Sometimes the feature of veracity is also added to emphasize the quality and integrity of the data.[5] For industrial big data, however, there should be two more “V’s”. One is visibility, which refers to the discovery of unexpected insights into existing assets and/or processes, thereby turning invisible knowledge into visible value. The other “V” is value. This characteristic also implies that, because of the risks and impacts industry might face, the requirements for analytical accuracy in industrial big data are much higher than in other analytics, such as social media and customer behavior.[6][7][8][9]

Industrial big data is usually more structured, more correlated, more orderly in time and more ready for analytics.[6] This is because industrial data is generated by automated equipment and processes, where the environment and operations are more controlled and human involvement is reduced to a minimum. Even as machines become more connected and networked, industrial big data possesses three distinguishing characteristics:[6]

  • Background

o General “Big Data” analytics often focuses on the mining of relationships and capturing the phenomena. Yet “Industrial Big Data” analytics is more interested in finding the physical root cause behind features extracted from the phenomena. This means effective “Industrial Big Data” analytics will require more domain know-how than general “Big Data” analytics.

  • Broken

o Compared to “Big Data” analytics, “Industrial Big Data” analytics favors the “completeness” of data over the “volume” of the data, which means that in order to construct an accurate data-driven analytical system, it is necessary to prepare data from different working conditions. Due to communication issues and multiple sources, data from the system might be discrete and un-synchronized. That is why pre-processing is an important procedure before actually analyzing the data to make sure that the data are complete, continuous and synchronized.

  • Bad-Quality

o The focus of “Big Data” analytics is mining and discovering, which means that the volume of the data might compensate for its low quality. For “Industrial Big Data”, however, variables usually possess clear physical meanings, so data integrity is of vital importance to the development of the analytical system. Low-quality data or incorrect recordings will alter the relationship between different variables and can have a catastrophic impact on estimation accuracy.

Therefore, simply transferring the techniques developed for general-purpose big data analytics might not work well for industrial data. Industrial big data requires deeper domain knowledge, clear definitions of analytical system functions, and the right timing of delivering extracted insights to the right personnel to support wiser decision making.[6][10]
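
The pre-processing described under “Broken” above, which makes data from multiple sources complete, continuous and synchronized, can be illustrated with a minimal Python sketch. It is only one possible approach (forward-filling onto a common time grid and rejecting stale values); the sensor names, timestamps, grid spacing and the max_gap parameter are hypothetical and do not come from the cited sources.

```python
from bisect import bisect_right


def align_to_grid(samples, grid, max_gap):
    """Forward-fill (timestamp, value) samples onto a common time grid.

    Returns one value per grid point, or None where no sample exists at or
    before the grid point, or where the latest sample is older than max_gap.
    """
    samples = sorted(samples)
    times = [t for t, _ in samples]
    aligned = []
    for g in grid:
        i = bisect_right(times, g) - 1  # index of last sample at or before g
        if i >= 0 and g - times[i] <= max_gap:
            aligned.append(samples[i][1])
        else:
            aligned.append(None)  # gap: data are not continuous here
    return aligned


# Two hypothetical sensors reporting at different, irregular times (seconds).
temperature = [(0.0, 70.1), (1.1, 70.4), (2.0, 70.9), (5.2, 72.0)]
vibration = [(0.4, 0.12), (1.5, 0.15), (2.6, 0.14), (3.4, 0.19), (4.5, 0.22)]

grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
temp_aligned = align_to_grid(temperature, grid, max_gap=1.5)
vib_aligned = align_to_grid(vibration, grid, max_gap=1.5)

# Keep only the grid points at which every channel has a usable value.
complete_rows = [
    (t, a, b)
    for t, a, b in zip(grid, temp_aligned, vib_aligned)
    if a is not None and b is not None
]
```

In this sketch, grid points with a missing or stale channel are dropped rather than interpolated, on the assumption that an incorrect value is more harmful to the analysis than a missing one.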


Data acquisition, storage and management

As data from automated industrial equipment are generated at extraordinary speed and volume, the infrastructure for storing and managing these data becomes the first challenge any industry faces. Unlike traditional business intelligence, which mostly focuses on internal structured data and processes that information in regularly occurring cycles,[11] an “Industrial Big Data” analytical system requires near real-time analytics and visualization of the results.

The first step is to collect the right data.[10] As the automation level of modern equipment rises, data are generated by an increasing number of sensors. Recognizing which parameters are related to equipment status is important for reducing the amount of data that must be collected and for increasing the efficiency and effectiveness of data analytics.
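
One simple way to screen sensor channels for their relation to equipment status is to rank them by the strength of their correlation with a status indicator. The Python sketch below is a screening heuristic under stated assumptions, not a method taken from the cited sources: the channel names, the tool_wear indicator, the sample values and the 0.8 cut-off are all hypothetical, and in practice domain knowledge would still be needed to confirm the selection.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant channel carries no information about status
    return sxy / sqrt(sxx * syy)


def rank_channels(channels, status):
    """Rank channels by absolute correlation with the status indicator."""
    scores = {name: abs(pearson(values, status)) for name, values in channels.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical readings taken at the same six inspection points.
channels = {
    "spindle_current": [4.0, 4.2, 4.5, 4.9, 5.4, 6.0],
    "ambient_humidity": [40, 43, 39, 42, 41, 40],
    "cabinet_light": [1, 1, 1, 1, 1, 1],
}
tool_wear = [0.05, 0.08, 0.12, 0.18, 0.25, 0.33]  # hypothetical status indicator

ranking = rank_channels(channels, tool_wear)
selected = [name for name, score in ranking if score >= 0.8]  # assumed cut-off
```

Only the channels in selected would then need to be collected at full rate, which reduces the data volume passed on to later analysis.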

The next step is to build a data management system that can handle large amounts of data and perform analytics in near real-time. To enable rapid decision making, data storage, management and processing need to be more integrated.[10] General Electric has built a prototype data storage infrastructure for a fleet of gas turbines.[12] The resulting system, based on in-memory data grids (IMDG), was shown to handle challenging high-velocity, high-volume data flows while performing near real-time analytics on the data. Its developers believe that the technology demonstrates a viable path to realizing batch “Industrial Big Data” management infrastructure. As memory becomes cheaper, such systems are expected to become central and fundamental to future industry.
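
The idea of keeping recent data in memory so that analytics can run as the data arrive can be illustrated with a small Python sketch. This is a single-process stand-in for illustration only, not an in-memory data grid and not the General Electric system; the class and function names, the window size, the temperature limit and the readings are hypothetical.

```python
from collections import deque


class RollingWindow:
    """Keeps the most recent `size` readings in memory with a running sum."""

    def __init__(self, size):
        self.values = deque(maxlen=size)
        self.total = 0.0

    def add(self, value):
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]  # oldest reading is about to be evicted
        self.values.append(value)
        self.total += value

    def mean(self):
        return self.total / len(self.values)


def monitor(stream, size, limit):
    """Return the indices at which the full-window mean exceeds `limit`."""
    window = RollingWindow(size)
    alerts = []
    for index, value in enumerate(stream):
        window.add(value)
        if len(window.values) == size and window.mean() > limit:
            alerts.append(index)
    return alerts


# Hypothetical exhaust temperature readings arriving one at a time.
exhaust_temps = [500, 502, 501, 503, 520, 540, 560, 505, 500, 499]
alerts = monitor(exhaust_temps, size=3, limit=520)
```

Each new reading updates the mean in constant time, so the check keeps pace with the incoming stream instead of waiting for a periodic batch job.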

Cyber-physical systems

Cyber-physical systems are the core technology of industrial big data.[6][7] They are systems that require seamless integration between computational models and physical components.[13] Unlike traditional operation technology, “Industrial Big Data” requires that decisions be informed from a much wider scope, a central part of which is equipment status. The “5C” (Connection, Conversion, Cyber, Cognition, Configuration) architecture[14] indicates that cyber-physical systems focus on transforming raw data into actionable information, understanding process insights, and eventually improving the process through well-informed decision making. Improved processes in turn increase productivity and reduce costs. This aligns with the mission of “Industrial Big Data”, which is to reveal insights from large amounts of raw data and turn that information into value. The approach combines the power of information technology and operation technology to create an information-transparent environment that supports decisions for users at different levels.

Such techniques have been applied by the NSF Industry/University Collaborative Research Center for Intelligent Maintenance Systems (IMS) to a Cosen bandsaw machine, and the technology was demonstrated at IMTS 2014 in Chicago.[7] IMS developed adaptive degradation monitoring techniques to cope with the high data volume and velocity generated during cutting and with the ever-changing load conditions. Based on the predicted degradation condition of the bandsaw, users are advised of the optimal time to change it, so that safety is ensured and cuts that end in material failure are avoided. The analytical computation runs in the cloud and is accessible through the Internet and mobile devices.[7]
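
How a predicted degradation condition can translate into advice on when to change a tool can be shown with a deliberately simple Python sketch. It fits a straight line to a wear indicator and extrapolates to a failure threshold. This is not the adaptive technique developed by IMS, which is not described in detail here; the wear values, the threshold and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def cuts_until_threshold(wear_index, threshold):
    """Estimate how many more cuts remain before wear reaches the threshold.

    Returns None when the wear indicator shows no upward trend.
    """
    xs = list(range(len(wear_index)))
    slope, intercept = fit_line(xs, wear_index)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope  # cut number at which the line hits the threshold
    return max(0.0, crossing - xs[-1])


# Hypothetical wear indicator recorded after each of six cuts.
wear = [0.10, 0.14, 0.18, 0.22, 0.26, 0.30]
remaining = cuts_until_threshold(wear, threshold=0.5)
```

A linear trend is assumed purely for brevity; real degradation is often nonlinear and load-dependent, which is why adaptive methods are used in practice.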

Sample repositories

Every unit in an industrial system generates vast amounts of data every moment. A single machine in a manufacturing line can generate billions of data samples per day.[1] As an example, a Boeing 787 generates over half a terabyte of data per flight.[15] Clearly, the volume of data generated by a group of units in an industrial system is far beyond the capability of traditional methods, so handling, managing and processing it is a challenge.

Over the last several years, researchers and companies have actively participated in collecting, organizing and analyzing huge industrial data sets. Some of these data sets are currently available to the public for research purposes.

The NASA data repository[16] is one of the best-known data repositories for Industrial Big Data. The data sets it provides may be used for predictive analysis, fault detection, prognostics and similar tasks.

ID | Repository Name | Description of the Data
1 | Algae Raceway Data Set | Experiment on three small raceways for algae biomass.
2 | CFRP Composites Data Set | Run-to-failure experiment on CFRP panels.
3 | Milling Data Set | Experiments on a milling machine for different speeds, feeds, and depths of cut. Records the wear of the milling insert, VB. The data set was provided by the BEST lab at UC Berkeley.[17]
4 | Bearing Data Set | Experiments on bearings. The data set was provided by the Center for Intelligent Maintenance Systems (IMS), University of Cincinnati.[18]
5 | Battery Data Set | Experiments on Li-Ion batteries. Charging and discharging at different temperatures. Records the impedance as the damage criterion. The data set was provided by the Prognostics CoE at NASA Ames.
6 | Turbofan Engine Degradation Simulation Data Set | Engine degradation simulation was carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and fault modes. Records several sensor channels to characterize fault evolution. The data set was provided by the Prognostics CoE at NASA Ames.
7 | IGBT Accelerated Aging Data Set | Preliminary data from thermal overstress accelerated aging using the aging and characterization system. The data set contains aging data from 6 devices, one device aged with DC gate bias and the rest aged with a squared signal gate bias. Several variables are recorded and, in some cases, high-speed measurements of gate voltage, collector-emitter voltage and collector current are available. The data set was provided by the Prognostics CoE at NASA Ames.
8 | Trebuchet Data Set | Trajectories of different types of balls launched from a trebuchet with varying counterweights. Flights were filmed and extraction routines calculated the position data. Both raw video data and extracted trajectories are provided. Geometry and physical properties of the trebuchet are available.
9 | FEMTO Bearing Data Set | Accelerated life tests on bearings, provided by the FEMTO-ST Institute, Besançon, France.[19]
10 | Randomized Battery Usage Data Set | Batteries are continuously cycled with randomly generated current profiles. Reference charging and discharging cycles are also performed after a fixed interval of randomized usage in order to provide reference benchmarks for battery state of health.
11 | Capacitor Electrical Stress Data Set | Capacitors were subjected to electrical stress under three voltage levels, i.e. 10 V, 12 V and 14 V. The data set contains EIS data as well as charge/discharge signal data.


References

  1. “The Rise of Industrial Big Data” (PDF). GE Intelligent Platforms. 2012.
  2. Millman, Nick (February 2015). “Big data to unlock value from the industrial internet of things”. Computer Weekly. Retrieved March 19, 2017.
  3. Kelly, Jeff. “The Industrial Internet and Big Data Analytics: Opportunities and Challenges”. Wikibon.
  4. Laney, Douglas. “The Importance of ‘Big Data’: A Definition”. Gartner.
  5. Villanova University. “What is Big Data?”.
  6. Lee, Jay (2015). Industrial Big Data. China: Mechanical Industry Press. ISBN 978-7-111-50624-9.
  7. Lee, Jay (November 19, 2014). “Keynote Presentation: Recent Advances and Transformation Direction of PHM”. Roadmapping Workshop on Measurement Science for Prognostics and Health Management of Smart Manufacturing Systems Agenda. NIST.
  8. Lee, Jay; Kao, Hung-An; Yang, Shanhu. “Service innovation and smart analytics for industry 4.0 and big data environment”. Procedia CIRP 16: 3–8. ISSN 2212-8271.
  9. Lee, Jay; Bagheri, Behrad; Kao, Hung-An (2014). “Recent Advances and Trends of Cyber-Physical Systems and Big Data Analytics in Industrial Informatics”. Int. Conference on Industrial Informatics (INDIN) 2014.
  10. Courtney, Brian. “Industrial big data analytics: The present and future”. InTech Magazine.
  11. ABB. “Big Data and decision-making in industrial plants”.
  12. Williams, Jenny Weisenberg; Aggour, Kareem; Interrante, John; McHugh, Justin; Pool, Eric (2014). “Bridging high velocity and high volume industrial big data through distributed in-memory storage & analytics”. 2014 IEEE International Conference on Big Data (Big Data): 932–941.
  13. National Science Foundation. “Program Solicitation: Cyber-Physical Systems (CPS)”.
  14. Lee, Jay; Bagheri, Behrad; Kao, Hung-An (2015). “A cyber-physical systems architecture for industry 4.0-based manufacturing systems”. Manufacturing Letters 3: 18–23.
  15. Finnegan, Matthew (March 6, 2013). “Boeing 787s to create half a terabyte of data per flight, says Virgin Atlantic”. ComputerworldUK.
  16. NASA Prognostics Center of Excellence (PCoE). “PCoE Datasets”. National Aeronautics and Space Administration.
  17. “Best Lab at UC Berkeley”.
  18. “NSF I/UCRC for Intelligent Maintenance Systems (IMS)”.
  19. “FEMTO-ST Institute”.