SYSTEM COMPONENTS FOR DATA EXTRACTION AND PROCESSING FROM INTERNET OF THINGS

Authors

  • Luben Boyanov, University of National and World Economy, Bulgaria

Keywords:

Internet of Things, Big data, data extraction, data processing, Hadoop

Abstract

In the last two decades, information technology has developed at a very rapid pace, and its products have become popular and widespread thanks to their small size and low cost. Apart from mobile phones, laptops, computers and network equipment, many different digital components have entered almost all areas of human life and activity. Nowadays, numerous devices transmit and/or receive data over the Internet; together with the Internet and other data processing devices, they form the modern phenomenon known as the "Internet of Things" (IoT). The term denotes the connection of sensors and other elements to the global network and the retrieval of physical values measured by the sensors, such as temperature, humidity and pressure. Other important components of IoT are digital tags attached to goods, vehicles and even living beings, and smart devices that connect and control processes and appliances in smart offices and smart homes. Together, these components generate huge amounts of digital data, which is stored in Big data systems. This data can be structured, semi-structured or unstructured, and is used by information systems to report, store and sometimes send feedback to the objects from which it originated. The main blocks of a system for working with IoT data can be defined as: edge IoT sensors and devices, preprocessing edge devices (routers, gateways), communication networks (sensor networks, Internet, mobile systems), brokers (systems that receive and aggregate data) and Big data systems (often built around Hadoop). We discuss all of these blocks and present examples. There have been various approaches and architectures for processing Big data from IoT, but one that recurs in many of them under different names is the Lambda architecture (LA). It is a general, extensible and fault-tolerant data processing architecture.
LA is a way of processing huge amounts of data that takes a hybrid approach, providing access to both batch processing and stream processing methods. Its main functional blocks are presented. The work also considers the most popular open source software for Big data processing, the Hadoop environment. This ecosystem is presented together with its software tools and components, which provide functions such as data extraction, data distribution, data storage and data processing. Examples of performing those functions with the packages HDFS, Flume, Storm, NiFi, Kafka, Hive, Hue, Spark and Impala are given. The components used for stream and batch processing of data, as in the LA, are identified. The paper presents a simplified model built from open source components that can extract data from IoT, perform the necessary transformations, store the data and process it according to the requirements of the end users. The system is scalable, flexible and extensible with other modules and components. It has been successfully verified with different IoT data sources.
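
The hybrid batch/stream idea behind the Lambda architecture can be illustrated with a minimal sketch. It is not the system described in the paper: plain in-memory Python stands in for the Hadoop components, and the names `Reading`, `LambdaPipeline`, `ingest`, `run_batch` and `mean` are hypothetical, chosen only for illustration. The sketch assumes the query of interest is a per-sensor mean.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One IoT measurement; the field names are illustrative assumptions."""
    sensor_id: str
    value: float


class LambdaPipeline:
    """Toy Lambda architecture: a batch view and a real-time view, merged at query time."""

    def __init__(self):
        self.master = []        # append-only master dataset (batch layer input)
        self.batch_view = {}    # sensor_id -> (sum, count), rebuilt by run_batch
        # Partial aggregates for readings that arrived since the last batch run.
        self.speed_view = defaultdict(lambda: (0.0, 0))

    def ingest(self, reading):
        """Send each new reading to both the master dataset and the speed layer."""
        self.master.append(reading)
        total, count = self.speed_view[reading.sensor_id]
        self.speed_view[reading.sensor_id] = (total + reading.value, count + 1)

    def run_batch(self):
        """Recompute the batch view from the whole master dataset, then reset the speed view."""
        view = defaultdict(lambda: (0.0, 0))
        for r in self.master:
            total, count = view[r.sensor_id]
            view[r.sensor_id] = (total + r.value, count + 1)
        self.batch_view = dict(view)
        self.speed_view.clear()

    def mean(self, sensor_id):
        """Serving layer: merge batch and real-time aggregates into one answer."""
        b_total, b_count = self.batch_view.get(sensor_id, (0.0, 0))
        s_total, s_count = self.speed_view.get(sensor_id, (0.0, 0))
        count = b_count + s_count
        return (b_total + s_total) / count if count else None


pipeline = LambdaPipeline()
pipeline.ingest(Reading("t1", 20.0))
pipeline.ingest(Reading("t1", 22.0))
pipeline.run_batch()                  # batch view now covers the first two readings
pipeline.ingest(Reading("t1", 24.0))  # visible only through the speed layer
print(pipeline.mean("t1"))            # 22.0
```

The query merges both views, so a reading that arrives after the last batch run is still reflected in the result. In a deployment of the kind the abstract describes, distributed storage and stream-processing tools would take the place of the in-memory list and dictionaries used here.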

Published

2021-08-16

How to Cite

Boyanov, L. (2021). SYSTEM COMPONENTS FOR DATA EXTRACTION AND PROCESSING FROM INTERNET OF THINGS. KNOWLEDGE - International Journal, 47(3), 463–467. Retrieved from https://ikm.mk/ojs/index.php/kij/article/view/4732