Water Quality Monitoring for Smart Farming Using Machine Learning Approach

Water quality in fish farming environments has been studied for many years. While most studies have concentrated on managing water quality in fish ponds, research on implementing these practices on a commercial scale is lacking. Maintaining good water quality helps prevent disease, stress, and death in fish, resulting in higher yields and profits in fish farming operations. In our study, we gathered weekly data from two fish ponds in the Lintangsongo smart farming area over six months. To deal with the limited dataset, we applied dimensionality reduction, using pairwise comparison of the correlation matrix to eliminate the most highly correlated predictors. We then applied feature selection techniques, including XGBoost classification and Recursive Feature Elimination (RFE), to determine the importance of features. This analysis identified ammonium and calcium as the top two predictors. These nutrients play a vital role in maintaining the paired cultivation system and promoting the robust development of Nile tilapia fish and water spinach. Vernier sensors were employed to measure nutrient values, and a system of actuators was integrated to supply the necessary nutrients to the smart farming environment using the closed-loop concept. During each cycle, 0.7 g of ammonium sulfate and calcium nitrate are distributed, and the nutrient levels are assessed; this process of detecting and distributing nutrients continues until the desired quantities of ammonium and calcium are reached. This research investigates water quality management practices in fish farming, assesses their impact on fish health and profitability, identifies key water quality predictors, and implements a closed-loop system for nutrient delivery.


I. INTRODUCTION
Lintangsongo smart farming is expected to provide education on the use of smart farming technology to Islamic boarding school caregivers and the surrounding community [1]. The main problem addressed in this research is implementing and testing the concept of smart farming through an IoT-based pond water quality monitoring tool that uses a machine learning approach. This research aims to increase agricultural and fisheries yields at the Lintangsongo Islamic boarding school.
Water quality in fish farming refers to the composition of the water, including living organisms, energy substances, and other components. It is assessed based on physical, chemical, and biological parameters. These parameters play a crucial role in the success of fish farming, as their interaction can affect the occurrence of diseases. Various factors such as pests, diseases, the organisms themselves, and the overall environment contribute to water quality in fish farming [2]. Apart from these factors, maintaining low levels of ammonia and nitrite, ensuring chemical cleanliness, appropriate pH, hardness, and temperature, as well as minimizing organic pollutants and ensuring stability, are equally important for the well-being of fish [3]-[5].
The advancement of technology has brought about changes in societal needs. With various technological innovations, tasks traditionally carried out by humans can now be automated [6], [7]. This allows individuals to focus on more significant and purposeful activities, ultimately enhancing productivity [8]. Similarly, the freshwater fish cultivation conducted by Pesantren Lintangsongo requires special attention because the fish are sensitive to changes in their environment. Hence, an IoT (Internet of Things) based device becomes crucial for automatically monitoring water quality factors such as pH, temperature, total dissolved solids (TDS), dissolved oxygen (DO), and the levels of ammonia and nitrite in the pond, without the need for human intervention. The aquaponics system combines hydroponics and fish pond aquaculture to address food scarcity and environmental challenges [9]. By integrating these two methods, aquaponics has the potential to significantly enhance food production while avoiding the use of harsh chemicals or excessive amounts of water. Studies have shown that aquaponics can substantially boost output by adopting sustainable fish farming methods that require only 2-10% of the water used by conventional techniques. However, despite extensive research on growing fish and plants in aquaponic systems, there is a lack of specific focus on monitoring and controlling essential nutrients based on the crop's cultivation season.
Numerous investigations have explored the implementation of various Internet of Things (IoT) systems in controlled laboratory environments to optimize plant growth in hydroponic and aquaponic settings [10], [11]. A study [12] developed a system to cultivate Nile tilapia fish and water spinach in which the pH and temperature of the water were constantly monitored. Additionally, an actuation system was developed to keep the pH within the range of 6.0 to 7.5 and the water temperature between 25°C and 30°C. This was done to create the most favorable conditions for the growth of these organisms. Farmanullah et al. [13] performed a comprehensive evaluation of the use of IoT and smart technologies in overseeing and managing electroconductivity, nitrite concentrations, dissolved oxygen, and water hardness in aquaponic solutions. The goal was to establish a viable business model for semi-automated systems on a limited scale. The study [14] conducted a thorough examination of the development of water spinach through experiments with varying quantities of plasma-activated water, voltage levels, and time intervals, investigating the effects of these factors on hydroponic systems. The objective was to reduce the absorption of toxic metals and improve crop yield. Various cloud-based methods have been used to oversee and control essential variables such as pH and water temperature for Nile tilapia fish and other green plants [15]. Nevertheless, these green plant systems lack a data-driven approach to regulating and overseeing the vital nutrients for developing fish and plants [16].
Developing an intelligent system for aquaponics faces a major challenge in the form of limited available data [9], [16]. Researchers have implemented different approaches to generate synthetic data to overcome this obstacle. In one study, Soltana et al. [17] used an iterative method to produce data samples that adhere to a desired statistical distribution, although logical constraints were initially overlooked. Any violations that occurred were later rectified through adjustments [18]. The researchers in [19] suggested a method specifically developed to create artificial data for IoT devices. This approach included analyzing the XML structure and describing the values contained in the dataset. The study [20] generated sensory data by combining a Mixture Density Network (MDN) with a Long Short-Term Memory (LSTM) network. A discriminator based on LSTM was used to distinguish between real and synthetic data. Likewise, Hernandez et al. [18] introduced SynSys, a machine learning method for producing artificial data.
Furthermore, previous research has explored the use of a correlation matrix to tackle the problem of strongly correlated predictors [21]. Various machine learning algorithms, such as the Multi-Layer Perceptron (MLP), Decision Trees, Random Forest, and Support Vector Machine (SVM), have been used together with a Histogram of Oriented Gradients (HOG) for feature extraction [22]. A comparative analysis of various classification algorithms [23] achieved a 92% accuracy rate in detecting cardiovascular diseases using ExtraTreesClassifier. The healthcare industry has applied the XGBoost classifier, a technique known for its effectiveness, to identify the importance of features and reduce the size of datasets. This has led to improved accuracy in classifying healthcare data.
The careful choice of parameters with various feature selection techniques is vital in an IoT configuration to ensure the effectiveness of aquaponic systems. This process is crucial for automating the essential parameters that foster the best possible development of fish and plants. For example, a smart irrigation system using humidity and temperature sensors and a Raspberry Pi was used to observe weather conditions for cultivating rice [24]-[26]. In another example [27], a Zigbee model was used to monitor the growth of Chinese citrus with sensors that track real-time variations in atmospheric moisture, atmospheric temperature, soil moisture, and soil temperature. Moreover, a study [28] introduced a plant growth system that incorporates sensors for monitoring air quality, light intensity, and soil moisture, along with actuators controlled by a microcontroller. Similar concepts include a wireless sensor network for monitoring and controlling environmental parameters in a greenhouse [29] and a system [30] that uses a Raspberry Pi module to measure parameters in a greenhouse and display data on a cloud platform. In contrast to regulating the chemical properties of the solution or the environmental parameters of the fish pond, our research focuses on measuring and regulating important water quality parameters in the aquaponic fish pond using data-driven approaches.

II. METHOD
Our research relied on a data set gathered over six months from two aquaponics facilities in the fish pond area of Smart Farming Lintangsongo. Throughout this period, we collected two samples every week: one from the pond housing koi fish, another from the pond housing tilapia fish, and one from the area where water spinach was cultivated. These samples were then sent to the Lintangsongo Testing Laboratory for analysis to ascertain the main nutrient content of the pond water. The nitrate level in the pond water was determined by converting nitrite to nitrate using a cadmium column and then analyzing it using spectrophotometry. Chromatography was utilized to reduce chloride levels in the aquaponic solution, while acid titration with sulfuric acid was employed to measure the concentration of carbonate or bicarbonate. This research was conducted within the Smart Farming Lintangsongo area, encompassing the fish ponds and the hydroponic system shown in Figs. 1 and 2.

A. Dataset Analysis
The dataset used for analysis in our research consisted of 124 observations and ten predictors. The predictors were sulfate, chloride, calcium, carbonate, nitrate, ammonium, magnesium, sodium, boron, and phosphorus, all measured in parts per million (ppm). The classification variable was established according to the month of observation. Months falling within the winter and spring period (June to November) were assigned a code of 1, signifying plant cultivation, while summer and rainy months (September to November) were coded as 0. The initial analysis excluded the carbonate concentration because it showed no variation.

B. Feature Selection
It was determined that the dataset in its original form followed a multivariate normal distribution. As a result, the dataset was not standardized before applying methods to identify the most significant features. To handle multicollinearity and enhance the model's interpretability, a correlation matrix was computed to identify and eliminate predictors that were strongly correlated with each other. After eliminating the most highly correlated predictors, the importance of the remaining features was assessed using the XGBoost classifier. Ensemble models in the form of boosted decision trees were developed to evaluate the significance of these features.
Additionally, the results from multiple decision trees that were not correlated with each other were combined using the ExtraTreesClassifier, which produced the final feature importance. The two most important nutrients were chosen, monitored, and controlled using an IoT configuration.
To lower the cost of the IoT system for monitoring and managing nutrient parameters, selecting the most significant parameters was imperative. To achieve this objective, three techniques were employed to reduce the complexity of the dataset. The process is illustrated in Fig. 3.
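The correlation-based elimination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.9 cut-off, the function names, and the toy measurements are assumptions made for illustration, and only the Python standard library is used. Constant columns (such as carbonate in this study) must be removed beforehand, since the correlation is undefined for them.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Return the predictor names kept after pairwise correlation pruning.

    columns: dict mapping predictor name -> list of measurements (ppm).
    A predictor is dropped when its absolute correlation with any
    already-kept predictor exceeds `threshold` (an assumed cut-off).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data, not the study's measurements: "sodium" is built to track "chloride".
rng = random.Random(0)
chloride = [rng.gauss(30.0, 5.0) for _ in range(50)]
toy = {
    "chloride": chloride,
    "sodium": [0.65 * c + rng.gauss(0.0, 0.2) for c in chloride],
    "calcium": [rng.gauss(80.0, 10.0) for _ in range(50)],
}
kept = drop_correlated(toy)
print(kept)
```

The predictors that survive this step would then be passed to the tree-based importance ranking; which member of a correlated pair is kept depends on column order in this sketch.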

C. Synthetic Data Generation
Because of the limitations of the small dataset, it was crucial to generate artificial data before applying any data-driven approaches. The dataset initially comprised 60 observations for Class 0 and 64 observations for Class 1. The mean vector and covariance matrix of each class were estimated independently, and synthetic data points were generated from these estimated parameters. The entire dataset was also used iteratively to create a second set of synthetic data without considering the class. Ultimately, these two datasets were merged, and various feature selection techniques were employed to identify the two main macronutrient predictors.
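A minimal sketch of the per-class generation step follows, assuming plain multivariate normal sampling from each class's estimated mean vector and covariance matrix. The paper describes its procedure only as a modified Monte Carlo technique, so this is an illustration and not the authors' method. The two-predictor toy rows, the sample counts, and all function names are hypothetical. The sketch uses only the standard library; with NumPy available, `numpy.random.Generator.multivariate_normal` would replace the hand-written Cholesky sampling.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (denominator n - 1)."""
    n, mu = len(rows), mean_vector(rows)
    d = len(mu)
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def cholesky(a):
    """Lower-triangular factor of a positive-definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_class(rows, n_samples, rng):
    """Draw synthetic rows from N(mean, cov) estimated from `rows`."""
    mu = mean_vector(rows)
    low = cholesky(covariance_matrix(rows))
    d = len(mu)
    out = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


# Hypothetical [ammonium, calcium] readings in ppm for each class label.
real = {
    0: [[1.0, 40.0], [1.2, 42.0], [0.9, 39.0], [1.1, 43.0], [1.3, 41.0]],
    1: [[2.0, 60.0], [2.3, 63.0], [1.8, 58.0], [2.1, 61.0], [2.2, 64.0]],
}
rng = random.Random(42)
synthetic = [(row, label)
             for label, rows in real.items()
             for row in sample_class(rows, 1000, rng)]
print(len(synthetic))
```

The class-agnostic second set mentioned in the text could be produced by calling the same function on the pooled rows and merging the two results.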

D. IoT System
The IoT water quality monitoring system was separated into three components: the sensor subsystem, the feedback loop, and the actuator system. The sensor subsystem comprised two sensors that measured water quality and sent the gathered information to the Raspberry Pi via a USB cable.
In choosing the sensors, various aspects were considered, including their cost, necessary interface type, accuracy level, and user-friendliness. These sensors could measure calcium and ammonium levels with high precision, enabling the detection of concentrations as high as 40,000 parts per million (ppm) with a margin of error of approximately 10%. The sensors could be connected to the Raspberry Pi through a USB cable or Bluetooth and programmed with the Python driver. Distilled water was used to calibrate the sensors, and they supplied an average measurement to the feedback loop. This loop then determined the nutrient values in parts per million (ppm).
The feedback loop constantly observed the surroundings to assess and regulate the nutrient levels in the system. It was connected to both the sensor system and the actuator system, collecting data on the current nutrient levels and dispensing nutrients if necessary. A Python program was developed to implement the feedback loop, and it used specialized libraries designed for ion-selective electrodes. The sensors did not have serial communication capabilities, so a USB cable connection was employed. Using Python for the feedback loop made debugging easier and enabled object-oriented code, which simplifies future improvements. The Raspberry Pi 3 Model B+ was selected as the program's operating platform because it can power the sensor module through multiple USB connections.
In the initial stage, the feedback loop established the connections of all the systems involved, such as the ISE probes and actuators. Once the setup process was accomplished successfully, the loop gathered data from the Vernier sensors and calculated an average to assess the present nutrient level. If the measured levels consistently stayed lower than the desired levels, the Raspberry Pi's GPIO pins would signal the actuator system to release nutrients and slightly boost their concentration. Through the continuous iteration of this feedback loop, the target nutrient levels were maintained by gradually adding small quantities of nutrients. This method achieved a stable system while reducing the chances of oversaturation or undersaturation of nutrients.
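The decision logic of this loop can be sketched in a hardware-free form. The class name, the window of three consecutive averages used to interpret "consistently lower", and the 1.0 ppm target are all assumptions for illustration; the actual target values are those given in Table I, and on the Raspberry Pi a positive decision would drive a GPIO pin rather than append to a list.

```python
from collections import deque
from statistics import mean


class NutrientMonitor:
    """Tracks averaged readings and decides when a dose is needed."""

    def __init__(self, target_ppm, window=3):
        self.target_ppm = target_ppm
        # Keep only the most recent `window` averaged readings.
        self.history = deque(maxlen=window)

    def update(self, raw_readings):
        """Average one batch of raw sensor readings and store the result."""
        avg = mean(raw_readings)
        self.history.append(avg)
        return avg

    def needs_dose(self):
        """True only when the window is full and every average is below target."""
        full = len(self.history) == self.history.maxlen
        return full and all(v < self.target_ppm for v in self.history)


# Hypothetical ammonium readings (ppm) from three consecutive sensing passes.
monitor = NutrientMonitor(target_ppm=1.0, window=3)
batches = [[0.82, 0.80, 0.84], [0.79, 0.81, 0.80], [0.83, 0.82, 0.81]]
decisions = []
for batch in batches:
    monitor.update(batch)
    decisions.append(monitor.needs_dose())
print(decisions)  # [False, False, True]
```

Requiring several consecutive low averages before dosing is one simple way to avoid reacting to a single noisy reading, which matches the stated goal of reducing oversaturation.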
In this situation, the actuator was composed of a PIC microcontroller and motor modules capable of operating at 12 V. Additionally, two 12 V stepper motors were used specifically for dispensing nutrients. The motor modules were responsible for providing power and signals for the rotation of the dispensing mechanism, and they were connected to the stepper motors. These motor modules were connected to the PIC microcontroller, which controlled the motors' rotation direction and speed. The motors were activated upon receiving a signal from the Raspberry Pi, indicating the need for additional nutrients.

III. RESULT AND DISCUSSION
Before analyzing the dataset, the first step involved removing predictors with no variation or null values. The variable "carbonate" was removed because it consistently had a zero value. Furthermore, it was decided to create artificial data. Since all the predictors displayed a normal distribution, the data points were not standardized or normalized. A modified Monte Carlo technique generated data for each class, resulting in 2127 data points. Separate mean vectors and covariance matrices were constructed for each class. Moreover, 10,500 observations with ten predictors were obtained by generating an additional 4835 synthetic data points. The additional data points for both classes were generated using identical mean values and covariance matrices. The purpose of generating synthetic data was to augment the dataset for further analysis. Given the moderately demanding computational needs of generating synthetic data and running the feature selection algorithms, the algorithms were run on a system equipped with an Intel(R) Core(TM) i7 processor based on the x64 architecture and 8 GB of RAM.
Hence, it was decided to move forward by incorporating the complete dataset and all the predictors. The F-scores for each predictor were calculated using the XGBoost algorithm, as depicted in Fig. 4, which identified sulfate as the most important feature with an F-score of around 340. Sodium, calcium, bicarbonate, potassium, and chloride were also important features, with values ranging from 255 to 340, and were carried forward to the next stage of the feature selection process. The remaining nutrient predictors were removed from the analysis.
Furthermore, the primary objective was to create an automated system to detect and control these crucial components using a feedback mechanism. As previously mentioned, ammonium and calcium levels were monitored using two sensors, and two modules were created to ensure an adequate supply of these vital nutrients in case their concentrations dropped below the recommended levels. A diagrammatic depiction of the sensing, regulation, and feedback mechanism can be seen in Fig. 5 and Fig. 6.

The implementation involved configuring the sensors for measuring calcium and ammonium, installing motor control units, and creating a preliminary version of the system to demonstrate nutrient dispensing using actuators. A feedback loop was established to connect the sensor and actuator modules. The machine learning model's output provided information on the month of observation and recommended nutrient levels based on a historical dataset.
The initial component of the system handled the input of nutrient levels. Ion-selective electrodes were placed in the water to measure calcium and ammonium levels, and the readings were allowed to accumulate for at least 30 minutes. The sensor data was transmitted to the Raspberry Pi 3 B+ through USB.
The nutrient and regulation system consisted of two main components: the sensor and actuator systems. These components were connected through a feedback loop. A Raspberry Pi-based Python program was responsible for reading the sensor data and managing the dispensing of nutrients. The program utilized the GDX library, which was supplied by Vernier, to establish a connection with the sensors. The sensors were called multiple times at regular intervals to obtain an average nutrient level and compare it with the target levels defined in Table I. The Machine Learning model used in this system was manually adjusted using a GPIO toggle switch according to the crop's growing season. Another switch allowed functions to be paused for tasks like sensor calibration to prevent unnecessary nutrient dispersal. If the nutrient levels were lower than expected, the PIC32 microcontroller would send signals to actuators through GPIO pins. The key element of the system consisted of stepper motor actuators responsible for dispensing powdered nutrients. The system would only function when the PIC32 microcontroller received a robust signal from the Raspberry Pi. After receiving the signal, the PIC32 microcontroller would interpret it and activate the stepper motors, causing them to rotate by 90 degrees. These devices rotated a wheel linked to a tank containing different nutrients. The tanks had openings at the bottom that aligned with openings in the wheel. After each device rotation, the powder was released into the water tank from a filled wheel opening, with a corresponding weight of 0.7 g. This process continued until the levels of nutrients reached a state of balance. A recommendation system employing models driven by binary data was implemented. Multiple experimental trials assessed the system's outcomes, leveraging the Internet of Things (IoT) and incorporating machine learning for a decision support system solution. Detailed results are available in Table II.
Several insights can be gleaned from the data outlined in Table II. The sensors consistently operate five times in each cycle, as demonstrated by the July 2, 2022 test run. Notably, the introduction of ammonium sulfate in the initial iteration leads to a 0.21 ppm increase in the ammonium level within the fish pond solution. This process of detecting and distributing nutrients persists until the desired quantities of ammonium and calcium are reached, as specified in Table I. During each cycle, 0.7 g of ammonium sulfate and calcium nitrate are distributed, and the nutrient levels are assessed. In the later test runs conducted on August 4, 2022, September 2, 2022, October 4, 2022, and November 2, 2022, the sensing and actuation of nutrients stop short of completing all five cycles because the desired nutrient concentrations are reached earlier: the closed-loop system monitors the nutrient levels and terminates the loop before completing five cycles.
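The early-termination behavior described above can be illustrated with a short simulation. It is a sketch under stated assumptions, not a model of the measured runs: it treats a test run as at most five sense-and-dispense iterations, and it assumes every 0.7 g dose raises the level by the same 0.21 ppm reported for the first iteration. The start and target values are hypothetical, since Table I is not reproduced here.

```python
DOSE_GRAMS = 0.7      # mass dispensed per iteration (from the text)
MAX_CYCLES = 5        # iterations per test run (from the text)
PPM_PER_DOSE = 0.21   # rise seen in one iteration; assumed constant here


def run_closed_loop(start_ppm, target_ppm):
    """Simulate one test run.

    Returns (cycles_used, grams_dispensed, final_ppm). The loop stops as
    soon as the target is reached or after MAX_CYCLES iterations.
    """
    level = start_ppm
    cycles = 0
    while cycles < MAX_CYCLES and level < target_ppm:
        level += PPM_PER_DOSE  # stand-in for dispensing and re-measuring
        cycles += 1
    return cycles, round(cycles * DOSE_GRAMS, 2), round(level, 2)


# Hypothetical runs: one that stops early and one that uses all five cycles.
print(run_closed_loop(0.50, 1.00))  # (3, 2.1, 1.13)
print(run_closed_loop(0.10, 2.00))  # (5, 3.5, 1.15)
```

In a real pond the rise per dose would depend on water volume and uptake by fish and plants, which is why the physical system re-measures after every dose instead of computing the number of doses in advance.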
Mapping geographic data with diverse weather conditions is crucial in controlling the concentration of calcium and ammonium using IoT. This mapping provides further insights into the prevalence and effectiveness of the model in various locations of Lintangsongo. With geographic data mapping, we can observe different patterns of calcium and ammonium concentrations in each location. Factors such as climate, rainfall, and soil type can influence the concentration levels. By understanding these patterns, we can optimize the control of calcium and ammonium concentrations more effectively for each location.
Additionally, geographic data mapping helps identify areas potentially prone to high-concentration issues. By highlighting these areas, we can take more specific and targeted preventive actions. This will help reduce the negative impacts caused by high concentrations of calcium and ammonium. By leveraging IoT in geographic data mapping, we can collect and analyze real-time data. This enables quick control measures in response to changes in weather conditions and the environment in each location.
Controlling the concentration of calcium and ammonium allows farmers to optimize plant nutrition more efficiently using IoT technology. Real-time monitoring allows for continuous tracking of calcium and ammonium concentrations in the soil. This capability empowers farmers to tailor fertilization and irrigation practices precisely to the specific requirements of the plants. By controlling the concentration of calcium and ammonium, commercial farming can avoid issues such as nutrient excess or deficiency that can hinder plant growth. Plants that receive the right nutrients will have better growth, higher quality, and increased productivity. Implementing this model also contributes to agricultural sustainability. By utilizing IoT technology, the use of fertilizers and irrigation water can be optimized, reducing waste and negative environmental impacts. Additionally, by efficiently increasing agricultural productivity, this model can help meet the increasing food demand in line with population growth.
This project marks a significant accomplishment as it effectively utilizes IoT systems and Machine Learning to improve nutrient provision in fish pond solutions. The primary advantage of utilizing a data-driven Internet of Things (IoT) system instead of a traditional fishpond system is the potential to save costs by maximizing the quantity and quality of crops. The integration of AI advancements into the system makes this feasible. Concerning the regulation of calcium and ammonium concentrations, IoT enables farmers to measure and control these levels in their agricultural environment precisely. This offers substantial advantages, such as enhanced crop productivity and resource conservation. Through IoT, the management of calcium and ammonium concentrations becomes more streamlined and impactful, contributing to improved agricultural yields and environmental sustainability.
Collaborating with relevant industries on calcium and ammonium concentration control using IoT has great potential to apply the findings of this research on a larger scale. Engaging with industries can extend the practical impact of this research. Potential collaborations with relevant industries may include:
1) Technology Development: Collaborating with technology industries can aid in the development of more advanced and efficient IoT devices for controlling calcium and ammonium concentrations. These industries can assist in designing and manufacturing the necessary sensors, hardware, and software.
2) Large-Scale Implementation: Collaborating with chemical or water treatment industries can enable the application of these findings on a larger scale. These industries have the infrastructure and resources required to implement these solutions at an industrial level and ensure success in controlling calcium and ammonium concentrations.
Ethical aspects that need to be considered are privacy and data security. The use of IoT in the control of calcium and ammonium concentration involves the collection and use of personal data. Therefore, this research must ensure that the collected data is confidential and not misused. Additionally, the research should also consider the environmental impact of IoT usage. IoT devices require significant energy resources, especially if the devices need to operate continuously. Therefore, this research should strive for efficient and sustainable resource usage and consider the lifecycle of IoT devices to reduce environmental impact. In terms of sustainability, this research should consider its positive environmental impact. Effective control of calcium and ammonium concentration can help prevent environmental pollution and ecosystem damage. IoT can perform this control more efficiently and accurately, reducing the use of hazardous chemicals and minimizing negative impacts on the environment.

IV. CONCLUSION
We conducted a trial to demonstrate how nutrient levels can be manipulated and regulated. By analyzing data from the Smart Farming area of Lintangsongo, we found that calcium and ammonium concentrations are crucial factors. The sensors consistently function five times during each cycle, as evidenced by the July 2, 2022 test run. The introduction of ammonium sulfate in the initial iteration leads to a 0.21 ppm rise in the ammonium level within the fish pond solution. This process of identifying and dispensing nutrients continues until the desired levels of ammonium and calcium are attained. In each cycle, 0.7 g of ammonium sulfate and calcium nitrate are distributed, and the nutrient levels are evaluated. We used synthetic data and an IoT-based system to control these nutrients in a closed-loop setup. Our recommendations significantly improved fish and plant growth over a 25-day cycle of growing water spinach compared to unregulated fish pond environments. The water spinach also grew larger than that in traditional aquaponic environments. The IoT configuration reduced nutrient parameters by more than 75% in larger fish ponds. In future research, we can establish a cloud-based database to store data and host our Machine Learning model. We can increase the dimensions of the sensing and actuation modules, allowing us to control a broader spectrum of nutrients and heavy metals and creating a comprehensive system for commercial setups. We can gather additional data from different geographical locations with diverse weather conditions to strengthen our model.
Further studies should explore the effectiveness of IoT strategies in providing nutrients in challenging weather and management situations. Recommendations for further work on controlling calcium and ammonium concentration using IoT include the development of more advanced sensors, the use of intelligent algorithms for data analysis, and integration with existing agricultural management systems. The ongoing advancement of this technology aims to enhance efficiency and productivity in delivering plant nutrition, particularly in adverse weather conditions and challenging management scenarios.

3) Field Testing and Validation: Collaborating with industries opens up opportunities for field testing and validating the research findings in real-world environments. Industries can provide access to water treatment facilities or relevant industrial sites to test the effectiveness and reliability of these solutions.
4) Knowledge Dissemination: Through collaboration with industries, the knowledge and research findings can be more easily disseminated to practitioners and stakeholders in related industries. The industries can help communicate the benefits and significance of IoT for controlling calcium and ammonium concentrations to other industries.
This research aims to control calcium and ammonium concentrations using IoT technology, focusing on sustainability and developing a model. The following details provide more information on the sustainability plan and model development:
1) Cloud-based Data Storage: Utilize cloud-based storage to store data collected from sensors. This ensures data accessibility and scalability and reduces the need for physical storage infrastructure.
2) Expansion of Sensing and Actuation Modules: Expand the sensing and actuation modules to enhance the accuracy and efficiency of calcium and ammonium concentration control. This includes integrating advanced sensors and actuators to improve data collection and control mechanisms.

International Journal of Artificial Intelligence & Robotics (IJAIR), E-ISSN: 2686-6269, Vol. 5, No. 2, 2023, pp. 81-90