AMALGAMATION OF DATA MINING AND IMAGE PROCESSING TECHNIQUES IN IMAGE RETRIEVAL

K. Kamala Kannan1, Dr. G. Anandharaj2

1 Assistant Professor, PG & Research Department of Computer Science, Sun Arts and Science College, Vettavalam Road, Keeranoor Village, Rajapalayam Post, Tiruvannamalai - 606 755.
2 Assistant Professor & Head, PG & Research Department of Computer Science, Adhiparasakthi College of Arts and Science (Autonomous), G.B. Nagar, Kalavai - 632 506.

Abstract: Image mining is a recent technique in the field of image processing. It is the extraction of latent data, the amalgamation of image data, and the discovery of additional patterns that are not directly visible in images. It draws on image processing, data mining and related fields, and it can generate all the significant patterns without any prior knowledge of them. This paper focuses on extracting knowledge from a huge database. Information is conveyed through direct or indirect techniques, which include neural networks, clustering, correlation and association. The paper explains how data mining is used in telecommunication, fraud detection, manufacturing, marketing and the education sector. With this approach, the size, texture and dominant colour of an image can be used as features. The Gray Level Co-occurrence Matrix (GLCM) feature characterises the texture of an image, and the texture and colour features are normalized. Data mining is also applied to earthquake prediction, bioinformatics and the analysis of agricultural fields. In cloud computing it is used to retrieve meaningful information from data storage.
Keywords: Data Mining, Feature Extraction, Image Retrieval, Clustering, Knowledge Discovery in Databases, Gray Level Co-occurrence Matrix, Cloud Computing.

I. INTRODUCTION

DATA MINING

In the real world, massive amounts of data are found in education, industry, medicine and many other fields. These data can provide knowledge and information for decision making. For instance, we can detect students who are likely to drop out of a college or university, or discover sales details in shopping databases.
These data can be analysed, summarized or interpreted to meet these challenges [1]. Data mining is a central idea in data analysis: it discovers interesting patterns in large data sets and helps in understanding the data stored in various sources such as data warehouses, the World Wide Web and external sources. The patterns sought are previously unknown, valid and potentially useful. Data mining is a kind of sorting technique used to extract hidden patterns. Its goals include fast retrieval of data or information. It helps us identify hidden patterns and reduce the level of complexity, and it also saves time [2]. Data mining is sometimes treated as Knowledge Discovery in Databases (KDD) [3]. The KDD process consists of the steps shown below.

Fig. 1. Knowledge Data Mining

· Selection: Select data from the various sources on which the operation is to be performed.
· Preprocessing: Also known as data cleaning; remove the unwanted data.
· Transformation: Transform or consolidate the data into a new format for processing.
· Data mining: Identify the desired result.
· Interpretation/Evaluation: Interpret the result or query to give meaningful information.
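The five KDD steps listed above can be sketched as a chain of small functions. This is only an illustrative sketch: the student records, the 60% attendance threshold and all function names are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical raw records: (student id, attendance %, grade).
# None marks a missing value. These values are invented for illustration.
RAW_RECORDS = [
    ("s1", 92, "A"), ("s2", 40, "F"), ("s3", None, "B"),
    ("s4", 55, "F"), ("s5", 88, "B"), ("s6", 35, "F"),
]


def select(records):
    """Selection: keep only the fields needed for the analysis."""
    return [(attendance, grade) for _, attendance, grade in records]


def preprocess(rows):
    """Preprocessing (data cleaning): drop rows with missing values."""
    return [row for row in rows if None not in row]


def transform(rows):
    """Transformation: recode attendance into a coarse category.

    The 60% threshold is an assumption made for this sketch.
    """
    return [("low" if att < 60 else "high", grade) for att, grade in rows]


def mine(rows):
    """Data mining: count how often each (attendance, grade) pair occurs."""
    return Counter(rows)


def interpret(pattern_counts):
    """Interpretation/Evaluation: report the most frequent pattern."""
    return pattern_counts.most_common(1)[0]


def kdd_pipeline(records):
    """Run the five steps in order."""
    return interpret(mine(transform(preprocess(select(records)))))


most_frequent_pattern = kdd_pipeline(RAW_RECORDS)
```

On this toy data the most frequent pattern pairs low attendance with a failing grade, which is the kind of dropout indicator the introduction mentions.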
Various algorithms and techniques, such as classification, clustering, regression, artificial intelligence, neural networks, association rules, decision trees, genetic algorithms and the nearest neighbor method, are used for knowledge discovery from databases [5]. The main objective of this paper is to study data mining. The rest of the paper is organized as follows: Section 2 discusses data mining models and techniques, Section 3 explores the applications of data mining, and Section 4 concludes the paper.

IMAGE MINING

Image mining searches for and discovers valid, hidden data in large collections. The above figure (Fig. 1) shows the different processes of an image mining system. Several other methods are also used to gather knowledge, namely image retrieval, data mining, image processing and artificial intelligence. These methods allow image mining to follow two different approaches. The first is to extract knowledge from databases or images; the second is to mine alphanumeric data together with images. Here feature extraction reduces dimensionality. If the input data is too large to process and is suspected to be highly redundant, it is transformed into a reduced set of features.
This reduces the amount of resources needed to describe a large data set accurately. Many features are used in image retrieval systems; the best known are color, shape and texture features.

Fig. 2. Image Mining Process

II. FEATURE EXTRACTION

Feature extraction is generally a major problem in object detection, but the Genetic Algorithm (GA) offers a simple, general and powerful framework for selecting better sets of features.
This leads to lower detection error rates. Zehang Sun et al. [13] argue for carrying out feature extraction using Principal Component Analysis (PCA) and classification using Support Vector Machines (SVMs). The GA can then remove the unwanted features. The method is applied to two difficult object detection problems: vehicle detection and face detection. Performance improves in both systems when SVMs are used for classification. According to Patricia G. Foschi [10], feature selection and extraction is the pre-processing step of image mining, and it is a critical step. Mining from images means extracting patterns and deriving knowledge from the images, with the aim of identifying the best ones. Brown, Ross A. et al. [3] discuss the requirements of digital image forensics that underpin the design of their mining system, which can be trained with hierarchical SVMs to detect objects. Image mining generally deals with the study and development of new technologies. Its purpose is not only to retrieve relevant images but also to discover patterns in images. Fernandez, J. et al. [4] show how a natural source of parallelism can be exploited to reduce the cost of mining. The images from the database are first pre-processed to improve their quality. They then undergo several transformations to generate the important features. With the help of the generated features, mining can be carried out using data mining techniques to discover important patterns.

A. Color Features

Image mining has unique characteristics owing to the richness of the data that an image can show. Evaluating its results requires performance parameters. Aura Conci et al. [2] present an evaluation scheme for comparison by colour.
Experiments with colour affinity mining, using quantization of the colour space and measures of similarity, illustrate the scheme. Lukasz Kobylinski and Krzysztof Walczak [9] proposed a fast and effective method of indexing image metadatabases. The index is built from colour characteristics using the Binary Thresholded Histogram (BTH), a colour feature description method. The BTH proved to be a sufficient method for representing the characteristics of image databases. Ji Zhang, Wynne Hsu and Mong Li Lee [8] proposed an effective information-driven framework for image mining. They distinguish four levels of information: Pixel Level, Object Level, Semantic Concept Level, and Pattern and Knowledge Level.

B. Texture Feature

Human perception of images is based on colour histogram and texture. Human neurons hold on the order of 10^12 items of information; the human brain perceives everything through sensory organs such as the eye, which transfers the image to the brain, where it is interpreted.
According to Rajshree S. Dubey et al. [12], image mining is based on the colour histogram and the texture of the image. Janani M. and Dr. Manicka Chezian R. [7] state that image mining is a pivotal method for mining knowledge from images and that it is based on the content-based image retrieval system. Colour, texture, pattern and the shape of objects form the basis of visual content.

C. Shape Feature

Peter Stanchev [11] proposes a new method for converting low-level colour, shape and texture features into high-level semantic features with the help of an image mining method. Johannes Itten's theory is used to obtain high-level shape features. Harini D. N. D. and Dr. Lalitha Bhaskari D. [5] argue that image retrieval amounts to moving from the low-level pixel representation to the recognition of high-level image objects and their relationships [22, 23].

Fig. 3. Content Based Image Retrieval System Architecture

III. METHODOLOGY

The gray-level co-occurrence matrix (GLCM) considers the spatial relationship of pixels. It records how often pairs of pixels with specific values and in a specified spatial relationship occur in an image.

Understanding a Gray-Level Co-Occurrence Matrix

We use the graycomatrix function to create a GLCM. It creates the GLCM by calculating how often a pixel with intensity (gray-level) value i occurs in a specified spatial relationship to a pixel with value j. By default, this relationship is that of two horizontally adjacent pixels. Each element (i,j) of the GLCM is the number of times that a pixel with value i occurred in the specified spatial relationship to a pixel with value j in the input image. graycomatrix uses scaling to reduce the number of intensity values.
The NumLevels and GrayLimits parameters control this scaling of gray levels. The following figure shows how graycomatrix calculates the first three values in a GLCM. In the output GLCM, element (1,1) contains the value 1 because there is only one instance in the input image where two horizontally adjacent pixels have the values 1 and 1, respectively. Element (1,2) contains the value 2 because there are two instances where two horizontally adjacent pixels have the values 1 and 2. Element (1,3) has the value 0 because there are no instances of two horizontally adjacent pixels with the values 1 and 3. graycomatrix continues processing the input image, scanning it for other pixel pairs (i,j) and recording the sums in the corresponding elements of the GLCM.
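The counting described above can be sketched in a few lines of plain Python. This is not the graycomatrix implementation: it skips the gray-level scaling step, and the 4 x 5 input image is an assumed example chosen only so that the first three GLCM values match those quoted in the text (1, 2 and 0). The function and variable names are hypothetical.

```python
def glcm(image, num_levels, offset=(0, 1)):
    """Count gray-level co-occurrences for one (row, column) offset.

    Gray levels are assumed to be integers 1..num_levels, so element
    (i, j) in the text is stored at counts[i - 1][j - 1]. The default
    offset (0, 1) pairs each pixel with its right-hand neighbour.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * num_levels for _ in range(num_levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c] - 1][image[r2][c2] - 1] += 1
    return counts


# Assumed example input (not taken from the paper's figure).
EXAMPLE_IMAGE = [
    [1, 1, 5, 6, 8],
    [2, 3, 5, 7, 1],
    [4, 5, 7, 1, 2],
    [8, 5, 1, 2, 5],
]

horizontal_glcm = glcm(EXAMPLE_IMAGE, num_levels=8)
first_three = horizontal_glcm[0][:3]  # elements (1,1), (1,2), (1,3)
```

For this input, `first_three` is `[1, 2, 0]`, in agreement with the values discussed above. The matrix is not symmetric here, because each pair is counted in one direction only.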
Fig. 4. Process used to create the GLCM

Specify Offset Used in GLCM Calculation

By default, graycomatrix creates a single GLCM whose offset is two horizontally adjacent pixels. A single GLCM might not be adequate to describe the texture features of the input image; for example, a single horizontal offset might not be sensitive to texture with a vertical orientation. For this reason, graycomatrix can create multiple GLCMs for a single input image. To do so, an array of offsets is passed to the graycomatrix function. These offsets define pixel relationships of varying direction and distance, for example four directions (horizontal, vertical and two diagonals) and four distances.
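A minimal sketch of the multi-offset idea follows, assuming offsets are written as (row, column) displacements and that the four directions are 0, 45, 90 and 135 degrees. The sample image and all names are hypothetical, and the sketch again omits gray-level scaling.

```python
def glcm(image, num_levels, offset):
    """Count gray-level co-occurrences for one (row, column) offset.

    Gray levels are assumed to be integers 1..num_levels.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * num_levels for _ in range(num_levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c] - 1][image[r2][c2] - 1] += 1
    return counts


def direction_offsets(distances=(1, 2, 3, 4)):
    """Build offsets for four directions at each distance d:
    horizontal (0, d), diagonal (-d, d), vertical (-d, 0), diagonal (-d, -d).
    """
    offsets = []
    for d in distances:
        offsets.extend([(0, d), (-d, d), (-d, 0), (-d, -d)])
    return offsets


# Assumed example input (not taken from the paper).
SAMPLE_IMAGE = [
    [1, 1, 5, 6, 8],
    [2, 3, 5, 7, 1],
    [4, 5, 7, 1, 2],
    [8, 5, 1, 2, 5],
]

# One GLCM per offset, keyed by the offset itself.
glcms = {
    offset: glcm(SAMPLE_IMAGE, num_levels=8, offset=offset)
    for offset in direction_offsets()
}
```

Four directions at four distances give sixteen matrices. On an image this small the larger distances leave few or no valid pixel pairs (the vertical offset at distance 4 yields an all-zero matrix), so real images would need to be considerably larger than the chosen distances.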
In this way, the input image is represented by 16 GLCMs. When statistics are calculated from these GLCMs, they can be averaged.

Weighted Euclidean Distance

The standardized Euclidean distance between two J-dimensional vectors can be written as:

$$d_{x,y} = \sqrt{\sum_{j=1}^{J} \left( \frac{x_j}{s_j} - \frac{y_j}{s_j} \right)^2} \qquad (1.1)$$

where $s_j$ is the sample standard deviation of the j-th variable. Notice that we need not subtract the j-th mean from $x_j$ and $y_j$ because the means just cancel out in the differencing. Now (1.1) can be rewritten in the following equivalent way:

$$d_{x,y} = \sqrt{\sum_{j=1}^{J} \frac{1}{s_j^2} (x_j - y_j)^2} = \sqrt{\sum_{j=1}^{J} w_j (x_j - y_j)^2}$$

where $w_j = 1/s_j^2$ is the inverse of the j-th variance. $w_j$ can be regarded as a weight attached to the j-th variable.

IV. DATA MINING TECHNIQUES

Data mining gathers relevant information from unorganized data.
It thus helps to achieve specific objectives. Its aim is to create either a descriptive model or a predictive model. A descriptive model presents the main characteristics of the data, while a predictive model allows the data miner to predict an unknown (often future) value of a specific target variable [7]. These goals are pursued using a variety of data mining techniques, as shown in Fig. 5 [8].

Fig. 5. Data Mining Models

3.1 Classification: Classification deals with discrete, unordered values and is driven by the desired output. It classifies the data based on the training set and its values. This is achieved using decision trees, neural networks and classification rules (If-Then). For instance, we can apply such a rule to the past records of students who left the university and evaluate them. This helps us identify the performance of the students.

3.2 Regression: Regression is used to map a data item to a real-valued prediction variable [8]. It can also be used for prediction. Here the target values are known; for example, we can predict child behavior based on family history.

3.3 Time Series Analysis: This process uses statistical techniques to model and explain a time-dependent series of data points. It is a method of using a model to generate predictions (forecasts) of future events based on known past events [9]. The stock market is a good example.
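The regression and time-series ideas above can be combined in a small sketch: fit a straight line to past observations by ordinary least squares and extrapolate it one step ahead. The price series and function names are hypothetical, and a linear trend is only one of many possible forecasting models; the paper does not prescribe a particular one.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next(series):
    """Forecast the next value of a series from its fitted linear trend.

    Time steps are assumed to be equally spaced and numbered 0, 1, 2, ...
    """
    time_steps = list(range(len(series)))
    slope, intercept = fit_line(time_steps, series)
    return slope * len(series) + intercept


# Hypothetical closing prices for four consecutive days.
PRICES = [10.0, 12.0, 14.0, 16.0]
next_price = forecast_next(PRICES)
```

Because the toy series rises by exactly 2 per step, the fitted slope is 2 and the forecast for the fifth day is 18. Real stock data would not follow a straight line this closely, so such a forecast should be read as a rough trend estimate only.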
3.4 Prediction: This technique discovers the relationships among independent variables and between dependent and independent variables [4]. The model is based on continuous or ordered values.

3.5 Clustering: A cluster is a collection of similar data objects; objects in other clusters are dissimilar. Clustering finds similarities among data with the same qualities and is based on unsupervised learning. Examples include city planning, image processing and pattern recognition.

3.6 Summarization: Summarization is the abstraction of data, formed as a set of related tasks.
It provides an overview of the data. For example, a long-distance race can be summarized as a total time in minutes or seconds. Association rule mining is another well-known technique. It finds the most frequent item sets and discovers patterns in the relationships between items in the same transaction. It is also referred to as the “relation technique” because it relates sets of items [6].

3.7 Sequence Discovery: Sequence discovery discloses the relationships among data [8]. It deals with a set of objects, each associated with its own timeline of events. Natural disasters, the analysis of DNA sequences and scientific experiments are good examples.

V. DATA MINING APPLICATIONS

Data mining is applied to obtain fast access to data and valid information from a huge amount of data. Its main application areas include marketing, fraud detection, finance, telecommunication, the education sector and the medical field. Some of the main applications are described below.

4.1 Data Mining in Education Sector: Data mining is used in the newly emerging field called “Education Data Mining”. It helps in understanding student performance, dropouts, student behaviour and students' choice of courses. It is widely used in the higher education sector [10].

4.2 Data Mining in Banking and Finance: Data mining is used extensively in banking and the financial markets [11]. It is used to detect credit card fraud and to estimate risk, trends and profitability.
In financial markets, techniques such as neural networks are used in stock forecasting, price prediction and similar tasks.

4.3 Data Mining in Market Basket Analysis: These methodologies are based on shopping databases. Their goal is to find out which products customers purchase. The shop can use this information by making these products more visible and accessible to customers [12].

4.4 Data Mining in Earthquake Prediction: Data mining is used to predict earthquakes from satellite maps. An earthquake is the sudden movement of the Earth's crust caused by the abrupt release of stress along a geologic fault in the interior.
There are two types of prediction: forecasts (months to years in advance) and short-term predictions (hours or days in advance) [13].

4.5 Data Mining in Bioinformatics: Bioinformatics has produced a huge amount of biological data. It is a new field of inquiry that generates and integrates large quantities of proteomic, genomic and other data [4].

4.6 Data Mining in Telecommunication: This field has a large amount of data from a huge number of customers. The data therefore needs to be mined in order to limit fraud, improve marketing efforts and manage networks better [4].

4.7 Data Mining in Agriculture: Here data mining is mainly used to increase crop yields. It works with four parameters, namely year, rainfall, production and area of sowing, and the resulting predictions are used to improve yield.
Yield prediction can be supported by data mining techniques such as K-Means, K-nearest neighbor (KNN), artificial neural networks and support vector machines (SVM) [14, 20].

4.8 Data Mining in Cloud Computing: Data mining techniques are also used in cloud computing, where they allow users to retrieve correct information from the data warehouse. This lowers the cost of infrastructure and storage [15, 21]. Cloud computing uses internet services that rely on clouds of servers to manage tasks.
These techniques enable cloud computing to provide efficient, reliable and secure services to its users.

VI. CONCLUSION

This article presents the development of image mining and surveys the image techniques examined in earlier work. The review identifies the challenges and responsibilities associated with different prospects. It focuses mainly on data mining techniques used in various projects, whose aim is to obtain information from current data. People from different fields can make use of association, clustering, prediction and classification techniques.
REFERENCES

1. Janani M. and Dr. Manicka Chezian R., “A Survey On Content Based Image Retrieval System”, International Journal of Advanced Research in Computer Engineering & Technology, Volume 1, Issue 5, pp 266, July 2012.
2. Aboli W. Hole and Prabhakar L. Ramteke, “Design and Implementation of Content Based Image Retrieval Using Data Mining and Image Processing Techniques”, International Journal of Advance Research in Computer Science and Management Studies, Volume 3, Issue 3, pp 219-224, March 2015.
3. Anil K. Jain and Aditya Vailaya, “Image Retrieval using color and shape”, In Second Asian Conference on Computer Vision, pp 5-8, 1995.
4. Harini D. N. D. and Dr. Lalitha Bhaskari D., “Image Mining Issues and Methods Related to Image Retrieval System”, International Journal of Advanced Research in Computer Science, Volume 2, No. 4, 2011.
5. Hiremath P. S. and Jagadeesh Pujari, “Content Based Image Retrieval based on Color, Texture and Shape features using Image and its complement”, International Journal of Computer Science and Security, Volume (1) : Issue (4).
6. Brown, Ross A., Pham, Binh L., and De Vel, Olivier Y., “Design of a Digital Forensics Image Mining System”, in Knowledge Based Intelligent Information and Engineering Systems, pp 395-404, Springer Berlin Heidelberg, 2005.
7. Rajshree S. Dubey, Niket Bhargava and Rajnish Choubey, “Image Mining using Content Based Image Retrieval System”, International Journal on Computer Science and Engineering, Vol. 02, No. 07, pp 2353-2356, 2010.
8. Aura Conci, Everest Mathias M. M. Castro, “Image mining by Color Content”, In Proceedings of 2001 ACM International Conference on Software Engineering and Knowledge Engineering (SEKE), Buenos Aires, Argentina, Jun 13-15, 2001.
9. Er. Rimmy Chuchra, “Use of Data Mining Techniques for the Evaluation of Student Performance: A Case Study”, International Journal of Computer Science and Management Research, Vol. 01, Issue 03, October 2012.
10. Ji Zhang, Wynne Hsu and Mong Li Lee, “An Information-Driven Framework for Image Mining”, Database and Expert Systems Applications in Computer Science, pp 232-242, Springer Berlin Heidelberg, 2001.
11. Lior Rokach and Oded Maimon, “Data Mining with Decision Trees: Theory and Applications (Series in Machine Perception and Artificial Intelligence)”, ISBN: 981-2771-719, World Scientific Publishing Company, 2008.
12. Venkatadri M. and Lokanatha C. Reddy, “A comparative study on decision tree classification algorithm in data mining”, International Journal Of Computer Applications In Engineering, Technology And Sciences (IJCAETS), Vol. 2, no. 2, pp 24-29, Sept 2010.
13. Xingquan Zhu, Ian Davidson, “Knowledge Discovery and Data Mining: Challenges and Realities”, ISBN 978-1-59904-252, Hershey, New York, 2007.
14. Zhao, Kaidi and Liu, Bing, Tirpark, Thomas M. and Weimin, Xiao, “A Visual Data Mining Framework for Convenient Identification of Useful Knowledge”, ICDM ’05 Proceedings of the Fifth IEEE International Conference on Data Mining, vol. 1, no. 1, pp 530-537, Dec 2005.
15. Li Lin, Longbing Cao, Jiaqi Wang, Chengqi Zhang, “The Applications of Genetic Algorithms in Stock Market Data Mining Optimisation”, Proceedings of Fifth International Conference on Data Mining, Text Mining and their Business Applications, pp 593-604, Sept 2005.
16. V. Gudivada and V. Raghavan, “Content-based image retrieval systems”, IEEE Computer, 28(9):18-22, September 1995.
17. J. Han and M. Kamber, “Data Mining, Concepts and Techniques”, Morgan Kaufmann, 2000.
18. Nikita Jain, Vishal Srivastava, “Data Mining Techniques: A Survey Paper”, IJRET: International Journal of Research in Engineering and Technology, Volume 02, Issue 11, Nov 2013.
19. Peter Stanchev, “Image Mining for Image Retrieval”, In Proceedings of the IASTED Conference on Computer Science and Technology, pp 214-218, 2003.
20. Jeyakumar, Balajee, M. A. Saleem Durai, and Daphne Lopez, “Case Studies in Amalgamation of Deep Learning and Big Data”, HCI Challenges and Privacy Preservation in Big Data Security, pp 159, 2017.
21. Ranjith, D., J. Balajee, and C. Kumar, “In premises of cloud computing and models”, International Journal of Pharmacy and Technology 8, no. 3, pp 4685-4695, 2016.
22. Kamalakannan, S. G., Balajee, J., Srinivasa Raghavan, “Superior content-based video retrieval system according to query image”, International Journal of Applied Engineering Research 10, no. 3, pp 7951-7957, 2015.
23. Rajeshwari, A., T. C. Prathna, J. Balajee, N. Chandrasekaran, A. B. Mandal, and A. Mukherjee, “Computational approach for particle size measurement of silver nanoparticle from electron microscopic image”, Int. J. Pharm. Pharm. Sci. 5, no. 2, pp 619, 2013.