The search strategy, or enumeration algorithm, is the most important component of a query optimizer. In this section, we survey research on query optimization techniques, organized by the optimization algorithms used for distributed database queries.
SDD-1 was one of the earliest distributed database systems. It was designed for slow wide-area networks and used semijoins to reduce communication cost, producing static query plans without considering horizontal or vertical fragmentation of the distributed data. Another distributed database system with its own distributed (DDB) query optimizer was R*.
It also generated static query plans, but for faster networks, and it neither employed semijoins nor handled horizontal or vertical fragmentation. A third system, Distributed INGRES, generated dynamic query execution plans (QEPs) at run time on faster communication networks. It did not use semijoins to reduce query size, but it could handle horizontal fragmentation without replication.
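To make the semijoin idea behind SDD-1 concrete, here is a minimal sketch under simplifying assumptions: relations are lists of dictionaries at two sites, every tuple has the same width, and cost is counted only as bytes shipped between sites. The function names `semijoin` and `transfer_costs` and all sizes are hypothetical. Instead of shipping all of R to S's site, the optimizer ships the projection of S on the join attribute to R's site, reduces R there, and ships only the surviving tuples.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def semijoin(r: List[Row], s: List[Row], attr: str) -> List[Row]:
    """R semijoin S on attr: keep the rows of r whose attr value occurs in s."""
    keys = {row[attr] for row in s}  # projection of S on the join attribute
    return [row for row in r if row[attr] in keys]


def transfer_costs(r: List[Row], s: List[Row], attr: str,
                   tuple_width: int, key_width: int) -> Tuple[int, int]:
    """Bytes shipped to evaluate R join S at S's site, without and with a semijoin.

    tuple_width and key_width are assumed fixed sizes in bytes (hypothetical).
    """
    direct = len(r) * tuple_width  # ship all of R to S's site
    keys = {row[attr] for row in s}
    reduced = semijoin(r, s, attr)
    # ship pi_attr(S) to R's site, then ship only the reduced R back
    via_semijoin = len(keys) * key_width + len(reduced) * tuple_width
    return direct, via_semijoin


if __name__ == "__main__":
    r = [{"id": i, "dept": i % 5} for i in range(100)]        # stored at site 1
    s = [{"dept": 1, "name": "a"}, {"dept": 2, "name": "b"}]  # stored at site 2
    print(transfer_costs(r, s, "dept", tuple_width=100, key_width=4))
```

For the sample data this prints `(10000, 4008)`: the semijoin pays 8 bytes for the key projection and avoids shipping 60 of the 100 tuples. It only pays off when the key projection is small and the reduction is large, which is why later work concentrates on choosing beneficial semijoins.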
A significant amount of work has been carried out on finding optimal join orders for queries. Randomized strategies reduce the cost of query optimization, but they carry a constant space overhead and are slower than heuristic strategies. Deterministic strategies, on the other hand, generate dynamic solutions at run time, but their time and space complexity grows exponentially as the number of relations in the distributed query increases. Another approach is the Two-Phase Optimization algorithm, which combines Iterative Improvement and Simulated Annealing.
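The Two-Phase Optimization idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: it uses a hypothetical toy cost model (the sum of intermediate result sizes of a left-deep join order, computed from made-up cardinalities and pairwise join selectivities), a "swap two relations" neighbourhood, and illustrative parameters. Phase 1 runs Iterative Improvement from several random join orders; Phase 2 runs Simulated Annealing from the best local minimum found, starting at a low temperature (here, as an assumption, 10% of that minimum's cost).

```python
import math
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Order = List[str]
Cost = Callable[[Order], float]


def plan_cost(order: Order, card: Dict[str, float],
              sel: Dict[FrozenSet[str], float]) -> float:
    """Hypothetical toy cost: sum of intermediate result sizes (left-deep order)."""
    size, cost = float(card[order[0]]), 0.0
    for i in range(1, len(order)):
        factor = 1.0
        for prev in order[:i]:  # join predicates linking order[i] to earlier relations
            factor *= sel.get(frozenset((prev, order[i])), 1.0)
        size *= card[order[i]] * factor
        cost += size
    return cost


def random_swap(order: Order, rng: random.Random) -> Order:
    """Neighbour move: exchange the positions of two relations."""
    i, j = rng.sample(range(len(order)), 2)
    nxt = order[:]
    nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def iterative_improvement(start: Order, cost: Cost, rng: random.Random,
                          max_fail: int = 50) -> Tuple[Order, float]:
    """Phase 1: accept only improving moves; stop after max_fail misses in a row."""
    cur, c, fails = start, cost(start), 0
    while fails < max_fail:
        cand = random_swap(cur, rng)
        cc = cost(cand)
        if cc < c:
            cur, c, fails = cand, cc, 0
        else:
            fails += 1
    return cur, c


def simulated_annealing(start: Order, cost: Cost, rng: random.Random, t0: float,
                        alpha: float = 0.9, steps: int = 30) -> Tuple[Order, float]:
    """Phase 2: also accept worse moves with probability exp(-(cc - c) / T)."""
    cur, c = start, cost(start)
    best, best_c = cur, c
    t, t_min = t0, t0 * 1e-3
    while t > t_min:
        for _ in range(steps):
            cand = random_swap(cur, rng)
            cc = cost(cand)
            if cc < c or rng.random() < math.exp((c - cc) / t):
                cur, c = cand, cc
                if c < best_c:
                    best, best_c = cur, c
        t *= alpha  # geometric cooling schedule
    return best, best_c


def two_phase(relations: Order, cost: Cost, seed: int = 0,
              restarts: int = 5) -> Tuple[Order, float]:
    rng = random.Random(seed)
    best, best_c = relations, math.inf
    for _ in range(restarts):  # Phase 1 from several random join orders
        order, c = iterative_improvement(rng.sample(relations, len(relations)), cost, rng)
        if c < best_c:
            best, best_c = order, c
    # Phase 2 from the best local minimum, at a low initial temperature
    return simulated_annealing(best, cost, rng, t0=0.1 * best_c)


if __name__ == "__main__":
    card = {"A": 1000, "B": 100, "C": 10, "D": 500}  # made-up cardinalities
    sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.02}
    print(two_phase(list(card), lambda o: plan_cost(o, card, sel)))
```

Because both phases are randomized, the result depends on the seed and is not guaranteed to be optimal; for small queries it can be checked against exhaustive search over all join orders.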
The Two-Phase algorithm performs a random walk over the solutions in the search space; it produces high-quality solutions but increases the space overhead of query optimization. A dynamic-programming-based procedure was also proposed that minimizes the sum of communication cost and local processing cost by optimally determining the join order, the join methods (e.g., nested-loop or sort-merge), and the join sites; however, it assumed that data are stored non-redundantly. Chen and Yu proposed a heuristic algorithm that determines the join order and join sites under the assumption that, when multiple copies of a file exist, the copies to be used are pre-selected. Much research has focused on the reduction phase of distributed query processing, whose objective is to find a minimum-cost semijoin sequence that fully reduces the files referenced by the query. This is achieved by applying the query's join predicates early in the plan to shrink intermediate results, which in turn reduces the cost of the operation.
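The notion of a beneficial semijoin can be illustrated with the classic benefit/cost test: a semijoin R ⋉ S is worth running when the bytes of R it is expected to eliminate exceed the bytes needed to ship the projection of S on the join attribute. The sketch below assumes uniform, independent value distributions (so the surviving fraction of R is estimated as the number of distinct join values in S divided by the domain size), a fixed key width, and hypothetical statistics. It only ranks candidate semijoins; it is not the dynamic-programming method of [18] or the concepts of [19].

```python
from typing import Dict, List, Tuple


def beneficial_semijoins(stats: Dict[str, dict], edges: List[Tuple[str, str, str]],
                         domain: Dict[str, int],
                         key_width: int = 4) -> List[Tuple[str, str, str, float]]:
    """Return (reduced R, reducer S, attr, gain) for every semijoin R ⋉ S whose
    estimated benefit exceeds its cost, most profitable first.

    stats[R] = {"size": bytes, "distinct": {attr: number of distinct values}}
    edges    = [(R, S, attr), ...] for join predicates R.attr = S.attr
    domain   = {attr: domain size}; key_width = assumed bytes per join value.
    """
    found = []
    for a, b, attr in edges:
        for r, s in ((a, b), (b, a)):  # a semijoin can reduce either side
            distinct_s = stats[s]["distinct"][attr]
            selectivity = distinct_s / domain[attr]  # uniformity assumption
            benefit = stats[r]["size"] * (1 - selectivity)  # bytes of R eliminated
            cost = distinct_s * key_width  # bytes to ship pi_attr(S) to R's site
            if benefit > cost:
                found.append((r, s, attr, benefit - cost))
    return sorted(found, key=lambda t: t[3], reverse=True)


if __name__ == "__main__":
    stats = {  # hypothetical statistics
        "R1": {"size": 100_000, "distinct": {"k": 800}},
        "R2": {"size": 2_000, "distinct": {"k": 50, "m": 10}},
        "R3": {"size": 30_000, "distinct": {"m": 90}},
    }
    edges = [("R1", "R2", "k"), ("R2", "R3", "m")]
    for r, s, attr, gain in beneficial_semijoins(stats, edges, {"k": 1000, "m": 100}):
        print(f"{r} semijoin {s} on {attr}: estimated gain {gain:.0f} bytes")
```

On these statistics only R1 ⋉ R2 and R3 ⋉ R2 pass the test. The reverse semijoins fail because the reducing relation has many distinct join values, so shipping them costs more than the little of R2 they would eliminate.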
Researchers also derived an algorithm based on dynamic programming that identified beneficial semijoins and determined the join order and join sites [18]. Two new concepts for the reduction phase of distributed databases, gainful semijoins and pure join attributes, were also proposed [19]. Mi Xifieng and Fan designed a new algorithm.
It is based on commonly used optimization algorithms, significantly reduces the amount of intermediate data and the network communication cost, and improves optimization efficiency [20]. Because of its large space complexity and complicated objective functions, dynamic programming is not a feasible approach for optimizing distributed queries. New techniques are therefore being explored for distributed queries, namely Genetic Algorithms, Ant Colony Optimization, and hybrids of various evolutionary algorithms.
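To see why dynamic programming runs into space problems, consider a minimal Selinger-style sketch that finds the cheapest left-deep join order exactly. It keeps one table entry for every subset of relations, so its memory grows as 2^n for n relations. As a simplification, it uses a hypothetical toy cost (the sum of intermediate result sizes, from made-up cardinalities and selectivities) and ignores join methods, join sites, and replication, which the distributed procedures discussed above also have to decide.

```python
from itertools import combinations, permutations
from typing import Dict, FrozenSet, List, Tuple


def plan_cost(order: List[str], card: Dict[str, float],
              sel: Dict[FrozenSet[str], float]) -> float:
    """Hypothetical toy cost: sum of intermediate result sizes (left-deep order)."""
    size, cost = float(card[order[0]]), 0.0
    for i in range(1, len(order)):
        factor = 1.0
        for prev in order[:i]:
            factor *= sel.get(frozenset((prev, order[i])), 1.0)
        size *= card[order[i]] * factor
        cost += size
    return cost


def dp_join_order(card: Dict[str, float],
                  sel: Dict[FrozenSet[str], float]) -> Tuple[List[str], float, int]:
    """Exact left-deep join ordering by DP over subsets.

    Returns (order, cost, number of DP table entries).
    """
    rels = sorted(card)

    def joined_size(size: float, left: FrozenSet[str], rel: str) -> float:
        factor = 1.0
        for p in left:  # every predicate between rel and the already-joined set
            factor *= sel.get(frozenset((p, rel)), 1.0)
        return size * card[rel] * factor

    # best[S] = (cost, result size, order) of the cheapest plan that joins set S
    best = {frozenset([r]): (0.0, float(card[r]), (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            s = frozenset(combo)
            for last in combo:  # try every relation as the one joined last
                c, size, order = best[s - {last}]
                new_size = joined_size(size, s - {last}, last)
                if s not in best or c + new_size < best[s][0]:
                    best[s] = (c + new_size, new_size, order + (last,))
    cost, _, order = best[frozenset(rels)]
    return list(order), cost, len(best)


if __name__ == "__main__":
    card = {"A": 1000, "B": 100, "C": 10, "D": 500}  # made-up cardinalities
    sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.02}
    order, cost, entries = dp_join_order(card, sel)
    brute = min(plan_cost(list(p), card, sel) for p in permutations(card))
    print(order, cost, entries, brute)
```

With four relations the table already holds 2^4 - 1 = 15 entries; with 20 relations it would hold over a million, before join sites and join methods multiply the state space further.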
Genetic Algorithms (GAs) are a family of computational models inspired by biological evolution. John Holland introduced the concept of the GA, in which randomly generated solutions to a problem are encoded as chromosomes and evaluated with a fitness function; these chromosomes then produce a new set of individuals (children) with better characteristics through crossover and mutation operators [23]. GAs have also been able to reduce the cost of distributed query trees. The Ant Colony Optimization (ACO) algorithm is a novel meta-heuristic suited to combinatorial optimization problems such as query optimization in distributed databases, because of characteristics such as intelligent search, global optimization, robustness, distributed computation, and the ability to be combined with other heuristics [26].
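Returning to the GA mechanism described above, a minimal sketch for join ordering looks like this. It is illustrative only and not the formulation of any cited work: each chromosome is assumed to be a permutation of relation names (a left-deep join order), fitness is a hypothetical toy cost (sum of intermediate result sizes, lower is fitter), and the operators are tournament selection, a simple variant of order crossover (OX), and swap mutation, with elitism keeping the two best orders.

```python
import random
from typing import Dict, FrozenSet, List, Tuple

Order = List[str]


def plan_cost(order: Order, card: Dict[str, float],
              sel: Dict[FrozenSet[str], float]) -> float:
    """Hypothetical toy cost: sum of intermediate result sizes (left-deep order)."""
    size, cost = float(card[order[0]]), 0.0
    for i in range(1, len(order)):
        factor = 1.0
        for prev in order[:i]:
            factor *= sel.get(frozenset((prev, order[i])), 1.0)
        size *= card[order[i]] * factor
        cost += size
    return cost


def order_crossover(p1: Order, p2: Order, rng: random.Random) -> Order:
    """Simple OX variant: keep a slice of p1, fill the rest in p2's relative order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    rest = [g for g in p2 if g not in p1[i:j + 1]]
    return rest[:i] + p1[i:j + 1] + rest[i:]


def swap_mutation(ind: Order, rng: random.Random, rate: float) -> Order:
    """With probability rate, exchange two genes (relations)."""
    ind = ind[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(ind)), 2)
        ind[a], ind[b] = ind[b], ind[a]
    return ind


def genetic_join_order(card: Dict[str, float], sel: Dict[FrozenSet[str], float],
                       pop_size: int = 30, generations: int = 50,
                       mutation_rate: float = 0.2, seed: int = 0) -> Tuple[Order, float]:
    rng = random.Random(seed)
    rels = sorted(card)

    def fitness(o: Order) -> float:  # lower cost means fitter
        return plan_cost(o, card, sel)

    pop = [rng.sample(rels, len(rels)) for _ in range(pop_size)]  # random chromosomes
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]  # elitism: the two best orders survive unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=fitness)
            nxt.append(swap_mutation(order_crossover(p1, p2, rng), rng, mutation_rate))
        pop = nxt
    best = min(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    card = {"A": 1000, "B": 100, "C": 10, "D": 500}  # made-up cardinalities
    sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.02}
    print(genetic_join_order(card, sel))
```

Real GA-based optimizers encode more than the join order (for example join sites or semijoin reducers, as in NGA), which changes the chromosome and the operators but not the overall loop.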
Three Italian scholars, M. Dorigo, A. Colorni, and V. Maniezzo, first proposed ACO in 1992. ACO is a bionic optimization algorithm inspired by ants that uses a probabilistic technique to solve computational problems. Because it is built on a positive-feedback mechanism, it is very robust, provides intelligent search, and can be used to find globally optimal solutions [27].
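The probabilistic, positive-feedback mechanism described above can be sketched for join ordering as a basic Ant System. All modelling choices here are hypothetical: each ant builds a join order one relation at a time; pheromone sits on the edge "relation b is joined directly after a" (with a virtual start node); the heuristic desirability of a relation is the inverse of its cardinality; the cost is the same toy sum of intermediate result sizes; and each ant deposits pheromone in proportion to best_cost / cost, a normalisation chosen here so deposits stay comparable to the initial pheromone. Cardinalities are assumed positive.

```python
import random
from typing import Dict, FrozenSet, List, Optional, Tuple

Order = List[str]
START = "<start>"  # virtual node that precedes the first relation


def plan_cost(order: Order, card: Dict[str, float],
              sel: Dict[FrozenSet[str], float]) -> float:
    """Hypothetical toy cost: sum of intermediate result sizes (left-deep order)."""
    size, cost = float(card[order[0]]), 0.0
    for i in range(1, len(order)):
        factor = 1.0
        for prev in order[:i]:
            factor *= sel.get(frozenset((prev, order[i])), 1.0)
        size *= card[order[i]] * factor
        cost += size
    return cost


def aco_join_order(card: Dict[str, float], sel: Dict[FrozenSet[str], float],
                   n_ants: int = 10, iterations: int = 30, alpha: float = 1.0,
                   beta: float = 1.0, rho: float = 0.1,
                   seed: int = 0) -> Tuple[Optional[Order], float]:
    rng = random.Random(seed)
    rels = sorted(card)
    # pheromone on the edge "relation b is joined directly after a"
    tau = {(a, b): 1.0 for a in [START] + rels for b in rels if a != b}
    eta = {r: 1.0 / card[r] for r in rels}  # heuristic: prefer small relations
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):  # each ant builds a complete join order
            prev, tour, left = START, [], list(rels)
            while left:
                weights = [tau[(prev, r)] ** alpha * eta[r] ** beta for r in left]
                nxt = rng.choices(left, weights=weights)[0]  # probabilistic choice
                tour.append(nxt)
                left.remove(nxt)
                prev = nxt
            tours.append((tour, plan_cost(tour, card, sel)))
        for tour, c in tours:
            if c < best_cost:
                best, best_cost = tour, c
        for key in tau:  # evaporation
            tau[key] *= 1.0 - rho
        for tour, c in tours:  # positive feedback: cheaper orders deposit more
            prev = START
            for r in tour:
                tau[(prev, r)] += best_cost / c  # normalised deposit in (0, 1]
                prev = r
    return best, best_cost


if __name__ == "__main__":
    card = {"A": 1000, "B": 100, "C": 10, "D": 500}  # made-up cardinalities
    sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.1, frozenset("CD"): 0.02}
    print(aco_join_order(card, sel))
```

Evaporation (`rho`) keeps early choices from locking in, while the deposit gives cheaper orders more influence on later ants. At the start all edges carry the same pheromone, so the search is close to random, which matches the slow early convergence discussed next.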
Although ACO has special characteristics such as distributed computation, robustness, and a positive-feedback mechanism, it also has some deficiencies:
a) There is no systematic way to build the initial solutions that ACO needs at start-up.
b) ACO converges slowly at the beginning, because there is only a small pheromone difference between paths at that time; the convergence speed then increases towards the optimum because of the positive-feedback mechanism.
In [14], the authors proposed a new GA-based query optimizer called NGA.
NGA improves a given QEP by considering the join order, join sites, and semijoin reducers. By using new values for the mutation and crossover parameters, NGA reduced both the local processing cost and the network communication cost, and it outperformed exhaustive search. The potential of GAs to optimize distributed queries was also studied for the problem of fully reducing all the tables in a tree-structured query [21]. Researchers also proposed an algorithm combining a Genetic Algorithm with Learning Automata [22] to generate optimal query execution plans on the basis of join order and join site selection in distributed databases. In [28], the authors proposed an algorithm combining GA and heuristics, used to solve the join ordering problem as a Travelling Salesman Problem in large-scale databases.
Computational experiments showed that it is also a viable solution for distributed systems. A hybrid of the Genetic Algorithm and the Best-Worst Ant Colony algorithm was employed to find the optimal query execution plan and join order, reducing query execution time for multi-join query optimization in databases [24]. The algorithm combines the positive-feedback mechanism of ACO with the global search ability of GA. Another hybrid of GA and ACO [30] was applied to the join ordering problem in databases (considering only nested-loop joins), overcoming the shortcomings of both algorithms.
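The two-stage idea of the GA-ACO hybrid [30], in which the GA first decides how pheromone is distributed and ACO then refines the solution, can be sketched as follows. This is a hypothetical illustration rather than the published algorithm: the toy cost model (sum of intermediate result sizes), the pheromone encoding, and all parameter values are assumptions; join methods are not modelled at all; and the ACO phase uses pheromone only, without a heuristic term.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Order = List[str]
START = "<start>"  # virtual node that precedes the first relation


def plan_cost(order: Order, card: Dict[str, float],
              sel: Dict[FrozenSet[str], float]) -> float:
    """Hypothetical toy cost: sum of intermediate result sizes (left-deep order)."""
    size, cost = float(card[order[0]]), 0.0
    for i in range(1, len(order)):
        factor = 1.0
        for prev in order[:i]:
            factor *= sel.get(frozenset((prev, order[i])), 1.0)
        size *= card[order[i]] * factor
        cost += size
    return cost


def order_crossover(p1: Order, p2: Order, rng: random.Random) -> Order:
    """Simple OX variant: keep a slice of p1, fill the rest in p2's relative order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    rest = [g for g in p2 if g not in p1[i:j + 1]]
    return rest[:i] + p1[i:j + 1] + rest[i:]


def ga_phase(rels: Order, cost: Callable[[Order], float], rng: random.Random,
             pop_size: int = 20, generations: int = 10) -> List[Order]:
    """Phase 1: a small GA (truncation selection, OX, swap mutation), best first."""
    pop = [rng.sample(rels, len(rels)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = order_crossover(p1, p2, rng)
            a, b = rng.sample(range(len(child)), 2)
            child[a], child[b] = child[b], child[a]  # swap mutation
            children.append(child)
        pop = parents + children
    return sorted(pop, key=cost)


def ga_aco_join_order(card: Dict[str, float], sel: Dict[FrozenSet[str], float],
                      seed: int = 0, top: int = 5, n_ants: int = 10,
                      iterations: int = 20, rho: float = 0.1) -> Tuple[Order, float]:
    rng = random.Random(seed)
    rels = sorted(card)

    def cost(o: Order) -> float:
        return plan_cost(o, card, sel)

    # Phase 1: the GA's best orders decide how the initial pheromone is distributed
    seeds = ga_phase(rels, cost, rng)[:top]
    tau = {(a, b): 0.1 for a in [START] + rels for b in rels if a != b}
    for order in seeds:
        prev = START
        for r in order:
            tau[(prev, r)] += 1.0
            prev = r
    best, best_cost = seeds[0], cost(seeds[0])
    # Phase 2: ACO refines the solution by pheromone accumulation and update
    for _ in range(iterations):
        tours = []
        for _ in range(n_ants):
            prev, tour, left = START, [], list(rels)
            while left:
                nxt = rng.choices(left, weights=[tau[(prev, r)] for r in left])[0]
                tour.append(nxt)
                left.remove(nxt)
                prev = nxt
            tours.append((tour, cost(tour)))
        for tour, c in tours:
            if c < best_cost:
                best, best_cost = tour, c
        for key in tau:
            tau[key] *= 1.0 - rho  # evaporation
        for tour, c in tours:
            prev = START
            for r in tour:
                tau[(prev, r)] += best_cost / c  # cheaper tours reinforce more
                prev = r
    return best, best_cost
```

Seeding the pheromone from the GA targets deficiency (a) above, and the uneven initial pheromone addresses the slow start in (b); whether this pays off on real distributed queries depends on the cost model, which this sketch does not attempt to capture.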
The algorithm first uses the GA to distribute the initial pheromone and then uses the ant colony algorithm to refine the precision of the solution. The ability of the hybrid GA-ACO to search a wide range of answers for join queries in relational databases can be extended to optimizing join queries in distributed databases, where the main challenge is to produce the best QEP. Rho et al. also proposed a GA-based method that quickly determines an optimal QEP [13]. Their model covers copy identification (i.e., data redundancy), identification of beneficial semijoins, join site selection, join order, and local processing and communication costs.
Tansel et al. proposed an algorithm that combines Dynamic Programming and Ant Colony Optimization, known as DP-ACO (Dynamic Programming-Ant Colony Optimization). It is used to optimize multi-way chain equijoin queries in distributed database environments. As the size of the relations and the number of joins in a query increase, Dynamic Programming suffers from long execution times and very large memory requirements. DP-ACO has been shown to be a feasible solution, producing good execution plans for 15-way join queries within limited time and with very limited memory.
Another advantage of DP-ACO is that it can easily be adapted to existing query optimizers, which commonly use DP-based algorithms.
Exploiting the properties of both Ant Colony Optimization and Particle Swarm Optimization (PSO), a hybrid algorithm was proposed to solve the travelling salesman problem [28]. The algorithm first uses a statistical technique to obtain several good initial solutions and distributes pheromone according to them. It then uses the ant colony algorithm to obtain several solutions through pheromone accumulation and updating. Finally, it applies the crossover and mutation operations of particle swarm optimization to obtain effective solutions.
The hybrid ACO-PSO algorithm has therefore been shown to be highly effective. As the number of relations in a query increases, more memory and processing are needed. Almost all commercial applications involve data from various sites.
DDBMSs are now widely used as standard DBMSs in commercial applications. The path-finding behaviour of ants is applied to direct the ants towards unexplored areas of the search space and to visit all the nodes without knowing the graph topology, in order to produce optimal solutions for distributed database queries. The ants first estimate the running times of the execution plans for the given query and then provide fast, high-performance, and optimal results in a cost-effective manner. In a distributed DBMS, the search strategy adopted by the query optimizer can reduce both the query execution time and the cost incurred by the query, and hence improve query performance, by selecting the best query execution plan. It has been shown that implementations of these probabilistic algorithms generate viable solutions as the size of the query and the number of joins in the query grow.
I. Conclusion
Optimization is much more than transformations and query equivalence. The infrastructure for optimization is significant: it is hard to design effective and correct SQL transformations, developing a robust cost metric is elusive, and building an extensible enumeration architecture is a significant undertaking.
Despite many years of work, significant open problems remain. It is therefore essential to understand the existing engineering framework in order to make an effective contribution to the area of query optimization. My research shows that applying hybrids of the Ant Colony Optimization algorithm to the optimization of distributed database queries is still a young field. ACO is used to solve various optimization problems, and research on the design and implementation of ACO hybrids is still in progress. The reviewed results also show that ACO hybrids are effective and feasible for optimization problems. As the size of the query and the number of joins in the query grow, implementations of these probabilistic algorithms have been shown to generate feasible solutions in distributed as well as relational database management systems. Especially when the size and complexity of the relations increase, with many parameters influencing the query, there is still considerable opportunity to produce optimized solutions and to enhance search strategies for such queries in distributed databases using hybrids of ACO.