TOPIC: Abstract: Framework for tag based Smart Content service composition

Abstract: Service composition is now very popular because semantic web service composition offers features that an individual service cannot. Smart Content is treated here as equivalent to Semantic Web Service Composition (SWSC), which is defined as a process for synchronizing words when a client issues a search request. Most of the time, a semantic web service cannot satisfy a client request with a single service component. We therefore need advanced aggregation and navigation technology, using SCAN technology, to generate an aggregation of service components according to the requested task. The main objective of the project is to provide more efficient approaches for composition of semantic mobile services. The project also provides a complete theoretical classification of the approaches. The thesis presents a framework for tag based Smart Content service composition and its various models. Here we try to search heterogeneous content that is present on a mobile device's SD card but is not discoverable because of the variety of content and file types. We develop a tag based repository that can be searched for a desired keyword to find the file(s) containing that keyword. The difficulty lies in the large number of file types, since each extension has to be handled with different code. This thesis presents an implementation of a Smart Content Tag Repository and focuses mainly on Android mobiles. We also present a comparative study of various smart content aggregation and navigation technologies on various platforms, and finally the thesis presents an implementation of this technology on the widely used Android operating system for smartphones.

Keywords: Tag, Smart Content, Aggregation, SCAN, Paper Submission.

The Smart Content
In the current state of the art of information management systems, it is becoming more difficult day by day to find information because of the growing amount of information.
To solve this problem, we need new technologies and tools for working smartly with content, with a strong focus on semantic content.

Smart Content is content enhanced to be fit-to-purpose, meaning content that is organized and structured for customer tasks and needs (not just for the production, packaging and distribution of physical documents), suitable for in-depth analytics (not just for “there-and-gone” consumption), and produced by turning content into data using semantic technologies (not just taxonomies). This bridge is realized by the application of analytics and semantics to:
a) Produce new products and new services with a contemporary user experience.
b) Deliver an answer, not a result.
c) Realize new dynamic monetization models.

An integrated approach and content aggregation
The Smart Content approach addresses complex information and files, for which searching is the most difficult problem. It is therefore a synthesis of a few different techniques, each aimed at a specific aspect of the problem. SCAN erases the boundaries put on information by different storage systems. It links information items from multiple sources and in different formats into a seamless digital library, where they can be categorized, annotated, navigated and searched in a uniform way. This provides a homogeneous, searchable and explorable semantic information space where files, web pages, emails and other content items are equal documents organized by their natural semantic properties.

Tagging
Tagging is an easy way to organize information, similar to popular services like Delicious or Flickr.
Tags make a document simple to handle and also let the documents relevant to the user be processed automatically.

Text analysis and concept extraction
SCAN brings the power of automated text mining (2) and natural language processing to discover document semantics by extracting the valuable terms and their patterns from the document content. Text analysis underlies advanced semantic search functionality such as finding documents by similarity (pattern search) and associative guided search based on system suggestions.

Metadata and facet navigation
SCAN provides a rich set of metadata properties associated with the documents, including document title, description/annotation, author, creation date and others. The properties are set automatically when a document is added and can be quickly edited later. Metadata properties can be used in structured search to find the documents matching specified criteria. In addition, some properties (e.g., author or creation date) may be used as navigation facets to browse the document collection.

Search
Advanced semantic search techniques are driven by text analysis functionality. After any search request is performed, the results are analyzed to build a “see also” terms list for associative search. This provides step-by-step exploration of an area of interest following the system recommendations. Pattern search explores semantic similarities between different documents and allows finding documents conceptually similar to a given one.

Definitions of Smart Content
Smart Content is information, typically originating in unstructured formats, that is findable, reusable, more profitable for the producer, and more useful for the consumer. All major search engines, including Google and Yahoo!, are moving aggressively trying to capture structured data. This isn’t exactly a surprise because it provides tremendous opportunity.
For example, wouldn’t it be great to be able to search all the world’s products or jobs from a single site with a single interface? Most exciting is the role content management systems like Drupal play in getting to this point.

1. Smart Content is a common language creating a product bridge between the data analytics business and the ‘expert opinion’ (publishing) business. Smart Content is content enhanced to be fit-to-purpose, meaning content that is organized and structured for customer tasks and needs, suitable for in-depth analytics, and produced by turning content into data using semantic technologies.

2. Smart Content can be seen as the “product bridge” between the “meaningful use” data analytics business and the “actionable expert opinion” business. The data supplies the “what” and the text supplies the “why”. This bridge is realized by the application of analytics and semantics to:
a) Create new products and new services with a contemporary user experience.
b) Deliver an answer, not a result (drive better usability of existing products, increase discoverability and access of content, provide unified access / interoperability, integration with search), and
c) Realize new dynamic monetization models.

3. Smart Content organizes itself automatically depending on your context, goals, and workflow. It stands ready in the background to respond when you want to get updated, ask a question, or complete a task. Smart Content is also transparent, allowing you to see why it’s doing what it’s doing.

4. The phrase “smart content” is interesting in how it re-frames match-making between people and content. Conventional thinking locates intelligence and knowledge with the user, not the content. The user gets the content or information he needs because he knows what is relevant to the task, context, or workflow.

5. Natasha Fogel, EVP at Edelman Strategy One, says: “In my opinion, smart content is in the eye of the beholder.
In other words, the definition of what is smart may be different from one ‘customer’ to another depending on why it’s sought. It’s vital to collect, synthesize and analyse this relevant content using primary and secondary research methods and business analytics so that we have the most complete picture of what’s important to stakeholders and then can ultimately measure how we take action. While it’s intriguing to watch this landscape unfold, it presents a multitude of opportunities including but certainly not limited to measurement, authenticity, data normalization and schemas. It’s guaranteed to keep us busy for a long time.”

6. Dries Buytaert created Drupal, an open-source content-management system that facilitates semantic publishing. According to Dries: “All major search engines, including Google and Yahoo!, are moving aggressively trying to capture structured data. This isn’t exactly a surprise because it provides tremendous opportunity. For example, wouldn’t it be great to be able to search all the world’s products or jobs from a single site with a single interface? I’d think so. I also think this is at the heart of what it really means to have “smart content.” The truth is, it is waiting to happen with the help of the semantic web; we just have to connect the dots and help people adopt it. Most exciting is the role content management systems like Drupal play in getting to this point.”

7. According to Gilbane Group’s study, Smart Content in the Enterprise, the XML content that is driving contemporary enterprise applications is:
• Granular at the appropriate level
• Semantically rich
• Useful across applications
• Meaningful for collaborative interaction

In this dissertation, the term “smart content” is used to define this class of content.
Smart content is a natural evolution of XML structured content, delivering richer, value-added functionality.

Several definitions of “smart content” have been given to reflect diverse perspectives and to touch on the use of analytics, semantics, and flexible information storage and publishing to help information producers and consumers get more out of online, social, and enterprise content.

Descriptions of the additional capabilities needed for smart content applications follow.

Content Enrichment / Metadata Management: Once a descriptive metadata taxonomy is created or adopted, its use for content enrichment depends on tools for analyzing and/or applying the metadata. These can be manual dialogs, automated scripts and crawlers, or a combination of approaches. Automated scripts can be created to interrogate the content to determine what it is about and to extract key information for use as metadata.

Component Discovery / Assembly: Once data has been enriched, tools for searching and selecting content based on the enrichment criteria enable more precise discovery and access. Search mechanisms can use metadata to improve search results compared to full text searching. Information architects and organizers of content can use smart searching to discover what content exists and what still needs to be developed, in order to proactively manage and curate the content.

Distributed Collaboration / Social Publishing: Componentized information lends itself to a more granular update and maintenance process, enabling several users to simultaneously access topics that may appear in a single deliverable, which shortens schedules.

Federated Content Management / Access: Smart content solutions can integrate content without duplicating it in multiple places, instead accessing it across the network in the original storage repository.

Proposed System Architecture
Here’s the basic idea.
Content authors don’t want to worry about the technical details of how to structure and validate XML or HTML documents; they just want to focus on the task of creating great content. Authors also need to be able to preview the tour and tour stops to verify that the correct media is being used, that navigational connections are correct, and so on. This could be done by deploying to the final device and checking there, but in reality that process is time-consuming and will slow the content development process down considerably.

For the iPhone, museums might choose either to build an iPhone app which can interpret the TourML schema and reference media files local to the device, or to use WebKit and Dashcode based tools such as iWebKit and jQTouch to take advantage of the iPhone’s browser based user interface support.

Specifics about user interface behavior, the look and feel, graphic design of the tour, and the need for wireless access are all the responsibility of the application layer. Museums will likely make many different choices about how this application layer will work, but Museums-To-Go could provide a few sample application instances that understand and parse the TourML specification. Likely candidates for these sample applications include: iPhone app, mobile web based app, Dashcode based iPhone app, etc.

[Figure: proposed system architecture. Authoring layer: the tour author creates tour stops, which are rendered to XML. Content layer: media format files and tour XML files. Smart layer: render to HTML.]

3. Methodology

Problem Statement
If one has already spent a number of hours finding the required content some time ago, why should one have to do the same again when the file is already in one's repository or on the memory card? How can the required file be fetched without knowing:
• What was the name of the file?
• What type of file was it?
• What was the date of downloading or creating the file?
• What was the heading of the file?
• What tags were associated with the file at the time of creation?

Issues related to fetching or mining the required information are complex because the information is heterogeneous. Some of the major issues are listed below:
• Choosing which SDK (Software Development Kit) provides an appropriate platform.
• Choosing which coding language reaches the maximum number of smartphones.
• Choosing which algorithm suits the task best.
• Choosing which sorts of files are to be taken into consideration.

Methodology Used
Manual Tagging: Files are managed through tags at the time the downloaded file is reviewed. If one finds the file useful, one can associate keywords of the file as tags so that it can be searched for later.

Searching among Pre-Tagged Files: Later, when one wants to search for a file by keyword, one can pass the keyword directly to the application, which first searches for the required file among the tagged files and otherwise asks for auto-tagging.

Auto Tagging: If manual tagging and searching fail, one can tag all the files on the smartphone's memory card in order to find the required file or information. This may take several minutes depending on the number of files on the memory card. After tagging, all tags are stored in a repository which can be searched later.

Algorithm Development or Formulation
An effective searching algorithm is used to tag files based on the occurrences of words in the specific file. The top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006 are C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. The derived algorithm is as follows:

ALGORITHM
1. Put the words you want to search for in a hash table, with the words as keys and the values initialized to 0.
2. Iterate over the words of the text, each time checking whether the word is a key in the hash table; if so, increment the value for that key.
3. Iterate over the hash table to find the values which are non-zero; the keys for these are the matched words and the values are the counts.

The algorithm runs in O(N + M) time, where N is the number of words you’re searching for and M is the number of words you’re searching through.

Algorithm Implementation
1. Open the file in read mode.
2. Start reading the words from Start-Of-File (SOF), excluding common words (such as is, are, am, the, then, has, have, had, will, shall, and, or, etc.) and conjunctions.
3. Store the keywords in a dictionary until End-Of-File (EOF) is reached, with the following conditions:
a. If the key already exists, increment the value by 1.
b. If the key does not exist, initialize the value to 1, so that the first occurrence is counted.
4. List the keys having the top 10 values.
5. Store these keys in the project repository as tags.
6. Save the file location corresponding to the keys.

Model Used: Work Flow
Figure 2.1 Tag Search Process
Tagging multiple files manually requires repetition of the process.
Figure 2.2 Auto-tag Process
Figure: Manual-tag Process

4. Conclusions
It can be concluded from the literature survey and case studies that the work can be integrated with the semantic web, ontologies, the cloud, and other business structures to manage and fetch documents on the go, with less need for documentation at the end. Some future scope is discussed below:

4.1 PLUG-INS FOR NEW DOCUMENT FORMATS
The SCAN platform can be easily extended with plug-ins for new document formats, document locations (RSS feeds, web sites, e-mail, etc.) and language analyzers. Whole new areas of functionality can be added with user interface extensions.
An example of such a user interface extension is a plug-in for browsing the repository with a calendar (grouping the documents by their creation dates). Another text analysis application may be searching for documents similar to a specific one (search by pattern).

Current blog coverage gives an idea of the research going on in this domain, of its integration with other domains, and of the Content Technologies track. The topics covered include standards, integration, content migration, search, open source, and relevant consumer technologies. Future work may include:
1. Multi-lingual technologies and applications, XML, standards, integration, content migration, handheld devices, searching, open source, SaaS, semantic technologies, social software, SharePoint, XBRL, and relevant consumer technologies.
2. Business-oriented applications of the SCAN technology; design and development of a server version for small and medium enterprise networks.
3. Integration with cloud computing, by implementing and integrating its ideas and work with content technology.

4.2 EXTENSIBILITY TO CLOUD APPLICATION
Digital marketers, senior IT staff and business analysts are faced with the decision to deploy these technologies outside the server room. The cited work sets out to answer the following questions (9):
• What do we mean by the cloud? There is a great deal of hype, sales, and marketing messaging around “the cloud.” We explore what it really is and the opportunities it represents for digital marketers.
• What are the deployment options when working with a cloud platform partner? The decision around deploying to the cloud is not always a binary choice to host in the server room or not. We look at possible solution architecture options and the benefits of each.
• What do organizations need to look for in a WEM solution in the cloud?
If deploying into the cloud is an attractive option for an organization, we consider the key attributes that organizations should build into their selection criteria when choosing a solution.

Outsell’s Content Technology Consulting services leverage Outsell’s information technology consulting for:
• Content Strategy Development
• Technology Assessments
• Content Technology Vendor Selection and Evaluation
• Product IT and Technology Function Benchmarks and Best Practices

Cloud Content Management adds a low-cost, complementary layer to existing ECM systems that enables workers to collaborate securely around active content. In the future, Cloud Content Management functionality could potentially expand to also address the capture and archival stages of the content lifecycle. On the front end of the content lifecycle, the ability to create digital documents using software hosted in the cloud has existed for several years now (e.g. Google and ZohoDocs). More recently, technologies have been introduced that allow multiple authors to collaborate simultaneously on the same document (e.g. Google Wave). In the future, Cloud Content Management platforms could provide services that would allow users to scan documents, shoot photos, and record audio or video, then tag and send the resulting digital files to a managed, active repository hosted in the cloud. In fact, these technologies already exist as mobile applications on smartphones; the next logical step would be to provide that functionality on other computing devices. (10)

At the back end of the content lifecycle, Cloud Content Management’s existing search tools could be used for reliable eDiscovery of content in the active repository that has been locked to further editing or sharing by the file’s owner. Perhaps that locking activity could be automated by adding system rules based on established corporate records declaration policies.
Locked files could also be deleted automatically from the Cloud Content Management system based on records retention requirements. Those rules could be applied manually when the content was created or uploaded into the repository. A comprehensive approach such as this would eliminate the need to have an archival area into which files designated as records are moved and, eventually, removed and destroyed.

Many of these potential expansions of Cloud Content Management functionality to address content capture and retention/disposal requirements would be especially welcomed by small and medium organizations, which need to address those pieces of the content lifecycle, but in a lightweight manner. Those organizations do not have the resources to invest in the acquisition, deployment, and operation of an Enterprise Content Management system, but could be very well served by a less expensive, more agile Cloud Content Management platform.

There is, of course, the potential for Cloud Content Management solutions to provide even more process-based control of active content. Common content collaboration patterns now treated as ad hoc workflows could be codified into standardized, yet flexible rule-based processes. For example, if an individual could subscribe to a specific content tag, she could be automatically notified and served a link whenever a new, publicly-accessible file bearing that tag appeared in the Cloud Content Management system. In effect, this would automate the sharing of that file, replacing the all-too-common practice of manually emailing the file to a poorly constructed and maintained distribution list. It is important, however, that future work to automate content collaboration patterns not be done at the expense of the simplicity and ease of use that is one of the primary benefits of Cloud Content Management today.

4.3 SMART CONTENT IN PUBLISHING
Smart content remains a work in progress.
Geoffrey Bock expects to develop the prescriptive road map in the months ahead. Here’s a quick take on where he is right now. (30)
• For publishers, it’s all about transforming the publishing paradigm through content enrichment: defining the appropriate level of granularity and then adding the semantic metadata for automated processing.
• For application developers, it’s all about getting the information architecture right and ensuring that it’s extensible. There needs to be sensible storage, the right editing and management tools, multiple methods for organizing content, as well as a flexible rendering and production environment.
• For business leaders and decision makers, there needs to be an upfront investment in the right set of content technologies that will increase profits, reduce operating costs, and mitigate risks.
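
As an illustration of the hash-table search and the tagging steps listed in the Methodology section, the following Python sketch may help. It is only a minimal sketch under stated assumptions, not the thesis implementation (which targets Android): the function names, the tokenizer, the short stop-word set, and the in-memory dictionary used as a tag repository are all hypothetical choices made for this example. Note that it counts a word's first occurrence as 1.

```python
import re
from collections import Counter

# Assumed stop-word list: only the examples named in the text, not a full list.
STOP_WORDS = {"is", "are", "am", "the", "then", "has", "have", "had",
              "will", "shall", "and", "or"}


def tokenize(text):
    """Split text into lowercase words (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_matches(search_words, text):
    """Hash-table search: O(N + M) for N search words and M text words."""
    counts = {w.lower(): 0 for w in search_words}        # step 1: keys -> 0
    for word in tokenize(text):                           # step 2: scan text
        if word in counts:
            counts[word] += 1
    return {w: c for w, c in counts.items() if c > 0}     # step 3: non-zero


def extract_tags(text, top_n=10):
    """Return up to top_n most frequent words, skipping stop words."""
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return [word for word, _ in freq.most_common(top_n)]


def add_to_repository(repository, path, text, top_n=10):
    """Record each tag of a file in the repository, mapped to the file path."""
    for tag in extract_tags(text, top_n):
        repository.setdefault(tag, set()).add(path)


def search(repository, keyword):
    """Return the sorted paths of all files tagged with the keyword."""
    return sorted(repository.get(keyword.lower(), set()))
```

For example, after `repo = {}` and `add_to_repository(repo, "/sdcard/notes.txt", "Smart content and smart tags: tags organize content.")` (the path is a made-up placeholder), `search(repo, "tags")` returns `['/sdcard/notes.txt']`. Reading the text out of each file type is left out here, since, as noted in the abstract, each extension needs its own handling.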