
Vocabulary – We download a list of English verbs and adjectives from an online dictionary, YourDictionary, and harvest a collection of attributes, concepts, and instances from Probase, a well-known knowledgebase. Altogether, these constitute our vocabulary. To cope with the noise contained in short texts, we further extend the vocabulary to incorporate abbreviations and nicknames of instances.

Knowledgebase – A knowledgebase stores mappings between instances and concepts. Some existing knowledgebases also associate each concept with attributes.

In this work, we use Probase as our knowledgebase. Probase is a huge semantic network of concepts (e.g., country), instances (e.g., china), and attributes (e.g., population). It mainly focuses on two types of relationships, namely the isA relationship between instances and concepts (e.g., china isA country) and the isAttributeOf relationship between attributes and concepts (e.g., population isAttributeOf country).

Given a short text s written in a natural language, we generate a semantic interpretation of s represented as a sequence of typed-terms, in which the semantics of each instance is labeled with its top-1 concept cluster. As shown in Fig. 1, to produce such an interpretation we divide the task of short text understanding into three subtasks: 1) Text segmentation: given a short text, find the most semantically coherent segmentation; 2) Type detection: for each term, detect its best type; 3) Concept labeling: for each ambiguous instance, rerank its concept clusters according to the context. Fig. 2 illustrates our framework for short text understanding.
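The two Probase relationship types described above can be pictured as simple lookup tables. A minimal sketch follows; the entries are toy data for illustration, not real Probase content:

```python
# Toy stand-in for a Probase-style knowledgebase (illustrative data only).

# isA: instance -> concepts it may denote
is_a = {
    "china": {"country", "economy"},
    "april": {"month", "song"},
    "paris": {"city", "person"},
}

# isAttributeOf: attribute -> concepts it describes
is_attribute_of = {
    "population": {"country", "city"},
    "lyrics": {"song"},
}

def concepts_of(instance):
    """Concepts linked to an instance via the isA relationship."""
    return is_a.get(instance, set())

def concepts_with_attribute(attribute):
    """Concepts linked to an attribute via the isAttributeOf relationship."""
    return is_attribute_of.get(attribute, set())
```

The real Probase additionally weights each edge (e.g., by co-occurrence frequency), which is what later enables ranking concept clusters.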

First, we construct an index on the entire vocabulary and acquire knowledge from a web corpus and existing knowledgebases. Then, we pre-calculate the semantic coherence between terms, which is later used for short text understanding. Finally, we perform text segmentation, type detection, and concept labeling, and generate a semantically coherent interpretation for a given short text.

Text Segmentation – We can recognize all possible terms in a short text using the trie-based framework [28]. But the real question is how to obtain a coherent segmentation from the set of candidate terms. Consider two examples, "april in paris lyrics" and "vacation april in paris", which illustrate our approach to text segmentation. Obviously, {april in paris, lyrics} is a better segmentation of "april in paris lyrics" than {april, paris, lyrics}, since "lyrics" is more semantically related to songs than to months or cities. Similarly, {vacation, april, paris} is a better segmentation of "vacation april in paris", due to the higher coherence among "vacation", "april", and "paris" than between "vacation" and "april in paris".
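Recognizing all possible terms can be sketched with a word-level trie over the vocabulary: scan from every start position and emit each vocabulary term found, so overlapping candidates such as "april in paris" and "paris" are both recovered. The details below (data layout, toy vocabulary) are assumptions, not the cited framework's exact implementation:

```python
# Sketch of trie-based candidate-term recognition (assumed details).

def build_trie(vocabulary):
    """Build a word-level trie; '$' marks the end of a complete term."""
    root = {}
    for term in vocabulary:
        node = root
        for word in term.split():
            node = node.setdefault(word, {})
        node["$"] = term
    return root

def candidate_terms(text, trie):
    """Return all (start, end, term) vocabulary matches in the text."""
    words = text.split()
    found = []
    for i in range(len(words)):
        node = trie
        for j in range(i, len(words)):
            if words[j] not in node:
                break
            node = node[words[j]]
            if "$" in node:
                found.append((i, j + 1, node["$"]))
    return found

vocab = {"april", "april in paris", "paris", "lyrics", "vacation"}
trie = build_trie(vocab)
terms = candidate_terms("april in paris lyrics", trie)
```

Note that the result deliberately contains overlapping candidates; choosing among them is exactly the segmentation problem discussed next.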

We segment a short text into a sequence of terms, using the following heuristics to determine a valid segmentation: except for stop words, each word belongs to one and only one term; and terms are coherent (i.e., terms mutually reinforce each other). We use a graph to represent candidate terms and their relationships.

We select terms such that terms cannot overlap and every non-stop word in the short text is covered by a term. The Affinity Score (AS) is defined to measure semantic coherence between typed-terms. In this work, we consider two types of coherence: similarity and relatedness (co-occurrence).
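The selection step above can be sketched as follows: enumerate segmentations that cover all non-stop words with non-overlapping terms, then keep the one with the highest summed pairwise affinity. The stop-word set and `toy_affinity` values below are illustrative assumptions, not the paper's actual data or scoring function:

```python
# Sketch of choosing the most coherent segmentation (assumed details).
from itertools import combinations

STOPWORDS = {"in", "of", "the"}  # toy stop-word list

def segmentations(words, terms, start=0):
    """Yield term lists covering words[start:]; terms are (start, end, text)."""
    if start == len(words):
        yield []
        return
    if words[start] in STOPWORDS:  # stop words need not be covered
        yield from segmentations(words, terms, start + 1)
    for s, e, term in terms:
        if s == start:
            for rest in segmentations(words, terms, e):
                yield [term] + rest

def toy_affinity(a, b):
    # Stand-in for the Affinity Score: "april in paris" and "lyrics" cohere.
    return 0.9 if {a, b} == {"april in paris", "lyrics"} else 0.1

def best_segmentation(words, terms, affinity=toy_affinity):
    def score(seg):
        return sum(affinity(a, b) for a, b in combinations(seg, 2))
    return max(segmentations(words, terms), key=score)

words = "april in paris lyrics".split()
terms = [(0, 1, "april"), (0, 3, "april in paris"),
         (2, 3, "paris"), (3, 4, "lyrics")]
```

With these toy scores, `best_segmentation(words, terms)` prefers {april in paris, lyrics} over {april, paris, lyrics}, matching the earlier example.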

We believe that two typed-terms are coherent if they are semantically similar or if they often co-occur on the web. Therefore, the Affinity Score between typed-terms x and y can be calculated as follows:
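As a reading aid only, a toy stand-in consistent with the intuition just stated (coherent if similar or frequently co-occurring) might take the larger of the two signals; this is a hypothetical placeholder, not the paper's actual formula, and both component scorers are assumptions:

```python
# Hypothetical stand-in for the Affinity Score, NOT the paper's formula:
# take the stronger of the similarity and co-occurrence signals, both
# assumed to return values in [0, 1].
def affinity_score(x, y, similarity, cooccurrence):
    return max(similarity(x, y), cooccurrence(x, y))
```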