WSDM cup,2017 Triple Scoring Task

Date:

Key Words: Python, NLTK ,word2Vec, GloVe, SVR ,PCA
​​
Triple-scoring task is to compute relevance scores for knowledge-base triples from type-like relations such as PERSON-PROFESSION-RELEVANCE SCORE or PERSON-NATIONALITY-RELEVANCE SCORE. For example, Julius Caesar has various professions, including Politician and Author. Given the (Julius Caesar,{ profession} Politician) triple and (Julius Caesar, {profession} Author) triple, the former triple is expected to have a higher relevance score (so called “triple score”) because politician is the major profession of Julius Caesar. Accurate estimation of such triple score benefits ranking for knowledge base queries. Since in the training data we just had three columns “Person”,”Profession”,”Score” so lot of feature engineering was performed to create vector space for training ,in our approach we propose to integrate knowledge from both latent features and explicit features with an ensemble method to address the problem. The latent features consist of person entities representations learned from word2vec model and profession/nationality values’ representations learned from GloVe model. In addition, we also incorporated the explicit features of person entities from the Freebase knowledge base and from Wikipedia. Experimental results show that our feature integration method with and Support Vector Machine algorithm training performed well and ranked 3rd place in the competition.
Methodology Paper was submitted to WSDM ,2017 conference
Title : “ Integrating Knowledge from Latent and Explicit Features for Triple Scoring”

Links : Git Repo