Infosys Limited [AUG 2013 - DEC 2015]
Tool usage: R, SQL, CARET, NLTK
At Infosys, Jay worked for a European banking client on projects related to data analysis and machine learning. He has experience across the different phases of the end-to-end data science pipeline:
- Requirements: Communicated with the client and subject matter experts to identify and document business needs and problem statements that require data-driven solutions
- Data Extraction: Used SQL, PL/SQL, and Hive query language to obtain the data required for analysis from multiple sources such as Oracle DB and Hive tables
- Data Preparation: Used R packages such as dplyr, tidyr, and lubridate for data preparation activities such as cleaning data, handling missing values, parsing dates, creating new variables, and processing text with regular expressions
- Data Exploration: Used R packages ggplot2, pca, alr4, and dplyr to check distributional assumptions with statistical tests, obtain summary statistics and aggregated tables, check associations among variables using correlation analysis and plots, remove highly correlated variables, and apply principal component analysis for dimensionality reduction
- Model Building: Used the CARET and h2o packages in R to train supervised learning models such as regression, logistic regression, decision trees, and random forests, and applied methods such as cross-validation and regularization to deal with overfitting
- Reporting: Presented and discussed the data-driven model results with a non-technical audience (the client) to identify the factors that can help in business decision making
During his stint with Infosys, Jay got the opportunity to work on multiple projects and learned text mining methods, Hive, and the core banking domain. Apart from his routine work, Jay delivered training sessions to peers on technologies such as SQL and Big Data basics, and he was honored with the “INSTA AWARD” for his efforts as a team player.