Text Mining with Python

  • Machine Learning Introduction
  • Statistics vs Business Analytics vs Data Science vs Machine Learning vs Deep Learning vs Artificial Intelligence (understanding the differences)
  • Machine learning project life cycle
  • Text Mining project life cycle
  • Generalized architecture

2 Days

Tools and platforms used in Machine learning

  • Cloud-based platforms
  • Proprietary tools
  • Open source tools and platforms
  • What is Python & History
  • Installing Python & Python Environment
  • Basic commands in Python
  • Data Types and Operations
  • Python packages
  • Loops
  • My first Python program
  • If-then-else statement
  • Functions in Python
  • User defined Functions
  • NumPy
  • SciPy
  • Pandas
  • Matplotlib
  • scikit-learn (sklearn)
  • NLTK
  • Data importing
  • Connecting to External data sources
  • Working with datasets
  • Manipulating the datasets
  • Merging
  • Exporting the datasets into external files
  • Population and Sample
  • Data Types
  • Measures of Central tendency
  • Measures of dispersion
  • Percentiles & Quartiles
  • Box plots and outlier detection
  • Creating Graphs and Reporting
  • Probability Distributions
  • Hypothesis testing
  • Exploratory Data Analysis
  • Data Validation rules
  • Data Cleaning techniques

      • Deal with missing data
      • Add default values
      • Remove incomplete rows
      • Deal with error-prone columns
      • Fixing NaN values and string/float confusion
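
As a rough illustration of the cleaning steps listed above, here is a minimal pandas sketch. The DataFrame, its column names (`name`, `city`, `price`), the "unknown" default, and the choice of median imputation are all assumptions made for this example, not part of the course material.

```python
import pandas as pd

# Hypothetical toy data: column names and values are made up for illustration.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "city": ["Pune", None, "Delhi", "Goa"],
    "price": ["10.5", "n/a", "7", 3.0],  # mixed strings and floats
})

cleaned = raw.copy()

# Add a default value where a placeholder is acceptable.
cleaned["city"] = cleaned["city"].fillna("unknown")

# Remove rows that are still incomplete in a required column.
cleaned = cleaned.dropna(subset=["name"])

# Fix string/float confusion: coerce to numeric; unparseable entries become NaN.
cleaned["price"] = pd.to_numeric(cleaned["price"], errors="coerce")

# Replace the remaining NaN prices with the column median (an assumed policy).
cleaned["price"] = cleaned["price"].fillna(cleaned["price"].median())

print(cleaned)
```

Whether to fill a gap with a default or drop the row depends on the column: a placeholder is usually acceptable for descriptive fields, while rows missing a required key are often safer to remove.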

  • Data Preparation for analysis

      • Normalize data types
      • Change casing
      • Creating new variables
      • Feature Scaling
      • Feature Standardization
      • Label Encoding
      • One-Hot Encoding
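
The preparation steps above can be sketched with pandas and scikit-learn as follows. This is only an illustrative sketch: the toy DataFrame, its column names (`colour`, `height_cm`, `weight_kg`), and the derived `bmi` variable are assumptions chosen for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "colour": ["Red", "blue", "RED", "Green"],
    "height_cm": ["150", "160", "170", "180"],  # numbers stored as strings
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
})

# Normalize data types and change casing.
df["height_cm"] = df["height_cm"].astype(float)
df["colour"] = df["colour"].str.lower()

# Create a new variable from existing ones (body mass index).
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

# Feature scaling: rescale to the [0, 1] range.
df["weight_scaled"] = MinMaxScaler().fit_transform(df[["weight_kg"]]).ravel()

# Feature standardization: zero mean and unit variance.
df["height_std"] = StandardScaler().fit_transform(df[["height_cm"]]).ravel()

# Label encoding: one integer per category (classes are sorted alphabetically).
df["colour_label"] = LabelEncoder().fit_transform(df["colour"])

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["colour"], prefix="colour")

print(df)
print(one_hot)
```

Label encoding implies an ordering between categories, so one-hot encoding is usually the safer choice for nominal variables fed to linear models such as logistic regression.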

  • Supervised Machine learning algorithms
  • Unsupervised Machine learning algorithms
  • Need for logistic regression
  • Logistic regression models
  • Validation of logistic regression models
  • Multicollinearity in logistic regression
  • Individual Impact of variables
  • Confusion Matrix
  • Case study (spam filtering)
  • What is text mining
  • The NLTK package
  • Preparing text for analysis
  • Information retrieval
  • Text Pre-processing
  • Text summarisation
  • Sentiment analysis
  • Text classification
  • News data classification
  • Topic Modelling
  • Latent Dirichlet Allocation (LDA)
  • LDA in Python
  • Enterprise Business Intelligence/Data Mining, Competitive Intelligence
  • E-Discovery, Records Management
  • National Security/Intelligence
  • Scientific discovery, especially Life Sciences
  • Sentiment Analysis Tools, Listening Platforms
  • Natural Language/Semantic Toolkit or Service
  • Publishing
  • Automated ad placement
  • Search/Information Access
  • Social media monitoring

Text Mining best practices