Consulting
I have advised global clients from the US, UK, EU, and India on:
- GenAI use cases and pricing strategy
- AI & Data Science team building strategy
- AI Product development and strategy
- Data strategy for building AI capabilities
- AI team development and upskilling
- AI transformation roadmap
- AI infrastructure roadmap for startups
Papers
- For all my papers, visit: Google Scholar (h-index: 26; citations: ~3000) | Semantic Scholar
- Topics: NLP, NLU, Speech Recognition, Fake News, Translation, Content Moderation
Gupta A, Sukumaran R, John K, Teki S (2021)
Hostility Detection and Covid-19 Fake News Detection in Social Media
CONSTRAINT Workshop, AAAI 2021 (non-archival) [link] [conference]
Brahma AK, Potluri P, Kanapaneni M, Prabhu S, Teki S (2021)
Identification of Food Quality Descriptors in Customer Chat Conversations using Named Entity Recognition
CODS-COMAD 2021 Research Track. [link]
Vijjali R, Potluri P, Iyer S, Teki S (2020)
Two Stage Transformer model for Covid-19 Fake News Detection and Fact Checking
NLP for Internet Freedom Workshop, co-located with COLING 2020 [link] [data] [conference] [video summary]
Rangan P*, Teki S* (2020) [*Equal contribution as first authors]
Exploiting Spectral Augmentation for Code-Switched Spoken Language Identification
1st Workshop on Spoken Language Technologies for Multilingual Communities, INTERSPEECH 2020 [link]
Teki S (2019)
Internationalization of NLP Models for Sensitive Content Detection in Alexa Utterances
Amazon Machine Learning Conference
Blogs
AI: Leadership & Best Practices
- How to Automate MLOps?
- Data Engineer vs Data Scientist
- Top 10 MLOps tools
- Developing AI/ML Projects for Business - Best Practices
- Building AI/ML products [video]
- How to build AI Teams that Deliver?
- Why Corporate AI Projects Fail? Part 1
- Why Corporate AI Projects Fail? Part 2
- How to hire Data Science teams?
- Benefits of FAANG companies for Data Science & ML roles
- ML Engineer vs Data Scientist
- Best Practices for Improving Machine Learning Models
- The Case for Reproducible Data Science
- Reskilling India for an AI-First Economy
AI: Data & Governance
- How to Choose a Vector Database [New]
- Data Preparation Steps for Data Engineers
- Why is a Strong Data Culture Important to your Business
- How Big Tech Companies Define Business Metrics
- What are Best Practices for Data Governance?
- Choosing a Data Governance Framework for your Organization
- Why Data Democratization is important to your business?
- How to ensure Data Quality through Governance
- The Metric Layer and how it fits into the Modern Data Stack
- How to Generate Synthetic Data for Machine Learning Projects
- Understanding and Measuring Data Quality
- Surefire Ways to Identify Data Drift
- Data Labeling and Relabeling in Data Science
- Data Labeling: The Unsung Hero Combating Data Drift
AI: Use cases
- Mixtral - Mistral of Experts Large Language Model [New]
- How to choose the best time series forecasting model?
- Federated Machine Learning for Healthcare
- AI & Web3
- What are Fake Reviews?
- Knowledge Distillation: Principles, Algorithms & Applications
- TLDR: AI for Text Summarization & Generation of TLDRs
- Covid or just a Cough? AI for Detecting Covid-19 from Cough Sounds
- Fact-checking Covid-19 Fake News
- AI-enabled Conversations with Analytics Tables
Team development
- How to Manage Stakeholders Effectively?
- Effective Communication between Scientists and Non-scientists
- How to Improve Retention in Engineering Teams?
- Team Development Tips for Engineering and Product Leaders
- Five 5-minute Team-Building Activities for Remote Teams
Misc
Document AI at Docsumo
At Docsumo, I head a team of 25+ ML and data engineers building a Document AI platform using NLP, LLMs, and computer vision:
- State-of-the-art Transformer-based deep learning models and LLMs like GPT to extract information from structured and unstructured documents such as invoices, bank statements, receipts, tax forms, and insurance forms
- Synthetic data pipeline to augment training data, resulting in a significant improvement in the performance of the above models
- Chat with Documents powered by GPT-based models
- Data Annotation operations - tracking and visibility of key metrics like accuracy, velocity and reviews
- Table detection using classical ML models and structure recognition using deep learning models
- Quarterly OKR planning for the entire ML org
- Hiring of ML engineers and scientists
- ML Team mentorship via regular team and 1:1 meetings
- Business Strategy for the Intelligent Document Processing market
- Industry engagement with US customers and SaaS VC firms in India
- AI Stack: Transformer, BERT, LayoutLM, Large Language models like ChatGPT, GPT3.5+, open-source LMs like Llama2, Dolly; Python, PyTorch, Data augmentation, Synthetic Data, Data Annotation, Data Platform, ML Platform, Table detection and structure recognition, Vector database, Document embeddings, Chat with Documents
AI Research at Amazon Alexa AI
At Amazon Alexa AI, I worked on advancing Alexa's Speech Recognition and Natural Language Understanding AI capabilities.
- Developed state-of-the-art end-to-end sequence-to-sequence deep learning models for speech recognition
- Trained deep learning speech recognition models on 20,000+ hours of data using distributed multi-host, multi-GPU training
- Deployed high-impact deep learning NLP models to detect offensive conversations between users and Alexa in multiple languages, using Neural Machine Translation and synthetic data generation
- Mentored software engineers and interns on machine learning and deep learning
- Founded the Alexa AI blog, followed by Alexa leadership as well as 1500+ scientists & engineers across Alexa, AWS & Amazon
- Contributed to 'Dive into Deep Learning' book
- Published a paper at the Amazon Machine Learning Conference on detection of offensive and sensitive content in user interactions with Alexa in multiple languages
- Conducted research on homomorphic encryption, federated learning and privacy-preserving deep learning
- AI Stack: Transformer, BERT, Seq2Seq, AWS, EC2, CUDA, Python, Bash, Vim, Linux, Docker, TensorFlow, MXNet, NMT, AWS Translate, Sockeye, PyTorch, Fairseq, Tensor2Tensor, Data augmentation, Synthetic Text, Backtranslation
Applied AI at Swiggy
I led the AI team at Swiggy, India's largest food ordering and delivery platform, where I developed novel deep learning technologies for multiple NLP and speech use cases, including chatbots, intent recognition, product classification, sentiment analysis of user reviews, speech recognition for Hinglish customer service conversations, and voice sentiment analysis.
- Managed cross-functional stakeholders across Product, Engineering, Business and Analytics teams and also defined, led and managed POCs with external startups and vendors.
- Led Swiggy's first-ever AI paper, on Speech and Language Recognition, accepted at INTERSPEECH 2020
- Led the team to 2nd place in Microsoft's Challenge on 'Speech Technologies for Code-Mixing in Multilingual Communities'
- Led the AI team on a paper on identification of bad food quality descriptors in customer chat, accepted at CODS-COMAD 2021
- NLU modeling using weak supervision to decode intent in code-mixed chat
- Sentiment analysis to identify negative feelings, emotions and opinions in chat
- Language identification in code-mixed chat
- Predicting Social Media Escalations based on chat input
- Classification of Products into multiple categories based on text inputs
- Named Entity Recognition of various entities in chat conversations
- AI Stack: Transformer, BERT, RoBERTa, DeepSpeech, AWS, EC2, S3, SageMaker, Python, Jupyter, Bash, Vim, Linux, Docker, TensorFlow, PyTorch, Snowflake
My interview for IndiaAI