Events

Upcoming events

Past events

2022

Meetup May 2022

Upul Bandara, Machine Learning Architect at Airudi Inc., ‘Building Production-Ready NLP Systems: A case study of applying machine learning to occupational health and safety’

Philippe Racine, Founder of Stent AI & co-founder of V3 Stent, ‘How AI will disrupt HR’



Details

May 26th, 2:00 PM - 3:30 PM (GMT-5)

Agenda

2:00 PM Welcome - Fabrizio Gotti, NLP programmer, RALI & IVADO

2:05 PM Talk: Upul Bandara - Machine Learning Architect at Airudi Inc. - ‘Building Production-Ready NLP Systems: A case study of applying machine learning to occupational health and safety’

Abstract: NLP is a promising avenue for building intelligent products such as chatbots and product recommendation systems. Modern libraries make NLP accessible to any developer. However, developing production-ready applications requires much more expertise than merely knowing how to use modern NLP tools. We have been designing NLP-enabled applications for occupational health and safety. In this presentation, we share our practical experience in building large-scale, production-ready NLP systems.


2:35 PM Question period

2:45 PM Talk: Philippe Racine - Founder of Stent AI & co-founder of V3 Stent - ‘How AI will disrupt HR’


Abstract: We will explain how we use AI techniques to find the best candidates in the process of hiring new talent.

3:15 PM Question period

3:25 PM Closing remarks - Fabrizio Gotti, NLP programmer, RALI & IVADO


Upul Bandara

Upul is a machine learning architect with several years of professional experience in machine learning systems, data preprocessing pipelines, and large-scale software systems. At Airudi, he is the technical leader for machine learning systems in the context of Human Resource Management. He previously worked at Imagia, where he contributed to Imagia's federated learning system. He holds a master's degree in Computer Science.

https://www.linkedin.com/in/upulbandara/


Philippe Racine

Philippe is Stent’s CEO and a senior partner of the group, but above all a passionate leader. This concept creator has a long list of successful IT companies, an international patent pending, and several groundbreaking innovations to his credit, spanning AI, the Internet of Things, fintech, data mining, and more. At V3 Stent, Philippe's organizational skills guide the teams, and he offers his employees unfailing support in the face of all their challenges.







Meetup April 2022

Mark Likhten - Strategy and legal innovation lead at the Cyberjustice Laboratory of the Université de Montréal’s Faculty of Law & Hannes Westermann - PhD candidate and lead researcher, Cyberjustice Laboratory, Université de Montréal - ‘’Using Artificial Intelligence and Natural Language Processing to Increase Access to Justice’’


Olivier Salaün - PhD candidate, Laboratoire de recherche appliquée en linguistique informatique (RALI), Université de Montréal - ‘’Exploiting domain-specific knowledge for judgment prediction is no panacea’’

Details

April 19, 2022, 2:00 PM - 3:30 PM (GMT-5)


Agenda

2:00 PM Welcome - Fabrizio Gotti, NLP programmer, RALI & IVADO

2:05 PM Talk: Mark Likhten - Strategy and legal innovation lead at the Cyberjustice Laboratory of the Université de Montréal’s Faculty of Law & Hannes Westermann - PhD candidate and lead researcher, Cyberjustice Laboratory, Université de Montréal - ‘’Using Artificial Intelligence and Natural Language Processing to Increase Access to Justice’’

Abstract: The JusticeBot is an online tool that asks users questions and provides relevant legal information based on the answers of the user, using artificial intelligence. Our first publicly available implementation of the tool, focused on rental disputes, has been accessed over 9,000 times since the summer of 2021. In this presentation, we will share more about the project, how it was conceptualized and developed, and how we are using natural language processing to support the creation and the efficiency of the tool. We will also talk about other relevant projects conducted at the Cyberjustice Laboratory.

2:35 PM Question period

2:45 PM Talk: Olivier Salaün - PhD candidate, Laboratoire de recherche appliquée en linguistique informatique (RALI), Université de Montréal - ‘’Exploiting domain-specific knowledge for judgment prediction is no panacea’’

Abstract: Legal judgment prediction (LJP) is usually cast as a text classification task that predicts the verdict from the description of the facts. The literature shows that using legislative articles as input features helps improve classification performance. In this work, we designed a verdict prediction task based on landlord-tenant disputes and applied BERT-based models fed with different article-based features. Although the results are consistent with the literature, the improvements from the articles mostly concern the most frequent labels. This suggests that pre-trained, fine-tuned transformer-based models do not scale as-is to real-life legal reasoning: they excel at predicting the most recurrent verdicts, to the detriment of other legal outcomes.

3:15 PM Question period

3:25 PM Closing remarks - Fabrizio Gotti


Mark Likhten

Mark Likhten is strategy and legal innovation lead at the Cyberjustice Laboratory of the Université de Montréal’s Faculty of Law. He has worked for several years as a legal advisor in the field of information technology for companies both locally and internationally. Mark coordinates various projects and collaborates with the Lab's partners in the private and public sectors on digital transformation. He also drives the development of technological tools for justice and contributes regularly to research on cyberjustice.


Hannes Westermann

Hannes Westermann is a PhD candidate in the field of artificial intelligence and law at the Université de Montréal. He works at the Cyberjustice Laboratory and is the lead researcher of the JusticeBot project. JusticeBot is an online platform that aims to improve public access to justice through the use of artificial intelligence. Since launching in July 2021, JusticeBot has been accessed by over 9,000 users. Hannes has published and presented his research on applying machine learning and natural language processing to legal documents at multiple international conferences, including the “International Conference on Artificial Intelligence and Law (ICAIL)” 2019 and 2021, and the “International Conference on Legal Knowledge and Information Systems (JURIX)” 2019 and 2020, where his work was given the Best Paper Award.


Olivier Salaün

After completing a curriculum in economics, Olivier Salaün earned an MSc in Data Science and Business Analytics at CentraleSupélec and ESSEC (an engineering school and a business school in France, respectively), thereby stepping into the field of natural language processing. He is currently a PhD candidate supervised by Professor Philippe Langlais at RALI (Recherche appliquée en linguistique informatique), an NLP lab in the Department of Computer Science and Operations Research at Université de Montréal. Within a partnership with the Cyberjustice Laboratory at the Faculty of Law, his research focuses on applying NLP to legal documents.




Meetup March 2022

Christopher Driscoll, Lead Solutions Architect at CAE Healthcare, ‘’The Use of NLP for Healthcare Education using Simulated and Virtual Patients’’


Jean-Noël Nikiema, Assistant Professor at École de santé publique de l’Université de Montréal (ESPUM) and Co-Director at Lab TNS, ‘’Clinical language processing: significance of ontologies for the analysis of medical texts’’

Details

Agenda

2:00 PM Welcome - Fabrizio Gotti, NLP programmer, RALI & IVADO

2:05 PM Talk : Christopher Driscoll, Lead Solutions Architect at CAE Healthcare, ‘’The Use of NLP for Healthcare Education using Simulated and Virtual Patients’’

2:35 PM Question period

2:45 PM Talk: Jean-Noël Nikiema, Assistant Professor at École de santé publique de l’Université de Montréal (ESPUM) and Co-Director at Lab TNS, ‘’Clinical language processing: significance of ontologies for the analysis of medical texts’’

3:15 PM Question period

3:25 PM Closing remarks - Fabrizio Gotti

Christopher Driscoll, Ph.D.

Christopher Driscoll has a background in biomedical and mechanical engineering and has been developing medical simulators at CAE Healthcare for the past 9 years, where he currently holds the position of Lead Solutions Architect. He previously worked in human performance research, health promotion education, and aerospace engineering.

Jean-Noël Nikiema, Ph.D.

Jean-Noël is an assistant professor in the Department of Health Management, Evaluation and Policy at the University of Montreal School of Public Health. He is a specialist in biomedical terminologies and ontologies. His research focuses on quality assurance and interoperability of knowledge organization systems and their application in natural language processing, knowledge discovery, and information integration. Jean-Noël holds an MD from Nazi Boni University (Burkina Faso) and a Ph.D. in health informatics from the Université de Bordeaux (France).

Details

TRAINING WORKSHOP

Duration: 6 hours online + 2 hours of assignments

Level: Beginner to intermediate

Language: English

Fridays, February 11 and 18, 2022

9:00 AM to 12:00 PM (EST, UTC-5)


DESCRIPTION


DESCRIPTION of the current workshop, the 4th in a series of 4

"Knowledge graphs: extracting and using knowledge from unstructured documents"

This course aims to implement, from scratch, a simple but complete NLP pipeline that combines several knowledge sources to create a knowledge graph (KG), using open-source and commercial tools. Among other things, this pipeline will extract knowledge automatically (or semi-automatically) from unstructured text (for example, a medium-sized domain-specific corpus). The course will demystify concepts such as ontologies, taxonomies, the semantic web, entities, and so on. To show the full potential of KGs, the presenter will demonstrate how that knowledge can power a sample application with clear commercial value, e.g. a search engine, a recommendation engine, business analytics, or a chatbot. Time permitting, data curation and cleaning, as well as KG enrichment, may also be covered.

Solution techniques / technologies covered

The goal here is to pick an example (e.g. an enterprise KG for the employees and contracts of a bank, tourism resources, etc.) and walk through the entire process of specifying, creating, deploying, and testing a KG.

  • Preliminary study: identifying the available data sources, the intended use, the stakeholders, etc.

  • Semantic web tools for building an ontology that defines a useful KG: schema.org, Protégé, semantify.it, etc.

  • Exploring existing KGs and their usefulness for the task at hand: Freebase, YAGO, Google KG, and others. Even though managing such large KGs is not the main objective, they are worth presenting.

  • Human collaboration for KGs: manual tools for collecting knowledge from human contributors, and the limits of this approach, notably the effort required and the need for a unifying approach.

  • Automated and semi-automated knowledge extraction.

  • Corpus preprocessing, using a mature NLP toolkit such as Allen AI, spaCy, NLTK, Stanford NLP, Open NLP, etc.
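To make the extraction step concrete, here is a purely illustrative sketch (not the workshop's actual code): pull subject-relation-object triples out of sentences with a naive regular-expression pattern and store them as a tiny knowledge graph. The entity names, relations, and pattern are all made up for the example; a real pipeline would use one of the NLP toolkits above (e.g. spaCy's dependency parser) rather than regexes.

```python
import re
from collections import defaultdict

# Naive pattern: "<Subject> <relation verb> <Object>." over a tiny controlled
# vocabulary of relations. A production pipeline would use a dependency parser.
RELATION_PATTERN = re.compile(
    r"^(?P<subj>[A-Z][\w ]*?) (?P<rel>works for|manages|signed) (?P<obj>[A-Z][\w ]*?)\.$"
)

def extract_triples(sentences):
    """Return (subject, relation, object) triples found by the pattern."""
    triples = []
    for sentence in sentences:
        match = RELATION_PATTERN.match(sentence)
        if match:
            triples.append((match["subj"], match["rel"], match["obj"]))
    return triples

def build_graph(triples):
    """Store triples as an adjacency map: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph

corpus = [
    "Alice works for Acme Bank.",       # hypothetical entities
    "Bob manages Contract 42.",
    "The weather was nice yesterday.",  # no triple: ignored
]
kg = build_graph(extract_triples(corpus))
print(kg["Alice"])  # [('works for', 'Acme Bank')]
```

The same adjacency map could then back a small search or recommendation demo, in the spirit of the applications listed in the course description.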

DESCRIPTION of the NLP workshop series (4 short workshops)

In all courses, we will focus on the critical aspects of exploring effective NLP solutions in an industrial setting, namely:


  • Using and demonstrating versatile, commercial-grade NLP frameworks (for example, one of NLTK, Stanford NLP, Allen NLP, etc.)

  • Providing two or three approaches to the problem at hand, differing in complexity, performance, and training and inference time, so as to offer alternatives to practitioners facing implementation constraints.

  • The relevant metrics used to guide the scientific effort.

  • Developing commercial applications using NLP.

  • How to diagnose problems, and how to solve them, while developing an NLP solution.

  • Useful datasets, and the semi-supervised approaches available when data is lacking.


TRAINING CERTIFIED BY SCALE AI

50% off General Admission for all professionals working in Canada.

Promo code: IVA_Scale_NLP0222_4

LEARNING OBJECTIVES

  • Learn the basic concepts and theory of knowledge graphs (KGs) and understand their value.

  • Learn how to build a knowledge graph (KG) from scratch.

  • Understand how knowledge graphs (KGs) can be integrated into various applications, for example search engines, question answering systems, chatbots, etc.

TARGET AUDIENCE

  • Technical staff (computer scientists, data scientists, engineers, etc.)

  • People working in the field who want short training sessions focused on solving practical problems.

PREREQUISITES

  • Basic mathematics and Python programming

  • No prior knowledge of NLP is required

EXPECTED COMMITMENT

  • Attend both live (synchronous) online sessions to obtain the certificate.

  • 2 sessions of 3 hours each

  • Each session: 1 hour of theory and 2 hours of practice, plus 1 hour of assignments

INSTRUCTOR

Bang Liu

Bang Liu is an Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at the Université de Montréal and a member of Mila – Quebec Artificial Intelligence Institute. He received his Bachelor of Engineering in 2013 from the University of Science and Technology of China (USTC), and his master's and PhD from the University of Alberta in 2015 and 2020. His research focuses mainly on natural language processing (NLP), data mining, and applied machine learning.


PROGRAM

Schedule

  • Friday, February 11, 9:00 AM to 12:00 PM: Basic graph concepts

  • Friday, February 18, 9:00 AM to 12:00 PM: Graph embeddings

REGISTRATION FEES*

  • General admission: $500 + tax

  • Academics: $400 + tax**

  • Students/postdocs: $100 + tax**

*Prices are in Canadian dollars. Taxes (GST, QST) apply to Canadian participants only.

**Reduced prices apply only to Canadian citizens and/or people working in Canada, upon presentation of a university ID card sent to formations@ivado.ca. General admission applies to all other participants.


Registration closes Wednesday, February 10, 2022

Details

TRAINING WORKSHOP

Duration: 6 hours online + 2 hours of assignments

Level: Beginner to intermediate

Language: French

Fridays, January 21 and 28, 2022

9:00 AM to 12:00 PM (EDT, UTC-4)

TRAINING CERTIFIED BY SCALE AI

50% off General Admission for all professionals working in Canada.

Promo code: IVA_Scale_NLP0122_3

DESCRIPTION

DESCRIPTION of the current workshop, the 3rd in a series of 4

"Building a goal-oriented conversational agent (chatbot)"

This workshop focuses on the design, implementation, and evaluation of a task-oriented dialogue agent (a chatbot or voicebot; voice aspects will only be covered briefly). The ethical and practical issues raised by conversational agents will be presented, along with the various methods and technologies available. In the hands-on portion of the workshop, a simple chatbot will be built and improved step by step: identifying the task to automate, creating a prototype with the Wizard-of-Oz approach, implementing the chatbot with Rasa, and finally optimizing the NLP model and refining the dialogue iteratively using user data.
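Rasa itself declares intents, training examples, and dialogue rules in YAML configuration files and trains a statistical NLU model. As a language-neutral illustration of the very first step of such a bot (intent identification), here is a tiny keyword-based classifier in plain Python; the intent names and cue words are hypothetical, chosen only for this sketch.

```python
# Minimal keyword-based intent classifier: score each intent by how many of its
# cue words appear in the user's message and pick the best-scoring intent.
# Hypothetical intents/cues; a real bot (e.g. built with Rasa) would train a
# statistical NLU model on example utterances instead.
INTENTS = {
    "greet":        {"hello", "hi", "hey"},
    "check_hours":  {"hours", "open", "closed"},
    "book_meeting": {"book", "meeting", "appointment"},
}

FALLBACK = "fallback"  # every task-oriented bot needs an out-of-scope answer

def classify(message):
    """Return the intent whose cue words best match the message."""
    tokens = set(message.lower().replace("?", " ").replace("!", " ").split())
    scores = {name: len(tokens & cues) for name, cues in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else FALLBACK

print(classify("Can I book an appointment?"))  # book_meeting
print(classify("What time is it on Mars?"))    # fallback
```

A Wizard-of-Oz prototype, as mentioned above, would gather real user messages first, and those messages would then seed the training examples for each intent.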

DESCRIPTION of the NLP workshop series (4 short workshops)

In all courses, we will focus on the critical aspects of exploring effective NLP solutions in an industrial setting, namely:

  • Using and demonstrating versatile, commercial-grade NLP frameworks (for example, one of NLTK, Stanford NLP, Allen NLP, etc.)

  • Providing two or three approaches to the problem at hand, differing in complexity, performance, and training and inference time, so as to offer alternatives to practitioners facing implementation constraints.

  • The relevant metrics used to guide the scientific effort.

  • Developing commercial applications using NLP.

  • How to diagnose problems, and how to solve them, while developing an NLP solution.

  • Useful datasets, and the semi-supervised approaches available when data is lacking.

LEARNING OBJECTIVES

  • Introduce students and professionals to the various methods and technologies for building conversational agents

  • Present the data and metrics useful for building and improving a task-oriented dialogue agent

  • Learn how to use a popular dialogue engine (Rasa)

TARGET AUDIENCE

  • Technical staff (computer scientists, data scientists, engineers, etc.)

  • People working in the field who want short training sessions focused on solving practical problems.

PREREQUISITES

  • Basic mathematics and Python programming

  • No prior knowledge of NLP is required

EXPECTED COMMITMENT

  • Attend both live (synchronous) online sessions to obtain the certificate.

  • 2 sessions of 3 hours each

  • Each session: 1 hour of theory and 2 hours of practice, plus 1 hour of assignments

INSTRUCTOR

Karine Déry

Karine studied psychology and computer science and has been a software designer at Nu Echo Inc. for 4 years, where she has worked on various text and voice virtual-agent projects. She works mainly on implementing dialogue with a variety of technologies, including Dialogflow.ai, Rasa.ai, Oracle Digital Assistant, and proprietary frameworks, as well as on developing tools for experimenting with user data to optimize the NLP models of conversational agents.

PROGRAM

Schedule

  • Friday, January 21, 9:00 AM to 12:00 PM: Building a conversational agent

  • Friday, January 28, 9:00 AM to 12:00 PM: Optimizing and improving a conversational agent

REGISTRATION FEES*

  • General admission: $500 + tax

  • Academics: $400 + tax**

  • Students/postdocs: $100 + tax**

*Prices are in Canadian dollars. Taxes (GST, QST) apply to Canadian participants only.

**Reduced prices apply only to Canadian citizens and/or people working in Canada, upon presentation of a university ID card sent to formations@ivado.ca. General admission applies to all other participants.

Registration closes Wednesday, January 19, 2022

Meetup January 2022

Lewis Tunstall, Machine Learning Engineer at HuggingFace -

‘’Accelerating Transformers with Hugging Face Infinity and Optimum’’


Mehdi Rezagholizadeh, Staff Machine Learning Research Scientist - NLP Team Lead at Huawei -

‘’Efficient Data Augmentation for Knowledge Distillation in NLP’’


Mathieu Fortier, Director of NLP R&D at Coveo -

‘’Productizing Modern NLP’’ *This presentation will not be recorded

Agenda

2:00 PM Welcome - Fabrizio Gotti, NLP programmer, RALI & IVADO

2:05 PM Talk: Lewis Tunstall, Machine Learning Engineer at HuggingFace -

‘’Accelerating Transformers with Hugging Face Infinity and Optimum’’

Abstract: Since their introduction in 2017, Transformers have become the de facto standard for tackling a wide range of NLP tasks in both academia and industry. However, in many situations accuracy is not enough — your state-of-the-art model is not very useful if it’s too slow or large to meet the business requirements of your application.

In this talk, I’ll give an overview of Hugging Face’s efforts to accelerate the predictions and reduce the memory footprint of Transformer models. I’ll discuss Infinity (https://huggingface.co/infinity), which is a containerised solution that delivers millisecond-scale latencies in production environments. I’ll also introduce a new open-source library called Optimum (https://github.com/huggingface/optimum), which enables developers to train and run Transformers on targeted hardware.

2:30 PM Question period

2:40 PM Talk: Mehdi Rezagholizadeh, Staff Machine Learning Research Scientist - NLP Team Lead at Huawei -

‘’Efficient Data Augmentation for Knowledge Distillation in NLP’’

Abstract: Knowledge distillation (KD) is a prominent neural compression technique widely used for compressing pre-trained language models in NLP. While KD aims at transferring the knowledge of a large pre-trained model to improve the generalizability of a smaller student network, the literature shows that KD in its original form has limited success with pre-trained students. Data augmentation is a potential remedy for this shortcoming. In this presentation, we highlight the importance of tailoring existing data augmentation (DA) techniques to KD. To this end, we deploy a MiniMax approach that spots regions of the input space where the teacher and student networks diverge the most, and generates augmented data from the training samples to cover those maximum-divergence regions. The new augmented samples enrich the training data and improve KD training. A second issue is that finding DA techniques able to produce high-quality, human-interpretable examples has always been challenging. Recently, kNN-based augmentation, where augmented examples are retrieved from large repositories of unlabelled sentences, has made a step toward interpretable augmentation. In contrast to existing kNN augmentation techniques that blindly incorporate all samples, we introduce a method that dynamically selects the subset of augmented samples maximizing the KL divergence between the teacher and student models. This step extracts the most efficient samples, ensuring that the augmented data covers the regions of the input space with maximum loss. We evaluate our methods with BERT-based models on the GLUE benchmark and show that they outperform competitive data augmentation baselines for KD.
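The divergence-based selection idea in the abstract can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: candidate augmented samples are scored by the KL divergence between hypothetical teacher and student output distributions (the KL direction and the toy probabilities are assumptions), and the top-k most divergent samples are kept for KD training.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def select_divergent(samples, teacher, student, k):
    """Keep the k augmented samples where teacher and student disagree most."""
    scored = [(kl_divergence(teacher(x), student(x)), x) for x in samples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:k]]

# Toy stand-ins for model predictions over 2 classes (hypothetical numbers):
teacher_probs = {"aug_1": [0.9, 0.1], "aug_2": [0.5, 0.5], "aug_3": [0.8, 0.2]}
student_probs = {"aug_1": [0.6, 0.4], "aug_2": [0.5, 0.5], "aug_3": [0.2, 0.8]}

chosen = select_divergent(
    samples=list(teacher_probs),
    teacher=lambda x: teacher_probs[x],
    student=lambda x: student_probs[x],
    k=2,
)
print(chosen)  # ['aug_3', 'aug_1']: the two samples with the largest divergence
```

In a real setting, `teacher` and `student` would be forward passes of BERT-sized models over a batch of kNN-retrieved sentences, but the selection logic is the same.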

3:05 PM Question period

3:15 PM Talk: Mathieu Fortier, Director of NLP R&D at Coveo -

‘’Productizing Modern NLP’’ *This presentation will not be recorded

Abstract: Technical overview of how the Coveo ML team leveraged modern NLP to develop and productize solutions to real world client needs.

3:40 PM Question period

3:50 PM Closing remarks - Fabrizio Gotti


Lewis Tunstall

Lewis is a Machine Learning Engineer in the open-source team at Hugging Face. He has several years' experience building machine learning-powered applications for startups and enterprises in the domains of natural language processing, topological data analysis, and time series. His current work focuses on developing tools for the NLP community and educating people on how to use them effectively.


Mehdi Rezagholizadeh

Mehdi Rezagholizadeh is a Staff Machine Learning Research Scientist at Huawei’s Montreal Research Centre, where he leads the NLP team. Mehdi obtained his PhD in Electrical and Computer Engineering from the Centre for Intelligent Machines at McGill University in December 2016. He joined Huawei in January 2017, and his research has focused on various deep learning and NLP projects, such as efficient pre-trained language models, model compression for NLP, generative adversarial networks, neural machine translation, and adversarial neural text generation. The results of his work at Huawei have been published at various NLP and machine learning conferences, including NeurIPS, ACL, NAACL, EMNLP, AAAI, EACL, Interspeech, and ICASSP.


Mathieu Fortier

Mathieu Fortier recently joined Coveo as Director of NLP R&D. He was previously at ElementAI/ServiceNow, Keatext and PatSnap. He has over 8 years of experience in applied research and operationalizing NLP in the industry, with a strong focus on making an impact in products and for end users.

2021

Meetup November 2021

Lingfei Wu, Principal Scientist at JD.COM Silicon Valley Research Center -

‘’Deep Learning On Graphs for Natural Language Processing’’

Bang Liu, Assistant Professor, RALI & Mila, Université de Montréal -

‘’Graph Representation and Learning for Hot Events Tracking and User Interests Modeling’’

Yu (Hugo) Chen, Research Scientist at Facebook AI -

‘’Graph4NLP Library and Demo’’

Presentation

Agenda

10:00 AM Welcome - Fabrizio Gotti, NLP programmer, RALI & IVADO

10:05 AM Talk: Lingfei Wu, Principal Scientist at JD.COM Silicon Valley Research Center -

‘’Deep Learning On Graphs for Natural Language Processing’’

Abstract: Due to its great power in modeling non-Euclidean data such as graphs and manifolds, deep learning on graphs (i.e., Graph Neural Networks (GNNs)) has opened a new door to solving challenging graph-related NLP problems. There has been a surge of interest in applying deep learning on graphs to NLP, with considerable success in many NLP tasks, ranging from classification tasks such as sentence classification, semantic role labeling, and relation extraction, to generation tasks such as machine translation, question generation, and summarization. Despite these successes, deep learning on graphs for NLP still faces many challenges, including automatically transforming original text sequence data into highly graph-structured data, and effectively modeling complex data that involves mappings between graph-based inputs and other highly structured outputs such as sequences, trees, and graphs with multiple node and edge types. In this talk, I will cover relevant and interesting topics on applying deep learning on graphs to NLP, ranging from the foundations to the applications.

10:50 AM Question period

11:00 AM Talk: Bang Liu, Assistant Professor, RALI & Mila, Université de Montréal -

‘’Graph Representation and Learning for Hot Events Tracking and User Interests Modeling’’

Abstract: In this talk, I will share with the audience my research experiences on a range of NLP tasks for hot events mining and user interests modeling. I will demonstrate that the graph is a natural way to capture the connections between different text objects, such as words, entities, sentences, and documents. By combining graph-structured representations of text objects at various granularities with Graph Neural Networks (GNNs), significant benefits can be brought to various NLP tasks, such as fine-grained document clustering, document matching, heterogeneous phrase mining and so on. Finally, I will share my experience in deploying our algorithms in industry applications for hot event discovery, query and document understanding, as well as news feed recommendation.
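To give a flavour of the text-as-graph view described in this abstract (a toy sketch, not the speaker's systems), the following code builds a word co-occurrence graph from sentences and then performs one GNN-style message-passing step, averaging each node's feature with those of its neighbours. The scalar "sentiment" features are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(sentences):
    """Undirected graph linking words that occur in the same sentence."""
    neighbours = defaultdict(set)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for a, b in combinations(words, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

def propagate(graph, features):
    """One message-passing step: average each node with its neighbours."""
    updated = {}
    for node, nbrs in graph.items():
        values = [features[node]] + [features[n] for n in nbrs]
        updated[node] = sum(values) / len(values)
    return updated

graph = cooccurrence_graph(["stocks fell sharply", "stocks rebounded"])
# Hypothetical scalar feature per word (e.g. a sentiment score):
features = {"stocks": 0.0, "fell": -1.0, "sharply": -1.0, "rebounded": 1.0}
print(propagate(graph, features)["stocks"])  # -0.25: pulled toward its neighbours
```

Real GNNs learn weighted, multi-dimensional versions of this aggregation, but the structural idea of letting connected text objects exchange information is the same.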

11:25 AM Question period

11:35 AM Talk: Yu (Hugo) Chen, Research Scientist at Facebook AI -

‘’Graph4NLP Library and Demo’’

Abstract: In this talk, we will introduce Graph4NLP, the first open-source library that lets researchers and practitioners easily use GNNs for various NLP tasks. Graph4NLP provides both full implementations of state-of-the-art GNN models and flexible interfaces to build customized models, with whole-pipeline support. In addition, we will have a hands-on demo session to help the audience gain practical experience in applying GNNs to challenging NLP problems using the Graph4NLP library.

11:50 AM Question period

12:00 PM Closing remarks - Fabrizio Gotti

Lingfei Wu

Dr. Lingfei Wu is currently a Principal Scientist at JD.COM Silicon Valley Research Center, leading a team of 30+ machine learning, natural language processing, and recommender system scientists and software engineers to build the next generation of intelligent e-commerce systems for a personalized, interactive online shopping experience at JD.COM. Previously, he was a research staff member at IBM Research, where he led a research team (10+ RSMs) developing novel Graph Neural Networks for various AI tasks, which led to the #1 AI Challenge Project in IBM Research and multiple IBM awards, including an Outstanding Technical Achievement Award. He has published more than 90 top-ranked conference and journal papers and is a co-inventor of more than 40 filed US patents. Because of the high commercial value of his patents, he has received several invention achievement awards and was appointed an IBM Master Inventor, class of 2020. He received the Best Paper Award or Best Student Paper Award at several venues, such as IEEE ICC'19, the AAAI workshop on DLGMA'20, and the KDD workshop on DLG'19. His research has been featured in numerous media outlets, including Nature News, Yahoo News, VentureBeat, and TechTalks. He has co-organized 10+ conferences (KDD, AAAI, IEEE BigData) and is the founding co-chair of the Workshops on Deep Learning on Graphs (at KDD'21, AAAI'21, AAAI'20, KDD'20, KDD'19, and IEEE BigData'19). He currently serves as an Associate Editor for IEEE Transactions on Neural Networks and Learning Systems, ACM Transactions on Knowledge Discovery from Data, and the International Journal of Intelligent Systems, and regularly serves as an AC/SPC for major AI/ML/NLP conferences, including KDD, WSDM, IJCAI, and AAAI.

Bang Liu

Bang Liu is an Assistant Professor in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal and Mila – Quebec Artificial Intelligence Institute. His research interests primarily lie in natural language processing (NLP), data mining, and deep learning. His work has delivered visible value to both academia and industry: his innovations have been deployed in real-world applications (e.g., QQ Browser, Mobile QQ, and WeChat), serving over a billion daily active users. He has published 25+ papers at top conferences and in journals such as SIGMOD, ACL, NAACL, KDD, WWW, NeurIPS, ICDM, CIKM, TKDD, TWEB, INFOCOM, and TON.

Yu (Hugo) Chen

Yu (Hugo) Chen is a Research Scientist at Facebook AI. He got his PhD degree in Computer Science from Rensselaer Polytechnic Institute. His research interests lie at the intersection of Machine Learning (Deep Learning) and Natural Language Processing, with a particular emphasis on the fast-growing field of Graph Neural Networks and their applications in various domains. His work has been published at top-ranked conferences including but not limited to NeurIPS, ICML, ICLR, AAAI, IJCAI, NAACL, KDD, WSDM, ISWC, and AMIA. He was the recipient of the Best Student Paper Award of AAAI DLGMA’20. He delivered a series of DLG4NLP tutorials at NAACL'21, SIGIR'21, KDD'21 and IJCAI'21. He is a co-inventor of 4 filed US patents.

Details

Registration

SHORT TRAINING WORKSHOPS

Duration: 6 hours online + 2 hours of assignments

Level: Beginner to intermediate

Language: English

Fridays, October 22 and 29, 2021

9:00 AM to 12:00 PM (EDT, UTC-4)

TRAINING CERTIFIED BY SCALE AI

50% off General Admission for all professionals working in Canada

Promo code: IVA_Scale_NLP1021_2

THIS TRAINING WILL BE GIVEN IN ENGLISH

DESCRIPTION of the NLP workshop series (4 short workshops)

In all courses, we will focus on the critical aspects of exploring effective NLP solutions in an industrial setting, namely:

  • Using and demonstrating versatile, commercial-grade NLP frameworks (for example, one of NLTK, Stanford NLP, Allen NLP, etc.)

  • Providing two or three approaches to the problem at hand, differing in complexity, performance, and training and inference time, so as to offer alternatives to practitioners facing implementation constraints.

  • The relevant metrics used to guide the scientific effort.

  • Developing commercial applications using NLP.

  • How to diagnose problems, and how to solve them, while developing an NLP solution.

  • Useful datasets, and the semi-supervised approaches available when data is lacking.

DESCRIPTION de l'atelier actuel, le 2ième de la série de 4

" Question-réponse: du document à l'extrait"

Ce cours offre une vue d'ensemble sur les Questions-réponses: du document à l'extrait, en mettant l'accent sur la présentation de certaines des recherches de pointe dans le domaine de TALN. Nous commencerons par présenter la tâche, les défis qui y sont associés ainsi que les ensembles de données utiles. Ensuite nous expliquerons l'évolution des systèmes d'AQ, des pipelines hautement sophistiqués jusqu' à la formation moderne de réseaux neuronaux profonds.


CONTENT

  • A review of the history of open-domain (textual) QA: early QA systems, the TREC QA competitions, IBM's DeepQA project.

  • Two-stage retrieve-and-read approaches, including multi-passage training, passage re-ranking, and denoising of distantly supervised data.

  • Dense retrievers and end-to-end training of deep neural networks, including dense passage retrieval, joint training of the retriever and reader, and dense and sparse phrase indexing.

  • Retriever-free approaches

  • Open-domain QA using KBs and text

INSTRUCTOR

Siva Reddy, Ph.D.

Siva Reddy is an Assistant Professor in the School of Computer Science and the Department of Linguistics at McGill University. He holds a Facebook CIFAR AI Chair and is a core faculty member of Mila – Quebec AI Institute. Before McGill, he was a postdoctoral researcher in the Computer Science Department at Stanford University. He received his PhD from the University of Edinburgh in 2017, where he was a Google PhD Fellow. His research focuses on natural language processing and computational linguistics. He received the 2020 VentureBeat AI Innovation Award in natural language processing.

PROGRAM

Schedule

  • Friday, October 22, 9:00 AM to 12:00 PM

  • Friday, October 29, 9:00 AM to 12:00 PM

REGISTRATION FEES*

  • General admission: $500 + tax

  • Academics: $400 + tax**

  • Students/postdocs: $100 + tax**

*Prices are in Canadian dollars; local taxes (GST, QST) apply only to Canadian participants.

**Reduced prices apply only to Canadian citizens and/or people working in Canada, upon presentation of a university ID card sent to formations@ivado.ca. General admission applies to all other participants.


Registration closes on October 19, 2021 at 9:00 AM.

October 2021 Meetup

Yutao Zhu, PhD student, Université de Montréal -

‘’Open-Domain Knowledge-Grounded Human-Machine Conversation’’


Guillaume Le Berre, PhD Student, Université de Lorraine & Université de Montréal -

‘’Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning’’

Florian Carichon, PhD Student, HEC Montréal -

‘’Abstractive Unsupervised Summarization of Temporal Events’’

Agenda

9:00 AM Welcome - Fabrizio Gotti, NLP programmer, RALI & IVADO

9:05 AM Talk: Yutao Zhu, PhD student, Université de Montréal -

‘’Open-Domain Knowledge-Grounded Human-Machine Conversation’’

Abstract: Developing an intelligent open-domain conversation system is a challenging task in natural language processing. Although the existing methods are capable of providing fluent responses based on conversation history, they are unable to dive deeply into a specific topic. One primary reason is the lack of proper knowledge in the generation of responses. Knowledge-grounded conversation aims at generating responses based on different forms of knowledge. We have explored several problems involved in knowledge-grounded conversation: knowledge selection, knowledge-based proactive conversation, and robust model training. In this presentation, I will describe the results already obtained, as well as the future work.

9:30 AM Question period

9:40 AM Talk : Guillaume Le Berre, PhD Student, Université de Lorraine & Université de Montréal -

‘’Unsupervised multiple-choice question generation for out-of-domain Q&A fine-tuning’’

Abstract: Pre-trained models have shown very good performance on a number of question answering benchmarks, especially when fine-tuned on multiple question answering datasets at once. In this work, we generate a fine-tuning dataset with a rule-based algorithm that produces questions and answers from unannotated sentences. We show that the state-of-the-art model UnifiedQA can greatly benefit from such a system on a multiple-choice benchmark about physics, biology and chemistry that it has never been trained on. We further show that improved performance can be obtained by selecting the most challenging distractors (wrong answers) with a dedicated ranker based on a pretrained RoBERTa model.
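The generation step the abstract describes can be pictured with a toy rule-based generator; the single "X is Y" pattern and all function names below are illustrative assumptions, not the algorithm from the talk:

```python
import re

def make_question(sentence):
    """Turn a declarative sentence of the form 'X is Y.' into a
    (question, answer) pair -- the kind of rule a rule-based QG system applies."""
    m = re.match(r"^(.+?)\s+is\s+(.+?)\.?$", sentence.strip())
    if m is None:
        return None
    return f"What is {m.group(1)}?", m.group(2)

def build_mcq(sentence, distractors):
    """Assemble a multiple-choice item from a sentence and wrong answers
    (distractors); a real system would rank distractors, e.g. with RoBERTa."""
    qa = make_question(sentence)
    if qa is None:
        return None
    question, answer = qa
    return {"question": question, "answer": answer,
            "choices": sorted([answer] + distractors)}

print(build_mcq("Water is a liquid.", ["a gas", "a solid"]))
```

A production system would apply many such rules and then score candidate distractors, keeping only the hardest ones.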

10:05 AM Question period

10:15 AM Talk: Florian Carichon, PhD Student, HEC Montréal -

‘’Abstractive Unsupervised Summarization of Temporal Events’’

Abstract: In recent years, the quantity of available streaming data has continued to increase. This data includes social networks such as Twitter, news and blog feeds, and specific data such as customer interactions in call centers. The task of update summarization consists in creating summaries that are updated according to the continuous arrival of new information. The main objective of these systems is to push new, unseen, or non-redundant information to the user. Moreover, for many industries, labelled data that can be directly applied to their specific field is very scarce. Thus, in this research we propose a new method based on deep learning techniques to provide an unsupervised abstractive document summarization system that addresses the challenge of update summarization.
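The "push only new information" objective can be sketched as a simple novelty filter; the word-overlap measure and threshold below are illustrative assumptions, far simpler than the deep learning method the talk proposes:

```python
def novelty(sentence, summary_so_far):
    """Fraction of a candidate sentence's word types not yet present in the
    running summary: 1.0 = entirely new, 0.0 = fully redundant."""
    words = set(sentence.lower().split())
    seen = set(" ".join(summary_so_far).lower().split())
    if not words:
        return 0.0
    return len(words - seen) / len(words)

def update_summary(summary, candidates, threshold=0.5):
    """Append only the candidates that bring enough unseen content."""
    for sent in candidates:
        if novelty(sent, summary) >= threshold:
            summary.append(sent)
    return summary
```

Usage: starting from the summary `["the server crashed at noon"]`, a near-duplicate candidate is skipped while a sentence with mostly unseen words is appended.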

10:40 AM Question period

10:50 AM Closing remarks - Fabrizio Gotti


Yutao Zhu

Yutao Zhu is a 3rd-year Ph.D. candidate at DIRO, University of Montreal, under the supervision of Prof. Jian-Yun Nie. He received his B.S. and M.S. degrees in computer science and technology from Renmin University of China. His current research interests are dialogue systems and information retrieval. He has published several papers at top conferences and in journals, and has served as a PC member for conferences such as AAAI and WSDM.


Guillaume Le Berre

Guillaume Le Berre is a 4th year PhD student co-supervised by Philippe Langlais (University of Montréal) and Christophe Cerisara (University of Lorraine - France). He graduated from Mines de Nancy engineering school in 2018 with an engineering degree and a major in computer science. He is working on new ways to train deep learning models on tasks where the quantity of data is limited and to mitigate the effect of biases that can exist in these datasets. He applies his research to Question Answering and Optical Character Recognition.


Florian Carichon

Florian Carichon graduated as an industrial engineer in 2013. He then worked until 2018 as the manager of a development team specialized in intellectual property analysis. In 2018, he joined the master's program in Business Intelligence at HEC Montreal, and in 2019 he started his PhD in natural language processing in partnership with IVADO and La Fédération des Caisses Desjardins du Québec. His research specialty is unsupervised methods for automatic document summarization.

Friday, October 8, 2021, 9:00 AM - 12:00 PM

Friday, October 15, 2021, 9:00 AM - 12:00 PM

Details

Registration

SHORT TRAINING WORKSHOPS

Duration: 6 hours online + 2 hours of assignments

Level: Beginner to Intermediate

Language: English

Friday, October 8 and 15, 2021

9:00 AM to 12:00 PM (EDT, UTC-4)


TRAINING CERTIFIED BY SCALE AI

50% off General Admission for all professionals working in Canada.

Promo code: IVA_Scale_NLP1021

THIS TRAINING WILL BE GIVEN IN ENGLISH

DESCRIPTION of the NLP workshop series (4 short workshops)

In all courses, we will focus on the critical aspects of developing effective NLP solutions in an industrial setting, namely:

  • Using and demonstrating versatile, commercial-grade NLP frameworks (for example, NLTK, Stanford NLP, AllenNLP, etc.)

  • Providing two or three approaches to the problem at hand, differing in complexity, performance, and training and inference time, to offer alternatives to practitioners who face implementation constraints.

  • Relevant metrics used to guide the scientific effort.

  • Developing commercial applications using NLP.

  • How to diagnose and fix problems while developing an NLP solution.

  • Useful datasets and possible semi-supervised approaches when data is lacking.

DESCRIPTION of the current workshop, the 1st in the series of 4

"Data mining in product reviews"

  • Application of text mining techniques to the analysis of medium-to-large collections of textual product reviews.

  • The goal is to show how an organization can better understand its customers and its marketing landscape, and improve its products and brands.

  • The workshop will combine problem-solving sessions with text mining theory: topic modeling, sentiment analysis, and applications such as brand and product maps.

LEARNING OBJECTIVES

  • Sentiment analysis of product reviews (textual data)

  • Identifying product or brand dimensions using topic modeling

  • Tracking the evolution of product and brand dimensions and preferences over time

  • Comparing competitors' products

  • Visualization tools and techniques


INSTRUCTOR

Hyunhwan "Aiden" Lee, M.S., Ph.D.

Hyunhwan "Aiden" Lee is an Assistant Professor of Marketing at HEC Montréal and a researcher at Tech3Lab. His research focuses on modern brand management and content marketing using machine learning, natural language processing, video processing, geospatial analysis, big data analytics, and stochastic models. Specifically, Professor Lee studies how to measure brand associations across geographic locations, measure customer satisfaction through text mining of verbal customer feedback, and mine the multimedia content of advertisements using deep learning to determine what makes them effective.


PROGRAM

Schedule

  • Friday, October 8, 9:00 AM to 12:00 PM: Basic NLP concepts

  • Friday, October 15, 9:00 AM to 12:00 PM: NLP applications

REGISTRATION FEES*

  • General admission: $500 + tax

  • Academics: $400 + tax**

  • Students/postdocs: $100 + tax**

*Prices are in Canadian dollars; local taxes (GST, QST) apply only to Canadian participants.

**Reduced prices apply only to Canadian citizens and/or people working in Canada, upon presentation of a university ID card sent to formations@ivado.ca. General admission applies to all other participants.


Registration closes on October 4, 2021.

September 2021 Meetup

Anne-Marie Di Sciullo, Professor, Département de linguistique et de didactique des langues, Université du Québec à Montréal -

‘’Internal language and External language, and why both matter for NLP’’


Francis Charette-Migneault, Software Developer, Vision and Imagery & Lise Rebout, NLP Advisor, Computer Research Institute of Montréal (CRIM) -

‘’FrVD: a new dataset of video clip references, video descriptions and actions in French for deep learning’’


Tanmay Gupta, Research Scientist, Allen Institute for Artificial Intelligence -

‘’Towards General Purpose Vision Systems’’

Agenda

10:00 AM Welcome - Hessam Amini

10:05 AM Talk: Anne-Marie Di Sciullo, Professor, Département de linguistique et de didactique des langues, Université du Québec à Montréal -

‘’Internal language and External language, and why both matter for NLP’’

Abstract: It is common practice in NLP to analyze linguistic expressions as sequences, be they sequences of sounds/written characters/tokens, etc. However, results from language and neuroscience studies indicate that the externalized language, accessible to the sensorimotor system, is the result of dedicated computation in the brain for language that goes beyond sequences. I argue that both internal language and external language matter for NLP. I discuss major approaches to NLP in computer science (deep (machine) learning) and in the language sciences (generative grammar). I suggest that the integration of core concepts from these fields may lead to a further understanding of language computation in the brain and its processing by computers. I focus on the operations generating the articulate structure of linguistic expressions and the mapping of the latter to the semantic and sensorimotor representations. I show how the effects of these operations can be recovered by syntactic parsing, including the generation of unpronounced but interpreted constituents along with semantic composition and the scope of operators. I draw consequences for the optimization of so-called NLP tasks and machine intelligence.

10:30 AM Question period

10:40 AM Talk: Francis Charette-Migneault, Software Developer, Vision and Imagery & Lise Rebout, NLP Advisor, Computer Research Institute of Montréal (CRIM) -

‘’FrVD: a new dataset of video clip references, video descriptions and actions in French for deep learning’’

Abstract: We will present our work on the creation of the FrVD dataset, which consists of annotated French video descriptions (VD) and was built to aid research in automated VD production. VD provides an audio description of visual content to make it more accessible to visually impaired audiences. It is usually produced manually. Since 2015, deep neural network models have offered very encouraging results to assist in the automated production of video description. However, there are very few annotated resources in French to train such models. The objective of this project is to take advantage of the French VDs produced by CRIM since 2008, for which we have already identified the scenes and characters, and to enrich them with automatically detected actions in order to compile a new dataset: FrVD. We will present the methods that allowed us, from manual (textual) and automated (textual and visual) annotations, to produce this corpus, which can be used to train automated VD models in future work. This project was made possible thanks to the support of the Broadcasting Accessibility Fund (BAF).

11:05 AM Question period

11:15 AM Talk: Tanmay Gupta, Research Scientist, Allen Institute for Artificial Intelligence -

‘’Towards General Purpose Vision Systems’’

Abstract: Today, all vision models are N-purpose models. They are designed and trained specifically for a pre-selected set of tasks - N=1 for single-purpose models and N > 1 but still pre-selected and small for multi-purpose models. In contrast, general purpose models are designed and trained with the expectation to encounter and adapt to new tasks within a broad domain that are unknown at design time. In this work, we first establish 3 tenets of general purpose systems to serve as guiding principles for building and evaluating GPVs. These tenets demand generality of architecture, generality of concepts across skills, and generality of learning. Next, as a step towards GPVs, we build a task-agnostic vision-language architecture named GPV-1, that can be trained to take image and task descriptions as inputs and produce boxes with relevance scores and text as outputs. We demonstrate that GPV-1 can indeed be jointly trained on a diverse set of tasks such as localization, classification, visual question answering, and captioning. To evaluate GPV-1's generalization to novel skill-concept combinations, we create the COCO-SCE benchmark. Finally, we test GPV-1's sample efficiency and extent of forgetting while learning a new task.

11:40 AM Question period

11:50 AM Closing remarks - Hessam Amini

Anne-Marie Di Sciullo

Dr. Anna Maria Di Sciullo is a Fellow of the Royal Society, associate professor at the University of Quebec in Montreal, and visiting professor at SUNY. Her research areas are Theoretical Linguistics, Computational Linguistics, and Biolinguistics. She has received several awards, including major funding from SSRC and FRQ for two Major Collaborative Research Initiatives on asymmetries in natural language and their processing by the external systems. Her publications include two MIT Press monographs on morphological asymmetries. Her work in computational linguistics led to the formulation of the Asymmetry Recovering Parser. She developed a search engine sensitive to asymmetric relations, as well as a semantic mining system based on syntactic asymmetries and semantic composition. The former has been used by the Government of Quebec and the latter by American banks and hedge funds. More recently, she co-founded Generative Mind, which derives a plurality of actionable content indicators from social media chatter.

Lise Rebout

Lise Rebout holds a Master's degree in NLP from the University of Montreal. After several years in the industry, she joined the CRIM team in 2017. She assists CRIM's clients in the adoption of SOTA technologies in NLP and manages applied research projects in AI.

Francis Charette-Migneault

Francis Charette-Migneault is a research software developer in imagery, vision, machine learning and DevOps. With a master's degree in automated production engineering, he has gained experience on both fronts of applied research: the development and servicing of leading-edge technologies through operational pipelines and distributed server processing.

Tanmay Gupta

Tanmay Gupta is a research scientist in the PRIOR team at the Allen Institute for Artificial Intelligence (AI2). Tanmay received his PhD for his thesis on "Representations from Vision and Language" from UIUC where he was advised by Prof. Derek Hoiem and closely collaborated with Prof. Alex Schwing. Some of Tanmay's recent work involves general-purpose vision, evaluation of deep networks using learning curves, and situation understanding in videos. For more details on Tanmay's background and research, please visit http://tanmaygupta.info/

June 2021 Meetup

Eloi Brassard-Gourdeau, Data Scientist, Two Hat Security -

"Using Sentiment Information for Preemptive Detection of Harmful Comments in Online Conversations"


Hannah Devinney, PhD student, Umeå University -

"Gender Bias in NLP: A short introduction"


Carolyne Pelletier, Co-Founder & CTO, skritswap -

"Gender Bias in Natural Language Processing"

Agenda

10:00 AM Welcome - Hessam Amini

10:05 AM Talk: Eloi Brassard-Gourdeau, Data Scientist, Two Hat Security -

‘’Using Sentiment Information for Preemptive Detection of Harmful Comments in Online Conversations’’

Abstract: The automatic moderation of harmful comments online has been the subject of much recent research, but the focus has mostly been on detecting harm in individual messages after they have been posted. Some authors have tried to predict whether a conversation will derail into harmfulness using features of its first few messages. In our work, we combined that approach with our previous work on harmful message detection using sentiment information, and we show how the sentiments expressed in the first messages of a conversation can help predict upcoming harmful messages. Our results show that adding sentiment features does improve the accuracy of harmful message prediction, and allows us to make important observations on the general task of preemptive harmfulness detection.
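As a rough illustration of sentiment features computed over the opening messages of a conversation (the tiny lexicons below are invented for the example; the actual work uses much richer sentiment information and a trained classifier):

```python
# Toy sentiment lexicons -- purely illustrative, not from the talk.
POSITIVE = {"thanks", "great", "love", "agree", "good"}
NEGATIVE = {"hate", "stupid", "idiot", "awful", "wrong"}

def sentiment_features(messages):
    """Aggregate sentiment counts over the first messages of a conversation.
    Features like these would be fed, alongside other inputs, to a classifier
    that predicts whether the conversation will derail into harmfulness."""
    tokens = [t.strip(".,!?").lower() for m in messages for t in m.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = max(len(tokens), 1)
    return {"pos_ratio": pos / total,
            "neg_ratio": neg / total,
            "polarity": (pos - neg) / total}
```

A strongly negative opening (high `neg_ratio`, negative `polarity`) is the kind of early signal the abstract argues helps predict upcoming harmful messages.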

10:30 AM Question period

10:40 AM Talk: Hannah Devinney, PhD student, Umeå University -

‘’Gender Bias in NLP: A short introduction’’

Abstract: This talk presents an introduction to the subject of gender bias in Natural Language Processing. We will explore some common definitions and measurements of biases, how they manifest in NLP, and ways we may be able to mitigate the harms they cause.

11:05 AM Question period

11:15 AM Talk : Carolyne Pelletier, Co-Founder & CTO, skritswap -

‘’Gender Bias in Natural Language Processing’’

Abstract: Language understanding systems pre-trained on a very large corpus of unlabelled data then fine-tuned for a specific task are often the state-of-the-art models for language understanding and representation. They have so much representational capacity, that any bias or unfairness represented in the pre-training phase may be propagated to downstream tasks. This presentation will first define what gender bias is in the context of written text vs. NLP. It will then cover ways to understand, measure, and mitigate gender bias in language understanding models like BERT.

11:40 AM Question period

11:50 AM Closing remarks - Hessam Amini

Eloi Brassard-Gourdeau

Éloi Brassard-Gourdeau has been a data scientist at Two Hat Security since 2019, where he uses AI to detect harmful content in some of the biggest online communities. He received his Bachelor's degree in Mathematics and Computer Science in 2017 and his Master's degree in Computer Science, with a focus on natural language processing for harm detection, in 2019, both at Université Laval. He has 4 years of experience in automatic harm detection from both academic and business standpoints, and has published 2 papers on the impact of sentiment information in harmful content detection.

Hannah Devinney

Hannah Devinney is a PhD candidate at Umeå University, in the Department of Computing Science and Umeå Centre for Gender Studies. They hold a bachelors degree in Computer Science, Linguistics, and French from Trinity College Dublin, and an MSc in Speech and Language Processing from the University of Edinburgh. Their research focuses on understanding the power structures and biases at play in the types of language data used to train Natural Language Processing tools, and developing methods to identify and mitigate these patterns in order to reduce algorithmic harms.

Carolyne Pelletier

Carolyne is the Co-Founder & CTO of skritswap - an AI startup that simplifies the jargon in contracts. She has a Master's in machine learning from Mila – the Quebec AI Institute. At Mila, she co-founded BiaslyAI, a group of researchers in Professor Yoshua Bengio's AI for Humanity group committed to detecting and correcting biases in AI, such as gender and racial bias. She received a Microsoft Research & Mila research grant to investigate ways in which gender bias can be mitigated when deploying natural language processing (NLP) systems. Prior to this, she did software development with companies such as McKinsey and Klipfolio, and was on the leadership team of a start-up in the intellectual property industry that was acquired in 2015.

May 2021 Meetup

Pan Liu, PhD student, HEC Montréal -

"Applying NLP for Fraud Detection in Derivatives Market"


Dominique Boucher, Chief Solutions Architect, AI Factory, Banque Nationale du Canada -

"Building an AI Assistant Factory"


Elham Kheradmand, Postdoctoral fellow, Université de Montréal -

"Development of machine learning methods to improve ESG scores and responsible investment decisions"

Agenda

10:00 AM Welcome - Hessam Amini

10:05 AM Talk: Pan Liu, PhD student, HEC Montréal -

‘’Applying NLP for Fraud Detection in Derivatives Market’’

Abstract: With increasing activity in the financial derivatives market, exchange regulators are seeking to build smarter market surveillance systems that detect potential fraud. Current systems are often based on rules capturing known suspicious patterns in structured market and trading data, but they cannot process the information conveyed by unstructured textual data such as business news, which can nevertheless have a crucial real-time impact on the market. It is therefore of great interest to leverage NLP to extract analytics from textual data, so that the surveillance system can assess trading behaviors more accurately thanks to the added awareness of the business context.

In this talk, I will introduce work in progress on an NLP pipeline that extracts important events from financial news and analyzes their characteristics, and a framework that leverages those analytics to help detect financial fraud, especially illegal insider trading, in the derivatives market.
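A minimal sketch of the detection idea, assuming events have already been extracted from news: flag trades placed shortly *before* a market-moving event on the same ticker, a classic insider-trading pattern. The data layout, field names, and window are illustrative assumptions, not the framework from the talk:

```python
from datetime import datetime, timedelta

def flag_suspicious_trades(trades, events, window_days=3):
    """Return (trade, event) pairs where a trade precedes a news event on the
    same ticker by at most `window_days` -- a crude insider-trading screen."""
    flagged = []
    for t in trades:
        for e in events:
            if t["ticker"] == e["ticker"]:
                gap = e["date"] - t["date"]  # positive when trade came first
                if timedelta(0) <= gap <= timedelta(days=window_days):
                    flagged.append((t, e))
    return flagged
```

A real surveillance system would score, rather than hard-flag, such co-occurrences, and would weigh the event's characteristics extracted by the NLP pipeline.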

10:30 AM Question period

10:40 AM Talk: Dominique Boucher, Chief Solutions Architect, AI Factory, Banque Nationale du Canada -

‘’Building an AI Assistant Factory’’

Abstract: In this talk, we will present our approach to building a platform that supports the fast delivery of chatbots and AI assistants without compromising on performance and customer experience. We will discuss the main components of the solution and the inherent challenges, from both the architecture and AI science sides.

11:05 AM Question period

11:15 AM Talk: Elham Kheradmand, Postdoctoral fellow, Université de Montréal -

‘’Development of machine learning methods to improve ESG scores and responsible investment decisions’’

Abstract: The Principles for Responsible Investment (PRI) provide a framework for improving the analysis of environmental, social and governance (ESG) issues in the investment process and help companies exercise responsible practices in managing their investments. However, there is not yet a methodological standard for measuring ESG performance. The information necessary to understand the ESG impact of a company is in an unstructured format, and artificial intelligence can be used in these efforts. We propose to develop a rating framework to automatically extract relevant ESG information from companies' disclosures using natural language processing (NLP) methods. In this work, we employ NLP approaches such as cosine similarity, sentiment analysis, and named entity recognition. Our framework consists of five components: downloader, reader, cleaner, extractor, and analyzer. We first automatically download the publicly available disclosures, then develop the framework further to read them. Following this, we clean and pre-process the text data. The next stage is extracting useful information to be deployed in the analyzer component. Finally, we will design and develop novel algorithms to classify all qualitative, unstructured text data and transform it into quantitative measurements and structured data.
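One of the named NLP approaches, cosine similarity, can be sketched over bags of words to rank disclosure paragraphs by ESG relevance; the keyword query and function names below are illustrative assumptions, not the project's actual implementation:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts as bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative ESG keyword query, invented for this sketch.
ESG_QUERY = "greenhouse gas emissions reduction renewable energy"

def esg_relevance(paragraphs):
    """Rank disclosure paragraphs by similarity to the ESG keyword query."""
    return sorted(paragraphs, key=lambda p: cosine(p, ESG_QUERY), reverse=True)
```

In practice one would use TF-IDF or learned embeddings rather than raw counts, but the ranking principle is the same.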

11:40 AM Question period

11:50 AM Closing remarks - Hessam Amini

Pan Liu

Pan is currently a PhD candidate in Data Science at HEC Montreal. He works on projects that develop NLP and ML models to solve problems in finance. Before starting his PhD in 2018, he received a double degree (M.Sc. and Titre d'Ingénieur) from Beihang University.

Dominique Boucher

Dominique Boucher is Chief Solutions Architect in the AI Factory at National Bank of Canada where he is responsible for the development of NBC's dialogue system platform from the technical side. His main interests revolve around the application of AI to complex business problems, and in the use of conversational interfaces in particular. Prior to that, he was the CTO of Nu Echo where he led the Omnichannel Innovations Lab. He has been in the speech recognition and conversational AI industry for more than 20 years. He holds a PhD from the University of Montreal.

Elham Kheradmand

Elham is a postdoctoral researcher at the University of Montreal working on the development of machine learning methods in sustainable finance. She is also a research and development scientist at Axionable. At AI Launch Lab, she is an AI project manager and supports students in successfully completing AI-based projects. Her specialties are algorithm development, machine learning, and optimization. She received her PhD in Mathematics from Polytechnique Montreal last year. Before her PhD, she led a banking software development team at a bank in Iran. She came to Montreal to start a new journey by continuing her education. Her main interest is applying artificial intelligence in finance.

April 2021 Meetup

Suyuchen Wang, PhD student, Université de Montréal -

"Fully Exploit Hierarchical Structure for Self-Supervised Taxonomy Expansion"


Tomas Martin, PhD student, UQÀM -

"Leveraging a domain ontology in (neural) learning from heterogeneous data"


Torsten Scholak, Applied Research Scientist, Element AI / ServiceNow -

"Text-to-SQL translation with DuoRAT"

Agenda

10:00 AM Welcome - Hessam Amini

10:05 AM Talk: Suyuchen Wang, PhD student, Université de Montréal -

‘’Fully Exploit Hierarchical Structure for Self-Supervised Taxonomy Expansion’’

Abstract: A taxonomy is a hierarchically structured knowledge graph that plays a crucial role in machine intelligence. The taxonomy expansion task aims to find a position for a new term in an existing taxonomy, in order to capture emerging knowledge and keep the taxonomy dynamically updated. Previous taxonomy expansion solutions neglect valuable information brought by the hierarchical structure and evaluate the correctness of merely an added edge, reducing the problem to node-pair scoring or mini-path classification. In our recent paper, we proposed the Hierarchy Expansion Framework (HEF), which fully exploits multiple properties of the hierarchical structure to maximize the coherence of the expanded taxonomy, and achieved state-of-the-art results on three benchmark datasets. The work will be presented at The Web Conference (WWW) 2021. In this talk, I will introduce previous work on taxonomy expansion; the motivation, design, experimental results, and potential future work of our model; and the most recent progress on a more advanced taxonomy expansion task.
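To make the task concrete, here is a toy parent-scorer that attaches a new term to a taxonomy by definition overlap; it is a deliberately simple stand-in for HEF's learned scorers, and the sample data is invented:

```python
def best_parent(term_def, taxonomy):
    """Score each existing node as a parent for a new term by Jaccard overlap
    between the new term's definition and the node's definition -- a toy
    stand-in for the learned scoring used in taxonomy expansion research."""
    term_words = set(term_def.lower().split())

    def score(node):
        node_words = set(taxonomy[node].lower().split())
        return len(term_words & node_words) / len(term_words | node_words)

    return max(taxonomy, key=score)

# Invented mini-taxonomy mapping node -> definition.
taxonomy = {
    "fruit": "edible sweet plant product",
    "vehicle": "machine that transports people or goods",
}
print(best_parent("edible tropical sweet plant product with yellow skin", taxonomy))
```

HEF's point is precisely that scoring a single candidate edge like this ignores the rest of the hierarchy; the full framework evaluates the coherence of the whole expanded structure.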

10:30 AM Question period

10:40 AM Talk : Tomas Martin, PhD student, UQÀM -

‘’Leveraging a domain ontology in (neural) learning from heterogeneous data’’

Abstract: Injecting domain knowledge into a neural learner, in order to alleviate the problem of insufficient data or to improve the explainability of the neural model, is an actively researched subject. The main challenge is rooted in the "impedance mismatch" between the symbolic nature of the dominant knowledge representation paradigms (knowledge graphs, ontologies), on one hand, and the subsymbolic level of expression of neural learning architectures, on the other. Current approaches focus primarily on knowledge graph embeddings, while a minor trend looks at sensible ways to embed ontology entities (classes, properties) so as to influence the embeddings of their respective instances.

We propose an alternative approach whose keystone is the combination of descriptive knowledge from a domain ontology with trends from the dataset at hand. The approach amounts to finding common structural patterns in data graphs, described using the ontological vocabulary, and using them as additional descriptors in data embeddings. The main technical challenge is the efficient mining of ontological graph patterns, as current graph mining methods are not up to the task. The other intriguing issue is how these patterns should be fed into a neural net-based learner. We discuss two alternative approaches to that problem and present our current progress.

11:05 AM Question period

11:15 AM Talk: Torsten Scholak, Applied Research Scientist, Element AI / ServiceNow -

‘’Text-to-SQL translation with DuoRAT’’

Abstract: Language user interfaces to SQL databases allow non-specialists to retrieve and process information that might otherwise not be easily available to them. Recent research has resulted in neural models that can generalize to new SQL databases without any human intervention. Given a database schema and content, these models can translate a user's question directly into an SQL query that can then be executed on the database to produce the answer. In this talk, I will demonstrate the capabilities of DuoRAT, a state-of-the-art text-to-SQL model based on an encoder-decoder transformer architecture.
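To make the setup concrete, here is a toy sketch of how a question and a database schema are typically serialized into a single input sequence for such an encoder-decoder model (the marker tokens and function below are illustrative assumptions, not DuoRAT's actual input format):

```python
# Text-to-SQL models jointly encode the user's question and the database
# schema. A common encoding concatenates the question with each table name
# and its columns, separated by special marker tokens.

def serialize_input(question, schema):
    """Flatten a question plus a {table: [columns]} schema into one sequence."""
    parts = [question]
    for table, columns in schema.items():
        parts.append("[TABLE] " + table)
        parts.extend("[COL] " + col for col in columns)
    return " | ".join(parts)

schema = {"singer": ["name", "age", "country"]}
encoded = serialize_input("How many singers are from France?", schema)
print(encoded)
```

The decoder then generates the SQL query token by token, conditioned on this joint encoding of question and schema.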

11:40 AM Question period

11:50 AM Closing remarks - Hessam Amini


Suyuchen Wang

Suyuchen Wang is a first-year Ph.D. student in natural language processing at the RALI lab, Université de Montréal, advised by Bang Liu. His current research focuses on the expansion of hierarchically structured knowledge graphs such as taxonomies, and on the compositional generalization of NLP models. He previously received a BEng from Beihang University.


Tomas Martin

Tomas Martin received his M.Sc. degree in Computer Science at UQAM in 2016, where he is currently pursuing a Ph.D. in Computer Science. Since 2019, he has been working on a project aiming to improve Canadian dairy production in collaboration with UQAM's bioinformatics lab and LactaNet. His main research interests are frequent pattern mining, ontologies, and machine learning applications.

Torsten Scholak

Torsten is an applied research scientist at Element AI, the research lab of ServiceNow. He researches and develops language user interfaces for databases and services. You can find him on Twitter at @tscholak, where he tweets about neural program synthesis and programming in functional languages.

Meetup mars 2021

Malik Altakrori, PhD student at McGill University / Mila -

"Authorship Analysis: identification vs. anonymization"


Jonas Pfeiffer, PhD student, Technical University of Darmstadt -

"Adapters in Transformers. A New Paradigm for Transfer Learning…?"


Jian Yun Nie, Professor, Université de Montréal -

"VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification"

Agenda

10:00 AM Welcome - Hessam Amini

10:05 AM Talk: Malik Altakrori, PhD student at McGill University / Mila -

‘’Authorship Analysis: identification vs. anonymization’’

Abstract: In this talk, we will discuss two interlocked problems: Authorship Identification and Anonymization. Security and privacy on the internet are among the issues that most concern the public. While internet anonymity allows people to share their opinions freely without fear of being harassed or persecuted, it also allows criminals to commit cybercrimes without being caught. Researchers have developed techniques to help identify the author of an anonymous text. Such techniques depend on hidden cues in the investigated text that resemble the author's unique writing habits. These writing habits, also known as the author's writing style, are used to uncover the identity of the author of an investigated, anonymous text. The existence of authorship identification tools has raised some valid concerns among the public, due to their potential misuse; one example is when such tools are used by oppressive regimes to suppress freedom of speech. To protect authors' identities, several anonymization techniques have been developed to hide authors' writing styles and reduce the risk of revealing their identities.

10:30 AM Question period

10:40 AM Talk : Jonas Pfeiffer, PhD student, Technical University of Darmstadt -

‘’Adapters in Transformers. A New Paradigm for Transfer Learning…?’’

Abstract: Adapters have recently been introduced as an alternative transfer learning strategy. Instead of fine-tuning all the weights of a pre-trained transformer-based model, small neural network components are introduced at every layer. While the pre-trained parameters are frozen, only the newly introduced adapter weights are fine-tuned, achieving an encapsulation of the downstream task information in designated parts of the model. In this talk, we will provide an introduction to adapter training in natural language processing. We will go into detail on how the encapsulated knowledge can be leveraged for compositional transfer learning as well as cross-lingual transfer. We will also briefly touch on the efficiency of adapters in terms of trainable parameters as well as (wall-clock) training time. Finally, we will provide an outlook on recent alternative adapter approaches and training strategies.
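The bottleneck design described above can be sketched in a few lines (a generic Houlsby-style adapter; the dimensions, zero initialization, and ReLU choice are illustrative assumptions, not the exact configurations covered in the talk):

```python
import numpy as np

# Generic adapter sketch: down-project, nonlinearity, up-project, residual.
# Only W_down and W_up would be trained; the transformer weights stay frozen.
rng = np.random.default_rng(0)
d_model, d_bottleneck = 768, 64  # adapter adds ~2 * d_model * d_bottleneck parameters

W_down = rng.normal(0.0, 0.02, (d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero init: the adapter starts as a no-op

def adapter(h):
    z = np.maximum(h @ W_down, 0.0)       # ReLU bottleneck
    return h + z @ W_up                   # residual connection

h = rng.normal(size=(4, d_model))         # a batch of token representations
out = adapter(h)
```

With the up-projection initialized to zero, the adapter initially computes the identity, so inserting it into a frozen pre-trained model does not perturb its behaviour before fine-tuning.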

11:05 AM Question period

11:15 AM Talk : Jian Yun Nie, Professor, Université de Montréal -

‘’VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification’’

Abstract: Much progress has been made recently on text classification with methods based on neural networks. In particular, models using attention mechanisms, such as BERT, have been shown to capture the contextual information within a sentence or document. However, their ability to capture global information about the vocabulary of a language is more limited. The latter is the strength of Graph Convolutional Networks (GCN). In this work, we propose the VGCN-BERT model, which combines the capability of BERT with a Vocabulary Graph Convolutional Network (VGCN). Local and global information interact through different layers of BERT, allowing them to influence each other and to jointly build a final representation for classification. In our experiments on several text classification datasets, our approach outperforms BERT and GCN alone, and achieves higher effectiveness than that reported in previous studies.
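The vocabulary-graph side of such a model boils down to a graph convolution over word-word edges. A toy single layer, using generic GCN normalization (illustrative, not necessarily the exact VGCN-BERT formulation), looks like:

```python
import numpy as np

# Toy vocabulary graph over 3 words: word 0 co-occurs with words 1 and 2.
A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
A_hat = A + np.eye(3)                           # add self-loops
d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt        # symmetric normalization

X = np.eye(3)                                   # one-hot word features
W = np.full((3, 2), 0.5)                        # toy weight matrix
H = np.maximum(A_norm @ X @ W, 0.0)             # one GCN layer: ReLU(A_norm X W)
```

Each word's new representation mixes in its graph neighbours' features; this is the mechanism through which global vocabulary statistics reach the classifier alongside BERT's local contextual representations.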

11:40 AM Question period

11:50 AM Closing remarks - Hessam Amini

Malik Altakrori

Malik Altakrori is a 4th-year PhD student at McGill / Mila. He is co-supervised by Prof. Benjamin Fung and Prof. Jackie Cheung, and his thesis is on "Evaluating techniques for authorship analysis".


Jonas Pfeiffer

Jonas Pfeiffer is a 3rd-year PhD student at the Ubiquitous Knowledge Processing lab at the Technical University of Darmstadt. He is interested in compositional representation learning in multi-task, multilingual, and multi-modal contexts, and in low-resource scenarios. Jonas received the IBM PhD Research Fellowship award in 2020. He has given invited talks in academia and industry, including at NEC Labs, the University of Cambridge, the University of Colorado Boulder, and the University of Mannheim.


Jian-Yun Nie

Jian-Yun Nie’s main research area is information retrieval (search engine) and natural language processing. His work covers a wide range of topics, including information retrieval models, cross-language information retrieval, query expansion, suggestion and understanding, sentiment analysis, recommender systems, dialogue systems and question answering. Jian-Yun Nie has published more than 200 papers in journals and conferences in this area. He has served as a general chair and PC chair of SIGIR conferences and senior PC member for NLP conferences such as ACL and COLING.

Meetup février 2021 - NLP Tools

Reza Davari, Deep Learning NLP Team Lead, Coveo -

"Transformer Based Text Similarity Engine Optimized for Industrial Use"


Karine Déry, Conceptrice logiciel, NuEcho -

"Dialogflow CX and Rasa: Two Diametrically Opposed Dialogue Engines / Dialogflow CX et Rasa: Deux engins de dialogue diamétralement opposés"


Nathan Zylbersztejn, Founder, Botfront -

"Botfront, an open source authoring platform for Rasa based conversational systems"


Lise Rebout, NLP Coordinator, CRIM -

"Les outils Open Source pour les projets de recherche appliquée en TALN"

Agenda

10:00 AM Welcome - Hessam Amini

10:02 AM Opening remarks – Guillaume Poirier

10:05 AM Talk: Reza Davari, Deep Learning NLP Team Lead, Coveo -

‘’Transformer Based Text Similarity Engine Optimized for Industrial Use’’

Abstract: Text similarity has a wide range of applications, from information retrieval to question answering. Recently, transformer-based language models have achieved state-of-the-art performance on several tasks, including text similarity. In this presentation, we compare three strategies for taking advantage of these models in the context of information retrieval. We discuss the shortcomings and advantages of these techniques in an industrial setting, as well as some of the steps we took to address the shortcomings.
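The retrieval step behind such strategies reduces to comparing a query embedding against candidate document embeddings. A minimal sketch, with toy 3-dimensional vectors standing in for transformer sentence embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors stand in for embeddings produced by a transformer encoder.
query = np.array([0.9, 0.1, 0.0])
docs = {
    "reset password": np.array([0.8, 0.2, 0.1]),
    "billing issue": np.array([0.1, 0.1, 0.9]),
}

# Rank candidate documents by cosine similarity to the query embedding.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)
```

In a production system, the embeddings would come from a fine-tuned encoder, and the exhaustive argmax would typically be replaced by approximate nearest-neighbour search over a large index.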

10:25 AM Question period

10:35 AM Talk: Karine Déry, Conceptrice logiciel, NuEcho -

‘’Dialogflow CX and Rasa: Two Diametrically Opposed Dialogue Engines / Dialogflow CX et Rasa: Deux engins de dialogue diamétralement opposés’’

Abstract / Résumé: How do Dialogflow CX, Google’s new conversational engine, and Rasa, a widely used open-source one, compare? We’ll briefly discuss the advantages and disadvantages of both for building vocal and textual conversational applications. /

Comment comparer Dialogflow CX, le nouvel engin conversationnel de Google, et Rasa, son équivalent open-source largement utilisé? Nous aborderons brièvement les avantages et désavantages de chacun pour la création d’applications conversationnelles textuelles et vocales.

10:55 AM Question period

11:05 AM Talk: Nathan Zylbersztejn, Founder, Botfront -

‘’Botfront, an open source authoring platform for Rasa based conversational systems’’

Abstract: The Botfront team has spent two years making Rasa accessible to non-technical team members, and all this work is now open source (it was only partially open source before). The purpose of this presentation is to showcase the platform so that it can support the NLP community's work on conversational systems.

11:20 AM Question period

11:30 AM Talk: Lise Rebout, NLP Coordinator, CRIM -

‘’Les outils Open Source pour les projets de recherche appliquée en TALN’’

Abstract: Throughout its applied experimental research projects for industrial innovation, CRIM uses several open-source tools and frameworks that allow it to ensure quality, reproducibility, and adaptability. After explaining our working methodology, we will present some of the tools we use in our experimentation environments, for example Kedro for managing data-preparation pipelines, Tensorboard for monitoring training runs, and MLFlow for tracking experiments.

11:50 AM Question period

12:00 PM Closing remarks - Hessam Amini

Meetup janvier 2021 - Annotation and knowledge extraction

Marco Rospocher, Associate Professor, University of Verona -

"Joint Posterior Revision of NLP Entity Annotations via Ontological Knowledge"


Koustuv Sinha, PhD candidate at McGill University / Mila -

"Unnatural Language Inference"


Philippe Langlais, Professor, Université de Montréal -

"Information Extraction: What ? Why ? How ? and some challenges"

Agenda

10:00 AM Welcome - Hessam Amini

10:02 AM Opening remarks – Leila Kosseim, Professor, Concordia University

10:05 AM Talk: Marco Rospocher, Associate Professor, University of Verona -

‘’Joint Posterior Revision of NLP Entity Annotations via Ontological Knowledge’’

Abstract: In this talk, I will give an overview of recent work on exploiting ontological knowledge to improve the Natural Language Processing (NLP) annotation of entities in textual resources. The work investigates the benefit of performing a joint posterior revision, driven by ontological background knowledge, of the annotations resulting from NLP entity analyses such as named entity recognition and classification (NERC) and entity linking (EL). The revision is performed via a probabilistic model, called JPARK, which, given the candidate annotations independently identified by tools for different analyses of the same textual entity mention, reconsiders the best annotation choice made by the tools in light of the coherence of the candidate annotations with the ontological knowledge. The results of a comprehensive evaluation along various dimensions (e.g., generalisability to different tools and background knowledge resources) empirically confirm the capability of JPARK to improve the quality of the annotations of the given tools, and thus their performance on the tasks they are designed for.

10:30 AM Question period

10:40 AM Talk : Koustuv Sinha, PhD candidate at McGill University / Mila -

‘’Unnatural Language Inference’’

Abstract: Natural Language Understanding has witnessed a watershed moment with the introduction of large pre-trained Transformer networks. These models achieve state-of-the-art results on various tasks, notably including Natural Language Inference (NLI). Many studies have shown that the large representation space learned by these models encodes some syntactic and semantic information. However, to really "know syntax", a model must recognize when its input violates syntactic rules and calculate inferences accordingly. In this work, we find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words. With iterative search, we are able to construct randomized versions of NLI test sets, which contain permuted hypothesis-premise pairs with the same words as the original, yet are classified with perfect accuracy by large pre-trained models, as well as by pre-Transformer state-of-the-art encoders. We find the issue to hold across languages and models, and hence investigate the root cause. To partially alleviate this effect, we propose a simple training methodology. Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
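The probing idea can be sketched with a helper that reorders a sentence while preserving its bag of words (a simplified stand-in for the paper's iterative search over permutations):

```python
import random

def permute_words(sentence, seed=0):
    """Return the same words in a different order, to probe an NLI model's
    sensitivity to word order. Deterministic for a given seed."""
    words = sentence.split()
    rng = random.Random(seed)
    shuffled = words[:]
    while shuffled == words and len(words) > 1:  # ensure a real reordering
        rng.shuffle(shuffled)
    return " ".join(shuffled)

original = "a man is playing a guitar on stage"
permuted = permute_words(original)
print(permuted)
```

Feeding such permuted hypothesis-premise pairs to an NLI model and checking whether its predictions change is the core of the probe; a model that truly relied on syntax should not classify these scrambled inputs with the original labels.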

Link: https://arxiv.org/abs/2101.00010

11:05 AM Question period

11:15 AM Talk: Philippe Langlais, Professor, Université de Montréal -

‘’Information Extraction: What ? Why ? How ? and some challenges’’

Abstract: Extracting useful information from an unstructured text is an old challenge of natural language processing that witnesses a regain of interest. Many things can be extracted from a text, such as the terminology of a domain up to a full ontology. In this presentation, I will concentrate on discussing approaches aimed at extracting relations between entities of interest, in order to populate a Knowledge Graph. I will contrast approaches that are designed to recognize a few specific relations, from those that aim at capturing any relation in the text. I will then discuss a few challenges with the latter approaches, including their evaluation. The presentation will be illustrated by examples of research conducted at RALI.

11:40 AM Question period

11:50 AM Closing remarks - Hessam Amini


Hessam Amini

Hessam is currently a PhD student in computer science at Concordia University, working under the supervision of Prof. Leila Kosseim. He is also a machine learning developer at Coveo. His research and work focus on the use of machine learning techniques in natural language processing applications.

Leila Kosseim

Leila Kosseim (https://users.encs.concordia.ca/~kosseim/ ) is a Professor in Computer Science & Software Engineering at Concordia University in Montreal. She received her PhD in Computer Science from the University of Montreal in 1995. Her PhD, supervised by Guy Lapalme, focused on Natural Language Generation. Recipient of an NSERC postdoctoral fellowship, she then worked at Druide informatique on the development of Antidote. In 2001, she co-founded the CLaC (Computational Linguistics at Concordia) laboratory. Her current research interests are in the area of Natural Language Processing, in particular Generation and Discourse Analysis.

Marco Rospocher

Marco Rospocher (https://marcorospocher.com/) is Associate Professor of Informatics at the University of Verona, within the Department of Foreign Languages and Literatures. He received his PhD in Information and Communication Technologies from the University of Trento in 2006. His current research interests are in the area of Artificial Intelligence, focusing in particular on Ontologies, formalisms for Knowledge Representation and Reasoning, methodologies and tools for Knowledge Acquisition and Information Extraction, and Digital Humanities. He has been involved in a number of international research projects including APOSDLE (EU-FP6), PESCaDO (EU-FP7), and NewsReader (EU-FP7). He co-authored more than 90 scientific publications in international journals, conferences and workshops.

Koustuv Sinha

Koustuv is a PhD candidate at McGill University / Mila, supervised by Joelle Pineau and William L. Hamilton, and currently interning at Facebook AI Research, Montreal. His primary research interests lie in understanding and advancing the systematic generalization capabilities of neural models in discrete domains, such as language and graphs. He organizes the annual Machine Learning Reproducibility Challenge. He also serves as an editor at ReScience, an open journal promoting reproducible science, and as Reproducibility co-chair of NeurIPS. More details can be found on his webpage: http://cs.mcgill.ca/~ksinha4

Philippe Langlais

Philippe is a full professor in the Computer Science and Operations Research Department (DIRO) of the Université de Montréal. He is a member of RALI, a laboratory dedicated to Natural Language Processing, where he conducts research in machine translation and information extraction.

2020

Meetup novembre 2020 - Question answering

Svitlana Vakulenko, Post-doctoral researcher at UvA / IRLab (former ILPS) -

"Question Rewriting for Conversational Question Answering"


Jianfeng Gao, Distinguished Scientist at Microsoft Research -

"A review on neural approaches to conversational question answering"


Filipe Mesquita, Research Manager at Diffbot -

"Diffbot Knowledge Graph: A Fully Automated Knowledge Graph of the Web"

Agenda

10:30 AM Welcome - Hessam Amini

10:32 AM Opening remarks – Jian-Yun Nie, Professor, Université de Montréal

10:35 AM Talk: Svitlana Vakulenko, Post-doctoral researcher at UvA / IRLab (former ILPS) -

‘’Question Rewriting for Conversational Question Answering’’

Abstract: Conversational question answering (QA) requires the ability to correctly interpret a question in the context of previous conversation turns. We address the conversational QA task by decomposing it into question rewriting and question answering subtasks. The question rewriting (QR) subtask is specifically designed to reformulate ambiguous questions, which depend on the conversational context, into unambiguous questions that can be correctly interpreted outside of the conversational context. We show that the same QR model improves performance on both the passage retrieval and answer span extraction subtasks. On the other hand, our evaluation results also uncover the sensitivity of both QA models to question formulation. The benefit of QR is that it allows us to pinpoint and group such cases, enabling an automated error analysis. We show how to use this methodology to verify whether QA models are really learning the task, and to better understand the frequent types of errors they make.
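The decomposition can be sketched with a toy rule-based rewriter (real QR models are learned seq2seq rewriters; the pronoun rule and function below only illustrate the interface between the two subtasks):

```python
# Toy question-rewriting step: resolve a context-dependent follow-up
# question into a self-contained one before passing it to a standard
# QA model. The pronoun-substitution rule is a deliberately naive stand-in.

def rewrite(question, last_entity):
    """Replace a pronoun with the entity mentioned earlier in the conversation."""
    for pronoun in ("it", "he", "she", "they"):
        question = question.replace(" " + pronoun + " ", " " + last_entity + " ")
    return question

history_entity = "the Eiffel Tower"
follow_up = "When was it built?"
print(rewrite(follow_up, history_entity))
```

The rewritten, self-contained question can then be handled by any off-the-shelf retrieval or span-extraction model with no knowledge of the conversation history.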

11:00 AM Question period

11:10 AM Talk : Jianfeng Gao, Distinguished Scientist at Microsoft Research -

‘’A review on neural approaches to conversational question answering’’

Abstract: Conversational question answering (CQA) systems allow users to query a large-scale knowledge base (KB) or a document collection, such as Web documents, via conversations. The former are known as KB-QA systems, which are often based on a semantic parser. The latter are known as text-QA systems, whose core component is a machine reading comprehension model. In this talk, we present a review of state-of-the-art neural methods for CQA developed over the last six years, and discuss the challenges still being faced.

11:35 AM Question period

11:45 AM Talk: Filipe Mesquita, Research Manager at Diffbot -

‘’Diffbot Knowledge Graph: A Fully Automated Knowledge Graph of the Web’’

Abstract: Knowledge graphs are valuable resources for developing intelligent applications, including search, question answering, and recommendation systems. However, high-quality knowledge graphs still mostly rely on human curation. In this talk, I will present the Diffbot Knowledge Graph, a fully automated knowledge graph of the web containing over 10B entities.

12:10 PM Question period

12:20 PM Closing remarks - Hessam Amini

Hessam Amini

Hessam is currently a PhD student in computer science at Concordia University, working under the supervision of Prof. Leila Kosseim. He is also a machine learning developer at Coveo. His research and work focus on the use of machine learning techniques in natural language processing applications.

Jian-Yun Nie

Jian-Yun Nie’s main research area is information retrieval (search engine) and natural language processing. His work covers a wide range of topics, including information retrieval models, cross-language information retrieval, query expansion, suggestion and understanding, sentiment analysis, recommender systems, dialogue systems and question answering. Jian-Yun Nie has published more than 200 papers in journals and conferences in this area. He has served as a general chair and PC chair of SIGIR conferences and senior PC member for NLP conferences such as ACL and COLING.

Svitlana Vakulenko

Svitlana Vakulenko joined the Computer Science Department of the University of Amsterdam (UvA) in October 2019, after an internship at Apple in Seattle in the summer of 2019. She obtained her PhD from the Vienna University of Technology, Faculty of Informatics. Her main research area is Conversational Search, with a focus on discourse analysis and question answering.

Jianfeng Gao

Jianfeng Gao is a Distinguished Scientist at Microsoft Research and the Partner Research Manager of the Deep Learning (DL) group at Microsoft Research AI. He leads the development of AI systems for natural language processing, Web search, vision-language understanding, dialogue, and business applications. He is an IEEE Fellow and has received awards at top AI conferences such as SIGIR, ACL, and CVPR. From 2014 to 2017, he was Partner Research Manager at the Deep Learning Technology Center at Microsoft Research, Redmond, where he led research on deep learning for text and image processing. From 2006 to 2014, he was Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond, where he worked on Web search, query understanding and reformulation, ads prediction, and statistical machine translation. From 2005 to 2006, he was a Research Lead in the Natural Interactive Services Division at Microsoft, where he worked on Project X, an effort to develop a natural user interface for Windows. From 2000 to 2005, he was a Research Lead in the Natural Language Computing Group at Microsoft Research Asia, where he and his colleagues developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IME), which were the leading products on the market, and the natural language platform for Microsoft Windows.


Filipe Mesquita

Filipe Mesquita is a Research Manager at Diffbot and holds a Ph.D. from the University of Alberta. He works on advancing the state-of-the-art in automated knowledge graph construction. His work has been published at top tier conferences and journals, such as EMNLP, NAACL, SIGMOD, TWEB, and SIGIR. Previously, he has served as Vice President of Data Science at Mitre Media, the parent company of Dividend.com and ETFdb.com.

Meetup octobre 2020 - Ethics

Jennifer Williams, PhD Student at CSTR (Edinburgh, UK) and NII (Tokyo, Japan) - "Recent Ethical Issues in NLP"


Siva Reddy, Assistant Professor at McGill University / MILA, Facebook CIFAR AI Chair - "Measuring stereotypical bias in large pretrained language models"


Golnoosh Farnadi, Post-doctoral IVADO fellow at UdeM / MILA - "Bias and Algorithmic Discrimination in Machine Learning"

Agenda

9:00 AM Welcome - Hessam Amini

9:02 AM Opening remarks – Sébastien Gambs, Professor at UQAM, Canada Research Chair in Privacy-preserving and Ethical Analysis of Big Data

9:05 AM Talk: Jennifer Williams, PhD Student at CSTR (Edinburgh, UK) and NII (Tokyo, Japan) - ‘’Recent Ethical Issues in NLP’’

9:30 AM Question period

9:40 AM Talk : Siva Reddy, Assistant Professor at McGill University / MILA, Facebook CIFAR AI Chair - ‘’Measuring stereotypical bias in large pretrained language models’’

10:05 AM Question period

10:15 AM Talk: Golnoosh Farnadi, Post-doctoral IVADO fellow at UdeM / MILA - ‘’Bias and Algorithmic Discrimination in Machine Learning’’

10:40 AM Question period

10:50 AM Closing remarks - Hessam Amini

Hessam Amini

Hessam is currently a PhD student in computer science at Concordia University, working under the supervision of Prof. Leila Kosseim. He is also a machine learning developer at Coveo. His research and work focus on the use of machine learning techniques in natural language processing applications.

Sébastien Gambs

Sébastien Gambs joined the Computer Science Department of the Université du Québec à Montréal (UQAM) in January 2016, after holding a joint research chair in Security of Information Systems between Université de Rennes 1 and Inria from September 2009 to December 2015. He has held the Canada Research Chair (Tier 2) in Privacy-preserving and Ethical Analysis of Big Data since December 2017. His main research area is the protection of privacy, with a particularly strong focus on location privacy. He is also interested in long-term scientific questions such as addressing the tension between privacy and the analysis of big data, as well as the fairness, accountability, and transparency issues raised by personalized systems.

Jennifer Williams

Jennifer is in the final years of her PhD in speech processing. Her PhD research involves machine learning methods to disentangle information contained in the speech signal for various applications. Her primary research is for text-to-speech, but her interests also include privacy and security for voice technology. Before starting her PhD, she spent 5 years on the technical staff at MIT, where she worked on various NLP projects. She was also a visiting scholar at Institute for Infocomm Research (I2R) in Singapore working on Chinese-English simultaneous machine translation. She has numerous publications in speech and NLP at conferences such as ACL, EACL, EMNLP, and Interspeech.

Siva Reddy

Siva Reddy is an Assistant Professor in the School of Computer Science and Linguistics at McGill University working on Natural Language Processing. He is a Facebook CIFAR AI Chair and a core faculty member of Mila. Before McGill, he was a postdoctoral researcher at Stanford University and a Google PhD fellow at the University of Edinburgh. He received the 2020 VentureBeat AI Innovation Award in NLP for his work in bias in large neural models of language.

Golnoosh Farnadi

Golnoosh obtained her PhD from KU Leuven and UGent in 2017. During her PhD, she addressed several problems in user modeling by applying and developing statistical machine learning algorithms. She later joined the LINQS group of Lise Getoor at UC Santa Cruz to continue her work on learning and inference in relational domains. She is currently a post-doctoral IVADO fellow at UdeM and Mila, working with Professor Simon Lacoste-Julien and Professor Michel Gendreau on fairness-aware AI. She will join the Decision Sciences Department at HEC Montréal as an assistant professor this fall. She has been a visiting scholar at multiple institutes, including UCLA, the University of Washington Tacoma, Tsinghua University, and Microsoft Research, Redmond. Her successful collaborations are reflected in several publications in international conferences and journals, and she has received two paper awards for her work on statistical relational learning frameworks. She has been an invited speaker and lecturer at multiple venues, and is the scientific director of the IVADO/Mila "Bias and Discrimination in AI" online course.

Meetup septembre 2020 - Dialogue systems

Harm de Vries, Research Scientist at Element AI – "Towards Ecologically Valid Research on Language User Interfaces"


Nouha Dziri, PhD student, University of Alberta / Research intern at Google AI NYC – "Evaluating Coherence in Dialogue Systems Using Entailment"


Sarath Chandar, Assistant Professor, Polytechnique Montréal / MILA – "How to Evaluate Your Dialogue System?"

Agenda

2:00 PM Welcome - Hessam Amini, PhD student, Concordia University​ / Machine learning developer at Coveo

2:02 PM Opening remarks - Laurent Charlin, Assistant Professor, HEC Montréal / MILA

2:05 PM Talk: Harm de Vries, Research Scientist at Element AI – "Towards Ecologically Valid Research on Language User Interfaces"

2:30 PM Question period

2:40 PM Talk : Nouha Dziri, PhD student, University of Alberta / Research intern at Google AI NYC – "Evaluating Coherence in Dialogue Systems Using Entailment"

3:05 PM Question period

3:15 PM Break

3:20 PM Talk: Sarath Chandar, Assistant Professor, Polytechnique Montréal / MILA – "How to Evaluate Your Dialogue System?"

3:45 PM Question period

3:55 PM Closing remarks - Hessam Amini


Hessam Amini

Hessam is currently a PhD student in computer science at Concordia University, working under the supervision of Prof. Leila Kosseim. He is also a machine learning developer at Coveo. His research and work focus on the use of machine learning techniques in natural language processing applications.

Laurent Charlin

Laurent Charlin is an assistant professor of artificial intelligence at HEC Montréal and a member of Mila–Quebec Artificial Intelligence Institute. He earned a master’s degree and a PhD respectively from the universities of Waterloo and Toronto and was a postdoc at Columbia, Princeton, and McGill universities. He develops machine learning models, including deep learning models, to analyze large collections of data and to help in decision-making. His main contributions are in the field of dialogue systems and recommender systems. The Toronto paper matching system (TPMS), a system to recommend and match papers to reviewers that he co-developed, was adopted by dozens of major conferences over the last five years (it has recommended papers for over six thousand reviewers). He has published 30 papers in international conferences and won a second-best paper award at the 2008 Uncertainty in Artificial Intelligence (UAI) conference.

Harm de Vries

Harm de Vries is a research scientist at Element AI (Montreal, Canada). He obtained his Ph.D. from the Montreal Institute for Learning Algorithms (Mila) under the supervision of Aaron Courville. The focus of his current work is on machine learning methods for language user interfaces.

Nouha Dziri

Nouha Dziri is a Ph.D. student at the University of Alberta, where she investigates generative deep learning models and natural language processing methods under the supervision of Osmar Zaiane. In particular, her research focuses on developing data-driven approaches for computational natural language understanding, primarily in the context of enabling machines to converse with humans in natural language. She is also working on different methods for the fiendishly difficult problem of evaluating conversational AI. Before her Ph.D., she completed an MSc degree in Computer Science at the University of Alberta, where she worked on dialogue modelling and quality evaluation. She has interned twice at Google AI in New York City, where she investigated dialogue quality modelling, and also worked at MSR Montreal, where she designed novel deep learning methods for improving response generation.

Sarath Chandar

Bio to come

Mini-workshop February 2020 - Reasoning and knowledge extraction


Agenda

1:00-1:15pm - Opening remarks (William L. Hamilton)

1:15-2:00pm - Prof. Jian Tang: Reasoning with Unstructured Text, Knowledge Graphs, and Logic Rules

2:00-2:30pm - [Industry presentation] CAE

2:30-3:00pm - Networking and coffee break

3:00-3:45pm - [Industry presentation] Coveo: Understanding Natural Languages in Commerce and Support at Coveo

3:45-4:15pm - [Industry presentation] Dialogue: Intent and entity extraction from patients’ dialogue utterances

4:15pm-5pm - Prof. William L. Hamilton: Meta Learning and Logical Induction on Graphs

2019

Networking event November 2019

This networking event is intended to let the partners get to know each other within the consortium. Each partner will introduce themselves, outline the key NLP challenges they face, and share their expectations for this collaborative research project.

Agenda

Welcome address

Partner presentation

AMF

Bciti

Bombardier

Botler AI

CAE

Coveo

Desjardins

Coffee break (10h-10h30)

Partner presentation

Dialogue

Druide Informatique

InnoDirect

Irosoft

Keatext

Lexum

Mr.Bot/Botfront

Nu Echo

Prologue AI

Technologies Korbit

Closing remarks

Coffee, informal networking, and casual discussions for those interested

Main NLP Montreal Workshop - September 2019


Information

A workshop on natural language processing and understanding, with the aim to facilitate interaction between our partners, students and faculty members. The intended audience is individuals who are already familiar with the area of natural language processing. Members of the scientific committee are still accepting proposals for short seminar-style talks (contact: prato.gab@gmail.com). Students are especially encouraged to submit proposals, and note that ensuring diversity in both speakers and topics is an important objective for the final program.

Note that the second afternoon is reserved for Alliance research group discussions.

Organizers: Alain Tapp, Will Hamilton and Gabriel Prato

Scientific program: Will Hamilton, Jackie Cheung, Philippe Langlais and Alain Tapp

Date: September 17-18

Location: Mila Auditorium

6666 St Urbain St, Montreal, QC H2S 3H1

Number of participants: 120 (maximum)


Registration

First come, first served. Only applications from MILA/IVADO/RALI/CRM partners, students, and researchers will be accepted. Participants must be actively involved in NLP research and/or development; priority will be given to individuals committed to the Alliance NSERC NLP project.

Cost: $20

REGISTRATION FORM


Program

September 17

  • 8:30: Registration (Charlie Mainville and Charles Guille-Escuret)

  • 9:00: Alain Tapp, Welcome address, introduction to Mila, IVADO and CRM, presentation of the MTL NLP Consortium (Alliance) and brief description of the Aristotle project

  • 9:40: Sasha Luccioni, Mila, Detecting Climate Change Misinformation in the Media

  • 10:10: Koustuv Sinha, Mila, Unreferenced approaches to evaluating dialogue agent performance

  • 10:40: Coffee break

  • 11:00: Will Hamilton, McGill/Mila, CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text

  • 12:00: Lunch break with lunch box provided

  • 13:30: Philippe Langlais, RALI UdeM, Named-entity recognition @ RALI

  • 14:00: Laurent Charlin, HEC, Conversational recommenders

  • 14:30: Leila Kosseim, Concordia, NLP at CLaC

  • 15:00: Coffee break

  • 15:20: Jackie Cheung, McGill/Mila, New Directions in Automatic Text Summarization

  • 16:00: End of the day

September 18 AM

Private meetings specific to the Alliance grant proposal (September 18 PM)

  • 12:00-13:00: Lunch break with lunch box provided for Alliance grant proposal participants

  • 13:00: Alliance/Prompt Grant status

    • Researchers, partners, administrative team

  • The 6 R&D themes

    • Academic lead, industrial lead

  • Roles & responsibilities

    • What to expect

  • Consortium Website

  • Upcoming activities

    • Monthly meetups

    • Opportunity: Mitacs Summer 2020

    • Opportunity: Problem solving workshop summer 2020

  • The Aristotle framework (seminar by Alain Tapp)

  • Open discussion