A Python & AI developer and researcher with 5+ years of experience, using the latest AI technology to build end-to-end projects in production settings.
This product helps medical professionals write documents by leveraging papers from the PubMed portal. It uses Qdrant to index medical-paper data, ensuring accuracy and relevance. The entire application is Dockerized, making it easy to deploy online, and it supports both single-shot requests and a multi-turn chat mode. A key feature is that it cites all of its data sources, which helps avoid AI hallucinations and makes every answer verifiable. A multi-agent workflow ensures comprehensive and accurate responses.
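The citation mechanism can be sketched as follows. This is a minimal, self-contained toy: cosine scoring over hard-coded vectors stands in for Qdrant's vector search, and the PMIDs, passages, and function names are all illustrative, not the product's actual code.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "index": in the real system this lives in Qdrant, with each
# point's payload carrying the paper's PubMed metadata.
INDEX = [
    {"vector": [0.9, 0.1, 0.0], "pmid": "PMID:11111", "text": "Aspirin reduces fever."},
    {"vector": [0.1, 0.9, 0.0], "pmid": "PMID:22222", "text": "Ibuprofen treats inflammation."},
    {"vector": [0.0, 0.2, 0.9], "pmid": "PMID:33333", "text": "Insulin regulates glucose."},
]

def retrieve_with_citations(query_vector, top_k=2):
    """Return the top-k passages, each paired with its source citation,
    so every generated statement can be traced back to a paper."""
    scored = sorted(INDEX, key=lambda d: cosine(query_vector, d["vector"]), reverse=True)
    return [(d["text"], d["pmid"]) for d in scored[:top_k]]
```

Keeping the citation attached to each retrieved passage is what lets the downstream agents emit verifiable, source-linked answers instead of unsupported claims.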
My custom day-to-day application for organizing my daily tasks and workflow. A complex MCP-based client that connects many functionalities as MCP servers (Azure DevOps, GitHub, ArxivAPI, Web Fetcher) and sends daily work-status reminders and alerts via Telegram to my work phone. It also features memory and storage systems (MongoDB) to save and refine queries over time.
An in-domain multimodal search engine that helps users improve their queries using the latest AI information-retrieval techniques. It combines sparse and dense retrieval with a re-ranking stage, and also includes a custom implementation of the HyDE (Hypothetical Document Embeddings) technique.
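One common way to merge a sparse result list (e.g. BM25) with a dense one (embedding similarity) is Reciprocal Rank Fusion. The sketch below shows that standard technique in isolation; the engine's actual fusion method, document IDs, and the `k=60` constant here are assumptions for illustration.

```python
def rrf_fuse(sparse_ranking, dense_ranking, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
    Documents ranked well by either retriever bubble to the top."""
    scores = {}
    for ranking in (sparse_ranking, dense_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d3", "d1", "d2"]   # e.g. BM25 order
dense  = ["d1", "d2", "d3"]   # e.g. embedding-similarity order
fused = rrf_fuse(sparse, dense)  # fused[0] == "d1": strong in both lists
```

RRF is attractive here because it needs only ranks, not scores, so the sparse and dense retrievers never have to be calibrated against each other; a cross-encoder re-ranker can then rescore the fused shortlist.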
A custom debugger that enhances the developer experience with a local, privacy-preserving AI. It integrates with an MCP server to provide intelligent assistance, using on-device LLMs to analyze code, suggest fixes, and explain bugs without sending your proprietary data to external cloud services.
MELENDI (Medical Expert Linguist for Evaluating Nosology and Diagnosis Information) is a recommendation system that helps medical professionals find relevant articles by analyzing patient discharge summaries and scientific publications. The system, which was positively evaluated by medical specialists, suggests articles based on a patient's diagnosis, aiming to efficiently keep doctors updated on the latest literature.
A new automated technique called RadIA uses advanced speech recognition and text classification to effectively monitor radio advertisements. Unlike traditional methods, RadIA doesn't need prior knowledge of ad content, allowing it to detect impromptu or new ads. The model, trained on carefully segmented text data, achieved an impressive F1-macro score of 87.76. This technology has the potential to help companies monitor ad broadcast compliance and analyze competitors' ad strategies.
Dictionaries are one of the oldest and most widely used linguistic resources. Building them is a complex task that, to the best of our knowledge, has yet to be explored with generative Large Language Models (LLMs). We introduce the "Spanish Built Factual Freectianary" (Spanish-BFF) as the first Spanish AI-generated dictionary. This first-of-its-kind free dictionary uses GPT-3. We also define future steps we aim to follow to improve this initial contribution to the field, such as extending it to additional languages.
Linguistic ambiguity is, and has always been, one of the main challenges in Natural Language Processing (NLP) systems. Modern Transformer architectures like BERT, T5, or more recently InstructGPT have achieved impressive improvements in many NLP fields, but there is still plenty of work to do. Motivated by the uproar caused by ChatGPT, in this paper we provide an introduction to linguistic ambiguity, its varieties, and their relevance in modern NLP, and perform an extensive empirical analysis. ChatGPT's strengths and weaknesses are revealed, as well as strategies to get the most out of this model.
This paper presents our approaches to SMM4H'22 task 5, classification of tweets self-reporting COVID-19 symptoms in Spanish, and task 10, detection of disease mentions in tweets (SocialDisNER, in Spanish). We present hybrid systems that combine Deep Learning techniques with linguistic rules and medical ontologies, which allowed us to achieve outstanding results in both tasks.
This paper presents a novel, linguistically driven system for the Spanish Reverse Dictionary task of SemEval-2022 Task 1. The aim of this task is the automatic generation of a word from its gloss. We conclude that results on this task could improve alongside the quality of the dataset, by incorporating high-quality lexicographic data. Accordingly, in this paper we analyze the main gaps in the proposed dataset and describe how these limitations could be tackled.
Have a question or a project in mind? Feel free to reach out.