Is your data AI-ready?

Published Jul 22, 2025 12:05 am  |  Updated Jul 21, 2025 05:00 pm
TECH4GOOD
The adage "garbage in, garbage out" (GIGO), which became popular at the dawn of computing technology, has never been more relevant than in the realm of AI. AI models learn by identifying patterns and relationships within the data they are trained on. If this data is riddled with errors, inconsistencies, missing values, or biases, the AI will internalize these flaws, leading to inaccurate predictions, unreliable insights, and ultimately, flawed decision-making. In the AI economy, clean data is not just important; it is the unseen, yet utterly critical, foundation upon which all meaningful progress is built.
Artificial intelligence learns patterns from examples, which are derived from data. All AI systems rely on data to learn, infer, and decide: machine learning models absorb historical patterns by analyzing large datasets, then extrapolate insights or predictions. More data generally helps a model understand better, but quantity alone is not enough. The quality of what a model learns depends directly on the quality of the data fed into it.
AI needs data that is complete, accurate, consistent, and up-to-date. Consider training a medical diagnostic AI on patient records that contain typos, missing symptoms, or outdated test results. The AI's output would be unreliable at best and dangerous at worst. Or think about an AI-powered financial forecasting system trained on data with mis-recorded transactions or outdated market information. Its predictions would be wildly off the mark, potentially leading to significant economic losses for businesses.
What is clean data anyway? Clean data refers to information that is free of errors, accurately labeled, and prepared for machines to learn from. Producing it involves validating that values fall within acceptable parameters and that the data is well organized, free of duplication, and properly labeled. This work is tedious, but essential. Without it, AI algorithms wander blindfolded through a fog of bad inputs.
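As a rough illustration of what such checks can look like, the following Python sketch screens a handful of records for missing values, out-of-range values, and duplicates. The field names (`patient_id`, `age`, `diagnosis`), the acceptable age range, and the sample records are all hypothetical, chosen only for illustration; real validation rules depend on the dataset at hand.

```python
# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"patient_id": 1, "age": 34, "diagnosis": "flu"},
    {"patient_id": 2, "age": -5, "diagnosis": "asthma"},  # age out of range
    {"patient_id": 3, "age": 51, "diagnosis": None},      # missing label
    {"patient_id": 1, "age": 34, "diagnosis": "flu"},     # exact duplicate
]

REQUIRED_FIELDS = ("patient_id", "age", "diagnosis")
AGE_RANGE = (0, 120)  # assumed acceptable bounds


def validate(records):
    """Return (clean_records, issues); issues is a list of (index, reason)."""
    clean, issues, seen = [], [], set()
    for i, rec in enumerate(records):
        # 1. Completeness: every required field must be present and non-empty.
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
        if missing:
            issues.append((i, "missing: " + ", ".join(missing)))
            continue
        # 2. Range check: values must fall within acceptable parameters.
        if not AGE_RANGE[0] <= rec["age"] <= AGE_RANGE[1]:
            issues.append((i, "age out of range"))
            continue
        # 3. Deduplication: drop records identical to one already accepted.
        key = tuple(rec[f] for f in REQUIRED_FIELDS)
        if key in seen:
            issues.append((i, "duplicate"))
            continue
        seen.add(key)
        clean.append(rec)
    return clean, issues


clean, issues = validate(RECORDS)
for index, reason in issues:
    print(f"record {index}: {reason}")
```

In this sketch, rejected records are reported rather than silently dropped, so that someone can trace each problem back to its source and correct it there.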
AI’s societal acceptance depends on the trust that stakeholders—businesses, regulators, consumers—place in its outputs. Trustworthy AI is transparent, explainable, and consistent, all of which require a foundation of clean data. When AI consistently produces unreliable or biased results, stakeholders lose faith in its capabilities. This loss of trust can hinder adoption, stifle innovation, and ultimately undermine the societal benefits that AI has the potential to offer.
A significant risk in deploying AI, particularly in domains such as hiring, lending, or law enforcement, is the perpetuation or amplification of existing biases. Many infamous AI failures, such as facial recognition systems misidentifying people of color, originated from training data that was dirty, incomplete, or unrepresentative. AI systems trained on biased datasets tend to reproduce and amplify those biases. If a recruitment AI is trained on historical hiring data that disproportionately favored specific demographics, it will learn to discriminate, reinforcing existing societal inequalities.
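One simple way to look for this kind of skew before training is to compare outcome rates across groups in the historical data. The Python sketch below does this for a small, entirely hypothetical hiring history; the group labels, the records, and the idea of summarizing the gap as a single ratio are assumptions made for illustration, and a check like this is a starting point rather than a complete fairness audit.

```python
from collections import defaultdict

# Hypothetical historical hiring records: (group, was_hired).
HISTORY = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]


def selection_rates(records):
    """Return the fraction of applicants hired within each group."""
    totals = defaultdict(int)
    hired = defaultdict(int)
    for group, was_hired in records:
        totals[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / totals[group] for group in totals}


rates = selection_rates(HISTORY)

# Ratio of the lowest to the highest selection rate. A value far below 1
# means historical outcomes differ sharply between groups, so a model
# trained on this data may learn the same pattern.
ratio = min(rates.values()) / max(rates.values())

print(rates)
print(round(ratio, 2))
```

A low ratio does not by itself prove discrimination, but it flags data that deserves a closer look before it is used to train a recruitment model.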
Ensuring data is representative, diverse, and free from embedded prejudices is a paramount responsibility in the AI economy, and it begins with clean data. However, producing clean data in the AI economy presents significant challenges. The sheer volume and velocity of data being generated today make manual data cleaning a daunting task. Data often originates from disparate sources, in various formats, requiring sophisticated integration and standardization. Furthermore, identifying and addressing subtle biases within massive datasets demands advanced analytical capabilities and a deep understanding of ethical considerations.
To navigate these complexities, organizations must prioritize data governance. This involves establishing clear policies, processes, and responsibilities for collecting, storing, processing, and utilizing data. It requires investing in automated data cleaning tools, robust data validation mechanisms, and continuous monitoring of data pipelines to ensure accuracy and integrity. Crucially, it also necessitates fostering a data-centric culture where data quality is understood as a shared responsibility across the organization. Good governance can also drive innovation: when data is open, others can build new solutions on top of it.
The cost of cleaning and organizing data may seem overwhelming, but if we want to reap the benefits of AI, the alternative is far worse. Poor data leads to poor decisions and to a lack of trust in the systems meant to support us. Governments and enterprises must take deliberate steps now to prepare clean and accessible data for an AI-driven future.
The AI economy is not just about groundbreaking algorithms or powerful hardware; it is fundamentally about data. Clean, accurate, consistent, and unbiased data is the lifeblood of effective AI. Without this critical foundation, AI systems risk becoming purveyors of misinformation and perpetuating inequality, ultimately failing to deliver on their immense potential. As we accelerate deeper into the AI era, prioritizing data quality will not be a mere best practice, but an absolute imperative for unlocking the true power of artificial intelligence.
(The author is an executive member of the National Innovation Council, lead convener of the Alliance of Technology Innovators for the Nation (ATIN), vice president of the Analytics and Artificial Intelligence Association of the Philippines, and vice president of the UP System Information Technology Foundation. Email: [email protected])