
Can we trust AI?
Not everyone sees blue skies on the horizon. “Mark my words, AI is far more dangerous than nukes. I am really quite close to the cutting edge in AI, and it scares the hell out of me,” admitted Tesla and SpaceX founder Elon Musk four years ago at the South by Southwest Conference. And after Timnit Gebru, then Google’s star ethics researcher, highlighted the risks of large language models, which play a crucial role in Google’s business, she announced on Twitter that the company had forced her out.

In the paper in question, Gebru and her co-authors addressed two key issues with large language models: bias and energy consumption. Both stem from the enormous size of the training dataset, but they have different impacts. Because data at that scale cannot be properly validated, the final model ends up heavily imbalanced, introducing numerous social harms that fall mostly on underrepresented groups. The same scale also drives up the cost and energy consumption of training such models: a single training cycle can cost tens of thousands of dollars.

The most concerning part, though, is the company’s reaction to the mere possibility of the paper going public. On the other hand, had there not been such a reaction, the global AI community might not have placed such a focus on this burning issue.

As much as we would love AI to be objective, it is much more likely that our personal opinions, attitudes, and views will enter the very core of our models, creating various kinds of bias. In a recent paper, A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle, Suresh and Guttag clearly present the types of bias we can expect to find in different learning models: historical bias, representation bias, measurement bias, learning bias, evaluation bias, aggregation bias, and deployment bias.
How Can We Trust AI?
By making it responsible. HTEC Group has been working with global organizations on identifying ways to minimize bias and discrimination and maximize fairness in their models.

“In one of our projects, we worked on text summarization, where, based on a full-text document as an input, the model generated an abstracted summary of the text. In the process, we noticed that for a significant group of test examples, the summary always contained the same stream of words: ‘for confidential support, call the national suicide prevention lifeline…’. This kind of behavior implies that the problem might be in the model itself, and there are a few things that can cause it. One of the things we always keep an eye on is the data used to train the model. The core of the problem is that the model does not have good context; it uses all the medical articles with the word “therapeutical” in them to create news about suicide,” explains Ivan Petrovic, Machine Learning Tech Lead at High Tech Engineering Center.

“In this specific case, we used the BART-large-CNN model, which was trained for this task on news articles collected from the CNN and Daily Mail websites. We analyzed the dataset this model was trained on to see whether there were any potential suicide references. What we found was that around 3 percent of the entire dataset came from articles that had the word “suicide” in them. But since CNN articles report not only on suicides but also on suicide bombers (terrorist attacks), we concluded that these two concepts were not properly separated during the training phase, causing the model to acquire knowledge from a much wider context than the input examples. We additionally noticed that the phrase “National Suicide Prevention Lifeline” often appeared in the CNN articles used for training. Consequently, since the summary is not entirely in accordance with the input, end consumers get information that is not correct. This may cause them to skip that particular text and continue searching for more relevant ones,” says Ivan.
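The kind of data audit Ivan describes can be sketched with off-the-shelf tooling. The snippet below is a minimal sketch rather than the project’s actual code: it assumes the publicly available facebook/bart-large-cnn checkpoint and the cnn_dailymail dataset (version 3.0.0) it was fine-tuned on, and the input file name and generation parameters are illustrative placeholders.

    # A minimal sketch of a summarization run plus a training-data audit.
    # Assumes the public facebook/bart-large-cnn checkpoint and the
    # cnn_dailymail dataset (version 3.0.0); not the project's actual code.
    from transformers import pipeline
    from datasets import load_dataset

    # 1) Summarize a document with a BART-large-CNN summarization pipeline.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    document = open("medical_article.txt").read()  # placeholder input document
    summary = summarizer(document, max_length=130, min_length=30,
                         do_sample=False, truncation=True)
    print(summary[0]["summary_text"])

    # 2) Audit the training corpus: what share of articles mention "suicide"?
    train = load_dataset("cnn_dailymail", "3.0.0", split="train")
    hits = sum("suicide" in example["article"].lower() for example in train)
    print(f"{hits / len(train):.1%} of training articles mention 'suicide'")

A check like this mirrors the analysis Ivan describes: before drawing any conclusions about the model, it quantifies how often the problematic topic actually appears in the data the model learned from.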


“However, to truly have an effect, governance needs to be integrated into every step of the AI pipeline, starting with data collection, through storing, processing, and analyzing data, all the way to validation and deployment.”