i got my own evidence yesterday of how far GenAI, based on the Transformer model, is going (or is going nowhere) while finalising my last piece of homework for 2025. GenAI, based on the Transformer model, works fundamentally by predicting what word(s) come next. and what makes this ‘prediction’ possible? the dataset the models were trained on. in short, the Transformer, while ‘creative’, creates based on existing patterns derived from its dataset. and who created the dataset(s)? human thinking, thoughts and ideas, formed into words in the pre-GenAI era. and that dataset has long run out by now. you may read this article by de Gregorio to see how all the ideas i have mentioned fit together.
long story short, whatever an LLM gives you is something that already existed out there in its mega training dataset.
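to make the ‘predicting the next word from the dataset’ idea concrete, here is a toy sketch. it is not how a real Transformer works: it is a simple bigram counter, swapped in because it shows the same dependence on training data in a few lines. the corpus, the function names `train_bigrams` and `predict_next`, and the counting approach are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# hypothetical toy corpus, invented for this illustration only
CORPUS = (
    "attackers automate complex hacking activities . "
    "attackers automate phishing campaigns . "
    "defenders automate routine checks ."
)


def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        # the word never appeared in training, so there is nothing to predict from
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams(CORPUS)
print(predict_next(model, "attackers"))     # a word seen in the corpus
print(predict_next(model, "autonomising"))  # a word the corpus never contained
```

in this sketch the predictor can only ever return words it has counted, and a word missing from the corpus gets no prediction at all. real LLMs generalise far better than a bigram table, but the output is still shaped by what the training data contained.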
so, now back to my observation. this is the statement i wrote/created:
“With the advent of generative artificial intelligence (GenAI), cyber actors have harnessed it for autonomising complex hacking activities”
after i fed the statement into PAIR (powered by Claude), the platform suggested:
“autonomising” -> “automate” (clearer expression)
what’s clear and what’s not is subjective. but “clearer” here is a conclusion the algorithm draws from its dataset. and why is “autonomising” less ‘clear’? by design, ‘clearness’ has to be judged against the training dataset. this raises another question: autonomising vs. automating/automate, which term is likely to appear more often in the dataset, and thus drives the prediction of ‘clearness’? from my point of view as the author, PAIR’s suggestion is definitely not ‘clearer’ in representing what i intended for my readers. and ‘autonomising’ is likely a relatively rare concept out there at the moment. to me, this case somewhat reveals LLMs’ greatest limitation: being bounded by their dataset. to ask a far-fetched question: is the current conception of the LLM/Transformer going to lead to AGI? i think the answer is clear.
