I recently wrote an article that was quite critical of generative AI, but it focused on fairly simple tasks that these AI tools could not handle properly, such as compiling lists. A modicum of domain knowledge makes you realize that the information you got was incomplete, sometimes comically so. To further my understanding of the AI hype in normieland, I picked more difficult tasks that required a bit of reasoning ability. This is how it went.
My goal was to get a better idea of how well an AI like Grok can solve multifaceted tasks that are nonetheless based on pure information retrieval. A good example, I thought, would be asking what some random professional on LinkedIn may be worth, i.e. their net worth. I deliberately picked someone with a standard educational and professional career, an accountant in his mid-50s. The LinkedIn profile alone told you job titles, cities, and years, and you could guess the age from the graduation year.
To my great surprise, Grok did not do very well on this task. Rather stupidly, Grok assumed that this 50+ person was single and had always lived in the city he currently claimed to live in, despite having lived in three or four other cities before. It looked up the cost of living in this particular city, based on data from Numbeo, and then collected salary ranges for the various jobs listed. I was baffled by how bizarre this approach was. On top of that, Grok claimed that this person surely also owns a car worth perhaps 30k euros. Apparently, this car materialized out of thin air, as its value was simply added to the sum: net worth equals total earnings per role and year, minus taxes, minus cost of living per year, plus the value of the car. There was also some reasoning about home equity. Social security contributions were completely ignored.
A much better approach would have been to look at each role and calculate earnings per year minus taxes and social security contributions. Of course, salaries would have to be adjusted by city and year, as there is inflation and wage growth. An accountant who makes 70,000 euros in 2020 probably did not make the same amount in 2010, even if the title is unchanged. Similarly, you get paid more if you work for larger companies and/or in a big city. Using cost-of-living numbers from Numbeo is fine, but these need to be historically adjusted as well. Also, Grok assumed that this guy saved every penny of his salary after deductions and cost-of-living expenses, which struck me as quite unrealistic.
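The role-by-role approach described above can be sketched in a few lines of code. Every figure below (salaries, tax rate, growth rates, savings rate, the role list itself) is a hypothetical placeholder, not real data; the point is only to show how adjusting each year's salary and living costs, and applying a realistic savings rate, changes the estimate.

```python
# Illustrative sketch of the role-by-role net-worth estimate.
# All numbers are made-up placeholders for a hypothetical accountant.

WAGE_GROWTH = 0.02        # assumed annual nominal wage growth
INFLATION = 0.02          # assumed annual cost-of-living inflation
SAVINGS_RATE = 0.5        # fraction of surplus actually saved (not 100%)
TAX_AND_SOCIAL = 0.42     # rough combined tax + social security rate
REF_YEAR = 2020           # year the reference figures below refer to
COST_OF_LIVING_REF = 24_000   # annual cost of living in REF_YEAR

# (start_year, end_year, salary_in_REF_YEAR, city_multiplier)
roles = [
    (1995, 2005, 45_000, 0.95),   # junior role, smaller city
    (2005, 2015, 60_000, 1.00),   # senior role
    (2015, 2025, 70_000, 1.10),   # lead role, bigger city
]

def estimate_net_worth(roles):
    total = 0.0
    for start, end, ref_salary, city_mult in roles:
        for year in range(start, end):
            # adjust salary and living costs from REF_YEAR to `year`
            salary = ref_salary * city_mult * (1 + WAGE_GROWTH) ** (year - REF_YEAR)
            living = COST_OF_LIVING_REF * (1 + INFLATION) ** (year - REF_YEAR)
            net_income = salary * (1 - TAX_AND_SOCIAL)
            surplus = max(net_income - living, 0.0)
            total += surplus * SAVINGS_RATE
    return total

print(f"rough net-worth estimate: {estimate_net_worth(roles):,.0f} euros")
```

This still omits home equity, investment returns, and family expenses, but unlike the approach Grok took, it at least adjusts every year's figures and does not assume a 100% savings rate.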
I followed up by asking Grok to assume that this person is married with two children. Grok then just picked Numbeo figures for a family of four and did not care to ask whether the wife was working, or what her income might be, which would be relevant if the guy does not cover 100% of the living expenses with his own income. Then Grok surprised me by telling me that the estimated net worth was negative, but it could not really figure out why. Obviously, it was because Grok had stupidly assumed that this guy had to cover expenses for a family of four from the very start of his career. Instead of questioning that assumption, Grok simply moved the guy into a smaller apartment.
After this little exercise, my view of AI tools is even more negative than before. I can see some value in image creation, but if you want Grok or other AIs to do more than the most basic research tasks, you will be severely disappointed. Quite frankly, to me Grok looks like a total joke. As a little addendum, I just asked Grok to tell me the net worth of Taylor Swift, with a detailed breakdown. It was news to me that she does not have to pay any taxes. Grok just took published estimates of her net worth and adjusted some numbers, i.e. the value of her back catalogue, tour income, and real-estate holdings, to arrive at that figure. However, my angle was that her net worth may be higher or lower than published estimates. Grok cannot tell you much about this. Well, perhaps it could if someone had written a detailed analysis. Grok is a regurgitation machine, and not a very good one.
Of course, you may think that questions about the net worth of Joe Average or Taylor Swift are irrelevant. However, you do not get good answers about businesses either. Grok will, completely unsurprisingly, not provide you with new insights. Even worse, it may suppress interesting outlier opinions because it wants to present answers that are “fair and balanced”. Apparently, the goal is to build the ultimate IQ100 normie AI. It is not clear what the value of that is. At best, you end up with the IQ100 crowd believing that it is even smarter than it already thinks it is. This is another clown world in the making.
I don’t have such a sophisticated level of analysis.
I just asked ChatGPT to generate Latin exercises for me, because I am learning the language right now. I am looking for a particular set of exercises to help me practice the declensions of possessive pronouns and adjectives.
It came up with quite a few strange examples, which I posted to a Latin forum.
Experienced learners there also found these examples very strange, and some outright incorrect.
Well, you could ask ChatGPT to predict stock or options prices in the near future and see how it does.
My guess is “good luck!”
AI is a buzzword for algorithms that process massive amounts of data through large numbers of parallel processors, using linear algebra. This is great for tasks such as creating and modeling protein molecular structures, upscaling graphics, matching data patterns to images or sound, and so on, as well as producing flowchart-style boilerplate documents. It’s not actual AI.
I have mixed feelings about AI.
I had an illness recently. The general practitioners were unable to diagnose it, blaming “stress”. I then asked Grok and got the correct diagnosis and treatment (this was confirmed by a specialist and lab results).
Was that luck? I haven’t used Grok frequently for medical advice, and if I did, I would be unable to evaluate it, as I have no expertise in medicine.
However, in the field in which I am knowledgeable, I often find flaws in the output generated by LLMs. What is worse, people not competent in my field will believe the output. So I can totally confirm Aaron’s sentence: “you end up with the IQ100 crowd believing that it is even smarter than it already thinks it is”.