A recent study conducted by Apple’s artificial intelligence (AI) researchers has raised significant concerns about the reliability of large language models (LLMs) in mathematical reasoning tasks. Despite the impressive advancements made by models like OpenAI’s GPT and Meta’s LLaMA, the study reveals fundamental flaws in their ability to handle even basic arithmetic when faced with slight variations in the wording of questions.