General-purpose large language models (LLMs) that rely on in-context learning do not reliably deliver the scientific ...