Comparing Prompt Results - A Rose By Any Other Name
You might want to test an expected response from a prompt sent to a large language model, but string comparisons will not help you. The inherent variability in large language model (LLM) responses will require you to find new ways to compare generated prompt results. There are a few reasons why a generated prompt result will not exactly match a prior result: the prompt itself may have changed, the model parameters may have changed, or the model’s inherent variability may inject a small amount of change in the results. ...