[BUG] Evaluate on test dataset using evaluate() with SimilarityEvaluator returns NaN #3381

Open · bhonris opened this issue Jun 6, 2024 · 3 comments
Labels: bug (Something isn't working)

bhonris commented Jun 6, 2024

Describe the bug
When running an evaluation on a dataset with evaluate() and the similarity evaluator, I have come across scenarios where the result is NaN (not a number).
How To Reproduce the bug
Model config
{"azure_deployment": "gpt4-turbo-preview", "api_version": "2024-02-01"}
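
For context, a minimal sketch of how such a model config is typically constructed for promptflow-evals; the endpoint and key values are placeholders, not taken from the report:

from promptflow.core import AzureOpenAIModelConfiguration

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    azure_deployment="gpt4-turbo-preview",
    api_version="2024-02-01",
)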
jsonl file
{"Question":"How can you get the version of the Kubernetes cluster?","Answer":"{\"code\": \"kubectl version\" }","output":"{code: kubectl version --output=json}"}
Evaluate Config

# Imports as of promptflow-evals 0.3.0.
from promptflow.evals.evaluate import evaluate
from promptflow.evals.evaluators import SimilarityEvaluator

result = evaluate(
    data="testdata2.jsonl",
    evaluators={
        "similarity": SimilarityEvaluator(model_config)
    },
    evaluator_config={
        "default": {
            # Map dataset columns to the evaluator's inputs.
            "question": "${data.Question}",
            "answer": "${data.output}",
            "ground_truth": "${data.Answer}"
        }
    }
)

Expected behavior
A numeric similarity score is returned.

Running Information (please complete the following information):

  • Promptflow Package Version using pf -v:
{
  "promptflow": "1.1.1",
  "promptflow-azure": "1.11.0",
  "promptflow-core": "1.11.0",
  "promptflow-devkit": "1.11.0",
  "promptflow-evals": "0.3.0",
  "promptflow-tracing": "1.11.0"
}
  • Operating System: Windows 11
  • Python Version using python --version: 3.10.11

Additional context

  • Checking the value actually logged in _similarity.py suggests the evaluator returned the string 'The' rather than a numeric score.
  • I notice this issue usually occurs when the answer does not match what the LLM's response to the question would be, for example: {"Question": "What is the capital of France?", "Answer": "Washington DC"}. A sketch of how such a non-numeric reply becomes NaN follows below.
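
The NaN is consistent with the evaluator coercing the model's raw reply to a float. A minimal sketch of that failure mode, where parse_score is a hypothetical stand-in for the parsing logic and not the actual code in _similarity.py:

def parse_score(llm_output: str) -> float:
    # Hypothetical: interpret the model's reply as a number, else NaN.
    try:
        return float(llm_output)
    except ValueError:
        return float("nan")

print(parse_score("4"))    # 4.0 -- model followed the 1-5 rating format
print(parse_score("The"))  # nan -- model replied with prose instead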
bhonris added the bug label Jun 6, 2024
bhonris (Author) commented Jun 6, 2024

I have added the following text to similarity.prompty: "You will respond with a single digit number between 1 and 5. You will include no other text or information." This seems to fix the issue.
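
A complementary code-side mitigation, purely illustrative and not the library's implementation, would be to extract the first rating digit from the reply before converting, so prose around the number no longer breaks parsing:

import re

def extract_score(llm_output: str) -> float:
    # Illustrative only: take the first digit 1-5 anywhere in the reply,
    # so "The similarity score is 4." still yields 4.0.
    match = re.search(r"[1-5]", llm_output)
    return float(match.group()) if match else float("nan")

print(extract_score("The similarity score is 4."))  # 4.0
print(extract_score("The"))                         # nan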

brynn-code (Contributor) commented:

Hi @singankit and @luigiw, could you please help take a look at this issue?

luigiw (Member) commented Jun 14, 2024

@bhonris, thank you for reporting the issue and sharing a workaround. It is a known issue that some preview OpenAI models can produce NaN results. Please also try with stable (non-preview) model versions.
