[Feature Request] Visualizing the evaluations should look different from the promptflow traces and should provide some kind of data visualization #3492
Comments
Thank you for your suggestion! @tyler-suard-parker one thing I'd like to confirm: in which step do you get the above trace UI page? I see there are two runs in the URL, so I guess you are getting this from the line pf.visualize([base_run, eval_run]). If so, how about changing it to pf.visualize(base_run)?
Yes, I am getting this during the line pf.visualize([base_run, eval_run]). I will try using pf.visualize(base_run) and let you know what happens. I'm glad you like my suggestion; note that you can click on each question to expand it. Having the traces like you already have is nice, but it would also be helpful to have some kind of quick summary I can look at just to make sure all my evaluations came out OK. For example, a bar chart for each input-output pair showing correctness, etc., where clicking on a bar shows an explanation.
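For reference, a minimal sketch of the two calls being compared, assuming the standard PFClient from the promptflow SDK; the flow paths and data file below are placeholders, not the actual values from the notebook:

```python
from promptflow.client import PFClient

pf = PFClient()

# Placeholder flows/data; in the notebook these come from the
# chat-stream-with-async-flex-flow example.
base_run = pf.run(flow="./my_flow", data="./data.jsonl")
eval_run = pf.run(flow="./my_eval_flow", data="./data.jsonl", run=base_run)

# Visualizing both runs together is what produces the trace UI page above.
pf.visualize([base_run, eval_run])

# Visualizing only the base run is the suggested alternative.
pf.visualize(base_run)
```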
Thank you for trying that, and for describing your scenario! Yes, I think something like a report would serve you better and be more intuitive; the trace UI page does not support that well for now. Engaging PM Chenlu @jiaochenlu on this topic.
Is your feature request related to a problem? Please describe.
Right now, when we visualize the evaluations, it is not easy to understand the results. For example, the result of visualizing in the notebook promptflow\examples\flex-flows\chat-async-stream\chat-stream-with-async-flex-flow.ipynb looks like this:
[screenshot of the trace UI page]
It is not easy to see which evals failed and which succeeded, or the proportion of successes vs. failures.
Describe the solution you'd like
It would be nice to have a clearer visualization for the evaluations, because their purpose is different from the traces. For an evaluation we usually just want a simple pass/fail, whereas with a trace we want the full details. Here is an example:
eval report.zip
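In the meantime, a quick pass/fail summary of this kind can be assembled by hand from the eval run's details. Below is a minimal sketch, not a built-in promptflow feature; it assumes the eval flow emits a boolean-ish outputs.correct column (the actual column name depends on your eval flow):

```python
import matplotlib.pyplot as plt
from promptflow.client import PFClient

pf = PFClient()

# get_details returns a pandas DataFrame with one row per evaluated line.
details = pf.get_details(eval_run)

# "outputs.correct" is a placeholder; substitute whichever output column
# your eval flow actually produces.
passed = details["outputs.correct"].astype(bool)
print(f"{passed.sum()}/{len(passed)} lines passed ({passed.mean():.0%})")

# One bar per input-output pair, colored by pass/fail, as suggested above.
colors = ["green" if p else "red" for p in passed]
plt.bar(range(len(passed)), passed.astype(int), color=colors)
plt.xlabel("line index")
plt.ylabel("pass (1) / fail (0)")
plt.title("Evaluation results")
plt.show()
```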