Visualizing Likert Scale Survey Data

I'm currently helping to evaluate a large market research survey which uses Likert scales. To visualize the data I've tried several plots. The plots below were created with artificial data to expose the strengths and weaknesses of the different plot types.

Data distribution:


You can find the code for all plots here!
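The linked code is the authoritative version; as a rough illustration, here is a minimal sketch of how comparable artificial data could be generated. The question names Q1 to Q5, the 1-5 answer scale, and the distribution shapes are assumptions made for this sketch:

    import numpy as np

    rng = np.random.default_rng(42)
    n_respondents = 200  # assumed sample size

    # One discrete answer distribution per question; the shapes are
    # made up to cover typical patterns (skewed, centered, polarized).
    profiles = {
        "Q1": [0.05, 0.10, 0.20, 0.35, 0.30],  # leaning towards agreement
        "Q2": [0.30, 0.35, 0.20, 0.10, 0.05],  # leaning towards disagreement
        "Q3": [0.10, 0.20, 0.40, 0.20, 0.10],  # centered
        "Q4": [0.35, 0.15, 0.05, 0.15, 0.30],  # polarized
        "Q5": [0.20, 0.20, 0.20, 0.20, 0.20],  # uniform
    }

    data = {
        q: rng.choice(np.arange(1, 6), size=n_respondents, p=p)
        for q, p in profiles.items()
    }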


Bar Plot: Mean Values

Pros:

  • Easy to create
  • Simple to read
  • Q3 – Q5 distinguishable

Cons:

  • Hides a lot of complexity
  • Doesn't show spread
  • Creates false confidence in the shown values

Likert Scale Mean
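A plot like this takes only a few lines of matplotlib. A minimal sketch, using stand-in data in place of the distributions above:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    data = {f"Q{i}": rng.integers(1, 6, 200) for i in range(1, 6)}  # stand-in data

    means = [answers.mean() for answers in data.values()]

    fig, ax = plt.subplots()
    ax.bar(list(data), means)
    ax.set_ylim(1, 5)  # answers live on the 1-5 scale
    ax.set_ylabel("Mean answer")
    plt.show()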

Bar Plot: Mean Values and Standard Deviation

An extension of the first plot.

Pros:

  • Still simple
  • Introduces skepticism about the shown values
  • Hints at spread in data

Cons:

  • Hides a lot of complexity
  • Still doesn't show distribution

Likert Scale Mean + Std
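The only change over the previous sketch is the yerr argument (again with stand-in data):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    data = {f"Q{i}": rng.integers(1, 6, 200) for i in range(1, 6)}  # stand-in data

    means = [answers.mean() for answers in data.values()]
    stds = [answers.std() for answers in data.values()]

    fig, ax = plt.subplots()
    ax.bar(list(data), means, yerr=stds, capsize=4)  # error bars mark +/- one std
    ax.set_ylim(1, 5)
    ax.set_ylabel("Mean answer")
    plt.show()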

Violin Plot

Pros:

  • High Resolution
  • Shows distribution of data

Cons:

  • Harder to read
  • Noisy on small sample sizes
  • Shows data as being continuous

Likert Scale Violin
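Matplotlib ships a violin plot out of the box; a minimal sketch with stand-in data:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    data = {f"Q{i}": rng.integers(1, 6, 200) for i in range(1, 6)}  # stand-in data

    fig, ax = plt.subplots()
    ax.violinplot(list(data.values()), showmedians=True)
    ax.set_xticks(range(1, len(data) + 1))  # violinplot places violins at 1..n
    ax.set_xticklabels(list(data))
    ax.set_ylabel("Answer")
    plt.show()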

Vertical Histograms

Pros:

  • High Resolution
  • Shows distribution of data

Cons:

  • Harder to read
  • Shows data as being continuous

Likert Scale Histogram
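One way to build this is one horizontal histogram per question, placed side by side, so the answer scale ends up on the vertical axis. A sketch with stand-in data:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    data = {f"Q{i}": rng.integers(1, 6, 200) for i in range(1, 6)}  # stand-in data

    fig, axes = plt.subplots(1, len(data), sharey=True)
    for ax, (q, answers) in zip(axes, data.items()):
        # one bin per answer, bars drawn horizontally so the scale is vertical
        ax.hist(answers, bins=np.arange(0.5, 6), orientation="horizontal")
        ax.set_title(q)
    axes[0].set_ylabel("Answer")
    plt.show()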

Scatter Plot

Pros:

  • High Resolution
  • Actually shows the complete data
  • Shows distribution of data

Cons:

  • Hard to read
  • The added jitter and overdraw can distort the data
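Because the answers are discrete, the points have to be jittered to be visible at all; that jitter is exactly the noise listed as a con. A sketch with stand-in data:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    data = {f"Q{i}": rng.integers(1, 6, 200) for i in range(1, 6)}  # stand-in data

    fig, ax = plt.subplots()
    for i, answers in enumerate(data.values()):
        # jitter both axes so identical answers don't collapse into one point
        x = i + rng.normal(0, 0.08, len(answers))
        y = answers + rng.normal(0, 0.08, len(answers))
        ax.scatter(x, y, s=8, alpha=0.4)
    ax.set_xticks(range(len(data)))
    ax.set_xticklabels(list(data))
    ax.set_ylabel("Answer")
    plt.show()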

Scaled Dots

Pros:

  • High Resolution
  • Shows distribution of data
  • Shows data as discrete values

Cons:

  • Visual distortion of proportions
  • Humans can't easily compare circle sizes

Likert Scale Scaled Dots
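A sketch with stand-in data; note that the dot area, not the radius, is scaled with the count, which is the lesser of the two distortions:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    data = {f"Q{i}": rng.integers(1, 6, 200) for i in range(1, 6)}  # stand-in data

    fig, ax = plt.subplots()
    for i, answers in enumerate(data.values()):
        values, counts = np.unique(answers, return_counts=True)
        # s is measured in points^2, i.e. the *area* grows with the count
        ax.scatter([i] * len(values), values, s=counts * 10)
    ax.set_xticks(range(len(data)))
    ax.set_xticklabels(list(data))
    ax.set_ylabel("Answer")
    plt.show()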

Box Plot

Pros:

  • Looks scientific
  • Shows median and quartiles
  • Shows outliers

Cons:

  • Not designed for discrete data
  • Doesn't show distribution correctly (e.g. Q4 + Q5)

Likert Scale Boxplot
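Again essentially a one-liner in matplotlib, sketched with stand-in data:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    data = {f"Q{i}": rng.integers(1, 6, 200) for i in range(1, 6)}  # stand-in data

    fig, ax = plt.subplots()
    ax.boxplot(list(data.values()), labels=list(data))
    ax.set_ylabel("Answer")
    plt.show()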


#statistics #data #datascience #likert



F1-Score rises while Loss keeps increasing

I've recently run into a paradoxical situation while training a network to distinguish between two classes.

I've used cross-entropy as my loss of choice. On my training set the loss steadily decreased while the F1-Score improved. On the validation set the loss decreased shortly before increasing and leveling off around ~2, normally a clear sign of overfitting. However, the F1-Score on the validation set kept rising and reached ~0.92, with similarly high precision and recall.

As I had never taken a closer look at the relation between the F1-Score and the cross-entropy loss, I decided to do a quick simulation and plot the results. The plots show the cross-entropy in relation to the F1-Score. In the first graph I varied the range in which the misses landed, while hits were always perfect. The graph shows that misses with a high confidence create higher losses, even at high F1-Scores. In contrast, the confidence of hits has a negligible influence on the loss, as depicted in the second graph.
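The simulation is simple to reconstruct. Here is a minimal sketch of the idea (my own numbers and helper names, not the original code): hits predict the true class perfectly, misses assert the wrong class with a given confidence, and the cross-entropy grows with that confidence while the F1-Score stays put:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 10_000
    y = rng.integers(0, 2, N)  # balanced binary labels (assumed setup)

    def cross_entropy(y, p, eps=1e-9):
        p = np.clip(p, eps, 1 - eps)  # avoid log(0) for fully confident answers
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def f1_score(y, p):
        pred = (p > 0.5).astype(int)
        tp = np.sum((pred == 1) & (y == 1))
        fp = np.sum((pred == 1) & (y == 0))
        fn = np.sum((pred == 0) & (y == 1))
        return 2 * tp / (2 * tp + fp + fn)

    hits = rng.random(N) < 0.92  # fixed hit rate, roughly the F1 region above

    for miss_conf in (0.6, 0.9, 0.999):
        # hits predict the true class perfectly, misses assert the
        # wrong class with probability miss_conf
        p = np.where(hits, y.astype(float),
                     np.where(y == 1, 1 - miss_conf, miss_conf))
        print(f"miss confidence {miss_conf}: "
              f"loss={cross_entropy(y, p):.2f}, F1={f1_score(y, p):.2f}")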

It seems likely that my network developed a high confidence in its predictions, only answering 0 or 1, which provoked a high loss while still achieving a reasonably high accuracy.


F1-Score and Cross Entropy with varying Miss Ranges

F1-Score and Cross Entropy with varying Confidence