3PO-Labs: Alexa, Echo and Voice Interface

The Power of Alexa Skills Reviews

2/6/2016

Last week, we wrote about the rising trend of self-reviewing among Alexa skills.  That article argued that astroturfing is happening, and that it's decidedly uncool.  Today, I want to quantify the problem with a little bit of data.


tl;dr:  Amazon's ranking algorithm is imperfect and doesn't properly weight the number of reviews. As a result (and partially due to self-reviewing), single-review skills end up at the top of the list.  Further, our own metrics show a strong (and, we'd argue, causal) relationship between good reviews and user engagement.  Ergo, it's important to solve this problem ASAP.

How reviews are sorted

To understand why any of this matters, it helps to know how the current ranking algorithm for skills works.

In sorting by Avg. Customer Review, Amazon applies the following rules (sketched in code after the list):
  1. Sort by average review score, with the highest average coming first.
  2. Among all skills with a given average score, sort by number of reviews, with the most reviews coming first.
  3. Among all skills with a given average score and a given number of reviews, order the entries randomly (this may not be an explicit randomization; the entries may simply be unordered to begin with, with no additional sorting applied).
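Expressed as a sort key, that ordering might look something like the sketch below. This is just our reading of the observed behavior, not Amazon's actual code, and the skill data is made up for illustration:

```python
import random

# Hypothetical skill records, for illustration only (ratings and counts are
# not exact figures from the store).
skills = [
    {"name": "Daily Affirmation",     "avg_rating": 5.0,  "num_reviews": 3},
    {"name": "Presidential Trivia",   "avg_rating": 5.0,  "num_reviews": 2},
    {"name": "Some One-Review Skill", "avg_rating": 5.0,  "num_reviews": 1},
    {"name": "Buddy for Destiny",     "avg_rating": 4.93, "num_reviews": 14},
]

# Rule 3: entries tied on both fields end up in an effectively random order.
random.shuffle(skills)

# Rules 1 and 2: highest average first, then most reviews first.
# Python's sort is stable, so the shuffled order survives for exact ties.
skills.sort(key=lambda s: (s["avg_rating"], s["num_reviews"]), reverse=True)

for s in skills:
    print(f'{s["name"]}: {s["avg_rating"]} stars, {s["num_reviews"]} reviews')
```

Run it a few times and Buddy for Destiny lands at the bottom on every run, despite having by far the most reviews - exactly the behavior we'll walk through next.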

Let's look at how this actually works with current skills.  As of February 6, 2016, if you open up the list of skills and sort by Avg. Customer Review, this is what you will see at the top of your list:
[Image: top of the skill list sorted by Avg. Customer Review, February 6, 2016]
The top two skills currently are Daily Affirmation and Presidential Trivia, both of which have perfect star ratings.  Following them are seven additional skills, each with a single 5-star rating.  As noted in rule 3 above, those seven skills will be randomly reordered each time you reload the page.  If you look at the very last element on this page, however, you'll notice something interesting...
[Image: the last skill shown on the first page of results]
The final skill currently on the front page is one called "Buddy for Destiny" (which, incidentally, is one of the more unique skills available right now).  It shows itself as having a 5-star rating with almost five times as many reviews as Daily Affirmation. Shouldn't it be at the top?  Well, not exactly.  You see, Buddy for Destiny doesn't actually have a 5-star rating; the UI just displays ratings in half-star increments.  Its real average rating is actually "just" a 4.9:
[Image: Buddy for Destiny's listing, showing its true average rating of 4.9]
Buddy for Destiny has earned 69 of a possible 70 review points, which works out to an average rating of about 4.93.  That single missed point came from the following review:
[Image: the lone 4-star review of Buddy for Destiny]
Wow, that's harsh.  A user was "very impressed" and thought the skill was "Awesome", "brilliant", and "creative", yet still decided not to give full points?  And in so doing, they sabotaged Buddy for Destiny's ability to compete with far less-reviewed skills.¹
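To make the arithmetic concrete, here's a quick sketch using Buddy for Destiny's numbers as of this writing: thirteen 5-star reviews plus the one 4-star.  The half-star rounding is our assumption about how the storefront UI displays ratings:

```python
# Thirteen 5-star reviews plus a single 4-star review.
ratings = [5] * 13 + [4]

average = sum(ratings) / len(ratings)     # 69 / 14 ≈ 4.93
displayed = round(average * 2) / 2        # assumed half-star UI rounding -> 5.0

print(f"true average: {average:.2f}")                 # 4.93
print(f"displayed as: {displayed} stars")             # 5.0 stars
print(f"sorts below a perfect 5.0? {average < 5.0}")  # True
```

So the storefront shows five stars, but the sort sees 4.93, which means a skill with a single 5-star review wins on rule 1 before review counts are ever considered.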

Does Any of This Even Matter?

Now, obviously this is a flawed system.  Buddy for Destiny (and Automatic, the first skill on page 2 of the rankings) should clearly be at the top, with extremely strong scores spread across a much larger base of reviews.  To understand how much damage this does, though, we need to make the case that reviews actually matter in this ecosystem.  I believe I can do that using data from our own two skills, CompliBot and InsultiBot.
When we were building them, we expected InsultiBot to be the more popular of the two - its content was more interesting, we believed, and there was a lot more of it, to boot.  When we launched, that guess was validated.  You can see here that InsultiBot was roughly twice as popular as CompliBot:
[Graph: total utterances per day for InsultiBot vs. CompliBot, first week after launch]
While the above graph is "total utterances", the same thing held true for "unique customers" hitting our skills:
[Graph: unique customers per day for InsultiBot vs. CompliBot, first week after launch]
It's worth pointing out that the peaks here are weekends, when we regularly see much higher engagement.  That's for a future post, though!
Anyhow, everything went according to plan for that first week, but then we noticed something strange: CompliBot suddenly leapfrogged InsultiBot in both categories.  As it happens, this coincided exactly with the moment we got our first review, a 5-star review for CompliBot.  The metrics graph shows it very clearly:
[Graph: total utterances per day, showing CompliBot overtaking InsultiBot after its first 5-star review]
And, as before, the same ratios are observed in the unique customers count:
[Graph: unique customers per day over the same period]
Conveniently for the purposes of this article, the review came almost exactly a week after we first launched, which allows us to do a much better apples-to-apples comparison.² 

Whereas the values for InsultiBot stay fairly similar (week two's pattern and numbers are pretty close to week one's), CompliBot diverges completely starting on the 26th, with a continued upward trajectory that InsultiBot doesn't match.  It immediately becomes the more heavily used of the two by both metrics, and easily holds on to that mantle.

Ergo...

The takeaway here is that we have a very strong indication that average reviews - and more specifically, placement in the average-review rankings - have a significant effect on the usage a given skill sees.  As such, it is all the more vital that Amazon do three things:
  1. Quash astroturfing.
  2. Give us an avenue to work with those who give us negative reviews.
  3. Fix the ranking algorithm.  Right now, anything other than 5 stars is a "bad" review, and Alexa is certainly not the first platform to face this issue... (one common remedy is sketched below)
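For what it's worth, the usual fix for this class of problem is to rank by a confidence-adjusted score rather than the raw mean - for example, a Bayesian average that pulls skills with very few reviews toward a store-wide prior.  Here's a minimal sketch; the prior values are made up, and this is not anything Amazon has announced:

```python
# Bayesian average: blend a skill's mean rating with a global prior, weighted
# by how many real reviews it has. PRIOR_MEAN and PRIOR_WEIGHT are made up.
PRIOR_MEAN = 3.0    # assumed store-wide average rating
PRIOR_WEIGHT = 5    # every skill is treated as having 5 "phantom" average reviews

def bayesian_score(avg_rating: float, num_reviews: int) -> float:
    return (PRIOR_MEAN * PRIOR_WEIGHT + avg_rating * num_reviews) / (PRIOR_WEIGHT + num_reviews)

print(bayesian_score(5.00, 1))    # lone 5-star review -> ~3.33
print(bayesian_score(4.93, 14))   # Buddy for Destiny  -> ~4.42
```

Under that kind of scoring, Buddy for Destiny's fourteen reviews comfortably outrank a lone 5-star review, which is the ordering most people would expect.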
Here's hoping they act quickly on this front.  With the flood of trivia skills coming out of the "1-Hour Skill" tutorial, I'd expect a lot of the best skills to be buried deep in no time.

Let us know if you've run into any situations like this, or have similar supporting data.  You can drop a comment below, or hit the contact page to email us directly.
1: This morning, as we were writing this article, this exact thing happened to one of our skills.  CompliBot had a perfect 5 stars, tied for top overall with Daily Affirmation, until someone gave it a positive-yet-terrible 4-star review and dropped it to the second page of the rankings.  We mention this for full disclosure, though it had nothing to do with prompting this article, which was already in progress.
2: It's also worth noting that during this time period we made no changes to the services on our side, nor did we do any advertising or receive any official press. The only variable that changed during this period was the single 5-star review granted to CompliBot.  We clip the data at the 29th because, at that point, we got our first InsultiBot review, received an additional CompliBot review, and shipped our first content patch.

2 Comments
Kim
11/11/2016 05:20:03 am

Isn't there any way to measure the number of app downloads through the Alexa app? I think it would be perfect if you guys revealed that data with your fantastic review system.

Eric Olson
11/11/2016 07:40:41 am

Currently there's no way to measure enablement, although if you look at invocations from unique users that's a pretty good approximation. That doesn't actually tell you anything about how many people have it actively enabled at a given time, though.

On that note, one idea we pitched to them many months ago was that we should get a ping on enable/disable. That would allow us to keep track of how long people are keeping our skills active, but it would also allow for some helpful things like preprocessing if you need to do initialization for a user on their first visit, for example. Now that you've reminded me, I should add that to the new feature groups forum.
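(For the curious, that unique-users approximation can be as simple as counting distinct user IDs per day out of your own request logs. A rough sketch - the log format here is just an assumption on our part, not anything from the Alexa request schema:)

```python
from collections import defaultdict

# Hypothetical invocation log: (date, user_id) pairs pulled from our own
# request logging; user_id stands in for the per-skill user identifier that
# arrives with each request.
invocations = [
    ("2016-01-25", "user-a"), ("2016-01-25", "user-b"),
    ("2016-01-26", "user-a"), ("2016-01-26", "user-c"),
]

unique_users_per_day = defaultdict(set)
for date, user_id in invocations:
    unique_users_per_day[date].add(user_id)

for date in sorted(unique_users_per_day):
    print(date, len(unique_users_per_day[date]))
```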





    Author

    We're 3PO-Labs.  We build things for fun and profit.  Right now we're super bullish on the rise of voice interfaces, and we hope to get you onboard.


