Measuring and Managing Honesty in a Participant Pool Online

Intro

SocialSci has been cultivating our participant pool during our open beta period for the past 6 months.  We've grown our pool to thousands of participants, conducted dozens and dozens of studies, and produced well over 2 million answers for researchers using our platform. During this time we've learned a thing or two about how to gauge and manage participant honesty.  Many of the things discussed here have been implemented; others are under development or on the product road map.  Our goal is to provide a means for researchers to get their results faster by bringing their studies online, to the SocialSci platform, and letting us worry about acquiring participants, getting responses, compensating participants, and dealing with cheaters.

Framing our participants for honest engagement

Before starting this discussion, we should explain how SocialSci frames participants to be honest, and weeds out some who may be predisposed to lying from the get-go:

  • We differentiate ourselves from online marketing surveying by using the tagline "Real Science"
  • We require a valid cell phone number to limit account creation
  • We ask you to enter a username which doesn’t identify you
  • We ask for a password
  • We encourage you to sign up for our mailing list by entering your email address
  • As we don't associate your email address or phone number with your username, we ask you to answer 5 questions from a list of security questions, such as “How do you like your eggs prepared?”

If you are signing up for an account, you'll have to be truthful with at least your cell phone number. And if you ever intend to get back into your account, the username, password, and security survey questions should be something you know; it is easier to remember true answers here than falsehoods. [1] If you hope to continue taking studies and earning more, you'll likely sign up for our newsletter with a valid email address. We see a 30% click-through rate on each newsletter that goes out, which indicates a very high response rate and valid email addresses. Approximately 75% of our participants are on our mailing list.  Others may find out about new studies via Twitter or Facebook.

In SocialSci's system we offer participants points for completing studies when they're first released by researchers. We also offer 'archived studies', which are worth no points and carry only social rewards such as badges. Historically, 3.3% of participants in our system have been flagged for cheating, with only 0.3% of total users cheating a second time.

The information discussed here is based on our experiences and interactions with that subset of cheaters (still numbering in the hundreds), compared to the majority of honest participants. While I work with researchers on a daily basis, I am not a researcher, and so take this discussion in the context of a curious entrepreneur, not a rigorous scientific investigation.

Motivations for lying

I'll break these down into four broad categories: anonymity, curiosity, intention and desperation.

  • Anonymity: Those who believe that providing inconsistent answers helps to maintain their privacy or anonymity. Answers are provided to mislead.
  • Curiosity: Those who are just curious about our service or a particular study, and submit answers which are not intended to be serious. Many of these never actually complete the study.  They're not driven by attaining the points, but merely by understanding what a study is like, what we offer, etc. These are the people who answer 'human' to the ethnicity questions, or give ages of 2 and 1000. Mostly harmless, as such outlier responses are auto-flagged as invalid by us and researchers.
  • Intention: Those who create an account or take studies purely to maximize profit. These are some of the easiest to catch, because they magically qualify for EVERY study, even the ones which conflict.  They're male AND female, Caucasian AND African-American, a smoker AND a non-smoker!  Trying to maximize profit by answering dishonestly without being caught by our algorithm is a thought-provoking but inherently impossible challenge: You can't see into the future to know what studies we'll be offering, and at what point values, so you can't determine what your consistent false persona should be over time.
  • Desperation: These are survey takers who start out answering studies honestly, until they realize they're only one more survey away from getting a reward. They then begin to weigh whether they should lie to get more points and cash in for a reward.  This typically happens in one session, their first. Participants who have been taking studies honestly over time rarely become suddenly desperate and lie to take a study they do not qualify for.
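
To make the "qualifies for EVERY study" pattern concrete, here is a minimal sketch of a cross-study consistency check. It is an illustration only, not SocialSci's actual algorithm: the tuple layout, the STABLE_ATTRIBUTES set, and the find_conflicts name are all assumptions made for this example.

```python
from collections import defaultdict

# Attributes assumed (for illustration) to stay fixed for one person.
STABLE_ATTRIBUTES = {"gender", "ethnicity", "smoker"}


def find_conflicts(answers):
    """Return {attribute: set of values} for stable attributes with more than one value.

    `answers` is an iterable of (study_id, attribute, value) tuples.
    """
    seen = defaultdict(set)
    for _study_id, attribute, value in answers:
        if attribute in STABLE_ATTRIBUTES:
            # Normalize case and whitespace so "No" and "no" count as the same answer.
            seen[attribute].add(value.strip().lower())
    return {attr: values for attr, values in seen.items() if len(values) > 1}


history = [
    ("study-1", "gender", "Male"),
    ("study-2", "gender", "female"),
    ("study-2", "smoker", "no"),
    ("study-3", "smoker", "No"),
]
conflicts = find_conflicts(history)
print(conflicts)  # only "gender" is reported; the smoker answers agree
```

A real system would likely also need to handle attributes that legitimately change over time (age, for instance), which this sketch ignores.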

Delving more into desperation, an easy-to-understand pattern emerges: Even the best intentioned participants begin to run into an ethical quandary when presented with the prospect of rewards.  When points are offered on a study, a small subset of users choose to lie to gain additional points, primarily driven by reaching a goal of cashing out points.

The most common pattern we see for lying participants goes as follows:

  1. Participants sign up for a SocialSci account
  2. Either through announcements within our system or word of mouth, participants gain an expectation of earning a reward.
  3. They begin taking studies, and earning points
  4. They realize, for instance, that they're only one more study's worth of points away from redeeming for a reward.
  5. They also realize they don't qualify for the study -- by having read the title or the researcher’s terms agreement.
  6. They decide to 'become' who the researcher is looking for -- lie on the study, and attempt to earn points.
  7. They redeem their points for a reward.

From this, we can observe that lying is seldom a participant’s primary intent: They answered honestly on the first few studies -- the ones they did qualify for. They only started lying out of desperation to reach their reward goal.

Let's dive further into the patterns and minds of those lying due to intention or desperation, and how we detect that they're lying.

Lack of attention - Not quite lying

Despite having the intention of lying, few provide the proper attention required to do so well.  If ages change wildly, for instance, this is a clear example of lying through not paying attention. Plenty of people don't want to take the online survey process seriously, and so attempt to complete surveys as quickly as possible, putting in the first answers that pop into their mind, and blindly clicking away at the options.  This is the easiest to detect: wildly unrealistic survey completion times, skipping any answer possible, submitting pages without required questions answered, missing gold standard questions, and other easily caught errors.

Another phenomenon we see has to do with how closely participants pay attention to the researcher's survey terms agreement. Oftentimes researchers publish a long multi-page terms sheet that participants are expected to read and agree to before taking a study.  Most of the information contained within is of little significance to most participants, and so they accept the terms without reading them. Buried among those terms is a paragraph explaining who qualifies to participate in the study.

One researcher asked participants to confirm that they fit the demographic they had just agreed to in the terms. We saw a 9% reduction in the number of invalid participants continuing on in the study. That is nearly twice as many as those who lied on that question but stopped taking the survey at some point later (lying difficulty?), and about 80% as many as those who lied, completed the study, and were then caught by us.

SocialSci does this for many of our researchers. If our system doesn't have a high enough confidence based on past responses (or lack of responses for new users), we present the user with a qualifying study.  We don't tell them what the researcher is looking for. We just ask them to answer a few demographic questions, so we can determine if they qualify for that researcher's set of studies.  This is extremely effective: When people aren't provided information about what is being looked for, they have no incentive to answer differently than the truth.

The more directly participants are asked to confirm who they are, and whether they qualify, the higher the percentage who will disqualify themselves.

Consistent Lying is Difficult

Outside of SocialSci, I've observed that most people aren't good liars. They're especially not good consistent liars. This holds doubly true within SocialSci, where our system is designed to compare participants' responses over time for consistency.

Whether with full intention, or only out of desperation, sometimes participants will decide to lie, and having read the terms, will take the study not as themselves, but emulating who the researcher is looking for. This is easier said than done. Even if they start a study with the intent of lying, we've found that the further into the study people are asked for their demographic info, the more likely they are to forget to lie, and will provide the true info -- getting caught by our system.

For instance, say you're a white male, and you decide to take a study that requires you to be an African-American female, age 18-30, who is bisexual. We find that cheaters who are asked for this information within the first few pages of a study will lie effectively. If instead that information is collected on the last page of a 15 page, 20 minute study, they often will not remember who they are supposed to be. Without a way to go back and review the terms once they have started the study, they eventually fail at lying consistently to match the study requirements.

Detecting lying - Onion layer approach

Thankfully I don't believe sharing our approach necessarily lessens its effectiveness, so I’m happy to share. I've spent 5 years working in the computer security field, and have taken away a number of lessons which I've applied to our lie detection strategy.  Many people have the mindset that if something isn't 100% foolproof, it's not worth doing.  Nothing is 100%, and in the security world, we combat this with the concept of 'onion' layers of security. You layer one approach on top of another: If someone gets through one method, as it’s not 100% effective, they will have to contend with another layer, and so on, increasing the overall probability of detection. For example, a cheater taking surveys will try to optimize their monetary gain by taking studies faster as they become accustomed to what's being asked of them. We'll detect this as out of the norm. Or they may forget their persona for one of the studies, and we'll catch them. By having a multi-layered approach to detecting a lack of truthfulness, we can gauge whether someone made a one-time error, or has a history of deceitful answers and actions.
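
To show why imperfect layers still add up, here is a small worked sketch. It assumes the layers catch cheaters independently of one another (a simplification that real layers only approximate), and the per-layer catch rates are made-up numbers, not measurements from our system.

```python
from math import prod


def combined_detection_probability(layer_probabilities):
    """Probability that at least one layer flags a cheater, assuming independent layers."""
    for p in layer_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # A cheater escapes only if every single layer misses them.
    miss_all = prod(1.0 - p for p in layer_probabilities)
    return 1.0 - miss_all


layers = [0.5, 0.5, 0.5]  # hypothetical per-layer catch rates
overall = combined_detection_probability(layers)
print(f"{overall:.3f}")  # 0.875
```

Three layers that each catch only half of cheaters would, under the independence assumption, catch seven out of eight together.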

The final goal is that once we've added enough layers, or hurdles for someone to jump over, it is no longer economically viable (nor intellectually possible) for them to be smart enough, take enough time, and be persistent enough to clear all the hurdles just to get a $5 Amazon Gift card from us. And even if after all that, they still do? They won't be a large enough proportion of the users who participate in studies to skew the results.  Such lying exists in the offline world also.  Researchers take that into account. We're not promising a foolproof system; we're promising a much better system than what currently exists online for honest, payable and anonymous participants.

Lie detection is a tricky business, but with enough data and time, it becomes increasingly apparent whether a participant has lied. Some more examples of our onion layers:

  • Gold Standard Questions - Perhaps asking what color the sky is, or to type a word instead of choosing from the options.  If the user isn't taking the study seriously enough to correctly answer that question, then it's a red flag to the researcher for that result.  Our researchers do this. SocialSci does this.  We interject such questions along with our qualifying studies, and incorporate the results into our algorithm which determines the 'credibility score' of participants.
  • Hidden Complex Questions make it more difficult to lie: What race is your mother, what race is your father? -- Will the liar think through the complexity of ensuring that this correlates with their own self-identified race-lie in a different study? What about asking what year you were born vs. your age?
  • Liars lie fast. If a lie is prepared (as in you've taken the study before, and you're attempting to game our system) -- this is easily detected, and taken into account. Even attempts to game this, by taking far too long to submit answers will raise flags, as it will be outside of the standard deviation for question, page, and survey answer times. The only real way to not flag this is to honestly take the survey!
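
The timing layer can be sketched as a simple outlier test. This is a minimal illustration, not our production check: the timing_flag name, the cutoff of 2 standard deviations, and the sample times are assumptions, and real completion times are often skewed enough that a plain z-score is only a rough first pass.

```python
from statistics import mean, stdev


def timing_flag(participant_seconds, peer_seconds, max_z=2.0):
    """Return 'too_fast', 'too_slow', or None by comparing one time to peers' times."""
    if len(peer_seconds) < 2:
        return None  # not enough data to estimate the spread
    mu = mean(peer_seconds)
    sigma = stdev(peer_seconds)
    if sigma == 0:
        return None  # all peers took exactly the same time; z-score is undefined
    z = (participant_seconds - mu) / sigma
    if z < -max_z:
        return "too_fast"
    if z > max_z:
        return "too_slow"
    return None


peers = [1100, 1250, 1180, 1320, 1210, 1150]  # made-up completion times in seconds
print(timing_flag(180, peers))   # too_fast
print(timing_flag(1200, peers))  # None
print(timing_flag(3600, peers))  # too_slow
```

The same function could be applied per question, per page, or per survey, matching the three levels mentioned above.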

In the security world, there's a saying: 'As a defender, I must defend all avenues all the time from everyone. As an attacker, I must find only one chink in the armor at one point in time to gain access.'  I believe this concept similarly applies to catching liars: They need to make only one slip-up in one study to begin to arouse suspicion in our algorithm -- a far more difficult task for the cheater when compared to just being honest. Sir Walter Scott's famous couplet "Oh, what a tangled web we weave / When first we practice to deceive!" describes the often difficult procedure of covering up a lie so that it is not detected in the future.

Detecting lying - Human review

We combine our automated algorithm with human reviews of participants’ activities. We implement a 24 hour review period for all orders, as the largest incentive for lying is to profit through rewards. This gives SocialSci staff a window of time to review any flags on a participant, and always keep on top of new advances in ways participants may try to lie, or cheat the system.  Our staff has the ability to positively or negatively flag accounts with different tags, worth different points, much like SpamAssassin would do for spam emails, as part of a Bayesian filter. We combine these manual flags with the algorithm output for an overall profile of participants, and that feedback goes into future development iterations to improve our algorithms.
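
As a rough illustration of tag-based scoring, here is a sketch that sums weighted tags and compares the total against a review threshold. It is plain additive scoring, not a Bayesian filter, and every tag name, weight, and the threshold below is hypothetical.

```python
# Hypothetical tag weights: positive values raise suspicion, negative values lower it.
TAG_WEIGHTS = {
    "contradictory_demographics": 3.0,
    "too_fast": 2.0,
    "failed_gold_standard": 2.5,
    "shared_ip_hash": 1.5,
    "staff_verified_honest": -4.0,
}
REVIEW_THRESHOLD = 5.0  # assumed cutoff for holding an order for manual review


def suspicion_score(tags):
    """Sum the weights of all tags on an account; unknown tags count as zero."""
    return sum(TAG_WEIGHTS.get(tag, 0.0) for tag in tags)


def needs_review(tags):
    """True when the combined score reaches the review threshold."""
    return suspicion_score(tags) >= REVIEW_THRESHOLD


account_tags = ["too_fast", "contradictory_demographics", "shared_ip_hash"]
print(suspicion_score(account_tags))  # 6.5
print(needs_review(account_tags))     # True
```

Negative weights let a staff member's positive flag offset automated suspicion, which mirrors the idea of flagging accounts both positively and negatively.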

Notifying participants of cheating

Based on all these tactics for detection, we take a variety of actions to alert participants.  We have two competing incentives: It costs us to acquire participants, and we want to have a large and diverse pool of participants for our researchers to access.  At the same time, we cannot tolerate having poor quality participants, who lie, or do not otherwise take our researchers’ studies seriously.  We take a graduated approach of notifying participants about the importance of being honest, and taking the studies seriously.  We hope to correct any poor behavior early and often, and aim for improvement by notifying participants of disqualification while taking a study, or when redeeming rewards. Participants are given opportunities to improve their credibility score by taking archived studies, which are not worth points. If warning them is not possible, or the behavior is egregiously fraudulent, we may be forced to block them from redeeming rewards, or participating in future studies. We always leave the door open for participants to discuss these decisions with us, and contest any account flags.  Sometimes we or our algorithms make mistakes, and we wish to manually correct this, and improve the algorithms.

Priming participants for honesty, and keeping them honest

Based on everything discussed, and the patterns and behaviors we see in participants, there are a number of things we can do to encourage honesty within the system, and quickly correct dishonesty.

Our biggest challenge is separating ourselves from other surveying online.  We are not marketing surveys. We're real scientific studies. Your answers matter: They affect the researchers’ science.  We care about participants’ honesty, researchers care, and we want them to care!  

For starters, we can ensure that we have sufficient new studies available for any new participant, or when notifying our existing pool. If they are able to earn enough points to cash in for a reward, then they have less incentive for opportunistic lying.  For those who have had poor experiences with online surveying in the past, this helps us gain credibility with participants, and begin changing their mindset.

As discussed earlier, we can also improve the survey terms agreement. We can emphasize specific requirements, and point-blank ask participants to agree to being the age and gender called for.  Many people who do not qualify will not continue when prompted in this way.

Case Study: Mass cheating -- what researchers experience online today

Just before publishing this post, we had our most egregious case of cheating yet: One or more persons created over 160 accounts, participated in studies, and submitted orders for rewards. The spoils for their efforts? A grand total of $5.

Combating situations like this is precisely why we designed SocialSci. We want researchers to be able to conduct their research, and get trustworthy results fast -- without having to play the cat and mouse game with cheaters, or design complex systems as SocialSci has: Let us worry about that.

Our onion layers of protection worked as intended, and enabled us to catch this cheater, and prevent them from skewing results, or pilfering our bank accounts. Here's what they did, and how we caught them.

Just like any user, they signed up for a new account by receiving a text message from us, but at the time didn't qualify for any studies.  A few days later we announced new studies and they signed up for another account. They began taking studies, finding two studies they qualified for, and earned enough points to redeem for a gift card. As everything checked out with their account, our system awarded the gift card 24 hours later.

In the meantime, the prospect of illicit gains was just too much for this user, and they decided to try to game the system.  They found an online SMS platform where they could receive messages from us, and signed up for multiple accounts.  As the accounts, their responses, and their orders piled up, the flags began going off.  We could identify them all as rogue: The only question was how they were signing up for multiple accounts.  For debugging purposes, we further obscure phone numbers only after a short period of time has passed since they were submitted. By reviewing the recent phone numbers of new accounts, we saw an unusual pattern: many numbers in the same area code and exchange. We called one of them and were greeted with an automated message from an online texting service.  We then visited their website (they shall go unnamed) and reviewed the service. We've since followed their guidelines to submit our texting numbers to their block list, to prevent this service from being used in such a way in the future.
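
The "same area code and exchange" pattern can be sketched as a count of recent signups per six-digit prefix. This is an illustration under stated assumptions: it only handles 10-digit North American numbers, it assumes the raw numbers are still available during the short debugging window, and the function names and the min_count cutoff are made up for the example.

```python
from collections import Counter


def npa_nxx(phone):
    """Return the area code + exchange (first six digits) of a 10-digit NANP number."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None  # not a number this sketch knows how to handle
    return digits[:6]


def crowded_prefixes(phones, min_count=3):
    """Return prefixes shared by at least `min_count` recent signups."""
    counts = Counter(p for p in map(npa_nxx, phones) if p is not None)
    return {prefix: n for prefix, n in counts.items() if n >= min_count}


recent = ["+1 (203) 555-0101", "203-555-0102", "2035550103", "415-555-0199"]
print(crowded_prefixes(recent))  # {'203555': 3}
```

A crowded prefix is only a hint, since neighbors and coworkers share exchanges too; in our case it was the phone call that confirmed the texting service.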

But what did the user do that led to us being able to identify them?

  • For starters, they conducted much of their business from the same IP addresses. We don't store IP addresses, but we do store a one-way bcrypt hash of them for comparison purposes.
  • Similarly, they would log out of one account and right into another: a clear pattern of cheating.
  • The survey completion time was too short: theirs was 3 minutes vs. an average of 20 minutes.
  • They provided very similar responses: They were always a white homosexual male.
  • They chose the same studies to complete, and in the same order.
  • The human element: We could visually see a pattern in the username creation that was not normal.
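
Comparing accounts by a one-way digest of the IP address, rather than by the address itself, can be sketched as follows. Note the swap: we use bcrypt, but this example uses HMAC-SHA256 from the Python standard library as a stand-in so it runs without extra packages. The key, function names, and min_accounts cutoff are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
import hashlib
import hmac
from collections import defaultdict

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key


def ip_fingerprint(ip_address):
    """Keyed one-way digest of an IP address; the raw address is never stored."""
    return hmac.new(SECRET_KEY, ip_address.encode("utf-8"), hashlib.sha256).hexdigest()


def accounts_sharing_ip(logins, min_accounts=3):
    """Group account names by IP fingerprint and return the crowded groups.

    `logins` is an iterable of (account, ip_address) pairs.
    """
    by_fingerprint = defaultdict(set)
    for account, ip_address in logins:
        by_fingerprint[ip_fingerprint(ip_address)].add(account)
    return [sorted(accts) for accts in by_fingerprint.values() if len(accts) >= min_accounts]


logins = [
    ("user_a1", "198.51.100.7"),
    ("user_a2", "198.51.100.7"),
    ("user_a3", "198.51.100.7"),
    ("someone_else", "203.0.113.9"),
]
print(accounts_sharing_ip(logins))  # [['user_a1', 'user_a2', 'user_a3']]
```

The digest must be deterministic for equality comparison to work: the same address always has to map to the same value. Shared addresses (a campus or a household) make this a signal to combine with the others in the list above, not proof on its own.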

Our 24 hour review period worked exactly as intended: We were able to mass block the accounts, deny the orders, and flag the responses as invalid, without paying out a single dime for their cheating ways.

Conclusion

Although detecting and dealing with cheaters can be a difficult business, it is also a rewarding one. We're creating the best system online for researchers to get responses quickly and inexpensively for their studies, and trust in the answers they receive. We've already helped researchers around the globe get their results faster, and start writing up their results sooner, instead of spending endless months hunting down more participants, dealing with the hassles of collecting their responses, and compensating them.

Despite covering many of our strategies here, we have many more built into the system, and many more to come. Let us worry about this, and we'll let the researchers focus on their research.  We look forward to continuing to explore the challenges of running an honest, payable, and anonymous online pool of participants and sharing our findings with the research community.

- Mike

& The SocialSci Team
 

References:
[1] Self-relevance effect
Anthony G. Greenwald, The totalitarian ego: Fabrication and revision of personal history, American Psychologist, Volume 35, Issue 7, July 1980, Pages 603-618, ISSN 0003-066X, DOI: 10.1037/0003-066X.35.7.603.


The value of your start-up being mentioned on Reddit

Recently, SocialSci was recommended in a Reddit thread titled “Does anyone know of a reputable website where you can actually make money?”

In less than 48 hours, we had over 2000 Reddit-reading visitors directed from the thread and just over 750 participant signups -- a remarkable overnight feat, since our signup requires users to authenticate via SMS and answer 5 security survey questions.  It is worth noting that the thread never made it to the front page of Reddit! These new users helped our researchers answer over 180,000 scientific questions in 2 days!

What we think happened…

The subreddit, http://reddit.com/r/frugal, is a 68K person community, which means it has a serious following.  /r/frugal is a targeted subreddit where people are conscious about money saving techniques. This consciousness appeals to people who are motivated to take action and save/earn money. Its top rated threads will be instantly placed before, potentially, 68K people.

I first noticed that SocialSci.com was the top-voted comment in a thread late Saturday night. This particular thread topic is common on /r/frugal and I assumed SocialSci would be buried. I awoke to a thread with 200+ upvotes and a healthy discussion about SocialSci. It’s the discussion, I feel, that added to our trustworthiness and motivated people to register for SocialSci.

The thread had many of our users chiming in, answering questions, and revealing how much they have actually made.  These users were most likely recruited from one of the many Reddit ads we have run in the past. The combination of user validation, top comment position, and the thread being in a popular subreddit led to a nice surge in participant signups at a 37% conversion. 

SocialSci.com offers a way for academic researchers to survey participants from around the world. We wanted to share these numbers to add to the overall knowledge of what being mentioned in a somewhat popular social thread could achieve.

-Leon 

& The SocialSci Team

 

 

 

The SocialSci Way: Part 1

Here's a likely scenario that a lot of academic researchers find themselves in:

Researcher X decides to post a market research study to an online community. To qualify for the study, a participant must be a female between the ages of 18-24 who uses cigarettes daily and has a valid email address. This ten-minute survey, when completed, rewards the user with cash or gift card incentives. The researcher believes he'll get a good random sampling of results, because the online community seems highly populated with active users representing various demographics. Seems simple enough.

This is what really happens. The study gets posted, and certain users with a more entrepreneurial bent decide to game the system and take the survey using multiple accounts. Fake email addresses are created, false data is entered into the survey, and the particular user is richer by multiples of the intended reward. Meanwhile, the researcher's data has been compromised, and the responses of other legitimate users are wasted. It's no wonder that academic researchers are skeptical of posting studies online.

We here at SocialSci have experienced the above problem first-hand as academic researchers -- and while it was aggravating, it drove us to create a solution. When researchers post studies to our website, they can feel confident that SocialSci users are not gaming the system. We raise the barrier to entry by requiring unique user identification, ensuring an individual can only create a single account. Once the user is in the system answering survey questions, our algorithms allow us to track that user's responses over time, forming the basis for our vetting system. For example, if a respondent claims to be a male one week and a female the next, our system takes notice, notifies us and the researcher, and lowers the user's quality score. Thus we are able to help researchers target the demographics relevant to their study, while simultaneously eliminating bias from the sample. Our vetting system also allows researchers to create studies without ever having to bias a respondent by asking qualifying questions -- ensuring valid responses. No longer does conducting research surveys online jeopardize your results or grant funding!

Contact us and try creating a survey on the site today. As always, we look forward to your feedback.

“Please take my survey!” No more!

Have you ever tried taking an academic survey? How did you find it? Were you attracted by the glitzy advertisements on the subway, or accosted by a messy undergraduate on the street, begging for thirty minutes of your time? Maybe you're a researcher, praying to the Research Gods that your sample selection will reflect the target population, doing wonders for your project and helping it get published in Science! With limited financial resources and strong pressures to publish, academic researchers use all kinds of methods to gather data, most commonly resorting to paper and mail-in surveys and rallying subjects into labs to complete computerized surveys. Though practical, traditional methods such as these have their own challenges, namely time, cost and poor response rates. Not so optimal!

This is where online surveys come in. Easy and quick to carry out, these surveys can be administered to a large number of participants in minutes, while eliminating the need to manually enter the responses into data-analysis software. On top of making the data easily accessible, survey websites like SocialSci also provide the researchers with analytical and statistical tools to analyze and process the gathered data immediately. All of this results in substantial savings of both time and money. Maybe the prospects of that article appearing in Science aren't so bleak after all.

Though online surveys have many advantages, they retain some of the challenges faced by traditional methods, mainly that of sample selection. Reviewers and readers are right to scrutinize the quality of the traditional participant pool, since settling for the readily available college age, white, male sample isn't really adequate. While the population connected to the Internet, numbering more than 75% of the US as a whole, may be more representative of the population at large, researchers still find it difficult to obtain the random sample population that they often require for their research. The sample selection problem could be solved if researchers had access to a central repository of users from which representative random samples could be selected. This is where services like SocialSci step in, providing researchers access to an exhaustive database of informed, consenting participants. Accessing less visible and decentralized groups has become easier!

Being academic researchers ourselves, we have experienced the above challenges in our everyday research lives and have become frustrated with the status quo. Our motivation to make our own lives easier, while also furthering science, helped us develop SocialSci as the solution. It provides both researchers and participants, on a global scale, a platform to engage with science and with each other. This commitment to science motivates us every day to make our product better and make your experience with research and accessing science as seamless as possible. Enjoy our product, and let us know what you think.

Welcome!

Hi, everyone!

To kick this blog off, we've decided to tell you a little more about our project and where we are heading.

SocialSci started off as an invention of undergraduate researchers at Yale. We were tired of the current participant recruitment process and wanted to have the ability to reach a global population, not just the limited pool of college students who wandered over to our survey booths. Therefore, we started SocialSci with the hope of creating an online platform for scientists to reach participants for scientific surveys and in-lab studies.

We are working on some revolutionary technology that will enable us to provide an anonymous, honest, and payable group of participants online. We are also tying this pool to cutting-edge survey technology and real-time analytics (actual analytics like ANOVAs, chi-squared tests, and correlations in the browser). We hope to become an academic software power-house and to further science by bringing outdated methods up-to-speed and online.

If you have any questions, comments, ideas, concerns, poems, or anything at all to say, we would love to hear from you.

Thanks for taking a look, and we hope SocialSci will evolve into a service you are excited to use!

 

Sincerely yours,

-The SocialSci Team