Data Heroes is a blog series spotlighting data industry thought leaders and digging deeper into the benefits and challenges of managing, governing, and leveraging corporate data every day.
I'm Erick Watson, VP of Corporate Development at Quantarium, a big data startup owned by Xome, which is in turn owned by Nationstar Mortgage (currently rebranding itself as Mr. Cooper). We are the data science arm of Mr. Cooper, employing data scientists from all over the world to manage and parse real estate data.
We aggregate and curate residential real estate data to help buyers and sellers accurately price their homes. It's called an Automated Valuation Model, or AVM, and it's a type of data model commonly used in the mortgage industry to predict the value of a property. If you've ever visited Zillow or Redfin, you've probably seen that you can look up the value of your own home. That estimate is provided by an algorithm like ours.
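To make the idea concrete, here is a deliberately minimal sketch of how an AVM-style estimate can work: blend the price-per-square-foot of nearby comparable sales, weighting recent sales more heavily. The feature set, weighting scheme, and numbers are all hypothetical; Quantarium's QVM is far more sophisticated than this.

```python
# Toy AVM sketch: estimate a home's value from comparable sales.
# Illustrative only -- not Quantarium's QVM. Features and weights
# here are hypothetical.

def estimate_value(subject_sqft, comps):
    """Weight each comparable sale's price-per-sqft by recency,
    then apply the blended rate to the subject property."""
    total_weight = 0.0
    weighted_rate = 0.0
    for sale_price, sqft, months_ago in comps:
        weight = 1.0 / (1 + months_ago)  # newer sales count more
        weighted_rate += weight * (sale_price / sqft)
        total_weight += weight
    return subject_sqft * weighted_rate / total_weight

comps = [
    (450_000, 1800, 2),   # (sale price, sqft, months since sale)
    (510_000, 2100, 5),
    (395_000, 1600, 11),
]
print(round(estimate_value(1900, comps)))
```

Production AVMs replace this hand-rolled weighting with learned models over hundreds of property, neighborhood, and market features, but the core task is the same: map property attributes to a predicted price.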
It's easy to get public data about a property, such as the last few people who purchased the home and how the value of it has changed over time. What’s more challenging is discovering the data about the people who have interacted with the property and predicting their behavior. For example, what were the credit ratings of the last few people who purchased the home, and are they likely to put their home up for sale soon?
Another significant challenge is in curating the data. Most people are familiar with the MLS that realtors use to list homes for sale. While the MLS provides a lot of data, it's (a) expensive and (b) delivered in different formats in different regions. Therefore, a big challenge is normalizing these huge quantities of disparate data from different MLSs and integrating them with other sources.
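A common approach to that normalization problem is to map each regional feed's field names onto one canonical schema. The sketch below uses two made-up regional feeds and invented field names purely for illustration:

```python
# Sketch of normalizing listings from two regional MLS feeds into one
# canonical schema. Feeds and field names are hypothetical.

def normalize(record, field_map):
    """Rename source-specific fields to canonical field names."""
    return {canonical: record[source]
            for canonical, source in field_map.items()
            if source in record}

# Two hypothetical regional feeds naming the same data differently.
northwest = {"ListPrice": 425000, "SqFtTotal": 1750, "PostalCode": "98052"}
southeast = {"price": 310000, "living_area": 1400, "zip": "30301"}

nw_map = {"price": "ListPrice", "sqft": "SqFtTotal", "zip": "PostalCode"}
se_map = {"price": "price", "sqft": "living_area", "zip": "zip"}

listings = [normalize(northwest, nw_map), normalize(southeast, se_map)]
print(listings)  # both records now share one schema
```

At scale this gets much harder: beyond renaming, you must reconcile units, vocabularies (e.g. property-type codes), and data-quality rules across hundreds of feeds before the records can be merged with other sources.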
We aggregate millions of home ownership records from around the U.S., because in this business, it's all about coverage and accuracy. Think of a cell phone provider bragging about its network coverage – it must provide solid coverage nationwide, or it will lose in markets where its coverage is weaker. AVMs are similar; it's like an arms race. We must cover most homes in the U.S. with a high degree of accuracy just to meet the minimum bar.
Fortunately, we have a bit of an edge thanks to our industry-leading data scientists. Our algorithm, the Quantarium Valuation Model (QVM) is a highly accurate AVM. In the latest available blind test conducted by one of the industry's leading independent AVM evaluators, Quantarium placed first in measurements of both accuracy and coverage. In addition to placing first overall, three of our models were in the top four in terms of accuracy, and three were rated in the top five in terms of coverage.
My background for the last several years has been product development. Previously, I was with another startup called Moodwire, which performs text analytics on enormous corpora of text. Moodwire gathers text from Twitter, Facebook, and tens of thousands of other sources, essentially converts that text into numbers for mathematical manipulation, creates a detailed knowledge graph of the concepts and things mentioned in the text, and adds analytics on top to more easily understand and sift through the data.
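The first step of "converting text into numbers" is often a bag-of-words representation: each document becomes a vector of word counts over a shared vocabulary. This is a generic illustration of that step, not Moodwire's actual pipeline, which layers knowledge graphs and analytics on top:

```python
from collections import Counter

# Minimal bag-of-words sketch of turning text into numbers.
# Illustrative only; real pipelines add normalization, embeddings,
# knowledge graphs, and more.

def vectorize(texts):
    """Map each document to word counts over a shared vocabulary."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for doc in tokenized for w in doc})
    return vocab, [[Counter(doc)[w] for w in vocab] for doc in tokenized]

vocab, vectors = vectorize([
    "great product great service",
    "terrible service",
])
print(vocab)    # shared vocabulary, alphabetical
print(vectors)  # one count vector per document
```

Once documents are vectors, standard mathematical tools (similarity measures, clustering, classifiers) can be applied to sift through the corpus.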
For example, the recent U.S. Presidential Election was a hot topic a few months ago. Moodwire’s data scientists actually predicted the outcome of the election months in advance – our data showed that Trump would win the election handily. However, at that time, this data went against general public perception, the mainstream press and almost every political poll released at that time, so we simply assumed we must have made a mistake somewhere in our data processing. Our lack of self-confidence caused us to stop short of publishing our results. Meanwhile, we kept looking for errors to see where we’d miscalculated in our data, but we simply couldn’t find it!
In retrospect, we should have just boldly published our predictions and dealt with the consequences. Lesson learned: if you’re confident in your data and your processes, you should stand by them.
What we did do was publish a scientific paper that showed how the data changed over time. You could clearly see some patterns that correlated with the occurrence of the Presidential Debates, for example. The most interesting bit we gleaned from our data was that most people who made their voices heard publicly disliked Clinton to a much greater degree than they disliked Trump. So basically, neither one of the candidates was well loved, but one was essentially less hated than the other.
We also discovered some monkey business going on with Twitter manipulation, most likely bots used by both parties to get their points of view out to the public. The Trump Campaign made much more effective use of those technologies, which is why social media often pointed more favorably towards Trump. Overall, our data spoke to the general negative malaise that people have about the U.S. political system today, as well as to the candidates themselves. In general, people felt they were forced to choose between the lesser of two evils. You can see this more recently in the fact that Trump has the lowest favorability rating of any incoming U.S. President in many, many years.
The biggest challenge is understanding that intersection between people's personal lives and their use of property, and getting that understanding while still respecting individuals' privacy. Consider for a moment why someone might buy or sell a home. Typical reasons include major life events such as marriage, divorce, death, a new job, or children leaving the home. All these events often trigger a person's decision to buy or sell a home. Our job is to predictively figure out which homes are going to be bought or sold, and at what price, so that we may contribute to a more efficient marketplace and ease the burden of real estate transactions.
One key question is how to ethically and accurately get information from people about where they are in their various life stages, while ensuring that information isn't personally identifiable. For example, it's helpful to us to know how many people in a certain zip code are about to get married, which would indicate that some homes are likely soon to be bought and sold. We don't need to know who is getting married; we just need to know how many people in a certain area are, so we can add that data to our model.
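One simple way to keep such signals non-identifiable is to aggregate counts per zip code and suppress any zip whose count is too small to protect, since tiny groups are easier to re-identify. The event records and the suppression threshold below are hypothetical:

```python
from collections import Counter

# Sketch of aggregating life-event signals by zip code without keeping
# personally identifiable information. Records and the suppression
# threshold are hypothetical.

def aggregate_by_zip(events, min_count=5):
    """Count events per zip code, dropping zips below a minimum count
    so small groups cannot be re-identified."""
    counts = Counter(zip_code for zip_code, _event in events)
    return {z: n for z, n in counts.items() if n >= min_count}

events = [("98052", "marriage")] * 7 + [("30301", "marriage")] * 2
print(aggregate_by_zip(events))  # the small 30301 group is suppressed
```

Only the aggregate counts ever reach the model; individual records can be discarded as soon as they are tallied.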
Our dream goal is to know enough about our customers to be able to recommend a list of ideal homes for them. Through a combination of our own analysis and private information that customers are willing to temporarily share with us, we can generate a list of potential homes or, if you're selling, optimize the price and conditions under which you should sell. The real estate market is highly regulated, so the process of buying or selling a home tends to be more complex and costly than most consumers would like. This regulatory burden is one of the reasons why Residential Brokers have become something of a priestly class of persons who intermediate between buyers and sellers. Consumers want lower transaction costs, and the Internet has had an incredibly democratizing effect on real estate data. In response to these trends, the real estate industry is compelled to move toward providing less expensive, more streamlined, and more transparent processes.
Recall when you shopped for your first home: you likely had a list of criteria, such as how much you could afford, the distance from your work, and neighborhood amenities. Historically, Residential Brokers were the intermediaries that helped you identify these criteria and make your purchase decision. Now, empirical data is available that can help you make an even better, more informed decision. It's kind of like TurboTax, which is software that enabled some people to replace their accountant and do their own taxes. Consumers in residential real estate now have that option: you can pay a professional realtor to do all the work for you, or you can use productivity software and do it yourself. Companies like ours are creating more choice for consumers in the residential marketplace.
In the future, I imagine a world where a customer could approach a house viewing with a cell phone, wave it over a lockbox to be identified and admitted, and then hear an annotated home tour through an app on their phone. Even now, there is a trend towards video showings, thanks to new panoramic camera apps that allow for personal narration on the part of the seller. The audio and video can stream to your phone as you walk through the house, and your location kicks off a new commentary about the features of that room.
One trend that was a big surprise to me personally was just how much information you can learn from an individual's digital breadcrumbs. We all leave these digital breadcrumbs around us, and most people don't fully grasp the resulting threat to their privacy. Even as someone in the data business, I was shocked to see how much you can learn about someone legally, without that person even being aware of what they are giving up about their own privacy. It's become very easy to find out quite a lot from the data we leave behind in our everyday transactions.
I've been fortunate not to encounter any bad actors yet, but I would like to raise people's awareness about privacy concerns. Both data scientists and consumers tend to be more concerned with getting whatever information or service they are after than with the privacy issues involved. The more we can do to protect and safeguard our privacy, the healthier we will be as an enduring liberal democratic society. Don't take your privacy for granted!