Cars.com site analysts knew something was up about seven years ago when they saw traffic suddenly surge on BMW listings in Albany, N.Y.
A little digging found that the activity was no cause for celebration. It wasn't coveted in-market car shoppers who were driving up page views on dealers' inventory listings but rather Internet bots -- snippets of nefarious software code masquerading as human web users -- that had invaded the site and skewed its traffic metrics.
It was the opening salvo in a cyberwar that Cars.com is still fighting today, against an enemy that is only becoming more sophisticated.
"That's when we started doing something about it," William Swislow, Cars.com's chief information officer, said of the bot menace. "We spend a lot of time and creativity basically recognizing when someone who appears to be accessing the site is a real person or a bot."
The risk goes beyond skewed numbers. Site managers say bots can swipe inventory listing data for reuse by outside parties and can even skim off ad revenue that's generated from online clicks.
For shopping sites such as Cars.com, TrueCar and AutoTrader, whose listings of vehicles in dealer inventories attract millions of legitimate visitors a month, it is critical to weed out shady players and protect the integrity of the data they're reporting in order to sustain their relationships with dealers. Any distortion of the metrics could confound dealerships that are seeing heavy activity on the sites but not getting a proportionate number of leads.
The sites say the reports they provide to dealers -- with metrics such as how many e-mail leads and phone calls were generated by a listing and how many times vehicle detail pages were clicked -- are scrubbed clean of bot activity. But figuring out what the bots are after and how to beat them is a constant challenge.
"It's hard to know who they are in many cases and what they want to do with the data," said Scott Hernalsteen, AutoTrader's senior director of enterprise analytics.
"If you look at the success of AutoTrader over the years, we've built up a large number of listings," he said. "They could be trying to acquire [the information], accumulate it to repackage it and sell it to others. At the end of the day, we typically don't know."
Precisely how much online traffic is generated by bots -- and how many marketing dollars are wasted on these bogus users -- isn't known either. Estimates of the wasted ad spending range from $1 billion a year to nearly $12 billion.
Solve Media, a digital advertising and security firm, said its research found that suspicious web traffic rose 40% in 2013, with bot traffic peaking at 61% of all traffic during the fourth quarter.
Bogus sites, stolen dollars
In a common scheme, criminals create a network of bogus web sites. They can be one-page junk sites, or more sophisticated sites that look legitimate because they've copied content from other sources, said Jeffrey Tognetti, product development team lead at DealerX, a digital marketing company.
The site operators create a bot and send it to car shopping or dealer sites in the hope that it will be recognized as a potential buyer. If that happens, an ad retargeting platform -- a service that shows ads to people on the web sites they visit based on their online shopping history -- could key in on the bot and follow it to the fake site, where it would display an ad.
As a result, "the person who owns the real estate, who owns that web site, gets paid . . . for the impression," Mr. Tognetti said. "And then often the bot itself clicks on the ad and gets paid for the click."
As efforts to detect them have become more sophisticated, so have the bots themselves. AutoTrader's Mr. Hernalsteen said his site receives tens of thousands of attempts to copy content each day and uses an array of filters to catch bots that make it past its initial wave of defense.
The bots sometimes originate in foreign locales, but their creators can disguise them as U.S.-based users to fool web sites that are equipped to block foreign traffic.
One clear warning sign for shopping sites is browsing speed: Seeing a single IP address, the numeric identifier assigned to a device or network connection, hit hundreds of pages within a minute is a dead giveaway that a bot is at work. But some bots are being designed to closely mimic human behavior by browsing more slowly to avoid getting flagged.
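The browsing-speed check can be illustrated with a minimal sketch. It is not any site's actual filter: the function name, the 60-second window and the 100-page threshold are assumptions chosen for illustration, and the input is assumed to be a time-ordered list of (timestamp, IP address) page loads.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60          # assumed window length
MAX_PAGES_PER_WINDOW = 100   # assumed threshold, not a figure from any site


def find_fast_ips(requests, window=WINDOW_SECONDS, limit=MAX_PAGES_PER_WINDOW):
    """Return the IPs that loaded more than `limit` pages inside any window.

    `requests` is an iterable of (timestamp_in_seconds, ip) pairs,
    sorted by timestamp.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in requests:
        hits = recent[ip]
        hits.append(ts)
        # Drop page loads that have aged out of the window ending at `ts`.
        while hits and ts - hits[0] >= window:
            hits.popleft()
        if len(hits) > limit:
            flagged.add(ip)
    return flagged
```

A bot that paces itself below the threshold would slip through a check like this, which is why speed alone is not enough.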
Blocks and limits
John Williams, TrueCar's senior VP-platform operations, says the site counters these bots with anomaly-detection software developed in-house. The software identifies patterns in the traffic, in some cases flagging similar IP addresses that appear to be working in concert. If addresses appear to be bots, Mr. Williams said, the site can block them or impose strict rate limits on their browsing.
A rate limit means there is a maximum rate at which a given set of addresses can load pages, Mr. Williams says. An error is returned to the user when the limit is reached.
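A rate limit of this kind can be sketched in a few lines. This is a simplified fixed-window limiter, not TrueCar's software: the class name, the grouping key and the use of HTTP status 429 ("Too Many Requests") as the returned error are all assumptions made for illustration.

```python
import time


class RateLimiter:
    """Fixed-window limiter: at most `max_pages` loads per `per_seconds`
    for each group of addresses (hypothetical design)."""

    def __init__(self, max_pages, per_seconds, clock=time.monotonic):
        self.max_pages = max_pages
        self.per_seconds = per_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.windows = {}       # group key -> (window_start, pages_loaded)

    def check(self, group):
        """Return 200 if the page load is allowed, 429 if over the limit."""
        now = self.clock()
        start, count = self.windows.get(group, (now, 0))
        if now - start >= self.per_seconds:
            start, count = now, 0   # window expired: start a fresh one
        if count >= self.max_pages:
            self.windows[group] = (start, count)
            return 429              # assumed error code for "limit reached"
        self.windows[group] = (start, count + 1)
        return 200
```

Keying the limiter on a group rather than a single address reflects the point that a set of addresses working in concert can be throttled together.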
TrueCar also spots suspicious patterns by analyzing factors such as how many car makes a user looks at before buying.
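One way to picture this kind of analysis is a pass over browsing events that counts distinct makes per user and sets aside anyone who went on to buy. The sketch below is hypothetical: the event format, the function name and the threshold are assumptions, not a description of TrueCar's system.

```python
from collections import defaultdict


def flag_window_shoppers(events, max_makes=300):
    """Return users who viewed at least `max_makes` distinct makes
    and never purchased.

    `events` is an iterable of (user_id, action, make) tuples, where
    action is "view" or "purchase" (assumed format).
    """
    makes_viewed = defaultdict(set)
    buyers = set()
    for user_id, action, make in events:
        if action == "view":
            makes_viewed[user_id].add(make)
        elif action == "purchase":
            buyers.add(user_id)
    return {
        user
        for user, makes in makes_viewed.items()
        if len(makes) >= max_makes and user not in buyers  # assumed threshold
    }
```

The threshold would need tuning against real traffic; set too low, it would sweep up legitimate shoppers who simply browse widely.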
"We often have these things pop up where [we say] 'Hey, there are some small percentages who look at 300 different makes and they never buy anything.' That's not a person," Williams said. "We just go ahead and block those also."
Cars.com requires every e-mail lead submitted to go through an automated system that tries to discern if it's legitimate. In addition, Mr. Swislow said, the fraud team manually reviews any lead that looks suspicious. He said his team has made significant progress against bot traffic over the past several years.
But even with dedicated security teams and technologies working to spot bots, he says, the enemy can't be eradicated.
"The nature of the internet is every time you take steps to defend your site against unauthorized use, there are going to be people who will try to get around it," Mr. Swislow said. "I wouldn't make the claim [that] we have no issues whatsoever, but we think we have a pretty good handle on it."
--Vince Bond Jr. is a reporter for Automotive News