More data, fewer restrictions
Though AI tools developed from open source code may be cheaper and simpler than licensed models, they can allow for better data integrations, said Natalie Lambert, founder and managing partner of GenEdge Consulting, which helps marketing organizations use generative AI.
Licensed enterprise models have restrictions in place that prohibit users from accessing and incorporating data from certain sources, said Lambert. But with open source code, these restrictions are less prevalent, enabling developers to plug in more data than would otherwise be possible. With the requisite programming, information stored in Salesforce and Marketo, for instance, can now be fair game for training a company’s tool.
Another opportunity offered by open source AI is a greater assurance of data security. Fears that LLMs can hoover up an organization’s data for training purposes have caused significant hesitation towards AI technology, especially among companies that maintain highly sensitive data. Some entities are outright refusing to use AI if they are required to share this data outside of their private firewalls, Lambert said.
With AI built from open source code, this data never has to leave an organization’s ecosystem, because the AI itself is being developed within that same ecosystem.
“There’s a benefit for having a fully controlled environment,” Lambert said.
By inspecting the specifics of the code, companies can also gain greater transparency into the model’s inevitable imperfections. Biases developed via training data have been another cause for concern within the ad industry, compelling some advertisers to require tighter guardrails around AI from their agency partners. When these models are closed, users have no way of finding out what may be going wrong.
“Like going to a grocery store, you want to know the ingredients you’re putting into your body,” Lambert said.
However, Lambert noted the importance of remembering that open source models, like their closed counterparts, are better and worse at different things. This is why comparing multiple open source AIs could help a company find the best foundation on which to base its tool.
Moreover, “open source” is not an agreed-upon term, meaning that some models’ code may be less accessible than others’. Meta, for example, made Llama 2 open source last year, but users noticed that the tech company did not reveal the data or code used to train the model. Llama 3’s open source code has its own restrictions, such as requiring a license from organizations with large numbers of monthly users. And a study published last August found that numerous AI providers were exaggerating the level of openness of their models to ingratiate themselves with regulators. These limitations in transparency suggest that “‘open’ can mean many things,” said Lambert.
Not for everyone
Leveraging open source code will only work for companies that have vast amounts of data and compute, said Inuvo’s Howe. The reasoning is that, even if a developer publishes the entirety of their model’s programming, this will only be a point-in-time snapshot of that model. The means for continually updating the model with training data, the lifeblood of generative AI, will still need to be furnished by the company using the code.
For instance, a crawler, which is the mechanism that retrieves training data for the model, may be missing from the code. Without a crawler, the model will lack many of the capabilities available in licensed tools, setting the company behind, said Howe.
And even if a company is successful in building an application on top of open source code, it will need substantial resources to devise a moat—or protective measures—to fend off bigger tech companies from its business, Howe said. Patents are one form of protection; another is focusing on building a product as niche as possible.
The freedom and customizability offered by open source AI also come at a price. Companies will need to decide, all on their own, the parameters that are typically taken care of, said Lambert. These include who has access to the model, what kinds of data are used and the specific functions that are possible with that data.
In short, building a tool from open source requires significant programming expertise and effort—barriers that Lambert predicts will be too high for many marketing companies.
“Open source may be more catered to an organization, but it can be a severe IT headache,” she said.