In many ways, times have never been better for open-source software. Open-source code is everywhere, enterprises have a strong preference for open source, and open source has become central to the digital economy.
Yet a rapidly maturing technology has the potential to create a huge new set of challenges for open source: generative AI. As generative AI technologies, like the ones behind Copilot and ChatGPT, become increasingly prevalent, open-source communities face a growing risk of losing influence within some of the most important sectors of the software economy.
Here's why generative AI poses an unprecedented threat to open source, and what open-source communities can do in response.
The main reason generative AI threatens open source is not that the code behind most major generative AI tools is closed, although it often is: ChatGPT, for example, is closed-source. Open-source projects can write algorithms that emulate the ones behind generative AI tools, and they are already doing so.
Instead, the problem is that most open-source communities lack the resources to train generative AI models effectively. Producing generative AI software requires more than code; it also requires collecting and processing massive amounts of data. That takes enormous computing power, which in turn takes a great deal of money, something most open-source projects lack.
Closed-source generative AI companies largely avoid this problem because they have deep pockets or venture capital to pay for AI training. ChatGPT, for example, was trained on vast amounts of text collected from the internet, which was feasible because OpenAI, its developer, has raised billions of dollars in funding.
If no open-source alternative to ChatGPT emerges, it will be because open-source communities lack the resources to perform AI training. Open-source developers can produce the code behind generative AI, but code is only one of the two fundamental ingredients of modern generative AI technology; the other is training on large datasets.
In this respect, generative AI poses a problem that open source has never faced before, despite the many other challenges it has contended with.
Originally, open-source developers had to prove that their projects could produce high-quality software that worked at least as well as closed-source alternatives. They had largely achieved that by the 2000s, when open-source platforms like Linux became widespread.
Then, with the rise of cloud computing, open-source projects faced a new challenge: cloud architectures undercut the freedoms that open source is supposed to ensure, because users of hosted services interact with software over a network rather than receiving and running the code themselves. Nonetheless, open-source communities have become very influential in the cloud. Although the major public cloud platforms are mostly closed-source, they rely heavily on key open-source technologies such as Kubernetes to deliver their services, and there are plenty of important open-source cloud platforms. Open source has conquered the cloud.
But will open source conquer generative AI? I have my doubts. We'll see plenty of open-source generative AI algorithms, but I'm not sure who, if anyone, will pay for the training those algorithms need to compete head-to-head with closed-source technology.
There is one scenario where I can envision open-source projects creating viable alternatives to closed-source generative AI technologies. It involves large businesses providing the funding or infrastructure that open-source coders need to train AI models. A company like Google or IBM, for example, might decide to support an open-source generative AI project by helping it complete training.
This approach could produce an open-source alternative to tools like ChatGPT. The caveat, of course, is that it would allow big companies to wield outsized influence over the open-source versions of generative AI technology.
This is already happening in other open-source projects. Google, for example, has historically played a key role in Kubernetes development, which arguably gives it influence over Kubernetes product direction and feature development, even though Kubernetes is an open-source project.
There's nothing inherently wrong with this, but it does raise questions about how "open" open source really is when large companies toss around resources to influence what gets developed and what doesn't. I worry that open-source generative AI could lose much of its potential if it came under the heavy influence of certain companies, rather than being an organic, community-centred endeavour.
Maybe open-source communities will find creative ways to work past the challenges posed by generative AI. Open source has proved surprisingly resilient in the past, and a lack of financial resources didn't prevent projects such as Linux from becoming massively successful.
Still, the training requirements of generative AI mean that open-source projects focusing on this niche are operating in uncharted waters. They'll need to think strategically if they want to create usable generative AI tools without selling out to large businesses.
Kamal Rastogi is a serial IT entrepreneur with more than 25 years of experience. His current focus areas are data science, ERP consulting, IT staffing, and Experttal.com (the fastest-growing US-based platform for hiring verified, risk-compliant expert IT resources directly from talent-rich countries such as India, Romania, and the Philippines). His firms serve clients such as KPMG, Deloitte, EnY, Samsung, Wipro, and NCR Corporation in India and the USA.