Dec. 12, 2024, 9:04 a.m.

Harvard to Release Dataset of 1 Million Public-Domain Books for AI Training

Brief news summary

Harvard University is planning to release a dataset featuring around 1 million public-domain books. These works, spanning various genres and languages, include authors like Dickens, Dante, and Shakespeare, and are no longer under copyright due to their age. The release date and method for this dataset are still unconfirmed. The books are sourced from Google's extensive book-scanning project, Google Books, and Google will aid in the distribution of this valuable collection. Harvard introduced the Institutional Data Initiative (IDI) in March, aiming to establish a reliable source of legal data for AI purposes. Today marks the formal launch of the IDI, revealing financial support from Microsoft and OpenAI. This initiative underscores the high costs associated with AI training data, often affordable only to large tech companies. The project seeks to make essential data more accessible, harnessing Google's collaboration to maximize the reach of this impressive dataset.

Training data for AI can be costly, and it is often accessible only to wealthy tech companies. To counter this, Harvard University intends to publish a dataset of around 1 million public-domain books.

These books, written by authors such as Dickens, Dante, and Shakespeare, are out of copyright due to their age and cover various genres and languages. The dataset isn't available yet, and details on its release remain unclear. The books come from Google's long-standing book-scanning project, Google Books, and Google will assist in making this "treasure trove" widely accessible. In March, Harvard announced the Institutional Data Initiative (IDI), which aims to provide a "trusted conduit for legal data for AI." Until today, details were scarce, but it is now confirmed that IDI is supported financially by Microsoft and OpenAI.

