How to Translate EPUB Files?

Loger
Dec 8, 2023

I. What is an EPUB File?

  1. EPUB is an open file format for electronic publications.
  2. It is commonly used to store and present electronic publications such as e-books, magazines, newspapers, etc.
  3. It has structured content, including text, images, tables, etc.

II. Structure of an EPUB File

Brief structure:

META-INF Folder: This folder holds container-level information about the EPUB file, most importantly container.xml, which points to the package (.opf) file, and optionally files such as encryption.xml.

OEBPS Folder: The OEBPS (Open eBook Publication Structure) folder is, by convention, the core content folder of the EPUB file, containing the various components of the e-book. The name is not mandatory, and some EPUBs use a different folder name.

  • content.opf file: This is the package file of the EPUB. It contains the metadata of the e-book (title, author, language, etc.), a manifest listing every text file and media resource, and the spine that defines the reading order.
  • toc.ncx file: This is the table of contents file of the EPUB file, defining the chapters and directory structure of the e-book and providing navigation. EPUB 3 uses an XHTML navigation document for this purpose, and toc.ncx is kept for compatibility with EPUB 2 readers.
  • HTML files: The content of the EPUB file is usually presented in the form of HTML (more precisely, XHTML) files, with each file typically representing a chapter or section.
  • CSS files: The EPUB file can contain CSS files used for styling and layout, controlling the appearance and typography of the e-book.
  • Image, audio, and video files: The EPUB file can contain embedded image, audio, and video files for enriching content and interactive elements.
  • Other files: The EPUB file may also include other auxiliary files, such as font files and script files, for customizing and enhancing the functionality and appearance of the e-book.

EPUB files use the ZIP compression format and adopt open-standard technologies such as XML, HTML, and CSS to achieve structured content, layout, and presentation. This file structure makes EPUB files easy to create, edit, distribute, and read, and provides a unified reading experience across different e-readers and platforms.
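
To make the structure more concrete: the entry point of an unzipped EPUB is META-INF/container.xml, whose rootfile element names the package (.opf) file. Here is a minimal sketch of reading that path. The helper name findPackagePath and the sample XML are my own illustration, and a regex is used only to keep the example dependency-free; a real XML parser would be more robust.

```javascript
// Sketch: find the package (.opf) path declared in META-INF/container.xml.
// `findPackagePath` is a hypothetical helper name, not a library function.
function findPackagePath(containerXml) {
  // Look at each <rootfile ...> tag and read its full-path attribute.
  const rootfilePattern = /<rootfile\b[^>]*>/gi
  for (const tag of containerXml.match(rootfilePattern) || []) {
    const attr = /\bfull-path\s*=\s*(["'])(.*?)\1/i.exec(tag)
    if (attr) return attr[2]
  }
  return null
}

// Sample container.xml content, written by hand for this example.
const sampleContainer = `<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>`

console.log(findPackagePath(sampleContainer)) // OEBPS/content.opf
```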

III. Extracting Translatable Text from EPUBs

To translate an EPUB while keeping its display format unchanged, locate every piece of text, translate it, and write the translation back into the same position.

Third-party libraries required:

JSZip: A library for creating, reading, and manipulating ZIP files in JavaScript.

cheerio: A fast, flexible, and lightweight HTML/XML parsing and manipulation library that implements a subset of the jQuery API.

1. Reading EPUB Files Using jszip

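const fs = require('fs')
const JSZip = require('jszip')
// run inside an async function (or an ES module) so that await is available
const epubFile = 'epub file path'
const fileBuffer = fs.readFileSync(epubFile)
const zip = await JSZip.loadAsync(fileBuffer)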

2. Locating All HTML Files Based on File Extension

for (const filePath of Object.keys(zip.files)) {
  if (filePath.endsWith('.html') || filePath.endsWith('.htm') || filePath.endsWith('.xhtml')) {
    const html = await zip.file(filePath)?.async('string')
    if (html) {
      // read the text from html
    }
  }
}
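
One detail worth checking is letter case: some EPUBs contain names like COVER.HTML, which a case-sensitive endsWith would miss. Below is a small sketch of a case-insensitive filter. isContentDocument is a hypothetical helper of mine (not part of JSZip), and the entry names are made up for the example.

```javascript
// Sketch: pick likely content documents from a list of ZIP entry names.
// `isContentDocument` is a hypothetical helper, not part of JSZip.
const CONTENT_EXTENSIONS = ['.html', '.htm', '.xhtml']

function isContentDocument(entryName) {
  const lower = entryName.toLowerCase()
  // Directory entries end with "/" and never match an extension below.
  return CONTENT_EXTENSIONS.some((ext) => lower.endsWith(ext))
}

// Made-up entry names, similar to what Object.keys(zip.files) returns.
const entryNames = [
  'mimetype',
  'META-INF/container.xml',
  'OEBPS/',
  'OEBPS/content.opf',
  'OEBPS/Text/chapter1.xhtml',
  'OEBPS/Text/COVER.HTML',
  'OEBPS/Styles/main.css',
]

console.log(entryNames.filter(isContentDocument))
// [ 'OEBPS/Text/chapter1.xhtml', 'OEBPS/Text/COVER.HTML' ]
```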

3. Reading Text Nodes from HTML Using cheerio and Translating

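const cheerio = require('cheerio')

const $ = cheerio.load(html)
const textNodes = []
for (const selector of ['body', 'head']) {
  $(selector)
    .find('*')
    .contents()
    .each(function () {
      // nodeType === 3 is a text node
      // skip blank text, and skip script/style content, which is code rather than prose
      const parentName = this.parent ? this.parent.name : ''
      if (this.nodeType === 3 && this.data.trim() !== '' && parentName !== 'script' && parentName !== 'style') {
        textNodes.push(this)
      }
    })
}
// use ChatGPT to translate each text node ...
// translateWithChatGPT is a placeholder; it is assumed here to be async,
// so it is awaited outside of .each(), whose callback cannot await
for (const node of textNodes) {
  node.data = await translateWithChatGPT(node.data)
}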
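
A text node often starts or ends with whitespace that matters, for example "Hello " in front of an inline tag. A translation service will usually trim it, and the words then run together in the reader. Here is a minimal sketch of putting the original whitespace back. withOriginalWhitespace is a hypothetical helper name of mine, and the translated strings are hard-coded stand-ins for whatever the translator returns.

```javascript
// Sketch: re-attach the original leading/trailing whitespace to a translation.
// `withOriginalWhitespace` is a hypothetical helper name.
function withOriginalWhitespace(original, translated) {
  // Whitespace-only text has nothing to translate; keep it untouched.
  if (original.trim() === '') return original
  const leading = original.match(/^\s*/)[0]
  const trailing = original.match(/\s*$/)[0]
  return leading + translated.trim() + trailing
}

// "Hello " sits before an inline tag such as <b>world</b>; dropping the
// trailing space would glue the two words together after translation.
console.log(JSON.stringify(withOriginalWhitespace('Hello ', 'Bonjour'))) // "Bonjour "
console.log(JSON.stringify(withOriginalWhitespace('\n  Title\n', ' Titre '))) // "\n  Titre\n"
```

With this helper, the assignment in the loop would become something like node.data = withOriginalWhitespace(node.data, translated).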

4. Replacing the Translated HTML in the Compressed File

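// zip.file(name, data, options) is synchronous, so no await is needed here
zip.file(filePath, $.html({ xml: true }), {
  compression: 'DEFLATE',
  compressionOptions: { level: 3 }
})
// after all files are processed, write the translated EPUB (the output path is a placeholder)
const output = await zip.generateAsync({ type: 'nodebuffer' })
fs.writeFileSync('translated epub file path', output)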

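EPUB content documents are XHTML, so they have to be well-formed XML. Readers and validators are strict about this, and a malformed file typically results in an “Invalid document” error.

The { xml: true } option makes cheerio serialize the document as XML, so empty elements such as <br> should be written in self-closing form (<br/>) and there won’t be any mismatched tags in the output.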
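
As a rough sanity check before writing a file back, you can look for void elements that were not self-closed, since that is a common way for HTML-style output to break XML well-formedness. The sketch below is only a crude regex heuristic of mine (findUnclosedVoidTags is a hypothetical name), not an XML validator, and it can be fooled by a ">" inside an attribute value.

```javascript
// Sketch: flag void elements written HTML-style (<br>) instead of XML-style (<br/>).
// `findUnclosedVoidTags` is a hypothetical helper; this is a heuristic, not a validator.
const VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr']

function findUnclosedVoidTags(markup) {
  // Match opening tags of void elements, then keep those not ending in "/>".
  const pattern = new RegExp(`<(?:${VOID_TAGS.join('|')})\\b[^>]*>`, 'gi')
  return (markup.match(pattern) || []).filter((tag) => !tag.endsWith('/>'))
}

console.log(findUnclosedVoidTags('<p>a<br>b<img src="x.png"/><hr /></p>')) // [ '<br>' ]
```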

5. Optimization Suggestions

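  • Translate the EPUB metadata (which includes information such as the book title) with a similar approach to the one used for HTML, since the metadata is also in XML format.
  • Translate text together with its surrounding context (for example, sending a whole paragraph or several neighbouring text nodes to ChatGPT in one request) rather than translating each text node in isolation. This can improve translation quality considerably, but it requires careful handling so that each translated piece goes back to the right node.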
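
To illustrate the second suggestion, here is a minimal sketch of one possible way to batch text nodes: group consecutive segments up to a size limit, number them in the request, and map the numbered reply back. All function names, the "[n]" numbering scheme, and the character limit are my own assumptions, not an established protocol, and a real model reply may not follow the format reliably. Whitespace inside a segment is collapsed to single spaces, which is usually harmless in HTML outside of pre blocks.

```javascript
// Sketch: group consecutive text segments into batches of at most maxChars
// so that each translation request carries some surrounding context.
function batchSegments(segments, maxChars) {
  const batches = []
  let current = []
  let size = 0
  for (const segment of segments) {
    // Start a new batch when adding this segment would exceed the limit.
    if (current.length > 0 && size + segment.length > maxChars) {
      batches.push(current)
      current = []
      size = 0
    }
    current.push(segment)
    size += segment.length
  }
  if (current.length > 0) batches.push(current)
  return batches
}

// One numbered line per segment, so the reply can be mapped back by index.
function toNumberedPayload(batch) {
  return batch.map((text, i) => `[${i}] ${text.replace(/\s+/g, ' ').trim()}`).join('\n')
}

// Parse a numbered reply; entries missing from the reply stay null.
function fromNumberedReply(reply, expectedCount) {
  const result = new Array(expectedCount).fill(null)
  for (const line of reply.split('\n')) {
    const match = /^\[(\d+)\]\s?(.*)$/.exec(line)
    if (match && Number(match[1]) < expectedCount) result[Number(match[1])] = match[2]
  }
  return result
}

const segments = ['Chapter 1', 'It was a dark night.', 'The end.']
const batches = batchSegments(segments, 30)
console.log(batches.length) // 2
console.log(toNumberedPayload(batches[0]))
console.log(fromNumberedReply('[0] Chapitre 1\n[1] La nuit était sombre.', 2))
```

A null entry in the parsed result means the reply skipped that segment, so the original text can be kept or the segment retried on its own.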

If you are looking for a tool to help you translate EPUB files using ChatGPT, you can have a look at

I will continue to share some code about translating various file formats in the future. If you are interested, please follow me. Thank you.
