How to Translate an Entire Excel File

Loger
3 min read · Dec 9, 2023

I. What is an Excel file?

Excel is a spreadsheet application developed and released by Microsoft.

It provides powerful data processing and analysis capabilities, allowing users to manage and calculate data, create charts and reports, and perform various data analysis tasks.

An Excel file consists of one or more worksheets, where users can enter and edit data in cells and use built-in functions and formulas for calculations.

Excel also supports sorting, filtering, and charting data, as well as connecting to and importing from external data sources.

It is office software that is widely used across many fields and industries.

II. Common Ways of Translating Excel

1. Translation Using Functions

Excel's official translation functions translate the text you pass to them from a cell. However, translating an entire file this way is cumbersome.

2. Using Tools for One-Click Translation

If you want to create your own Excel translation program, the following tips on reading text from Excel files may be useful to you.

III. How to Read and Translate Text Used in Excel Files

A perfect Excel translation should extract and translate all text while retaining the original formats.

1. Problem

There are many libraries available in Node.js for reading xlsx files, such as xlsx, exceljs, and node-xlsx.

However, apart from Microsoft Office, no other program can guarantee full compatibility with all Excel features.

This means that using these libraries to read and regenerate Excel files may result in the loss of some features.

2. Solution

The main issue lies in the format conversion process, which can cause changes to the document content.

If the location of the text in the file can be identified and replaced with the translated text, the original format and content, such as formulas, can be perfectly preserved.

First, we need to understand the structure of an .xlsx file.

3. Structure of the .xlsx File

An .xlsx file is essentially a compressed file in ZIP format, containing multiple directories and files.
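
One quick way to see this for yourself is to look at the first bytes of the file. The following is a minimal sketch, assuming Node.js; looksLikeZip is a hypothetical helper name, and the check only inspects the four-byte ZIP signature rather than validating the whole archive.

```javascript
// ZIP local file header signature: the bytes "PK" followed by 0x03 0x04.
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04])

// Hypothetical helper: true when the buffer begins with the ZIP signature.
function looksLikeZip(buffer) {
  return buffer.length >= 4 && buffer.subarray(0, 4).equals(ZIP_SIGNATURE)
}

// Usage with a real file (the path is a placeholder):
//   const fs = require('fs')
//   looksLikeZip(fs.readFileSync('excel file path'))
```

A legacy .xls file does not pass this check, because it uses an older binary container format rather than ZIP.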

The main directories and files include:

  • _rels directory: Contains files related to file relationships.
  • docProps directory: Contains files related to document properties, such as core properties and extended properties.
  • xl directory: Contains files related to the Excel workbook.
  • [Content_Types].xml: Defines the content types of various parts in the file.

Under the xl directory, there is a special file called xl/sharedStrings.xml. The <t> elements inside this file contain the text of the workbook's cells; worksheets usually refer to these strings by index instead of storing them inline.
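
To make the layout concrete, here is a small illustration. The XML below is a hand-written, minimal sample rather than the output of a real workbook, and listSharedStrings is a hypothetical helper. It uses a regular expression only to stay dependency-free; for real files, a proper XML parser such as the one used later in this article is the safer choice.

```javascript
// Hand-written minimal sample; real files carry more entries and attributes.
const sampleSharedStrings = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="3" uniqueCount="2">
  <si><t>Product</t></si>
  <si><r><t xml:space="preserve">Unit </t></r><r><t>price</t></r></si>
</sst>`

// A cell in a worksheet part points at an entry by zero-based index, e.g.
//   <c r="A1" t="s"><v>0</v></c>   refers to "Product"

// Collect the text of each <si> entry by joining its <t> children.
// An entry with rich text holds several <t> elements, one per formatted run.
function listSharedStrings(xml) {
  const entries = xml.match(/<si>[\s\S]*?<\/si>/g) || []
  return entries.map((si) =>
    [...si.matchAll(/<t(?:\s[^>]*)?>([\s\S]*?)<\/t>/g)].map((m) => m[1]).join('')
  )
}

console.log(listSharedStrings(sampleSharedStrings))
```

Note that the second entry is split across two <t> elements, so translating each <t> separately means translating fragments of one cell.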

4. Reading and Modifying Text in Excel

Third-party libraries required:

  • jszip: A library for creating, reading, and editing ZIP files in JavaScript.
  • xmldom: A lightweight XML parsing library for JavaScript that can create, modify, and traverse XML document nodes. The maintained release is published on npm as @xmldom/xmldom.

4.1. Reading Excel File Using jszip

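const fs = require('fs')
const JSZip = require('jszip')

// Run the snippets inside an async function so that await is available
const excelFile = 'excel file path'
const fileBuffer = fs.readFileSync(excelFile)
const zip = await JSZip.loadAsync(fileBuffer)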

4.2. Reading Content of sharedStrings.xml Using xmldom

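const { DOMParser } = require('@xmldom/xmldom') // maintained fork of xmldom
const parser = new DOMParser()
const xml = await zip.file('xl/sharedStrings.xml')?.async('string')
const doc = parser.parseFromString(xml, 'application/xml')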

4.3. Accessing and Translating All Text Nodes

// Visit every <t> text node in document order
const nodes = doc.getElementsByTagName('t')
for (let i = 0; i < nodes.length; i++) {
  const node = nodes[i]
  // translate is a placeholder for your own translation function, e.g. one that calls ChatGPT.
  // await works whether it returns a string or a Promise.
  node.textContent = await translate(node.textContent)
}
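
The loop above sends every string to the translator. One possible refinement is sketched below, assuming that strings without any letters (numbers, symbols, blanks) can be left untouched and that leading and trailing whitespace should survive translation. Both helper names are hypothetical, and translate stands for the same placeholder function used above.

```javascript
// Hypothetical helper: true only when the string contains at least one letter
// in any script (\p{L} is the Unicode "letter" category).
function needsTranslation(text) {
  return /\p{L}/u.test(text)
}

// Hypothetical wrapper: translate the core of the string and re-attach the
// original leading and trailing whitespace afterwards.
async function translateKeepingWhitespace(text, translate) {
  if (!needsTranslation(text)) return text
  const [, lead, core, trail] = text.match(/^(\s*)([\s\S]*?)(\s*)$/)
  return lead + (await translate(core)) + trail
}

// Inside the loop, the assignment would then become:
//   node.textContent = await translateKeepingWhitespace(node.textContent, translate)
```

Skipping such strings reduces the number of translation calls and avoids a translator "helpfully" rewriting values like 12.5 %.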

4.4. Writing the Translated sharedStrings.xml Back into the Compressed File

const { XMLSerializer } = require('@xmldom/xmldom')
const serializer = new XMLSerializer()
const modifiedXml = serializer.serializeToString(doc)
// zip.file(name, data, options) is synchronous, so no await is needed here
zip.file('xl/sharedStrings.xml', modifiedXml, {
  compression: 'DEFLATE',
  compressionOptions: { level: 3 }
})

// Generate the translated Excel file
fs.writeFileSync('translated.xlsx', await zip.generateAsync({ type: 'nodebuffer' }))

Following these steps produces a translated file that keeps the original formatting and formulas intact.

5. Optimization Suggestions

  • Translate the worksheet names, which are stored in xl/workbook.xml.
  • sharedStrings.xml also contains some application-specific names. When translating Excel files that contain many formulas, avoid translating names that the formulas reference, or those references may break.
  • Translate text with ChatGPT's contextual understanding, sending related strings together rather than translating each string in isolation. There is significant room for optimization here, but it requires a careful process.
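
As one way to approach the last suggestion, the sketch below groups strings into batches under a character budget so that each request carries some surrounding context, then maps the results back by index. It is an illustration under stated assumptions, not a tuned implementation: makeBatches and translateAll are hypothetical names, the 2000-character default is arbitrary, and translateBatch is assumed to be your own function that takes an array of strings and resolves to an array of translations in the same order.

```javascript
// Group strings into batches whose combined length stays within maxChars.
// Each item remembers its original index so results can be mapped back.
// A single string longer than maxChars still gets a batch of its own.
function makeBatches(strings, maxChars = 2000) {
  const batches = []
  let current = []
  let size = 0
  strings.forEach((text, index) => {
    if (current.length > 0 && size + text.length > maxChars) {
      batches.push(current)
      current = []
      size = 0
    }
    current.push({ index, text })
    size += text.length
  })
  if (current.length > 0) batches.push(current)
  return batches
}

// Translate batch by batch and return a new array in the original order.
// translateBatch is an assumed, user-supplied function (see the text above).
async function translateAll(strings, translateBatch, maxChars = 2000) {
  const result = strings.slice()
  for (const batch of makeBatches(strings, maxChars)) {
    const translated = await translateBatch(batch.map((item) => item.text))
    if (translated.length !== batch.length) {
      throw new Error('Translation count does not match batch size')
    }
    batch.forEach((item, i) => {
      result[item.index] = translated[i]
    })
  }
  return result
}
```

The length check matters in practice: a language model may merge or drop lines, and writing misaligned results back into sharedStrings.xml would put text into the wrong cells.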

I will continue to share some code about translating various file formats in the future. If you are interested, please follow me. Thank you.
