Split and Merge Translation XML

 

Comment posted: separate, translate, and import back in

Pablo 08-Dec-2011

I work for the translation department at my company. We normally receive monolingual XML files for translation. In such cases, we just make copies of these files and translate each copy into its target language. In the file below, however, the text in each <INFO> node under <LANGUAGE>EN</LANGUAGE> should be translated into the rest of the languages (DA, DE, ES, FI, NL, NO, SV). What I need to do is the following:

  1. For each target language, extract all the strings in the source language into a separate XML. I would have one XML file per target language.
  2. The XML files are translated into the corresponding languages.
  3. The translated XML files are imported back into the original XML. In this way, I would obtain a multilingual XML, with the source strings and the corresponding translations for the different languages.

<?xml version="1.0" encoding="UTF-8" ?>
<INFORMATION>
  <FUNDOBJECTIVE>
    <FUNDOBJECTIVEDATA ID="1">
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>EN</LANGUAGE>
        <INFO>Text to be translated.</INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>DA</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>DE</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>ES</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>FI</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>NL</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>NO</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>SV</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
    </FUNDOBJECTIVEDATA>
    <FUNDOBJECTIVEDATA ID="2">
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>EN</LANGUAGE>
        <INFO>More text to be translated.</INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>DA</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>DE</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>ES</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>FI</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>NL</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>NO</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
      <FUNDOBJECTIVEDATAITEM>
        <LANGUAGE>SV</LANGUAGE>
        <INFO></INFO>
      </FUNDOBJECTIVEDATAITEM>
    </FUNDOBJECTIVEDATA>
  </FUNDOBJECTIVE>
</INFORMATION>

Could you please let me know if any of your products would allow me to do steps 1 and 3 mentioned above? I know step 1 can be done, as I've seen similar examples. What about step 3? How complex would it be to import the translations back? I'm not a programmer; I have only basic programming knowledge. Does your product require an advanced level of programming?

I wrote two scripts for Pablo (he reported back that they worked perfectly), one to split the file and one to merge the translations afterwards. To try them out:

  1. Install the free firstobject XML editor.
  2. Go to File > New Program, paste in the split function, and save the script as, say, "translate_split.foal".
  3. Go to File > New Program, paste in the merge function, and save the script as, say, "translate_merge.foal".
  4. Adjust the sFolder value in both scripts to the folder where you wish to do your processing.
  5. Put your sample XML file in that folder and name it translate_input.xml.
  6. Open the split script and press F9; you will see new files named translate_to_AA.xml, where AA is the language code, one file per target language.
  7. Perform the translations and gather the files back into this folder.
  8. Open the merge script and press F9; you will see the file named translate_merged.xml.

split()
{
  str sFolder = [["C:\Temp\"]];
  CMarkup mInput, mOutput;
  mInput.Load(sFolder+"translate_input.xml");
  int nIDCount = 0;
  while (mInput.FindElem("//FUNDOBJECTIVEDATA"))
  {
    // Extract the ID for this data
    str sID = mInput.GetAttrib("ID");
    
    // The first item must be EN
    mInput.IntoElem();
    mInput.FindElem("FUNDOBJECTIVEDATAITEM");
    mInput.FindChildElem("LANGUAGE");
    if (mInput.GetChildData() != "EN")
      return "unexpected: first data item under ID " + sID + " is not EN";
    mInput.FindChildElem("INFO");
    str sInfo = mInput.GetChildData();

    // Generate data elements for subsequent languages
    ++nIDCount;
    while (mInput.FindElem("FUNDOBJECTIVEDATAITEM"))
    {
      mInput.FindChildElem("LANGUAGE");
      str sLang = mInput.GetChildData();
      if (! mOutput.RestorePos(sLang))
      {
        mOutput.ResetPos();
        mOutput.AddElem("INFORMATION");
        mOutput.IntoElem();
        mOutput.AddElem("FUNDOBJECTIVE");
        mOutput.SavePos(sLang);
      }
      mOutput.AddChildElem("FUNDOBJECTIVEDATA");
      mOutput.IntoElem();
      mOutput.SetAttrib("ID",sID);
      mOutput.AddChildElem("LANGUAGE", sLang);
      mOutput.AddChildElem("INFO", sInfo);
    }
  }

  // Output files
  int nFileCount = 0;      
  mOutput.ResetPos();
  while (mOutput.FindElem("INFORMATION"))
  {
    CMarkup mOutputLang = mOutput.GetSubDoc();
    str sLang = mOutputLang.FindGetData("//LANGUAGE");        
    mOutputLang.Save(sFolder+"translate_to_" + sLang + ".xml");
    ++nFileCount;
  }
  
  return "Generated " + nFileCount + " files, " + nIDCount + " data elements per file";
}

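For readers who want to see the same split logic outside the firstobject editor, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is not the script written for Pablo, only an illustration that assumes the input has the structure of the sample above. The names split_by_language and SOURCE_LANG are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE_LANG = "EN"  # assumption: the source language code used in the sample


def split_by_language(root):
    """Return {language code: <INFORMATION> element holding the source strings}.

    Each output mirrors the layout the split script produces:
    INFORMATION > FUNDOBJECTIVE > FUNDOBJECTIVEDATA[ID] > LANGUAGE, INFO.
    """
    outputs = {}
    for data in root.iter("FUNDOBJECTIVEDATA"):
        items = data.findall("FUNDOBJECTIVEDATAITEM")

        # Find the source-language text for this ID.
        source = None
        for item in items:
            if item.findtext("LANGUAGE") == SOURCE_LANG:
                source = item.findtext("INFO", "")
                break
        if source is None:
            raise ValueError(f"no {SOURCE_LANG} item under ID {data.get('ID')}")

        # Add one data element per target language, carrying the source text.
        for item in items:
            lang = item.findtext("LANGUAGE")
            if not lang or lang == SOURCE_LANG:
                continue
            if lang not in outputs:
                info_root = ET.Element("INFORMATION")
                ET.SubElement(info_root, "FUNDOBJECTIVE")
                outputs[lang] = info_root
            parent = outputs[lang].find("FUNDOBJECTIVE")
            out = ET.SubElement(parent, "FUNDOBJECTIVEDATA", ID=data.get("ID", ""))
            ET.SubElement(out, "LANGUAGE").text = lang
            ET.SubElement(out, "INFO").text = source
    return outputs
```

To work with files, the root can be obtained with ET.parse(path).getroot(), and each result can be written with ET.ElementTree(element).write(path, encoding="UTF-8", xml_declaration=True). Unlike the split script, this sketch looks up the EN item wherever it appears under an ID rather than requiring it to come first.
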
merge()
{
  str sFolder = [["C:\Temp\"]];
  CMarkup mInput, mOutput;
  mInput.Load(sFolder + "translate_input.xml");
  
  // Loop through all translated files
  CMarkup mInputFiles = EnvFindFiles(sFolder+"translate_to_*.xml");
  mInputFiles.ResetPos();
  while (mInputFiles.FindElem())
  {
    CMarkup mTrans;
    mTrans.Load(sFolder + mInputFiles.GetData());
    str sLang = mTrans.FindGetData("//LANGUAGE");
    if (sLang == "")
    {
      return "language not found in " + mInputFiles.GetData();
    }
    
    // Loop through data of input and bring in items from this language
    mInput.ResetPos();
    while (mInput.FindElem("//FUNDOBJECTIVEDATA"))
    {
      str sID = mInput.GetAttrib("ID");
      mTrans.ResetPos();
      if (mTrans.FindElem("//FUNDOBJECTIVEDATA[@ID='" + sID + "']"))
      {
        str sInfo = mTrans.FindGetData("//INFO");

        // Locate corresponding language item to place translation into        
        mInput.IntoElem();
        while (mInput.FindElem())
        {
          mInput.FindChildElem("LANGUAGE");
          if (mInput.GetChildData() == sLang)
          {
            mInput.FindChildElem("INFO");
            mInput.SetChildData(sInfo);
            break;
          }
        }
      }
    }
  }
  mInput.Save(sFolder+"translate_merged.xml");
}
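
The merge step can be sketched the same way in Python with xml.etree.ElementTree. Again, this is only an illustration and not the script above. It assumes each translated file keeps the layout of the translate_to_*.xml files (FUNDOBJECTIVEDATA elements with an ID attribute and LANGUAGE and INFO children). The function name merge_translation is hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_translation(master_root, translated_root):
    """Copy translated INFO text into the matching language items of master_root.

    Returns the number of items that were filled in.
    """
    lang = translated_root.findtext(".//LANGUAGE")
    if not lang:
        raise ValueError("language not found in translated document")

    # Index the translated strings by their ID attribute.
    translations = {
        data.get("ID"): data.findtext("INFO", "")
        for data in translated_root.iter("FUNDOBJECTIVEDATA")
    }

    filled = 0
    for data in master_root.iter("FUNDOBJECTIVEDATA"):
        text = translations.get(data.get("ID"))
        if text is None:
            continue  # no translation supplied for this ID
        # Locate the item for this language and place the translation in it.
        for item in data.findall("FUNDOBJECTIVEDATAITEM"):
            if item.findtext("LANGUAGE") == lang:
                info = item.find("INFO")
                if info is None:
                    info = ET.SubElement(item, "INFO")
                info.text = text
                filled += 1
                break
    return filled
```

Calling merge_translation once per translated file, against the root of translate_input.xml, and then writing the root out gives the multilingual file. The returned count can be compared with the number of IDs to spot strings that were not merged.
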

The easiest way to adjust and customize the scripts is to press F10, run them line by line, and see how they do what they do. Then look up any additional functions you need, either with F1 or by searching firstobject.com. There are also ways to set these scripts up as command line calls; see "using the firstobject XML editor from the command line".

See also:

Split XML file into smaller pieces

Video of XML splitter script for splitting XML files