***********************************************************************
IBM TEXT-TO-SPEECH TTS RUN TIME KIT
Version 6.4.0.3 Readme (win32.readme.6.4.0.3.txt)
Copyright IBM Corporation, 2002. All Rights Reserved
***********************************************************************

CONTENTS
--------
1. Company
2. Product
3. Version
4. Description
5. Contact Information
6. Upgrade Path to Full Version
7. What's New
8. Installation Requirements
9. End-User Installation Instructions
10. ISV Installation Instructions
11. Working with Concatenative Voices
12. Uninstall Instructions
13. General Limitations and Comments
14. Known Problems & F.A.Q.
15. Developer Notes
16. Memory and Performance Tools
17. Logging Utilities
18. Trademark Information


1. COMPANY
-----------
International Business Machines Corporation (IBM)


2. PRODUCT
-----------
IBM Text-to-Speech Run Time Kit


3. VERSION
-----------
IBM Text-to-Speech TTS Run Time Kit, Version 6.4.0.3


4. DESCRIPTION
----------------
IBM Text-to-Speech Run Time Kit provides the speech synthesis engine and
components that applications need to produce speech.

IBM Text-to-Speech Run Time Kit, Version 6.4.0.3 produces speech from
recordings of units of human speech. These units (which may be phonemes,
syllables, words, or phrases) are combined (concatenated) according to
linguistic rules formulated from the analyzed text. When the recorded
units are entire phrases or sentences, the output can be very natural,
human-sounding speech.

The components of the Text-to-Speech Run Time Kit are:

   Speech synthesis engine
   Data sets (per language):
      Voice 1   Adult male     8 kHz
      Voice 2   Adult female   8 kHz
      Voice 4   Adult male     8 kHz   (U.S. English only)

The speech synthesis engine and data support both a concatenative voice
dataset representation and a synthesized (formant) voice representation.
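The rule this section uses to pick between the two representations
(concatenative synthesis when an installed voice dataset matches the
selected language, voice, and sample rate; formant synthesis otherwise)
can be modeled in a few lines. The sketch below is a hypothetical Python
illustration, not part of the Run Time Kit: the INSTALLED_DATASETS table
and the synthesis_mode function are assumptions standing in for the
voices the installer actually registers.

```python
# Hypothetical model of the automatic synthesis choice described in this
# section. INSTALLED_DATASETS is an assumed stand-in for the voices the
# installer registered; it is not a real configuration file or API.
INSTALLED_DATASETS = {
    # (language/dialect, voice number, sample rate in Hz)
    ("US English", 1, 8000),
    ("US English", 2, 8000),
}


def synthesis_mode(language, voice, sample_rate, installed=INSTALLED_DATASETS):
    """Return "concatenative" only if an installed dataset matches all
    three selections; otherwise fall back to "formant"."""
    if (language, voice, sample_rate) in installed:
        return "concatenative"
    return "formant"
```

For instance, requesting U.S. English voice 1 at 11025 Hz (the F.A.Q.
notes that the default rate is 11 kHz) falls back to formant synthesis
even though the 8 kHz dataset for that voice is installed.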
The concatenative voice is derived from recordings of a professional
speaker speaking a particular language and dialect, made at a particular
sampling rate. When a client program doing concatenative synthesis
changes languages, a new voice dataset may have to be loaded into memory
from disk if it is not already cached in memory from previous use.

The system automatically chooses concatenative synthesis if a voice
dataset is installed for the language, voice, and sample rate that you
select. For example, if you are using U.S. English at 8 kHz with voice 1,
and the U.S. English voice 1 dataset for 8 kHz has been installed, the
system automatically uses concatenative synthesis. Otherwise, it uses
formant synthesis.

When concatenation is being done, ECI voice selections appear to the
concatenative engine as requests to switch between already-loaded voice
datasets, while voice attribute settings appear as changes in the
phonetic and acoustic data that it receives.


5. CONTACT INFORMATION
-----------------------
Please visit our Web site for enhancements and updates to Text-to-Speech:

   http://www.software.ibm.com/speech/dev


6. UPGRADE PATH TO FULL VERSION
--------------------------------
The full version is currently included.


7. WHAT'S NEW
--------------
This version of Text-to-Speech includes support for custom filters. An
e-mail filter is provided that converts e-mail messages into a more
natural format. Please refer to the Text-to-Speech SDK for more
information on implementing and using custom filters.


8. INSTALLATION REQUIREMENTS
-----------------------------
Hardware:

Formant
   - Processor performance equivalent to an Intel Pentium 133 MHz with
     MMX and 256 KB L2 cache
   - 48 MB of RAM in total
   - 10 MB available hard disk space
   - Compatible 16-bit sound card
   - CD-ROM drive

   Note: Formant functionality is supported under:
      Windows 98
      Windows 2000
      Windows NT 4.0
      Windows Millennium Edition
      Windows XP

Concatenative
   - Processor performance equivalent to an Intel Pentium III 266 MHz
   - 48 MB of RAM plus 150 MB of RAM per concatenative voice loaded
   - 10 MB available hard disk space plus 150 MB per concatenative voice,
     except Chinese, which requires 300 MB per concatenative voice
   - Compatible 16-bit sound card
   - CD-ROM drive

   Note: Concatenative functionality is supported only under:
      Windows 2000 with Service Pack 1
      Windows NT 4.0 with Service Pack 6
      Windows XP


9. END-USER INSTALLATION INSTRUCTIONS
--------------------------------------
Run setup.exe from the installation media and follow the instructions
presented to you. You may be prompted to install concatenative voices;
select the voices to be used for concatenative voice synthesis.


10. ISV INSTALLATION INSTRUCTIONS
----------------------------------
If you are deploying applications that use the IBM Text-to-Speech Run
Time Kit, you must obtain a license from IBM for redistribution. You will
also want to integrate our product installation with your product's
installation program: copy the redistributable TTS driver to your
installation media and invoke setup.exe.
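An ISV installer has to assemble the setup.exe command line from the
switches documented in this section. The following is a minimal Python
sketch under stated assumptions: build_setup_command and LANGUAGE_CODES
are hypothetical names, only a few of the documented language codes are
included, and the function only builds the string (how your installer
launches it is up to you). It encodes the documented rules that the -l
language switch is mandatory, that the install path is passed without
quotes even when it contains spaces, that -SMS is uppercase, and that
/nr is redundant with /silent.

```python
# Hypothetical helper: assembles a setup.exe command line from the
# switches documented in this readme. Only a subset of the language
# codes is listed; extend LANGUAGE_CODES from the full table.
LANGUAGE_CODES = {
    "0007": "German",
    "0009": "English",
    "000a": "Spanish",
    "0010": "Italian",
    "0011": "Japanese",
    "040c": "French (Standard)",
    "0416": "Portuguese (Brazilian)",
    "0804": "Chinese (PRC)",
}


def build_setup_command(lang_code, install_path=None, silent=False,
                        no_reboot=False, concat=None, sms=False):
    """Return the setup.exe command line as one string.

    concat may be None, "all" (/concatall) or "none" (/concatnone).
    """
    code = lang_code.lower()
    if code not in LANGUAGE_CODES:
        raise ValueError("unknown language code: " + lang_code)
    parts = ["setup.exe"]
    if install_path:
        # Simplified "fully qualified" check: drive-letter paths only.
        if len(install_path) < 3 or install_path[1:3] != ":\\":
            raise ValueError("install path must be fully qualified")
        parts.append(install_path)  # spaces allowed, no quotes added
    if silent:
        parts.append("/silent")
    elif no_reboot:
        parts.append("/nr")  # only needed when /silent is not used
    if sms:
        parts.append("-SMS")  # case-sensitive switch
    if concat == "all":
        parts.append("/concatall")
    elif concat == "none":
        parts.append("/concatnone")
    parts.append("-l" + code)  # the language switch is not optional
    return " ".join(parts)
```

Building a single string rather than an argument list is deliberate: on
Windows, Python's subprocess module, for example, wraps list arguments
that contain spaces in quotes, which conflicts with the instruction not
to quote the install path.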
The IBM Text-to-Speech Run Time Kit installation program, setup.exe,
takes the following command-line arguments:

   setup.exe [installPath] [/silent] [/hideaddremove] [/nr] [/ns] [/nl]
             [/nk] [-SMS] [/statusnone] [/statusold] [/concatall]
             [/concatnone] -lXXXX

-lXXXX
   -l (lowercase L) must be followed by one of the following XXXX
   language codes:

      0003 - Catalan
      0005 - Czech
      0006 - Danish
      0007 - German
      0008 - Greek
      0009 - English
      000a - Spanish
      000b - Finnish
      000e - Hungarian
      0010 - Italian
      0011 - Japanese
      0012 - Korean
      0013 - Dutch
      0014 - Norwegian
      0015 - Polish
      0019 - Russian
      001a - Croatian
      001b - Slovak
      001d - Swedish
      001e - Thai
      001f - Turkish
      0021 - Indonesian
      0024 - Slovenian
      002d - Basque
      0404 - Chinese (Taiwan)
      040c - French (Standard)
      0416 - Portuguese (Brazilian)
      0804 - Chinese (PRC)
      0816 - Portuguese (Standard)
      0c0c - French (Canadian)

   ** Note: Due to an InstallShield limitation, if you are using
   DoInstall, you must specify the same language as the parent
   installation. See InstallShield document Q144122.

installPath
   Can contain spaces and must be a fully qualified path. Do not place
   quotes around the path. The path is ignored if TTS is already on the
   system. If a path is provided on the command line, the Choose
   Directory dialog is not shown.

/silent
   Prevents everything except the path dialog from appearing. If voice
   data is detected, the install also asks which voices to install,
   regardless of this parameter.

/hideaddremove
   Removes the Add/Remove Programs entry from Control Panel.

/nr
   No reboot message and no subsequent reboot. If a calling application
   runs our install with a GUI, the calling install may perform
   additional logic; it should then reboot if TTS requests a reboot.
   See appendix 2 for how to determine whether TTS requires a reboot.
   TTS functionality will not work until the requested reboot is carried
   out. If the /silent option is used, /nr is redundant.

-SMS
   Prevents a network connection and setup.exe from closing before the
   installation is complete.
   The switch works with installations originating from a Windows NT
   server over a network. Note that SMS must be uppercase; this switch
   is case-sensitive.

/statusold
   By default, the TTS install shows a large progress-bar dialog box. To
   display the small dialog box instead, use /statusold.

/statusnone
   Turns off the status box altogether.

/concatall
   Installs all concatenative voices. Check the return codes for
   out-of-space errors.

/concatnone
   Installs none of the concatenative voices.

Redundant, but still supported for backward compatibility:

   /nk   Do not hide the Add/Remove Programs entry (now the default
         behavior).
   /nl   No license (a license is no longer packaged).
   /ns   Silent install.

* Please note that the language parameter is not optional. Only minimal
  changes are required to make old installations work.


11. WORKING WITH CONCATENATIVE VOICES
--------------------------------------
During installation you may install concatenative voices from the
selection presented to you. Because of disk space constraints or for
periodic updates, you may want to add, remove, or relocate a
concatenative voice.

To add a voice, rerun the installation and select the voice you want to
add. To remove a voice, unregister the voice and then manually delete it
from the \voices\\ directory.

To relocate a voice, or to update a voice from a downloaded file,
register the location of the voice with the inivoice.exe utility:

   inivoice.exe [-u]

As the examples below show, the command also takes the voice number and
the quoted path of the voice's synthinfo file.

For example, to move voice 1 from the default TTS installation path to
F:\TTSVoices\us\1, move the data files and then run:

   C:\>inivoice.exe 1 "F:\TTSVoices\us\1at8000KHz_1_0\synthinfo"

To unregister a voice, use the -u option:
   C:\>inivoice.exe -u 1 "F:\TTSVoices\us\1at8000KHz_1_0\synthinfo"

Note: Concatenative voices allow the following parameters to be adjusted
at run time:
   - Volume
   - Pitch Baseline*
   - Speed
   - Pitch Fluctuation*
   * Applies only to some voices.

The following parameters cannot be changed for concatenative voices:
   - Gender
   - Sample Rate (see section 4 above)
   - Head Size
   - Roughness
   - Breathiness

If you change one of these parameters, no error occurs and the
synthesized voice does not change.

In concatenative TTS, when you change languages, the voice
characteristics are reset to the default values for the currently active
voice. As a result, if you have modified the speed or volume and then
change languages, the speed and volume revert to the defaults for the
voice.


12. UNINSTALL INSTRUCTIONS
---------------------------
To uninstall the Text-to-Speech Run Time Kit:
   1. Open Control Panel.
   2. Select Add/Remove Programs.
   3. Select the entry for IBM Text-to-Speech Runtime (for the
      appropriate language).
You will be guided through the uninstall process.


13. GENERAL LIMITATIONS AND COMMENTS
-------------------------------------
This section contains information that is not specific to any particular
element of the Text-to-Speech Run Time Kit but is general in nature. It
is very important to heed these warnings and follow the instructions
given to avoid abnormal or unpredictable results.

* Currently, only 8 kHz concatenative voices are provided. Application
  programmers requiring higher-quality audio should upgrade their voice
  datasets. For more information, visit the IBM Text-to-Speech home page.
* Currently, Version 6.4.0.3 supports the following languages with
  formant voices (languages marked with * have both formant and
  concatenative voice support):
     Brazilian Portuguese*
     French*
     Canadian French*
     Finnish
     German*
     United States English*
     United Kingdom English*
     Spanish*
     Mexican Spanish
     Italian*
     Chinese Simplified*
     Chinese Traditional*
     Japanese*

* Currently, the included e-mail filter is available only for English.

* The e-mail filter included with IBM Text-to-Speech recognizes the
  following keywords in an e-mail message:

     Keyword                      Action
     -------                      ------
     Subject:                     Parse out the subject of the message and
                                  return a new subject string to the client
                                  application.
     To:                          Filter out lines until a recognized keyword
                                  is encountered.
     From:                        Parse out the sender of the message and
                                  return a new string to the client
                                  application.
     Date:                        Parse out the date that the message was sent
                                  and return a new string with that date to
                                  the client application.
     Sent:                        Parse out the date that the message was sent
                                  and return a new string with that date to
                                  the client application.
     Alternate-Recipient:         Filter out the current line.
     Mime-Version:                Filter out the current line.
     Return-Path:                 Filter out the current line.
     MR-Received:                 Filter out the current line.
     Content-Type:                Filter out lines until a recognized keyword
                                  is encountered.
     Content-Transfer-Encoding:   Filter out the current line.
     Posting-Date:                Filter out the current line.
     Importance:                  Filter out the current line.
     Priority:                    Filter out the current line.
     Sensitivity:                 Filter out the current line.
     UA-Content-ID:               Filter out the current line.
     X400-MTS-Identifier:         Filter out the current line.
     A1-Type:                     Filter out the current line.
     Hop-Count:                   Filter out the current line.
     Content-Disposition:         Filter out the current line.
     Delivered-To:                Filter out the current line.
     X-Originating-IP:            Filter out the current line.
     X-OriginalArrivalTime:       Filter out the current line.
     Full-Name:                   Filter out the current line.
     X-Mailer:                    Filter out the current line.
     CC:                          Filter out the current line.
     Filetime=                    Filter out lines until a recognized keyword
                                  is encountered.
     X-Apparently-To:             Filter out the current line.
     Content-Length:              Filter out the current line.
     Auto-Submitted:              Filter out the current line.
     Status:                      Filter out the current line.
     Received:                    Filter out lines until a recognized keyword
                                  is encountered.

* The included e-mail filter also removes the following "emoticons" from
  messages:

     (R)  (C)  :-)  :-(  :-]  :)  ;)  :-#|  :(  :->  :-<  :-\\
     (-:  >:-<  :-|  :-o  :-c  |-)  |-O  :-#  :-%  :-&  :-'|  :-)'
     :-)8  :-*  :-/  :-:  :-?  :-@  (:I  :-[  *:o)  +-(:-).-)  <:I
     @:I  [:-|]  8-#  8:-)  }(:-(  :-{  :-{(  :-}  :-O  :-6  :-8(  :-9
     :-D  :-e  :-i  :-p  :-t  :-v  ::-)  8-)  :<|  :=)  :>)  :~)  ;-)
     %-)  (-)  (:-)  )8-)  *-(  *<|:-)-:-)  ;-\\  =:-)  [:-)  O-)  8-|
     {(:-){:-)

* The eciUpdateFilter function for the included e-mail filter supports
  changing the behavior only for the "From:", "Date:", and "Subject:"
  fields.

* The Text-to-Speech SDK includes a file, "maildict.dct", that contains
  translations of common e-mail jargon and abbreviations. For best
  results when processing e-mail messages, use this dictionary file
  together with the included e-mail filter.

=========
inifilter
=========
The inifilter tool registers and unregisters filters, which are used as
preprocessor add-ins for ECI to modify text.
   inifilter [-ul] /filter:[filterNum] /path:[filterPath] /autoload:[y/n]
             /lang:[lang] /ECIINI:[IniPath]

   -u         Disable the specified filter
   -l         Display statistics about the specified filter
   filter     Filter number
   path       Fully qualified file name of the filter
   autoload   Whether the filter is loaded automatically when its
              language is selected. Valid values are:
                 n   Filter is not automatically loaded
                 y   Filter is automatically loaded
   lang       Language/dialect for the filter. Valid language/dialect
              values are:
                 1.0  - US English
                 1.1  - British English
                 2.0  - Castilian Spanish
                 2.1  - Mexican Spanish
                 3.0  - Standard French
                 3.1  - Canadian French
                 4.0  - Standard German
                 5.0  - Standard Italian
                 6.0  - Mandarin Chinese
                 6.1  - Taiwanese Chinese
                 7.0  - Brazilian Portuguese
                 8.0  - Standard Japanese
                 9.0  - Standard Finnish
                 13.0 - Standard Norwegian
                 14.0 - Standard Swedish
                 15.0 - Standard Danish
   ECIINI     Path to the ECIINI file (not used on Windows platforms). On
              other platforms, the ECIINI environment variable is used if
              this is omitted.

   NOTE: If -u is specified, only the language, filter, and INI file may
   be specified.


14. KNOWN PROBLEMS & F.A.Q.
----------------------------
The following are known problems in this release:

* If you are upgrading from TTS Version 4.7 to TTS Version 6.4.0.3, you
  must remove TTS Version 4.7 before installing TTS Version 6.4.0.3.

* On Windows XP and Windows 2000, non-administrator users may receive
  error messages stating that the InstallShield engine cannot be
  registered. You need the appropriate access permissions to install.

* On Windows XP and Windows 2000, you must have the appropriate access
  permissions to run the command-line tools (inicache, inifilter,
  inivoice, and initrace). If you do not, no error message is displayed
  and your changes are not made.

* Setting the pitch baseline after setting the head size may return an
  error in certain situations.

* The installation copies a large amount of data from the installation
  media.
  During the copy process, very little screen activity is visible.

* If multiple versions of TTS are to be installed on the same system,
  install all versions of TTS to the same directory.

F.A.Q.
------
Q: Why is my application still using formant synthesis?

A: When you install an 8 kHz voice, the system produces concatenative
   synthesis for any application that requests synthesis at 8 kHz. By
   default, the system generates audio at 11 kHz, so to produce
   concatenative speech, use eciSetParam to set the sample rate to
   8 kHz. Also, if both version 5.0 and version 6.4.0.3 reside on the
   same machine, check that version 5.0 was not installed after
   version 6.4.0.3.


15. DEVELOPER NOTES
--------------------
* The Text-to-Speech SDK is a good starting point for developing
  applications.

* Using SAPI programs with concatenative synthesis

  If you have an 8 kHz concatenative voice installed and you select a
  SAPI voice that has been optimized for the telephone ("tel" in the
  name and speaker fields, and 0x200 in the available feature field),
  you will experience a delay while the concatenative voice data is
  loaded into memory. This delay is considerably shorter the second and
  subsequent times you access the same voice, because the IBM
  Concatenative Memory Manager (CMM) caches voices for a period of time
  before flushing them from memory.

* Concatenative Memory Manager (CMM) cmmcmd utility

  A support utility called cmmcmd was created to interface with the
  Concatenative Memory Manager (CMM).

  Note: This is a support tool and was not intended to be an end-user
  utility.

  Invoke cmmcmd as follows:

     cmmcmd shutdown     -- shuts down the CMM
     cmmcmd timeout ##   -- sets the CMM timeout to ## seconds


16. MEMORY AND PERFORMANCE TOOLS
---------------------------------
Due to the computational complexity and amount of memory required to
produce concatenative speech, IBM Text-to-Speech uses shared memory and
speech caching to reduce the system resources required.
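The two mechanisms introduced above (one shared, time-limited copy of
each voice dataset, and a bounded per-language phrase cache) can be
pictured with the small Python model below. It is a conceptual sketch,
not IBM's implementation: the class names, the loader and synthesize
callables, and the least-recently-used eviction of phrases are all
assumptions; this readme states only the maximum phrase count and the
10-minute default idle timeout.

```python
import time
from collections import OrderedDict


class SharedVoiceCache:
    """Hypothetical model of the CMM: one loaded copy per voice, reused by
    every caller of this object (the real CMM shares across processes),
    unloaded once it has been idle longer than `timeout` seconds."""

    def __init__(self, loader, timeout=600.0, clock=time.monotonic):
        self.loader = loader    # callable(voice_id) -> dataset (slow disk load)
        self.timeout = timeout  # 600 s matches the 10-minute default
        self.clock = clock      # injectable for testing
        self._voices = {}       # voice_id -> (dataset, last_access_time)

    def expire(self):
        """Unload every voice whose last access is older than the timeout."""
        now = self.clock()
        for voice_id, (_, last) in list(self._voices.items()):
            if now - last > self.timeout:
                del self._voices[voice_id]

    def get(self, voice_id):
        self.expire()
        entry = self._voices.get(voice_id)
        dataset = entry[0] if entry else self.loader(voice_id)
        self._voices[voice_id] = (dataset, self.clock())  # refresh access time
        return dataset

    def loaded(self):
        return sorted(self._voices)


class PhraseCache:
    """Hypothetical speech cache holding at most `max_phrases` synthesized
    phrases; LRU eviction is an assumption. max_phrases=0 disables caching,
    like the documented default."""

    def __init__(self, synthesize, max_phrases):
        self.synthesize = synthesize  # callable(text) -> audio (expensive)
        self.max_phrases = max_phrases
        self._audio = OrderedDict()

    def speak(self, text):
        if text in self._audio:
            self._audio.move_to_end(text)  # mark as most recently used
            return self._audio[text]
        audio = self.synthesize(text)
        if self.max_phrases > 0:
            self._audio[text] = audio
            if len(self._audio) > self.max_phrases:
                self._audio.popitem(last=False)  # drop least recently used
        return audio
```

Injecting the clock keeps the timeout logic testable without real
ten-minute waits, and persistence (saving the phrase cache on exit) is
left out of the model.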
* The concatenative TTS engine requires more physical memory than
  formant synthesis, to store the data required to produce natural
  speech. Since many processes on a server may need the same data, IBM
  Text-to-Speech loads one instance and shares it among all the
  processes. IBM Text-to-Speech also lets you configure how long the
  data remains loaded after the last access. By default, each
  concatenative voice remains loaded for 10 minutes.

  To configure the shared memory, or to stop sharing it, use the
  Concatenative Memory Manager (CMM) utility, cmmcmd.exe:

     cmmcmd { shutdown | timeout [secs] }

     shutdown         Shut down the server immediately.
     timeout [secs]   Get or set the server time-out, in seconds. If secs
                      is 0 or omitted, the current shutdown time-out is
                      returned.

* The concatenative TTS engine requires more computational power than
  the formant TTS engine. Since the domains of many TTS applications are
  limited to a small vocabulary, IBM Text-to-Speech now provides a
  mechanism (speech caching) to bypass complex computations for text
  that has already been processed. The concatenative system can be
  configured, per language, to remember a number of phrases as
  pre-synthesized phrases. The cache can also be made persistent (that
  is, saved on exit and reloaded at voice initialization). By default,
  no caching is performed.
  To enable and configure speech caching, use the inicache.exe utility:

     inicache [-ul] [-p] [-n] lang [phrases] [INI]

     -u        Disable voice caching
     -l        Display current voice cache values
     -p        Cache file is persistent (saved for future use)
     -n        Cache file is not persistent
     lang      Language/dialect for the voice cache. Valid
               language/dialect values are:
                  1.0  - US English
                  1.1  - British English
                  2.0  - Castilian Spanish
                  2.1  - Mexican Spanish
                  3.0  - Standard French
                  3.1  - Canadian French
                  4.0  - Standard German
                  5.0  - Standard Italian
                  6.0  - Simplified Chinese
                  6.0d - Simplified Chinese (dual language)
                  6.1  - Traditional Chinese
                  6.1d - Traditional Chinese (dual language)
                  7.0  - Brazilian Portuguese
                  8.0  - Standard Japanese
                  9.0  - Standard Finnish
                  13.0 - Standard Norwegian
                  14.0 - Standard Swedish
                  15.0 - Standard Danish
     phrases   Maximum number of phrases in the voice cache
     INI       Path to the ECIINI file (not used on Windows platforms).
               On other platforms, the ECIINI environment variable is
               used if this is omitted.

     NOTE: If -u is specified, only the language and INI file may be
     specified.


17. LOGGING UTILITIES
----------------------
Logs are often needed for auditing, technical support, and diagnostic
purposes. The logging provided by IBM Text-to-Speech is extremely
verbose and is intended primarily for technical support and diagnostics.
To enable and configure logging, use the initrace.exe utility:

   initrace level [file] [INI]

   level   Tracing level [0 = off, 1 = on]
   file    Name of the trace file. Do not specify a trace file if level
           is 0.
   INI     Path to the ECIINI file (not used on Windows platforms). On
           other platforms, the ECIINI environment variable is used if
           this is omitted.

   NOTE: Paths that include spaces must be enclosed in double quotes.


18. TRADEMARK INFORMATION
--------------------------
IBM is a registered trademark or trademark of International Business
Machines Corporation in the United States and other countries.
Microsoft, Windows, Windows NT, Windows 95, Windows 98, Windows XP, and
Windows 2000 logo are trademarks or registered trademarks of Microsoft
Corporation in the United States and/or other countries.

All other names are registered trademarks, trademarks or service marks
of their respective companies.

Doc Number: win32.readme.6.4.0.3.txt.050302