Is it a good idea to create two variants of 'dziewiec' as shown in my file? --- (Edited on 11/18/2009 am [GMT-0600] by johnyjj2) ---

You can use both systems. It is possible to convert Ralf's Polish dictionary (PLS format) into Sphinx format with Notepad:
- remove the XML elements (with Search/Replace);
- convert the eSpeak phonemes into Arpabet phonemes (with Search/Replace).

How should I extend the .filler file to take into account lip smacking, clearing one's throat, etc.? If yes, can I simply use exactly the same transcriptions and audio files as in 'train'?

There is no encoding problem because eSpeak and Arpabet phonemes consist only of US-ASCII characters.
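The two Search/Replace steps for turning the PLS dictionary into Sphinx format could also be scripted. What follows is only a rough sketch, not Ralf's actual procedure: the eSpeak-to-Arpabet table, the sample transcriptions and the function names are all made up for illustration, and "TS" is a self-defined "Arpabet" symbol. It assumes the PLS file uses the standard W3C lexicon namespace, and it writes a second pronunciation of a word as WORD(2), which is the convention I know from CMU-style Sphinx dictionaries.

```python
import xml.etree.ElementTree as ET

# Namespace of W3C PLS lexicon files.
PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# HYPOTHETICAL mapping, for illustration only. A real table has to be
# written by hand for the phonemes that actually occur in the dictionary.
ESPEAK_TO_ARPABET = {
    "dZ": "JH", "tS": "CH", "ts": "TS",  # "TS" is a self-defined symbol
    "p": "P", "v": "V", "n": "N", "j": "Y", "E": "EH",
}

SKIP = "',"  # assumed stress marks, dropped during conversion


def to_arpabet(espeak, table=ESPEAK_TO_ARPABET):
    """Split an eSpeak phoneme string greedily (longest symbol first)."""
    keys = sorted(table, key=len, reverse=True)
    phones, i = [], 0
    while i < len(espeak):
        if espeak[i] in SKIP:
            i += 1
            continue
        for key in keys:
            if espeak.startswith(key, i):
                phones.append(table[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no mapping for {espeak[i:]!r}")
    return phones


def pls_to_sphinx(pls_xml):
    """Return Sphinx dictionary lines, one per pronunciation."""
    lines = []
    for lexeme in ET.fromstring(pls_xml).iter(PLS_NS + "lexeme"):
        word = lexeme.findtext(PLS_NS + "grapheme").strip().upper()
        for n, ph in enumerate(lexeme.iter(PLS_NS + "phoneme"), start=1):
            label = word if n == 1 else f"{word}({n})"
            lines.append(label + " " + " ".join(to_arpabet(ph.text.strip())))
    return lines


# Made-up sample entries; the transcriptions are not taken from the real file.
SAMPLE = """<lexicon version="1.0" alphabet="x-espeak"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme><grapheme>piec</grapheme><phoneme>pjEts</phoneme></lexeme>
  <lexeme><grapheme>dziewiec</grapheme>
    <phoneme>dZEvj'EntS</phoneme><phoneme>dZEvj'EtS</phoneme></lexeme>
</lexicon>"""

for line in pls_to_sphinx(SAMPLE):
    print(line)
# PIEC P Y EH TS
# DZIEWIEC JH EH V Y EH N CH
# DZIEWIEC(2) JH EH V Y EH CH
```

The script raises an error on any phoneme missing from the table, so gaps in the mapping show up immediately instead of ending up silently in the dictionary.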

Your small Sphinx pronunciation dictionary doesn't have to catch every detail (= phones) of the Polish language.

You just have to catch the major characteristics (= phonemes). If you don't find the specific sound in the Arpabet table, then create your own "Arpabet" sound. You have to decide which Polish phonemes your dictionary should have (and which phones you want to omit).

If you are using just US-ASCII (for the Polish words) and Arpabet (for the Polish phonemes), you should be fine. When you are ready with your Polish pronunciation dictionary (Sphinx format), can you please post a link to it? Regards, Ralf --- (Edited on 2009-11-16 am [GMT-0600] by ralfherzog) ---

And you should use only US-ASCII characters ("PIEC" instead of "pięć"). However, for a computer it wouldn't be a good idea because it can lead to many ambiguities.

This approach is a very good start because you avoid encoding issues. For example, the words piec and pięć both exist, but they have different meanings.
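To see how often the US-ASCII spelling actually causes such clashes, the word list can be checked before the dictionary is built. This is a minimal sketch with hypothetical helper names (ascii_key, find_collisions). It uses an explicit character table because "ł" is not reduced to "l" by Unicode normalization.

```python
from collections import defaultdict

# Explicit table for the Polish letters with diacritics.
POLISH_TO_ASCII = str.maketrans("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
                                "acelnoszzACELNOSZZ")


def ascii_key(word):
    """US-ASCII, upper-case dictionary key, e.g. 'pięć' -> 'PIEC'."""
    return word.translate(POLISH_TO_ASCII).upper()


def find_collisions(words):
    """Group words that end up with the same US-ASCII key."""
    groups = defaultdict(set)
    for word in words:
        groups[ascii_key(word)].add(word.lower())
    return {key: sorted(group) for key, group in groups.items()
            if len(group) > 1}


print(find_collisions(["piec", "pięć", "dziewięć"]))
# {'PIEC': ['piec', 'pięć']}
```

Every key in the result stands for two or more different Polish words, so these are the entries that would need distinct spellings in the dictionary and in the transcriptions.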

I looked at the docs of HTK and Julius and found them better because each consists of one big PDF file.

