Pages in topic: < [1 2 3 4 5 6 7 8 9 10] > |
What's your opinion on machine translation and quality? Thread poster: Daniela Zambrini
Giles Watson Italy Local time: 12:11 Italian to English In memoriam Second-guessing a machine | Jul 8, 2014 |
Giovanni Guarnieri MITI, MIL wrote: but when the sentence is unusable, you apply the same thought process as in translation... in fact, you would develop a different skill, whilst preserving the original one... If the text is unusable, you need to look at the original, which you might as well have done in the first place without wasting time on gobbledegook generated by a lucky bag of algorithms. | | |
Not that much better than Google Translate - at least for Spanish | Jul 8, 2014 |
Kirti Vashee wrote: Source All studios are equipped with a small kitchen, fridge and separate bathroom. The hotels facilities include an outdoor swimming pool and a beauty parlour. Enjoy typical French cuisine in the traditional restaurant Aux Trois Cochons. The hotel is in the heart of the historical centre, near to all major attractions. The hotel is located at the heart of the Huangpu District, close to Nanpu Bridge. Apartments are in very good condition, well equipped and furnished to a very good standard. The rooms are also fully equipped with TV, Telephone, Air conditional, Refrigerator and mini bar. Spice Market Buffet offers a mix of oriental and western style cuisine. Contemporary and friendly, our Novotel Cafe will tempt you with its original and varied menu. The hotel always uses the flower arrangements in the lobby for its promotional activities. MT of above source sentences: Todos los estudios están equipados con una pequeña cocina, nevera y un baño independiente. Las instalaciones del hotel incluyen una piscina exterior y un salón de belleza. Disfrute de la típica cocina francesa en el restaurante tradicional Aux Trois Cochons. El hotel está en el corazón del centro histórico, cerca de todas las atracciones principales. El hotel está situado en el corazón del distrito de Huangpu, cerca de puente Nanpu. Los apartamentos están en muy buenas condiciones, bien equipados y amueblados a un nivel muy bueno. Las habitaciones también están completamente equipadas con TV, teléfono, aire acondicionado, nevera y minibar. Spice Market buffet ofrece una mezcla de cocina de estilo oriental y occidental. Coetáneo y amable, nuestra cafetería Novotel le tentará con sus originales y variados menús. El hotel siempre utiliza los arreglos florales en el vestíbulo para sus actividades promocionales. Google translate of the same: Todos los estudios están equipados con una pequeña cocina, nevera y baño separado. 
Las instalaciones del hotel incluyen una piscina al aire libre y un salón de belleza. Disfrute de la cocina típica francesa en el restaurante tradicional Aux Trois Cochons. El hotel está en el corazón del centro histórico, cerca de las principales atracciones. El hotel está situado en el corazón del distrito de Huangpu, cerca de Puente Nanpu. Los apartamentos están en muy buenas condiciones, bien equipadas y amuebladas a un muy buen nivel. Las habitaciones están totalmente equipadas con TV, teléfono, aire condicionado, nevera y mini bar. Spice Market Buffet ofrece una mezcla de cocina oriental y occidental de estilo. Contemporáneo y acogedor, nuestro Novotel Café ofrece un menú original y variado. El hotel siempre utiliza los arreglos florales en el vestíbulo para sus actividades promocionales.
Your MT is better than GT, but only marginally so, really. | | |
These samples are from a very small set of examples. On a large data set these small differences add up. It is one thing to say that there is little or no difference on a small sample set, and another thing to actually have a very accurate sense for how much more/less effort it would be to post-edit the different output for a large project. Travel is also one of Google's best domains as there is a lot of web content that can be crawled and used to learn patterns. Google is very compelling on many domains for romance languages like ES, PT and even IT and explains why even Moses experiments can work for these languages. One key value of custom systems is that it is possible to correct specific error patterns and thus make the PEMT task easier and enhance the efficiency on very large projects. I am only able to show examples from systems where the clients allow it or from one of our domain engines, but will try and get some other samples and post at a later date.
[Edited at 2014-07-08 21:59 GMT] | | |
Moses Experiments | Jul 8, 2014 |
Kirti Vashee wrote: Google is very compelling on many domains for romance languages like ES, PT and even IT and explains why even Moses experiments can work for these languages. Kirti, what do you mean by "Moses experiments"? As I understand it, even the so-called more advanced systems such as Asia Online or KantanMT are based on Moses. I do understand that you can only show us samples of engines where your clients agreed to publish them, and I really appreciate that you are showing samples. For me the interesting part is that I have now seen samples from various engines, from various "advanced" solutions (all EN -> DE), and they were all of similar quality. In my specialties (pharma and medical) and in my language pairs (EN-DE, NL-DE), I can translate 6-10k words per day (using my CAT tool), producing a quality that is good enough that my customers have kept coming back for years and are paying good rates. If I remember correctly, an MT/PEMT system of publishable quality has an output of 8-12k words a day (please correct me if I am wrong). So why on earth should I switch to MT/PEMT? Or, coming from a different angle, would it not make sense to teach translators how to use their CAT tools better to help them improve their productivity? I am not disputing that MT has various useful applications, and I hope that it will soon be good enough to help me increase my productivity, but up to now I am rather disappointed by the gap between the promises made for years by some MT marketing people and the actual results. | |
Moses and why bother with MT | Jul 9, 2014 |
Sigfried, Moses is the very basic set of build-it-yourself SMT tools that is widely used by NLP students at universities. While many commercial offerings are very closely related, there are many other tools needed in addition to Moses to build successful MT systems. I have some definite opinions on this, described here: http://kv-emptypages.blogspot.com/2011/12/moses-madness-and-dead-flowers.html You can also read the comments to hear other opinions. If you are doing 6-10K words per day with DE, I think it will be some time before an MT system will offer you real benefits. The highest performance I have seen is in mature romance language systems where even 20K a day is possible. Very tightly focused (in terms of domain) DE systems could provide you with a boost, but such a system takes time to develop and only makes sense if there is long-term work potential with it. But DE is considered a more difficult language to combine with EN. We have a customer who reported that he achieved 900 words/hour with an EN to HU system, which is even harder than DE; however, they take great care to train the editors and also make sure the MT system reaches a state where the output makes this possible. You can hear him describe this at http://www.asiaonline.net/EN/Resources/Webinars/default.aspx#Webinars16 We have had good results with both DE to Slovenian and DE to Japanese. MT makes sense here as SME translators are harder to find in these kinds of language combinations. | | |
Some considerations... | Jul 9, 2014 |
The samples provided were very literally translated. If I heard those sentences, I would assume they were spoken by someone who had studied the language. Secondly, how many words can someone "post-edit/proofread" in one day? To me, post-editing/proofreading involves two steps: 1) Checking accuracy against the original. 2) Removing any typographical errors and unnatural-sounding expressions, and ensuring the text has a smooth, flowing style. Also, when the font size has to be changed so that the text takes up the same space despite language expansion, MT would not be able to recognize that and would give you "solamente" in Spanish when you could definitely use "sólo". In a hundred years, if only top translators, top editors and top programmers are involved in its development, I can see it making a real difference. MT vendors cater to big business and, in order to make a profit, hire not the best and most experienced translators (because expertise costs money) but young, inexperienced translators eager to enter the profession or make a living during these tough economic times. MT has to watch out because "garbage in, garbage out"!!! Right now, MT is in its infancy, even if MT companies refuse to admit it.
[Edited at 2014-07-09 08:17 GMT] | | |
neilmac Spain Local time: 12:11 Spanish to English + ... PS: Mindful MT | Jul 9, 2014 |
Here's the warning from the GT4T website: "Warning: do not use GT4T as a mindless machine translation tool. Use it to save key stroke, get translation options for phrases, and keep consistency." I find MT if used this way to be useful. Just another part of the tech I choose to use.
[Edited at 2014-07-09 09:41 GMT] | | |
Giles Watson wrote: Giovanni Guarnieri MITI, MIL wrote: but when the sentence is unusable, you apply the same thought process as in translation... in fact, you would develop a different skill, whilst preserving the original one... If the text is unusable, you need to look at the original, which you might as well have done in the first place without wasting time on gobbledegook generated by a lucky bag of algorithms. some of the text is unusable... some is usable... | |
Kirti Vashee wrote: We have a customer who reported that he achieved 900 words/hour I do 900 words/hour on some specific texts... and I'm not a machine... | | |
Clarification | Jul 9, 2014 |
Giovanni Guarnieri MITI, MIL wrote: Kirti Vashee wrote: We have a customer who reported that he achieved 900 words/hour I do 900 words/hour on some specific texts... and I'm not a machine... The person here was referring to a translator whose normal productivity is 250 words/hour and who was able to raise their output to 900 words/hour. Also, this was for the Life Sciences domain in English to Hungarian, which is a very difficult language pair for MT. | | |
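To put the figures quoted above side by side, here is a minimal Python sketch of the arithmetic. The 250 and 900 words/hour values come from the post; the 8-hour working day and the function names are assumptions added purely for illustration, and nothing here says how the customer actually measured productivity.

```python
def speedup(before_wph: float, after_wph: float) -> float:
    """Ratio of post-editing throughput to unaided throughput."""
    return after_wph / before_wph


def words_per_day(words_per_hour: float, hours: float = 8.0) -> float:
    """Daily output, assuming a fixed number of working hours (hypothetical 8)."""
    return words_per_hour * hours


if __name__ == "__main__":
    # Figures reported in the thread for one EN to HU translator.
    print(speedup(250, 900))      # relative gain
    print(words_per_day(250))     # unaided, per assumed 8-hour day
    print(words_per_day(900))     # with MT plus post-editing
```

Under that assumed 8-hour day, 900 words/hour works out to 7,200 words, which sits inside the 6-10k words per day that an experienced translator earlier in the thread reports reaching with a CAT tool alone. The reported gain is therefore relative to that particular translator's baseline.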
How do computers "translate"? | Jul 9, 2014 |
This little video very quickly shows you how computers "translate". It is highly simplified, but it very quickly explains how a computer learns: https://www.youtube.com/watch?v=_ghMKb6iDMM You can see that training a computer to "translate" is really a data preparation and data analysis task at a corpus level rather than a segment level. You are looking for "good" patterns and trying to avoid "bad" patterns in large corpora. This process is much more complicated for some language combinations than others. So easy combinations would be Spanish to/from Italian, since they have highly similar linguistic structures. Difficult combinations would be English to Arabic, Chinese to English, English to Japanese since the two languages are so different in essential structures. Inflection and morphological differences are very hard to capture and going from a SVO to SOV language also requires special efforts. | | |
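As a rough illustration of what "looking for patterns at a corpus level" can mean, here is a minimal Python sketch that scores word pairs by how often they co-occur across a tiny parallel corpus (a Dice coefficient). The corpus, the function names and the scoring choice are all assumptions made for this example; real SMT toolkits such as Moses rely on far more elaborate alignment and phrase models, and this is not a description of how any system mentioned in the thread works.

```python
from collections import Counter
from itertools import product

# Toy English/Spanish parallel corpus, invented for illustration and loosely
# modelled on the hotel sentences posted earlier in the thread.
PAIRS = [
    ("the hotel is in the heart of the centre", "el hotel está en el corazón del centro"),
    ("the hotel has a swimming pool", "el hotel tiene una piscina"),
    ("the rooms have a fridge", "las habitaciones tienen una nevera"),
    ("the pool is in the centre", "la piscina está en el centro"),
]


def dice_scores(pairs):
    """Score each (source word, target word) pair by sentence-level co-occurrence."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)                # sentences containing each source word
        tgt_count.update(t_words)                # sentences containing each target word
        joint.update(product(s_words, t_words))  # sentence pairs containing both
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_match(word, scores):
    """Target word with the highest association score for a source word."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    scores = dice_scores(PAIRS)
    print(best_match("hotel", scores))
    print(best_match("pool", scores))
```

With only four sentence pairs, "hotel" and "pool" are matched correctly, but a word seen once, such as "fridge", scores equally against several words in its sentence. That is a small-scale view of why the amount and cleanliness of the training data matter so much.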
Samples from a Travel engine | Jul 9, 2014 |
English to Indonesian Bahasa
Our well established restaurant serves a range of culinary delights. Cuisine options vary from Western to A La Carte menus. It is close to Mukden Palace and Liaoning Provincial Museum. Quest On Sturt is perfect for every type of traveller. Gili Villas are an exclusive resort of 4 stylish villas. All the 45 rooms are spread in the four stories of the building. For world class accommodation and amenities, stay at Doubletree by Hilton Qingdao Chenyang.
MT translation
Restoran terkenal hotel ini menyajikan beraneka kelezatan kuliner. Pilihan masakan beragam dari barat hingga menu satuan. Hotel ini berada dekat dengan mukden palace dan liaoning provincial museum. Quest on sturt adalah tempat sempurna untuk semua jenis pelancong. Gili villas adalah sanggraloka eksklusif 4 vila bergaya. Semua 45 kamarnya tersebar di bangunan berlantai empat. Untuk akomodasi dan fasilitas kelas dunia, menginaplah di doubletree by hilton qingdao chenyang.
and also for English to Thai
Bankside Waldorf Apartments are located in the Central Business District of Auckland. Seashells Resort perfect for every type of traveller. Tan Son Nhat International Airport is 1 km away. Today, Bali is a favorite tourist destination. This boutique hotel offers 40 luxury villas with modern furnishings. Cello Hotel Songpa offers comfortable accommodation in a prime location for a reasonable price. Then relax and be pampered in the 'Red Spring Sauna'.
MT แบงค์ ไซด์ วอลดอร์ฟ อพาร์ทเมนท์ ตั้ง อยู่ ใน ย่าน ศูนย์กลาง ธุรกิจ ของ ออ ค แลนด์ ซี เชลล์ รีสอร์ท เหมาะ สำหรับ นัก เดินทาง ทุก ประเภท สนามบิน นานาชาติ เติ่น เซินเญิ้ ตอ ยู่ ห่าง ออก ไป 1 km วันนี้ บาหลี เป็น จุดหมายปลายทาง ยอด นิยม ของ นัก ท่องเที่ยว บูติค โฮ เท็ล แห่งนี้ มี วิลล่า หรู พร้อม เฟอร์นิเจอร์ ที่ ทันสมัย 40 โรงแรม เซลโล ซง พา ให้ บริการ ที่ พัก ที่ สะดวกสบาย ใน ทำเล ที่ ตั้ง ชั้นเยี่ยม ด้วย ราคา สมเหตุสมผล แล้ว ผ่อนคลาย และ รับ การ ปรนนิบัติ ใน ' เรด สปริง ซาวน่า ' | |
Phil Hand China Local time: 18:11 Chinese to English Special pleading | Jul 10, 2014 |
Kirti Vashee wrote: ...Difficult combinations would be English to Arabic, Chinese to English, English to Japanese since the two languages are so different in essential structures. Inflection and morphological differences are very hard to capture and going from a SVO to SOV language also requires special efforts. I see this sort of claim all the time, and I have to call it. There's always some reason why the sample we're looking at is so bad: it's a difficult language pair, it's a difficult text type, whatever... According to your logic there, English to Chinese should be the easiest pair. English and Chinese share many common features: they're both SVO, they have roughly equivalent ways of expressing many sentence level features (adverbial clauses, time, etc.). Chinese has no (or very little) morphology, so the computer doesn't have to worry about that. But in reality, English to Chinese output is terrible. Usually completely unreadable. Now, I don't know about romance languages. But I'm often told that they are some of the hardest pairs to work between because you have to be so careful to avoid false friends. I'd like to hear other romance language colleagues chip in to tell us if Google or any other MT system can really achieve decent results in their pairs. PS. An illustration of why MT isn't advancing: what the hell kind of texts are these? What on earth does "from Western to A La Carte" mean?
[Edited at 2014-07-10 02:50 GMT] | | |
Kirti Vashee wrote: The person here was referring to a translator whose normal productivity is 250 words/hour was able to raise their output to 900 words/hour. Also this was for Life Sciences domain in English to Hungarian which is a very difficult language for MT Working pairs and domains don't matter... they are your working languages and your domains... so you should be able to reach a good output, not 250 words/hr! Get a good translator instead! | | |
Giles Watson Italy Local time: 12:11 Italian to English In memoriam A matter of style | Jul 10, 2014 |
Phil Hand wrote: Now, I don't know about romance languages. But I'm often told that they are some of the hardest pairs to work between because you have to be so careful to avoid false friends. I'd like to hear other romance language colleagues chip in to tell us if Google or any other MT system can really achieve decent results in their pairs. It's not just a question of a few false friends. Romance languages tend to have stylistic expectations about sentence structure and the organisation of thought that contrast with English. For example, Italian - but the comment also applies to other Romance languages - likes its sentences to look solid. Forms and notions balance or offset each other and the ideas often tend to be organised in (nested) pairs. English, in contrast, generally seeks to engage the reader's attention by imparting a sensation of movement. Readers expect sentences to flow and triplets are more common. If you want an analogy, it's a bit like listening to a tango (2/4 time) and trying to transcribe it as a waltz (3/4 time). You can, of course, calque the organisation of thought in the Italian but the English will plod and the translation will be far less effective than the original. MT doesn't even address this issue, except by imposing its own tone-deaf rhythms on the target texts. If and when MT begins to hear language with a native ear (or humanity loses its ability to enjoy language's sounds), it will be time for translators to step down and let the 'puters take over.
[Edited at 2014-07-10 13:39 GMT] | | |