How do different CAT tools do discount analyses?
Thread poster: Samuel Murray
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
Oct 15, 2009

G'day everyone

[Sorry for calling it "discount analyses"... I mean of course the type of statistics that show how many segments have what kinds of fuzzy matches against a TM or against some standard.]

How do different CAT tools count those matches? I'm not concerned here about whether a match may be 55% in one CAT tool but 45% in another CAT tool, but whether the CAT tool counts internal fuzzy matches or not. In other words, if there is no TM, but many segments in the source file are similar, what does the CAT tool's analysis say?

Not all CAT tools produce such statistics, but for those that do, I'd like you to help me see how the different tools do the analyses. I only have Wordfast and OmegaT, so I can contribute only those two, but I hope that other ProZians with other tools can tell me what their CAT tools' analyses look like.

Please do this:

1. Create a file with the following text in it:

The quick brown fox jumps over the lazy dog. That quick brown fox jumps over the lazy dog. The slow brown fox jumps over the lazy dog. The quick green fox jumps over the lazy dog. The quick brown cat jumps over the lazy dog. The quick brown fox sails over the lazy dog. The quick brown fox jumps under the lazy dog. The quick brown fox jumps over one lazy dog. The quick brown fox jumps over the dead dog. The quick brown fox jumps over the lazy fish. The rain in Spain falls mainly on the plains. Little rain in Spain falls mainly on the plains. The snow in Spain falls mainly on the plains. The rain on Spain falls mainly on the plains. The rain in Mars falls mainly on the plains. The rain in Spain drops mainly on the plains. The rain in Spain falls largely on the plains. The rain in Spain falls mainly under the plains. The rain in Spain falls mainly on those plains. The rain in Spain falls mainly on the trees.


2. Do an analysis against no TM (or against an empty TM if your CAT tool doesn't allow you to do an analysis without a TM). Tell me what the statistics are (A).

3. Translate the first sentence (or put the first sentence in the TM), and do the analysis again. Tell me what the statistics are (B).

I hope the results are interesting. Thanks!
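
For anyone who would rather generate the test file than paste it, here is a minimal Python sketch. It relies on the pattern in the text above: each of the two base sentences is followed by nine variants that each swap one word, position by position. The helper names (`variants`, `build_segments`, `count`) are made up for this sketch, and the counting is deliberately naive (one segment per sentence, words split on spaces), which is only an assumption about how a CAT tool segments and counts.

```python
FOX = "The quick brown fox jumps over the lazy dog."
RAIN = "The rain in Spain falls mainly on the plains."
# One replacement word per word position (0..8) of the base sentence.
FOX_SWAPS = ["That", "slow", "green", "cat", "sails", "under", "one", "dead", "fish"]
RAIN_SWAPS = ["Little", "snow", "on", "Mars", "drops", "largely", "under", "those", "trees"]


def variants(base, swaps):
    """Return the base sentence followed by one variant per word position."""
    words = base.rstrip(".").split()
    result = [base]
    for position, new_word in enumerate(swaps):
        changed = words.copy()
        changed[position] = new_word
        result.append(" ".join(changed) + ".")
    return result


def build_segments():
    """Return the 20 test sentences in the order used in the test file."""
    return variants(FOX, FOX_SWAPS) + variants(RAIN, RAIN_SWAPS)


def count(segments):
    """Return (segments, words, chars without spaces, chars with spaces)."""
    words = sum(len(s.split()) for s in segments)
    with_spaces = sum(len(s) for s in segments)
    without_spaces = sum(len(s.replace(" ", "")) for s in segments)
    return len(segments), words, without_spaces, with_spaces


if __name__ == "__main__":
    segments = build_segments()
    # The test file is simply the sentences joined by single spaces.
    print(" ".join(segments))
    print(count(segments))
```

Counted this way (spaces between sentences are not counted), the file has 20 segments, 180 words, 739 characters without spaces and 899 characters with spaces, which gives a baseline to check a tool's totals against.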


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
Wordfast 5.5 (WFC) Oct 15, 2009

For Wordfast Classic (WFC) 5.5:

A. Empty TM

Analogy       Segments   Words   Characters   Percentage
---------------------------------------------------------
Repetitions          0       0            0           0%
100%                 0       0            0           0%
95%-99%              0       0            0           0%
85%-94%              0       0            0           0%
50%-84%              0       0            0           0%
00%-49%             20     180          899         100%
Total               20     180          899

B. One segment in TM

Analogy       Segments   Words   Characters            %
---------------------------------------------------------
Repetitions          0       0            0           0%
100%                 1       9           44           5%
95%-99%              0       0            0           0%
85%-94%              4      36          177          20%
50%-84%              5      45          221          25%
00%-49%             10      90          457          50%
Total               20     180          899
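
Result A puts all 20 segments in the lowest band, which suggests that this analysis does not match the segments of the file against each other. The difference between a TM-only analysis and one with internal fuzzy matching can be sketched in Python as follows. Everything here is an assumption for illustration: the word-level `difflib` ratio stands in for whatever similarity measure a real tool uses, the band limits are one possible grid, and `analyse` and its `internal` switch are invented names. A real tool would also report identical segments as repetitions, which this sketch does not do.

```python
from collections import Counter
from difflib import SequenceMatcher


def variants(base, swaps):
    """Base sentence plus one variant per word position (the test text)."""
    words = base.rstrip(".").split()
    out = [base]
    for i, word in enumerate(swaps.split()):
        out.append(" ".join(words[:i] + [word] + words[i + 1:]) + ".")
    return out


def build_segments():
    return (
        variants("The quick brown fox jumps over the lazy dog.",
                 "That slow green cat sails under one dead fish")
        + variants("The rain in Spain falls mainly on the plains.",
                   "Little snow on Mars drops largely under those trees")
    )


# Assumed band grid: (lower bound in percent, label).
BANDS = [(100, "100%"), (95, "95%-99%"), (85, "85%-94%"),
         (75, "75%-84%"), (50, "50%-74%")]


def similarity(a, b):
    """Word-level similarity in percent (a stand-in metric, not a tool's own)."""
    ta = a.lower().rstrip(".").split()
    tb = b.lower().rstrip(".").split()
    return int(100 * SequenceMatcher(None, ta, tb).ratio())


def band(score):
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "No match"


def analyse(segments, tm=(), internal=False):
    """Count segments per band against the TM.

    With internal=True, every segment already seen in the file is added
    to the pool of match candidates, as if it had just been translated.
    """
    pool = list(tm)
    report = Counter()
    for segment in segments:
        best = max((similarity(segment, known) for known in pool), default=0)
        report[band(best)] += 1
        if internal:
            pool.append(segment)
    return report


if __name__ == "__main__":
    segments = build_segments()
    print("A, TM only:      ", dict(analyse(segments)))
    print("A, with internal:", dict(analyse(segments, internal=True)))
    print("B, TM only:      ", dict(analyse(segments, tm=[segments[0]])))
```

In this sketch, an empty TM without internal matching gives 20 no-matches, while switching internal matching on leaves only the two base sentences as no-matches and moves the other 18 segments into a fuzzy band. That gap is what the two analyses in this test are meant to expose.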


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
OmegaT 2.0.5 Oct 15, 2009

For OmegaT 2.0.5:

A. Empty TM

Match type     Segments   Words   Characters (without spaces)   Characters (including spaces)
Repetitions:          0       0                             0                               0
Exact match:          0       0                             0                               0
95%-100%:             0       0                             0                               0
85%-94%:              0       0                             0                               0
75%-84%:              0       0                             0                               0
50%-74%:              0       0                             0                               0
No match:            20     180                           739                             899

B. One segment in TM

Match type     Segments   Words   Characters (without spaces)   Characters (including spaces)
Repetitions:          0       0                             0                               0
Exact match:          1       9                            36                              44
95%-100%:             0       0                             0                               0
85%-94%:              9      81                           326                             398
75%-84%:              0       0                             0                               0
50%-74%:             10      90                           377                             457
No match:             0       0                             0                               0


 
Vito Smolej
Germany
Member (2004)
English to Slovenian
SITE LOCALIZER
on trados (freelance 2007) Oct 15, 2009

Hi Samuel:

Samuel Murray wrote:
2. Do an analysis against no TM (or against an empty TM if your CAT tool doesn't allow you to do an analysis without a TM). Tell me what the statistics are (A).

20 no-matches.

Samuel Murray wrote:
3. Translate the first sentence (or put the first sentence in the TM), and do the analysis again. Tell me what the statistics are (B).

1 100%
9 85% - 94%
10 no-matches

I'll charge you nothing for 100%, but we will have to discuss the 9 in the 85% - 94% slot (g).

How's the stemming in OmegaT doing on the subject of quick foxes and lazy dogs? Any changes in the ranking of matches (should try it myself...)?

Regards


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
TOPIC STARTER
@Vito Oct 15, 2009

VitoSmolej wrote:
How's the stemming in OmegaT doing on the subject of quick foxes and lazy dogs? Any changes in the ranking of matches (should try it myself...)?


OmegaT's discount analysis system does take tags into account (hence my test here with a plaintext file) but does not take stemming into account (so says the developer on the OmegaT mailing list).
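
To make concrete what "taking stemming into account" could change in such an analysis, here is a small Python sketch. It is purely illustrative: the suffix-stripping rule in `crude_stem` is a toy, the `difflib` word-level ratio is an assumed similarity measure, and the plural sentence is invented for the example. None of this is OmegaT's actual code.

```python
from difflib import SequenceMatcher


def tokens(segment):
    """Lowercase the segment, drop the final full stop, split on spaces."""
    return segment.lower().rstrip(".").split()


def crude_stem(word):
    """Strip a plural-style suffix; a toy rule, not a real stemmer."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def similarity(a, b, stem=False):
    """Word-level similarity between 0 and 1, optionally on stemmed words."""
    ta, tb = tokens(a), tokens(b)
    if stem:
        ta = [crude_stem(w) for w in ta]
        tb = [crude_stem(w) for w in tb]
    return SequenceMatcher(None, ta, tb).ratio()


if __name__ == "__main__":
    in_tm = "The quick brown fox jumps over the lazy dog."
    new = "The quick brown foxes jump over the lazy dogs."  # invented example
    print(f"without stemming: {similarity(new, in_tm):.2f}")
    print(f"with stemming:    {similarity(new, in_tm, stem=True):.2f}")
```

Without stemming, three of the nine words differ and the segment falls well below the usual fuzzy bands; with the toy stemmer all nine words line up. An analysis that ignores stemming would therefore put such a segment in a lower band than the match the translator might see while working.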

[Edited at 2009-10-15 09:26 GMT]


 
Boris Sigalov
English to Russian
MemoQ 3.0.29 Oct 15, 2009

A. Empty TM

Type          Segments   Source words   Source chars   Percent
All                 20            180            739       100
Repetition           0              0              0         0
101%                 0              0              0         0
100%                 0              0              0         0
95%-99%              0              0              0         0
85%-94%              0              0              0         0
75%-84%              0              0              0         0
50%-74%              0              0              0         0
No match            20            180            739       100

B. One segment in TM

Type          Segments   Source words   Source chars   Percent
All                 20            180            739       100
Repetition           0              0              0         0
101%                 1              9             36         5
100%                 0              0              0         0
95%-99%              0              0              0         0
85%-94%              0              0              0         0
75%-84%              9             81            326        45
50%-74%              0              0              0         0
No match            10             90            377        50
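
Since these statistics are normally used to work out discounts, the B results posted in this thread can be compared by turning each one into a weighted word count. The Python sketch below does that with a made-up rate grid: the factors in `pay_factor` are hypothetical and not taken from any tool or agency. The word counts per band are the ones reported above for Wordfast, OmegaT and MemoQ. The Trados figures were given in segments only, so they are left out.

```python
# Words per band from the "B" results in this thread.
# Keys are the lower bound of each band in percent (0 = lowest band / no match).
RESULTS_B = {
    "Wordfast Classic 5.5": {100: 9, 85: 36, 50: 45, 0: 90},
    "OmegaT 2.0.5": {100: 9, 85: 81, 50: 90},
    "MemoQ 3.0.29": {101: 9, 75: 81, 0: 90},
}


def pay_factor(lower_bound):
    """Hypothetical share of the full word rate paid for a band."""
    if lower_bound >= 100:
        return 0.25
    if lower_bound >= 95:
        return 0.40
    if lower_bound >= 85:
        return 0.50
    if lower_bound >= 75:
        return 0.70
    return 1.00


def weighted_words(bands):
    """Sum the words of each band multiplied by that band's pay factor."""
    return round(sum(words * pay_factor(low) for low, words in bands.items()), 2)


if __name__ == "__main__":
    for tool, bands in RESULTS_B.items():
        total = sum(bands.values())
        print(f"{tool}: {weighted_words(bands):.2f} weighted words out of {total}")
```

Under this invented grid the same 180-word file comes out at roughly 155 weighted words from the Wordfast analysis, 133 from OmegaT and 149 from MemoQ, so the choice of tool alone shifts the payable volume even though the text and the TM are identical.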


 

