ABBYY FineReader 11 Misreadings
Thread poster: BrianHayden
BrianHayden
United States
Russian to English
Jan 5, 2014

I've just installed a copy of ABBYY FineReader 11, and I've noticed something odd. I'm scanning the pages of a Russian-English dictionary to convert it into a Word file, and the program consistently misreads и with an accent mark (that is, и́) as й. It apparently doesn't read the accent marks above other letters at all (so о́ shows up as о, е́ as е, and so on). I don't really need the accent marks, but I do need the program to read и as и, and not as й. Is there any way to fix this?
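
For anyone wondering why these two letters in particular get mixed up, here is a minimal Python sketch of how they look at the Unicode level. Stressed и́ is the letter и followed by a combining acute accent, while й decomposes into the same и followed by a combining breve, so the two differ only in the small mark on top. The helper names `describe` and `strip_stress` are made up for this illustration and have nothing to do with FineReader itself. The second helper shows one way to drop stress marks from text without damaging й or ё.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, the stress mark in и́

def describe(text):
    """Return the Unicode names of the code points in the NFD form of text."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]

def strip_stress(text):
    """Remove stress marks (combining acute) while leaving й and ё intact."""
    decomposed = unicodedata.normalize("NFD", text)
    without_acute = decomposed.replace(ACUTE, "")
    # Recompose so that и + breve becomes й again, and е + diaeresis becomes ё.
    return unicodedata.normalize("NFC", without_acute)

print(describe("\u0438\u0301"))  # stressed и: base letter plus acute
print(describe("\u0439"))        # й: same base letter plus breve
print(strip_stress("кни\u0301га, мо\u0439"))
```

Only the acute accent is removed, so the breve of й survives the round trip.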

 
Natalie
Poland
Member (2002)
English to Russian

Moderator of this forum
SITE LOCALIZER
Edit language properties Jan 5, 2014

You can add accented characters to a language through the Language editor.

Here is how to do this for English; for Russian, do the same, but add all letters you want to be recognized.

1) Go to the Tools menu and click Language Editor. You can also use the keyboard shortcut Ctrl+Shift+L to open the Language Editor window while in FineReader.

2) In the Language Editor window, click the "New" button.

3) In the "New Language or Group" tab, select the first option ("Create a new language based on") and make sure English is selected. This creates a new dictionary file based on the existing English dictionary.

4) In the "Simple Language Properties" tab:

1. Fill in a "Language Name" of your choice, preferably one that will remind you why you created it.
2. Leave the "Source Language" as is.
3. In the "Alphabet" section, select the entire line with the mouse (or use the keyboard shortcut Shift+End) and press the Delete key so that the field is empty. Then enter this line of characters in its place (remember to keep it all on one line):

™!"#$%&'()*+,./0123456789:;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¢£¥§©®±µ¿ÀÁÂÃÄÅÆÇ ÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùú ûüýþ

5) Keep the "Built-in dictionary" option selected, click OK, close the Language Editor windows, and you are all set.

6) You can now select the new language from the drop-down list on the FineReader toolbar, under "User Languages".

Note: when you add a word containing non-English letters to the dictionary, a warning box will pop up. Just click OK; it is only warning you about foreign characters. Then add the word to your dictionary as you normally would.


 
BrianHayden
United States
Russian to English
TOPIC STARTER
Still not working... Jan 5, 2014

Natalie wrote:

3. In the "Alphabet" section, select the entire line with the mouse (or use the keyboard shortcut Shift+End) and press the Delete key so that the field is empty. Then enter this line of characters in its place (remember to keep it all on one line):

™!"#$%&'()*+,./0123456789:;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~¢£¥§©®±µ¿ÀÁÂÃÄÅÆÇ ÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùú ûüýþ

Note: when you add a word containing non-English letters to the dictionary, a warning box will pop up. Just click OK; it is only warning you about foreign characters. Then add the word to your dictionary as you normally would.


This is what's giving me trouble. I deleted the existing alphabet. Then I entered all the unaccented letters of the Russian alphabet, followed by all the vowels, lowercase and uppercase, with accent marks. But it only registers the unaccented letters. What now?
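
One possible explanation, offered as an assumption rather than as confirmed FineReader behaviour: Unicode has no single precomposed code point for a Russian vowel with an acute accent, so each accented vowel is really two characters, the base letter plus a combining mark. If the Alphabet field stores a set of individual characters, the base letter would already be in the list and the mark might be dropped. The Python sketch below only checks the Unicode side of that guess; the helper name `stressed_forms` is made up for this illustration.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT (the stress mark)
VOWELS = "аеиоуыэюя"  # Russian vowels that can carry a stress mark

def stressed_forms(vowels=VOWELS):
    """Map each vowel (lower and upper case) to its NFC-normalized stressed form."""
    return {v: unicodedata.normalize("NFC", v + ACUTE)
            for v in vowels + vowels.upper()}

for vowel, stressed in stressed_forms().items():
    points = " ".join(f"U+{ord(ch):04X}" for ch in stressed)
    # A length of 2 means NFC found no single code point for this combination.
    print(vowel, len(stressed), points)
```

By contrast, и followed by a combining breve does compose into the single character й, which is why й can sit in an alphabet list as one letter while и́ cannot.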


 
esperantisto
Member (2006)
English to Russian
SITE LOCALIZER
No way Jan 5, 2014

No, there's no way to fix the problem. If you can find a copy of FineReader 8 or 9, install it and use that instead. You see, they changed the recognition algorithm and made it much worse, even for plain Russian Cyrillic. And with accented Cyrillic letters, it has become a total disaster. Perhaps OmniPage performs better, but I haven't tested it.

 
BrianHayden
United States
Russian to English
TOPIC STARTER
I wouldn't say "total"... Jan 5, 2014

esperantisto wrote:

No, there's no way to fix the problem. If you can find a copy of FineReader 8 or 9, install it and use that instead. You see, they changed the recognition algorithm and made it much worse, even for plain Russian Cyrillic. And with accented Cyrillic letters, it has become a total disaster. Perhaps OmniPage performs better, but I haven't tested it.


Well, it is reading the other accented letters correctly; it just leaves out the accents when converting the text into a Word file. The only thing it completely misreads, so far as I can tell, is accented и, which it reads as й. That is incredibly annoying, though.
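
Since the error is limited to one letter, one workaround outside FineReader would be to repair the exported text afterwards. The Python sketch below is only an illustration under stated assumptions: it presumes you have a set of correctly spelled Russian words (`known_words`, hypothetical here) and it changes й to и only when that turns an unknown word into a known one. The function names are made up for this example, and the approach cannot help with words that are missing from the word list.

```python
import re

SHORT_I = {"\u0439": "\u0438", "\u0419": "\u0418"}  # й -> и, Й -> И
WORD = re.compile(r"[А-Яа-яЁё]+")

def repair_word(word, known_words):
    """Try replacing one й at a time; keep the first result found in known_words."""
    if word.lower() in known_words:
        return word  # already a valid word, e.g. a genuine й
    for i, ch in enumerate(word):
        if ch in SHORT_I:
            candidate = word[:i] + SHORT_I[ch] + word[i + 1:]
            if candidate.lower() in known_words:
                return candidate
    return word  # no confident fix, leave the word untouched

def repair_text(text, known_words):
    """Apply repair_word to every run of Cyrillic letters in text."""
    return WORD.sub(lambda m: repair_word(m.group(0), known_words), text)

# Tiny stand-in word list; a real one would come from a dictionary file.
known = {"книга", "мо\u0439"}
print(repair_text("кн\u0439га мо\u0439", known))
```

Words that are already in the list are never touched, so legitimate uses of й stay as they are.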


 
Václav Pinkava
United Kingdom
Czech to English
Read with training Jan 5, 2014

I use ABBYY FineReader 11 Professional with great success for Czech, which is an accented language. Sometimes it needs a bit of help.

Basically, once you have added the respective letters to the character set of your recognition language (or made sure they are already there), you should have no problem training the OCR to match them for a given font. I have created a variant language based on Czech, which I call Czech extended, to cater for some obscure symbols.
By the way, it is not a good idea to have more characters available in the table than you need; it just gives ABBYY more options to confuse itself with.


Under Tools/Options, tick "Use Built-in and User patterns" and "Read with Training", then read a selection of your text containing the problematic letters (or create a special training section or document containing all of them).
When a given letter is misread, point it to the right choice in your character table, using the [...] option button next to the reading window to reveal the character table.

You may just need to use the double chevron buttons to widen or narrow the frame around the letter being read and train the system that way; this should make it notice all the accents.

There are various sources of reference out there on the training process, e.g. http://www.youtube.com/watch?v=CnMRw23bDAI



[Edited at 2014-01-05 19:41 GMT]


 

