I love this cartoon with the husband and wife fishing on a calm weekend off. She’s hooked a whopper and he casually responds in the way he always does when she occasionally catches a fish on Sunday morning. Little do they know that the whopper under the surface is going to give them a little more trouble when they try to bring him on board! The equipment they’ve got, from the boat to the fishing rods, is all perfectly suitable for their usual weekend activities but hopelessly inadequate for handling something like this!

The analogy for me was brought home this week when I discovered that IATE (InterActive Terminology for Europe) had made a download available for the EU’s inter-institutional terminology database. You can go and get a copy right now from here: Download IATE. The file is only 118 MB zipped so you can import it straight into your favourite CAT and use it as a glossary to help your productivity! Great idea… but unzipped it’s a 2.2 GB TBX file containing around 8 million terms in 24 official EU languages. If you even have a text editor capable of opening a file like this (common favourites like Notepad++ can’t even open it) then you’ll see that this equates to over 60 million lines and goodness knows how many XML nodes. That’s quite a whopper and you’re never going to be able to handle this in MultiTerm or any other desktop translating environment without tackling it in bite sized chunks.

But before reaching for your keyboard to find a better text editor you should also give some thought to what you want from this TBX. Do you really want all 24 EU languages in the file, or do you just want two, three or four to help you with your work? If you only want to work with some of the languages then you have another problem, because the specialist translating or terminology tools you have are unlikely to be able to handle this file either… at least not without chopping it into something smaller too. So you have a couple of operations to handle here and they are going to involve handling files of a substantial size.

I decided to have a go at tackling this TBX after seeing a few comments from users who tried it and failed. Was it really impossible? To test it I cut out a section of the TBX to work with, and I had a couple of goes with this to get the size right for my tools, and for my patience. Patience, because when you ask your tools to handle files like this it can involve a lot of waiting, and in some cases a lot of waiting that still never succeeds at all! So the first additional tool I used was EditPad Pro, which is a text editor I already own. It was very pleasing when I could open the TBX completely with this and browse the entire contents. So thumbs up for the same developers who created RegexBuddy, another tool I mention from time to time.

The header: this part always needs to be at the top of every file you recreate.

The overall structure: I don’t know what you’d call this part, but below the header the entire contents sit within two elements and are finally closed off with the closing element at the end.

The terminology sits inside an element called the termEntry element. Each one of these elements has a unique ID, some other fields and the term itself in at least one language, but sometimes 24 languages. So for example, take a single entry with two languages in it, Czech and Polish, and a unique ID of “IATE-127”. It has a subject field at entry level and it has two data fields at term level for “termtype” and “reliabilityCode” which are some of the fields IATE use to help structure their terminology. This is explained here:

As long as you keep complete termEntry elements and the information within them then you can remove as many as you like to create the desired file size for import. So the downloaded TBX itself contains 60,587,834 lines. After a few tests later on I found that a comfortable size to work with was around 1,000,000 lines. So for my first test I manually removed everything in the TBX below line number one million… ish. I saved this file with a new name and then tackled how to get it into MultiTerm!

Structuring your Termbase

This link on the IATE website gives you enough information to manually create a termbase definition for MultiTerm to match the way the TBX is structured. However, I’m naturally suspicious of these things so I decided that one million lines would be a decent sample I could use to extract the definition file, and then I could be sure I had the same structure as the one that was actually used in the TBX. To do this I didn’t use MultiTerm Convert.
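For anyone who would rather script the trimming step than cut the file by hand in a text editor, here is a minimal Python sketch of the same idea described in this post: keep the header, keep only complete termEntry elements, and close the file off properly. It is not the method I used, just an illustration. The function name `trim_tbx` is made up, the closing tags `</body>`, `</text>` and `</martif>` are assumed from the usual TBX layout (check them against the end of your own file), and the sample entries contain placeholder values rather than real IATE data. Only the ID “IATE-127” and the Czech/Polish pairing come from the example above; the other IDs are invented.

```python
import io
import xml.etree.ElementTree as ET

END_TAG = "</termEntry>"
# Assumption: the file closes its body, text and root elements with these tags.
CLOSING_TAGS = ("</body>", "</text>", "</martif>")


def trim_tbx(src, dst, max_entries, closing_tags=CLOSING_TAGS):
    """Copy src to dst line by line, stopping after max_entries complete entries.

    Works on open text file objects, so the whole TBX never sits in memory.
    Returns the number of termEntry elements that were kept.
    """
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    kept = 0
    for line in src:
        pos = 0
        # Count every closing termEntry tag on this line until the limit is hit.
        while kept < max_entries:
            idx = line.find(END_TAG, pos)
            if idx == -1:
                break
            pos = idx + len(END_TAG)
            kept += 1
        if kept >= max_entries:
            # Cut right after the last entry we keep, then close the document.
            dst.write(line[:pos] + "\n")
            for tag in closing_tags:
                dst.write(tag + "\n")
            return kept
        dst.write(line)
    # Fewer entries than the limit: the file was copied unchanged.
    return kept


# Tiny in-memory sample with placeholder content (not real IATE data).
ENTRY = """<termEntry id="{id}">
<descrip type="subjectField">placeholder subject</descrip>
<langSet xml:lang="cs"><tig><term>placeholder cs</term>
<termNote type="termType">fullForm</termNote>
<descrip type="reliabilityCode">3</descrip></tig></langSet>
<langSet xml:lang="pl"><tig><term>placeholder pl</term>
<termNote type="termType">fullForm</termNote>
<descrip type="reliabilityCode">3</descrip></tig></langSet>
</termEntry>
"""
SAMPLE = (
    '<martif type="TBX" xml:lang="en">\n'
    "<martifHeader><fileDesc><sourceDesc><p>sample</p></sourceDesc></fileDesc></martifHeader>\n"
    "<text>\n<body>\n"
    + "".join(ENTRY.format(id=i) for i in ("IATE-127", "IATE-128", "IATE-129"))
    + "</body>\n</text>\n</martif>\n"
)

target = io.StringIO()
kept = trim_tbx(io.StringIO(SAMPLE), target, max_entries=2)
trimmed = target.getvalue()
root = ET.fromstring(trimmed)  # raises ParseError if the result is not well-formed
print(kept, [entry.get("id") for entry in root.iter("termEntry")])
```

With the real download you would pass two files opened in text mode with `encoding="utf-8"` instead of the in-memory strings. Counting entries rather than lines means the cut always lands on a complete termEntry, which is the one rule that matters when you shrink the file.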