I'm not aware of a language-independent documented algorithm for extracting data from a GEDCOM. There are GEDCOM parsers for most programming languages, of varying vintage. They mostly don't follow any documented process to extract data (other than what's in their source code!), they don't have any common output format, and they each have their own errors and omissions. The inconsistencies are not (entirely) the fault of each parser: there are ambiguities in the specification(s), and many vendor-specific extensions.

Few parsers have much in the way of reporting (such as "list all places"), so to use them you need to understand GEDCOM at quite a low level.

For example, there is an online parser using the Java parser by Dallan Quass. This turns GEDCOM into JSON, but doesn't add much intelligence to the output. For example a placename in the GEDCOM like: 1 PLAC It's debatable whether that's any easier to deal with (but the parser does handle the CONT and CONC lines well). This parser is better documented than most, with a list of what real-world extensions to GEDCOM it uses.

I should also mention Tom Wetmore's Lifelines here. This venerable program from the early days of computerised genealogy appears to be capable of being scripted to extract information from GEDCOMs (possibly not the more recent ones) but has a steep learning curve (too steep for me!); it is basically a new language to learn.

This initially looks trivial - grep ' PLAC ' my.ged would give you all PLAC lines. It would work for a large number of the GEDCOMs out there. But as Louis Kessler points out, it's possible for the place to be described in a few other ways as well, by different software. Sources are equally a mess, with a wide variety of alternative implementations that have to be considered in any parser.

Understanding the varied GEDCOM formats is a hard problem that most of the simple parsers are not very good at. Maybe there is a parser out there that can cope with all those options, but a comprehensive test suite would be the only way to find out, given the lack of documentation for most.

At present I would say that importing the GEDCOM into a program such as Gramps and using the reporting features there would be the simplest way to work around the problem. The genealogy programs have more users so more reason to get it right. It won't produce perfect results all the time, but it's more likely to get it right than tackling the problem at a lower level.
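
To make the low-level approach concrete, here is a minimal Python sketch of the grep ' PLAC ' idea mentioned above, extended to join CONC and CONT continuation lines. The function name extract_places is hypothetical, and the sketch assumes places appear only under the standard PLAC tag, so it will miss the alternative, vendor-specific ways of recording a place that the text warns about.

```python
def extract_places(lines):
    """Collect PLAC values from GEDCOM lines, joining CONC/CONT continuations.

    Assumption: only the standard PLAC tag is considered; vendor-specific
    ways of recording a place are not handled.
    """
    places = []
    current = None      # text of the PLAC value being assembled, if any
    plac_level = None   # level number of the PLAC line being assembled

    for raw in lines:
        line = raw.rstrip("\r\n").lstrip()
        if not line:
            continue
        # A GEDCOM line is "<level> <tag> [value]"; split off the value whole.
        parts = line.split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip anything that does not look like a GEDCOM line
        level = int(parts[0])
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""

        # Continuation lines sit one level below the PLAC line they extend.
        if (current is not None and level == plac_level + 1
                and tag in ("CONC", "CONT")):
            # CONC joins directly; CONT starts a new line in the value.
            current += value if tag == "CONC" else "\n" + value
            continue

        # Any other line ends the place that was being assembled.
        if current is not None:
            places.append(current)
            current = None

        if tag == "PLAC":
            current = value
            plac_level = level

    if current is not None:
        places.append(current)
    return places
```

It could be called with an open file, for instance extract_places(open("my.ged", encoding="utf-8", errors="replace")), although real GEDCOM files come in several character encodings, so the encoding here is also an assumption.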