Naming is a Problem

By Cris Fitch

In the following paper, I discuss why I think Naming is so important in computer programming, and how one might tackle the problem of creating a sophisticated program to generate names for use in programming, creative writing, or, perhaps most importantly, on the Web.

Why Naming is Important

It is my contention that Naming is an important topic to cover in teaching programming. A name can serve to direct the conception of the entity you're constructing. A generic name, such as foo, can hold the place of a more appropriate name that may come along later. A very descriptive name, however, may distract both the creator and the maintainer of the code once the description has gone stale. A creative and unique name may enliven an entity, although such names should be used sparingly.

A popular country music song of my youth told the story of "A Boy Named Sue". With a girl's name, the boy grows tough, having to continually respond to the taunts of others. As he becomes a man, he tracks down the man who gave him this name, his father. After finding him and beating him in a vicious fight, he demands to know why a man would name his son "Sue". The father explains that, knowing he would not be around to look after his son, he chose the name to force the boy to be tough and thus able to survive the world's harshness.

In the process of designing and coding computer software, the creation of names is a frequent activity. Once created, a name tends to remain an integral part of the program throughout that code's lifetime. For code to be self-documenting and easy to maintain, these names end up being key.

One can name something in a variety of ways. Normally you want to choose something that is both unique and appropriate, perhaps even descriptive. As in the story, a bad name can cause one much grief.
On the other hand, it can lay the foundation for meaningful reference, perhaps the ultimate purpose of a name.

The Power of a Name

Knowing the name of something or someone gives one a certain power over it. It is the power of reference. With a single token, a name, one can grab hold of and manipulate the essence of an entity without having to describe its parts in detail.

Most people know the story of Rumpelstiltskin. In it, a dwarf saves a maiden from certain death when she is unable to spin straw into gold, as demanded of her by the king. In return for a bit of magic, the dwarf gets the girl to agree to give him her firstborn child a year hence. The king marries the girl, and happily she has a beautiful child. When the dwarf shows up to collect his payment, the girl refuses. He gives her the option of guessing his name instead. This she learns by secretly overhearing him. When, on the next day, she successfully provides his name, his power is broken and he disappears, never to be seen again.

In this era of the World Wide Web, the story becomes highly relevant as people attempt to find things in an ocean of knowledge. Search engines for the Web index tens of millions of pages, and searches must often hinge on key strings. With a name like "John", how likely are you to be found?

In considering the construction of a program whose goal is the appropriate naming of entities, there are challenges in how to measure appropriateness and how to generate potential candidates. Both depend heavily on the reference material consulted.

Where do Names Come From?

Naming needs to have some references upon which it is based. A lexicon, a collection of common-use words, is an important foundation here. Each word within the lexicon may or may not have a unique string which indicates it. Associated with a given string, as in WordNet, are one or several concepts.
These have a part of speech, a place in a hierarchy of concepts, and various other properties.

Extending our reach outside the lexicon, there is a world of associations that common-use words and names have. When used in a new fashion, as part of a program for example, a name may come to acquire new associations. Yet those old associations remain and can either help or hinder the new role you desire a name to play.

A subtle distinction should be made between single and multi-word names. Whereas a single-word name either appears or does not appear in the lexicon, the multi-word name has a constructive aspect to it. Even with a finite lexicon, the number of possible multi-word constructions is unbounded.

For single-word names not found in the lexicon, a different associative mechanism kicks in. Pronunciation and similarity of appearance to items which are in the lexicon have an impact. Gwity may be a new word. Yet its ending (morphology) and its similarity to Gritty or Witty may evoke associations. In other words, no name is completely without association. Yet the reason behind a completely new name (not in the lexicon) is to emphasize uniqueness and to build up a totally separate identity for your construct.

Within a standard American lexicon, there are words that tend to be names, and those that tend to be common-use English words. Names themselves break up into several categories, as strings which tend to be appropriate for places, people, companies, etc. Some of the shortest names are acronyms, such as IBM or GM, although these can often lose their uniqueness, such as NRA, PLA or PRC.

Even if not present in a dictionary or encyclopedia, a name can be present in the popular literature or culture. Consider the use of names from the Simpsons or Tolkien. With the use of such a name, associations may be helpful, either for marketing, memory, or extension, or they may be cryptic, obscure or irrelevant.
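
The earlier observation that a coined word such as Gwity still evokes Gritty or Witty can be illustrated with a small sketch. This is only one possible way to approximate "similarity of appearance": it uses the spelling-similarity matcher from Python's standard difflib module, which says nothing about pronunciation or morphology. The toy lexicon, the function names, and the 0.6 cutoff are assumptions made for illustration and do not come from any particular naming system.

```python
import difflib

# Hypothetical toy lexicon; a real system would load a full word list.
LEXICON = ["gritty", "witty", "siberia", "baker", "smith", "chicago"]


def in_lexicon(candidate, lexicon=LEXICON):
    """Return True if the candidate name is already a lexicon entry."""
    return candidate.lower() in lexicon


def nearest_lexicon_words(candidate, lexicon=LEXICON, n=3, cutoff=0.6):
    """Return up to n lexicon words whose spelling resembles the candidate.

    The cutoff (0.0 to 1.0) is an assumed threshold: higher values demand
    closer spelling before a word counts as an association.
    """
    return difflib.get_close_matches(candidate.lower(), lexicon, n=n, cutoff=cutoff)


if __name__ == "__main__":
    name = "Gwity"
    print(name, "in lexicon:", in_lexicon(name))
    print("likely associations:", nearest_lexicon_words(name))
```

A name generator might use such a check in both directions: to confirm that a candidate is new to the lexicon, and to warn about which existing words it is likely to call to mind.
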
Obviously an English parsing lexicon is a good place to start, since it serves to map many commonly used strings. It might also make sense to have a large set of commonly used names, and some sense of where they have been applied. Associations between strings are difficult to tackle because of the immensity of the knowledge involved. However, the existence of the Web furthers this cause much more than we could have considered even a few years ago.

A Naming Mechanism

The first thing that comes to mind when asked to write a program which names things is to build a random string generator. I have seen Cognitive Science experiments which display random English-like words for measuring recognition speeds. These generators used statistical frequencies to adjust the kinds of output. For example, the letter 'e' appears more frequently at the end of a word than at the beginning.

But a name isn't simply a new, unique string. I call my cat 'Siberia', which distinguishes her from other cats. But it also has additional meaning and implications. When people ask why I named her that, I tell them that her fur has the same color as the bear in a PBS documentary about Siberia which was showing at the time she adopted me. And when I took her to the vet, and they wrote her name down, they wrote it as 'Cyberia', with a whole separate set of implications.

Returning to the random selection approach, we wonder whether it might be possible to just pick a random item in the lexicon. Sometimes this might be a good thing to do, and in the case of my cat, it was nearly the case.

Names and Meaning

Is a name random, though? Once established, associations develop and justifications are put in place. A story may be formed to explain the choice of a given name, in reaction to people's need to have an explanation. Yet for the selection of names for software entities, this approach alone could cause chaos.
It denies the pre-existence of meaning for items within the lexicon. If the cat came from a sister living in Cincinnati, one wouldn't want to call the cat 'Chicago' or 'Buffalo'. True, the set of potential names is large. But appropriateness has some relation to the pre-existing circumstances of this thing without a name.

It is important to note which items have names. One doesn't uniquely name the speakers hooked up to one's stereo or the loaf of bread bought at the store today. Why is that? In the days of yore, peasants were often named simply by what they did. Baker, Smith, and many others (my own included) were job descriptions as much as names. In programming this is true of iteration variables, for example (for i=0 to 10). Such a variable doesn't deserve a globally unique name. It is perhaps best that the name of the item is something so simple, so common. And by having that simple name, when you encounter it, you already know so much about its place in the world.

I've found in my own code that there is a time and a place for a unique name. It serves to balance the blandness of so many generically named items, especially when the meaning of the entity has yet to be established and yet will cover a large section of new territory. The most obvious case is the name of the overall program itself. Many of the utilities I write have unique overall names, because that way they can develop their own personalities, independent of some foolish initial preconception.

The major alternative to a unique name is a functional description. Would it have been that strange if Kodak had instead been named General Camera? Instead of calling my new utility program "Waxito" I might name it "FilmConvert". A unique name takes less time and effort to create, and it doesn't become obsolete when the program evolves away from a narrow original intent. Yet to the uninformed, it has little relevant meaning and requires that a new entry be created and maintained in the lexicon.
A functionally descriptive name is "self-documenting".