PennNames Generate Algorithm
The following is a discussion of how PennNames are generated when using the PennNames Generate command.
Terminology
In this discussion of the Generate command algorithm the following terminology will be used:
Overview
A good-looking username consists of a large portion of one part of a user's name, either followed by or preceded by a small portion of another part of his/her name. For example, consider the user whose PennCommunity FIRST_NAME is "Ziggy", whose PennCommunity LAST_NAME is "Bozo", and who doesn't have a PennCommunity MIDDLE_NAME. It is likely that Ziggy would prefer the following names:
In practice, some of those names will be unavailable since they might be assigned to some other person on campus or reserved by another sponsor. Likely secondary choices for Ziggy might be:
Of course there are a lot of variables. There's no guarantee that we'll have a middle initial. Even if we do, it's possible that the user prefers to go by their second name, the user could have two middle names, the PennCommunity information might be inaccurate, and so on. So a wide variety of names should be generated to try to cover as many of these possibilities as is practical.
Previous versions of the PennNames name generation algorithm were considered to weight the middle name too heavily, since the first name, last name and middle name had been weighted equally. The current version of the algorithm tries to deduce which name is the middle name, and gives it a lower priority than other seed material.
Sources of the Names
The implemented algorithm has three sources of potential Name-Material. They are:
Basic generation of the names
The basic premise is that all seed material falls into one of three categories: high, medium or low weight. The PennCommunity-Fullname is examined, and middle names are given a low weight. The optional Seed, if provided, is given a high weight. Titles and suffixes (e.g. Mrs., Jr., Sr.) are discarded. Everything else is given a medium weight.
The namestream is a series of "lazy enumeration" functions which generate more results on demand. These functions are stacked on top of each other so that they generate results from the high priority material first; then from the high and medium priority material; then from the high, medium and low material in conjunction. The lazy enumeration will only do as much work as it needs to in order to return the target number of suggested names.
There is also a time limit placed on the namestream generator. If the time expires then the result list will be cut short. This prevents a runaway server or malfunctioning service from denying access to the PennNames service.
The mixing of names is done in stages:
No more than three pieces of material are ever considered for a generated name. In our simplest example, where the PennCommunity FIRST_NAME is "Ziggy", the PennCommunity LAST_NAME is "Bozo", the PennCommunity MIDDLE_NAME is blank, and no User-Supplied-Name or Seed has been supplied, the Name-Material will be "Bozo Ziggy" (note: last name first) and we will generate these names:
Those are the only 44 possible names generated from the given source material.
Extended generation of the names
As a last resort, the lazy enumerator will begin to append numbers once the material is completely used up. The names generated will look like this:
You'll notice that this starts with the first results from the initial generate, above. The order in which the seeds themselves are consulted is guaranteed, but the point at which they're used is not. In this case there are two uses of 'bozo', then one of 'ziggy'. This may not be true of all cases. In general, this number-appending mechanism is considered a mechanism of last resort, so it is hoped that one need not rely on the efficacy of the extension; if you get here, nothing that you want is available anyway.
Name-material weighting
As mentioned above, not every piece of Name-Material carries the same weight. For example, if we add a middle name ("Quartermaine") for Ziggy, we then have the Name-Material "Bozo Ziggy Quartermaine" and generate these names:
Note that the middle name is not consulted until the other Name-Material is nearly exhausted (at position 41). This means that a typical generate request, which is usually for 15-25 names, will not generally use the middle name at all.
Name validation and results
Since this algorithm uses a lazy iterator, the results are vetted immediately upon generation. Names which are not available are immediately discarded. This results in a significantly faster algorithm than the previous iteration; generating 1000 names takes approximately 8 seconds with the new algorithm, whereas we saw generates of 100 take over 30 seconds with the previous algorithm. This should help avoid server deadlock under high load.
The list of results is returned to the PennNames client, up to but no more than the maximum number of names that were requested. This iterator should never run out of material (there is an endless supply of numbers), and so in principle it can always supply the requested number of names. Even so, do not assume that asking for 25 names will return 25 names. It will return up to 25 names. Some cases wind up returning only one name, or only the set of names that are already held for a given PennID. It is only guaranteed that the server will not return more results than you requested.
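
The behaviour described in this discussion (weighted tiers consumed lazily, immediate availability checks, a time limit, and numbers as a last resort) can be illustrated with a minimal Python sketch. This is not the PennNames implementation: the function names, the `is_available` callback, the `time_budget_s` parameter and, in particular, the `simple_mix` rule (whole pieces, then a piece plus one initial) are all assumptions made for illustration, and the real mixing stages produce different names in a different order.

```python
import itertools
import time
from typing import Callable, Iterator, List


def simple_mix(pieces: List[str]) -> Iterator[str]:
    # Hypothetical stand-in for the real mixing stages: each whole piece,
    # then a large portion (the whole piece) joined with another piece's initial.
    for piece in pieces:
        yield piece
    for big, small in itertools.permutations(pieces, 2):
        yield big + small[0]
        yield small[0] + big


def tiered(high: List[str], medium: List[str], low: List[str]) -> Iterator[str]:
    # High-weight material first, then high + medium, then all three together.
    seen = set()
    for pool in (high, high + medium, high + medium + low):
        for name in simple_mix(pool):
            if name not in seen:
                seen.add(name)
                yield name


def numbered(seeds: List[str]) -> Iterator[str]:
    # Last resort: an endless supply of seed + number candidates.
    if not seeds:
        return
    for n in itertools.count(1):
        for seed in seeds:
            yield f"{seed}{n}"


def generate(high: List[str], medium: List[str], low: List[str],
             is_available: Callable[[str], bool],
             limit: int, time_budget_s: float = 2.0) -> List[str]:
    # Normalise the material (assumption: lowercase, empty pieces dropped).
    high, medium, low = ([p.lower() for p in tier if p]
                         for tier in (high, medium, low))
    deadline = time.monotonic() + time_budget_s
    stream = itertools.chain(tiered(high, medium, low),
                             numbered(high + medium + low))
    results: List[str] = []
    if limit <= 0:
        return results
    for name in stream:
        if time.monotonic() > deadline:
            break  # time expired: the result list is cut short
        if is_available(name):  # vet each candidate as soon as it is produced
            results.append(name)
            if len(results) >= limit:
                break  # never return more than was requested
    return results
```

For example, `generate([], ["Bozo", "Ziggy"], ["Quartermaine"], lambda n: n not in {"bozo", "ziggy"}, limit=3)` stops as soon as three available names have been found and never touches the low-weight middle name. Because the stream is lazy, a caller asking for a handful of names pays only for the candidates actually examined, and the deadline check is what lets the result come back short of the limit.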