PennNames Generate Algorithm


ISC Networking & Telecommunications

The following discussion describes how PennNames are generated by the PennNames Generate command.

Terminology
Overview
Sources of the Names

Basic Generation of the Names

Extended Generation of the Names

Name-material weighting
Name Validation and Results

Terminology

In this discussion of the Generate command algorithm the following terminology will be used:

PennCommunity-Fullname
The full name of the user as stored in PennCommunity, in the format LAST_NAME FIRST_NAME MIDDLE_NAME.
Example: Smith, John H

User-Supplied-Name
Any name information supplied as an optional argument to the Generate command in the <FULLNAME> variable. This may be the user's full name, or it may be something completely arbitrary.
Example: Jack Smith

Seed
A string that the user would prefer as his/her username. It is supplied as an optional argument to the Generate command in the <SEED> variable, which is prefixed with a colon when the Generate command is issued. The colon itself is not factored into the algorithm. The Seed has the highest weight of all Name-Material.
Example: bozo

Name-Material
The combination of the PennCommunity-Fullname plus the Seed, or of the User-Supplied-Name plus the Seed, which is used to generate a list of potential usernames. All punctuation is stripped out, so that "John H. Smith, Jr." becomes "John H Smith Jr".
Example: bozo Jack Smith
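The punctuation stripping in the Name-Material definition can be sketched in Python. This is a minimal illustration, not the PennNames implementation: `to_name_material` is a hypothetical helper, and placing the Seed before the name simply mirrors the "bozo Jack Smith" example.

```python
import re


def to_name_material(*sources):
    """Hypothetical helper: join name sources and strip punctuation.

    Empty or missing sources (e.g. no Seed supplied) are skipped.
    """
    combined = " ".join(s for s in sources if s)
    # Drop every character that is neither a word character nor whitespace.
    no_punct = re.sub(r"[^\w\s]", "", combined)
    # Collapse runs of whitespace into single spaces.
    return " ".join(no_punct.split())


print(to_name_material("John H. Smith, Jr."))  # John H Smith Jr
print(to_name_material("bozo", "Jack Smith"))  # bozo Jack Smith
```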

Overview

A good-looking username consists of a large portion of one part of a user's name, either followed or preceded by a small portion of another part of his/her name.

For example, let's consider the user whose PennCommunity FIRST_NAME is "Ziggy", whose PennCommunity LAST_NAME is "Bozo", and who doesn't have a PennCommunity MIDDLE_NAME. It is likely that Ziggy would prefer the following names:

bozo
ziggy
zbozo
bozoz
ziggyb

These should be considered the best possible names for this user.

In practice, some of these names may be unavailable because they are already assigned to another person on campus or reserved by another sponsor. Likely secondary choices for Ziggy might be:

bozo2
ziggy2
zbozo2
bozoz2
ziggyb2

If the names ending in "2" are taken, we might suggest the same names ending in "3", and so on.

Of course, there are many variables. There is no guarantee that we will have a middle initial. Even if we do, the user may prefer to go by their second name, the user may have two middle names, or the PennCommunity information may be inaccurate. So a wide variety of names should be generated to cover as many of these possibilities as is practical.

Previous versions of the PennNames name generation algorithm were considered to place too much weight on the middle name, because the first, last, and middle names were all weighted equally. The current version of the algorithm tries to deduce which name is the middle name and gives it a lower priority than the other Name-Material.

Sources of the Names

The implemented algorithm has three sources of potential Name-Material. They are:

the PennCommunity-Fullname
an optional User-Supplied-Name
an optional Seed

Basic generation of the names

The basic premise is that all Name-Material falls into one of three categories: high, medium, or low weight. The PennCommunity-Fullname is examined, and middle names are given a low weight. The optional Seed, if provided, is given a high weight. Titles and suffixes (e.g. Mrs., Jr., Sr.) are discarded. Everything else is given a medium weight.
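A minimal sketch of this bucketing in Python, assuming the PennCommunity-Fullname has already been split into tokens in LAST FIRST MIDDLE order with punctuation removed. The helper `weigh` and the `DISCARDED` title/suffix set are hypothetical: the text names only Mrs., Jr. and Sr., and it does not describe how the real algorithm deduces the middle name, so token position is used as a stand-in.

```python
# Hypothetical title/suffix set; only Mrs., Jr. and Sr. come from the text.
DISCARDED = {"mr", "mrs", "ms", "dr", "jr", "sr", "ii", "iii"}


def weigh(fullname_tokens, seed=None):
    """Sort name tokens into high/medium/low weight buckets.

    Assumes tokens arrive as LAST FIRST MIDDLE..., so any kept token after
    the first two is treated as a middle name (low weight).
    """
    weights = {"high": [], "medium": [], "low": []}
    if seed:
        weights["high"].append(seed.lower())  # the Seed always ranks highest
    kept = [t.lower() for t in fullname_tokens if t.lower() not in DISCARDED]
    weights["medium"].extend(kept[:2])  # last and first name
    weights["low"].extend(kept[2:])     # presumed middle name(s)
    return weights


print(weigh(["Bozo", "Ziggy", "Quartermaine"]))
# {'high': [], 'medium': ['bozo', 'ziggy'], 'low': ['quartermaine']}
```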

The namestream is a series of "lazy enumeration" functions that generate more results on demand. These functions are stacked on top of each other so that they generate results from the high-priority material first, then from the high- and medium-priority material, and then from the high-, medium- and low-priority material together. The lazy enumeration does only as much work as it needs to in order to return the target number of suggested names.
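Python generators are one natural way to express this stacking. In the sketch below, `namestream` chains one generator per stage and skips duplicates; it assumes a deliberately simplified, hypothetical mixer (`simple_names`) in place of the real mixing rules. Because `chain.from_iterable` and `islice` are lazy, a later stage is only started if the earlier ones did not already supply enough names.

```python
from itertools import chain, islice, permutations


def simple_names(pieces):
    """Hypothetical stand-in mixer: each piece alone, then each piece
    followed by the first letter of another piece."""
    for p in pieces:
        yield p
    for a, b in permutations(pieces, 2):
        yield a + b[0]


def namestream(high, medium, low, make_names=simple_names):
    """Lazily yield unique names: high material first, then high+medium,
    then high+medium+low, as described above."""
    seen = set()
    stages = (high, high + medium, high + medium + low)
    for name in chain.from_iterable(make_names(s) for s in stages):
        if name not in seen:
            seen.add(name)
            yield name


print(list(islice(namestream([], ["bozo", "ziggy"], ["quartermaine"]), 6)))
# ['bozo', 'ziggy', 'bozoz', 'ziggyb', 'quartermaine', 'bozoq']
```

Here the low-weight middle name only enters once the medium-weight stage is exhausted, and `islice` stops after six names, so the rest of the third stage is never computed.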

There is also a time limit placed on the namestream generator. If the time limit expires, the result list is cut short. This prevents a runaway server or malfunctioning service from denying access to the PennNames service.
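One way to bound the work is sketched below, assuming the deadline is checked between names (so a single slow lookup can still overrun it). `with_deadline` is a hypothetical name, not part of PennNames; `time.monotonic()` is used so that wall-clock adjustments cannot stretch or shrink the budget.

```python
import time
from itertools import count


def with_deadline(names, limit, seconds):
    """Yield at most `limit` names, cutting the list short once
    `seconds` have elapsed. The deadline is checked before each name."""
    deadline = time.monotonic() + seconds
    it = iter(names)
    for _ in range(limit):
        if time.monotonic() >= deadline:
            return  # time expired: return what we have so far
        try:
            yield next(it)
        except StopIteration:
            return  # source ran dry before the limit


print(list(with_deadline((f"bozo{n}" for n in count(1)), limit=3, seconds=1.0)))
# ['bozo1', 'bozo2', 'bozo3']
```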

The mixing of names is done in stages:

Each piece of material by itself
Mixed pairs of material
Mixed triplets of material

No more than three pieces of material are ever considered for a generated name.
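The staged mixing can be sketched as follows, assuming each piece contributes a leading substring (prefix) and that generated names are between 2 and 8 characters long. Both bounds are inferred from the example lists on this page rather than stated by it. `mixes` is a hypothetical helper: for "Bozo Ziggy" it yields the same set of 44 names as the list below, but in a different order, since the ranking used by the real generator is not documented here.

```python
from itertools import permutations, product

MIN_LEN, MAX_LEN = 2, 8  # assumption inferred from the example output


def prefixes(piece):
    """All leading substrings of a piece: 'bozo' -> b, bo, boz, bozo."""
    return [piece[:i] for i in range(1, len(piece) + 1)]


def mixes(pieces):
    """Yield unique names built from 1, then 2, then 3 pieces of material,
    taking a prefix of each piece in the chosen order."""
    seen = set()
    for k in (1, 2, 3):  # never more than three pieces per name
        for ordered in permutations(pieces, k):
            for parts in product(*(prefixes(p) for p in ordered)):
                name = "".join(parts)
                if MIN_LEN <= len(name) <= MAX_LEN and name not in seen:
                    seen.add(name)
                    yield name


names = list(mixes(["bozo", "ziggy"]))
print(len(names))  # 44
```

The count is 44 rather than 45 because "bo" + "z" produces "boz", which is already generated from "bozo" alone.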

In our simplest example, the PennCommunity FIRST_NAME is "Ziggy", the PennCommunity LAST_NAME is "Bozo", the PennCommunity MIDDLE_NAME is blank, and no User-Supplied-Name or Seed has been supplied. The Name-Material is therefore "Bozo Ziggy" (note: last name first), and we generate these names:

bozo

ziggy

bozoz

ziggyb

bziggy

zbozo

bozozi

ziggybo

bz

zb

boziggy

zibozo

bozozig

ziggyboz

bzi

zbo

boz

zib

bozziggy

zigbozo

bozozigg

zboz

bzig

zibo

bozi

zigb

bozz

ziggbozo

bzigg

ziboz

bozig

zigbo

bozzi

ziggb

bozigg

zigboz

bozzig

ziggbo

bozzigg

ziggboz

bo

zi

zig

zigg

These 44 names are the only ones that can be generated from this source material.

Extended generation of the names

As a last resort, once the material is completely used up, the lazy enumerator begins appending numbers. The names generated look like this:

bozo1
bozo2
ziggy1
bozo3
ziggy2
bozo4
ziggy3
bozo5
ziggy4
bozo6
ziggy5
bozo7
ziggy6
bozo8
ziggy7
bozo9
ziggy8
bozo10
ziggy9
bozo11
ziggy10
bozo12
ziggy11
bozo13
ziggy12
bozo14
ziggy13
bozo15
ziggy14
bozo16
ziggy15
b1
bozo17
ziggy16
b2
z1
bozo18
ziggy17
b3
z2
bozoz1
bozo19
ziggy18
b4
z3
bozoz2
bozo20
ziggy19
b5
z4
bozoz3
ziggyb1
bozo21
ziggy20
b6
z5
bozoz4
ziggyb2
bozo22
ziggy21

Notice that this starts with the first results from the initial generate above. The order in which the base names themselves are consulted is guaranteed, but the point at which each one starts being used is not. In this case there are two uses of "bozo" before the first use of "ziggy"; this may not be true in all cases. In general, number appending is considered a mechanism of last resort, so it is hoped that no one needs to rely on how well it works; if you get this far, nothing that you want is available anyway.
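The staggered start can be sketched as a dovetailed enumeration: each base name is introduced one round after the previous one, and every active base then advances by one number per round. This is an illustration under that assumption, not the PennNames enumerator; `numbered` is a hypothetical helper. It reproduces the opening of the listing above for the two base names "bozo" and "ziggy", but the real generator introduces later bases (such as b, z and bozoz) at different points.

```python
from itertools import count, islice


def numbered(bases):
    """Endlessly append numbers to base names, dovetailing so that base i
    starts one round after base i-1."""
    if not bases:
        return  # nothing to number
    for diagonal in count(0):
        for i in range(min(diagonal + 1, len(bases))):
            yield f"{bases[i]}{diagonal - i + 1}"


print(list(islice(numbered(["bozo", "ziggy"]), 7)))
# ['bozo1', 'bozo2', 'ziggy1', 'bozo3', 'ziggy2', 'bozo4', 'ziggy3']
```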

Name-material weighting

As mentioned above, not every piece of Name-Material carries the same weight. For example, if we add a middle name ("Quartermaine") for Ziggy, we then have the Name-Material "Bozo Ziggy Quartermaine" and generate these names:

bozo

ziggy

bozoz

ziggyb

bziggy

zbozo

bozozi

ziggybo

bz

zb

boziggy

zibozo

bozozig

ziggyboz

bzi

zbo

boz

zib

bozziggy

zigbozo

bozozigg

zboz

bzig

zibo

bozi

zigb

bozz

ziggbozo

bzigg

ziboz

bozig

zigbo

bozzi

ziggb

bozigg

zigboz

bozzig

ziggbo

bozzigg

ziggboz

bozoq

ziggyq

bo

zi

qu

qbozo

qziggy

zig

qua

bozoqu

ziggyqu

zigg

quar

bq

zq

qb

qz

bozozq

bozoqz

ziggybq

ziggyqb

quart

qubozo

quziggy

quarte

bozoqua

ziggyqua

quarter

bqu

zqu

qbo

qzi

bozozqu

bozoqzi

ziggybqu

ziggyqbo

quarterm

boq

ziq

qub

quz

bziggyq

zbozoq

qbozoz

qziggyb

quabozo

quaziggy

bozoquar

qboz

bqua

zqua

qubo

qzig

bozozqua

bozoqzig

qziggybo

boqu

ziqu

quab

quzi

bziggyqu

zbozoqu

ziggyqub

qbozozi

bozq

zigq

quarbozo

quaz

bozoziq

bozoquz

ziggyboq

zqbozo

qzbozo

quboz

bqziggy

qbziggy

bquar

quabo

qzigg

boqua

zquar

quarb

quzig

zqb

qbozozig

qzb

bozqu

ziqua

quazi

bozoquzi

zbozoqua

zigqu

quaboz

quarz

bozoziqu

bqz

qbz

bquart

ziggq

quarbo

bzq

zbq

zqbo

qzbo

boquar

quartb

quzigg

quziggyb

bozqua

zquart

quazig

ziquar

quarboz

quarzi

bqzi

zqboz

qbzi

qzboz

bquarte

zigqua

quartbo

quartz

qubozoz

boquart

ziggqu

quarteb

bozquar

quazigg

bzqu

zbqu

zqubozo

qzibozo

zquarte

quartboz

quarzig

boziggyq

bqzig

zibozoq

qbzig

bquarter

ziquart

quartebo

quartzi

qubozozi

boquarte

zigquar

quarterb

quartez

bozoquaz

zqub

qzib

bozquart

ziggqua

bquziggy

ziqbozo

qboziggy

quzbozo

quarzigg

bqzigg

qbzigg

zquarter

quartzig

zqubo

qzibo

ziquarte

quartezi

bzqua

zbqua

ziqb

quzb

zigquart

quarterz

bquz

zibozoqu

ziggquar

bozozigq

boqziggy

zquboz

qubziggy

qziboz

ziqbo

quzbo

bquzi

qbozi

boqz

ziqboz

qubz

quzboz

bzquar

zbquar

bquzig

zquabozo

qbozig

qzigbozo

boqzi

qubzi

bziq

zboq

quabozoz

zquab

qzigb

bquzigg

ziqubozo

qbozigg

quzibozo

boqzig

qubzig

zquabo

qzigbo

ziqub

qbozz

quzib

bzquart

zbquart

zigqbozo

quazbozo

boqzigg

zquaboz

qubzigg

qzigboz

ziqubo

quzibo

bziqu

zboqu

zigqb

qbozzi

quazb

bquaz

zibq

ziquboz

quziboz

zigqbo

quazbo

qbozzig

bquazi

qubozi

boquz

zigqboz

quabz

quazboz

bzquarte

zbquarte

qbozzigg

qziggb

bquazig

qubozig

bziqua

boquzi

zboqua

quabzi

bozqz

zibqu

qziggbo

zigbozoq

zquarb

quzigb

bquazigg

qubozigg

boquzig

quabzig

qziggboz

bozqzi

zquarbo

quzigbo

ziquab

qubozz

quazib

boquzigg

zquarboz

quabzigg

quzigboz

bozqzig

ziquabo

quazibo

zigqub

qubozzi

quarzb

bziquar

bquarz

zboquar

zibqua

ziquaboz

quaziboz

bozqzigg

zigqubo

quarzbo

zbozq

ziggqb

qubozzig

bquarzi

quabozi

boquaz

zigquboz

quarbz

quarzboz

ziggqbo

quziggb

bquarzig

quabozig

boquazi

ziggqboz

quarbzi

bozquz

quziggbo

bziquart

zboquart

zquartb

quazigb

zibquar

boquazig

quarbzig

bozquzi

zbozqu

zquartbo

quazigbo

bzigq

ziboq

ziquarb

quabozz

quarzib

bozquzig

ziquarbo

quarzibo

zigquab

quabozzi

quartzb

bquartz

zigquabo

quartzbo

zibquart

ziggqub

bquartzi

quarbozi

boquarz

zbozqua

quartbz

bzigqu

ziboqu

ziggqubo

boziq

zigbq

quaziggb

boquarzi

quartbzi

bozquaz

zquarteb

quarzigb

bozquazi

ziquartb

quarbozz

quartzib

zbozquar

bzigqua

ziboqua

boziqu

zigbqu

bozzq

zigquarb

quartezb

bquartez

ziggquab

boquartz

quartebz

bzigquar

ziboquar

boziqua

zigbqua

bozzqu

bozquarz

bziggq

zibozq

boziquar

zigbquar

bozzqua

bziggqu

zibozqu

bozigq

zigboq

bozzquar

bziggqua

zibozqua

bozigqu

zigboqu

bozziq

ziggbq

bozigqua

zigboqua

bozziqu

ziggbqu

boziggq

zigbozq

Note that the middle name is not consulted until the other Name-Material is nearly exhausted (at position 41, "bozoq"). This means that a typical Generate request, which is usually for 15 to 25 names, will generally not use the middle name at all.

Name validation and results

Because this algorithm uses a lazy iterator, each result is vetted as soon as it is generated, and names that are not available are discarded immediately. This makes the algorithm significantly faster than the previous version: generating 1000 names takes approximately 8 seconds with the new algorithm, whereas generate requests for 100 names took over 30 seconds with the previous algorithm. This should help avoid server deadlock under high load.

The list of results is returned to the PennNames client, up to but no more than the maximum number of names requested. This iterator should never run out of material (there is an endless supply of numbers), so it should normally return exactly the number of names requested. Even so, do not assume that asking for 25 names will return 25 names; it will return up to 25. Some cases return only one name, or only the set of names already held for a given PennID. The only guarantee is that the server will not return more results than you requested.
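Vetting inside the lazy pipeline, together with the "up to N" contract, can be sketched like this. `vetted` and the `is_available` callback are hypothetical stand-ins for the server's availability check (which would consult existing assignments and sponsor reservations). Because the filter and `islice` are lazy, candidates past the limit are never checked, and the result may be shorter than requested if the candidates run out.

```python
from itertools import islice


def vetted(candidates, is_available, limit):
    """Check each candidate as soon as it is produced, keep only the
    available ones, and return at most `limit` of them."""
    return list(islice((n for n in candidates if is_available(n)), limit))


taken = {"bozo", "ziggy", "bozoz"}  # hypothetical existing assignments
print(vetted(["bozo", "ziggy", "bozoz", "ziggyb", "bziggy", "zbozo"],
             lambda n: n not in taken, limit=2))
# ['ziggyb', 'bziggy']
```

A client should therefore treat `len(result) <= limit` as the only guarantee, exactly as stated above.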



Information Systems and Computing, University of Pennsylvania