Guide to Traits
Sometimes the data you need aren’t easy to find. These data may be in an incorrect Darwin Core field, buried in a remarks field, or recorded cryptically. VertNet’s goal is to unlock these data and to make them easy to discover. One “hidden” data type that is often digitized is trait information, and we have recently developed tools that help these data see the light of day. These tools will be expanded over time.
What is a trait?
“A trait constitutes any measureable or observable morphological, structural, behavioral, physiological or phenological characteristic of an organism. Traits are expressed as phenotypes, which define key aspects of organismal fitness and ecological roles in communities.” - The importance of digitized biocollections as a source of trait data and a new resource: VertNet Traits [paper in review].
Traits include, but are not limited to:
- Length and mass of a whole specimen
- Length and mass of a part of a specimen, including the beak, limb, wing, gonad, neck, or tail.
- Color of a specimen, including that of eyes, hair, feathers, or fins.
In the VertNet portal, trait and attribute data are often expressed by our providers in myriad ways. For example, in a 2014 review of DwC:sex field content of 2.7 million records in VertNet, we determined that there were 189 distinct and understandable values for male, 184 for female, and 331 for unresolved. In a standardized field, such as the sex field in Darwin Core (which we will sometimes refer to as DwC:sex, with “DwC” meaning Darwin Core), this is stunning heterogeneity, as there are only six (6) terms that should be present: unknowable, undetermined, female, male, hermaphrodite, and gynandromorph (http://terms.tdwg.org/wiki/dwc:sex). We expect that non-standardized fields, such as dynamicProperties, occurrenceRemarks, and fieldNotes, have much greater diversity in content.
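To make the heterogeneity problem concrete, here is a minimal sketch of mapping free-text sex values onto the six recommended terms. The synonym table is a tiny hypothetical sample invented for illustration (including the choice to map “unknown” to “undetermined”); it is not VertNet’s actual mapping of the hundreds of variants found in the review.

```python
# The six terms listed above for DwC:sex.
DWC_SEX_TERMS = {
    "unknowable", "undetermined", "female", "male",
    "hermaphrodite", "gynandromorph",
}

# Hypothetical sample of variants; a real table would be far larger.
SEX_SYNONYMS = {
    "m": "male",
    "males": "male",
    "f": "female",
    "fem": "female",
    "unknown": "undetermined",
    "?": "undetermined",
}


def normalize_sex(raw):
    """Map a raw sex string to one of the six terms, or None if unresolved."""
    cleaned = raw.strip().lower().rstrip(".")
    if cleaned in DWC_SEX_TERMS:
        return cleaned
    return SEX_SYNONYMS.get(cleaned)


print(normalize_sex(" Male "))  # male
print(normalize_sex("F."))      # female
print(normalize_sex("xyz"))     # None
```

Values that return None correspond to the “unresolved” group described above and would need manual review.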
Where did we find trait data and how did we aggregate it?
To help users of the VertNet portal find and make sense of these data, we have built and implemented some new tools and upgrades to the VertNet search interface. We’ve written a paper detailing our efforts, which you will be able to read once it has completed the review process. In the meantime, below are the bare minimum details to get you started.
To date, we have focussed our efforts on two traits and two attributes critical for proper interpretation of those traits: body length measurements and body mass measurements (traits), and sex and life stage (attributes). We focussed our attention on three Darwin Core fields, in which we expected to discover trait data available in records aggregated in VertNet: dynamicProperties, occurrenceRemarks, and fieldNotes. As we perfect this process, we’ll include additional traits, and additional Darwin Core fields, over time. The following is an overview of the process we implemented to discover buried trait data and to make it discoverable in the VertNet portal.
- Review of all of the records in VertNet (18.2 million, when we began). Of these, 6.6 million contained content in three key Darwin Core fields: dynamicProperties, occurrenceRemarks, and fieldNotes.
- Examine subsets of the 6.6 million records and categorize the common expressions of body length and body mass characteristics, and common abbreviations and synonyms. For example, with length, we sorted the records into informal subclasses that we observed in the data, such as “total length”, “snout-vent length”, “standard length”, and “fork length”. From this we collated key measurement approaches, and also shorthands used related to those measurement practices. As an example, we were able to take abbreviations such as SVL and determine this meant “snout-vent length”, a length measurement almost always used by herpetologists, but not by those working in other vertebrate taxonomic communities.
- Once we had a list of these terms, we developed software for identification and extraction of trait data from the sorted records using customized Regular Expressions (regex), searching first for common terminology, such as “total length” and then again for more infrequent and irregular terms, such as “TL” or “ToL”.
- Once we had initial extractions, we built tools to help harmonize those outputs to standard reporting practices. This yielded output in consistent terminology (e.g., “total length”, “snout-vent length”) along with standard measurement values and standard units, such as millimeters and grams, where possible.
- Manual validation of the output, standardized terms, values, and units, to confirm accuracy of the regex and standardization processes. We repeated this step again and again until we were sure the error rates were low.
- Finally, we returned these extracted and validated data to the VertNet portal along with novel fields such as “hasLength” and “hasMass”, and associated those data with their original specimen records, using a Python script that we named Traiter.py (https://github.com/rafelafrance/traiter). A new index of trait data was created and associated with the portal so that these data are now searchable. All length and mass measurements and units are indexed so that they will be returned in queries that include “Has length” or “Has mass”. We also provide a means to query on ranges of values in the portal. Don’t forget that we also extracted sex and life stage values, which were subjected to the same validation process, to help complete the picture given by the DwC:sex and DwC:lifeStage fields. We’ll explain how these are integrated into the portal in examples below.
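The extraction and harmonization steps above can be sketched roughly as follows. This is a simplified illustration, not the code in Traiter.py: the pattern, the abbreviation table, the unit conversions, and the output key names (length_type, length_in_mm, length_units_inferred) are all assumptions made for this example, and it makes a single pass rather than searching for common terms first and rarer ones afterwards.

```python
import re

# Hypothetical pattern covering a few labels mentioned in this guide.
LENGTH_PATTERN = re.compile(
    r"""
    \b(?P<label>total\s+length|snout[-\s]vent\s+length|SVL|ToL|TL)\b
    \s*[=:]?\s*                  # optional separator
    (?P<value>\d+(?:\.\d+)?)     # the number
    \s*(?P<unit>mm|cm|m)?\b      # optional units
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical harmonization tables.
LABEL_TO_TYPE = {
    "total length": "total length",
    "tl": "total length",
    "tol": "total length",
    "snout vent length": "snout-vent length",
    "svl": "snout-vent length",
}
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def extract_length(text):
    """Return a harmonized length from free text, or None if none is found."""
    match = LENGTH_PATTERN.search(text)
    if match is None:
        return None
    # Collapse hyphens and runs of whitespace so label variants share a key.
    label = re.sub(r"[\s-]+", " ", match.group("label").lower())
    unit = match.group("unit")
    factor = TO_MM[unit.lower()] if unit else 1.0  # assume mm when missing
    return {
        "length_type": LABEL_TO_TYPE[label],
        "length_in_mm": float(match.group("value")) * factor,
        "length_units_inferred": unit is None,
    }


print(extract_length("sex=male; total length=555 mm; weight=913 g"))
print(extract_length("SVL 12.5 cm"))
print(extract_length("TL: 300"))
```

Recording whether units were inferred alongside the value lets a later reader decide how much to trust a measurement whose source text gave only a number.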
Using the VertNet Advanced Search for Traits
Now that hidden trait data for length, mass, sex, and life stage are available in the VertNet portal, let’s explore a couple of examples that demonstrate how to build queries to find them and how to interpret the portal results.
A Sample Search for Traits
Now you can search for records that contain measurements for length and mass and descriptions of life stage and sex. Please note that when searching for length, the measurement that will be returned reflects the single measurement for, effectively, a whole organism length, or the closest value we could identify as the whole organism length (e.g., the length measurement returned would be a snout-vent length). We intend to provide a future means for users to search for parts of organisms, such as ear, beak, hind foot, claw, and tail.
From the main search page, click on the advanced search options link to see more search options.
Advanced Search Options
When the Advanced options open, you’ll find the “These traits” section toward the middle of the window. This section contains two types of inputs. The check boxes, “Has length”, “Has mass”, “Has sex”, and “Has lifestage”, when clicked will return records that contain content that describes those traits. The text fields for “length in mm” and “mass in g” allow you to enter a range of values for length and mass.
Let’s say you’re looking for records of rodents that have a length measurement. The first thing to remember when searching Rodentia is to never trust a rodent. When you have accepted this wisdom, you can begin your search by typing Rodentia into the “Darwin Core terms: Order” field. Next, click the “Has length” check box and then click the blue button with the magnifying glass icon to enter your query.
For this query, VertNet will return more than 10,000 records that are both Rodentia and contain a length measurement.
Has Length Results
For some people, this result might be just what they need, but others may need to focus their search. Let’s say you need rodents with a Total Length (or TL) within a particular range - not R.O.U.S. lengths (win a free rodent from us if you know the reference!) - but something between 400 and 800 millimeters in length. To narrow your search, simply click on the advanced search options link again. When the search options are displayed, you’ll see that “Has length” is still checked and order:Rodentia is present. Now, click the “length type” drop down menu and select “total length” (you’ll see all of the different length options available when you click the drop down). Next, type 400 into the first “length in mm” text field and 800 in the second “length in mm” field.
NOTE: If you want a specific length, such as 410 millimeters, you can type 410 into both “length in mm” fields. It works similarly for mass: you would enter 410 (or another value) into both of the “mass in g” fields.
VertNet will return 9,374 records that are both Rodentia and have a total length between 400 and 800 millimeters.
Specific Lengths Results
In addition to rodents with a particular length, you may also need to know the life stage of these rodents. To add this, click the advanced search options link once more. Next, click the “Has lifestage” check box, and then run your search again.
Your results have been reduced from 9,374 records to 1,765 records.
Finally, you might decide that your rodents all need to have a mass between 400 and 1000 grams. You can focus your search even more by clicking the advanced search options one last time. When the search options are displayed, you’ll see that “Has length” is still checked with the range of lengths between 400 and 800 millimeters, order is still “Rodentia”, and the “Has lifestage” box is still checked. To add a mass measurement, simply type 400 into the first “mass in g” field, 1000 into the second field, and enter your query.
The results now include 418 very important Rodentia records that meet your criteria of having a total length between 400 and 800 millimeters, a mass between 400 and 1000 grams, and contain life stage information.
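If you download results and want to apply the same criteria yourself, the combined query can be expressed as a simple filter. This is only a sketch: the record keys (order, length_type, length_in_mm, mass_in_g, lifestage) are hypothetical names chosen for this example rather than the portal’s actual field names, the sample records are made up, and the ranges are assumed to be inclusive, which is consistent with entering the same value in both range fields to match one specific measurement.

```python
def in_range(value, bounds):
    """True if no bounds are given, or value lies within inclusive (low, high)."""
    if bounds is None:
        return True
    low, high = bounds
    return value is not None and low <= value <= high


def matches_query(record, order=None, length_type=None,
                  length_mm=None, mass_g=None, has_lifestage=False):
    """Apply the same criteria as the step-by-step search above."""
    if order and record.get("order", "").lower() != order.lower():
        return False
    if length_type and record.get("length_type") != length_type:
        return False
    if not in_range(record.get("length_in_mm"), length_mm):
        return False
    if not in_range(record.get("mass_in_g"), mass_g):
        return False
    if has_lifestage and not record.get("lifestage"):
        return False
    return True


# Made-up records for illustration; these are not real VertNet data.
records = [
    {"order": "Rodentia", "length_type": "total length",
     "length_in_mm": 520.0, "mass_in_g": 640.0, "lifestage": "adult"},
    {"order": "Rodentia", "length_type": "total length",
     "length_in_mm": 180.0, "mass_in_g": 35.0, "lifestage": "adult"},
    {"order": "Rodentia", "length_type": "total length",
     "length_in_mm": 610.0, "mass_in_g": 950.0, "lifestage": None},
]

hits = [
    r for r in records
    if matches_query(r, order="Rodentia", length_type="total length",
                     length_mm=(400, 800), mass_g=(400, 1000),
                     has_lifestage=True)
]
print(len(hits))  # 1
```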
Record Detail Interpretation
When you view records returned as a result of a query (whether you are searching for traits or not), there are some new fields that are worth some explanation.
First, when you click on any record in the VertNet portal from a list of search results, you’ll be taken to the Record Details page. In this example, we’ll look at a single rodent record we found in our queries above, http://portal.vertnet.org/o/cumv/mammal-specimens?id=http-arctos-database-museum-guid-cumv-mamm-18075-seid-2100213.
When viewing the metadata associated with a given record, you will find three (3) tabs at the top of the page: Record detail, Data set, and Index.
This document will highlight the Record detail tab, but note that the Data set tab provides publisher and citation data, while the Index tab contains IDs, boolean values, and other data that VertNet uses to return a record when queried.
The Record detail tab contains all of the data that the data publisher, in this case the Cornell University Museum of Vertebrates, has elected to publish.
As you scroll down this page, you’ll see all Darwin Core and some of the extra terms we’ve added, and their corresponding values. There can be quite a lot of terms depending upon the completeness of the record, so we’ve built in category views to allow you to review specific terms easily. Simply click on the Record detail tab and the list of categories will appear. To find the terms related to traits, click on Trait.
Record Detail Categories
The page you will see contains all of the trait and attribute terms that VertNet has identified in the record. Some will be Darwin Core terms (sex, lifeStage) and others will be trait terms added to the index by VertNet (Length in mm, Length type, Are length units inferred?, Mass in grams, Are mass units inferred?, LifeStage From Source, and Sex From Source).
Sex and lifeStage
Some institutions include values for sex and lifeStage in the corresponding Darwin Core fields, but not all of them do. These data, if recorded, can be found in many different data fields. If values are not present in DwC:sex or DwC:lifeStage, then VertNet looks through three additional Darwin Core fields for data: dynamicProperties, occurrenceRemarks, and fieldNotes.
In our sample record, the value “male” was present in the original record in the DwC:sex field. Thus, in the Trait Term tab, Sex = male. Also in this tab you will see the term, Sex From Source. In this case, Sex From Source = male. When the term Sex From Source is present, that means that the DwC:sex field was populated with a value and VertNet did not search for a value in any other fields.
For a record in which DwC:sex = male, but the term Sex From Source is NOT present, that should indicate to you that there was no value in DwC:sex in the original record and that VertNet found a value in another field (dynamicProperties, occurrenceRemarks, or fieldNotes) that we have determined to represent the sex of that record. As a result, VertNet has populated the DwC:sex field with the value. In this case, we recommend that you review these other fields in the record to verify to your satisfaction that VertNet’s derivation is correct.
The same is true for values in the DwC:lifeStage field. In those cases, if a value is present in DwC:lifeStage in the original, then the Trait Term tab will include the term LifeStage From Source. If there is a value in DwC:lifeStage, but the term LifeStage From Source is NOT present, then VertNet has populated that field with a value found in another field from the record.
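The interpretation rule for sex and lifeStage can be summarized in a few lines of code. This is a minimal sketch that assumes the trait terms of a record are available as a dictionary keyed by the term names shown on the Record detail page; that dictionary layout and the function name are assumptions for illustration.

```python
FALLBACK_FIELDS = ("dynamicProperties", "occurrenceRemarks", "fieldNotes")


def value_provenance(trait_terms, term, from_source_term):
    """Describe where the value of a trait term came from.

    Returns "absent" if the term has no value, "published" if the provider
    supplied it in the Darwin Core field, and "derived" if VertNet filled it
    in from one of the fallback fields.
    """
    if not trait_terms.get(term):
        return "absent"
    if from_source_term in trait_terms:
        return "published"
    return "derived"


# Trait terms resembling the sample record described above.
sample = {"Sex": "male", "Sex From Source": "male"}
print(value_provenance(sample, "Sex", "Sex From Source"))              # published
print(value_provenance({"Sex": "male"}, "Sex", "Sex From Source"))     # derived
print(value_provenance(sample, "LifeStage", "LifeStage From Source"))  # absent
```

A result of “derived” is the cue to review the fields listed in FALLBACK_FIELDS on the original record before relying on the value.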
Length and Mass
Measurements for length and mass do not have specific Darwin Core terms that they can call home. Instead, these values are often aggregated into DwC:dynamicProperties or included in other fields, such as occurrenceRemarks and fieldNotes. VertNet uses tools we have developed to search these fields for values that represent the length and mass of a specimen and, when found, to populate some additional data fields.
In our example, the dynamicProperties field (shown below, but which can be viewed in the All Terms category under the Record detail) contains the measurements and units for a total length (or TL, T.L., etc.) and a mass. It also contains measurements for a variety of other traits, not all of which are searchable yet (see the NOTE below).
When searching for length and mass measurements, VertNet’s trait tools found the value “total length=555 mm”. As a result, the Record detail shows “Length in mm” = 555. The trait tools also found “total length” in that string of values, thus “total length” is listed as the Length type. An additional field is also present, “Are length units inferred?”. In this case, “Are length units inferred?” = No, because the units (mm) were given in dynamicProperties. When “Are length units inferred?” = Yes, no units were reported and we assumed millimeters, following common practice. In cases where units ARE inferred, we recommend proceeding with more caution.
NOTE: As of September 2016, VertNet provides only one harmonized and searchable measurement for length, even when multiple length measurements are reported on the original record. Multiple original measurements occur because a specimen can be measured more than once, or because different kinds of length measurements are recorded, e.g., “total length” and “snout-vent length”. We look for patterns signifying a “total length” first, and therefore it has priority when extracted, potentially leaving other measures undiscovered. Therefore, in cases where there is both a “total length” measurement and, say, a “snout-vent length” measurement, we report “total length”. As a result, we recommend that you explore the records that have extracted length measurements carefully for additional data - there may still be lots of data to be discovered. We hope to do further work to provide the full variety of measurements from the published records in the future.
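The effect of giving “total length” priority can be illustrated with a short sketch. Only the fact that “total length” comes first is taken from this guide; the ordering of the remaining length types, the function name, and the data layout are assumptions for illustration.

```python
# "total length" first, per the note above; the rest of the order is assumed.
LENGTH_PRIORITY = [
    "total length", "snout-vent length", "standard length", "fork length",
]


def choose_reported_length(measurements):
    """Split (length_type, value_mm) tuples into (reported, others)."""
    def rank(measurement):
        kind = measurement[0]
        if kind in LENGTH_PRIORITY:
            return LENGTH_PRIORITY.index(kind)
        return len(LENGTH_PRIORITY)  # unknown types sort last

    if not measurements:
        return None, []
    ordered = sorted(measurements, key=rank)  # stable sort keeps ties in order
    return ordered[0], ordered[1:]


reported, others = choose_reported_length(
    [("snout-vent length", 80.0), ("total length", 150.0)]
)
print(reported)  # ('total length', 150.0)
print(others)    # [('snout-vent length', 80.0)]
```

The “others” list stands for the measurements that remain only in the original record text, which is why reviewing the source fields is worthwhile.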
The same process occurs for mass measurements. In this example, VertNet’s trait tools found the value “weight=913 g”, thus VertNet assigns the value 913 to the term “Mass in grams”. Because the units (g) were present, the term “Are mass units inferred?” = No. If “Are mass units inferred?” = Yes, this means that no units were reported and we assumed grams, which is the common unit for mass measurements.
If you have a question about the values in any record, please feel free to contact us directly using our feedback form, or get yourself a GitHub username and submit an issue directly from the VertNet portal by clicking on the green button at the top of each record page.
If you have any questions about this document, please contact VertNet's support team.
Visit our Help page for more resources created for the VertNet project.
Orig Release, 19Sep2016 (David Bloom)