Portal Syntax: Field-specific Keyword Search

The Basics

In some cases, it may be helpful to use a field-specific keyword search to discover relevant records within the VertNet portal. This process is similar to the standard full text/keyword search available in the portal. Rather than just typing in one or more keywords, which will return all records that contain one or more of those terms, you can direct the portal to look only within specific Darwin Core fields.

To do this, type the name of the Darwin Core field you want the portal to search, followed by a colon (:) and the term you want to find. Make sure you don’t use any spaces. The Darwin Core field names should be lowercase.

darwincorefieldname:term

With this syntax, you are instructing the portal to limit your keyword search to matches of your search term within the specified Darwin Core field. No other fields within the record will be searched unless you add additional terms or filters to your search string.
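As a minimal sketch of the syntax above, the following Python helper builds a single field-specific term. The function name field_term is hypothetical and is not part of any VertNet tool; it simply lowercases the field name and rejects spaces, as the basic syntax requires.

```python
def field_term(field: str, term: str) -> str:
    """Build a 'fieldname:term' string for the portal's keyword search box.

    Hypothetical helper for illustration; not part of any VertNet library.
    """
    field = field.strip().lower()  # Darwin Core field names should be lowercase
    term = term.strip()
    if not field or not term:
        raise ValueError("both a field name and a term are required")
    if " " in field or " " in term:
        # The basic syntax does not allow spaces in the field:term pair.
        raise ValueError("the basic syntax does not allow spaces")
    return f"{field}:{term}"


print(field_term("Genus", "mustela"))  # genus:mustela
```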

For example, if you’re looking for records that describe the black-footed ferret, Mustela nigripes, you can find them by searching for its scientific name explicitly. This can be accomplished by typing genus:mustela specificepithet:nigripes into the full text/keyword search field in the portal.

The results will include every record within VertNet that contains the term mustela in the dwc:genus field AND nigripes in the dwc:specificepithet field.

You can see how this will look within the portal, along with the results of the query by using the following URL: http://portal.vertnet.org/search?q=genus:mustela+specificepithet:nigripes
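If you want to generate links like this one from a script, a short sketch in Python might look like the following. The helper name search_url is hypothetical; the base URL and the q parameter are taken from the example URL above, and the choice to leave colons and parentheses unescaped is only there to mirror that example.

```python
from urllib.parse import quote_plus

# Base search URL taken from the example URL in this guide.
PORTAL_SEARCH_URL = "http://portal.vertnet.org/search"


def search_url(query: str) -> str:
    """Turn a keyword query into a portal URL (hypothetical helper).

    Spaces become '+', while ':' and parentheses are left readable,
    matching the style of the example URL.
    """
    return f"{PORTAL_SEARCH_URL}?q={quote_plus(query, safe=':()')}"


print(search_url("genus:mustela specificepithet:nigripes"))
# http://portal.vertnet.org/search?q=genus:mustela+specificepithet:nigripes
```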

Available search fields

The following fields are indexed and available for searching (we’re always looking to add more fields when possible — these are available as of 19Sep2016):

RECORD-LEVEL
institutioncode
collectioncode
catalognumber
dctype (dcterms:type)
license (dcterms:license)
iptlicense (eml:intellectualRights)
haslicense (dcterms:license or eml:intellectualRights has a license designated) {'0','1'}
basisofrecord {PreservedSpecimen, FossilSpecimen, MaterialSample, Occurrence, MachineObservation, HumanObservation}
isfossil (dwc:basisOfRecord is FossilSpecimen or collection is a paleo collection) {'0','1'}
hasmedia (has dwc:associatedMedia) {'0','1'}

OCCURRENCE
iptrecordid (same as dwc:occurrenceID)
recordedby
recordnumber
fieldnumber
establishmentmeans
wascaptive (dwc:establishmentMeans or occurrenceRemarks suggests it was captive) {'0','1'}
wasinvasive (was the organism recorded to be invasive where and when it occurred) {'0','1'}
sex (standardized sex from original sex field or extracted from elsewhere in the record)
lifestage (lifeStage from original lifeStage field or extracted from elsewhere in the record)
preparations
hastissue (has dwc:preparations that suggests tissue is available) {'0','1'}
reproductivecondition

EVENT
eventdate
year
month
day
startdayofyear
enddayofyear

LOCATION
continent
country
stateprovince
county
municipality
island
islandgroup
waterbody
locality
geodeticdatum
georeferencedby
georeferenceverificationstatus
location (a Google GeoField of the dwc:decimalLatitude, dwc:decimalLongitude)
mappable (has valid dwc:decimalLatitude, dwc:decimalLongitude) {'0','1'}

GEOLOGICAL CONTEXT
bed
formation
group
member

IDENTIFICATION
typestatus
hastypestatus (dwc:typeStatus is populated) {'0','1'}

TAXON
kingdom
phylum
class
order
family
genus
specificepithet
infraspecificepithet
scientificname
vernacularname

TRAIT
lengthinmm (length measurement extracted from the record) {number}
massing (mass measurement extracted from the record) {number}
hasmass (was a value for mass extracted?) {'0','1'}
haslength (was a value for length extracted?) {'0','1'}
haslifestage (does the record have life stage) {'0','1'}
hassex (does the record have sex) {'0','1'}

DATA SET
gbifdatasetid (GBIF identifier for the data set)
gbifpublisherid (GBIF identifier for the data publishing organization)
lastindexed (date the record was most recently indexed into VertNet) {'YYYY-MM-DD'}
networks {MaNIS, ORNIS, HerpNET, FishNet, VertNet, Arctos, Paleo}
migrator (the version of the migrator used to process the data set) {'YYYY-MM-DD'}
orgcountry (the country where the organization is located)
orgstateprovince (the first-level administrative unit where the organization is located)

INDEX
rank (a higher number means the record is more complete with respect to georeferences, scientific names, and event dates) (see rec_rank() in https://github.com/VertNet/post-harvest-processor/blob/master/lib/vn_utils.py) {1-12}
vntype {specimen, observation}
hashid (a value to distribute records in 10k bins) {0-9998}
verbatim_record (the whole record)

You can always check on the terms for which this type of function is possible in our GitHub repository wiki for the VertNet webapp. The fields listed on that wiki page are indexed and available for searching directly. The list is about halfway down the page in Section: Field-specific keyword search. The fields in standard font are Darwin Core fields. The fields in italics are fields added by VertNet to increase the opportunity for data users to discover useful records. The fields presented in the Traits section above are described in detail on our Traits Guide. The fields in bold are described below.

Examples and Syntax

As part of VertNet’s harvest and index processes, records that meet specific criteria are flagged. For example, for mappable, we’re looking for the presence of coordinates in the decimallatitude and decimallongitude fields. If coordinates are indeed present, the record is flagged as mappable. If a record contains a value in the associatedmedia field, it is flagged as containing media. For a record to become flagged as containing tissue, the record must contain at least one term that matches a predetermined list of terms within the preparations field. So for any of the flag fields, 0 = no content/no match and 1 = content/match. Thus, if you want records with media you can type hasmedia:1, or, if you want records that are not mappable you can type mappable:0.
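The 0/1 convention described above can be sketched as a small Python helper. The name flag_term and the short FLAG_FIELDS set are assumptions made for illustration; the set holds only the three flags named in this paragraph, and you could extend it with any other field marked {'0','1'} in the list of available search fields.

```python
# Flag fields named in the paragraph above. Other fields marked {'0','1'}
# in the list of available search fields could be added here.
FLAG_FIELDS = {"mappable", "hasmedia", "hastissue"}


def flag_term(field: str, present: bool) -> str:
    """Build a flag search term: 1 = content/match, 0 = no content/no match.

    Hypothetical helper for illustration; not part of any VertNet library.
    """
    field = field.strip().lower()
    if field not in FLAG_FIELDS:
        raise ValueError(f"{field!r} is not a known flag field")
    return f"{field}:{1 if present else 0}"


print(flag_term("hasmedia", True))   # hasmedia:1
print(flag_term("mappable", False))  # mappable:0
```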

In most cases, all you need to know is the term you want to search for within one of those specific fields. For example, you could search for class:amphibia or occurrenceid:95d1e8f1-1ed8-11e3-bfac-90b11c41863e or stateprovince:Colorado. In some cases, such as eventdate, you might want to know the format of the content for which you want to search (e.g., 1975-05-15 vs 4/1/32 vs 1961).

For those terms in the above list in bold, the portal is simply searching for the presence of content (any content) within those fields. Thus, if you want records with media, you can type hasmedia:1 (where 0 = a field without that content and 1 = a field with that content), or you can find records that are not mappable using mappable:0.

In many cases, we've provided you with an opportunity to do this automatically using the Advanced Search, by typing in a Darwin Core term box (e.g., InstitutionCode or Genus) or by clicking an option box (e.g., Has media or Is mappable). See VertNet’s Guide To Advanced Search Options For Filtering to learn more about Advanced Search.

For queries in which you are searching for more than one term from the same Darwin Core field (e.g., two institutions or multiple catalog numbers), use the following syntax in the main keyword/full text field. For example, if all records from ISM and HSU are required, type: institutioncode:ISM OR institutioncode:HSU. The OR operator instructs the portal to look for all records with the institution code ISM or HSU, and it must be in capital letters. Only one term can be used in each of the Darwin Core fields provided in the Advanced Search options.

IMPORTANT: Regardless of how you conduct your search, it is critical that you know, or be willing to guess at, all of the possible variations of a given term. For example, if you are searching for all records within the United States, entering the term United States in the Country field in Advanced Search, or using the keyword syntax country:"United States", will return only those records that contain the term United States in the Darwin Core Country field. It will not return records with US or U.S. This query will also return all records that contain United States of America, because the term United States is contained within it. The reverse is not true: a search for United States of America will not return records that contain only United States.

To discover all of these records, you can either conduct an independent search and download for each term, or you can create a compound query in the main keyword/full text field of the portal. This search syntax would look like country:"United States" OR country:US OR country:"U.S." OR country:USA. Using the OR operator, you instruct the portal to identify all records that contain the term United States or US or U.S. or USA within the Country Darwin Core field.

NOTE: It is recommended that you surround any compound term or term that contains punctuation with quotation marks, otherwise the portal may interpret the term United States as two distinct terms, because of the space, or a period (.) or comma (,) as a query operator. If you are using the Advanced Search options, the portal will add the quotation marks for you.
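A compound query like the country example can be assembled with a few lines of Python. This is only a sketch: the helper names quote_term and any_of are hypothetical, and the rule used here (quote anything that is not purely letters and digits) is a conservative assumption based on the recommendation above, not a description of how the portal itself parses queries. Terms that contain double quotes are not handled.

```python
import re


def quote_term(term: str) -> str:
    """Wrap a term in straight double quotes if it has spaces or punctuation."""
    if re.fullmatch(r"[A-Za-z0-9]+", term):
        return term  # a single plain word needs no quotes
    return f'"{term}"'


def any_of(field: str, variants: list[str]) -> str:
    """Join the variants of one field with the OR operator (capital letters)."""
    field = field.strip().lower()
    return " OR ".join(f"{field}:{quote_term(v)}" for v in variants)


print(any_of("country", ["United States", "US", "U.S.", "USA"]))
# country:"United States" OR country:US OR country:"U.S." OR country:USA
```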

Other similar queries include:

  • Multiple versions of the same individual in the RecordedBy field: Grinnell vs. J. Grinnell vs. Grinnell, J P vs. JGrinnell - Note that in this case, a search just for Grinnell will return all of the records with Grinnell, J. Grinnell, Grinnell, J P, etc., but it will not find JGrinnell because that is interpreted as a unique term.
  • Multiple names and acronyms in fields such as stateprovince: CA vs. Cali vs. California
  • Multiple spellings for the same taxonomic field: getula vs. getulus
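To build some intuition for why Grinnell finds J. Grinnell but not JGrinnell, here is a rough Python model of whole-word matching. It is only an approximation written for this guide: the portal's actual indexing is not described here, so the tokenizer (split on anything that is not a letter or digit, ignore case) and the function names tokens and matches are assumptions. It also ignores word order.

```python
import re


def tokens(value: str) -> set[str]:
    """Split a field value into lowercase word tokens (an assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def matches(term: str, value: str) -> bool:
    """True if every word of the search term appears as a whole token."""
    return tokens(term) <= tokens(value)


for value in ["Grinnell", "J. Grinnell", "Grinnell, J P", "JGrinnell"]:
    print(f"{value!r}: {matches('grinnell', value)}")
```

Under this model the first three values match and JGrinnell does not, because jgrinnell is a single token that differs from grinnell. The same reasoning explains why a search for United States also finds United States of America, but not the other way around.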

The syntax provided above will allow you to capture multiple possible terms within a single query for any of the field-specific options available in the portal. Oh, and yes, you can search for multiple fields at the same time. For example, all records at the California Academy of Sciences and Museum of Vertebrate Zoology recorded by Joseph Grinnell with catalog numbers 5238 and 5241 would be entered as:

(institutioncode:CAS OR institutioncode:MVZ) (recordedby:grinnell OR recordedby:jgrinnell) (catalognumber:5238 OR catalognumber:5241)

It even works!

http://portal.vertnet.org/search?q=(institutioncode:CAS+OR+institutioncode:MVZ)+(recordedby:grinnell+OR+recordedby:jgrinnell)+(catalognumber:5238+OR+catalognumber:5241)
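The multi-field query above can also be generated programmatically. The Python sketch below uses hypothetical helper names (or_group and build_query) and takes the base URL from the link above. It wraps each field's alternatives in parentheses, joins them with OR, and separates the groups with spaces, just as in the hand-written query.

```python
from urllib.parse import quote_plus


def or_group(field: str, terms: list[str]) -> str:
    """Parenthesized OR group for one field, e.g. (a:x OR a:y)."""
    return "(" + " OR ".join(f"{field}:{term}" for term in terms) + ")"


def build_query(criteria: dict[str, list[str]]) -> str:
    """One OR group per field, with the groups separated by spaces."""
    return " ".join(or_group(field, terms) for field, terms in criteria.items())


criteria = {
    "institutioncode": ["CAS", "MVZ"],
    "recordedby": ["grinnell", "jgrinnell"],
    "catalognumber": ["5238", "5241"],
}
query = build_query(criteria)
# Keep ':' and parentheses unescaped so the link reads like the one above.
url = "http://portal.vertnet.org/search?q=" + quote_plus(query, safe=":()")

print(query)
print(url)
```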

If you have any questions about this document, please contact VertNet's support team.

Visit our Help page for more resources created for the VertNet project.


Orig Release, 25Sept2014 (David Bloom) Updated, 20Sep2016 (Laura Russell)