
Question:

Which research fields are connected to Genomics by research terms in Google Scholar?

Answer:

We analysed more than 33,000 researcher profiles to find out:

Figure 1: Crawling Google Scholar research terms starting from Genomics.


The area of each circle is proportional to the number of Google Scholar profiles in the analysis that use the term. Each of the smallest circles represents 200 highly-cited researchers, and the biggest circles represent over 1,000. In total, the analysis involved data from more than 33,000 highly-cited researcher profiles.

Edges are formed when two terms often appear together on highly-cited Google Scholar profiles. The threshold used for this graph was 9.5%: an edge from one term to another appears when at least 9.5% of the profiles using the first term also use the second. The underlying data is very high-dimensional, so the threshold is set high enough to give a reasonably flat depiction of the core relationships between terms. The connections are asymmetric because some terms are used much more often than others: the profiles listing a given pair of terms make up a smaller percentage of the profiles containing the more common term than of those containing the rarer one. Hence common terms (Neuroscience, Evolution and Ecology, Genomics, Bioinformatics) tend to receive incoming edges, and rarer terms tend to have outgoing edges.
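The edge rule and circle sizing can be illustrated with a small Python sketch. It assumes an edge from term A to term B is drawn when at least 9.5% of the profiles listing A also list B, and that circle area (not radius) scales with the profile count. The profiles below are hypothetical toy data, not real Google Scholar records, and the helper names are illustrative only; this is not the project's R code.

```python
import math
from collections import Counter
from itertools import permutations

THRESHOLD = 0.095  # 9.5%, the threshold quoted for the figure

# Hypothetical toy data: each profile is the set of terms it lists.
# "Genomics" appears on 100 profiles, "Phylogenetics" on 10, and 5 share both.
PROFILES = (
    [{"Genomics", "Phylogenetics"}] * 5
    + [{"Phylogenetics"}] * 5
    + [{"Genomics"}] * 95
)


def term_counts(profiles):
    """Number of profiles using each term (this drives circle size)."""
    return Counter(term for p in profiles for term in p)


def directed_edges(profiles, threshold=THRESHOLD):
    """Edge a -> b when at least `threshold` of the profiles listing a also list b.

    The denominator is count(a), so a rare term a clears the threshold more
    easily than a common term b does; edges therefore tend to point at common terms.
    """
    counts = term_counts(profiles)
    pair_counts = Counter()
    for p in profiles:
        # Count every ordered pair of distinct terms on the same profile.
        for a, b in permutations(sorted(p), 2):
            pair_counts[(a, b)] += 1
    return {
        (a, b)
        for (a, b), n in pair_counts.items()
        if n / counts[a] >= threshold
    }


def circle_radius(count, scale=1.0):
    """Circle AREA proportional to count, so the radius grows with sqrt(count)."""
    return scale * math.sqrt(count)


if __name__ == "__main__":
    print(directed_edges(PROFILES))  # {('Phylogenetics', 'Genomics')}
```

Here 5 of the 10 "Phylogenetics" profiles (50%) also list "Genomics", so the rarer term gets an outgoing edge, while only 5 of the 100 "Genomics" profiles (5%) list "Phylogenetics", which is below the threshold, so no edge goes the other way.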

Pseudocode:

MAX_LABELS_IN_NETWORK <- 230;
TOP_RESEARCHERS_PER_LABEL <- 200;
labels <- {"Genomics"}; // one or many seed labels
researchers <- top("Genomics", TOP_RESEARCHERS_PER_LABEL);
while (size(labels) < MAX_LABELS_IN_NETWORK) {
  // gets the next most common label among researchers that is not yet in the labels list
  next_label <- get_next_label(researchers, labels);
  labels <- union(labels, {next_label});
  researchers <- union(researchers, top(next_label, TOP_RESEARCHERS_PER_LABEL));
}
// labels is now size MAX_LABELS_IN_NETWORK and has the most common labels
// number of researchers is now <= MAX_LABELS_IN_NETWORK * TOP_RESEARCHERS_PER_LABEL
graph <- createGraphWithEdges(labels, researchers);
plot(graph);
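The crawl in the pseudocode can be made concrete as a runnable Python sketch. It runs against a hypothetical in-memory table (researcher id mapped to a citation count and a set of terms) instead of scraped data, and it assumes that `top` ranks researchers by citation count; the function names mirror the pseudocode but their bodies are illustrative assumptions, not the repository's implementation. The sketch adds each chosen label to `labels` and stops early if no unseen terms remain.

```python
from collections import Counter

# Hypothetical stand-in for the scraped data:
# researcher id -> (citation count, set of research terms on the profile).
RESEARCHERS = {
    "r1": (9000, {"Genomics", "Bioinformatics"}),
    "r2": (7000, {"Genomics", "Evolution"}),
    "r3": (5000, {"Bioinformatics", "Machine Learning"}),
    "r4": (4000, {"Evolution", "Ecology"}),
    "r5": (3000, {"Genomics", "Bioinformatics", "Evolution"}),
    "r6": (2000, {"Ecology"}),
}


def top(label, n, data=RESEARCHERS):
    """The n most-cited researchers whose profiles list `label` (assumed ranking)."""
    with_label = [r for r, (_, terms) in data.items() if label in terms]
    with_label.sort(key=lambda r: data[r][0], reverse=True)
    return set(with_label[:n])


def get_next_label(researchers, labels, data=RESEARCHERS):
    """Most common term among `researchers` not yet in `labels`, or None if none left."""
    counts = Counter(t for r in researchers for t in data[r][1] if t not in labels)
    if not counts:
        return None
    # Break ties alphabetically so the crawl is deterministic.
    return min(counts, key=lambda t: (-counts[t], t))


def crawl(seeds, max_labels, top_per_label, data=RESEARCHERS):
    """Grow the label set from `seeds`, pooling the top researchers for each label."""
    labels = set(seeds)
    researchers = set()
    for seed in seeds:
        researchers |= top(seed, top_per_label, data)
    while len(labels) < max_labels:
        next_label = get_next_label(researchers, labels, data)
        if next_label is None:  # no unseen terms left to follow
            break
        labels.add(next_label)
        researchers |= top(next_label, top_per_label, data)
    return labels, researchers


if __name__ == "__main__":
    labels, researchers = crawl(["Genomics"], max_labels=3, top_per_label=2)
    print(sorted(labels))       # ['Bioinformatics', 'Evolution', 'Genomics']
    print(sorted(researchers))  # ['r1', 'r2', 'r3', 'r4']
```

Because each label contributes at most `top_per_label` researchers, the pool stays within `max_labels * top_per_label`, matching the bound noted in the pseudocode.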

The plot uses igraph in R.

Data:

The data used to generate a pool is compressed into several zip files (gs-terms-a-z.zip). Creating a big pool takes time, so two prebuilt pools, pool-500.RData and pool.genomics.RData, are provided for convenience.

We recommend using the command line to uncompress the zip files, for example: unzip src.zip
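Where a shell `unzip` is not available, a script can do the same extraction with Python's standard `zipfile` module. This is a minimal sketch assuming the downloaded archives sit together in one directory; the function name `unzip_all` and the directory arguments are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_all(src_dir, dest_dir):
    """Extract every .zip archive found in src_dir into dest_dir.

    Returns the archive file names in the order they were extracted.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(src.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        extracted.append(archive.name)
    return extracted
```

For example, `unzip_all("downloads", "data")` would unpack every archive in a hypothetical `downloads` folder into `data`.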

More Applications:

Who is using BEAST or Geneious? Can any relation be revealed?

Citation:

Alexei Drummond and Dong Xie, Scholar Relational Network, https://github.com/alexeid/gscholar-networks.