
Reproducibility and Validity: Building Trust in Performance Research

By Luiz Soares
2024-12-04 · 10 min read

A transparent discussion of study limitations, validity threats, and step-by-step instructions to replicate the Go vs Java, gRPC vs REST experiment.

No empirical study is complete without an honest assessment of its limitations. Transparency about validity threats not only builds trust in the findings but also helps practitioners understand how to apply the results to their specific contexts.

Hardware and software optimizations represent our first validity threat. While we controlled the experimental environment (a minimal Ubuntu Server installation, consistent process priority levels, dedicated hardware), factors such as JIT compilation warm-up, garbage-collection timing, and CPU cache effects could still influence individual measurements. However, these factors affect all implementations equally, and our repetition design helps average out transient effects.
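As a concrete illustration of that repetition design, the sketch below shows how a measurement loop can discard an initial batch of warm-up requests before recording timings, so warm-up and cold-cache effects do not contaminate the averages. The sendRequest stub, the warm-up count, and the iteration count are hypothetical placeholders, not the study's actual client code.

```go
package main

import (
	"fmt"
	"time"
)

// sendRequest stands in for one round trip against the service under test
// (gRPC or REST); a real client would replace this stub.
func sendRequest() {
	time.Sleep(2 * time.Millisecond) // placeholder for network + handler time
}

func main() {
	const warmup = 200    // iterations discarded so JIT/caches stabilise
	const measured = 1000 // iterations actually recorded

	// Warm-up phase: results are thrown away.
	for i := 0; i < warmup; i++ {
		sendRequest()
	}

	// Measurement phase: one timing per request, averaged over repetitions.
	latencies := make([]time.Duration, 0, measured)
	for i := 0; i < measured; i++ {
		start := time.Now()
		sendRequest()
		latencies = append(latencies, time.Since(start))
	}

	var total time.Duration
	for _, d := range latencies {
		total += d
	}
	fmt.Printf("mean latency over %d requests: %v\n", measured, total/time.Duration(measured))
}
```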

Payload representativeness is another consideration. While StdSize (~0.76 KB) was designed to represent typical web API requests based on industry data, and LargeSize (~781 KB) tests extreme conditions, the definition of 'typical' varies across applications. Your specific use case might have different payload distributions.
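If you want to see where your own traffic falls relative to StdSize and LargeSize, a quick check of the serialized payload size is enough. The Order struct below is purely hypothetical; substitute a representative message from your own API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Order is a hypothetical payload; replace it with a struct from your own API.
type Order struct {
	ID       string   `json:"id"`
	Customer string   `json:"customer"`
	Items    []string `json:"items"`
	TotalUSD float64  `json:"total_usd"`
}

func main() {
	payload := Order{
		ID:       "ord-42",
		Customer: "Ada Lovelace",
		Items:    []string{"keyboard", "monitor", "dock"},
		TotalUSD: 499.90,
	}

	body, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}

	// Compare against the study's reference sizes:
	// StdSize ~0.76 KB, LargeSize ~781 KB.
	fmt.Printf("serialized payload: %.2f KB\n", float64(len(body))/1024)
}
```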

The simplified application design was a deliberate choice to isolate communication and language effects. However, real applications involve database queries, business logic, and concurrent processing. Future work should explore how these factors interact with our findings.
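To make the simplified design concrete, the sketch below shows the kind of deliberately thin handler such a design implies: it decodes the request and echoes it back, with no database access or business logic, so the measured time is dominated by serialization and transport. The route, struct, and port are illustrative assumptions, not the repository's actual implementation.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

type Echo struct {
	Message string `json:"message"`
}

// echoHandler deserializes the request body and returns it unchanged,
// so the handler contributes little beyond (de)serialization overhead.
func echoHandler(w http.ResponseWriter, r *http.Request) {
	var in Echo
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(in)
}

func main() {
	http.HandleFunc("/echo", echoHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```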

To support reproducibility and enable validation of our findings, the complete experimental artifacts are publicly available. The repository contains all four implementations (Go+gRPC, Go+REST, Java+gRPC, Java+REST), the client application for load generation, collected timing data, R scripts for statistical analysis, and documentation for environment setup.

Future research directions include: testing additional payload sizes to identify precise crossover points between gRPC and REST advantages; varying request frequencies to understand behavior under different load patterns; evaluating different infrastructure configurations (cloud vs. bare metal, different CPU architectures); and incorporating compute-intensive handlers to measure language-specific concurrency benefits.
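As a rough illustration of that last direction, a compute-intensive handler might look like the sketch below, which fans a hashing workload out across one goroutine per CPU so that language-level concurrency differences become visible. The endpoint, workload size, and hashing loop are arbitrary placeholders chosen only for demonstration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"log"
	"net/http"
	"runtime"
	"sync"
)

// cpuHandler runs a deliberately heavy hashing loop split across
// one goroutine per CPU, standing in for real business computation.
func cpuHandler(w http.ResponseWriter, r *http.Request) {
	workers := runtime.NumCPU()
	const iterations = 200_000 // arbitrary workload size per worker

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(seed int) {
			defer wg.Done()
			sum := sha256.Sum256([]byte{byte(seed)})
			for j := 0; j < iterations; j++ {
				sum = sha256.Sum256(sum[:])
			}
		}(i)
	}
	wg.Wait()

	fmt.Fprintf(w, "computed %d hashes across %d workers\n", workers*iterations, workers)
}

func main() {
	http.HandleFunc("/compute", cpuHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```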

We encourage the community to replicate our experiments, challenge our findings, and extend the research. Science advances through verification and extension, and performance research is no exception.

#Validity #Reproducibility #Performance #Open Science

About Luiz Soares

Full-Stack TechLead specializing in AI products, RAG systems, and LLM integrations. Passionate about building scalable solutions and sharing knowledge with the tech community.
