Those who receive the results of modern data analysis have limited opportunity to verify the results by direct observation. Users of the analysis have no option but to trust the analysis, and by extension the software that produced it. Both the data analyst and the software provider therefore have a strong responsibility to produce a result that is trustworthy, and, if possible, one that can be shown to be trustworthy.
The Problems with LLMs
Data provenance: Training data is proprietary and undisclosed
Data lineage: Cannot trace how inputs become outputs
Data contracts: No guarantees about output format or correctness
Observability: Decision-making process is opaque
Data dictionaries: No clear mapping of concepts or definitions used
Monitoring: No way to detect when model behavior changes
Versioning: Model updates are released without change logs
Unit tests: Non-deterministic outputs cannot be reliably tested
Code review: Cannot inspect the “reasoning” that led to an answer
Disaggregation: Cannot break down how confidence is distributed
Metadata: Limited information about training, capabilities, or limitations
Confidence intervals: Provides fluent answers with no statistical uncertainty
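To make the data contract and unit test items above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a real model integration: `fake_llm` is a hypothetical stand-in that randomly picks one of three canned replies to mimic sampling, and the contract (a JSON object with a numeric `mean` key) is invented for the example. The deterministic `mean` function can be checked with an exact-match assertion, while the sampled output can only be summarized as a pass rate.

```python
import json
import random


def mean(values):
    """Deterministic analysis step: the same input always gives the same output."""
    return sum(values) / len(values)


def fake_llm(prompt, rng):
    """Hypothetical stand-in for an LLM call (not a real API).

    It ignores the prompt and samples one of several canned phrasings,
    which mimics run-to-run variation in wording and format.
    """
    candidates = [
        '{"mean": 2.0}',
        "The mean is 2.0.",
        '{"average": 2.0}',
    ]
    return rng.choice(candidates)


def satisfies_contract(raw):
    """Assumed data contract: a JSON object with a numeric 'mean' key."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and isinstance(parsed.get("mean"), (int, float))


# Conventional code: an exact-match unit test is meaningful and repeatable.
assert mean([1, 2, 3]) == 2.0

# Sampled output: repeat the call and report how often the contract holds.
rng = random.Random(0)
outputs = [fake_llm("What is the mean of 1, 2, 3?", rng) for _ in range(20)]
distinct_outputs = set(outputs)
contract_pass_rate = sum(satisfies_contract(o) for o in outputs) / len(outputs)

print(f"distinct outputs: {len(distinct_outputs)}")
print(f"contract pass rate: {contract_pass_rate:.2f}")
```

The point of the sketch is the contrast: the first check is a guarantee, while the second is only an observed rate over one batch of calls.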