
By Business Science Daily — peer-reviewed sources, human-verified.
The use of AI in the workplace comes with clear benefits — but also trade-offs. The key question to ask is not simply whether employees use AI, but how they use it and which parts of their work are actually affected.
The research gap is not a lack of studies: existing work has examined AI in controlled laboratory settings or focused on narrow, repetitive tasks. What is missing is attention to the varied nature of knowledge work. A consultant, lawyer, or analyst doesn't perform the same task repeatedly; they juggle creative ideation, data analysis, persuasive writing, strategic reasoning, and client communication, often within a single project.
The questions to ask, then, are how AI performs across this range of activities and, more importantly, how humans collaborate with AI when the tasks themselves shift constantly.
The following working paper by Dell’Acqua, McFowland, Mollick, Lifshitz-Assaf, Kellogg, Rajendran, Krayer, Candelon, and Lakhani looks closely at the specific tasks employees rely on AI for and how this changes the way they perform their jobs. It explores the practical reality of AI adoption: what workers use it for, how it shapes their workflow, and how it influences outcomes.
Navigating the Jagged Technological Frontier: Effects of AI on Knowledge Worker Productivity and Quality
Field experimental evidence from 758 BCG consultants shows that inside its capability frontier, AI boosts productivity by 12.2% and quality by more than 40%, but outside the frontier it decreases accuracy by 19 percentage points.
The Jagged Technological Frontier
AI capabilities are uneven—tasks that appear similar in difficulty can be on different sides of AI’s capability boundary. Inside the frontier, AI dramatically boosts performance. Outside, it causes errors.
Inside frontier examples: Creative writing, brainstorming, drafting memos, idea generation
Outside frontier examples: Tasks requiring hidden context integration, nuanced data+interview synthesis, problems with traps
Core Findings:
- Inside the frontier: Consultants using AI completed 12.2% more tasks, worked 25.1% faster, and produced 40%+ higher quality outputs compared to control group.
- Outside the frontier: Consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI.
- Skill distribution: Below-average performers improved by 43% with AI, above-average by 17%.
- Collaboration patterns: “Centaur” (strategic division of labor) and “Cyborg” (tight integration) approaches emerged.
- Homogenization effect: AI-assisted ideas were higher quality but less diverse across participants.
Methodology:
- Sample: 758 BCG strategy consultants (∼7% of global individual contributor cohort)
- Design: Pre-registered randomized experiment with baseline task, then random assignment to control, GPT-4 access, or GPT-4 + prompt engineering overview
- Inside frontier task: Creative product innovation (footwear) with 18 subtasks
- Outside frontier task: Business problem-solving with quantitative data and interviews containing a hidden trap
Inside the Frontier: Quality & Productivity Booster
Performance by Skill Level:
- Below-average performers: +43% improvement in the experimental task compared to baseline
- Above-average performers: +17% improvement in the experimental task compared to baseline
AI benefits lower performers more, narrowing the skill gap.
Homogenization Effect:
AI-assisted ideas were higher quality but less variable across participants. Semantic similarity analysis showed reduced diversity of ideas among AI users.
- Without AI: high diversity of ideas, but lower average quality
- With AI: higher-quality ideas, but everyone produces similar ones
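The homogenization finding rests on measuring how similar participants' ideas are to one another. As a minimal illustration of the idea (not the paper's actual method, which used richer semantic-embedding techniques), mean pairwise cosine similarity over simple bag-of-words vectors already captures the intuition: the higher the average similarity across participants, the less diverse the idea pool. The function names and toy ideas below are invented for this sketch.

```python
import math
from collections import Counter
from itertools import combinations

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def mean_pairwise_similarity(ideas: list[str]) -> float:
    """Average similarity across all pairs: higher = less diverse ideas."""
    pairs = list(combinations(ideas, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)

# Toy illustration: AI-assisted ideas tend to cluster around similar phrasings.
human_ideas = [
    "modular shoe with swappable soles",
    "biodegradable sandal from algae foam",
    "app-linked insole tracking gait",
]
ai_ideas = [
    "sustainable sneaker with recycled materials",
    "sustainable running shoe using recycled foam",
    "recycled-material sneaker for sustainability",
]

print(mean_pairwise_similarity(human_ideas))  # lower: diverse ideas
print(mean_pairwise_similarity(ai_ideas))     # higher: homogenized ideas
```

In the study's terms, AI users score higher on this kind of similarity measure even while their individual ideas rate higher in quality.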
Key Results Table:
| Condition | Quality Score | vs Control | Completion Rate |
|---|---|---|---|
| Control (no AI) | 4.10 | — | 82.4% |
| GPT Only | 5.66 | +38% | 91.4% |
| GPT + Overview | 5.85 | +42.5% | 93.5% |
All effects significant at p<0.001. Quality measured on 1-10 scale by human graders.
Outside the Frontier: Quality Disruptor
Consultants analyzed a business case with financial data and interviews. The spreadsheet alone suggested one conclusion, but careful reading of interviews revealed the opposite answer. GPT-4 typically missed this nuance.
📊 AI’s approach: Look at spreadsheet numbers → pick Channel A
👤 Correct human approach: Read interviews → Channel B has hidden advantages
Correctness Results:
The Quality Paradox:
- −19pp: AI groups were significantly less accurate
- +25%: yet their memos were rated more persuasive
Critical finding: AI generates fluent, persuasive text even when factually wrong. Humans often fail to catch errors because the output “looks good.”
Detailed Results Table:
| Condition | Correctness | vs Control | Recommendation Quality |
|---|---|---|---|
| Control | 84.4% | — | 5.86 |
| GPT Only | 70.5% | -13.9pp | 6.91 (+1.05) |
| GPT+Overview | 60.0% | -24.5pp | 7.34 (+1.48) |
Time Savings (outside frontier):
| Condition | Time Spent | vs Control |
|---|---|---|
| Control | 37.7 minutes | — |
| GPT Only | 30.9 minutes | -18% |
| GPT+Overview | 26.4 minutes | -30% |
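The "vs Control" columns are ordinary percent changes relative to the control group. A quick check of the time-savings figures (the `pct_change` helper is ours, not the paper's):

```python
def pct_change(treated: float, control: float) -> float:
    """Percent change of a treatment group relative to control, 1 decimal."""
    return round(100 * (treated - control) / control, 1)

# Time spent outside the frontier (minutes, from the table above):
print(pct_change(30.9, 37.7))  # -18.0  (GPT Only)
print(pct_change(26.4, 37.7))  # -30.0  (GPT+Overview)
```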
Collaboration Strategies: Centaurs & Cyborgs
Analysis of user logs revealed two distinct patterns among successful consultants:
Centaur Strategy
Named after the mythical half-human/half-horse creature—users strategically divide tasks between human and AI based on relative strengths.
- Division of labor: Clear handoffs between human and AI
- Human leads: Data analysis, strategic thinking, core recommendations
- AI supports: Drafting, refining, polishing, formatting
Example workflow: “Human analyzes financial data → Human decides recommendation → AI drafts memo to CEO → Human reviews and edits”
Cyborg Strategy
Named after science fiction hybrid beings—users tightly integrate with AI through continuous back-and-forth iteration.
- Tight integration: Subtask-level collaboration
- Practices include: Assigning persona (“act as a consultant”), requesting editorial changes, teaching through examples, validating outputs, demanding logic explanations, pointing out contradictions
Example interaction: “Act as a consultant… → AI responds → Revise that, focus on X → AI revises → Explain your logic → AI provides reasoning → Point out contradiction → AI adjusts”
Comparison of Strategies
| Feature | Centaur | Cyborg |
|---|---|---|
| Relationship | Strategic division of labor | Tight integration |
| Handoffs | Clear, task-level | Continuous, subtask-level |
| AI role | Tool/assistant | Collaborator/partner |
| Human role | Director, decision-maker | Co-creator, validator |
Key Insight:
Both patterns emerged among successful users. The choice may depend on task type, user skill, and familiarity with AI. Some users switched between modes depending on the subtask.
“I did the thinking, AI did the writing. Perfect division.” — Centaur user
“We went back and forth until it got it right. It felt like a partner.” — Cyborg user
Retainment Analysis:
Average retainment (copying AI output directly) was 0.87 on a 0-1 scale. Higher retainment correlated with higher quality (coefficient 1.21, significant). Training increased retainment.
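This summary does not spell out how retainment was operationalized. As an illustrative proxy only (an assumption, not the authors' measure), one could score the fraction of a consultant's final text whose tokens already appear in the AI draft:

```python
def retainment(ai_output: str, final_text: str) -> float:
    """Token-overlap proxy for retainment on a 0-1 scale: fraction of the
    final text's tokens that already appear in the AI draft. Illustrative
    stand-in; the paper's exact measure is not described in this summary."""
    ai_tokens = set(ai_output.lower().split())
    final_tokens = final_text.lower().split()
    if not final_tokens:
        return 0.0
    return sum(tok in ai_tokens for tok in final_tokens) / len(final_tokens)

# Hypothetical example: the consultant changes one word of the AI draft.
draft = "Channel B offers hidden advantages according to the interviews"
final = "Channel B offers clear advantages according to the interviews"
print(round(retainment(draft, final), 2))  # 0.89: 8 of 9 tokens retained
```

A score near the study's 0.87 average would mean most of the AI's wording survives into the final deliverable.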
Implications & Future Research
Theoretical Contributions:
- Jagged technological frontier: AI capabilities are uneven—tasks of similar perceived difficulty may be on different sides of the frontier.
- Human-AI collaboration patterns: Identifies Centaur and Cyborg behaviors as distinct integration strategies.
- Performance heterogeneity: AI benefits bottom performers more, potentially democratizing expertise.
- Quality vs. correctness tradeoff: AI can improve persuasiveness while decreasing accuracy—a dangerous combination.
Organizational Risks:
- Training deficit: Firms may stop giving junior workers “inside frontier” tasks, stunting skill development.
- Homogenization: Everyone produces similar high-quality output → less innovation.
- Persuasive errors: AI makes wrong answers look convincing → harder to catch mistakes.
Practical Implications:
- Training matters: Prompt engineering overview improved performance inside frontier but increased over-reliance outside frontier—training must include awareness of limitations.
- Task selection is critical: Organizations need to map which tasks are inside vs. outside AI’s current frontier.
- Diverse AI ecosystem: Consider using multiple LLMs or human-only involvement to counteract homogenization.
Complete Results Summary Table:
| Experiment | Metric | Control | GPT Only | GPT+Overview |
|---|---|---|---|---|
| Inside Frontier | Quality (1-10) | 4.10 | 5.66*** | 5.85*** |
| | Completion % | 82.4% | 91.4%*** | 93.5%*** |
| | Time (minutes) | 50.0 | 27.6*** | 29.5*** |
| Outside Frontier | Correctness % | 84.4% | 70.5%*** | 60.0%*** |
| | Recommendation Quality (1-10) | 5.86 | 6.91*** | 7.34*** |
| | Time (minutes) | 37.7 | 30.9*** | 26.4*** |
*** p<0.001, ** p<0.01, * p<0.05. All comparisons vs control group.
Policy Implications:
- Responsible AI: Need for safeguards when AI is used for high-risk tasks.
- Education: Formal training needed to build frontier-navigation skills.
- Diverse AI ecosystem: Use multiple models to counteract homogenization.
References
Dell’Acqua, F., McFowland III, E., Mollick, E., Lifshitz-Assaf, H., Kellogg, K.C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K.R. (2023). Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Working Paper 24-013.
Key References:
- Brynjolfsson, E., Li, D., & Raymond, L.R. (2023). Generative AI at work. NBER Working Paper w31161.
- Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models.
- Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence.
- Lebovitz, S., Lifshitz-Assaf, H., & Levina, N. (2022). To engage or not to engage with AI for critical judgments. Organization Science, 33(1), 126–148.
Data Source: Randomized field experiment with 758 BCG consultants. Pre-registered design with baseline task, random assignment to control, GPT-4 access, or GPT-4 + prompt engineering overview.
Acknowledgement: Funding provided in part by Harvard Business School.