2026
8
- Building a Financial Hallucination-Suppression Dataset Pipeline
- The Architecture of Financial Intelligence: A Comprehensive Analysis of Large Language Models in Finance
- MCP Isn't Done Yet, and Now We Want CLI — The Token-Budget Debate Behind AI Agent Interfaces
- AReaL — Asynchronous RL Infrastructure for LLM Agents
- QED — A Multi-Agent Math Proof Pipeline Built on CLI Subprocesses
- From Wire Format to Design Philosophy — Why Claude Code Skills Look The Way They Do
- How Claude Code Skills Work Under the Hood
- Test Time LLMs
2024
6
2023
17
- env
- Soft Actor-Critic (SAC)
- Deep RL
- Off-policy and multi-step learning
- Approximate Dynamic Programming
- RL - step7 - RLstep7
- rl-Convergence and divergence
- RL - step5 - RLstep5
- RL - step4 - RLstep4
- RL - step3 - RLstep3
- RL - Intro - RLIntro
- Exploration and Exploitation
- Topics
- Image Processing Notes
- personal
- Entropy
- ID & OOD