Skills Officially Comes to Codex (developers.openai.com)
72 points by rochansinha 5 hours ago | 33 comments




Skills, plugins, apps, connectors, MCPs, agents - anyone else getting a bit lost?

In my opinion it’s to some degree an artifact of immature and/or rapidly changing technology. Not many people know what the best approach is yet, the use cases aren’t well understood, and things are changing so rapidly that vendors are essentially creating interfaces around everything so you can route information in and out of LLMs any way you like.

Some popular paths are emerging, but in a lot of cases we’re still not sure whether even these are the long-term approaches that will stick. It doesn’t help that there’s no good taxonomy (that I’m aware of) to define and organize the different approaches out there. “Agent”, for example, is a heavily overloaded term that means a lot of things, and even within this space, agents mean different things to different groups.


It reminds me of LLM output at scale. LLMs tend to produce a lot of similar but slightly different ideas in a codebase when not properly guided.

They’re all bandaids

All marketing names for APIs and prompts. IMO you don't even need to try to follow along, because there's nothing inherently new or innovative about any of this.

I already was doing something similar on a regular basis.

I have many "folders"... each with a README.md, a scripts folder, and an optional GUIDE.md.

Whenever I arrive at some code that I know can be reused easily (for example: a clerk.dev integration that spans both frontend and backend), I used to create a "folder" for it.
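For reference, one such folder looks roughly like this (names and file types are just illustrative):

    clerk-integration/
    ├── README.md     # what the integration does and how the pieces fit together
    ├── GUIDE.md      # optional: step-by-step instructions for wiring it into a new project
    └── scripts/
        ├── setup_frontend.ts
        └── setup_backend.ts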

When needed, I used to just copy-paste all the folder content using my https://www.npmjs.com/package/merge-to-md package.

This has worked flawlessly for me up until now.

Glad we are bringing such capability natively into these coding agents.


It's so nice that skills are becoming a standard; they are IMO a much bigger deal long-term than e.g. MCP.

They're easy to author (at their most basic, just a markdown file), context-efficient by default (only the YAML front-matter is preloaded; additional markdown files can be lazy-loaded as needed), and they can piggyback on existing tooling (for instance, instead of the GitHub MCP, you just write a skill describing how to use the `gh` CLI).
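For a rough idea of the shape (file name, fields, and wording here are just a sketch of the pattern, not lifted from the spec), a minimal skill can be a single markdown file like:

    ---
    name: github-cli
    description: Use the `gh` CLI for GitHub work (issues, PRs, CI status) instead of a GitHub MCP server.
    ---

    # Working with GitHub via `gh`

    - Check auth first with `gh auth status`; if it fails, stop and ask the user to log in.
    - List open PRs with `gh pr list` and inspect one with `gh pr view <number>`.
    - Prefer `--json` output where available so results stay machine-readable.

Only the front-matter has to sit in context up front; the body is read when the skill is actually relevant.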

Compared to purpose-tuned system prompts they don't require a purpose-specific agent, and they also compose (the agent can load multiple skills that make sense for a given task).

Part of the effectiveness here is that AI models are heavy enough that running a sandboxed VM alongside them is likely a negligible cost, so the major chat UI providers now all give the model such a sandboxed environment. This means skills can also contain Python and/or JS scripts - again, much simpler, more straightforward, and more flexible than e.g. requiring the target service to expose remote MCPs.
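As a toy example of the kind of script a skill might bundle (entirely hypothetical; stdlib-only Python so the sandbox can run it without any setup), something like a front-matter sanity checker:

    #!/usr/bin/env python3
    """Hypothetical helper a skill could ship under scripts/:
    checks that a SKILL.md starts with YAML front-matter defining name and description.

    Usage: python check_skill.py path/to/SKILL.md [more paths...]
    """
    import sys
    from pathlib import Path

    REQUIRED_KEYS = {"name", "description"}

    def check(path: Path) -> list[str]:
        lines = path.read_text(encoding="utf-8").splitlines()
        if not lines or lines[0].strip() != "---":
            return [f"{path}: missing opening '---' for front-matter"]
        try:
            end = lines[1:].index("---") + 1  # index of the closing delimiter
        except ValueError:
            return [f"{path}: front-matter never closed with '---'"]
        keys = {line.split(":", 1)[0].strip() for line in lines[1:end] if ":" in line}
        return [f"{path}: front-matter missing '{key}:'" for key in REQUIRED_KEYS - keys]

    if __name__ == "__main__":
        problems = [err for arg in sys.argv[1:] for err in check(Path(arg))]
        print("\n".join(problems) if problems else "ok")
        sys.exit(1 if problems else 0)

The point is just that the script travels with the skill, and the model decides when to run it in its sandbox.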

Finally, you can use a skill to tell your model how to properly approach using your MCP server, which previously often required either long prompting or a purpose-specific system prompt, with the cons I've already described.
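For instance (tool names here are purely made up), the body of such a skill can boil down to a few lines of guidance:

    When using the `issue-tracker` MCP server:

    1. Call `search_issues` before `create_issue` to avoid filing duplicates.
    2. Always pass `project_id` explicitly; if it's unknown, ask the user instead of guessing.
    3. Use `add_comment` for follow-ups rather than editing the original issue body.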


Perhaps you could help me.

I'm having a hard time figuring out how I could leverage skills in a medium-sized web application project.

It's python, PostgreSQL, Django.

Thanks in advance.

I wonder if skills are more useful for non-CRUD-like projects. Maybe data science and DevOps.


Skills are not useful for single-shot cases. They are for cross-team standardization (of LLM-generated code) and reliable reuse of existing code and learnings.

You could, for example, create a skill for accessing your database during testing and include your table specifications, so that the agent can easily retrieve data for you on the fly.
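A rough sketch of what that could look like for a Django/Postgres setup like yours (table specs, env var, and commands are just examples):

    ---
    name: test-db-access
    description: Read-only access to the local test database for inspecting fixtures and query results.
    ---

    # Test database access

    - Connect read-only via `psql "$TEST_DATABASE_URL" -c "<query>"`; never point this at production.
    - Relevant tables:
      - orders(id, customer_id, status, created_at)
      - customers(id, email, plan)
    - For ORM-level checks, prefer `python manage.py shell -c "<snippet>"` over raw SQL.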

Oooooo, woah, I didn't really "get it" before - thanks for spelling it out a bit. I just thought of some crazy cool experiments I can run if that's true.

This is great. At my startup, we have a mix of Codex/CC users so having a common set of skills we can all use for building is exciting.

It’s also interesting to see that, instead of a plan mode like CC’s, Codex is implementing planning as a skill.


I’m probably missing it, but I don’t see how you can share skills across agents, other than maybe symlinking .claude/skills and .codex/skills to the same place?

Nothing super-fancy. We have a common GitHub repo in our org for skills, and everyone checks out the repo into their preferred setup locally.

(To clarify, I meant that some engineers mostly use CC while others mostly use Codex, as opposed to engineers using both at the same time.)


One thing that I am missing from the specification is a way to inject specific variables into skills. If I create, say, a postgres skill, then I can either (1) provide the password on every skill execution or (2) hardcode the password into my script. To make this really useful there needs to be some kind of secret storage that the agent can read/write. This would also make it easier for me, as a programmer, to sell the skills I create to customers.

I have no clue how you’re running your agents or what you’re building, but giving the raw password string to the model seems dubious?

Otherwise, why not just keep the password in an .env file, and state “grab the password from the .env file” in your Postgres skill?
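i.e. the skill body just says something like (env var name is only an example):

    ## Credentials

    - Read the connection password from `PGPASSWORD` in the project's `.env`; never hardcode it and never echo it back.
    - If `.env` is missing, stop and ask the user instead of guessing.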


Yes! I was raving about Claude Skills a few days ago (see https://quesma.com/blog/claude-skills-not-antigravity/), and I'm excited they're coming to Codex as well!

We’ve made a zero shot decision tree

Are we sure that unrestricted free-form Markdown content is the best configuration format for this kind of thing? I know there is a YAML frontmatter component to this, but doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process? I would like my agents to be inherently evaluable, and free-text instructions do not lend themselves easily to systematic evaluation.

>doesn't the free-form nature of the "body" part of these configuration files lead to an inevitably unverifiable process?

The non-deterministic, statistical nature of LLMs means it's inherently an "inevitably unverifiable process" to begin with, even if you pass it some type-checked, linted skills file or prompt format.

Besides, YAML or JSON or XML or free-form text, for the LLM it's just tokens.

At best you could parse the more structured docs with external tools more easily, but that's about it; there's not much difference when it comes to how the LLM consumes them.


The modern state of the art is inherently not verifiable. How you give it input is really secondary to that fact. When you can't see the weights or anything else about the system, any notion of verifiability is an illusion.

Sure. Verifiability is far-fetched. But say I want to produce a statistically significant evaluation result from this – essentially testing a piece of prose. How do I go about this, short of relying on a vague LLM-as-a-judge metric? What are the parameters?

Would a structured skills file format help you evaluate the results more?

At least MCPs can be unit tested.

With skills, however, you just selectively append more text to the prompt and pray.


At any HR conference you go to, there are two overused words: AI and Skills.

As of this week, this also applies to Hacker News.



Anyone using this in an agentic workflow already? How is it?

Agent Skills let you extend Codex with task-specific capabilities. A skill packages instructions, resources, and optional scripts so Codex can perform a specific workflow reliably. You can share skills across teams or the community, and they build on the open Agent Skills standard.

Skills are available in both the Codex CLI and IDE extensions.


Thanks to Anthropic.

What are your favourite skills?

A very particular set of skills.

nunchuck skills

The only skill that matters



