My first impression coming away from this is skepticism.
Anything with voice controls for routine use is a pretty tough sell. Doing this when you're not completely alone would be annoying to everyone around you.
Most of their examples seem like they could have been done with a right-click drop-down menu, so they don't really need to "re-invent the mouse pointer".
So is this thing talking to Google's servers all the time for the AI integration? So it won't work if you're not connected to the internet? Privacy concerns are obvious; now Google wants to have an AI watching literally everything you do on your computer?
Does it cost the user anything for the LLM use? If it's free, will it stay free forever? That's quite a lot to give away if they're expecting people to use it to change a single word like in one of their examples. I guess they're expecting to make the money back by gathering data about literally everything you do on your computer.
There might be a killer app for AI integration with personal computers that has yet to be invented, but this doesn't look like it.
Right — it does seem cool but the voice is patching over a major gap. If I'm talking already, why wouldn't I just describe what I'm looking at and have the AI grab it for me?
The "Edit an Image" Demo at the bottom is pretty fun. Maybe this is just Google flexing their LLM inference capacity.
Please don't.
I like text selection exactly how it is. I want precise controls.
It's fine for a touch interface like a phone, but on a computer I expect precision. As much as I can get.
Oh interesting, this is very cool. At first I thought it was just focus-follows-mouse but it's more interesting. You have certain keywords trigger "add to prompt". Ignoring the voice functionality (which is admittedly crucial right now because other inputs take over focus), I've often wanted to just have a continuous conversation with the LLM as I 'point and click' (or tab over and select) at various things. Might be neat to have text input focus continue to go to the LLM where I'm typing text, etc. (rough sketch of what I mean at the end of this comment)
Sometimes I go to a different page to take a screenshot, other times I'm browsing for a file, and other times I'm highlighting some log lines. Cursor did this well: selecting text in the terminal auto-focused the agent textbox, so you could talk to the agent, select some more text, and never have to re-select the original agent textbox. The agent is a top-level function in that system, not "just another app I have to switch to" to take my context with.
I have some small amount of bias because I've always felt input-constrained on computers. I have to move my hands to go places and that's exasperating. I've tried head tracking, had a vim pedal for a while, and used tiling WMs and things like this to help, but while my vim-fu is pretty good and I function inside individual apps very well with it, my cross-application workflow isn't.
In the end, perhaps we all have our home offices with our Apple Vision Pros and we talk to them like this to manoeuvre faster through our machines and get our ideas into them.
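Rough sketch of the kind of keyword-triggered loop I have in mind, purely hypothetical: the listen/capture/LLM helpers below are stand-ins I made up, not anything from the demo.

    # Hypothetical sketch: listen for demonstrative keywords, snapshot the
    # region under the pointer when one fires, and keep one continuous LLM
    # conversation going. All four helpers are injected stand-ins.
    TRIGGER_WORDS = {"this", "that", "here", "there"}

    def run_session(listen_for_keyword, pointer_position, capture_region, llm):
        context = []  # screenshots/snippets accumulated as I point around
        while True:
            _trigger, utterance = listen_for_keyword(TRIGGER_WORDS)
            x, y = pointer_position()
            # Grab whatever is around the pointer when the keyword is spoken.
            context.append(capture_region(x, y, radius=200))
            # The conversation keeps going; I never re-focus a chat box.
            print(llm.chat(utterance, attachments=context))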
Cool research. I wonder what we'll end up with.
I sense a privacy problem brewing.
It reminds me of Microsoft Recall in the sense that some portion of the screen is going to be continuously transmitted outside of the user's control.
What happens when someone browses something very private (planning a surprise engagement, looking at medical data, planning a protest)? All that data gets slurped to Google, where it's subject to a warrant or discovery, or gets folded into your advertising fingerprint.
Maybe the idea is that the data is sent to AI only when you right click, but that seems like a very thin firewall that a product manager will breach in the interests of delivering "predictive AI" via some kind of precomputed results.
Wiggle at CAPTCHAs, wiggle at Termux, wiggle at Emacs, wiggle at the Godot Editor, wiggle at my remote desktop.
(Not going to happen)
Of course, it isn't a Google demo if you can't use it to book a table at a restaurant. (Shown at the bottom of the page.)
It's beautiful how the human mind can take something very obvious but overlooked and make it into this fantastic innovation. Fab stuff.
Don't build these things; instead, build protocols and expose system-level APIs for application developers to build things.
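Very rough, hypothetical sketch of the shape such a system-level API could take; the names and types are invented for illustration, not anything proposed in the article.

    # Hypothetical sketch of a system-level pointer-context API, so any app
    # (or a local model) could consume it instead of one vendor baking the
    # feature into the OS. All names are made up for illustration.
    from typing import Protocol

    class PointerContextProvider(Protocol):
        def focused_region(self) -> bytes:
            """Pixels or an accessibility-tree snippet around the pointer."""
            ...

        def selection_text(self) -> str | None:
            """The current text selection, if any."""
            ...

    class PointerContextConsumer(Protocol):
        def on_context(self, region: bytes, selection: str | None) -> None:
            """Called only when the user explicitly shares pointer context."""
            ...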
Reminds me of Put That There https://m.youtube.com/watch?v=RyBEUyEtxQo
This seems like one of those things that is usable infrequently enough to be forgotten/poorly developed/never used. (Even before accounting for the actual failure rate of the LLM, which will be non-zero.)
Perhaps a text box and file upload isn't the perfect interface for every use case, but it is versatile, which is a huge barrier for any replacement to overcome.
so will Google be monitoring whatever is on the screen continuously, or only when the user says the magic words (this, that, here, there)?
Indeed. "AI-enabled pointer" is misdirection. This isn't an AI-enabled pointer; it's sending the screen to AI, which, yes, includes the pointer position. The AI doesn't live in the pointer. The AI lives, apparently, so thoroughly in the system that it can see and do anything, and the pointer is just a way of giving it context.
Google Recall. Hey, it's all about the marketing.
Interesting! I wonder how UI will evolve in the long term. If there are browser-use/computer-use and clicky-clones automating pointer actions, do we really need complex UI anymore? If yes, when?
I've been playing with writing a visionOS app that allows an AI agent to be aware of what you're looking at at any given time.
At some point I fully expect eye tracking (or attention tracking) to be common enough to be a first-class input method.
It only took Google and their AI offering to come up with Graffiti.
No thanks
I wonder what sort of monstrous power would be unleashed if Google used Plan9 as a foundation.
They'd half-finish it then bury it, like they did with Fuchsia, which is heavily Plan 9-inspired.
Just seven hours ago there was a plea on HN [0] to please not do this. Seriously, what are they smoking at Google right now?
[0] https://news.ycombinator.com/item?id=48107027
There's already a product that does this lol
Aaaaand now I can't remember the name of it
being able to make precise edits would be huge for AI
Both of the text-based demos would have been simpler and faster with traditional mouse and keyboard interactions. What is the AI adding?
They're going to take your ability to do anything and spread it across many places so you have to run around to do things, same as all the moneyed technology.
It tracks what's on the screen and sends it back to Alphabet. If you're watching a video about BBQ, enjoy a bunch of ads for Omaha Steaks and Big Green Egg in your Gmail.
On a less serious note, the audience for this is people who want to optimize for what seems like the least amount of effort.
It feels like everything modern is like this. No value added, just the appearance of it.
Maybe I'm misunderstanding, but what is new about the pointer itself? Seems to be functionally the same as selecting + tooltips / context menus.
Shush, how is anyone going to get promoted with that kind of talk!?
> but what is new about the pointer itself?
I'm hoping for a const-reference joke.
Like a dream come true...
Nightmares are dreams as well and this is a nightmare like Windows Recall.
Technically wonderful though.
do not want
> We’ve been exploring new AI-powered capabilities to help the pointer not only understand what it’s pointing at, but also why it matters to the user.
We couldn't quite track you well enough before. So we're fixing that under the guise of "AI powered capabilities."
what the hell is going on at google
Thanks, I hate it