• 0 Posts
  • 15 Comments
Joined 1 year ago
Cake day: June 9th, 2023


  • I thought about the indexing situation in contrast to the user paywall. Without thinking too much about any legal argument, it would seem that NYT’s paywall for visitors is them enforcing their rights to the content, signaling that it isn’t free for all uses, while allowing search indexers access makes the content visible but not free on the market.

    It reminds me of the Canadian claim that Google should pay Canadian publishers for the right to index, which I tend to disagree with. I don’t think Google or Bing should owe NYT money for indexing, but I also don’t think allowing indexing confers the right to commercial use beyond indexing. I strongly suspect OpenAI spoofed search indexers while crawling content specifically to bypass paywalls and the like.

    I think part of what the courts will have to weigh in the fair use arguments is the extent to which NYT is harmed by the use, the extent to which the content is transformed, and where the public interest lies between the two.

    I find it interesting that OpenAI or Microsoft already pays AP for use of its content because that content is used to ensure accurate answers are given to users. I struggle to see how, in OpenAI’s opinion, the situation is different with NYT, other than perhaps on price.

    It will be interesting to see what shakes out in the courts. I’m also interested in the proposed EU rules, which recognize fair use for research and education but less so for commercial use.

    Thanks for the reply! Have a great day!


  • The issue is that fair use is more nuanced than people think, and the bar for claiming fair use is higher when you are engaged in commercial activity. I’d more readily accept fair use arguments from research institutions, companies that train and release their model weights (e.g., Llama), or some other activity with a clear tie to the public benefit.

    OpenAI isn’t doing this work for the public benefit, regardless of the language of altruism they wrap it in. They and Microsoft are hoovering up others’ data to build a for-profit product and make money. That’s really what it boils down to for me. And I’m fine with them making money. But pay the people whose data you’re using.

    Now, in the US there is no case law on this yet, and it will take years to settle. But personally, philosophically, I don’t see how Microsoft taking NYT articles and turning them into a paid product is any different from Microsoft taking an open source project that doesn’t allow commercial use and sneaking it into a project.


  • I do agree with you, to an extent. I think much of the support, or at least the lack of criticism, from within higher ed was precisely because they/we/I didn’t want to be lumped in with the right-wing attacks or give them an inch. At the same time, that is like the stereotype of the abusive couple who form a united front against a third party.

    I also know that the claim that no one really cares about the research issues isn’t true. People in higher ed care about these things. The president of Stanford resigned recently over these sorts of issues (though the data issues there were more troubling). There were also Harvard academics voicing discontent with Dr. Gay; they just didn’t go and put it in the paper.

    It sounds like what ultimately tipped things over for her was twofold: the latest round of accusations, coupled with submitting a plan to the board that apparently didn’t convince all of them that she was responding with appropriate urgency to the widening media and PR issue. This is a very common failing among higher ed leaders, who are used to going slow and resisting calls to move faster. Unfortunately, university presidents need to control the narrative by at least creating the impression of frenetic energy to fix something, even if it is intractable in the short term.

    You might find this NYT article interesting (gift link).

    How Harvard’s Board Broke Up With Claudine Gay https://www.nytimes.com/2024/01/06/business/claudine-gay-harvard-corporation-board.html?unlocked_article_code=1.ME0.srWq.9lxOxV9UwF1g&smid=nytcore-android-share

    Ultimately, I think the board and the community wanted to help her hold out against the right-wing attacks, but something about her internal plan, or her communications and follow-up, led the board to wilt in the face of persuasion from those around them.




  • I work in academia and am used to these sorts of issues of primacy, attribution, intellectual honesty, etc. While there are many examples of research dishonesty or sloppiness in higher ed at large, there is also an expectation that people who take leadership positions lead by example. Faculty-led institutions expect that their leaders can walk the walk. I don’t think it is unfair to expect the president of the top-rated university in the world not to have engaged in this sort of sloppiness. I also think it is fair to expect leaders to be able to “rise to the moment” commensurate with the prominence of their role. She wasn’t the president of a local community college (nothing against them, but you have different expectations).

    The politically motivated and racist attacks against Dr. Gay are abhorrent. It is unfortunate that they ended up finding purchase in very real issues of attribution, and in a leadership failure to navigate and control the narrative around her testimony and comments.

    Dr. Gay was hired after the shortest search for a Harvard president in recent memory, and already had a thin publication record compared to past leaders. That there are multiple elements of sloppiness in her work just further erodes her ability to lead the world’s top university.

    Additionally, it is true that Harvard is currently ranked at the very bottom of the campus free speech index, with the University of Pennsylvania second to last. At least MIT’s lawyerly answers were somewhat backed by the institution’s history of trying to balance speech. That two ousted university presidents felt the need to go to bat for First Amendment rights only now, of all times, without addressing the potential hypocrisy of that position given their universities’ track records, or presenting it as a new change of direction they were leading, was shockingly bad judgement.

    So Dr. Gay doesn’t deserve the hate and attacks that have come her way. But she failed to deliver on the promise expected of any president of a top R1 university. If you can’t publish to the highest standards and navigate the most difficult public relations situations, you shouldn’t be in the top leadership role at these universities.






  • I’ll play devil’s advocate.

    The author is basically complaining that search results aren’t tailored to their own search habits, and for all we know they are using tools to prevent Google data collection for personalized search.

    Using the search term “YouTube downloader” and treating the return of a fork of a command-line Python tool as the success criterion is an insane test for the general public. How many of your family members who are looking to download a YouTube video would be helped by that result?

    I searched “YouTube downloader” and received the usual ad-ridden websites that let you download a video. Then I searched “YouTube downloader Linux” and the top result was ytdl-org on GitHub. Seems reasonable.

    I’ve seen many people complain about Google search lately. I wonder how many of them have unrealistic expectations, never learned to use scoping keywords, or turned off search personalization and lost benefits they didn’t know they were getting. And expecting a fork of a command-line tool to be the top result for “YouTube downloader” is definitely unrealistic.

    Anecdotally, I’ve used more or less the same search strategy for 30 years, and it still brings up relevant results. And while I agree that SEO gamification can make certain keywords harder to use than others, this article’s test really wasn’t testing search scenarios the average non-technical user of these search engines would have.


  • I feel this take misses the bigger picture a bit in terms of the strengths and weaknesses of FOSS vs. commercial software. FOSS is great at building tools for common or popular problems, but starts to run into challenges solving problems that are some combination of niche, unfun, too big, or too hard.

    For example, I needed to export thousands of scheduled jobs off of an IBM mainframe and onto a different platform as part of switching to a COTS ERP. Should I task an internal team of developers to write a one-off? Would any open source solution exist? There aren’t a ton of z/OS open source projects out there, in part because there just aren’t a lot of z/OS systems programmers out there. They’ve all been frozen in carbonite to solve the Y3K problem, lol. No, in this case I lit a pile of money on fire and paid Computer Associates for their commercial tool, which generated an XML file almost a million lines long (just the jobs and scheduling parameters/dependencies). And it just worked, insofar as any ERP migration just works.

    Another factor is the time it can take for a FOSS project to mature. No one would try to say that Octave is a 1:1 replacement for MATLAB. Indeed, it wasn’t until Jupyter notebooks came along in Python that I felt I really had a good MATLAB alternative, and even then, some less common packages don’t have a good FOSS alternative in Python. I still remember the first time getting some of the open source convex analysis packages going on Linux. It was a nightmare of dependencies, and they didn’t have all the capabilities of the commercial solutions, because that type of mathematical software development is really, really hard.

    Additionally, commercial software is helpful at supplying services with ongoing costs. For example, matching Office 365 with OneDrive would require rolling my own Nextcloud with LibreOffice or something similar to get anywhere near the functionality I get out of the box from a family Microsoft 365 account.

    I’m all for FOSS, but a tool is a tool, and sometimes commercial software fills needs that just aren’t going to realistically attract a developer community. However, my favorite client tools are usually open source and I like being able to pilfer the code for my own projects.

    Edit: I also wanted to add that I did commercial software development at one point, and I got to solve some really difficult, deep technical issues. It may sound really lame, but the work I did on search optimization for a commercial tool was really rewarding. Taking something that lots of people had to use and improving the execution time of arbitrary queries from minutes to seconds was a blast, important to the org, and something I was able to take the time to get right. I’ve also had to just bang out some CRUD code, so of course it varies, but not every commercial software outfit is terrible to work for. And the hardware and OS on tools like Palo Alto firewalls just wouldn’t get made through open source alone.


  • To add on to your point, you publicly support allies while having private conversations counseling them on prudent courses of action. Allies tend not to listen if you call them out publicly, and a public rebuke is usually a sign that privately articulated red lines have been crossed. I’m sure Biden is pressing them privately to have a more measured response, and that is likely getting more traction than publicly trashing them would.

    It’s like how you don’t use all available sanctions right out of the gate against an adversarial state, so that you leave room to negotiate and keep some channels open. Diplomacy is more nuanced than “saying it like it is” all the time.


  • I don’t think articulating a concern for civilians on any side is taken poorly, and I don’t think that most of the media has spun calls for humanitarian aid and adherence to international rules of warfare as antisemitism. In fact, The New York Times has published both investigative and opinion pieces that are very sympathetic to Palestinian civilians and that call out Israel’s disproportionate response.

    I think part of the problem in discussing the issue is that the events of today are inextricably woven into the events of:

    • the 1948 founding of Israel at the end of the British Mandate, following the 1947 UN partition plan.
    • the invasion by five Arab armies and the 1949 armistice.
    • the Six-Day War, and Egypt’s loss of the Sinai Peninsula.
    • the eventual recognition of borders by Egypt and Jordan.
    • the results of the bombing of Beirut after the Hezbollah attack in 2006.

    That is a lot of history, but the back-and-forth of tragedies, including disproportionate responses, is driven by these events.

    When most people online seem to confuse the history of Gaza with that of the West Bank, or conflate Hamas and Hezbollah, it is no wonder that discussion breaks down.

    Unfortunately, I was in a debate elsewhere on the fediverse where the other person said Israel has no legitimate response to the Hamas attack because Israel’s existence is the source of the problem.

    That sounds like the Hezbollah general who yesterday called this a “war of existence,” in which either Israel exists or the Arab alliance exists. So how do you reason with that position, and how many people objecting to Israel’s use of force are really all that knowledgeable about the history?

    I also think that people underestimate how reasoning with allies works. If Biden hadn’t shown solidarity with Israel, then his visit today wouldn’t have resulted in humanitarian aid being allowed in. You influence allies by showing solidarity publicly and having frank conversations in private.

    Anyway, sorry for the long post. Have a great evening!