Recent quotes:

Fitbit creates research library with Fitabase, publishes results of corporate wellness study | MobiHealthNews

The library currently has 163 different published studies that mention using a Fitbit (or a few of them) as part of their study design. The pace of research using the wearables has been accelerating every year, Ramirez said, posing what Fitabase believed was a need for a comprehensive library. “So we wanted to make it a public resource where anyone who wants to explore Fitbit research can have a one-stop shop. It’s meant to be a library down the street, and it will continue to grow as people do more research.”

Instagram photos reveal predictive markers of depression

Using Instagram data from 166 individuals, we applied machine learning tools to successfully identify markers of depression. Statistical features were computationally extracted from 43,950 participant Instagram photos, using color analysis, metadata components, and algorithmic face detection. Resulting models outperformed general practitioners' average diagnostic success rate for depression. These results held even when the analysis was restricted to posts made before depressed individuals were first diagnosed. Photos posted by depressed individuals were more likely to be bluer, grayer, and darker. Human ratings of photo attributes (happy, sad, etc.) were weaker predictors of depression, and were uncorrelated with computationally generated features. These findings suggest new avenues for early screening and detection of mental illness.
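
The study's exact feature pipeline isn't spelled out here, but "bluer, grayer, and darker" maps naturally onto the HSV color model: hue, saturation (gray means low saturation), and value (brightness). Below is a minimal sketch of that kind of color feature, using only Python's standard colorsys module. It assumes a photo has already been decoded into (r, g, b) tuples in 0..255. The input format and the name color_features are hypothetical.

```python
import colorsys
import math
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]  # hypothetical input: one (r, g, b) tuple per pixel, 0..255


def color_features(pixels: Iterable[RGB]) -> Dict[str, float]:
    """Mean hue, saturation, and brightness (HSV "value") over all pixels.

    Lower saturation reads as "grayer", lower value as "darker", and a hue
    near 2/3 (240 degrees) as "bluer".
    """
    n = 0
    sin_sum = cos_sum = sat_sum = val_sum = 0.0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Hue is an angle on a circle, so average it via sine and cosine.
        angle = 2 * math.pi * h
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
        sat_sum += s
        val_sum += v
        n += 1
    if n == 0:
        raise ValueError("image has no pixels")
    mean_hue = (math.atan2(sin_sum, cos_sum) / (2 * math.pi)) % 1.0
    return {"hue": mean_hue, "saturation": sat_sum / n, "brightness": val_sum / n}


# Usage: a bright orange photo versus a dim blue-gray one (made-up pixels).
bright = color_features([(255, 140, 0)] * 4)
dim = color_features([(60, 65, 80)] * 4)
print(bright, dim)
```

Hue is averaged as an angle because it wraps around. A naive arithmetic mean of two reds at 0.95 and 0.05 would land on cyan at 0.5.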

How Vector Space Mathematics Reveals the Hidden Sexism in Language

The team does this by searching the vector space for word pairs that produce a similar vector to “she:he.” This reveals a huge list of gender analogies. For example, she:he::midwife:doctor; sewing:carpentry; registered_nurse:physician; whore:coward; hairdresser:barber; nude:shirtless; boobs:ass; giggling:grinning; nanny:chauffeur, and so on. The question they want to answer is whether these analogies are appropriate or inappropriate. So they use Amazon’s Mechanical Turk to ask. They show each analogy to 10 turkers and ask them whether the analogy is biased or not. They consider the analogy biased if more than half of the turkers think it is biased.
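
Here is a simplified sketch of both steps. It uses tiny made-up 3-dimensional vectors. Real studies use embeddings such as word2vec, trained on large corpora and with hundreds of dimensions. analogy_pairs scores every ordered word pair by the cosine similarity between its difference vector and she - he. is_biased applies the more-than-half rule described above. All names and vectors here are hypothetical, and the published method may add constraints that this sketch leaves out.

```python
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]

# Made-up 3-d vectors for illustration only; real embeddings are learned from text.
EMB: Dict[str, Vec] = {
    "she": (1.0, 0.2, 0.0),
    "he": (-1.0, 0.2, 0.0),
    "midwife": (0.9, 0.1, 0.8),
    "doctor": (-0.8, 0.2, 0.8),
    "sewing": (0.7, 0.9, 0.1),
    "carpentry": (-0.6, 0.8, 0.3),
    "table": (0.0, -0.5, 0.3),
}


def sub(a: Vec, b: Vec) -> Vec:
    return tuple(x - y for x, y in zip(a, b))


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy_pairs(
    emb: Dict[str, Vec], a: str = "she", b: str = "he", top: int = 3
) -> List[Tuple[str, str, float]]:
    """Rank word pairs (x, y) by how closely x - y points the same way as a - b."""
    target = sub(emb[a], emb[b])
    scored = [
        (cosine(sub(emb[x], emb[y]), target), x, y)
        for x, y in permutations(emb, 2)
        if x not in (a, b) and y not in (a, b)
    ]
    scored.sort(reverse=True)
    return [(x, y, round(score, 3)) for score, x, y in scored[:top]]


def is_biased(votes: Sequence[bool]) -> bool:
    """Majority rule from the text: biased if more than half the raters say so."""
    return sum(votes) > len(votes) / 2


print(analogy_pairs(EMB))  # pairs x, y read as she:he :: x:y
print(is_biased([True] * 6 + [False] * 4), is_biased([True] * 5 + [False] * 5))  # True False
```

With 10 raters, a 5-5 split is not "more than half," so the analogy is not counted as biased.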

New Depression Model Outperforms Psychiatrists

Data mined from clinical trials may soon help doctors tailor antidepressant therapy to their patients, the authors say. Currently, only about 30% of patients get relief from the first drug they are prescribed, and it can often take a year or more before doctors find the right medication to alleviate symptoms of depression. The Yale team analyzed data from a large clinical trial on depression and pinpointed 25 questions that best predicted the patients’ response to a particular antidepressant. Using these questions, they developed a mathematical model to predict whether a patient will respond to Celexa after three months of treatment. “These are questions any patient can fill out in 5 or 10 minutes, on any laptop or smartphone, and get a prediction immediately,” explained Adam Chekroud, Ph.D. candidate in the Human Neuroscience Lab and lead author of the paper.

@elerianm says too much data dilutes the experience

Multiple metrics can confuse rather than enlighten; and they can add to a sense of underachievement. My Fitbit, even though it’s the most basic model, goes beyond measuring steps and miles. It also claims to be able to tell me how many calories I have burned and the number of “active minutes” in the day -- and it sets a daily target for each. I have no idea how I am supposed to internalize all these data points, including their order of importance. So I find myself pursuing multiple objectives that are highly correlated but, frustratingly, are not sufficiently linear in their relationship -- adding to the potential for performance anxiety.

Apple to Spend $1.9 Billion Building Two Europe Data Centers

Apple Inc. plans to spend 1.7 billion euros ($1.9 billion) building data centers in Ireland and Denmark in its biggest-ever European investment[…] The centers, located in Athenry, Ireland, and Viborg, Denmark, will be powered by renewable energy[…] The project lets Apple address European requests for data to be stored closer to local users and authorities, while also allowing it to benefit from a chilly climate that helps save on equipment-cooling costs.

Edge cases are expensive to solve

The old saying in the machine learning community is that “machine learning is really good at partially solving just about any problem.” For most problems, it’s relatively easy to build a model that is accurate 80–90% of the time. After that, the returns on time, money, brainpower, data etc. rapidly diminish. As a rule of thumb, you’ll spend a few months getting to 80% and something between a few years and eternity getting the last 20%. (Incidentally, this is why when you see partial demos like Watson and self-driving cars, the demo itself doesn’t tell you much — what you need to see is how they handle the 10–20% of “edge cases” — the dog jumping out in front of the car in unusual lighting conditions, etc).

Analytics moves from last touch to holistic

They were doing that analysis for some time actually with a method called “last touch.” That means identifying the last thing that the customer did before they bought—as in they clicked an ad and then they bought whatever. The company figured it must’ve been that ad that caused the customer to buy the print. Or someone got a direct mail campaign message and then they bought the calendar. That was the motivation. [Shutterfly] looked at that process and said, “You know, that’s a good model. It’s a good approximation, but it would be better to look at everything touching the user before their last purchase and since the purchase before that.” This greatly expanded the data that they had to consider to do the analysis, so the process became very slow. It took two days to compute the likely marketing channels for all their orders.
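
The sketch below compares last-touch credit with one possible holistic rule. The source doesn't say how Shutterfly split credit across touches, so the equal-split "linear" rule is an assumption. The event format and the function name attribute are also hypothetical. Each purchase is credited to the marketing touches since the previous purchase.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event format: (timestamp, kind, channel); kind is "touch" or
# "purchase", and purchases carry an empty channel.
Event = Tuple[int, str, str]


def attribute(events: List[Event], model: str = "last_touch") -> Dict[str, float]:
    """Credit each purchase to the touches seen since the previous purchase."""
    if model not in ("last_touch", "linear"):
        raise ValueError(f"unknown model: {model}")
    credit: Dict[str, float] = defaultdict(float)
    window: List[str] = []  # touches since the previous purchase
    for _, kind, channel in sorted(events, key=lambda e: e[0]):
        if kind == "touch":
            window.append(channel)
        elif kind == "purchase":
            if window:
                if model == "last_touch":
                    credit[window[-1]] += 1.0  # all credit to the final touch
                else:
                    for ch in window:  # equal share for every touch in the window
                        credit[ch] += 1.0 / len(window)
            window = []  # the next purchase only looks back to this one
    return dict(credit)


events = [
    (1, "touch", "email"),
    (2, "touch", "display_ad"),
    (3, "purchase", ""),
    (4, "touch", "search_ad"),
    (5, "touch", "email"),
    (6, "touch", "search_ad"),
    (7, "purchase", ""),
]
print(attribute(events, "last_touch"))  # {'display_ad': 1.0, 'search_ad': 1.0}
print(attribute(events, "linear"))
```

Last touch gives each whole order to a single channel. The linear rule spreads the order across every touch in the window, so the holistic version has to process many more events per order.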

The Internet Archive tries to remember

Right now, the archive holds around 20 petabytes of data, including 500,000 pieces of software, more than 2 million books, 3 million hours of TV, and 430 billion web pages. In a single day, they digitize more than 1,000 books. They capture TV 24 hours a day. In a week, they save more than 1 billion URLs. As of 2013, only 8 percent of the archive was uploaded by users, some 53,000 people who have accounts with the archive. In order to continue the work of creating “universal access to all knowledge,” as is the archive’s mission, they want to get as many people working on the project as possible.

Biggest data

In a collective farm, a pig gave birth to three piglets. The Party committee was convened and decided that to report about only three piglets would make a bad impression in the district Party committee. So, they reported that five piglets were born in the farm. The district Party committee reported to the Region Party committee that seven piglets were born in the collective farm. In their report to the Ministry of Agriculture, the Region Party committee advised that the socialist obligation to increase the number of pigs by twelve had been successfully fulfilled. To please comrade Brezhnev, the Ministry reported that twenty piglets were born, ahead of the planned date. "Very good," comrade Brezhnev said. "Three piglets you'll give to the workers of Leningrad. Three you'll give to the heroic city of Moscow. Five you'll put aside for exports. Five you'll send to the starving African children. The rest you store as a strategic food reserve. Nobody shall touch it!"

The Milky Way's location

Back in 2010, I signed up for the email lists of 70 advocacy groups. I collected over 2,100 emails from them over a six-month period, and hand-coded each of them. I also watched Rachel Maddow and Keith Olbermann every night and recorded the topics of the two shows. The data analysis was tedious and left me with a wicked caffeine addiction. But it also left me with an unmatched understanding of email membership activation strategies. So that’s why I hand-code all my own data. Call me the crotchety old guy of the “big data” age. While everyone else is learning Hadoop and Python, I’m still futzing around with Excel. But there’s a method to the madness. It’s thought-work, which leads to insights, which improve my other methods. Coding my own data gives me a feel for the research topic.

We call this new generation of companies “Mediata Companies.” Chalk us up as one of them, as we attempt to answer some of these questions above in building out Skift. Other startups like Mattermark, Indicate, Priceonomics, SuperData Research, and even Nate Silver's 538 are trying subtle variations of the mediata theme, some more scalable than others. All of us are focused on competitive intelligence, one way or another. All of us look back to the original media-meets-data inspiration, Bloomberg, built on proprietary closed systems for a very targeted set. But the new generation of mediata companies are looking beyond the traditional business media companies in their set, and instead looking at the best of consumer web and mobile product-driven companies, creating mashups in the best sense of that phrase: hybrid, curated, lean, API-hungry, open and multi-faceted, built at the intersection of design and user experience.

Over the past several years, Purdue University has been experimenting with a data-driven way to find kids who are at risk of dropping out, or who--in a critical mass--might indicate which classes or majors have inadequate instructors. Administrators call it a “student success algorithm,” but its official name is Course Signals--and if it works, it could change the way modern universities are run. Incorporating data-mining and analysis tools, Course Signals not only predicts how well students are likely to do in a particular class, but can also detect early warning signals for those who are struggling, enabling an intervention before problems reach a critical point.

Companies are using big data sources like TalentBin and tools like LinkedIn Recruiter to go out and find passive candidates (people who aren't actively job hunting), learn about them, and use contact points like a shared connection on LinkedIn, a colleague who went to school with a candidate, or data on their interests to make a more engaging approach. In fact, the data and services LinkedIn sells to recruiters are one of its biggest businesses, accounting for 57% of its revenue and growing 80% year-over-year.

Editd has 22 employees at its office in the Silicon Roundabout, an area of East London now known as a hub for tech innovation. Each workday, Editd’s software gathers online information on a huge variety of garments and accessories and amasses 300,000 comments from social media, ranging from what’s on store racks to indications of how long the passion for leopard print will last. The information is transformed into data, compiled and repackaged into analyses that illustrate competitors’ product assortments, pricing, consumer mood and emerging trends for clients that include Asos, Gap and Target. (Editd’s fees begin at $2,500 a month for a small retailer in a single market, but rise sharply for larger clients who want more complex services.)

Library e-book circulation data is a source of potentially priceless, actionable business intelligence for publishers, if they can stop focusing on gouging libraries on price and cooperate with them instead. Libraries could provide publishers with daily circulation figures, broken down by city, for every book, along with correlations between books (“this book was checked out with that book”). Provided the data is sufficiently aggregated, it would not pose a risk to individual patron privacy. This has to be managed carefully, of course, but if there’s one group that can be relied upon to treat this issue with the care it is due, it’s librarians.
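
The sketch below shows one way to read "sufficiently aggregated." It counts daily checkouts per (city, book) and counts how often two books were borrowed by the same patron. Patron IDs are dropped entirely, and any cell below a minimum count is suppressed. The threshold of 5, the input format, the sample data, and the function name aggregate are all hypothetical choices that the source does not specify. Pair counts are the raw material from which book-to-book correlations could be computed.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

MIN_COUNT = 5  # hypothetical suppression threshold, not from the source


def aggregate(
    checkouts: Iterable[Tuple[str, str, str]], min_count: int = MIN_COUNT
) -> Tuple[Dict[Tuple[str, str], int], Dict[Tuple[str, str], int]]:
    """Turn one day of (patron_id, city, book) checkouts into aggregate counts.

    Returns (circulation per (city, book), co-checkouts per (book, book)).
    Patron IDs are used only to group books and never appear in the output.
    """
    per_city: Counter = Counter()
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for patron, city, book in checkouts:
        per_city[(city, book)] += 1
        baskets[patron].add(book)
    pairs: Counter = Counter()
    for books in baskets.values():
        # Sorting makes (A, B) and (B, A) the same key.
        for a, b in combinations(sorted(books), 2):
            pairs[(a, b)] += 1
    # Suppress small cells so no published number describes just a few patrons.
    circulation = {k: v for k, v in per_city.items() if v >= min_count}
    co_checkouts = {k: v for k, v in pairs.items() if v >= min_count}
    return circulation, co_checkouts


# Made-up day: six patrons borrow A and B together; one borrows A and C.
day = [(f"p{i}", "Boston", t) for i in range(6) for t in ("A", "B")]
day += [("p9", "Boston", "A"), ("p9", "Boston", "C")]
circulation, co_checkouts = aggregate(day)
print(circulation)   # {('Boston', 'A'): 7, ('Boston', 'B'): 6}
print(co_checkouts)  # {('A', 'B'): 6}
```

Small-cell suppression on its own is not a complete privacy guarantee. That is part of why the aggregation has to be managed carefully.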

Gus Hunt, the chief technology officer of the CIA, said as much earlier this year. "The value of any piece of information is only known when you can connect it with something else that arrives at a future point in time,” he said at a Big Data conference. Thus, “since you can't connect dots you don't have … we fundamentally try to collect everything and hang on to it forever." The end of theory, which Chris Anderson predicted in Wired a few years ago, has reached the intelligence community: Just like Google doesn't need to know why some sites get more links from other sites—securing a better place on its search results as a result—the spies do not need to know why some people behave like terrorists. Acting like a terrorist is good enough.