How to use an LLM to parse news? #434

Open
GOOD-N-LCM opened this issue Jan 9, 2025 · 1 comment

Comments


GOOD-N-LCM commented Jan 9, 2025

Thank you for your project; it is very nice!
However, when I tried to parse a news article with Ollama, the project did not meet my needs.

This is my code:

import os
import asyncio
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from pydantic import BaseModel, Field

# Target structure the LLM is asked to fill in.
class KnowledgeGraph(BaseModel):
    title: str
    content: str

async def main(url):
    # Headless browser with a random mobile (Android) user agent.
    async with AsyncWebCrawler(
        verbose=True,
        user_agent_mode="random",
        user_agent_generator_config={
            "device_type": "mobile",
            "os_type": "android"
        },
    ) as crawler:
        result = await crawler.arun(
            url=url,
            cache_mode=CacheMode.BYPASS,  # always fetch fresh; supersedes the older bypass_cache=True
            remove_overlay_elements=True,
            word_count_threshold=1,
            extraction_strategy=LLMExtractionStrategy(
                provider="ollama/qwen2.5:14b",  # local model served by Ollama
                schema=KnowledgeGraph.model_json_schema(),  # Pydantic v2 name for the old .schema()
                extraction_type="schema",
                instruction="""Extract title and content from the given text.
                """
            ),
        )
        # JSON string produced by the LLM extraction step
        print(result.extracted_content)

if __name__ == "__main__":
    url = 'https://www.bloomberg.com/news/articles/2025-01-07/nvidia-ceo-unveils-more-powerful-graphics-cards-at-ces-event?srnd=homepage-asia'
    asyncio.run(main(url))
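`result.extracted_content` holds the extraction output as a JSON string, so the parsed news still needs to be turned back into title/content pairs. Below is a minimal sketch, assuming the string decodes to a list of objects (or a single object) with `title` and `content` keys, and that failed chunks show up as objects with a truthy `error` key. `Article`, `parse_extracted`, and `merge_chunks` are hypothetical helpers, not part of crawl4ai; a stdlib dataclass stands in for the Pydantic `KnowledgeGraph` model so the snippet runs on its own.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Article:
    """Hypothetical stand-in for the KnowledgeGraph model (title + content)."""
    title: str
    content: str


def parse_extracted(extracted_content: str) -> List[Article]:
    """Decode the JSON string from result.extracted_content into Articles.

    Assumption: the JSON is a list of objects (or one object) with "title"
    and "content" keys. Entries with a truthy "error" key, non-object
    entries, and entries missing either field are skipped.
    """
    data: Any = json.loads(extracted_content or "[]")
    if isinstance(data, dict):
        data = [data]  # normalize a single object to a one-item list
    articles: List[Article] = []
    for item in data:
        if not isinstance(item, dict) or item.get("error"):
            continue  # skip error markers and anything that is not an object
        title = str(item.get("title") or "").strip()
        content = str(item.get("content") or "").strip()
        if title and content:
            articles.append(Article(title=title, content=content))
    return articles


def merge_chunks(articles: List[Article]) -> Article:
    """Merge per-chunk results: keep the first title, join contents in order."""
    if not articles:
        raise ValueError("no usable title/content pairs were extracted")
    return Article(
        title=articles[0].title,
        content="\n\n".join(a.content for a in articles),
    )


if __name__ == "__main__":
    # Illustrative input only, not real crawler output.
    sample = json.dumps([
        {"title": "Example headline", "content": "First part."},
        {"error": True, "content": "chunk failed"},
        {"title": "Example headline", "content": "Second part."},
    ])
    article = merge_chunks(parse_extracted(sample))
    print(article.title)    # Example headline
    print(article.content)  # First part. / blank line / Second part.
```

Keeping the first title and concatenating the contents is one simple merge policy for a long article that was split into several chunks; if the model returns differing titles per chunk, a different rule may fit better.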
@devatbosch

@GOOD-N-LCM, could you please elaborate on the issue you are facing?

I may be able to help then.
